  • Introduction to Data-Driven Dependency Parsing

    Introductory Course, ESSLLI 2007

    Ryan McDonald¹  Joakim Nivre²

    ¹ Google Inc., New York, USA. E-mail: [email protected]

    ² Uppsala University and Växjö University, Sweden. E-mail: [email protected]

    Introduction to Data-Driven Dependency Parsing 1(52)

  • Introduction

    Overview of the Course

    ▶ Dependency parsing (Joakim)

    ▶ Machine learning methods (Ryan)

    ▶ Transition-based models (Joakim)

    ▶ Graph-based models (Ryan)

    ▶ Loose ends (Joakim, Ryan):
      ▶ Other approaches
      ▶ Empirical results
      ▶ Available software

    Introduction to Data-Driven Dependency Parsing 2(52)

  • Other Approaches

    Other Approaches – Overview

    ▶ Graph-based methods – new developments

    ▶ Transition-based methods – new developments

    ▶ Ensemble methods

    ▶ Constraint-based methods

    ▶ Phrase structure parsing

    ▶ Unsupervised parsing

    Introduction to Data-Driven Dependency Parsing 3(52)

  • Other Approaches

    Graph-based Methods

    ▶ Last lecture we discussed arc-factored models

    ▶ These models are inherently local:
      ▶ Local feature scope
      ▶ Local structural constraints

    ▶ This is a strong assumption!

    ▶ Question: how do we incorporate non-local information?

    Introduction to Data-Driven Dependency Parsing 4(52)

  • Other Approaches

    Integer Linear Programming

    ▶ Often, intractable inference problems can be written as Integer Linear Programming (ILP) problems

    ▶ ILPs are optimization problems with linear objectives and linear constraints

    ▶ Non-projective parsing with global constraints can be written as an ILP [Riedel and Clarke 2006]

    ▶ ILPs are still NP-hard, but have well-known branch-and-bound solutions

    ▶ First, let's define a set of binary variables:
      ▶ $a^k_{ij} \in \{0, 1\}$ is 1 if the arc $(i, j, k)$ is in the dependency graph
      ▶ $a$ is the vector of all variables $a^k_{ij}$

    Introduction to Data-Driven Dependency Parsing 5(52)

  • Other Approaches

    Integer Linear Programming

    ▶ We can define the arc-factored parsing problem as the following objective function (a code sketch follows this slide):

      $$\arg\max_a \sum_{i,j,k} \log w^k_{ij} \cdot a^k_{ij}$$

      such that:

      $$\forall j > 0: \sum_{i,k} a^k_{ij} = 1 \quad \text{(single head)}$$
      $$\sum_{i,k} a^k_{i0} = 0 \quad (w_0 \text{ is the root})$$
      $$\forall \text{ cycles } C: \sum_{(i,j,k) \in C} a^k_{ij} \le |C| - 1 \quad \text{(no cycles)}$$

    ▶ This is an ILP!
      ▶ Linear objective
      ▶ Linear constraints over integer variables

    Introduction to Data-Driven Dependency Parsing 6(52)
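    A minimal sketch of this formulation, not [Riedel and Clarke 2006]'s actual system: it drops arc labels (no k index) and uses the PuLP modeling library. Since there are exponentially many cycle constraints, they are added lazily (solve, find a violated cycle, add its constraint, re-solve), in the spirit of incremental ILP. All function and variable names are illustrative.

    ```python
    import pulp

    def parse_ilp(scores):
        """scores[i][j] = log-weight of the arc from head i to dependent j (0 = root)."""
        n = len(scores)  # number of tokens, including the artificial root w0
        prob = pulp.LpProblem("arc_factored_parsing", pulp.LpMaximize)
        # Binary arc variables a[i, j]; w0 (j = 0) never receives an incoming arc
        a = {(i, j): pulp.LpVariable(f"a_{i}_{j}", cat="Binary")
             for i in range(n) for j in range(1, n) if i != j}
        # Linear objective: total log-weight of the selected arcs
        prob += pulp.lpSum(scores[i][j] * a[i, j] for (i, j) in a)
        # Single-head constraint: each non-root token has exactly one head
        for j in range(1, n):
            prob += pulp.lpSum(a[i, j] for i in range(n) if (i, j) in a) == 1
        while True:
            prob.solve()
            heads = {j: i for (i, j) in a if a[i, j].value() > 0.5}
            cycle = find_cycle(heads)
            if cycle is None:
                return heads  # head of each token in the optimal tree
            # Lazily add the no-cycle constraint for the cycle just found
            prob += pulp.lpSum(a[heads[j], j] for j in cycle) <= len(cycle) - 1

    def find_cycle(heads):
        """Return the set of nodes on some cycle of the head function, or None."""
        for start in heads:
            seen, j = set(), start
            while j in heads and j not in seen:
                seen.add(j)
                j = heads[j]
            if j == start:
                return seen
        return None
    ```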

  • Other Approaches

    Integer Linear Programming

    ▶ [Riedel and Clarke 2006] showed that this formulation allows for non-local constraints

    ▶ e.g., a verb can only have a single subject (sketched in code below):

      $$\forall w_i \text{ that are verbs}: \sum_j a^{\mathrm{sbj}}_{ij} \le 1$$

    ▶ This is non-local, since we are enforcing a constraint over all the modifiers of $w_i$

    ▶ [Riedel and Clarke 2006] also includes constraints on coordination, as well as projectivity if desired

    ▶ Is this still data-driven?

    Introduction to Data-Driven Dependency Parsing 7(52)
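    A hedged sketch of the single-subject constraint in the same PuLP style as above, this time assuming labeled binary variables a[(i, j, k)] (the earlier sketch dropped labels) and a hypothetical pos list of part-of-speech tags:

    ```python
    import pulp

    def add_single_subject_constraints(prob, a, pos, n):
        """For every verb w_i, at most one outgoing arc may carry the 'sbj' label."""
        for i in range(n):
            if pos[i] == "VERB":
                prob += pulp.lpSum(a[i, j, "sbj"]
                                   for j in range(1, n)
                                   if (i, j, "sbj") in a) <= 1
    ```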

  • Other Approaches

    Sampling Methods

    ▶ Used for dependency parsing with global features by [Nakagawa 2007]

    ▶ Define a conditional log-linear probability model (a toy version is sketched below):

      $$P(G \mid x) = \frac{1}{Z_x} e^{\mathbf{w} \cdot \mathbf{f}(G)}, \quad \text{where } Z_x = \sum_{G'} e^{\mathbf{w} \cdot \mathbf{f}(G')}$$

    ▶ $\mathbf{f}(G)$ is a global feature map – it can contain global features of the dependency graph
      ▶ i.e., it does not necessarily factor by the arcs

    Introduction to Data-Driven Dependency Parsing 8(52)
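    A minimal sketch of this model for a toy setting where the candidate graphs can be enumerated; in practice $Z_x$ is intractable, which is exactly what motivates the sampling approach on the next slide. The names w, f, and candidates are illustrative stand-ins:

    ```python
    import math

    def graph_prob(G, candidates, w, f):
        """P(G|x) under a log-linear model, with Z_x summed over `candidates`."""
        def score(g):  # exp(w . f(g))
            return math.exp(sum(wi * fi for wi, fi in zip(w, f(g))))
        Z = sum(score(g) for g in candidates)  # partition function Z_x
        return score(G) / Z
    ```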

  • Other Approaches

    Sampling Methods

    ▶ $\arg\max_G P(G \mid x)$ cannot be solved efficiently

    ▶ Assume we have N samples from the distribution $P(G \mid x)$
      ▶ They can be found efficiently with Gibbs sampling
      ▶ Call them $G_1, \ldots, G_N$

    ▶ We want the marginal probability $\mu^k_{ij}$ of the arc $(i, j, k)$:

      $$\mu^k_{ij} \approx \sum_{t=1}^{N} P(G_t \mid x) \, 1[(i,j,k) \in G_t] \approx \frac{1}{N} \sum_{t=1}^{N} 1[(i,j,k) \in G_t]$$

    ▶ Since N should be a manageable size, this can be calculated (see the sketch below)

    ▶ Set the arc weights $w^k_{ij} = \mu^k_{ij}$ and find the MST

    ▶ $\mathbf{w}$ is found using Monte Carlo sampling

    Introduction to Data-Driven Dependency Parsing 9(52)
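    A minimal sketch of the reparsing step, assuming unlabeled arcs and a pre-existing Gibbs sampler whose output is a list of head arrays; the MST decoding reuses networkx's Chu-Liu/Edmonds implementation rather than [Nakagawa 2007]'s own:

    ```python
    from collections import Counter
    import networkx as nx

    def marginal_mst(sample_graphs, n_tokens):
        """sample_graphs: list of head arrays; heads[j] = head of token j (0 = root)."""
        N = len(sample_graphs)
        counts = Counter()  # counts[(i, j)] = number of samples containing arc i -> j
        for heads in sample_graphs:
            for j in range(1, n_tokens):
                counts[(heads[j], j)] += 1
        # Empirical marginals mu_ij = count / N become the arc weights
        G = nx.DiGraph()
        for (i, j), c in counts.items():
            G.add_edge(i, j, weight=c / N)
        # Maximum spanning arborescence; it roots at 0 since 0 has no incoming arcs
        tree = nx.maximum_spanning_arborescence(G, attr="weight")
        return {j: i for (i, j) in tree.edges()}
    ```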

  • Other Approaches

    Transition-based Methods

    ▶ Transition-based models may suffer from:
      ▶ Error propagation, because of greedy inference
      ▶ Label bias, because of local training

    ▶ Recent developments seek to remedy this:
      ▶ Beam search instead of greedy best-first search [Johansson and Nugues 2006, Duan et al. 2007]
      ▶ Globally trained probabilistic models [Johansson and Nugues 2007, Titov and Henderson 2007]

    Introduction to Data-Driven Dependency Parsing 10(52)

  • Other Approaches

    Generative Model [Titov and Henderson 2007]

    ▶ Probabilistic model defined in terms of transitions:

      $$P(G_{c_m}) = P(c_0, c_1, \ldots, c_m) = \prod_{i=1}^{m} P(c_i \mid c_0, \ldots, c_{i-1})$$

    ▶ Similar to HMMs

    ▶ Transition system of [Nivre 2003], with two modifications:
      ▶ Splits Right-Arc into Right-Arc′ and Shift
      ▶ Adds transitions for generating words (generative model)

    ▶ $P(c_i \mid c_0, c_1, \ldots, c_{i-1})$ is modeled by a neural network approximating an Incremental Sigmoid Belief Network (ISBN)

    ▶ The belief network's hidden layer acts as a feature selection algorithm

    ▶ Parsing with heuristic beam search (sketched below)

    Introduction to Data-Driven Dependency Parsing 11(52)
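    A minimal sketch of beam search over transition sequences under this kind of model. The interfaces are assumptions, not [Titov and Henderson 2007]'s API: next_transitions(state) enumerates legal transitions, apply(state, c) returns the successor state, transition_prob(c, history) plays the role of the ISBN's P(c_i | history), and is_final(state) tests termination:

    ```python
    import math

    def beam_parse(init_state, next_transitions, apply, transition_prob,
                   is_final, beam_size=8):
        beam = [(0.0, init_state, [])]  # (log-prob, parser state, transition history)
        while not all(is_final(state) for _, state, _ in beam):
            candidates = []
            for logp, state, hist in beam:
                if is_final(state):
                    candidates.append((logp, state, hist))
                    continue
                for c in next_transitions(state):
                    # Chain rule: add log P(c | c_0, ..., c_{i-1})
                    candidates.append((logp + math.log(transition_prob(c, hist)),
                                       apply(state, c), hist + [c]))
            # Keep only the beam_size best partial derivations
            beam = sorted(candidates, key=lambda t: t[0], reverse=True)[:beam_size]
        return max(beam, key=lambda t: t[0])  # highest-probability derivation
    ```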

  • Other Approaches

    Ensemble Methods

    ▶ Input: a sentence $x = w_0, w_1, \ldots, w_n$
    ▶ Input: the output of N different parsers for x: $G_1, G_2, \ldots, G_N$
    ▶ Output: a single dependency graph G for sentence x

    Question: How do we combine the parsers and their outputs to create a new and better parse for sentence x?

    ▶ Assumption: the N parsers make different mistakes

    ▶ Assumption: all of the N parsers are relatively strong

    Introduction to Data-Driven Dependency Parsing 12(52)

  • Other Approaches

    Ensemble Methods [Sagae and Lavie 2006]

    ▶ A simple but elegant solution (sketched in code below):
      ▶ Use arc-factored graph-based models
      ▶ Set arc weights equal to the number of parsers that predicted that arc:

        $$w^k_{ij} = e^{\sum_t \alpha_t \cdot 1[(i,j,k) \in G_t]}$$

      ▶ $\alpha_t$ usually equals 1, but can be modified if prior knowledge exists

    ▶ Solution: find the MST for $G_x$ with the above weights

    ▶ The resulting graph has, on average, the arcs that were preferred by most systems

    Introduction to Data-Driven Dependency Parsing 13(52)
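    A minimal sketch of this reparsing scheme, again assuming unlabeled head arrays and reusing networkx's maximum spanning arborescence for the MST step (the log-weights are the raw vote counts, so the exponentiation can be skipped):

    ```python
    import networkx as nx

    def ensemble_reparse(outputs, alphas=None):
        """outputs: one head array per parser; heads[j] = head of token j (0 = root)."""
        alphas = alphas or [1.0] * len(outputs)
        G = nx.DiGraph()
        for heads, alpha in zip(outputs, alphas):
            for j in range(1, len(heads)):
                i = heads[j]
                # Accumulate this parser's (weighted) vote for the arc i -> j
                w = G.edges[i, j]["weight"] if G.has_edge(i, j) else 0.0
                G.add_edge(i, j, weight=w + alpha)
        tree = nx.maximum_spanning_arborescence(G, attr="weight")
        return {j: i for (i, j) in tree.edges()}
    ```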

  • Other Approaches

    Ensemble Methods [Sagae and Lavie 2006]

    ▶ Example: an ensemble of parsers from this year's CoNLL shared task

    [Figure: labeled and unlabeled accuracy (roughly 80–88%) as a function of the number of systems in the ensemble (5–20).]

    Introduction to Data-Driven Dependency Parsing 14(52)

  • Other Approaches

    Constraint-based Methods

    ▶ Statistical constraint dependency parsing in two steps [Wang and Harper 2004]:
      1. Supertagging, using a trigram Hidden Markov Model to assign the top n-best constraints to an input sentence x.
      2. Stack-based, best-first search to build the most probable dependency graph given the constraints.

    ▶ Anytime transformation-based parsing with constraints [Foth and Menzel 2006]:
      1. Use a data-driven transition-based parser to derive an initial dependency graph.
      2. Use graph transformations to improve the score relative to the weighted constraints.

    Introduction to Data-Driven Dependency Parsing 15(52)

  • Other Approaches

    Phrase Structure Parsing

    ▶ Phrase structure parsers used for dependency parsing:
      1. Transform the training data from dependencies to phrase structure
      2. Train a parser on the transformed structures
      3. Parse new sentences with the trained parser
      4. Transform the parser output from phrase structure back to dependencies

    ▶ Example:
      ▶ Parsing Czech with the Collins and Charniak parsers [Collins et al. 1999, Hall and Novák 2005]

    ▶ Note:
      ▶ Both of these parsers internally extract dependencies from phrase structures.

    Introduction to Data-Driven Dependency Parsing 16(52)

  • Other Approaches

    Unsupervised Parsing

    ▶ Often we do not have a large corpus with annotated dependency graphs

    ▶ Can we still learn to parse dependencies from unlabeled data?

    ▶ There has been much research along these lines lately:
      ▶ Lexical attraction [Yuret 1998]
      ▶ Grammatical bigrams [Paskin 2001]
      ▶ Top-down generative models [Klein and Manning 2004]
      ▶ Contrastive estimation [Smith and Eisner 2005]
      ▶ Non-projective examples [McDonald and Satta 2007]

    Introduction to Data-Driven Dependency Parsing 17(52)

  • Empirical Results

    Empirical Results – Overview

    ▶ Evaluation metrics

    ▶ Benchmarks:
      ▶ Penn Treebank (Wall Street Journal)
      ▶ Prague Dependency Treebank

    ▶ CoNLL 2006 shared task [Buchholz and Marsi 2006]:
      ▶ 19 parsers for 13 languages
      ▶ Error analysis for the two top systems [McDonald and Nivre 2007]

    ▶ CoNLL 2007 shared task [Nivre et al. 2007]:
      ▶ 23 parsers for 10 languages
      ▶ Domain adaptation for English

    Introduction to Data-Driven Dependency Parsing 18(52)

  • Empirical Results

    Evaluation Metrics

    ▶ Per token (computed in the sketch below):
      ▶ Labeled attachment score (LAS): percentage of tokens with correct head and label
      ▶ Unlabeled attachment score (UAS): percentage of tokens with correct head
      ▶ Label accuracy (LA): percentage of tokens with correct label

    ▶ Per sentence:
      ▶ Labeled complete match (LCM): percentage of sentences with correct labeled graph
      ▶ Unlabeled complete match (UCM): percentage of sentences with correct unlabeled graph

    Introduction to Data-Driven Dependency Parsing 19(52)
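    A minimal sketch of the three per-token metrics, assuming gold and predicted parses are given as parallel lists of (head, label) pairs, one pair per token:

    ```python
    def attachment_scores(gold, pred):
        """LAS, UAS, and LA over one sentence (or a whole concatenated test set)."""
        n = len(gold)
        las = sum(g == p for g, p in zip(gold, pred)) / n        # head and label
        uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # head only
        la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n   # label only
        return {"LAS": las, "UAS": uas, "LA": la}

    # Example: a 3-token sentence where token 2 gets the wrong head, right label
    gold = [(2, "sbj"), (0, "root"), (2, "obj")]
    pred = [(2, "sbj"), (3, "root"), (2, "obj")]
    print(attachment_scores(gold, pred))  # LAS/UAS = 2/3, LA = 1.0
    ```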

  • Empirical Results

    State of the Art – English

    ▶ Penn Treebank (WSJ) converted to dependency graphs
    ▶ Transition-based parsers [Yamada and Matsumoto 2003, Isozaki et al. 2004]
    ▶ Graph-based parsers [McDonald et al. 2005a, McDonald and Pereira 2006]
    ▶ Ensemble parsers [Sagae and Lavie 2006, McDonald 2006]
    ▶ Phrase structure parsers [Collins 1999, Charniak 2000]

      Parser                  UAS   UCM
      McDonald                93.2  47.1
      Sagae and Lavie         92.7  –
      Charniak                92.2  45.2
      Collins                 91.7  43.3
      McDonald and Pereira    91.5  42.1
      Isozaki et al.          91.4  40.7
      McDonald et al.         91.0  37.5
      Yamada and Matsumoto    90.4  38.4

    Introduction to Data-Driven Dependency Parsing 20(52)

  • Empirical Results

    State of the Art – Czech

    ▶ Prague Dependency Treebank (PDT)
    ▶ Pseudo-projective transition-based parser [Nilsson et al. 2006]
    ▶ Non-projective spanning tree parser [McDonald et al. 2005b]
    ▶ Approximate second-order spanning tree parser [McDonald and Pereira 2006]
    ▶ Projective phrase structure parsers (Charniak, Collins)
    ▶ Phrase structure (Charniak) + corrective modeling [Hall and Novák 2005]

      Parser                  UAS   UCM
      McDonald and Pereira    85.2  35.9
      Hall and Novák          85.1  –
      Nilsson et al.          84.6  37.7
      McDonald et al.         84.4  32.3
      Charniak                84.4  –
      Collins                 81.8  –

    Introduction to Data-Driven Dependency Parsing 21(52)

  • Empirical Results

    CoNLL Shared Task 2006

    ▶ Multilingual dependency parsing:
      ▶ Train a single parser on data from thirteen languages
      ▶ Gold standard annotation (postags, lemmas, etc.)
      ▶ Main evaluation metric: LAS

    ▶ Results:
      ▶ 19 systems, 17 described in [Buchholz and Marsi 2006]
      ▶ Considerable variation across languages (top scores):
        ▶ Japanese: 91.7%
        ▶ Turkish: 65.7%

    ▶ Best systems:
      ▶ MSTParser (graph-based) [McDonald et al. 2006]
      ▶ MaltParser (transition-based) [Nivre et al. 2006]

    Introduction to Data-Driven Dependency Parsing 22(52)

  • Empirical Results

    MSTParser and MaltParser

                 MST    Malt
    Arabic       66.91  66.71
    Bulgarian    87.57  87.41
    Chinese      85.90  86.92
    Czech        80.18  78.42
    Danish       84.79  84.77
    Dutch        79.19  78.59
    German       87.34  85.82
    Japanese     90.71  91.65
    Portuguese   86.82  87.60
    Slovene      73.44  70.30
    Spanish      82.25  81.29
    Swedish      82.55  84.58
    Turkish      63.19  65.68

    Overall      80.83  80.75

    Introduction to Data-Driven Dependency Parsing 23(52)

  • Empirical Results

    Comparing the Models

    ▶ Inference:
      ▶ Exhaustive (MSTParser)
      ▶ Greedy (MaltParser)

    ▶ Training:
      ▶ Global structure learning (MSTParser)
      ▶ Local decision learning (MaltParser)

    ▶ Features:
      ▶ Local features (MSTParser)
      ▶ Rich decision history (MaltParser)

    ▶ Fundamental trade-off:
      ▶ Global learning and inference vs. a rich feature space

    Introduction to Data-Driven Dependency Parsing 24(52)

  • Empirical Results

    Error Analysis

    ▶ Aim:
      ▶ Relate parsing errors to linguistic and structural properties of the input and the predicted/gold standard dependency graphs

    ▶ Three types of factors:
      ▶ Length factors: sentence length, dependency length
      ▶ Graph factors: tree depth, branching factor, non-projectivity
      ▶ Linguistic factors: part of speech, dependency type

    ▶ Statistics:
      ▶ Labeled accuracy, precision, and recall
      ▶ Computed over the test sets for all 13 languages

    Introduction to Data-Driven Dependency Parsing 25(52)

  • Empirical Results

    Sentence Length

    [Figure: dependency accuracy (roughly 0.70–0.84) by sentence length, in bins of size 10 up to 50+, for MSTParser and MaltParser.]

    ▶ MaltParser is more accurate than MSTParser for short sentences (1–10 words), but its performance degrades more with increasing sentence length.

    Introduction to Data-Driven Dependency Parsing 26(52)

  • Empirical Results

    Dependency Length

    [Figure: dependency precision and recall (roughly 0.3–0.9) by dependency length (0–30) for MSTParser and MaltParser.]

    ▶ MaltParser is more precise than MSTParser for short dependencies (1–3 words), but its performance degrades drastically with increasing dependency length (> 10 words).

    ▶ MSTParser has more or less constant precision for dependencies longer than 3 words.

    ▶ Recall is very similar across systems.

    Introduction to Data-Driven Dependency Parsing 27(52)

  • Empirical Results

    Tree Depth (Distance to Root)

    [Figure: dependency precision and recall (roughly 0.74–0.90) by distance to root (1–10) for MSTParser and MaltParser.]

    ▶ MSTParser is much more precise than MaltParser for dependents of the root and has roughly constant precision for depth > 1, while MaltParser's precision improves with increasing depth (up to 7 arcs).

    ▶ Recall is very similar across systems.

    Introduction to Data-Driven Dependency Parsing 28(52)

  • Empirical Results

    Degrees of Non-Projectivity

    [Figure: dependency precision and recall (0–1) by non-projective arc degree (0, 1, 2+) for MSTParser and MaltParser.]

    ▶ Degree of a dependency arc (i, j, k) = the number of words in the span min(i, j), ..., max(i, j) that are not descendants of i and have their head outside the span (computed in the sketch below).

    ▶ MaltParser has slightly higher precision, and MSTParser slightly higher recall, for non-projective arcs (degree > 0).

    ▶ Neither system predicts arcs with a degree higher than 2.

    Introduction to Data-Driven Dependency Parsing 29(52)
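    A minimal sketch of this degree measure, assuming an unlabeled tree given as a head array and counting only the words strictly between the arc's endpoints:

    ```python
    def arc_degree(heads, i, j):
        """Degree of the arc i -> j; heads[w] = head of token w (0 = root)."""
        lo, hi = min(i, j), max(i, j)

        def is_descendant(w, ancestor):
            # Walk up the head chain from w until the root (0) is reached
            while w != 0:
                w = heads[w]
                if w == ancestor:
                    return True
            return False

        degree = 0
        for w in range(lo + 1, hi):  # words inside the span
            if not is_descendant(w, i) and not (lo <= heads[w] <= hi):
                degree += 1  # not a descendant of i, head outside the span
        return degree
    ```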

  • Empirical Results

    Part of Speech

    [Figure: labeled attachment score (roughly 60–95%) by part of speech (Verb, Noun, Pron, Adj, Adv, Adpos, Conj) for MSTParser and MaltParser.]

    ▶ MSTParser is more accurate for verbs, adjectives, adverbs, adpositions, and conjunctions.

    ▶ MaltParser is more accurate for nouns and pronouns.

    Introduction to Data-Driven Dependency Parsing 30(52)

  • Empirical Results

    Dependency Type: Root, Subject, Object

    [Figure: dependency precision and recall (roughly 65–95%) for the Root, Subj, and Obj dependency types, for MSTParser and MaltParser.]

    ▶ MSTParser has higher precision (and recall) for roots.

    ▶ MSTParser has higher recall (and precision) for subjects.

    Introduction to Data-Driven Dependency Parsing 31(52)

  • Empirical Results

    Discussion

    ▶ Many of the results are indicative of the fundamental trade-off: global learning/inference versus rich features.

    ▶ Global inference improves decisions for long sentences and arcs near the top of the graph.

    ▶ Rich features improve decisions for short sentences and arcs near the leaves of the graph.

    ▶ Important question:
      ▶ How do we use this to improve parser performance?

    ▶ Oracle experiments:
      ▶ Graph-based selection: 81% → 85%
      ▶ Arc-based selection [Sagae and Lavie 2006]: 81% → 87%

    Introduction to Data-Driven Dependency Parsing 32(52)

  • Empirical Results

    CoNLL Shared Task 2007

    ▶ Two tracks:
      ▶ Multilingual dependency parsing (10 languages)
      ▶ Domain adaptation (English)

    ▶ Results (multilingual track):
      ▶ 28 systems, 23 described in [Nivre et al. 2007]
      ▶ A little less variation across languages (top scores):
        ▶ English: 89.6%
        ▶ Greek: 76.3%

    ▶ Best systems:
      ▶ Ensemble systems [Hall et al. 2007, Sagae and Tsujii 2007]
      ▶ Graph-based systems with global features [Nakagawa 2007, Carreras 2007]
      ▶ Transition-based systems with global training [Titov and Henderson 2007]

    Introduction to Data-Driven Dependency Parsing 33(52)

  • Available Software

    Available Software – Overview

    ▶ Dependency Parsing Wiki:
      ▶ http://depparse.uvt.nl/depparse-wiki/

    ▶ Parsers:
      ▶ Trainable data-driven parsers
      ▶ Parsers for specific languages (grammar-based)

    ▶ Other tools:
      ▶ Pseudo-projective parsing
      ▶ Evaluation software
      ▶ Constituency-to-dependency conversion

    ▶ Data sets:
      ▶ Dependency treebanks
      ▶ Other treebanks with dependency conversions

    Introduction to Data-Driven Dependency Parsing 34(52)


  • Available Software

    Trainable Parsers

    ▶ Jason Eisner's probabilistic dependency parser
      ▶ Based on bilexical grammar
      ▶ Contact Jason Eisner: [email protected]
      ▶ Written in LISP

    ▶ Ryan McDonald's MSTParser
      ▶ Graph-based spanning tree parsers with online learning
      ▶ URL: http://sourceforge.net/projects/mstparser
      ▶ Written in Java

    Introduction to Data-Driven Dependency Parsing 35(52)


  • Available Software

    Trainable Parsers (2)

    ▶ Joakim Nivre's MaltParser
      ▶ Transition-based parsers with MBL and SVM
      ▶ URL: http://w3.msi.vxu.se/~nivre/research/MaltParser.html
      ▶ Executable versions are available for Solaris, Linux, Windows, and MacOS (open source version in Java planned for fall 2007)

    ▶ Ivan Titov's ISBN Dependency Parser
      ▶ Incremental Sigmoid Belief Network dependency parser
      ▶ Transition-based inference
      ▶ URL: http://cui.unige.ch/~titov/idp/
      ▶ Written in C

    Introduction to Data-Driven Dependency Parsing 36(52)


  • Available Software

    Parsers for Specific Languages

    ▶ Dekang Lin's Minipar
      ▶ Principle-based parser
      ▶ Grammar for English
      ▶ URL: http://www.cs.ualberta.ca/~lindek/minipar.htm
      ▶ Executable versions for Linux, Solaris, and Windows

    ▶ Wolfgang Menzel's CDG Parser
      ▶ Weighted constraint dependency parser
      ▶ Grammar for German (English under construction)
      ▶ Online demo: http://nats-www.informatik.uni-hamburg.de/Papa/ParserDemo
      ▶ Download: http://nats-www.informatik.uni-hamburg.de/download

    Introduction to Data-Driven Dependency Parsing 37(52)


  • Available Software

    Parsers for Specific Languages (2)

    ▶ Taku Kudo's CaboCha
      ▶ Based on the algorithms of [Kudo and Matsumoto 2002]; uses SVMs
      ▶ URL: http://www.chasen.org/~taku/software/cabocha/
      ▶ Web page in Japanese

    ▶ Gerold Schneider's Pro3Gres
      ▶ Probability-based dependency parser
      ▶ Grammar for English
      ▶ URL: http://www.ifi.unizh.ch/CL/gschneid/parser/
      ▶ Written in PROLOG

    ▶ Daniel Sleator's & Davy Temperley's Link Grammar Parser
      ▶ Undirected links between words
      ▶ Grammar for English
      ▶ URL: http://www.link.cs.cmu.edu/link/

    Introduction to Data-Driven Dependency Parsing 38(52)


  • Available Software

    Other Tools

    ▶ Pseudo-projective parsing:
      ▶ Software based on [Nivre and Nilsson 2005]
      ▶ http://w3.msi.vxu.se/~nivre/research/proj/0.2/doc/Proj.html

    ▶ Evaluation software:
      ▶ CoNLL shared tasks:
        ▶ http://nextens.uvt.nl/~conll/software.html
        ▶ http://depparse.uvt.nl/depparse-wiki/SoftwarePage

    ▶ Treebank conversion software:
      ▶ CoNLL 2006 shared task treebanks:
        ▶ http://depparse.uvt.nl/depparse-wiki/SoftwarePage
      ▶ Penn Treebank:
        ▶ http://w3.msi.vxu.se/~nivre/research/Penn2Malt.html
        ▶ http://nlp.cs.lth.se/pennconverter/

    Introduction to Data-Driven Dependency Parsing 39(52)


  • Available Software

    Dependency Treebanks

    ▶ Arabic: Prague Arabic Dependency Treebank

    ▶ Basque: Eus3LB

    ▶ Czech: Prague Dependency Treebank

    ▶ Danish: Danish Dependency Treebank

    ▶ Greek: Greek Dependency Treebank

    ▶ Portuguese: Bosque: Floresta sintá(c)tica

    ▶ Slovene: Slovene Dependency Treebank

    ▶ Turkish: METU-Sabanci Turkish Treebank

    Introduction to Data-Driven Dependency Parsing 40(52)

  • Available Software

    Other Treebanks

    ▶ Bulgarian: BulTreebank

    ▶ Catalan: CESS-ECE

    ▶ Chinese: Penn Chinese Treebank, Sinica Treebank

    ▶ Dutch: Alpino Treebank for Dutch

    ▶ English: Penn Treebank

    ▶ German: TIGER/NEGRA, TüBa-D/Z

    ▶ Hungarian: Szeged Treebank

    ▶ Italian: Italian Syntactic-Semantic Treebank

    ▶ Japanese: TüBa-J/S

    ▶ Spanish: Cast3LB

    ▶ Swedish: Talbanken05

    Introduction to Data-Driven Dependency Parsing 41(52)

  • Conclusion

    Summary

    ▶ State of the art in data-driven dependency parsing:
      ▶ Transition-based models
      ▶ Graph-based models
      ▶ New developments (often) targeting the weaknesses of the standard models

    ▶ Empirical results:
      ▶ CoNLL shared tasks: dependency parsing results for some twenty languages
      ▶ Many (different) systems achieve similar accuracy, but performance varies across languages

    ▶ Available resources: try them out!

    Introduction to Data-Driven Dependency Parsing 42(52)

  • Treebanks

    Dependency Treebanks (1)

    ▶ Prague Arabic Dependency Treebank
      ▶ ca. 100 000 words
      ▶ Available from LDC, license fee (CoNLL-X shared task data, catalogue number LDC2006E01)
      ▶ URL: http://ufal.mff.cuni.cz/padt/

    ▶ Eus3LB
      ▶ ca. 50 000 words
      ▶ Restricted availability
      ▶ URL: http://ixa.si.ehu.es/lxa/lkerlerroak

    Introduction to Data-Driven Dependency Parsing 43(52)


  • Treebanks

    Dependency Treebanks (2)

    ▶ Prague Dependency Treebank
      ▶ 1.5 million words
      ▶ 3 layers of annotation: morphological, syntactic, tectogrammatical
      ▶ Available from LDC, license fee (CoNLL-X shared task data, catalogue number LDC2006E02)
      ▶ URL: http://ufal.mff.cuni.cz/pdt2.0/

    ▶ Danish Dependency Treebank
      ▶ ca. 5 500 trees
      ▶ Annotation based on Discontinuous Grammar [Kromann 2003]
      ▶ Freely downloadable
      ▶ URL: http://www.id.cbs.dk/~mtk/treebank/

    Introduction to Data-Driven Dependency Parsing 44(52)


  • Treebanks

    Dependency Treebanks (3)

    ▶ Greek Dependency Treebank
      ▶ ca. 70 000 words
      ▶ Restricted availability
      ▶ Contact ILSP, Athens, Greece

    ▶ Bosque, Floresta sintá(c)tica
      ▶ ca. 10 000 trees
      ▶ Freely downloadable
      ▶ URL: http://acdc.linguateca.pt/treebank/info_floresta_English.html

    Introduction to Data-Driven Dependency Parsing 45(52)


  • Treebanks

    Dependency Treebanks (4)

    ▶ Slovene Dependency Treebank
      ▶ ca. 30 000 words
      ▶ Freely downloadable
      ▶ URL: http://nl.ijs.si/sdt/

    ▶ METU-Sabanci Turkish Treebank
      ▶ ca. 7 000 trees
      ▶ Freely available, license agreement
      ▶ URL: http://www.ii.metu.edu.tr/~corpus/treebank.html

    Introduction to Data-Driven Dependency Parsing 46(52)


  • Treebanks

    Other Treebanks (1)

    ▶ BulTreebank
      ▶ ca. 14 000 sentences
      ▶ URL: http://www.bultreebank.org/
      ▶ Dependency version available from Kiril Simov ([email protected])

    ▶ CESS-ECE
      ▶ ca. 500 000 words
      ▶ Freely available for research
      ▶ URL: http://www.lsi.upc.edu/~mbertran/cess-ece2/
      ▶ Dependency version available from Toni Martí

    Introduction to Data-Driven Dependency Parsing 47(52)

    http://www.bultreebank.org/[email protected]://www.lsi.upc.edu/~mbertran/cess-ece2/

  • Treebanks

    Other Treebanks (2)

    ▶ Penn Chinese Treebank
      ▶ ca. 4 000 sentences
      ▶ Available from LDC, license fee
      ▶ URL: http://www.cis.upenn.edu/~chinese/ctb.html
      ▶ For conversion with arc labels: Penn2Malt: http://w3.msi.vxu.se/~nivre/research/Penn2Malt.html

    ▶ Sinica Treebank
      ▶ ca. 61 000 sentences
      ▶ Available from Academia Sinica, license fee
      ▶ URL: http://godel.iis.sinica.edu.tw/CKIP/engversion/treebank.htm
      ▶ Dependency version available from Academia Sinica

    Introduction to Data-Driven Dependency Parsing 48(52)


  • Treebanks

    Other Treebanks (3)

    ▶ Alpino Treebank for Dutch
      ▶ ca. 150 000 words
      ▶ Freely downloadable
      ▶ URL: http://www.let.rug.nl/vannoord/trees/
      ▶ Dependency version downloadable at http://nextens.uvt.nl/~conll/free_data.html

    ▶ Penn Treebank
      ▶ ca. 1 million words
      ▶ Available from LDC, license fee
      ▶ URL: http://www.cis.upenn.edu/~treebank/home.html
      ▶ Conversion to labeled dependencies: Penn2Malt, pennconverter (see above)

    Introduction to Data-Driven Dependency Parsing 49(52)


  • Treebanks

    Other Treebanks (4)

    ▶ TIGER/NEGRA
      ▶ ca. 50 000 / 20 000 sentences
      ▶ Freely available, license agreement
      ▶ TIGER URL: http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/
      ▶ NEGRA URL: http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/
      ▶ Dependency version of TIGER is included in the release

    ▶ TüBa-D/Z
      ▶ ca. 22 000 sentences
      ▶ Freely available, license agreement
      ▶ URL: http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml
      ▶ Dependency version available from SfS Tübingen

    Introduction to Data-Driven Dependency Parsing 50(52)


  • Treebanks

    Other Treebanks (5)

    ▶ Szeged Treebank
      ▶ ca. 82 000 sentences (1.2 million words)
      ▶ Freely available, license agreement
      ▶ URL: http://www.inf.u-szeged.hu/hlt
      ▶ Subset in dependency format (6 000 sentences)

    ▶ Italian Syntactic-Semantic Treebank
      ▶ ca. 300 000 words
      ▶ Available through ELDA
      ▶ URL: http://www.ilc.cnr.it/viewpage.php/sez=ricerca/id=874/vers=ita
      ▶ Dependency version available

    Introduction to Data-Driven Dependency Parsing 51(52)


  • Treebanks

    Other Treebanks (6)

    ▶ Cast3LB
      ▶ ca. 18 000 sentences
      ▶ URL: http://www.dlsi.ua.es/projectes/3lb/index_en.html
      ▶ Dependency version available from Toni Martí ([email protected])

    ▶ Talbanken05 (Swedish)
      ▶ ca. 300 000 words
      ▶ Freely downloadable
      ▶ URL: http://w3.msi.vxu.se/~nivre/research/Talbanken05.html
      ▶ Dependency version also available

    Introduction to Data-Driven Dependency Parsing 52(52)


  • References and Further Reading

    References and Further Reading

    ▶ Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 149–164.

    ▶ X. Carreras. 2007. Experiments with a high-order projective dependency parser. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 132–139.

    ▶ Michael Collins, Jan Hajič, Lance Ramshaw, and Christoph Tillmann. 1999. A statistical parser for Czech. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pages 505–512.

    ▶ Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.

    ▶ X. Duan, J. Zhao, and B. Xu. 2007. Probabilistic parsing action models for multi-lingual dependency parsing. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ Kilian A. Foth and Wolfgang Menzel. 2006. Hybrid parsing: Using probabilistic models as predictors for a symbolic parser. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pages 321–328.

    ▶ Keith Hall and Vaclav Novák. 2005. Corrective modeling for non-projective dependency parsing. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT), pages 42–52.

    ▶ J. Hall, J. Nilsson, J. Nivre, G. Eryiğit, B. Megyesi, M. Nilsson, and M. Saers. 2007. Single malt or blended? A study in multilingual parser optimization. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Hideki Isozaki, Hideto Kazawa, and Tsutomu Hirao. 2004. A deterministic word dependency analyzer enhanced with preference learning. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 275–281.

    ▶ R. Johansson and P. Nugues. 2006. Investigating multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 206–210.

    ▶ R. Johansson and P. Nugues. 2007. Incremental dependency parsing using online learning. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading


    ▶ D. Klein and C. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proc. ACL.

    ▶ Matthias Trautner Kromann. 2003. The Danish Dependency Treebank and the DTAG treebank tool. In Joakim Nivre and Erhard Hinrichs, editors, Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT), pages 217–220. Växjö University Press.

    ▶ Taku Kudo and Yuji Matsumoto. 2002. Japanese dependency analysis using cascaded chunking. In Proceedings of the Sixth Workshop on Computational Language Learning (CoNLL), pages 63–69.

    ▶ Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of EMNLP-CoNLL 2007.

    ▶ R. McDonald and F. Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proc. EACL.

    ▶ R. McDonald and G. Satta. 2007. On the complexity of non-projective data-driven dependency parsing. In Proc. IWPT.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ R. McDonald, K. Crammer, and F. Pereira. 2005a. Online large-margin training of dependency parsers. In Proc. ACL.

    ▶ R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. 2005b. Non-projective dependency parsing using spanning tree algorithms. In Proc. HLT/EMNLP.

    ▶ R. McDonald, K. Lerman, and F. Pereira. 2006. Multilingual dependency analysis with a two-stage discriminative parser. In Proc. CoNLL.

    ▶ R. McDonald. 2006. Discriminative Training and Spanning Tree Algorithms for Dependency Parsing. Ph.D. thesis, University of Pennsylvania.

    ▶ T. Nakagawa. 2007. Multilingual dependency parsing using Gibbs sampling. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Jens Nilsson, Joakim Nivre, and Johan Hall. 2006. Graph transformations in data-driven dependency parsing. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pages 257–264.

    ▶ Joakim Nivre and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ Joakim Nivre, Johan Hall, Jens Nilsson, Gülşen Eryiğit, and Svetoslav Marinov. 2006. Labeled pseudo-projective dependency parsing with support vector machines. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 221–225.

    ▶ Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task of EMNLP-CoNLL 2007.

    ▶ Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

    ▶ M. A. Paskin. 2001. Cubic-time parsing and learning algorithms for grammatical bigram models. Technical Report UCB/CSD-01-1148, Computer Science Division, University of California, Berkeley.

    ▶ S. Riedel and J. Clarke. 2006. Incremental integer linear programming for non-projective dependency parsing. In Proc. EMNLP.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ K. Sagae and A. Lavie. 2006. Parser combination by reparsing. In Proc. HLT/NAACL.

    ▶ K. Sagae and J. Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ N. Smith and J. Eisner. 2005. Guiding unsupervised grammar induction using contrastive estimation. In Working Notes of the International Joint Conference on Artificial Intelligence Workshop on Grammatical Inference Applications.

    ▶ I. Titov and J. Henderson. 2007. Fast and robust multilingual dependency parsing with a generative latent variable model. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Wen Wang and Mary P. Harper. 2004. A statistical constraint dependency grammar (CDG) parser. In Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together (ACL), pages 42–49.

    ▶ Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 195–206.

    ▶ D. Yuret. 1998. Discovery of linguistic relations using lexical attraction. Ph.D. thesis, MIT.


    Introduction to Data-Driven Dependency Parsing 52(52)
