  • Introduction to Data-Driven Dependency Parsing

    Introductory Course, ESSLLI 2007

    Ryan McDonald¹  Joakim Nivre²

    ¹ Google Inc., New York, USA. E-mail: [email protected]

    ² Uppsala University and Växjö University, Sweden. E-mail: [email protected]

    Introduction to Data-Driven Dependency Parsing 1(52)

  • Introduction

    Overview of the Course

    ▶ Dependency parsing (Joakim)

    ▶ Machine learning methods (Ryan)

    ▶ Transition-based models (Joakim)

    ▶ Graph-based models (Ryan)

    ▶ Loose ends (Joakim, Ryan):
      ▶ Other approaches
      ▶ Empirical results
      ▶ Available software

    Introduction to Data-Driven Dependency Parsing 2(52)

  • Other Approaches

    Other Approaches – Overview

    ▶ Graph-based methods – new developments

    ▶ Transition-based methods – new developments

    ▶ Ensemble methods

    ▶ Constraint-based methods

    ▶ Phrase structure parsing

    ▶ Unsupervised parsing

    Introduction to Data-Driven Dependency Parsing 3(52)

  • Other Approaches

    Graph-based Methods

    ▶ Last lecture we discussed arc-factored models

    ▶ These models are inherently local:
      ▶ Local feature scope
      ▶ Local structural constraints

    ▶ This is a strong assumption!

    ▶ Question: how do we incorporate non-local information?

    Introduction to Data-Driven Dependency Parsing 4(52)

  • Other Approaches

    Integer Linear Programming

    ▶ Often, intractable inference problems can be written as Integer Linear Programming (ILP) problems

    ▶ ILPs are optimization problems with linear objectives and linear constraints

    ▶ Non-projective parsing with global constraints can be written as an ILP [Riedel and Clarke 2006]

    ▶ ILPs are still NP-hard, but have well-known branch-and-bound solutions

    ▶ First, let's define a set of binary variables:
      ▶ $a^k_{ij} \in \{0, 1\}$ is 1 if the arc $(i, j, k)$ is in the dependency graph
      ▶ $a$ is the vector of all variables $a^k_{ij}$

    Introduction to Data-Driven Dependency Parsing 5(52)

  • Other Approaches

    Integer Linear Programming

    ▶ We can define the arc-factored parsing problem as the following objective function (a code sketch follows this slide):

      $$\arg\max_a \sum_{i,j,k} \log w^k_{ij} \cdot a^k_{ij}$$

      such that:

      $$\forall j > 0: \sum_{i,k} a^k_{ij} = 1 \quad \text{(single head)}$$
      $$\sum_{i,k} a^k_{i0} = 0 \quad (w_0 \text{ is the root})$$
      $$\forall \text{ cycles } C: \sum_{(i,j,k) \in C} a^k_{ij} \le |C| - 1 \quad \text{(no cycles)}$$

    ▶ This is an ILP!
      ▶ Linear objective
      ▶ Linear constraints over integer variables

    Introduction to Data-Driven Dependency Parsing 6(52)
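    A minimal sketch of this formulation, not [Riedel and Clarke 2006]'s actual system: it drops arc labels (no k index) and uses the PuLP modeling library. Since there are exponentially many cycle constraints, they are added lazily (solve, find a violated cycle, add its constraint, re-solve), in the spirit of incremental ILP. All function and variable names are illustrative.

    ```python
    import pulp

    def parse_ilp(scores):
        """scores[i][j] = log-weight of the arc from head i to dependent j (0 = root)."""
        n = len(scores)  # number of tokens, including the artificial root w0
        prob = pulp.LpProblem("arc_factored_parsing", pulp.LpMaximize)
        # Binary arc variables a[i, j]; w0 (j = 0) never receives an incoming arc
        a = {(i, j): pulp.LpVariable(f"a_{i}_{j}", cat="Binary")
             for i in range(n) for j in range(1, n) if i != j}
        # Linear objective: total log-weight of the selected arcs
        prob += pulp.lpSum(scores[i][j] * a[i, j] for (i, j) in a)
        # Single-head constraint: each non-root token has exactly one head
        for j in range(1, n):
            prob += pulp.lpSum(a[i, j] for i in range(n) if (i, j) in a) == 1
        while True:
            prob.solve()
            heads = {j: i for (i, j) in a if a[i, j].value() > 0.5}
            cycle = find_cycle(heads)
            if cycle is None:
                return heads  # head of each token in the optimal tree
            # Lazily add the no-cycle constraint for the cycle just found
            prob += pulp.lpSum(a[heads[j], j] for j in cycle) <= len(cycle) - 1

    def find_cycle(heads):
        """Return the set of nodes on some cycle of the head function, or None."""
        for start in heads:
            seen, j = set(), start
            while j in heads and j not in seen:
                seen.add(j)
                j = heads[j]
            if j == start:
                return seen
        return None
    ```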

  • Other Approaches

    Integer Linear Programming

    ▶ [Riedel and Clarke 2006] showed that this formulation allows for non-local constraints

    ▶ e.g., a verb can only have a single subject (sketched in code below):

      $$\forall w_i \text{ that are verbs}: \sum_j a^{\mathrm{sbj}}_{ij} \le 1$$

    ▶ This is non-local, since we are enforcing a constraint over all the modifiers of $w_i$

    ▶ [Riedel and Clarke 2006] also includes constraints on coordination, as well as projectivity if desired

    ▶ Is this still data-driven?

    Introduction to Data-Driven Dependency Parsing 7(52)
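    A hedged sketch of the single-subject constraint in the same PuLP style as above, this time assuming labeled binary variables a[(i, j, k)] (the earlier sketch dropped labels) and a hypothetical pos list of part-of-speech tags:

    ```python
    import pulp

    def add_single_subject_constraints(prob, a, pos, n):
        """For every verb w_i, at most one outgoing arc may carry the 'sbj' label."""
        for i in range(n):
            if pos[i] == "VERB":
                prob += pulp.lpSum(a[i, j, "sbj"]
                                   for j in range(1, n)
                                   if (i, j, "sbj") in a) <= 1
    ```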

  • Other Approaches

    Sampling Methods

    ▶ Used for dependency parsing with global features by [Nakagawa 2007]

    ▶ Define a conditional log-linear probability model (a toy version is sketched below):

      $$P(G \mid x) = \frac{1}{Z_x} e^{\mathbf{w} \cdot \mathbf{f}(G)}, \quad \text{where } Z_x = \sum_{G'} e^{\mathbf{w} \cdot \mathbf{f}(G')}$$

    ▶ $\mathbf{f}(G)$ is a global feature map – it can contain global features of the dependency graph
      ▶ i.e., it does not necessarily factor by the arcs

    Introduction to Data-Driven Dependency Parsing 8(52)
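    A minimal sketch of this model for a toy setting where the candidate graphs can be enumerated; in practice $Z_x$ is intractable, which is exactly what motivates the sampling approach on the next slide. The names w, f, and candidates are illustrative stand-ins:

    ```python
    import math

    def graph_prob(G, candidates, w, f):
        """P(G|x) under a log-linear model, with Z_x summed over `candidates`."""
        def score(g):  # exp(w . f(g))
            return math.exp(sum(wi * fi for wi, fi in zip(w, f(g))))
        Z = sum(score(g) for g in candidates)  # partition function Z_x
        return score(G) / Z
    ```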

  • Other Approaches

    Sampling Methods

    ▶ $\arg\max_G P(G \mid x)$ cannot be solved efficiently

    ▶ Assume we have N samples from the distribution $P(G \mid x)$
      ▶ They can be found efficiently with Gibbs sampling
      ▶ Call them $G_1, \ldots, G_N$

    ▶ We want the marginal probability $\mu^k_{ij}$ of the arc $(i, j, k)$:

      $$\mu^k_{ij} \approx \sum_{t=1}^{N} P(G_t \mid x) \, 1[(i,j,k) \in G_t] \approx \frac{1}{N} \sum_{t=1}^{N} 1[(i,j,k) \in G_t]$$

    ▶ Since N should be a manageable size, this can be calculated (see the sketch below)

    ▶ Set the arc weights $w^k_{ij} = \mu^k_{ij}$ and find the MST

    ▶ $\mathbf{w}$ is found using Monte Carlo sampling

    Introduction to Data-Driven Dependency Parsing 9(52)
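    A minimal sketch of the reparsing step, assuming unlabeled arcs and a pre-existing Gibbs sampler whose output is a list of head arrays; the MST decoding reuses networkx's Chu-Liu/Edmonds implementation rather than [Nakagawa 2007]'s own:

    ```python
    from collections import Counter
    import networkx as nx

    def marginal_mst(sample_graphs, n_tokens):
        """sample_graphs: list of head arrays; heads[j] = head of token j (0 = root)."""
        N = len(sample_graphs)
        counts = Counter()  # counts[(i, j)] = number of samples containing arc i -> j
        for heads in sample_graphs:
            for j in range(1, n_tokens):
                counts[(heads[j], j)] += 1
        # Empirical marginals mu_ij = count / N become the arc weights
        G = nx.DiGraph()
        for (i, j), c in counts.items():
            G.add_edge(i, j, weight=c / N)
        # Maximum spanning arborescence; it roots at 0 since 0 has no incoming arcs
        tree = nx.maximum_spanning_arborescence(G, attr="weight")
        return {j: i for (i, j) in tree.edges()}
    ```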

  • Other Approaches

    Transition-based Methods

    ▶ Transition-based models may suffer from:
      ▶ Error propagation, because of greedy inference
      ▶ Label bias, because of local training

    ▶ Recent developments seek to remedy this:
      ▶ Beam search instead of greedy best-first search [Johansson and Nugues 2006, Duan et al. 2007]
      ▶ Globally trained probabilistic models [Johansson and Nugues 2007, Titov and Henderson 2007]

    Introduction to Data-Driven Dependency Parsing 10(52)

  • Other Approaches

    Generative Model [Titov and Henderson 2007]

    ▶ Probabilistic model defined in terms of transitions:

      $$P(G_{c_m}) = P(c_0, c_1, \ldots, c_m) = \prod_{i=1}^{m} P(c_i \mid c_0, \ldots, c_{i-1})$$

    ▶ Similar to HMMs

    ▶ Transition system of [Nivre 2003], with two modifications:
      ▶ Splits Right-Arc into Right-Arc′ and Shift
      ▶ Adds transitions for generating words (generative model)

    ▶ $P(c_i \mid c_0, c_1, \ldots, c_{i-1})$ is modeled by a neural network approximating an Incremental Sigmoid Belief Network (ISBN)

    ▶ The belief network's hidden layer acts as a feature selection algorithm

    ▶ Parsing with heuristic beam search (sketched below)

    Introduction to Data-Driven Dependency Parsing 11(52)
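    A minimal sketch of beam search over transition sequences under this kind of model. The interfaces are assumptions, not [Titov and Henderson 2007]'s API: next_transitions(state) enumerates legal transitions, apply(state, c) returns the successor state, transition_prob(c, history) plays the role of the ISBN's P(c_i | history), and is_final(state) tests termination:

    ```python
    import math

    def beam_parse(init_state, next_transitions, apply, transition_prob,
                   is_final, beam_size=8):
        beam = [(0.0, init_state, [])]  # (log-prob, parser state, transition history)
        while not all(is_final(state) for _, state, _ in beam):
            candidates = []
            for logp, state, hist in beam:
                if is_final(state):
                    candidates.append((logp, state, hist))
                    continue
                for c in next_transitions(state):
                    # Chain rule: add log P(c | c_0, ..., c_{i-1})
                    candidates.append((logp + math.log(transition_prob(c, hist)),
                                       apply(state, c), hist + [c]))
            # Keep only the beam_size best partial derivations
            beam = sorted(candidates, key=lambda t: t[0], reverse=True)[:beam_size]
        return max(beam, key=lambda t: t[0])  # highest-probability derivation
    ```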

  • Other Approaches

    Ensemble Methods

    ▶ Input: a sentence $x = w_0, w_1, \ldots, w_n$
    ▶ Input: the output of N different parsers for x: $G_1, G_2, \ldots, G_N$
    ▶ Output: a single dependency graph G for sentence x

    Question: How do we combine the parsers and their outputs to create a new and better parse for sentence x?

    ▶ Assumption: the N parsers make different mistakes

    ▶ Assumption: all of the N parsers are relatively strong

    Introduction to Data-Driven Dependency Parsing 12(52)

  • Other Approaches

    Ensemble Methods [Sagae and Lavie 2006]

    ▶ A simple but elegant solution (sketched in code below):
      ▶ Use arc-factored graph-based models
      ▶ Set arc weights equal to the number of parsers that predicted that arc:

        $$w^k_{ij} = e^{\sum_t \alpha_t \cdot 1[(i,j,k) \in G_t]}$$

      ▶ $\alpha_t$ usually equals 1, but can be modified if prior knowledge exists

    ▶ Solution: find the MST for $G_x$ with the above weights

    ▶ The resulting graph has, on average, the arcs that were preferred by most systems

    Introduction to Data-Driven Dependency Parsing 13(52)
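    A minimal sketch of this reparsing scheme, again assuming unlabeled head arrays and reusing networkx's maximum spanning arborescence for the MST step (the log-weights are the raw vote counts, so the exponentiation can be skipped):

    ```python
    import networkx as nx

    def ensemble_reparse(outputs, alphas=None):
        """outputs: one head array per parser; heads[j] = head of token j (0 = root)."""
        alphas = alphas or [1.0] * len(outputs)
        G = nx.DiGraph()
        for heads, alpha in zip(outputs, alphas):
            for j in range(1, len(heads)):
                i = heads[j]
                # Accumulate this parser's (weighted) vote for the arc i -> j
                w = G.edges[i, j]["weight"] if G.has_edge(i, j) else 0.0
                G.add_edge(i, j, weight=w + alpha)
        tree = nx.maximum_spanning_arborescence(G, attr="weight")
        return {j: i for (i, j) in tree.edges()}
    ```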

  • Other Approaches

    Ensemble Methods [Sagae and Lavie 2006]

    ▶ Example: an ensemble of parsers from this year's CoNLL shared task

    [Figure: labeled and unlabeled accuracy (roughly 80–88%) as a function of the number of systems in the ensemble (5–20).]

    Introduction to Data-Driven Dependency Parsing 14(52)

  • Other Approaches

    Constraint-based Methods

    ▶ Statistical constraint dependency parsing in two steps [Wang and Harper 2004]:
      1. Supertagging, using a trigram Hidden Markov Model to assign the top n-best constraints to an input sentence x.
      2. Stack-based, best-first search to build the most probable dependency graph given the constraints.

    ▶ Anytime transformation-based parsing with constraints [Foth and Menzel 2006]:
      1. Use a data-driven transition-based parser to derive an initial dependency graph.
      2. Use graph transformations to improve the score relative to the weighted constraints.

    Introduction to Data-Driven Dependency Parsing 15(52)

  • Other Approaches

    Phrase Structure Parsing

    ▶ Phrase structure parsers used for dependency parsing:
      1. Transform the training data from dependencies to phrase structure
      2. Train a parser on the transformed structures
      3. Parse new sentences with the trained parser
      4. Transform the parser output from phrase structure back to dependencies

    ▶ Example:
      ▶ Parsing Czech with the Collins and Charniak parsers [Collins et al. 1999, Hall and Novák 2005]

    ▶ Note:
      ▶ Both of these parsers internally extract dependencies from phrase structures.

    Introduction to Data-Driven Dependency Parsing 16(52)

  • Other Approaches

    Unsupervised Parsing

    ▶ Often we do not have a large corpus with annotated dependency graphs

    ▶ Can we still learn to parse dependencies from unlabeled data?

    ▶ There has been much research along these lines lately:
      ▶ Lexical attraction [Yuret 1998]
      ▶ Grammatical bigrams [Paskin 2001]
      ▶ Top-down generative models [Klein and Manning 2004]
      ▶ Contrastive estimation [Smith and Eisner 2005]
      ▶ Non-projective examples [McDonald and Satta 2007]

    Introduction to Data-Driven Dependency Parsing 17(52)

  • Empirical Results

    Empirical Results – Overview

    ▶ Evaluation metrics

    ▶ Benchmarks:
      ▶ Penn Treebank (Wall Street Journal)
      ▶ Prague Dependency Treebank

    ▶ CoNLL 2006 shared task [Buchholz and Marsi 2006]:
      ▶ 19 parsers for 13 languages
      ▶ Error analysis for the two top systems [McDonald and Nivre 2007]

    ▶ CoNLL 2007 shared task [Nivre et al. 2007]:
      ▶ 23 parsers for 10 languages
      ▶ Domain adaptation for English

    Introduction to Data-Driven Dependency Parsing 18(52)

  • Empirical Results

    Evaluation Metrics

    ▶ Per token (computed in the sketch below):
      ▶ Labeled attachment score (LAS): percentage of tokens with correct head and label
      ▶ Unlabeled attachment score (UAS): percentage of tokens with correct head
      ▶ Label accuracy (LA): percentage of tokens with correct label

    ▶ Per sentence:
      ▶ Labeled complete match (LCM): percentage of sentences with correct labeled graph
      ▶ Unlabeled complete match (UCM): percentage of sentences with correct unlabeled graph

    Introduction to Data-Driven Dependency Parsing 19(52)
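    A minimal sketch of the three per-token metrics, assuming gold and predicted parses are given as parallel lists of (head, label) pairs, one pair per token:

    ```python
    def attachment_scores(gold, pred):
        """LAS, UAS, and LA over one sentence (or a whole concatenated test set)."""
        n = len(gold)
        las = sum(g == p for g, p in zip(gold, pred)) / n        # head and label
        uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # head only
        la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n   # label only
        return {"LAS": las, "UAS": uas, "LA": la}

    # Example: a 3-token sentence where token 2 gets the wrong head, right label
    gold = [(2, "sbj"), (0, "root"), (2, "obj")]
    pred = [(2, "sbj"), (3, "root"), (2, "obj")]
    print(attachment_scores(gold, pred))  # LAS/UAS = 2/3, LA = 1.0
    ```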

  • Empirical Results

    State of the Art – English

    ▶ Penn Treebank (WSJ) converted to dependency graphs
    ▶ Transition-based parsers [Yamada and Matsumoto 2003, Isozaki et al. 2004]
    ▶ Graph-based parsers [McDonald et al. 2005a, McDonald and Pereira 2006]
    ▶ Ensemble parsers [Sagae and Lavie 2006, McDonald 2006]
    ▶ Phrase structure parsers [Collins 1999, Charniak 2000]

      Parser                  UAS   UCM
      McDonald                93.2  47.1
      Sagae and Lavie         92.7  –
      Charniak                92.2  45.2
      Collins                 91.7  43.3
      McDonald and Pereira    91.5  42.1
      Isozaki et al.          91.4  40.7
      McDonald et al.         91.0  37.5
      Yamada and Matsumoto    90.4  38.4

    Introduction to Data-Driven Dependency Parsing 20(52)

  • Empirical Results

    State of the Art – Czech

    ▶ Prague Dependency Treebank (PDT)
    ▶ Pseudo-projective transition-based parser [Nilsson et al. 2006]
    ▶ Non-projective spanning tree parser [McDonald et al. 2005b]
    ▶ Approximate second-order spanning tree parser [McDonald and Pereira 2006]
    ▶ Projective phrase structure parsers (Charniak, Collins)
    ▶ Phrase structure (Charniak) + corrective modeling [Hall and Novák 2005]

      Parser                  UAS   UCM
      McDonald and Pereira    85.2  35.9
      Hall and Novák          85.1  –
      Nilsson et al.          84.6  37.7
      McDonald et al.         84.4  32.3
      Charniak                84.4  –
      Collins                 81.8  –

    Introduction to Data-Driven Dependency Parsing 21(52)

  • Empirical Results

    CoNLL Shared Task 2006

    ▶ Multilingual dependency parsing:
      ▶ Train a single parser on data from thirteen languages
      ▶ Gold standard annotation (postags, lemmas, etc.)
      ▶ Main evaluation metric: LAS

    ▶ Results:
      ▶ 19 systems, 17 described in [Buchholz and Marsi 2006]
      ▶ Considerable variation across languages (top scores):
        ▶ Japanese: 91.7%
        ▶ Turkish: 65.7%

    ▶ Best systems:
      ▶ MSTParser (graph-based) [McDonald et al. 2006]
      ▶ MaltParser (transition-based) [Nivre et al. 2006]

    Introduction to Data-Driven Dependency Parsing 22(52)

  • Empirical Results

    MSTParser and MaltParser

                 MST    Malt
    Arabic       66.91  66.71
    Bulgarian    87.57  87.41
    Chinese      85.90  86.92
    Czech        80.18  78.42
    Danish       84.79  84.77
    Dutch        79.19  78.59
    German       87.34  85.82
    Japanese     90.71  91.65
    Portuguese   86.82  87.60
    Slovene      73.44  70.30
    Spanish      82.25  81.29
    Swedish      82.55  84.58
    Turkish      63.19  65.68

    Overall      80.83  80.75

    Introduction to Data-Driven Dependency Parsing 23(52)

  • Empirical Results

    Comparing the Models

    ▶ Inference:
      ▶ Exhaustive (MSTParser)
      ▶ Greedy (MaltParser)

    ▶ Training:
      ▶ Global structure learning (MSTParser)
      ▶ Local decision learning (MaltParser)

    ▶ Features:
      ▶ Local features (MSTParser)
      ▶ Rich decision history (MaltParser)

    ▶ Fundamental trade-off:
      ▶ Global learning and inference vs. a rich feature space

    Introduction to Data-Driven Dependency Parsing 24(52)

  • Empirical Results

    Error Analysis

    ▶ Aim:
      ▶ Relate parsing errors to linguistic and structural properties of the input and the predicted/gold standard dependency graphs

    ▶ Three types of factors:
      ▶ Length factors: sentence length, dependency length
      ▶ Graph factors: tree depth, branching factor, non-projectivity
      ▶ Linguistic factors: part of speech, dependency type

    ▶ Statistics:
      ▶ Labeled accuracy, precision, and recall
      ▶ Computed over the test sets for all 13 languages

    Introduction to Data-Driven Dependency Parsing 25(52)

  • Empirical Results

    Sentence Length

    [Figure: dependency accuracy (roughly 0.70–0.84) by sentence length, in bins of size 10 up to 50+, for MSTParser and MaltParser.]

    ▶ MaltParser is more accurate than MSTParser for short sentences (1–10 words), but its performance degrades more with increasing sentence length.

    Introduction to Data-Driven Dependency Parsing 26(52)

  • Empirical Results

    Dependency Length

    [Figure: dependency precision and recall (roughly 0.3–0.9) by dependency length (0–30) for MSTParser and MaltParser.]

    ▶ MaltParser is more precise than MSTParser for short dependencies (1–3 words), but its performance degrades drastically with increasing dependency length (> 10 words).

    ▶ MSTParser has more or less constant precision for dependencies longer than 3 words.

    ▶ Recall is very similar across systems.

    Introduction to Data-Driven Dependency Parsing 27(52)

  • Empirical Results

    Tree Depth (Distance to Root)

    [Figure: dependency precision and recall (roughly 0.74–0.90) by distance to root (1–10) for MSTParser and MaltParser.]

    ▶ MSTParser is much more precise than MaltParser for dependents of the root and has roughly constant precision for depth > 1, while MaltParser's precision improves with increasing depth (up to 7 arcs).

    ▶ Recall is very similar across systems.

    Introduction to Data-Driven Dependency Parsing 28(52)

  • Empirical Results

    Degrees of Non-Projectivity

    [Figure: dependency precision and recall (0–1) by non-projective arc degree (0, 1, 2+) for MSTParser and MaltParser.]

    ▶ Degree of a dependency arc (i, j, k) = the number of words in the span min(i, j), ..., max(i, j) that are not descendants of i and have their head outside the span (computed in the sketch below).

    ▶ MaltParser has slightly higher precision, and MSTParser slightly higher recall, for non-projective arcs (degree > 0).

    ▶ Neither system predicts arcs with a degree higher than 2.

    Introduction to Data-Driven Dependency Parsing 29(52)
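    A minimal sketch of this degree measure, assuming an unlabeled tree given as a head array and counting only the words strictly between the arc's endpoints:

    ```python
    def arc_degree(heads, i, j):
        """Degree of the arc i -> j; heads[w] = head of token w (0 = root)."""
        lo, hi = min(i, j), max(i, j)

        def is_descendant(w, ancestor):
            # Walk up the head chain from w until the root (0) is reached
            while w != 0:
                w = heads[w]
                if w == ancestor:
                    return True
            return False

        degree = 0
        for w in range(lo + 1, hi):  # words inside the span
            if not is_descendant(w, i) and not (lo <= heads[w] <= hi):
                degree += 1  # not a descendant of i, head outside the span
        return degree
    ```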

  • Empirical Results

    Part of Speech

    [Figure: labeled attachment score (roughly 60–95%) by part of speech (Verb, Noun, Pron, Adj, Adv, Adpos, Conj) for MSTParser and MaltParser.]

    ▶ MSTParser is more accurate for verbs, adjectives, adverbs, adpositions, and conjunctions.

    ▶ MaltParser is more accurate for nouns and pronouns.

    Introduction to Data-Driven Dependency Parsing 30(52)

  • Empirical Results

    Dependency Type: Root, Subject, Object

    [Figure: dependency precision and recall (roughly 65–95%) for the Root, Subj, and Obj dependency types, for MSTParser and MaltParser.]

    ▶ MSTParser has higher precision (and recall) for roots.

    ▶ MSTParser has higher recall (and precision) for subjects.

    Introduction to Data-Driven Dependency Parsing 31(52)

  • Empirical Results

    Discussion

    ▶ Many of the results are indicative of the fundamental trade-off: global learning/inference versus rich features.

    ▶ Global inference improves decisions for long sentences and arcs near the top of the graph.

    ▶ Rich features improve decisions for short sentences and arcs near the leaves of the graph.

    ▶ Important question:
      ▶ How do we use this to improve parser performance?

    ▶ Oracle experiments:
      ▶ Graph-based selection: 81% → 85%
      ▶ Arc-based selection [Sagae and Lavie 2006]: 81% → 87%

    Introduction to Data-Driven Dependency Parsing 32(52)

  • Empirical Results

    CoNLL Shared Task 2007

    ▶ Two tracks:
      ▶ Multilingual dependency parsing (10 languages)
      ▶ Domain adaptation (English)

    ▶ Results (multilingual track):
      ▶ 28 systems, 23 described in [Nivre et al. 2007]
      ▶ A little less variation across languages (top scores):
        ▶ English: 89.6%
        ▶ Greek: 76.3%

    ▶ Best systems:
      ▶ Ensemble systems [Hall et al. 2007, Sagae and Tsujii 2007]
      ▶ Graph-based systems with global features [Nakagawa 2007, Carreras 2007]
      ▶ Transition-based systems with global training [Titov and Henderson 2007]

    Introduction to Data-Driven Dependency Parsing 33(52)

  • Available Software

    Available Software – Overview

    ▶ Dependency Parsing Wiki:
      ▶ http://depparse.uvt.nl/depparse-wiki/

    ▶ Parsers:
      ▶ Trainable data-driven parsers
      ▶ Parsers for specific languages (grammar-based)

    ▶ Other tools:
      ▶ Pseudo-projective parsing
      ▶ Evaluation software
      ▶ Constituency-to-dependency conversion

    ▶ Data sets:
      ▶ Dependency treebanks
      ▶ Other treebanks with dependency conversions

    Introduction to Data-Driven Dependency Parsing 34(52)


  • Available Software

    Trainable Parsers

    ▶ Jason Eisner's probabilistic dependency parser
      ▶ Based on bilexical grammar
      ▶ Contact Jason Eisner: [email protected]
      ▶ Written in LISP

    ▶ Ryan McDonald's MSTParser
      ▶ Graph-based spanning tree parsers with online learning
      ▶ URL: http://sourceforge.net/projects/mstparser
      ▶ Written in Java

    Introduction to Data-Driven Dependency Parsing 35(52)


  • Available Software

    Trainable Parsers (2)

    ▶ Joakim Nivre's MaltParser
      ▶ Transition-based parsers with MBL and SVM
      ▶ URL: http://w3.msi.vxu.se/~nivre/research/MaltParser.html
      ▶ Executable versions are available for Solaris, Linux, Windows, and MacOS (open source version in Java planned for fall 2007)

    ▶ Ivan Titov's ISBN Dependency Parser
      ▶ Incremental Sigmoid Belief Network dependency parser
      ▶ Transition-based inference
      ▶ URL: http://cui.unige.ch/~titov/idp/
      ▶ Written in C

    Introduction to Data-Driven Dependency Parsing 36(52)


  • Available Software

    Parsers for Specific Languages

    ▶ Dekang Lin's Minipar
      ▶ Principle-based parser
      ▶ Grammar for English
      ▶ URL: http://www.cs.ualberta.ca/~lindek/minipar.htm
      ▶ Executable versions for Linux, Solaris, and Windows

    ▶ Wolfgang Menzel's CDG Parser
      ▶ Weighted constraint dependency parser
      ▶ Grammar for German (English under construction)
      ▶ Online demo: http://nats-www.informatik.uni-hamburg.de/Papa/ParserDemo
      ▶ Download: http://nats-www.informatik.uni-hamburg.de/download

    Introduction to Data-Driven Dependency Parsing 37(52)


  • Available Software

    Parsers for Specific Languages (2)

    ▶ Taku Kudo's CaboCha
      ▶ Based on the algorithms of [Kudo and Matsumoto 2002]; uses SVMs
      ▶ URL: http://www.chasen.org/~taku/software/cabocha/
      ▶ Web page in Japanese

    ▶ Gerold Schneider's Pro3Gres
      ▶ Probability-based dependency parser
      ▶ Grammar for English
      ▶ URL: http://www.ifi.unizh.ch/CL/gschneid/parser/
      ▶ Written in PROLOG

    ▶ Daniel Sleator's & Davy Temperley's Link Grammar Parser
      ▶ Undirected links between words
      ▶ Grammar for English
      ▶ URL: http://www.link.cs.cmu.edu/link/

    Introduction to Data-Driven Dependency Parsing 38(52)


  • Available Software

    Other Tools

    ▶ Pseudo-projective parsing:
      ▶ Software based on [Nivre and Nilsson 2005]
      ▶ http://w3.msi.vxu.se/~nivre/research/proj/0.2/doc/Proj.html

    ▶ Evaluation software:
      ▶ CoNLL shared tasks:
        ▶ http://nextens.uvt.nl/~conll/software.html
        ▶ http://depparse.uvt.nl/depparse-wiki/SoftwarePage

    ▶ Treebank conversion software:
      ▶ CoNLL 2006 shared task treebanks:
        ▶ http://depparse.uvt.nl/depparse-wiki/SoftwarePage
      ▶ Penn Treebank:
        ▶ http://w3.msi.vxu.se/~nivre/research/Penn2Malt.html
        ▶ http://nlp.cs.lth.se/pennconverter/

    Introduction to Data-Driven Dependency Parsing 39(52)


  • Available Software

    Dependency Treebanks

    ▶ Arabic: Prague Arabic Dependency Treebank

    ▶ Basque: Eus3LB

    ▶ Czech: Prague Dependency Treebank

    ▶ Danish: Danish Dependency Treebank

    ▶ Greek: Greek Dependency Treebank

    ▶ Portuguese: Bosque: Floresta sintá(c)tica

    ▶ Slovene: Slovene Dependency Treebank

    ▶ Turkish: METU-Sabanci Turkish Treebank

    Introduction to Data-Driven Dependency Parsing 40(52)

  • Available Software

    Other Treebanks

    ▶ Bulgarian: BulTreebank

    ▶ Catalan: CESS-ECE

    ▶ Chinese: Penn Chinese Treebank, Sinica Treebank

    ▶ Dutch: Alpino Treebank for Dutch

    ▶ English: Penn Treebank

    ▶ German: TIGER/NEGRA, TüBa-D/Z

    ▶ Hungarian: Szeged Treebank

    ▶ Italian: Italian Syntactic-Semantic Treebank

    ▶ Japanese: TüBa-J/S

    ▶ Spanish: Cast3LB

    ▶ Swedish: Talbanken05

    Introduction to Data-Driven Dependency Parsing 41(52)

  • Conclusion

    Summary

    ▶ State of the art in data-driven dependency parsing:
      ▶ Transition-based models
      ▶ Graph-based models
      ▶ New developments (often) targeting the weaknesses of the standard models

    ▶ Empirical results:
      ▶ CoNLL shared tasks: dependency parsing results for some twenty languages
      ▶ Many (different) systems achieve similar accuracy, but performance varies across languages

    ▶ Available resources: try them out!

    Introduction to Data-Driven Dependency Parsing 42(52)

  • Treebanks

    Dependency Treebanks (1)

    ▶ Prague Arabic Dependency Treebank
      ▶ ca. 100 000 words
      ▶ Available from LDC, license fee (CoNLL-X shared task data, catalogue number LDC2006E01)
      ▶ URL: http://ufal.mff.cuni.cz/padt/

    ▶ Eus3LB
      ▶ ca. 50 000 words
      ▶ Restricted availability
      ▶ URL: http://ixa.si.ehu.es/lxa/lkerlerroak

    Introduction to Data-Driven Dependency Parsing 43(52)


  • Treebanks

    Dependency Treebanks (2)

    ▶ Prague Dependency Treebank
      ▶ 1.5 million words
      ▶ 3 layers of annotation: morphological, syntactic, tectogrammatical
      ▶ Available from LDC, license fee (CoNLL-X shared task data, catalogue number LDC2006E02)
      ▶ URL: http://ufal.mff.cuni.cz/pdt2.0/

    ▶ Danish Dependency Treebank
      ▶ ca. 5 500 trees
      ▶ Annotation based on Discontinuous Grammar [Kromann 2003]
      ▶ Freely downloadable
      ▶ URL: http://www.id.cbs.dk/~mtk/treebank/

    Introduction to Data-Driven Dependency Parsing 44(52)


  • Treebanks

    Dependency Treebanks (3)

    ▶ Greek Dependency Treebank
      ▶ ca. 70 000 words
      ▶ Restricted availability
      ▶ Contact ILSP, Athens, Greece

    ▶ Bosque, Floresta sintá(c)tica
      ▶ ca. 10 000 trees
      ▶ Freely downloadable
      ▶ URL: http://acdc.linguateca.pt/treebank/info_floresta_English.html

    Introduction to Data-Driven Dependency Parsing 45(52)


  • Treebanks

    Dependency Treebanks (4)

    ▶ Slovene Dependency Treebank
      ▶ ca. 30 000 words
      ▶ Freely downloadable
      ▶ URL: http://nl.ijs.si/sdt/

    ▶ METU-Sabanci Turkish Treebank
      ▶ ca. 7 000 trees
      ▶ Freely available, license agreement
      ▶ URL: http://www.ii.metu.edu.tr/~corpus/treebank.html

    Introduction to Data-Driven Dependency Parsing 46(52)


  • Treebanks

    Other Treebanks (1)

    ▶ BulTreebank
      ▶ ca. 14 000 sentences
      ▶ URL: http://www.bultreebank.org/
      ▶ Dependency version available from Kiril Simov ([email protected])

    ▶ CESS-ECE
      ▶ ca. 500 000 words
      ▶ Freely available for research
      ▶ URL: http://www.lsi.upc.edu/~mbertran/cess-ece2/
      ▶ Dependency version available from Toni Martí

    Introduction to Data-Driven Dependency Parsing 47(52)

    http://www.bultreebank.org/[email protected]://www.lsi.upc.edu/~mbertran/cess-ece2/

  • Treebanks

    Other Treebanks (2)

    ▶ Penn Chinese Treebank
      ▶ ca. 4 000 sentences
      ▶ Available from LDC, license fee
      ▶ URL: http://www.cis.upenn.edu/~chinese/ctb.html
      ▶ For conversion with arc labels: Penn2Malt: http://w3.msi.vxu.se/~nivre/research/Penn2Malt.html

    ▶ Sinica Treebank
      ▶ ca. 61 000 sentences
      ▶ Available from Academia Sinica, license fee
      ▶ URL: http://godel.iis.sinica.edu.tw/CKIP/engversion/treebank.htm
      ▶ Dependency version available from Academia Sinica

    Introduction to Data-Driven Dependency Parsing 48(52)


  • Treebanks

    Other Treebanks (3)

    ▶ Alpino Treebank for Dutch
      ▶ ca. 150 000 words
      ▶ Freely downloadable
      ▶ URL: http://www.let.rug.nl/vannoord/trees/
      ▶ Dependency version downloadable at http://nextens.uvt.nl/~conll/free_data.html

    ▶ Penn Treebank
      ▶ ca. 1 million words
      ▶ Available from LDC, license fee
      ▶ URL: http://www.cis.upenn.edu/~treebank/home.html
      ▶ Conversion to labeled dependencies: Penn2Malt, pennconverter (see above)

    Introduction to Data-Driven Dependency Parsing 49(52)


  • Treebanks

    Other Treebanks (4)

    ▶ TIGER/NEGRA
      ▶ ca. 50 000 / 20 000 sentences
      ▶ Freely available, license agreement
      ▶ TIGER URL: http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/
      ▶ NEGRA URL: http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/
      ▶ Dependency version of TIGER is included in the release

    ▶ TüBa-D/Z
      ▶ ca. 22 000 sentences
      ▶ Freely available, license agreement
      ▶ URL: http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml
      ▶ Dependency version available from SfS Tübingen

    Introduction to Data-Driven Dependency Parsing 50(52)


  • Treebanks

    Other Treebanks (5)

    ▶ Szeged Treebank
      ▶ ca. 82 000 sentences (1.2 million words)
      ▶ Freely available, license agreement
      ▶ URL: http://www.inf.u-szeged.hu/hlt
      ▶ Subset in dependency format (6 000 sentences)

    ▶ Italian Syntactic-Semantic Treebank
      ▶ ca. 300 000 words
      ▶ Available through ELDA
      ▶ URL: http://www.ilc.cnr.it/viewpage.php/sez=ricerca/id=874/vers=ita
      ▶ Dependency version available

    Introduction to Data-Driven Dependency Parsing 51(52)


  • Treebanks

    Other Treebanks (6)

    ▶ Cast3LB
      ▶ ca. 18 000 sentences
      ▶ URL: http://www.dlsi.ua.es/projectes/3lb/index_en.html
      ▶ Dependency version available from Toni Martí ([email protected])

    ▶ Talbanken05 (Swedish)
      ▶ ca. 300 000 words
      ▶ Freely downloadable
      ▶ URL: http://w3.msi.vxu.se/~nivre/research/Talbanken05.html
      ▶ Dependency version also available

    Introduction to Data-Driven Dependency Parsing 52(52)


  • References and Further Reading

    References and Further Reading

    ▶ Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 149–164.

    ▶ X. Carreras. 2007. Experiments with a high-order projective dependency parser. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 132–139.

    ▶ Michael Collins, Jan Hajič, Lance Ramshaw, and Christoph Tillmann. 1999. A statistical parser for Czech. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pages 505–512.

    ▶ Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.

    ▶ X. Duan, J. Zhao, and B. Xu. 2007. Probabilistic parsing action models for multi-lingual dependency parsing. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ Kilian A. Foth and Wolfgang Menzel. 2006. Hybrid parsing: Using probabilistic models as predictors for a symbolic parser. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pages 321–328.

    ▶ Keith Hall and Vaclav Novák. 2005. Corrective modeling for non-projective dependency parsing. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT), pages 42–52.

    ▶ J. Hall, J. Nilsson, J. Nivre, G. Eryiğit, B. Megyesi, M. Nilsson, and M. Saers. 2007. Single malt or blended? A study in multilingual parser optimization. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Hideki Isozaki, Hideto Kazawa, and Tsutomu Hirao. 2004. A deterministic word dependency analyzer enhanced with preference learning. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 275–281.

    ▶ R. Johansson and P. Nugues. 2006. Investigating multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 206–210.

    ▶ R. Johansson and P. Nugues. 2007. Incremental dependency parsing using online learning. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading


    ▶ D. Klein and C. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proc. ACL.

    ▶ Matthias Trautner Kromann. 2003. The Danish Dependency Treebank and the DTAG treebank tool. In Joakim Nivre and Erhard Hinrichs, editors, Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT), pages 217–220. Växjö University Press.

    ▶ Taku Kudo and Yuji Matsumoto. 2002. Japanese dependency analysis using cascaded chunking. In Proceedings of the Sixth Workshop on Computational Language Learning (CoNLL), pages 63–69.

    ▶ Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of EMNLP-CoNLL 2007.

    ▶ R. McDonald and F. Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proc. EACL.

    ▶ R. McDonald and G. Satta. 2007. On the complexity of non-projective data-driven dependency parsing. In Proc. IWPT.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ R. McDonald, K. Crammer, and F. Pereira. 2005a. Online large-margin training of dependency parsers. In Proc. ACL.

    ▶ R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. 2005b. Non-projective dependency parsing using spanning tree algorithms. In Proc. HLT/EMNLP.

    ▶ R. McDonald, K. Lerman, and F. Pereira. 2006. Multilingual dependency analysis with a two-stage discriminative parser. In Proc. CoNLL.

    ▶ R. McDonald. 2006. Discriminative Training and Spanning Tree Algorithms for Dependency Parsing. Ph.D. thesis, University of Pennsylvania.

    ▶ T. Nakagawa. 2007. Multilingual dependency parsing using Gibbs sampling. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Jens Nilsson, Joakim Nivre, and Johan Hall. 2006. Graph transformations in data-driven dependency parsing. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pages 257–264.

    ▶ Joakim Nivre and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ Joakim Nivre, Johan Hall, Jens Nilsson, Gülşen Eryiğit, and Svetoslav Marinov. 2006. Labeled pseudo-projective dependency parsing with support vector machines. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 221–225.

    ▶ Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task of EMNLP-CoNLL 2007.

    ▶ Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

    ▶ M. A. Paskin. 2001. Cubic-time parsing and learning algorithms for grammatical bigram models. Technical Report UCB/CSD-01-1148, Computer Science Division, University of California, Berkeley.

    ▶ S. Riedel and J. Clarke. 2006. Incremental integer linear programming for non-projective dependency parsing. In Proc. EMNLP.

    Introduction to Data-Driven Dependency Parsing 52(52)

  • References and Further Reading

    ▶ K. Sagae and A. Lavie. 2006. Parser combination by reparsing. In Proc. HLT/NAACL.

    ▶ K. Sagae and J. Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ N. Smith and J. Eisner. 2005. Guiding unsupervised grammar induction using contrastive estimation. In Working Notes of the International Joint Conference on Artificial Intelligence Workshop on Grammatical Inference Applications.

    ▶ I. Titov and J. Henderson. 2007. Fast and robust multilingual dependency parsing with a generative latent variable model. In Proc. of the CoNLL 2007 Shared Task. EMNLP-CoNLL.

    ▶ Wen Wang and Mary P. Harper. 2004. A statistical constraint dependency grammar (CDG) parser. In Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together (ACL), pages 42–49.

    ▶ Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 195–206.

    ▶ D. Yuret. 1998. Discovery of linguistic relations using lexical attraction. Ph.D. thesis, MIT.


    Introduction to Data-Driven Dependency Parsing 52(52)
