Download - Dependency Parsing Some slides are based on: 1)PPT presentation on dependency parsing by Prashanth Mannem 2)Seven Lectures on Statistical Parsing by Christopher.

Dependency Parsing Some slides are based on: 1)PPT presentation on dependency parsing by Prashanth Mannem 2)Seven Lectures on Statistical Parsing by Christopher Manning

Constituency parsing Breaks sentence into constituents (phrases), which are then broken into smaller constituents Describes phrase structure and clause structure ( NP, PP, VP, etc.) Structures often recursive

momisanamazingshow VP NP VPNP S

Dependency parsing Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies Interested in grammatical relations between individual words (governing & dependent words) Does not propose a recursive structure, rather a network of relations These relations can also have labels

Dependency vs. Constituency Dependency structures explicitly represent Head-dependent relations (directed arcs) Functional categories (arc labels) Possibly some structural categories (parts-of-speech) Constituency structure explicitly represent Phrases (non-terminal nodes) Structural categories (non-terminal labels) Possibly some functional categories (grammatical functions)

Dependency vs. Constituency A dependency grammar has a notion of a head Officially, CFGs dont But modern linguistic theory and all modern statistical parsers (Charniak, Collins, ) do, via hand- written phrasal head rules: The head of a Noun Phrase is a noun/number/ The head of a Verb Phrase is a verb/modal/. Based on a slide by Chris Manning

Dependency vs. Constituency The head rules can be used to extract a dependency parse from a CFG parse (follow the heads) A phrase structure tree can be got from a dependency tree, but dependents are flat Based on a slide by Chris Manning

Definition: dependency graph An input word sequence w 1 w n Dependency graph G = (V,E) where V is the set of nodes i.e. word tokens in the input seq. E is the set of unlabeled tree edges (i, j) i, j V (i i, j) indicates an edge from i (parent, head, governor) to j (child, dependent)

Definition: dependency graph A dependency graph is well-formed iff Single head: Each word has only one head Acyclic: The graph should be acyclic Connected: The graph should be a single tree with all the words in the sentence Projective: If word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B)

Non-projective dependencies Ram saw a dog yesterday which was a Yorkshire Terrier

Parsing algorithms Dependency based parsers can be broadly categorized into Grammar driven approaches Parsing done using grammars Data driven approaches Parsing by training on annotated/un-annotated data

Unlabeled graphs Dan Klein recently showed that labeling is relatively easy and that the difculty of parsing lies in creating bracketing (Klein, 2004) Therefore some parsers run in two steps: 1) bracketing; 2) labeling

Traditions Dynamic programming e.g., Eisner (1996), McDonald (2006) Deterministic search e.g., Covington (2001), Yamada and Matsumoto, Nivre (2006) Constraints satisfaction e.g., Maruyama, Foth et al.

Data driven Two main approaches Global, Exhaustive, Graph-based parsing Local, greedy, transition-based parsing

Graph-based parsing Assume there is a scoring function: The score of a graph is Parsing for input string x is All dependency graphs

MST algorithm (McDonald, 2006) Scores are based on features, independent of other dependencies Features can be Head and dependent word and POS separately Head and dependent word and POS bigram features Words between head and dependent Length and direction of dependency

MST algorithm (McDonald, 2006) Parsing can be formulated as maximum spanning tree problem Use Chu-Liu-Edmonds (CLE) algorithm for MST (runs in, considers non-projective arcs) Uses online learning for determining weight vector w

Transition-based parsing A transition system for dependency parsing defines: a set C of parser configurations, each of which defines a (partially built) dependency graph G a set T of transitions, each a function t :C C for every sentence x = w 0,w 1,...,w n a unique initial configuration c x a set Qx of terminal configurations

Transition sequence A transition sequence Cx,m = (c x, c 1,..., c m ) for a sentence x is a sequence of configurations such that and, for every there is a transition such that The graph defined by is the dependency graph of x

Transition scoring function The score of a transition t in a configuration c s(c, t) represents the likelihood of taking transition t out of configuration c Parsing is finding the optimal transition sequence ( )

Yamada and Matsumoto (2003) A transition-based (shift-reduce) parser Considers two adjacent words Runs in iterations, continues as long as new dependencies are created In every iteration, consider 3 different actions and choose one using SVM (or other discriminative learning technique) Time complexity Accuracy was shown to be close to the state-of-the-art algorithms (e.g., Eisners)

Y&M (2003) Actions Shift Left Right

Y&M (2003) Learning Features (lemma, POS tag) are collected from the context

Stack-based parsing Introducing a stack and a buffer The buffer is a queue of all input words (left to right) The stack begins empty; words are pushed to the stack by the defined actions Reduces Y&M complexity to linear time

2 stack-based parsers Nivres (2003, 2006) arc-standard Stack Buffer i doesnt have a head already j doesnt have a head already

2 stack-based parsers Nivres (2003, 2006) arc-eager

Example (arc eager) Red figuresonthescreenindicatedfallingstocks_ROOT_ SQ Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreenindicatedfallingstocks_ROOT_ SQ Shift Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreenindicatedfallingstocks_ROOT_ SQ Left-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreenindicatedfallingstocks_ROOT_ SQ Right-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figureson the screenindicatedfallingstocks_ROOT_ SQ Left-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figureson the screenindicatedfallingstocks_ROOT_ SQ Right-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figureson thescreen indicatedfallingstocks_ROOT_ SQ Reduce Borrowed from Dependency Parsing (P. Mannem)

Example Red figures onthescreen indicatedfallingstocks_ROOT_ SQ Reduce Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreen indicatedfallingstocks_ROOT_ SQ Left-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreen indicatedfallingstocks_ROOT_ SQ Right-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreen indicatedfallingstocks_ROOT_ SQ Shift Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreen indicated falling stocks_ROOT_ SQ Left-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreen indicated falling stocks_ROOT_ SQ Right-arc Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreen indicated fallingstocks _ROOT_ SQ Reduce Borrowed from Dependency Parsing (P. Mannem)

Example Red figuresonthescreenindicatedfallingstocks _ROOT_ SQ Reduce Borrowed from Dependency Parsing (P. Mannem)

Graph (MSTParser) vs. Transitions (MaltParser) Accuracy on different languages Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007

Graph (MSTParser) vs. Transitions (MaltParser) Sentence length vs. accuracy Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007

Graph (MSTParser) vs. Transitions (MaltParser) Dependency length vs. precision Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007

Known Parsers Stanford (constituency + dependency) MaltParser (dependency) MSTParser (dependency) Hebrew Yoav Goldbergs parser (http://www.cs.bgu.ac.il/~yoavg/software/hebpar sers/hebdepparser/)