Dependency Parsing

  • Slide 1
  • Dependency Parsing. Some slides are based on: 1) PPT presentation on dependency parsing by Prashanth Mannem; 2) Seven Lectures on Statistical Parsing by Christopher Manning
  • Slide 2
  • Constituency parsing: breaks the sentence into constituents (phrases), which are then broken into smaller constituents. Describes phrase structure and clause structure (NP, PP, VP, etc.). Structures are often recursive.
  • Slide 3
  • [Constituency parse tree for the example sentence "mom is an amazing show", with S, NP, and VP constituents]
  • Slide 4
  • Dependency parsing: syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies. Interested in grammatical relations between individual words (governing and dependent words). Does not propose a recursive structure, but rather a network of relations. These relations can also have labels.
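To make the "network of relations" concrete, here is a minimal illustrative sketch (not from the slides) that represents a labeled dependency analysis as (head, dependent, label) triples; the example sentence and label names are assumptions chosen for illustration.

```python
# A labeled dependency analysis is just a set of binary head -> dependent
# relations; no recursive phrase structure is needed.
# Illustrative sentence: "Ram saw a dog" (tokens are 1-indexed, 0 = ROOT).
tokens = {0: "_ROOT_", 1: "Ram", 2: "saw", 3: "a", 4: "dog"}

# (head, dependent, label) triples; label names are illustrative only.
arcs = [
    (0, 2, "root"),   # _ROOT_ -> saw
    (2, 1, "subj"),   # saw    -> Ram
    (2, 4, "obj"),    # saw    -> dog
    (4, 3, "det"),    # dog    -> a
]

for head, dep, label in arcs:
    print(f"{tokens[head]} --{label}--> {tokens[dep]}")
```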
  • Slide 5
  • Slide 6
  • Dependency vs. constituency. Dependency structures explicitly represent: head-dependent relations (directed arcs); functional categories (arc labels); possibly some structural categories (parts of speech). Constituency structures explicitly represent: phrases (non-terminal nodes); structural categories (non-terminal labels); possibly some functional categories (grammatical functions).
  • Slide 7
  • Dependency vs. constituency. A dependency grammar has a notion of a head. Officially, CFGs don't, but modern linguistic theory and all modern statistical parsers (Charniak, Collins, ...) do, via hand-written phrasal head rules: the head of a Noun Phrase is a noun/number/...; the head of a Verb Phrase is a verb/modal/... Based on a slide by Chris Manning
  • Slide 8
  • Dependency vs. constituency. The head rules can be used to extract a dependency parse from a CFG parse (follow the heads). A phrase structure tree can be obtained from a dependency tree, but the dependents are flat. Based on a slide by Chris Manning
  • Slide 9
  • Definition: dependency graph. For an input word sequence w1 ... wn, the dependency graph is G = (V, E), where V is the set of nodes, i.e., the word tokens in the input sequence, and E is the set of unlabeled tree edges (i, j), with i, j ∈ V. An edge (i, j) indicates an edge from i (parent, head, governor) to j (child, dependent).
  • Slide 10
  • Definition: dependency graph. A dependency graph is well-formed iff it satisfies: Single head: each word has only one head. Acyclic: the graph is acyclic. Connected: the graph is a single tree containing all the words in the sentence. Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e., dominated by B).
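As a concrete illustration of these conditions, here is a minimal sketch (not from the slides) that checks the tree conditions (single head, acyclic, connected) and projectivity for a graph given as a head map; the representation and function names are assumptions.

```python
# heads[d] = h means token d (1-indexed) has head h; h = 0 is the ROOT.
# A head map enforces the single-head condition by construction.

def is_tree(heads):
    """Acyclic and connected: every token must reach ROOT without revisiting a node."""
    for tok in heads:
        seen, node = set(), tok
        while node != 0:
            if node in seen or node not in heads:
                return False          # cycle, or a node with no path to ROOT
            seen.add(node)
            node = heads[node]
    return True

def is_projective(heads):
    """For every arc (h, d): all tokens strictly between h and d are dominated by h."""
    def dominated_by(node, ancestor):
        while True:
            if node == ancestor:
                return True
            if node == 0:
                return False
            node = heads[node]

    for d, h in heads.items():
        for k in range(min(h, d) + 1, max(h, d)):
            if not dominated_by(k, h):
                return False
    return True

# "Ram saw a dog": Ram <- saw, saw <- ROOT, a <- dog, dog <- saw
heads = {1: 2, 2: 0, 3: 4, 4: 2}
print(is_tree(heads), is_projective(heads))   # True True
```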
  • Slide 11
  • Non-projective dependencies: "Ram saw a dog yesterday which was a Yorkshire Terrier". The relative clause "which was a Yorkshire Terrier" depends on "dog" but is separated from it by "yesterday", which depends on "saw", not on "dog" — so the arc crosses another arc and the projectivity condition fails.
  • Slide 12
  • Parsing algorithms. Dependency-based parsers can be broadly categorized into: grammar-driven approaches (parsing done using grammars) and data-driven approaches (parsing by training on annotated/un-annotated data).
  • Slide 13
  • Unlabeled graphs. Dan Klein recently showed that labeling is relatively easy and that the difficulty of parsing lies in creating the bracketing (Klein, 2004). Therefore some parsers run in two steps: 1) bracketing; 2) labeling.
  • Slide 14
  • Traditions. Dynamic programming: e.g., Eisner (1996), McDonald (2006). Deterministic search: e.g., Covington (2001), Yamada and Matsumoto (2003), Nivre (2006). Constraint satisfaction: e.g., Maruyama, Foth et al.
  • Slide 15
  • Data driven. Two main approaches: global, exhaustive, graph-based parsing; local, greedy, transition-based parsing.
  • Slide 16
  • Graph-based parsing. Assume there is a scoring function s(i, j) for each candidate arc (i, j). The score of a graph G is the sum of its arc scores, s(G) = Σ_{(i,j)∈G} s(i, j). Parsing for input string x is finding the highest-scoring graph, G* = argmax_{G ∈ Gx} s(G), where Gx is the set of all dependency graphs for x.
  • Slide 17
  • MST algorithm (McDonald, 2006). Scores are based on features, independent of other dependencies. Features can be: head and dependent word and POS separately; head and dependent word and POS bigram features; words between head and dependent; length and direction of the dependency.
  • Slide 18
  • MST algorithm (McDonald, 2006). Parsing can be formulated as a maximum spanning tree problem. Use the Chu-Liu-Edmonds (CLE) algorithm for the MST (runs in O(n^2) and considers non-projective arcs). Uses online learning for determining the weight vector w.
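A minimal decoding sketch of the MST formulation, assuming per-arc scores are already available (in McDonald's parser they come from the learned weight vector w applied to the features above) and assuming the networkx library, whose maximum_spanning_arborescence finds a maximum spanning arborescence (Edmonds' algorithm); the scores below are made up for illustration.

```python
import networkx as nx

# arc_scores[(head, dep)] = score; node 0 is the artificial _ROOT_.
arc_scores = {
    (0, 2): 10.0, (2, 1): 9.0, (2, 4): 8.0, (4, 3): 7.0,   # "gold" arcs
    (0, 1): 1.0,  (1, 2): 2.0, (3, 4): 1.5, (4, 2): 0.5,   # competing arcs
}

G = nx.DiGraph()
for (h, d), s in arc_scores.items():
    G.add_edge(h, d, weight=s)

# Maximum spanning arborescence over all tokens; since _ROOT_ has no
# incoming arcs here, the result is rooted at node 0. Non-projective
# trees are handled naturally by this formulation.
mst = nx.maximum_spanning_arborescence(G, attr="weight")
print(sorted(mst.edges()))   # [(0, 2), (2, 1), (2, 4), (4, 3)]
```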
  • Slide 19
  • Transition-based parsing. A transition system for dependency parsing defines: a set C of parser configurations, each of which defines a (partially built) dependency graph G; a set T of transitions, each a function t : C → C; for every sentence x = w0, w1, ..., wn, a unique initial configuration cx and a set Qx of terminal configurations.
  • Slide 20
  • Transition sequence. A transition sequence Cx,m = (cx, c1, ..., cm) for a sentence x is a sequence of configurations such that cm ∈ Qx and, for every i (1 ≤ i ≤ m), there is a transition t ∈ T such that ci = t(ci-1), with c0 = cx. The graph defined by cm is the dependency graph of x.
  • Slide 21
  • Transition scoring function. The score of a transition t in a configuration c, s(c, t), represents the likelihood of taking transition t out of configuration c. Parsing is finding the optimal transition sequence, i.e., the sequence that maximizes the sum of transition scores (in practice often approximated greedily, taking t* = argmax_t s(c, t) at each configuration).
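As an illustrative sketch (not from the slides) of how the scoring function drives transition-based parsing, here is a greedy decoding loop that applies the highest-scoring legal transition at each configuration until a terminal configuration is reached; the interface (is_legal, apply, arcs) and the names are assumptions, and s(c, t) would come from a trained classifier in practice.

```python
def greedy_parse(init_config, transitions, score, is_terminal):
    """Greedy approximation of the optimal transition sequence:
    repeatedly apply t* = argmax_t s(c, t) over the legal transitions."""
    c = init_config
    while not is_terminal(c):
        legal = [t for t in transitions if t.is_legal(c)]
        best = max(legal, key=lambda t: score(c, t))
        c = best.apply(c)
    # The dependency graph defined by the terminal configuration.
    return c.arcs
```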
  • Slide 22
  • Yamada and Matsumoto (2003). A transition-based (shift-reduce) parser. Considers two adjacent words. Runs in iterations, continuing as long as new dependencies are created. In every iteration, it considers 3 different actions and chooses one using an SVM (or another discriminative learning technique). Time complexity: O(n^2). Accuracy was shown to be close to state-of-the-art algorithms (e.g., Eisner's).
  • Slide 23
  • Y&M (2003) actions: Shift, Left, Right
  • Slide 24
  • Y&M (2003) learning: features (lemma, POS tag) are collected from the context
  • Slide 25
  • Stack-based parsing. Introduces a stack and a buffer. The buffer is a queue of all input words (left to right). The stack begins empty; words are pushed onto the stack by the defined actions. Reduces Y&M's complexity to linear time.
  • Slide 26
  • 2 stack-based parsers: Nivre's (2003, 2006) arc-standard. [Figure: transition definitions over the stack and buffer, with preconditions such as "i doesn't already have a head" and "j doesn't already have a head".]
  • Slide 27
  • 2 stack-based parsers: Nivre's (2003, 2006) arc-eager (see the sketch below).
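Before the worked example on the following slides, here is a minimal sketch of the four arc-eager transitions (Shift, Left-arc, Right-arc, Reduce) over a stack and a buffer. It is a simplified, unlabeled version that takes the transition sequence as given (and does not check preconditions) rather than choosing transitions with a classifier; it replays the exact sequence used in the example below.

```python
def arc_eager(words, actions):
    """Apply a given arc-eager action sequence; returns {dependent: head}.
    Tokens are 1-indexed; 0 is the artificial _ROOT_, which starts on the stack."""
    stack, buffer, heads = [0], list(range(1, len(words) + 1)), {}
    for act in actions:
        if act == "shift":                      # push front of buffer onto the stack
            stack.append(buffer.pop(0))
        elif act == "left-arc":                 # buffer front -> stack top; pop the stack
            heads[stack.pop()] = buffer[0]
        elif act == "right-arc":                # stack top -> buffer front; push buffer front
            heads[buffer[0]] = stack[-1]
            stack.append(buffer.pop(0))
        elif act == "reduce":                   # pop the stack (top already has a head)
            stack.pop()
    return heads

words = "Red figures on the screen indicated falling stocks".split()
actions = ["shift", "left-arc", "shift", "right-arc", "shift", "left-arc",
           "right-arc", "reduce", "reduce", "left-arc", "right-arc",
           "shift", "left-arc", "right-arc", "reduce", "reduce"]
print(arc_eager(words, actions))
# {1: 2, 3: 2, 4: 5, 5: 3, 2: 6, 6: 0, 7: 8, 8: 6}
```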
  • Slide 28
  • Example (arc-eager): _ROOT_ Red figures on the screen indicated falling stocks. Initial configuration. [The original slides show the stack (S) and queue (Q) contents at each step.] Borrowed from Dependency Parsing (P. Mannem)
  • Slide 29
  • Example: Shift ("Red" is pushed onto the stack)
  • Slide 30
  • Example: Left-arc (figures → Red)
  • Slide 31
  • Example: Shift ("figures" is pushed onto the stack)
  • Slide 32
  • Example: Right-arc (figures → on)
  • Slide 33
  • Example: Shift ("the" is pushed onto the stack)
  • Slide 34
  • Example: Left-arc (screen → the)
  • Slide 35
  • Example: Right-arc (on → screen)
  • Slide 36
  • Example: Reduce ("screen" is popped off the stack)
  • Slide 37
  • Example: Reduce ("on" is popped off the stack)
  • Slide 38
  • Example: Left-arc (indicated → figures)
  • Slide 39
  • Example: Right-arc (_ROOT_ → indicated)
  • Slide 40
  • Example: Shift ("falling" is pushed onto the stack)
  • Slide 41
  • Example: Left-arc (stocks → falling)
  • Slide 42
  • Example: Right-arc (indicated → stocks)
  • Slide 43
  • Example: Reduce ("stocks" is popped off the stack)
  • Slide 44
  • Example: Reduce ("indicated" is popped off the stack)
  • Slide 45
  • Graph-based (MSTParser) vs. transition-based (MaltParser): accuracy on different languages. (Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre, 2007)
  • Slide 46
  • Graph-based (MSTParser) vs. transition-based (MaltParser): sentence length vs. accuracy. (Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre, 2007)
  • Slide 47
  • Graph-based (MSTParser) vs. transition-based (MaltParser): dependency length vs. precision. (Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre, 2007)
  • Slide 48
  • Known parsers: Stanford (constituency + dependency); MaltParser (dependency); MSTParser (dependency); Hebrew: Yoav Goldberg's parser (http://www.cs.bgu.ac.il/~yoavg/software/hebparsers/hebdepparser/)