Dependency Parsing
Some slides are based on:
1) PPT presentation on dependency parsing by Prashanth Mannem
2) Seven Lectures on Statistical Parsing by Christopher Manning
Slide 2
Constituency parsing
Breaks a sentence into constituents (phrases), which are then broken into smaller constituents
Describes phrase structure and clause structure (NP, PP, VP, etc.)
Structures are often recursive
Slide 3
[Figure: constituency parse tree of "mom is an amazing show", with node labels S, NP and VP]
Slide 4
Dependency parsing
Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies
Interested in grammatical relations between individual words (governing and dependent words)
Does not propose a recursive structure, but rather a network of relations
These relations can also have labels
Dependency vs. Constituency
A dependency grammar has a notion of a head
Officially, CFGs don't
But modern linguistic theory and all modern statistical parsers (Charniak, Collins, ...) do, via hand-written phrasal head rules:
  The head of a Noun Phrase is a noun/number/...
  The head of a Verb Phrase is a verb/modal/...
Based on a slide by Chris Manning
Slide 8
Dependency vs. Constituency
The head rules can be used to extract a dependency parse from a CFG parse (follow the heads)
A phrase structure tree can be obtained from a dependency tree, but the dependents are flat (each head and its dependents form a single phrase with no further nesting)
Based on a slide by Chris Manning
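To make "follow the heads" concrete, here is a minimal sketch of extracting dependencies from a constituency tree. The tree encoding, the HEAD_RULES table and the choice of the verb as head of the sentence are illustrative assumptions, not the head rules of any particular parser.

```python
# Hypothetical head rules: for each phrase label, the child labels to look
# for, in priority order. Real parsers use much larger hand-written tables.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBZ", "VP"],
    "NP": ["NN", "NP"],
}

def is_leaf(tree):
    # A leaf is (POS tag, word); an internal node is (label, [children]).
    return isinstance(tree[1], str)

def head_child(label, children):
    """Pick the head child of a phrase according to HEAD_RULES."""
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[-1]  # assumed fallback: rightmost child

def extract(tree, deps):
    """Return the lexical head of `tree`; append (head, dependent) pairs to `deps`."""
    if is_leaf(tree):
        return tree[1]
    label, children = tree
    hc = head_child(label, children)
    head = extract(hc, deps)
    for child in children:
        if child is not hc:
            # The head word of every non-head child depends on the phrase head.
            deps.append((head, extract(child, deps)))
    return head

tree = ("S", [
    ("NP", [("NN", "Mom")]),
    ("VP", [("VBZ", "is"),
            ("NP", [("DT", "an"), ("JJ", "amazing"), ("NN", "show")])]),
])
deps = []
root = extract(tree, deps)
```

Words are used as identifiers here for readability; a sentence with repeated words would need token positions instead.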
Slide 9
Definition: dependency graph
An input word sequence $w_1 \ldots w_n$
Dependency graph $G = (V, E)$ where
  $V$ is the set of nodes, i.e. word tokens in the input sequence
  $E$ is the set of unlabeled tree edges $(i, j)$, with $i, j \in V$
  $(i, j)$ indicates an edge from $i$ (parent, head, governor) to $j$ (child, dependent)
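One possible encoding of this definition is sketched below. Using token positions as nodes and adding an artificial _ROOT_ node at position 0 are assumptions made for the example; the sentence and its analysis are illustrative only.

```python
words = ["_ROOT_", "Ram", "saw", "a", "dog"]

# V: one node per token, identified by its position in the input.
V = set(range(len(words)))

# E: unlabeled edges (i, j), from head i to dependent j.
E = {(0, 2), (2, 1), (2, 4), (4, 3)}

# Because each word has a single head, E can also be stored as a
# dependent -> head map, which is the form most parsers work with.
heads = {j: i for (i, j) in E}

for j in sorted(heads):
    print(f"{words[heads[j]]} -> {words[j]}")
```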
Slide 10
Definition: dependency graph
A dependency graph is well-formed iff
  Single head: each word has only one head
  Acyclic: the graph should be acyclic
  Connected: the graph should be a single tree with all the words in the sentence
  Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B)
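The first three conditions can be checked together, as in the sketch below. It assumes nodes are numbered 0..n-1 and that node 0 is an artificial root with no head; the function name is hypothetical. Projectivity is a separate check and is not covered here.

```python
def is_well_formed_tree(n, edges, root=0):
    """Check single-head, acyclic and connected for nodes 0..n-1.

    `edges` is a collection of (head, dependent) pairs; `root` is an
    assumed artificial root node that must not have a head.
    """
    heads = {}
    for h, d in edges:
        if d in heads or d == root:   # a second head, or a head for the root
            return False
        heads[d] = h
    # Every non-root node needs a head, and following heads upward must
    # reach the root without revisiting a node (otherwise there is a cycle
    # or a part of the graph that is not connected to the root).
    for node in range(n):
        seen = set()
        while node != root:
            if node in seen or node not in heads:
                return False
            seen.add(node)
            node = heads[node]
    return True
```

For example, `is_well_formed_tree(5, {(0, 2), (2, 1), (2, 4), (4, 3)})` holds, while adding a second head for any word, or leaving a word without a head, makes the check fail.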
Slide 11
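Non-projective dependencies
Example: "Ram saw a dog yesterday which was a Yorkshire Terrier" (the relative clause modifies "dog", but the intervening word "yesterday" depends on "saw")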
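The projectivity condition can be tested directly from its definition. The sketch below assumes the head map is already a tree (otherwise the upward walk may not terminate), and the analysis of the example sentence is an assumed one, written down for illustration.

```python
def is_dominated_by(heads, node, ancestor):
    """True if `ancestor` is reached from `node` by following head links."""
    while node in heads:
        node = heads[node]
        if node == ancestor:
            return True
    return False

def non_projective_arcs(heads):
    """Return the (head, dependent) arcs that violate projectivity."""
    bad = []
    for dep, head in heads.items():
        lo, hi = sorted((head, dep))
        # Every word strictly between head and dependent must be dominated
        # by the head.
        if any(not is_dominated_by(heads, k, head) for k in range(lo + 1, hi)):
            bad.append((head, dep))
    return bad

words = ["_ROOT_", "Ram", "saw", "a", "dog", "yesterday",
         "which", "was", "a", "Yorkshire", "Terrier"]
# Assumed analysis: dependent position -> head position.
heads = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 7, 7: 4, 8: 10, 9: 10, 10: 7}

violations = non_projective_arcs(heads)
```

Under this analysis the only violation is the arc from "dog" to "was": "yesterday" lies between them but is attached to "saw".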
Slide 12
Parsing algorithms
Dependency-based parsers can be broadly categorized into
  Grammar-driven approaches: parsing done using grammars
  Data-driven approaches: parsing by training on annotated/unannotated data
Slide 13
Unlabeled graphs
Dan Klein recently showed that labeling is relatively easy and that the difficulty of parsing lies in creating the bracketing (Klein, 2004)
Therefore some parsers run in two steps: 1) bracketing; 2) labeling
Data-driven
Two main approaches
  Global, exhaustive, graph-based parsing
  Local, greedy, transition-based parsing
Slide 16
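Graph-based parsing
Assume there is a scoring function over edges: $s : V \times V \to \mathbb{R}$
The score of a graph is the sum of its edge scores: $s(G = (V, E)) = \sum_{(i, j) \in E} s(i, j)$
Parsing for input string $x$ is $G^* = \arg\max_{G \in D(x)} s(G)$, where $D(x)$ is the set of all dependency graphs for $x$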
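As a sanity check of the formulation, the sketch below enumerates every dependency tree of a tiny sentence and keeps the one with the highest total edge score. This brute-force search is exponential and is only meant to illustrate the argmax, not to be a practical parser. The score table, the artificial root at position 0 and the function names are assumptions made for the example.

```python
from itertools import product

def is_tree(heads):
    """heads[d - 1] is the head of word d (1..n); node 0 is an assumed root."""
    for d in range(1, len(heads) + 1):
        seen = set()
        while d != 0:
            if d in seen:          # we came back to a visited word: a cycle
                return False
            seen.add(d)
            d = heads[d - 1]
    return True

def best_graph(score):
    """score[h][d] is the edge score s(h, d); returns (best_score, heads)."""
    n = len(score) - 1
    best = None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(score[h][d] for d, h in enumerate(heads, start=1))
        if best is None or total > best[0]:
            best = (total, heads)
    return best

# Made-up scores for "_ROOT_ John saw Mary"; row = head, column = dependent.
score = [
    [0, 9, 10, 9],
    [0, 0, 20, 3],
    [0, 30, 0, 30],
    [0, 11, 0, 0],
]
result = best_graph(score)
```

With these scores the best tree attaches "saw" to the root and both "John" and "Mary" to "saw", for a total of 70.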
Slide 17
MST algorithm (McDonald, 2006)
Scores are based on features, independent of other dependencies
Features can be
  Head and dependent word and POS separately
  Head and dependent word and POS bigram features
  Words between head and dependent
  Length and direction of dependency
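A feature extractor along these lines might look like the sketch below. The feature-name strings are hypothetical, and using the POS tags (rather than the word forms) of the in-between words is an assumption. The score of an edge is then the dot product of a weight vector with these binary features.

```python
def arc_features(words, tags, head, dep):
    """Binary features of the edge head -> dep, as strings."""
    direction = "R" if head < dep else "L"
    distance = abs(head - dep)
    feats = [
        # head and dependent word and POS, separately
        f"hw={words[head]}", f"hp={tags[head]}",
        f"dw={words[dep]}", f"dp={tags[dep]}",
        # head-dependent bigrams
        f"hw,dw={words[head]},{words[dep]}",
        f"hp,dp={tags[head]},{tags[dep]}",
        # length and direction of the dependency
        f"dir,dist={direction},{distance}",
    ]
    # one feature per word lying between head and dependent
    lo, hi = sorted((head, dep))
    feats += [f"hp,bp,dp={tags[head]},{tags[k]},{tags[dep]}"
              for k in range(lo + 1, hi)]
    return feats

def arc_score(weights, feats):
    """Dot product of a sparse weight vector with binary features."""
    return sum(weights.get(f, 0.0) for f in feats)

words = ["_ROOT_", "Red", "figures", "on"]
tags = ["ROOT", "JJ", "NNS", "IN"]
feats = arc_features(words, tags, head=2, dep=1)
```

Because each edge is scored from its own features only, the score of a tree decomposes into a sum over edges.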
Slide 18
MST algorithm (McDonald, 2006)
Parsing can be formulated as a maximum spanning tree problem
Use the Chu-Liu-Edmonds (CLE) algorithm for MST (runs in $O(n^2)$, considers non-projective arcs)
Uses online learning for determining the weight vector w
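The following is a compact recursive sketch of the Chu-Liu-Edmonds idea: pick the best incoming edge for every word, and if that creates a cycle, contract the cycle into one node, rescore, and recurse. It is a simple illustrative version, not the optimized implementation, and it assumes integer node ids, a root at 0 and at least one incoming edge for every non-root node. The scores are made up.

```python
def find_cycle(heads):
    """Return the nodes of one cycle in a {dependent: head} map, or None."""
    for start in heads:
        path, node = [], start
        while node in heads and node not in path:
            path.append(node)
            node = heads[node]
        if node in path:
            return path[path.index(node):]
    return None

def chu_liu_edmonds(scores, root=0):
    """scores: {(head, dep): score}. Returns {dep: head} for the best tree."""
    # 1. Greedily choose the best incoming edge for every non-root node.
    best_in = {}
    for (h, d), s in scores.items():
        if d == root or h == d:
            continue
        if d not in best_in or s > scores[(best_in[d], d)]:
            best_in[d] = h
    cycle = find_cycle(best_in)
    if cycle is None:
        return best_in
    # 2. Contract the cycle into a fresh node c and rescore touching edges.
    c = max(max(h, d) for h, d in scores) + 1
    in_cycle = set(cycle)
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if d == root or h == d or (h in in_cycle and d in in_cycle):
            continue
        if d in in_cycle:      # entering the cycle: gain over the edge it replaces
            key, value = (h, c), s - scores[(best_in[d], d)]
        elif h in in_cycle:    # leaving the cycle
            key, value = (c, d), s
        else:
            key, value = (h, d), s
        if key not in new_scores or value > new_scores[key]:
            new_scores[key], origin[key] = value, (h, d)
    # 3. Solve the smaller problem, then expand c back into the cycle.
    contracted = chu_liu_edmonds(new_scores, root)
    heads = {}
    for d, h in contracted.items():
        orig_h, orig_d = origin[(h, d)]
        heads[orig_d] = orig_h
    for v in cycle:
        heads.setdefault(v, best_in[v])   # keep cycle edges except the broken one
    return heads

# Made-up scores for "_ROOT_(0) John(1) saw(2) Mary(3)".
scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (1, 3): 3,
          (2, 1): 30, (2, 3): 30, (3, 1): 11, (3, 2): 0}
tree = chu_liu_edmonds(scores)
```

Here the greedy step produces the cycle John <-> saw; after contraction, the best way into the cycle is ROOT -> saw, which breaks the cycle at "saw".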
Slide 19
Transition-based parsing
A transition system for dependency parsing defines:
  a set $C$ of parser configurations, each of which defines a (partially built) dependency graph $G$
  a set $T$ of transitions, each a function $t : C \to C$
  for every sentence $x = w_0, w_1, \ldots, w_n$
    a unique initial configuration $c_x$
    a set $Q_x$ of terminal configurations
Slide 20
Transition sequence
A transition sequence $C_{x,m} = (c_x, c_1, \ldots, c_m)$ for a sentence $x$ is a sequence of configurations such that $c_m \in Q_x$ and, for every $c_i$ with $1 \le i \le m$, there is a transition $t \in T$ such that $c_i = t(c_{i-1})$ (taking $c_0 = c_x$)
The graph defined by $c_m$ is the dependency graph of $x$
Slide 21
Transition scoring function
The score $s(c, t)$ of a transition $t$ in a configuration $c$ represents the likelihood of taking transition $t$ out of configuration $c$
Parsing is finding the optimal transition sequence
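One common way to realize s(c, t) is a linear model over features of the configuration, with the parser greedily taking the best-scoring transition at each step. Greedy choice is a local approximation of the optimal sequence, not an exact search. The feature names, weights and transition names below are all hypothetical.

```python
def score(weights, features, transition):
    """s(c, t): sum of the weights of (feature, transition) pairs active in c."""
    return sum(weights.get((f, transition), 0.0) for f in features)

def greedy_choice(weights, features, legal):
    """Pick the highest-scoring transition among the legal ones."""
    return max(legal, key=lambda t: score(weights, features, t))

# Hypothetical learned weights, indexed by (feature, transition).
weights = {
    ("top.pos=JJ", "left-arc"): 2.0,
    ("next.pos=NNS", "left-arc"): 1.0,
    ("top.pos=JJ", "shift"): 0.5,
}
# Hypothetical features extracted from the current configuration c.
features = ["top.pos=JJ", "next.pos=NNS"]

choice = greedy_choice(weights, features, ["shift", "left-arc", "right-arc"])
```

In this toy configuration "left-arc" scores 3.0, "shift" 0.5 and "right-arc" 0.0, so the greedy parser takes "left-arc".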
Slide 22
Yamada and Matsumoto (2003)
A transition-based (shift-reduce) parser
Considers two adjacent words
Runs in iterations, continues as long as new dependencies are created
In every iteration, consider 3 different actions and choose one using an SVM (or another discriminative learning technique)
Time complexity: $O(n^2)$
Accuracy was shown to be close to the state-of-the-art algorithms (e.g., Eisner's)
Slide 23
Y&M (2003) Actions: Shift, Left, Right
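The sketch below shows how the three actions could drive the iterative loop. It assumes the convention that Left makes the right word a dependent of the left word and Right makes the left word a dependent of the right word. In place of the trained classifier it uses a hypothetical oracle that reads the answer from a gold tree, so it illustrates the control flow only.

```python
def ym_parse(n, choose):
    """Parse words 0..n-1; `choose(nodes, i, heads)` returns an action name."""
    nodes = list(range(n))      # words that do not have a head yet
    heads = {}
    changed = True
    while changed and len(nodes) > 1:   # iterate while new arcs are created
        changed = False
        i = 0
        while i < len(nodes) - 1:       # target pair: nodes[i], nodes[i + 1]
            action = choose(nodes, i, heads)
            if action == "shift":
                i += 1
            elif action == "left":      # right word depends on the left word
                heads[nodes[i + 1]] = nodes[i]
                del nodes[i + 1]
                changed = True
            else:                       # "right": left word depends on the right word
                heads[nodes[i]] = nodes[i + 1]
                del nodes[i]
                changed = True
    return heads

def oracle(gold):
    """Stand-in for the classifier: consult a gold {dependent: head} map."""
    def complete(word, heads):
        # A word may be attached only once all its own dependents are attached.
        return all(d in heads for d, h in gold.items() if h == word)

    def choose(nodes, i, heads):
        left, right = nodes[i], nodes[i + 1]
        if gold.get(right) == left and complete(right, heads):
            return "left"
        if gold.get(left) == right and complete(left, heads):
            return "right"
        return "shift"
    return choose

# Red(0) figures(1) indicated(2) falling(3) stocks(4); "indicated" is the root.
gold = {0: 1, 1: 2, 3: 4, 4: 2}
parsed = ym_parse(5, oracle(gold))
```

On this sentence the first pass attaches "Red", "figures" and "falling", and the second pass attaches "stocks"; the need for repeated passes is what makes the worst case superlinear.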
Slide 24
Y&M (2003) Learning
Features (lemma, POS tag) are collected from the context
Slide 25
Stack-based parsing
Introducing a stack and a buffer
  The buffer is a queue of all input words (left to right)
  The stack begins empty; words are pushed to the stack by the defined actions
Reduces Y&M complexity to linear time
Slide 26
2 stack-based parsers
Nivre's (2003, 2006) arc-standard
[Figure: transitions over the stack and the buffer, with the conditions "i doesn't have a head already" and "j doesn't have a head already"]
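Since the transition definitions did not survive on this slide, the sketch below assumes one common formulation of arc-standard, in which arcs are built between the stack top i and the front of the buffer j. Other presentations define the arcs between the top two stack items. The example sentence, the transition sequence and the placement of _ROOT_ at the front of the buffer are illustrative assumptions.

```python
def shift(stack, buffer, heads):
    """Move the front of the buffer onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, heads):
    """Buffer front j becomes the head of stack top i; i is removed."""
    i, j = stack.pop(), buffer[0]
    heads[i] = j

def right_arc(stack, buffer, heads):
    """Stack top i becomes the head of buffer front j; j is removed and
    i moves back to the front of the buffer."""
    i, j = stack.pop(), buffer.pop(0)
    heads[j] = i
    buffer.insert(0, i)

words = ["_ROOT_", "Red", "figures", "indicated", "falling", "stocks"]
stack, buffer, heads = [], list(range(len(words))), {}

# Hand-picked transition sequence; a trained classifier would choose these.
for action in [shift, shift, left_arc, shift, left_arc, shift, shift,
               left_arc, right_arc, right_arc, shift]:
    action(stack, buffer, heads)
```

In this formulation a word is attached to a head on its left only after it has collected all of its own dependents, which is the main difference from arc-eager.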
Example (arc eager)
Sentence: Red figures on the screen indicated falling stocks, with _ROOT_; S is the stack and Q is the queue (buffer)
Borrowed from Dependency Parsing (P. Mannem)
Slide 29
Example: Shift
Slide 30
Example: Left-arc
Slide 31
Example: Shift
Slide 32
Example: Right-arc
Slide 33
Example: Shift
Slide 34
Example: Left-arc
Slide 35
Example: Right-arc
Slide 36
Example: Reduce
Slide 37
Example: Reduce
Slide 38
Example: Left-arc
Slide 39
Example: Right-arc
Slide 40
Example: Shift
Slide 41
Example: Left-arc
Slide 42
Example: Right-arc
Slide 43
Example: Reduce
Slide 44
Example: Reduce
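The transition sequence of this example can be replayed in code. The sketch below assumes the usual arc-eager definitions (Left-arc: the front of the queue becomes the head of the stack top, which is popped; Right-arc: the stack top becomes the head of the front of the queue, which is pushed) and that _ROOT_ starts on the stack. The function names are hypothetical.

```python
def shift(stack, queue, heads):
    stack.append(queue.pop(0))

def left_arc(stack, queue, heads):
    i, j = stack[-1], queue[0]
    assert i != 0 and i not in heads   # i doesn't have a head already
    heads[i] = j                       # j becomes the head of i
    stack.pop()

def right_arc(stack, queue, heads):
    i, j = stack[-1], queue[0]
    assert j not in heads              # j doesn't have a head already
    heads[j] = i                       # i becomes the head of j
    stack.append(queue.pop(0))

def reduce_top(stack, queue, heads):
    assert stack[-1] in heads          # only pop words that already have a head
    stack.pop()

words = ["_ROOT_", "Red", "figures", "on", "the", "screen",
         "indicated", "falling", "stocks"]
stack, queue, heads = [0], list(range(1, len(words))), {}

# The sequence of transitions from the example, in order.
for action in [shift, left_arc, shift, right_arc, shift, left_arc,
               right_arc, reduce_top, reduce_top, left_arc, right_arc,
               shift, left_arc, right_arc, reduce_top, reduce_top]:
    action(stack, queue, heads)

dependencies = {words[d]: words[h] for d, h in heads.items()}
```

After the last Reduce only _ROOT_ remains on the stack and the queue is empty. Sixteen transitions for eight words illustrates the linear-time behaviour: each word is pushed once and popped at most once.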
Slide 45
Graph (MSTParser) vs. Transitions (MaltParser)
Figure: accuracy on different languages
Source: Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007
Slide 46
Graph (MSTParser) vs. Transitions (MaltParser)
Figure: sentence length vs. accuracy
Source: Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007
Slide 47
Graph (MSTParser) vs. Transitions (MaltParser)
Figure: dependency length vs. precision
Source: Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007