Lecture 8 Syntactic parsing - courses.cs.ut.ee

70
Lecture 8 Syntactic parsing LTAT.01.001 – Natural Language Processing Kairit Sirts ([email protected] ) 03.04.2020

Transcript of Lecture 8 Syntactic parsing - courses.cs.ut.ee

Page 1: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Lecture 8 Syntactic parsing

LTAT.01.001 – Natural Language ProcessingKairit Sirts ([email protected])

03.04.2020

Page 2: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Plan for today

● Why syntax?● Syntactic parsing

● Shallow parsing● Constituency parsing● Dependency parsing

● Transition-based dependency parsing● Neural dependency parsers● Evaluation of dependency parsing

2

Page 3: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Why syntax?

3

Page 4: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Syntax

● Internal structure of words

4

Page 5: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Syntactic ambiguity

5

Page 6: Lecture 8 Syntactic parsing - courses.cs.ut.ee

The chicken is ready to eat.

6

Page 7: Lecture 8 Syntactic parsing - courses.cs.ut.ee

7

Page 8: Lecture 8 Syntactic parsing - courses.cs.ut.ee

8

Page 9: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Exercise

How many different meanings has the sentence:

Time flies like an arrow.

9

Page 10: Lecture 8 Syntactic parsing - courses.cs.ut.ee

The role of syntax in NLP

● Text generation/summarization/machine translation● Useful features for various information extraction tasks● Syntactic structure also reflects the semantic relations between the words

10

Page 11: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Syntactic analysis/parsing

● Shallow parsing● Phrase structure / constituency parsing● Dependency parsing

11

Page 12: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Shallow parsing

12

Page 13: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Shallow parsing

● Also called chunking or light parsing● Split the sentence into non-overlapping syntactic phrases

13

John hit the ball.NP VP NP

NP – Noun phraseVP – Verb phrase

Page 14: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Shallow parsing

The morning flight from Denver has arrived. NP PP NP VP

NP – Noun phrasePP – Prepositional PhraseVP – Verb phrase

14

Page 15: Lecture 8 Syntactic parsing - courses.cs.ut.ee

BIO tagging

A labelling scheme often used in information extraction problems, treated as a sequence tagging task

The morning flight from Denver has arrived. B_NP I_NP I_NP B_PP B_NP B_VP I_VP

B_NP – Beginning of a noun phraseI_NP – Inside a noun phraseB_VB – Beginning of a verb phrase etc

15

Page 16: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Sequence classifier

● Need annotated data for training: POS-tagged, phrase-annotated

● Use a sequence classifier of your choice● CRF with engineered features● Neural sequence tagger

16

Page 17: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Constituency Parsing

17

Page 18: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Constituency parsing

● Full constituency parsing helps to resolve structural ambiguities

18

Page 19: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Bracketed style

● The trees can be represented linearly with brackets

(S (Pr I)(Aux will)(VP (V do)

(NP (Det my)(N homework))NP

)VP)S

19

Page 20: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Context-free grammars

S → NP VPVP → V NPVP → V NP PPNP → NP NPNP → NP PPNP → NNP → ePP → P NP

N → peopleN → fishN → tanksN → rodsV → peopleV → fishV → tanksP → with

20

G = (T, N, S, R)● T – set of terminal symbols● N – set of non-terminal symbols● S – start symbol (S ∈ N)● R – set of rules/productions of

the form X → 𝛾● X ∈ N● 𝛾 ∈ (N U T)*

A grammar G generates a language L

Page 21: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Probabilistic PCFG

G = (T, N, S, R, P)● T – set of terminal symbols● N – set of non-terminal symbols● S – start symbol (S ∈ N)● R – set of rules/productions of the form X → 𝛾● P – probability function

● P: R → [0, 1]● ∀ X ∈ N, ∑𝑿→𝜸∈𝑹𝑷 𝑿 → 𝜸 = 𝟏

A grammar G generates a language model L

21

Page 22: Lecture 8 Syntactic parsing - courses.cs.ut.ee

A PCFG

S → NP VP 1.0VP → V NP 0.6VP → V NP PP 0.4NP → NP NP 0.1NP → NP PP 0.2NP → N 0.7PP → P NP 1.0

N → people 0.5N → fish 0.2N → tanks 0.2N → rods 0.1V → people 0.1V → fish 0.6V → tanks 0.3P → with 1.0

22

Page 23: Lecture 8 Syntactic parsing - courses.cs.ut.ee

The probability of strings and trees

● P(t) – the probability of a tree t is the product of the probabilities of the rules used to generate it.

𝑃 𝑡 = -.→/∈0

𝑃 𝑋 → 𝛾

● P(s) – The probability of the string s is the sum of the probabilities of thetrees which have that string as their yield

𝑃 𝑠 =34

𝑃 𝑠, 𝑡4 =34

𝑃 𝑡4

● where t is a parse of s

23

Page 24: Lecture 8 Syntactic parsing - courses.cs.ut.ee

PCFG for efficient parsing

● For efficient parsing the rules should be unary or binary● Chomsky normal form – all rules have the form:

● X --> Y Z● X --> w● X, Y, Z - non-terminal symbols● w – terminal symbol● No epsilon rules

24

Page 25: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Before and after binarization

25

Page 26: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Finding the most likely tree: CKY parsing

● Dynamic programming algorithm● Proceeds bottom-up and performs Viterbi on trees

26

Page 27: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dependency Parsing

27

Page 28: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dependency parsing

28

• Dependency parse is a directed graph G = (V, A)• V – the set of vertices corresponding to words• A – the set of nodes corresponding to dependency relations

Page 29: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dependency parsing

29

Page 30: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dependency relations

● The arrows connect heads and their dependents● The main verb is the head or the root of the whole sentence● The arrows are labelled with grammatical functions/dependency

relations

30

Dependent

Head

Root of the sentence

Labelled dependency relation

Page 31: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Properties of a dependency graph

A dependency tree is a directed graph that satisfies the following constraints:1. There is a single designated root node that has no incoming arcs

● Typically the main verb of the sentence

2. Except for the root node, each node has exactly one incoming arc● Each dependent has a single head

3. There is a unique path from the root node to each vertex in V● The graph is acyclic and connected

31

Page 32: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Projectivity

● Projective trees – there are no arc crossings in the dependency graphs● Non-projective trees - crossings due to free word order

32

Page 33: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dependency relationsClausal argument relations Description Example

NSUBJ Nominal subject United canceled the flight.

DOBJ Direct object United canceled the flight.

IOBJ Indirect object We booked her the flight to Paris.

Nominal modifier relations Description Example

NMOD Nominal modifier We took the morning flight.

AMOD Adjectival modifier Book the cheapest flight.

NUMMOD Numeric modifier Before the storm JetBlue canceled 1000 flights

APPOS Appositional modifier United, a unit of UAL, matched the fares.

DET Determiner Which flight was delayed?

CASE Prepositions, postpositions Book the flight through Houston.

Other notable relations Description Example

CONJ Conjunct We flew to Denver and drove to Steamboat

CC Coordinating conjunction We flew to Denver and drove to Steamboat.. 33

Page 34: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Universal dependencies

● http://universaldependencies.org/● Annotated treebanks in many languages● Uniform annotation scheme across all UD languages:

● Universal POS tags● Universal morphological features● Universal dependency relations

● All data in CONLL-U tabulated format

34

Page 35: Lecture 8 Syntactic parsing - courses.cs.ut.ee

35

Page 36: Lecture 8 Syntactic parsing - courses.cs.ut.ee

CONLL-U format

● A standard tabular format for certain type of annotated data● Each word is in a separate line● 10 tab-separated columns on each line:

1. Word index2. The word itself3. Lemma4. Universal POS5. Language specific POS6. Morphological features7. Head of the current word8. Dependency relation to the head9. Enhanced dependencies10. Any other annotation

36

Page 37: Lecture 8 Syntactic parsing - courses.cs.ut.ee

CONLL-U format: example# text = They buy and sell books.

1 They they PRON PRP Case=Nom|Number=Plur 2 nsubj

2 buy buy VERB VBP Number=Plur|Person=3|Tense=Pres 0 root

3 and and CONJ CC _ 4 cc

4 sell sell VERB VBP Number=Plur|Person=3|Tense=Pres 2 conj

5 books book NOUN NNS Number=Plur 2 obj

6 . . PUNCT . _ 2 punct

# text = I had no clue.

1 I I PRON PRP Case=Nom|Number=Sing|Person=1 2 nsubj

2 had have VERB VBD Number=Sing|Person=1|Tense=Past 0 root

3 no no DET DT PronType=Neg 4 det

4 clue clue NOUN NN Number=Sing 2 obj

5 . . PUNCT . _ 2 punct 37

Page 38: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dependency parsing methods

● Transition-based parsing● stack-based algorithms/shift-reduce parsing● only generate projective trees

● Graph-based algorithms● can also generate non-projective trees

38

Page 39: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Transition-based dependency parsing

39

Page 40: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Transition-based parsing

● Three main components:● Stack● Buffer● Set of dependency relations

● A configuration is the current state of the stack, buffer and the relation set

40

Page 41: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Arc-standard parsing system

● Initial configuration:

● Stack 𝜎 contains the ROOT symbol● Buffer 𝛽 contains all words in the sentence● Dependency relation set A is empty

41

Page 42: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Arc-standard parsing system

At each step perform either:● Shift – move a word from the buffer to the stack:

42

Page 43: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Arc-standard parsing system

At each step perform either:● Shift – move a word from the buffer to the stack:

● LeftArc – left arc between top two words in the stack, pop the second word:

43

Page 44: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Arc-standard parsing system

At each step perform either:● Shift – move a word from the buffer to the stack:

● LeftArc – left arc between top two words in the stack, pop the second word:

● RightArc – right arc between top two words in the stack, pop the first word:

44

Page 45: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Oracle

● The annotated data is in the form of a treebank● Each sentence is annotated with its dependency tree

● The task of the transition-based parser is to predict the correct parsing operation at each step:● Input is configuration● Output is parsing action: Shift, RightArc or LeftArc

● The role of the oracle is to return the correct parsing operation for each configuration in the training set● It creates a training set for the parsing model

45

Page 46: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Oracle

● Choose LeftArc if it produces a correct head-dependent relation given the reference parse and the current configuration

● Choose RightArc if:● It produces a correct head-dependent relation given the reference parse and the

current configuration● All of the dependents of the word at the top of the stack have already been assigned

● Otherwise choose Shift

46

Page 47: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Example

47

Shift:

LeftArc:

RightArc:

Page 48: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Starting configurationStack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat]

48

Page 49: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat]

49

Page 50: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat]

50

Page 51: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat]

51

Page 52: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat]

52

Page 53: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat]

53

Page 54: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat]

54

Page 55: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat] Shift

[ROOT, sat, on, the] [mat]

55

Page 56: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat] Shift

[ROOT, sat, on, the] [mat] Shift

[ROOT, sat, on, the, mat] []

56

Page 57: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat] Shift

[ROOT, sat, on, the] [mat] Shift

[ROOT, sat, on, the, mat] [] Left-Arc det(the ← mat)

[ROOT, sat, on, mat] []

57

Page 58: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat] Shift

[ROOT, sat, on, the] [mat] Shift

[ROOT, sat, on, the, mat] [] Left-Arc det(the ← mat)

[ROOT, sat, on, mat] [] Left-Arc case(on ← mat)

[ROOT, sat, mat] []

58

Page 59: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat] Shift

[ROOT, sat, on, the] [mat] Shift

[ROOT, sat, on, the, mat] [] Left-Arc det(the ← mat)

[ROOT, sat, on, mat] [] Left-Arc case(on ← mat)

[ROOT, sat, mat] [] Right-Arc nmod(sat → mat)

[ROOT, sat] []59

Page 60: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat] Shift

[ROOT, sat, on, the] [mat] Shift

[ROOT, sat, on, the, mat] [] Left-Arc det(the ← mat)

[ROOT, sat, on, mat] [] Left-Arc case(on ← mat)

[ROOT, sat, mat] [] Right-Arc nmod(sat → mat)

[ROOT, sat] [] Right-Arc root(ROOT, sat)

[ROOT [] 60

Page 61: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Stack Buffer Action Arc

[ROOT] [The, cat, sat, on, the, mat] Shift

[ROOT, The] [cat, sat, on, the, mat] Shift

[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)

[ROOT, cat] [sat, on, the, mat] Shift

[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)

[ROOT, sat] [on, the, mat] Shift

[ROOT, sat, on] [the, mat] Shift

[ROOT, sat, on, the] [mat] Shift

[ROOT, sat, on, the, mat] [] Left-Arc det(the ← mat)

[ROOT, sat, on, mat] [] Left-Arc case(on ← mat)

[ROOT, sat, mat] [] Right-Arc nmod(sat → mat)

[ROOT, sat] [] Right-Arc root(ROOT, sat)

[ROOT [] Done 61

Page 62: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Neural dependency parsers

62

Page 63: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Kipperwasser and Goldberg, 2016. Simple and Accurate Parsing Using Bidirectional LSTM Feature Representations

63

Page 64: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Kipperwasser and Goldberg, 2016. Simple and Accurate Parsing Using Bidirectional LSTM Feature Representations

● Features: concatenation of the following LSTM vectors:● three top words from the stack● first word in the buffer

● Prediction:● Score the features with a feed-forward NN (MLP)● Train using hinge loss: maximise the MLP score between the highest scoring

correct action and the highest scoring incorrect action

64

G – the set of correct transitions from the current configurationA – the set of all transitions from the current configuration

Page 65: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dozat et al., 2017. Stanford’s Graph-Based Neural Dependency Parser at the CoNLL 2017 Shared Task.

65

From each LSTM vector create four vectors for classification

● for head word ● for head relation● for dependent word● for dependent relation

head

head rel dep

dep rel

Page 66: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Dozat et al., 2017. Stanford’s Graph-Based Neural Dependency Parser at the CoNLL 2017 Shared Task.

66

● Use biaffine classifier to score thedep vector of each word with the head vector of every other word

● Then use biaffine classifier to score the relation between the head and dependent using the respective relation vectors

Page 67: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Evaluating dependency parsing

67

Page 68: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Evaluation

Unlabelled attachment score:● The proportion of correct head attachments

Labelled attachment score:● The proportion of correct head attachments labelled with the correct

relation

Label accuracy● The proportion of correct incoming relation labels ignoring the head

68

Page 69: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Evaluation

69

UAS = LAS = LA =

Page 70: Lecture 8 Syntactic parsing - courses.cs.ut.ee

Evaluation

70

UAS = 5/6 LAS = 4/6LA = 4/6