Constituency Parsing · speech.ee.ntu.edu.tw/~tlkagk/courses/DLHLP20/ParsingC (v2).pdf

Page 1

Constituency Parsing 李宏毅 Hung-yi Lee

Page 2

NLP tasks by input (One Sequence / Multiple Sequences) and by output:

• One class: Sentiment Classification, Stance Detection, Veracity Prediction, Intent Classification, Dialogue Policy, NLI, Search Engine, Relation Extraction
• Class for each token: POS tagging, Word segmentation, Extractive Summarization, Slot Filling, NER
• Copy from input: Extractive QA
• General sequence: Abstractive Summarization, Translation, Grammar Correction, NLG, General QA, Chatbot, State Tracker, Task Oriented Dialogue
• Other? Parsing, Coreference Resolution

Page 3

Constituency Parsing

• Some text spans are constituents (“units”)

• Each constituent has a label.

Example: deep learning is very powerful. "deep learning" is a constituent labeled NP, "very powerful" is a constituent labeled ADJP, and a span such as "learning is" is not a constituent.

Page 4

Constituency Parsing - Labels

+ All POS tags

Page 5

Constituency Parsing

deep learning is very powerful

• Each word is a constituent (their labels are POS tags)

• Some consecutive constituents can form a larger one.

Together these constituents form a tree: (S (NP deep learning ) (VP is (ADJP very powerful ) ) )

(Only binary trees are considered in this course, for simplicity.)

Page 6

Constituency Parsing


Each constituent is a node.
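A minimal sketch of "each constituent is a node", using a hypothetical `Node` class (not from the slides). The POS tags JJ, NN, VBZ and RB are assumed Penn Treebank tags; the slides do not give them. Listing every node as a (start, end, label) span, with end exclusive, makes the link to spans explicit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One constituent: a label plus either a word (leaf) or child nodes."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only for leaves; their labels are POS tags


def constituents(node: Node, start: int = 0) -> Tuple[List[Tuple[int, int, str]], int]:
    """Return (start, end, label) for every node in pre-order, plus the end position."""
    if node.word is not None:            # each word is a constituent on its own
        return [(start, start + 1, node.label)], start + 1
    spans, pos = [], start
    for child in node.children:          # children cover consecutive spans
        child_spans, pos = constituents(child, pos)
        spans.extend(child_spans)
    return [(start, pos, node.label)] + spans, pos


# (S (NP deep learning ) (VP is (ADJP very powerful ) ) ) with assumed POS tags
tree = Node("S", [
    Node("NP", [Node("JJ", word="deep"), Node("NN", word="learning")]),
    Node("VP", [Node("VBZ", word="is"),
                Node("ADJP", [Node("RB", word="very"), Node("JJ", word="powerful")])]),
])
spans, n = constituents(tree)
```

A binary tree over N words always has 2N - 1 nodes, so this 5-word sentence yields 9 constituents (5 words plus S, NP, VP and ADJP).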

Page 7

Chart-based Approach

Source of image: https://web.stanford.edu/~jurafsky/slp3/13.pdf

CKY chart parsing

Page 8

Chart-based

A classifier looks at one span (e.g., w2 w3 w4 w5 inside …… w2 w3 w4 w5 ……) and answers two questions:
• Constituent? (binary classification)
• Which label? (multi-class classification)

Page 9

Chart-based

deep learning is very powerful

• "deep learning" → Classifier → Constituent? YES; Which label? NP
• "very powerful" → Classifier → Constituent? YES; Which label? ADJP

Page 10

Chart-based

deep learning is very powerful

• For two spans that are not constituents, the classifier answers Constituent? NO, and Which label? is Don’t Care (the label output is ignored for these spans).
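One way to turn a gold tree into training targets for the two questions is sketched below. This is a minimal sketch, assuming spans are half-open (start, end) word indices and that single-word spans are skipped because their labels are POS tags. `GOLD` and `span_targets` are hypothetical names. Non-constituent spans get label `None`, playing the role of Don’t Care.

```python
from typing import Dict, Optional, Tuple

# Gold multi-word constituents of "deep learning is very powerful",
# (start, end) -> label with end exclusive, read off the tree on the slides.
GOLD: Dict[Tuple[int, int], str] = {(0, 5): "S", (0, 2): "NP", (2, 5): "VP", (3, 5): "ADJP"}


def span_targets(n: int, gold: Dict[Tuple[int, int], str]):
    """Yield (span, is_constituent, label) for every span of two or more words.

    label is None ("Don't Care") for non-constituents: only the binary
    Constituent? output gets a training signal on those spans.
    """
    for start in range(n):
        for end in range(start + 2, n + 1):
            label: Optional[str] = gold.get((start, end))
            yield (start, end), label is not None, label


targets = list(span_targets(5, GOLD))
```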

Page 11

Chart-based – Classifier

w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 → Pre-trained Model (ELMo, BERT, …) → Span Feature Extraction → two outputs:
• Constituent? (Yes/No)
• Which Label? (Label)
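A minimal sketch of the classifier head. Span-based parsers such as [Stern, et al., ACL’17] build the span feature from the contextual vectors at the span boundaries. This sketch uses a simplified variant: it concatenates the vectors of the span's first and last words. It then applies a sigmoid head for Constituent? and a softmax head for Which label?. The 2-d vectors stand in for pre-trained model outputs, and all weights are made up. `span_feature` and `classify_span` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def span_feature(h: Sequence[Vector], start: int, end: int) -> Vector:
    """Feature for words start..end-1: first-word vector concatenated with last-word vector."""
    return list(h[start]) + list(h[end - 1])


def linear(w: Sequence[Vector], x: Vector) -> Vector:
    """Matrix-vector product: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)                               # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify_span(h, start, end, w_yes, w_label, labels) -> Tuple[float, Dict[str, float]]:
    f = span_feature(h, start, end)
    p_yes = 1.0 / (1.0 + math.exp(-linear([w_yes], f)[0]))  # Constituent? (binary)
    p_label = softmax(linear(w_label, f))                     # Which label? (multi-class)
    return p_yes, dict(zip(labels, p_label))


# Toy contextual vectors for 5 words, standing in for ELMo/BERT outputs.
h = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4], [0.6, 0.1]]
w_yes = [0.5, -0.3, 0.8, 0.1]                                 # made-up weights
w_label = [[0.2, 0.1, -0.4, 0.3], [-0.1, 0.5, 0.2, 0.0], [0.3, -0.2, 0.1, 0.4]]
p_yes, p_label = classify_span(h, 0, 2, w_yes, w_label, ["NP", "VP", "ADJP"])
```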

Page 12

Chart-based

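• Given a sequence of N tokens, the classifier is run once for every span of two or more tokens, i.e. N(N-1)/2 times.
• Each span is classified independently, so the answers can contradict each other. In deep learning is very powerful, two partially overlapping spans can both get YES for "Constituent?", yet two partially overlapping spans can never both be nodes of the same tree. Contradiction!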
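The sketch below makes the count and the contradiction concrete. It uses hypothetical helper names and represents spans as half-open (start, end) word indices. It lists every span of two or more tokens and flags pairs of YES spans that cross, meaning they overlap without one containing the other.

```python
from itertools import combinations
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive


def all_spans(n: int) -> List[Span]:
    """Every span of two or more tokens in an n-token sentence: n(n-1)/2 of them."""
    return [(i, j) for i in range(n) for j in range(i + 2, n + 1)]


def crossing(a: Span, b: Span) -> bool:
    """True if the spans partially overlap, so they cannot both be nodes of one tree."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def contradictions(predicted_yes: List[Span]) -> List[Tuple[Span, Span]]:
    """All pairs of spans classified as constituents that cannot coexist."""
    return [(a, b) for a, b in combinations(predicted_yes, 2) if crossing(a, b)]


# "deep learning is" (0, 3) and "is very powerful" (2, 5) both got YES:
conflicts = contradictions([(0, 3), (2, 5), (0, 5)])
```

Nested spans such as (2, 5) inside (0, 5) are fine; only the partially overlapping pair is reported.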

Page 13

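Inference: enumerate all possible trees, use the classifier to give each tree a score, and output the best-scoring tree. The number of trees grows exponentially with sentence length, so enumerating them one by one is impractical. This is where you need the CKY algorithm.

Example: I am good. Each candidate tree is scored by running the classifier on its spans (the slide shows the scores 0.1, 0.9, 0.8, 0.9).

Training? [Stern, et al., ACL’17]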
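The number of binary trees over n words is the Catalan number C(n-1), which is why a dynamic program is used instead of explicit enumeration. Below is a minimal CKY-style sketch. It assumes, as in span-based parsers, that a tree's score is the sum of the scores of its multi-word spans. Single words score 0, and label choice is left out. `cky` is a hypothetical name, and the span scores for I am good are made up.

```python
from functools import lru_cache
from typing import Callable


def cky(n: int, score: Callable[[int, int], float]):
    """Return (best total score, best tree) over n words.

    Assumes a tree's score is the sum of score(i, j) over its multi-word spans
    (i, j), end exclusive. Trees are nested tuples: (i, i + 1) for a word and
    (i, j, left, right) for a larger constituent. Runs in O(n^3) time.
    """

    @lru_cache(maxsize=None)
    def best(i: int, j: int):
        if j - i == 1:                       # a single word: nothing to choose
            return 0.0, (i, j)
        candidates = []
        for k in range(i + 1, j):            # try every split point
            left_score, left_tree = best(i, k)
            right_score, right_tree = best(k, j)
            candidates.append((left_score + right_score, (i, j, left_tree, right_tree)))
        sub_score, tree = max(candidates, key=lambda c: c[0])
        return score(i, j) + sub_score, tree

    return best(0, n)


# Hypothetical span scores for "I am good" (words 0, 1, 2).
scores = {(0, 2): 0.1, (1, 3): 0.8, (0, 3): 0.9}
total, tree = cky(3, lambda i, j: scores[(i, j)])
```

With these scores the parser picks (I (am good)) with total 0.9 + 0.8 = 1.7 over ((I am) good) with 0.9 + 0.1 = 1.0.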

Page 14

Transition-based

Source of image: https://arxiv.org/pdf/1602.07776.pdf

Page 15

Transition-based [Dyer, et al., NAACL’16]

• Stack: empty at the beginning
• Buffer: deep learning is very powerful

Actions
• SHIFT: move a token from the buffer to the stack
• REDUCE: close a constituent
• CREATE(X): create (open) a constituent X (X = NP, VP …)
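A minimal sketch of the three actions, using a hypothetical `run_actions` helper. Stack items are plain strings standing in for the representations a neural model would compute. An opened constituent "(X" stays on the stack until REDUCE pops everything above it and closes it. The example replays the action sequence for deep learning is very powerful from these slides, with ADJP as the adjective-phrase label.

```python
from typing import List


def run_actions(words: List[str], actions: List[str]) -> str:
    """Apply SHIFT / REDUCE / CREATE(X) actions and return the finished bracketed tree."""
    buffer = list(words)
    stack: List[str] = []               # empty at the beginning
    open_at: List[int] = []             # stack positions of constituents not yet closed
    for act in actions:
        if act == "SHIFT":              # move a token from the buffer to the stack
            if not buffer:
                raise ValueError("SHIFT with an empty buffer")
            stack.append(buffer.pop(0))
        elif act == "REDUCE":           # close the most recently created constituent
            if not open_at:
                raise ValueError("REDUCE with no open constituent")
            start = open_at.pop()
            closed = " ".join(stack[start:]) + " )"
            del stack[start:]
            stack.append(closed)
        elif act.startswith("CREATE(") and act.endswith(")"):
            open_at.append(len(stack))  # create (open) constituent X
            stack.append("(" + act[len("CREATE("):-1])
        else:
            raise ValueError(f"unknown action: {act}")
    if buffer or open_at or len(stack) != 1:
        raise ValueError("actions did not build exactly one complete tree")
    return stack[0]


words = "deep learning is very powerful".split()
actions = ("CREATE(S) CREATE(NP) SHIFT SHIFT REDUCE CREATE(VP) SHIFT "
           "CREATE(ADJP) SHIFT SHIFT REDUCE REDUCE REDUCE").split()
result = run_actions(words, actions)
```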

Page 16

Transition-based

Stack: (empty at the beginning)
Buffer: deep learning is very powerful
First action: CREATE(S)

Page 17

Transition-based

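Buffer: deep learning is very powerful

Each action with the symbol it contributes:
CREATE(S): (S
CREATE(NP): (NP
SHIFT: deep
SHIFT: learning
REDUCE: ) closes NP
CREATE(VP): (VP
SHIFT: is
CREATE(ADJP): (ADJP
SHIFT: very
SHIFT: powerful
REDUCE: ) closes ADJP
REDUCE: ) closes VP
REDUCE: ) closes S

Final stack: (S (NP deep learning ) (VP is (ADJP very powerful ) ) )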

Page 18

RNN Grammar

Three RNNs encode the stack, the buffer, and the previous actions. A network takes the three encodings and chooses the next action: SHIFT, REDUCE, or CREATE(X).

Page 19

RNN Grammar – Training

deep learning is very powerful

Ground truth: CREATE(S), CREATE(NP), SHIFT (deep), SHIFT (learning), REDUCE, …

At every step, the network on top of the RNNs is trained to output the ground-truth action, so:
• it is a typical classification task
• RL is not needed
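The ground-truth actions can be read off a gold tree. The sketch below uses a hypothetical `oracle` function and represents trees as nested tuples (label, child, ...) with words as strings. Each step then becomes one supervised example: given what has happened so far, predict the next action. Only the action prefix is kept here. Because the stack and buffer contents follow deterministically from that prefix and the sentence, they could be reconstructed from it.

```python
from typing import List, Tuple, Union

Tree = Union[str, tuple]  # a word, or (label, child, child, ...)


def oracle(tree: Tree) -> List[str]:
    """Ground-truth actions that build `tree` top-down, left to right."""
    if isinstance(tree, str):
        return ["SHIFT"]                      # a word is moved from the buffer
    label, *children = tree
    actions = [f"CREATE({label})"]
    for child in children:
        actions += oracle(child)
    return actions + ["REDUCE"]


gold = ("S", ("NP", "deep", "learning"), ("VP", "is", ("ADJP", "very", "powerful")))
gold_actions = oracle(gold)

# One classification example per step: (actions taken so far, correct next action).
examples: List[Tuple[List[str], str]] = [
    (gold_actions[:t], gold_actions[t]) for t in range(len(gold_actions))
]
```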

Page 20

Source of image: https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf

[Vinyals, et al., NIPS’15]

Page 21

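Tree to Sequence

The tree of deep learning is very powerful is traversed depth-first and written out as a sequence of 13 symbols (numbered 1 to 13 on the slide):

(S (NP deep learning ) (VP is (ADJP very powerful ) ) )

Of course, you can try other tree traversal approaches [Liu, et al., TACL’17].

Parsing then becomes a seq2seq problem: the input is the sentence and the output is this sequence. Seq2seq!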
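A minimal sketch of the depth-first linearization, using a hypothetical `linearize` function on nested-tuple trees. It emits "(X" when entering a node, the word at a leaf, and ")" when leaving a node. The exact token format used by [Vinyals, et al., NIPS’15] may differ; this only reproduces the sequence shown on the slide.

```python
from typing import List, Union

Tree = Union[str, tuple]  # a word, or (label, child, child, ...)


def linearize(tree: Tree) -> List[str]:
    """Depth-first (pre-order) traversal: '(X' on entering a node, ')' on leaving it."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    out = ["(" + label]
    for child in children:
        out += linearize(child)
    return out + [")"]


tree = ("S", ("NP", "deep", "learning"), ("VP", "is", ("ADJP", "very", "powerful")))
target = linearize(tree)
```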

Page 22

Seq2seq vs. RNN grammar

deep learning is very powerful

• Seq2seq [Vinyals, et al., NIPS’15]: (S (NP deep learning ) (VP is (ADJP very powerful ) ) )
• RNN grammar [Dyer, et al., NAACL’16]: CREATE(S) CREATE(NP) SHIFT SHIFT REDUCE CREATE(VP) SHIFT CREATE(ADJP) SHIFT SHIFT REDUCE REDUCE REDUCE

The two output sequences correspond symbol by symbol: "(X" ↔ CREATE(X), each word ↔ SHIFT, ")" ↔ REDUCE.
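The symbol-by-symbol correspondence can be checked with two small conversion functions. This is a minimal sketch; the function names are hypothetical, and it assumes no word token starts with "(". Going from actions back to brackets needs the sentence, because SHIFT does not say which word it moves.

```python
from typing import List


def brackets_to_actions(tokens: List[str]) -> List[str]:
    """Map a linearized tree to RNN-grammar actions, one action per token."""
    actions = []
    for tok in tokens:
        if tok.startswith("(") and len(tok) > 1:
            actions.append(f"CREATE({tok[1:]})")   # "(X"  -> CREATE(X)
        elif tok == ")":
            actions.append("REDUCE")               # ")"   -> REDUCE
        else:
            actions.append("SHIFT")                # word  -> SHIFT
    return actions


def actions_to_brackets(actions: List[str], words: List[str]) -> List[str]:
    """Inverse mapping; each SHIFT takes the next word of the sentence."""
    it = iter(words)
    out = []
    for act in actions:
        if act == "SHIFT":
            out.append(next(it))
        elif act == "REDUCE":
            out.append(")")
        else:
            out.append("(" + act[len("CREATE("):-1])
    return out


seq = "(S (NP deep learning ) (VP is (ADJP very powerful ) ) )".split()
acts = brackets_to_actions(seq)
```

The practical difference is that a seq2seq model must generate the words itself, while the RNN grammar's SHIFT takes them from the buffer.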

Page 23

Unsupervised Parsing?

deep learning is very powerful

Can we find parse trees without labeled data?

YES!

Reference: https://youtu.be/YIuBHB9Ejok

Page 24

Reference

• [Vinyals, et al., NIPS’15] Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton, Grammar as a foreign language, NIPS, 2015

• [Dyer, et al., NAACL’16] Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith, Recurrent Neural Network Grammars, NAACL, 2016

• [Stern, et al., ACL’17] Mitchell Stern, Jacob Andreas, Dan Klein, A Minimal Span-Based Neural Constituency Parser, ACL, 2017

• [Liu, et al., TACL’17] Jiangming Liu, Yue Zhang, In-Order Transition-based Constituent Parsing, TACL, 2017