Neural Models for Sequence Chunking
cis.csuohio.edu/~sschung/CIS601/NN Sequence Chunking.pdf



Page 1: Neural Models for Sequence Chunking (cis.csuohio.edu/~sschung/CIS601/NN Sequence Chunking.pdf, 2017. 11. 27.)

Neural Models for Sequence Chunking

Published by Feifei Zhai et al.

(IBM Watson)

Presented by Sagar Dahiwala

CIS 601

Page 2:

Agenda

1. Natural language understanding

2. Problems in current systems

3. Basic neural networks (RNN, LSTM)

4. Implemented Models 1, 2, and 3

5. Experiments

6. Conclusion

Page 3:

1. Natural language understanding (NLU)

• NLU tasks include:

1. Shallow parsing

• Analysis of a sentence that first identifies its constituent parts of speech (nouns, verbs, adjectives, etc.)

• Then links them to higher-order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.)

2. Semantic slot filling

• Requires assigning representative labels to the meaningful chunks in a sentence.

Page 4:

2. Problems in current systems

• Most current deep neural network (DNN)-based methods treat this task as a sequence labeling problem.

• In sequence labeling, the word is treated as the basic unit of labeling, rather than the chunk.

Page 5:

IOB-based (Inside-Outside-Beginning) sequence labeling

• B – beginning of a chunk

• I – inside of a chunk; subsequent words within the same semantic chunk

• O – outside any chunk (an artificial class)

• NP – noun phrase

• VP – verb phrase

Sentence: “But it could be much worse”
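Decoding IOB tags back into labeled chunks can be sketched in a few lines; the tags below follow the deck's running example sentence (the exact chunk labels are illustrative):

```python
# Decode IOB tags into (chunk_text, label) spans.
def iob_to_chunks(words, tags):
    chunks, cur_words, cur_label = [], [], None
    for word, tag in zip(words, tags):
        if tag == "O" or tag.startswith("B-"):
            if cur_words:  # close the chunk in progress
                chunks.append((" ".join(cur_words), cur_label))
                cur_words, cur_label = [], None
            if tag.startswith("B-"):  # open a new chunk
                cur_words, cur_label = [word], tag[2:]
        else:  # an "I-" tag continues the current chunk
            cur_words.append(word)
    if cur_words:
        chunks.append((" ".join(cur_words), cur_label))
    return chunks

words = "But it could be much worse".split()
tags = ["O", "B-NP", "B-VP", "I-VP", "B-ADJP", "I-ADJP"]
print(iob_to_chunks(words, tags))
# [('it', 'NP'), ('could be', 'VP'), ('much worse', 'ADJP')]
```

Note that the chunk boundaries are only implicit in the per-word tags, which is exactly the drawback the paper's chunk-level models address.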

Page 6:

3. Basic neural networks

1. RNN – Recurrent Neural Network

2. LSTM – Long Short-term memory

Page 7:

3.1 RNN – Recurrent Neural Network

Page 8:

3.1 RNN – Recurrent Neural Network

Page 9:

3.1 RNN – Recurrent Neural Network

Page 10:

3.1 RNN – Recurrent Neural Network

Page 11:

3.1 RNN – Recurrent Neural Network
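The RNN slides above are figures; as a minimal sketch, the vanilla RNN recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) unrolled over a toy sequence looks like this (the weights here are random placeholders, not trained parameters):

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b)

# Unroll over a toy sequence of 3 input vectors.
rng = np.random.default_rng(0)
D, H = 4, 3                                   # input and hidden sizes
W_xh = rng.normal(size=(H, D))
W_hh = rng.normal(size=(H, H))
b = np.zeros(H)

h = np.zeros(H)                               # initial hidden state
for x in rng.normal(size=(3, D)):             # feed x_1, x_2, x_3
    h = rnn_step(x, h, W_xh, W_hh, b)
print(h.shape)  # (3,)
```

The key point for chunking is that each h_t summarizes the sequence read so far, which is what the later models build on.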

Page 12:

3.2 LSTM – Long Short-term memory

• Element wise addition (+)

• Element wise multiplication (X)
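A minimal sketch of one LSTM step, showing where the element-wise addition (+) and multiplication (X) enter the cell update; the stacked weight layout (W, U, b holding all four gates) is an implementation choice for brevity, not the paper's notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the input/forget/output/candidate weights."""
    z = W @ x + U @ h_prev + b          # all four pre-activations at once
    H = h_prev.size
    i = sigmoid(z[:H])                  # input gate
    f = sigmoid(z[H:2*H])               # forget gate
    o = sigmoid(z[2*H:3*H])             # output gate
    g = np.tanh(z[3*H:])                # candidate cell state
    c = f * c_prev + i * g              # element-wise multiply (X) and add (+)
    h = o * np.tanh(c)                  # element-wise multiply (X)
    return h, c
```

The gating is what lets an LSTM carry context over long distances, which is why the paper's chunk models are built on (Bi-)LSTMs rather than vanilla RNNs.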

Page 13:

The IOB schema for the labeling problem has two drawbacks:

• There is no explicit model to learn and identify the scope of chunks in a sentence; instead, the scope is inferred implicitly.

• Some neural networks (NN), like RNN and LSTM, have the ability to encode context information, but they don't treat each chunk as a complete unit.

Page 14:

The natural solution to overcome the above two drawbacks is sequence chunking, with two subtasks:

• Segmentation – identify the scope of the chunks explicitly

• Labeling – label each chunk as a single unit, based on the segmentation results

This mirrors how humans remember things:

• Phone numbers are not typically seen or remembered as a long string of digits like 8605554589, but rather as 860-555-4589.

• Birthdates are typically not recalled as 11261995, but rather as 11/26/1995.

Page 15:

4. Model 1

• Average(.) computes the average of the input vectors.

• A softmax layer is used for labeling.

• In Figure 2 above, “much worse” is identified as a chunk with length 2;

• its hidden states are plugged into the formula to finally get the “ADJP” label.
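The average-then-softmax labeling step can be sketched as follows; `W_label` and the label set are hypothetical placeholders standing in for the model's trained softmax layer:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def label_chunk(hidden_states, W_label, labels):
    """Average the chunk's hidden states (Average(.)), then classify with softmax."""
    avg = np.mean(hidden_states, axis=0)   # one vector for the whole chunk
    probs = softmax(W_label @ avg)
    return labels[int(np.argmax(probs))]
```

For a chunk like “much worse”, `hidden_states` would hold the two Bi-LSTM states of its words, and the argmax over the softmax output would ideally pick “ADJP”.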

Page 16:

Bi-LSTM

• Given an input sentence x = (x₁, x₂, …, x_T):

• The forward LSTM reads the input sentence from x₁ to x_T and generates the forward hidden states (→h₁, →h₂, …, →h_T).

• The backward LSTM reads the input sentence from x_T to x₁ and generates the backward hidden states (←h₁, ←h₂, …, ←h_T).

• Then, for each timestep t, the Bi-LSTM hidden state is generated by concatenating →h_t and ←h_t: h_t = [→h_t ; ←h_t].
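The forward/backward reading and concatenation can be sketched generically; here `step` is a hypothetical placeholder standing in for a real LSTM cell:

```python
import numpy as np

def bilstm_states(xs, step, h0):
    """Run `step` forward and backward over xs, then concatenate per timestep."""
    fwd, h = [], h0
    for x in xs:                      # read x_1 .. x_T
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in reversed(xs):            # read x_T .. x_1
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                     # realign backward states with timesteps
    # h_t = [forward h_t ; backward h_t]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

Each concatenated state thus sees both the left and the right context of its word, which is what makes Bi-LSTM states useful for both segmentation and labeling.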

Page 17:

Drawbacks of model 1

• It may not perform well on both the segmentation and labeling subtasks at the same time.

Page 18:

4. Model 2

• Follows the encoder-decoder framework.

• Similar to Model 1, a Bi-LSTM is employed for segmentation with IOB labels.

• This Bi-LSTM serves as the encoder and creates a sentence representation [→h_T ; ←h₁], which is used to initialize the decoder LSTM.

• The decoder uses chunks as inputs instead of words.

• For example, “much worse” is a chunk in Figure 3, and it is taken as a single input to the decoder.

Page 19:

4. Model 2

Here g(.) is a CNNMax layer: a convolution over the chunk's word embeddings followed by max pooling.

Cwj is the concatenation of the context word embeddings; the difference from Model 1 is the decoder input {Cxj, Chj, Cwj}.

The generated hidden states are finally used for labeling by a softmax layer.
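A possible sketch of a CNNMax chunk representation, assuming a convolution kernel of width 2 (the paper's exact filter sizes and padding may differ):

```python
import numpy as np

def cnn_max(word_embs, W_conv, b):
    """Width-2 convolution over a chunk's word embeddings, then max pooling.

    word_embs: (n_words, emb_dim); W_conv: (out_dim, 2 * emb_dim); b: (out_dim,)
    """
    n = word_embs.shape[0]
    if n == 1:  # pad a single-word chunk so at least one window exists
        word_embs = np.vstack([word_embs, np.zeros_like(word_embs)])
        n = 2
    # slide a width-2 window over the chunk
    windows = [np.concatenate(word_embs[i:i + 2]) for i in range(n - 1)]
    feats = np.tanh(np.stack(windows) @ W_conv.T + b)   # (n-1, out_dim)
    return feats.max(axis=0)                             # max pool -> (out_dim,)
```

The max pooling gives a fixed-size vector for a variable-length chunk, which is what lets the decoder consume whole chunks as single inputs.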

Page 20:

Drawbacks of using IOB labels for segmentation

• It is hard to use chunk-level features for segmentation, such as the length of chunks.

• IOB labels cannot compare different chunks directly.

Page 21:

4. Model 3

• Model 3 is similar to Model 2; the only difference is the method of identifying chunks.

• Model 3 is a greedy process of segmentation and labeling: first identify one chunk, then label it.

• Repeat the process until all words are processed. Since all chunks are adjacent to each other, after one chunk is identified the beginning point of the next one is also known, and only its ending point remains to be determined.
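The greedy segment-then-label loop can be sketched as follows; `pick_end` and `label_chunk` are hypothetical stand-ins for the model's pointer and softmax components:

```python
def greedy_chunk(words, pick_end, label_chunk, max_len=3):
    """Greedily segment and label: identify one chunk, label it, repeat."""
    begin, out = 0, []
    while begin < len(words):
        # the beginning point is known; only the ending point must be chosen
        end = pick_end(words, begin, max_len)
        out.append((words[begin:end], label_chunk(words[begin:end])))
        begin = end                      # next chunk starts where this one ended
    return out
```

Because chunks are adjacent and exhaustive, the loop always terminates once `pick_end` returns an ending point strictly past `begin`.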

Page 22:

4. Model 3

Page 23:

4. Model 3

Here, a pointer network is implemented to choose the ending point, where j is the decoder timestep (chunk index).

The probability of choosing ending-point candidate i is a softmax over the candidates' attention scores u_i^j (in the standard pointer-network form): p(i | j) = exp(u_i^j) / Σ_k exp(u_k^j).
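The pointing distribution can be sketched in the standard pointer-network form, scoring each ending-point candidate against the decoder state (the paper's exact parameterization may differ slightly):

```python
import numpy as np

def point_probs(cand_states, dec_state, W1, W2, v):
    """Pointer-network softmax over ending-point candidates at decoder step j."""
    # u_i^j = v . tanh(W1 h_i + W2 d_j) for each candidate hidden state h_i
    scores = np.array([v @ np.tanh(W1 @ h + W2 @ dec_state)
                       for h in cand_states])
    e = np.exp(scores - scores.max())   # stable softmax
    return e / e.sum()                   # p(i | j)
```

Unlike a fixed output vocabulary, the softmax here ranges over positions in the input, so the model can "point" at whichever candidate word should end the current chunk.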

Page 24:

5. Experiments

• Text chunking results

• Comparison with published reports

Page 25:

5. Experiments

• Slot filling results

• Segmentation results

• Labeling results

• Comparison with published reports

Page 26:

Segmentation Results

Page 27:

Labeling Results

Page 28:

Comparison with published reports

Page 29:

References (author / title – context – link)

• Lample et al. (2016), Neural Architectures for Named Entity Recognition – Stack-LSTM and transition-based algorithm – https://arxiv.org/pdf/1603.01360.pdf

• Dyer et al. – Stack-LSTM – http://www.cs.cmu.edu/~lingwang/papers/acl2015.pdf

• Wikipedia – Softmax layer – https://en.wikipedia.org/wiki/Softmax_function

• Bahdanau, Cho, and Bengio (2014), On the Properties of Neural Machine Translation: Encoder–Decoder Approaches – encoder-decoder framework – https://arxiv.org/pdf/1409.1259.pdf

• CS231n, Convolutional Neural Networks (CNNs / ConvNets) – CNN – http://cs231n.github.io/convolutional-networks/

• Nallapati et al. (2016), Abstractive Text Summarization using Sequence-to-Sequence RNNs and Beyond – encoder-decoder-pointer framework – https://arxiv.org/pdf/1602.06023.pdf

• Vinyals, Fortunato, and Jaitly (2015), Pointer Networks – pointer network – https://arxiv.org/pdf/1506.03134.pdf

• Brandon Rohrer, Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) – RNN-LSTM – https://www.youtube.com/watch?v=WCUNPb-5EYI

• Spoken Language Understanding (SLU) / Slot Filling in Keras – ATIS (Airline Travel Information System) – https://github.com/chsasank/ATIS.keras

Page 30:

Thank You