Character-based Neural Networks for Sentence Pair Modeling
Wuwei Lan and Wei Xu




Language Modeling Objective (Multitask)

The forward LSTM predicts the next word and the backward LSTM predicts the previous word. The log-likelihood losses of both language models are added to the training objective:
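A sketch of this combined objective in LaTeX, assuming a weighting factor \lambda on the LM terms (the exact weighting is an assumption):

    \mathcal{L} = \mathcal{L}_{\text{task}}
      - \lambda \sum_{t} \Big( \log p\big(w_{t+1} \mid \overrightarrow{h}_{t}\big)
      + \log p\big(w_{t-1} \mid \overleftarrow{h}_{t}\big) \Big)

where \overrightarrow{h}_{t} and \overleftarrow{h}_{t} are the forward and backward LSTM hidden states at position t.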

Embedding Subwords in PWIM

1. Char RNN [Ling et al. 2015]

2. Char CNN [Kim et al. 2016]
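A minimal PyTorch sketch of option 1, a C2W-style char RNN word embedder. The class name and dimensions are illustrative assumptions, not from the paper; option 2 is sketched after the char-CNN figure further below.

    import torch
    import torch.nn as nn

    class CharRNNWordEmbedder(nn.Module):
        def __init__(self, n_chars, char_dim=50, word_dim=200):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.bilstm = nn.LSTM(char_dim, word_dim, bidirectional=True,
                                  batch_first=True)
            # learned linear combination of the final forward/backward
            # states (cf. the "a*h1 + b*h5 + c" in the figure below)
            self.combine = nn.Linear(2 * word_dim, word_dim)

        def forward(self, char_ids):  # (batch, word_len) character ids
            h, _ = self.bilstm(self.char_emb(char_ids))  # (batch, word_len, 2*word_dim)
            h_fwd = h[:, -1, : h.size(-1) // 2]   # final forward state
            h_bwd = h[:, 0, h.size(-1) // 2 :]    # final backward state
            return self.combine(torch.cat([h_fwd, h_bwd], dim=-1))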


Introduction

Sentence pair modeling is critical for paraphrase identification, question answering, natural language inference, and other tasks. Various neural models have achieved state-of-the-art performance using pretrained word embeddings; however, such embeddings have poor coverage in domains with high OOV ratios (e.g., social media). We explored character-based neural networks for sentence pair modeling, which is more challenging than modeling individual sentences: similarly spelled words with completely different meanings (e.g., ware and war) can introduce errors.

Pairwise Word Interaction Model (PWIM) [He et al. 2016]

1. Context modeling: a Bi-LSTM encodes each sentence, producing a contextual hidden state for every word.

2. Pairwise word interaction: similarity scores (cosine, L2 distance, dot product) are computed between the hidden states of every word pair across the two sentences, forming an interaction cube (see the sketch after this list).

3. Similarity focus: the interaction values are sorted and the top-ranked word pairs are given increased weight.

4. Aggregation and prediction: a 19-layer deep ConvNet aggregates the focused interactions and predicts the label.
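A minimal PyTorch sketch of steps 2 and 3, assuming h1 and h2 are the Bi-LSTM hidden states of the two sentences. The full model computes these similarities over several views of the states (forward, backward, concatenated), which this sketch omits; the down-weight 0.1 for non-focused pairs follows He et al. 2016.

    import torch
    import torch.nn.functional as F

    def interaction_cube(h1, h2):
        # step 2: cosine, L2 distance and dot product between every
        # pair of hidden states across the two sentences
        cos = F.cosine_similarity(h1.unsqueeze(1), h2.unsqueeze(0), dim=-1)
        l2 = torch.cdist(h1.unsqueeze(0), h2.unsqueeze(0)).squeeze(0)
        dot = h1 @ h2.t()
        return torch.stack([cos, l2, dot])  # (3, len1, len2)

    def similarity_focus(cube, tau=0.1):
        # step 3 (simplified): greedily give weight 1 to the best-matching
        # word pairs, at most one per row and column; down-weight the rest
        cos = cube[0]
        mask = torch.full_like(cos, tau)
        used_i, used_j = set(), set()
        for idx in cos.flatten().argsort(descending=True).tolist():
            i, j = divmod(idx, cos.size(1))
            if i not in used_i and j not in used_j:
                mask[i, j] = 1.0
                used_i.add(i)
                used_j.add(j)
        return cube * mask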

Example for Sentence Pair Modeling

Paraphrase task: given a sentence pair, predict whether the two sentences convey the same meaning. Samples are drawn from the Twitter URL corpus [Lan et al. 2017].

[Figure: model architecture on the toy pair "There Sit Cats" / "On the Mat". The characters of each word (e.g., "T h e r e") are embedded and encoded by an RNN/CNN; the resulting hidden states h1...h5 are combined into a word vector (a*h1 + b*h5 + c), which feeds into the Bi-LSTM and PWIM, followed by max pooling and a linear layer.]

Conclusion

✓ Pretrained word embeddings are not a necessity for sentence pair modeling.
✓ Subword models without any pretraining achieved new SOTA results on Twitter URL and PIT-2015.
✓ Multitask LM can improve subword model performance by injecting semantic information.

Experiments

We performed experiments on three benchmark datasets for paraphrase identification: Twitter URL (social media/news, 31.5% OOV), PIT-2015 (social media, 13.7% OOV), and MSRP (news, 9.0% OOV).

Performance (F1):

Model                                 Twitter-URL  PIT-2015  MSRP
Word w/o pretraining                  0.735        0.625     0.834
Word w/ GloVe (previous best system)  0.756        0.656     0.834
Char RNN                              0.729        0.576     0.824
Char RNN + LM (our best system)       0.760        0.691     0.831
Char CNN                              0.753        0.667     0.818
Char CNN + LM (our best system)       0.754        0.667     0.840

[Figure: char CNN subword embedding on "There Sit Cats" / "On the Mat": character embedding concatenation, convolution with multiple filters, a highway network, and a Bi-LSTM over the resulting word vectors.]
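A minimal PyTorch sketch of the char-CNN word embedder shown in the figure (option 2 above). Filter widths and dimensions are illustrative assumptions, and inputs are assumed padded to at least the widest filter.

    import torch
    import torch.nn as nn

    class CharCNNWordEmbedder(nn.Module):
        def __init__(self, n_chars, char_dim=16, n_filters=50,
                     widths=(2, 3, 4, 5)):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.convs = nn.ModuleList(
                nn.Conv1d(char_dim, n_filters, w) for w in widths)
            d = n_filters * len(widths)
            self.gate = nn.Linear(d, d)  # highway transform gate
            self.proj = nn.Linear(d, d)  # highway nonlinear transform

        def forward(self, char_ids):  # (batch, word_len), word_len >= max(widths)
            x = self.char_emb(char_ids).transpose(1, 2)  # (batch, char_dim, word_len)
            # convolution with multiple filter widths, max pooling over time
            pooled = torch.cat(
                [conv(x).max(dim=2).values for conv in self.convs], dim=1)
            t = torch.sigmoid(self.gate(pooled))  # highway network
            return t * torch.relu(self.proj(pooled)) + (1 - t) * pooled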

Nearest neighbors of word (subword, subword + LM) vectors under cosine similarity in the Twitter-URL dataset:

Model         any (INV)                walking (INV)                #airport (OOV)                        brexit (OOV)
Word          anything, anyone, other  walk, running, dead          salomon, 363, #trumpdchotel           bollocks, missynistic, patriarchy
Subword       analogy, nay, away       waffling, slagging, scaling  @atlairport, #dojreport, #macbookpro  grexit, bret, juliet
Subword + LM  any1, many, ang          warming, wagging, waging     airport, #airports, rapport           #brexit, brit, ofbrexit
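A minimal sketch of such a cosine nearest-neighbor query, assuming vecs is a (vocab, dim) tensor of word vectors with lookup tables w2i and i2w (all names illustrative):

    import torch
    import torch.nn.functional as F

    def neighbors(word, vecs, w2i, i2w, k=3):
        q = F.normalize(vecs[w2i[word]], dim=0)
        sims = F.normalize(vecs, dim=1) @ q      # cosine similarity to all words
        top = sims.topk(k + 1).indices.tolist()  # k+1: the word matches itself
        return [i2w[i] for i in top if i != w2i[word]][:k]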

Source Code

https://github.com/lanwuwei/Subword-PWIM

See Also

• Wuwei Lan, Siyu Qiu, Hua He, and Wei Xu. A Continuously Growing Dataset of Sentential Paraphrases. In EMNLP 2017.
• Wuwei Lan and Wei Xu. Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering. In COLING 2018.

