
Transcript of AINL 2016: Maraev

Page 1: AINL 2016: Maraev

Character-level Convolutional Neural Network for Sentence Paraphrase Detection

Vladislav Maraev, NLX-Group, Faculty of Sciences, University of Lisbon

Paraphrase detection for Russian workshop, AINL FRUCT 2016

Page 2

Objective

What: Task 2, binary classification (paraphrase / non-paraphrase).

How: Apply a convolutional neural network (CNN) architecture:

                       Standard   Non-standard
Word embeddings           ✓            ✓
Character embeddings      ✓

Vladislav Maraev AINL FRUCT 2016 Paraphrase detection for Russian workshop (10.11.2016) 2 / 16

Page 3

Related work

Convolutional neural networks in NLP

• Detecting semantically equivalent questions with CNN and word embeddings (Bogdanova et al., 2015)

• Convolutional neural networks for sentence classification (Zhang and Wallace, 2015)

• Attention-based CNN for modeling sentence pairs (Yin et al., 2016)

• Character embeddings for text classification (Zhang et al., 2015)

• Word+character embeddings for sentiment analysis (dos Santos and Gatti, 2014)


Page 4

How does the CNN work?

TR → CONV → POOL → cosine similarity

Steps:

1. Token representation (Embedding)

2. Convolution

3. Pooling

4. Pair similarity estimation
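The four steps can be composed end to end. A minimal NumPy sketch of the pipeline (all dimensions, weights and token ids are illustrative, not the values used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k, clu = 50, 8, 3, 16           # vocab size, embedding dim, k-gram size, conv units

W0 = rng.normal(size=(d, V))          # embedding matrix (step 1)
W1 = rng.normal(size=(clu, d * k))    # convolution weights (step 2)
b1 = np.zeros(clu)

def sentence_repr(token_ids):
    tr = W0[:, token_ids]                           # 1. token representation, d x N
    N = tr.shape[1]
    z = np.stack([tr[:, n:n + k].T.reshape(-1)      # 2. k-gram concatenations
                  for n in range(N - k + 1)])
    r_z = np.tanh(z @ W1.T + b1)                    #    convolution
    return np.tanh(r_z.max(axis=0))                 # 3. max pooling -> fixed-size vector

def cosine(a, b):                                   # 4. pair similarity
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_repr(np.array([3, 7, 7, 1, 9]))   # 5 tokens
s2 = sentence_repr(np.array([3, 7, 1, 9]))      # 4 tokens: same output size
print(s1.shape, s2.shape, round(cosine(s1, s2), 3))
```

Note that the two sentences have different lengths but yield representations of the same size, which is what makes the cosine comparison in step 4 possible.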


Page 5

Convolutional Neural Network: 1. Token representation

Input

s = {t_1, t_2, ..., t_N}

Token representation

r_t = W_0 v_t ,    (1)

where

• W_0 ∈ R^{d×V} is an embedding matrix

• v_t ∈ R^V is a one-hot encoded vector of size V

Output

s_TR = {r_{t_1}, r_{t_2}, ..., r_{t_N}}, where r_{t_n} ∈ R^d
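Equation (1) is just an embedding lookup: multiplying W_0 by a one-hot vector selects one column. A small NumPy sketch (all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 10, 4                        # vocabulary size, embedding dimension
W0 = rng.normal(size=(d, V))        # embedding matrix

def one_hot(t, V):
    v = np.zeros(V)
    v[t] = 1.0
    return v

r_t = W0 @ one_hot(7, V)            # Eq. (1): r_t = W0 v_t
assert np.allclose(r_t, W0[:, 7])   # equivalent to a plain column lookup

sentence = [2, 7, 5]                # token indices t_1..t_N
s_TR = W0[:, sentence]              # d x N matrix of token representations
print(s_TR.shape)
```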


Page 6

Convolutional Neural Network: 2. Convolution

Convolution

1. Form concatenations z_n of k-grams (k consecutive token vectors)

2. Multiply by W_1, add bias b_1, and apply the tanh function:

r_{z_n} = tanh(W_1 z_n + b_1),

where:

z_n ∈ R^{dk}

W_1 ∈ R^{clu×dk}

r_{z_n} ∈ R^{clu}
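A NumPy sketch of this step, with illustrative sizes: each window of k consecutive token vectors is flattened into z_n ∈ R^{dk}, then mapped through W_1, b_1 and tanh:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, clu, N = 4, 2, 6, 5                 # illustrative sizes
s_TR = rng.normal(size=(d, N))            # token representations from step 1
W1 = rng.normal(size=(clu, d * k))
b1 = np.zeros(clu)

# z_n: concatenation of k consecutive token vectors, so z_n has length d*k
z = np.stack([s_TR[:, n:n + k].T.reshape(-1) for n in range(N - k + 1)])
r_z = np.tanh(z @ W1.T + b1)              # r_{z_n} = tanh(W1 z_n + b1), length clu
print(z.shape, r_z.shape)                 # one row per window, N-k+1 windows
```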


Page 7

Convolutional Neural Network: 3. Pooling

Sum (or max) over all the r_{z_n} element-wise and apply the tanh function:

r_s = tanh( Σ_n r_{z_n} ),

which gives us the sentence representation r_s ∈ R^{clu}.

* This means that the sentence representation does not depend on sentence length.
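Because the pooling runs over the window axis, the result has size clu regardless of how many windows the sentence produced. A sketch, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
clu = 6

def pool(r_z, mode="max"):
    # element-wise max (or sum) over all window vectors, then tanh
    agg = r_z.max(axis=0) if mode == "max" else r_z.sum(axis=0)
    return np.tanh(agg)

short = rng.normal(size=(3, clu))     # 3 convolution windows (short sentence)
long_ = rng.normal(size=(40, clu))    # 40 windows (long sentence)
print(pool(short).shape, pool(long_).shape)   # same fixed size either way
```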


Page 8

Convolutional Neural Network: 4. Compute similarity

TR → CONV → POOL → cosine similarity

Estimate the similarity between a pair of sentence representations using the cosine measure:

similarity = (r_{s_1} · r_{s_2}) / (‖r_{s_1}‖ ‖r_{s_2}‖)
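The cosine measure in NumPy; the toy vectors below are illustrative:

```python
import numpy as np

def cosine(a, b):
    # similarity = (a · b) / (‖a‖ ‖b‖)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

r1 = np.array([1.0, 0.0, 1.0])
r2 = np.array([2.0, 0.0, 2.0])    # same direction, different magnitude
r3 = np.array([-1.0, 0.0, -1.0])  # opposite direction
print(cosine(r1, r2), cosine(r1, r3))
```

Cosine only measures direction, so r1 and r2 score maximally similar despite their different magnitudes.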


Page 9

Training the network

We train W0, W1 and b1.

Steps

1. Compute mean-squared error (w.r.t. cosine similarity)

2. Use the backpropagation algorithm (SGD/RMSProp) to compute gradients of the network
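The real network backpropagates through all the layers above; as an illustration of the MSE-plus-gradient-descent loop only, here is a one-parameter toy problem (the function, data and learning rate are invented for the example, and plain SGD stands in for RMSProp):

```python
# Toy illustration: a single weight w trained with MSE and a hand-derived gradient.
def predict(w, x):
    return w * x                          # stand-in for the network's similarity output

w, lr = 0.1, 0.5                          # invented initial weight and learning rate
x, y = 0.8, 1.0                           # one (feature, paraphrase-label) pair
for _ in range(50):
    grad = 2 * (predict(w, x) - y) * x    # d/dw of the squared error (w*x - y)^2
    w -= lr * grad                        # plain SGD step (RMSProp would rescale it)
print(round(predict(w, x), 3))            # converges towards the label 1.0
```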


Page 10

Several convolutional filters
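With several filter widths k, each width gets its own W_1 and produces its own pooled vector; concatenating them yields the final sentence representation. A sketch assuming this concatenation scheme (filter widths borrowed from the standard-run settings, other sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
d, clu, ks = 8, 5, [3, 5, 8]              # embedding dim, units per filter, filter widths
s_TR = rng.normal(size=(d, 20))           # token representations for a 20-token sentence

def conv_pool(s_TR, k, W1, b1):
    # one filter width: convolve over k-grams, then max-pool + tanh
    N = s_TR.shape[1]
    z = np.stack([s_TR[:, n:n + k].T.reshape(-1) for n in range(N - k + 1)])
    return np.tanh(np.tanh(z @ W1.T + b1).max(axis=0))

parts = [conv_pool(s_TR, k, rng.normal(size=(clu, d * k)), np.zeros(clu))
         for k in ks]
r_s = np.concatenate(parts)               # one pooled vector per filter width
print(r_s.shape)                          # len(ks) * clu entries
```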


Page 11

Standard Run Hyperparameters: Word embeddings

Parameter        Value             Description
k                {3, 5, 8, 12}     Sizes of k-grams
clu              100               Size of each convolutional filter
d                300               Size of word representation
epochs           5                 Number of training epochs
pooling          MAX               Pooling layer function
optimiser        RMSProp           Keras's optimiser
word embeddings  Random (uniform)

Sentences were tokenised and lowercased using Keras.


Page 12

Standard Run Hyperparameters: Character embeddings

Parameter         Value                Description
k                 {2, 3, 5, 7, 9, 11}  Sizes of k-grams
clu               100                  Size of each convolutional filter
d                 100                  Size of character representation
epochs            20                   Number of training epochs
pooling           MAX                  Pooling layer function
optimiser         RMSProp              Keras's optimiser
char. embeddings  Random (uniform)

Characters were lowercased; non-word characters were removed.
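A hypothetical helper for this preprocessing; the regex used and whether whitespace is kept between words are assumptions of the sketch:

```python
import re

def preprocess_chars(sentence):
    # hypothetical helper: lowercase, drop non-word characters,
    # return the character sequence fed to the embedding layer
    cleaned = re.sub(r"[^\w\s]", "", sentence.lower())
    return list(cleaned)

print(preprocess_chars("Привет, Мир!"))   # punctuation dropped, case folded
```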


Page 13

Non-Standard Run Hyperparameters

Parameter        Value     Description
k                3         Size of k-gram
clu              300       Size of convolutional filter
d                300       Size of word representation
epochs           5         Number of training epochs
pooling          MAX       Pooling layer function
optimiser        RMSProp   Keras's optimiser
word embeddings  RusVectores trained on the Russian National Corpus (Kutuzov and Andreev, 2015)

Input sentences were tokenised, lemmatised and PoS-tagged with MyStem (Segalovich, 2003).


Page 14

Main results

                                 Accuracy   F1
Standard       NLX (characters)  72.74      78.80
Standard       NLX (words)       66.19      76.44
Non-standard   NLX (words)       69.94      76.80
               BASELINE          49.66      54.03


Page 15

Discussion

1. The result for the standard run is competitive with the best system and could be further improved by tuning hyperparameters automatically, and by picking the test epoch automatically based on validation results.

2. Surprisingly, the standard run outperformed the non-standard one, even though the non-standard run used external resources for lemmatisation and for the initial word embeddings. (Probably due to a higher focus on the standard run.)

Next? Attention-based CNN (Yin et al., 2016); combining character and word embeddings (dos Santos and Gatti, 2014).


Page 16

References

Dasha Bogdanova, Cícero dos Santos, Luciano Barbosa, and Bianca Zadrozny. Detecting semantically equivalent questions in online user forums. In Proceedings of CoNLL 2015, page 123, 2015.

Cícero dos Santos and Maíra Gatti. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 69–78, Dublin, Ireland, August 2014. Dublin City University and Association for Computational Linguistics.

Andrey Kutuzov and Igor Andreev. Texts in, meaning out: neural language models in semantic similarity task for Russian. In Proceedings of the Dialog Conference, 2015.

Ilya Segalovich. A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine. In MLMTA, pages 273–280. Citeseer, 2003.

Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Transactions of the Association for Computational Linguistics, 4:259–272, 2016. ISSN 2307-387X.

Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pages 649–657, 2015.

Ye Zhang and Byron Wallace. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820, 2015.
