AINL 2016: Maraev
Transcript of AINL 2016: Maraev
Character-level Convolutional Neural Network for Sentence Paraphrase Detection
Vladislav Maraev, NLX-Group, Faculty of Sciences, University of Lisbon
Paraphrase detection for Russian workshop, AINL FRUCT 2016
Objective
What: Task 2 — binary classification (paraphrase/non-paraphrase).
How: Apply a convolutional neural network (CNN) architecture:

                       Standard   Non-standard
Word embeddings           ✓            ✓
Character embeddings      ✓
Vladislav Maraev AINL FRUCT 2016 Paraphrase detection for Russian workshop (10.11.2016) 2 / 16
Related work
Convolutional neural networks in NLP
• Detecting semantically equivalent questions with CNN and word embeddings (Bogdanova et al., 2015)
• Convolutional neural networks for sentence classification (Zhang and Wallace, 2015)
• Attention-based CNN for modeling sentence pairs (Yin et al., 2016)
• Character embeddings for text classification (Zhang et al., 2015)
• Word + character embeddings for sentiment analysis (dos Santos and Gatti, 2014)
How does the CNN work?
Pipeline: TR → CONV → POOL → cosine similarity
Steps:
1. Token representation (Embedding)
2. Convolution
3. Pooling
4. Pair similarity estimation
Convolutional Neural Network — 1. Token representation
Input

s = {t_1, t_2, ..., t_N}

Token representation

r_t = W_0 v_t ,    (1)

where
• W_0 ∈ R^(d×V) is an embedding matrix
• v_t ∈ R^V is a one-hot encoded vector of size V

Output

s_TR = {r_{t_1}, r_{t_2}, ..., r_{t_N}} , where r_{t_n} ∈ R^d
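The lookup in Eq. (1) can be sketched in NumPy. The sizes here are toy values, not the run's hyperparameters; the point is that multiplying W_0 by a one-hot vector just selects one column of the embedding matrix:

```python
import numpy as np

d, V = 4, 10                                # toy embedding size and vocabulary size
rng = np.random.default_rng(0)
W0 = rng.uniform(-0.1, 0.1, size=(d, V))    # embedding matrix W_0

def token_repr(token_id):
    # r_t = W_0 v_t, with v_t a one-hot vector of size V
    v = np.zeros(V)
    v[token_id] = 1.0
    return W0 @ v

# Multiplying by a one-hot vector selects a column of W_0:
assert np.allclose(token_repr(3), W0[:, 3])

sentence = [2, 5, 3]                                 # token ids t_1 .. t_N
s_TR = np.stack([token_repr(t) for t in sentence])   # N x d matrix
print(s_TR.shape)  # (3, 4)
```

In practice the lookup is done by indexing (e.g. a Keras Embedding layer) rather than an explicit matrix-vector product, but the result is the same.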
Convolutional Neural Network — 2. Convolution

Convolution
1. Form the concatenation z_n of each k-gram of token representations
2. Multiply by W_1, add bias b_1, and apply the tanh function:

r_{z_n} = tanh(W_1 z_n + b_1)

where:
• z_n ∈ R^(dk)
• W_1 ∈ R^(c_lu × dk)
• r_{z_n} ∈ R^(c_lu)
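A minimal sketch of the convolution step, again with toy sizes: each window of k consecutive token vectors is flattened into z_n and passed through the affine map plus tanh:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, c_lu = 4, 2, 6                     # toy sizes: embedding dim, k-gram size, filter size
N = 5                                    # sentence length
s_TR = rng.normal(size=(N, d))           # token representations r_t, one row per token
W1 = rng.normal(size=(c_lu, d * k))      # convolution weights
b1 = np.zeros(c_lu)                      # convolution bias

# For each position n, concatenate the k-gram into z_n in R^{dk}
# and compute r_{z_n} = tanh(W_1 z_n + b_1):
conv = []
for n in range(N - k + 1):
    z_n = s_TR[n:n + k].reshape(-1)      # concatenation of k token vectors
    conv.append(np.tanh(W1 @ z_n + b1))
conv = np.stack(conv)                    # (N - k + 1) x c_lu
print(conv.shape)  # (4, 6)
```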
Convolutional Neural Network — 3. Pooling

Sum (or max) over all the r_{z_n} element-wise and apply the tanh function:

r_s = tanh( Σ_n r_{z_n} )

which gives us the sentence representation r_s ∈ R^(c_lu).

* This means that the sentence representation doesn't depend on sentence length.
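The length-independence point can be checked directly: whatever the number of k-gram windows, pooling collapses them into a vector of fixed size c_lu. A small sketch (random stand-ins for the convolution outputs):

```python
import numpy as np

rng = np.random.default_rng(2)
c_lu = 6
# Convolution outputs for two sentences of different lengths:
conv_short = rng.normal(size=(3, c_lu))    # 3 k-gram windows
conv_long = rng.normal(size=(12, c_lu))    # 12 k-gram windows

def pool(conv, mode="sum"):
    # Collapse all r_{z_n} into a fixed-size sentence vector r_s
    if mode == "sum":
        return np.tanh(conv.sum(axis=0))   # r_s = tanh(sum_n r_{z_n})
    return conv.max(axis=0)                # max-pooling variant (used in the runs)

# Both sentences map to vectors of the same size c_lu:
assert pool(conv_short).shape == pool(conv_long).shape == (c_lu,)
```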
Convolutional Neural Network — 4. Compute similarity

Pipeline: TR → CONV → POOL → cosine similarity

Estimate the similarity between the pair of sentence representations using the cosine measure:

similarity = (r_{s1} · r_{s2}) / (‖r_{s1}‖ ‖r_{s2}‖)
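The cosine measure is a one-liner; identical vectors score 1, opposite vectors −1, orthogonal vectors 0:

```python
import numpy as np

def cosine_similarity(r1, r2):
    # (r_{s1} . r_{s2}) / (||r_{s1}|| ||r_{s2}||)
    return float(r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2)))

a = np.array([1.0, 0.0, 1.0])
assert np.isclose(cosine_similarity(a, a), 1.0)    # identical -> 1
assert np.isclose(cosine_similarity(a, -a), -1.0)  # opposite -> -1
assert np.isclose(cosine_similarity(a, np.array([0.0, 1.0, 0.0])), 0.0)  # orthogonal -> 0
```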
Training the network
We train W_0, W_1 and b_1.
Steps
1. Compute the mean-squared error between the cosine similarity and the gold label
2. Use the backpropagation algorithm (SGD/RMSProp) to compute gradients of the network
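Putting the four steps together, one forward pass and the per-pair squared error can be sketched as below (toy sizes and sum pooling; an assumption of this sketch, since the actual runs used max pooling and RMSProp in Keras, which would also handle the gradient step):

```python
import numpy as np

rng = np.random.default_rng(3)
d, V, k, c_lu = 4, 20, 2, 6
W0 = rng.uniform(-0.1, 0.1, size=(d, V))         # trainable embedding matrix
W1 = rng.normal(scale=0.1, size=(c_lu, d * k))   # trainable convolution weights
b1 = np.zeros(c_lu)                              # trainable bias

def sentence_repr(token_ids):
    # TR -> CONV -> POOL for one sentence
    s_TR = W0[:, token_ids].T                    # embedding lookup, N x d
    windows = [s_TR[n:n + k].reshape(-1) for n in range(len(token_ids) - k + 1)]
    conv = np.tanh(np.stack(windows) @ W1.T + b1)
    return np.tanh(conv.sum(axis=0))             # sum pooling

def cosine(r1, r2):
    return float(r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2)))

# One training pair with a gold label (1 = paraphrase, 0 = not):
sim = cosine(sentence_repr([1, 4, 7]), sentence_repr([1, 4, 9]))
label = 1.0
loss = (sim - label) ** 2                        # squared error for this pair
print(round(loss, 4))
```

Backpropagation then pushes the gradient of this loss through the tanh, convolution, and embedding lookup to update W_0, W_1 and b_1.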
Several convolutional filters
Standard Run Hyperparameters — Word embeddings

Parameter        Value             Description
k                {3, 5, 8, 12}     Sizes of k-grams
c_lu             100               Size of each convolutional filter
d                300               Size of word representation
epochs           5                 Number of training epochs
pooling          MAX               Pooling layer function
optimiser        RMSProp           Keras's optimiser
word embeddings  Random (uniform)
Sentences were tokenised and lowercased using Keras.
Standard Run Hyperparameters — Character embeddings

Parameter         Value                  Description
k                 {2, 3, 5, 7, 9, 11}    Sizes of k-grams
c_lu              100                    Size of each convolutional filter
d                 100                    Size of character representation
epochs            20                     Number of training epochs
pooling           MAX                    Pooling layer function
optimiser         RMSProp                Keras's optimiser
char. embeddings  Random (uniform)
Characters were lowercased, non-word characters were removed.
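The character preprocessing can be sketched as below. The exact definition of "non-word characters" is not given in the slides, so the `\w` rule here is an assumption (it keeps Unicode letters and digits, including Cyrillic, and drops punctuation and whitespace):

```python
import re

def preprocess_chars(sentence):
    # Lowercase, then keep only word characters.
    # Assumption: "non-word" means anything \w does not match
    # (Python's re is Unicode-aware, so Cyrillic letters are kept).
    return "".join(re.findall(r"\w", sentence.lower()))

print(preprocess_chars("Привет, мир! 42"))  # приветмир42
```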
Non-Standard Run Hyperparameters
Parameter        Value     Description
k                3         Size of k-gram
c_lu             300       Size of convolutional filter
d                300       Size of word representation
epochs           5         Number of training epochs
pooling          MAX       Pooling layer function
optimiser        RMSProp   Keras's optimiser
word embeddings  RusVectores trained on the Russian National Corpus (Kutuzov and Andreev, 2015)

Input sentences were tokenised, lemmatised and PoS-tagged with MyStem (Segalovich, 2003).
Main results
Run           System            Accuracy   F1
Standard      NLX (characters)  72.74      78.80
Standard      NLX (words)       66.19      76.44
Non-standard  NLX (words)       69.94      76.80
              BASELINE          49.66      54.03
Discussion
1. The result for the standard run is competitive with the best system, and could be further improved by tuning the hyperparameters automatically and by picking the test epoch automatically, based on the validation results.
2. Surprisingly, the standard run outperformed the non-standard one, even though the non-standard run used external resources for lemmatisation and for initialising the word embeddings. (This is probably due to a higher focus on the standard run.)
Next? An attention-based CNN (Yin et al., 2016), and a combination of character and word embeddings (dos Santos and Gatti, 2014).
References
Dasha Bogdanova, Cícero dos Santos, Luciano Barbosa, and Bianca Zadrozny. Detecting semantically equivalent questions in online user forums. In Proceedings of CoNLL 2015, page 123, 2015.
Cícero dos Santos and Maíra Gatti. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 69–78, Dublin, Ireland, August 2014. Dublin City University and Association for Computational Linguistics.
Andrey Kutuzov and Igor Andreev. Texts in, meaning out: neural language models in semantic similarity task for Russian. In Proceedings of the Dialog Conference, 2015.
Ilya Segalovich. A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine. In MLMTA, pages 273–280. Citeseer, 2003.
Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Transactions of the Association for Computational Linguistics, 4:259–272, 2016. ISSN 2307-387X.
Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pages 649–657, 2015.
Ye Zhang and Byron Wallace. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820, 2015.