On One Approach of Solving Sentiment Analysis Task for...

On One Approach of Solving SentimentAnalysis Task for Kazakh and Russian

Languages Using Deep Learning

Narynov Sergazy Sakenovichand Arman Serikuly Zharmagambetov(&)

“Alem Research” LLP, Dostyk Avenue 132, Office 13,050051 Almaty, Kazakhstan

[email protected], [email protected]

Abstract. The given research paper describes modern approaches of solvingthe task of sentiment analysis of the news articles in Kazakh and Russianlanguages by using deep recurrent neural networks. Particularly, we usedLong-Short Term Memory (LSTM) in order to consider long term dependenciesof the whole text. Thereby, research shows that good results can be achievedeven without knowing linguistic features of particular language. Here we aregoing to use word embedding (word2vec, GloVes) as the main feature in ourmachine learning algorithms. The main idea of word embedding is the repre-sentations of words with the help of vectors in such manner that semanticrelationships between words preserved as basic linear algebra operations.

Keywords: NLP � Sentiment analysis � Deep learning � Machine learning �Text classification

1 Introduction

In recent years, there is an active trend towards using various machine learning tech-niques for solving problems related to Natural Language Processing (NLP). One ofthese problems is the automatic detection of emotional coloring (positive, negative,neutral) of the text data, i.e. sentiment analysis. The goal of this task is to determinewhether a given document is positive, negative or neutral according to its generalemotional coloring. We don’t perform sentiment analysis related to particular object,i.e. it is not an aspect based sentiment analysis. Therefore, we deleted from our datasetdocument with mixed sentiment. Nevertheless, analyzing general sentiment of a doc-ument is difficult task by itself. The difficulty of sentiment analysis is determined by theemotional language enriched by slang, polysemy, ambiguity, sarcasm; all this factorsare misleading for both humans and computers.

The high interest of business and researchers to the development of sentimentanalysis are caused by the quality and performance issues. Apparently the sentimentanalysis is one of the most in-demand NLP tasks. For instance, there are severalinternational competitions and contests [1], which try to identify the best method forsentiment classification. Sentiment analysis had been applied on various levels, starting

© Springer International Publishing Switzerland 2016N.T. Nguyen et al. (Eds.): ICCCI 2016, Part II, LNAI 9876, pp. 537–545, 2016.DOI: 10.1007/978-3-319-45246-3_51

from the whole text level, then going towards the sentence and\or phrase level. Ingeneral, importance of solving this problem is considered in [16].

It is obvious that similar sentimental messages (text, sentence…) can have variousthesauruses, styles, and structure of narration. Thus, points corresponding to the similarmessages can be located far away from each other that make the sentiment classifi-cation task much harder [2]. The scientific novelty of the paper is in applyingWord2Vec algorithm [3] in the sentiment classification task for Kazakh and Russianlanguages and use this word vectors as input to deep recurrent neural networks to dealwith long term dependency of the textual document.

2 Related Works

The study of sentiment analysis has relatively small history. Reference [4] is generallyconsidered the principal work on using machine learning methods of text classificationfor sentiment analysis. The previous works related to this field includes approachesbased on maximum relative entropy and binary linear classification [5] and unsuper-vised learning [6].

Most of these methods use well known features as bag-of-words, n-grams, tf-idf,which considered as the simplest one [7]. But as show the experiment results the simplemodels often works better than complicated ones. Reference [7] use distant learning toacquire sentiment data. Additionally, since they mostly work with movie commentsand tweets, they used additional features as ending in positive emoticons like “:)” “:-)”as positive and negative emoticons like “:(” “:-(” as negative. They build models usingNaive Bayes, Maximum Entropy and Support Vector Machines (SVM), and they reportSVM outperforms other classifiers. In terms of feature space, they try a Unigram,Bigram model in conjunction with parts-of-speech (POS) features. They note that theunigram model outperforms all other models.

The other feature is syntactic meta information. It’s obvious that the recursivelyenumerable grammar describes the most complete of any natural language. Thecomputational performance of the best syntactic parser of context free grammar islinear, so syntactic information is expensive for the sentiment analysis task. However,those experiments that involved dependency relations showed that syntax contributessignificantly to both Recall and Precision of most algorithms. For the task of textclassification in general see [8–10] deal with a task sentiment classification based onsyntactic relations. In reference [11] it was shown that POS-tagging and other linguisticfeatures contributes to the classifier accuracy. The experiments were conducted onfeedback data from Global Support Services survey.

Sentiment analysis using recurrent and recursive neural networks described in[17–19]. Important fact is that authors could not find previous works related toautomatic sentiment classification for Kazakh language.

538 N.S. Sakenovich and A.S. Zharmagambetov

3 Dataset

The labeled by human data set consists of *30,000 news articles in Russian language,specially selected for sentiment analysis, which consist of 11,286 neutral, 10,958positive and 7756 negative articles. The sentiment of each document can be one of thefollowing: positive, negative, neutral. 18,000 reviews from this dataset were chosen astraining data, 6,000 as cross validation dataset and 6,000 as test dataset. Further-more, *10000 (3021 positive, 2548 negative, 4431 neutral) news articles in Kazakhlanguage were labeled in order to train sentiment classifier. Each entry on this datasetconsists of the following field:

• Id - Unique ID of each review.• Sentiment - Sentiment of the review: 1 for positive reviews, 0 for negative reviews,

2 to the neutral reviews.• Text - Text of the document (on Kazakh or Russian language).

The goal is to increase the accuracy (precision and recall) in sentiment classificationof test dataset.

Furthermore, we used *70 GB of plain text data in Russian language and *10GB plain text data in Kazakh language in order to train word embedding by unsu-pervised method. These texts we obtained from open electronic libraries, news articles,crawled web sites, etc.

4 Learning Model

The first step of learning model is unsupervised training of word embedding.Word2vec, published by Google in 2013, is a neural network implementation thatlearns distributed representations for words. Distributed word vectors, i.e. wordembeddings are powerful and can be used for many applications, particularly wordprediction and translation. It accepts large un-annotated corpus and learns by unsu-pervised algorithms. There are two different architectures of Word2Vec algorithm. AtFig. 1 continuous bag of words (CBOW) architecture presented, the purpose of suchnetwork topology assumes mapping from context to particular term and vice versa atFig. 1 skip-gram architecture that maps particular term to its context.

We used “gensim” [13] python library with build-in Word2Vec model. It acceptslarge textual dataset for training. As was mentioned above, 70 GB and 10 GB raw datain Russian and Kazakh languages, respectively. The following options were used whiletraining for Word2Vec: 300 dimensional space, 40 minimum words and 3 words incontext. The vector representation of word has a lot of advantageous. It raises thenotion of space, and we can find distance between words and finding semantic similarwords. The simple result that can be obtained for such vector presented in Table 1.

Finally, map with a word as key and N dimensional vectors as value is obtainedfrom abovementioned word2vec algorithm. Next, these vectors will be used in clas-sification task. But before, each article should be preprocessed.

The preprocessing includes the following: (1) The all HTML tags, punctuations,were removed by “Beautiful Soup” python library. There are HTML tags such as

On One Approach of Solving Sentiment Analysis Task 539

“< br/>”, abbreviations, punctuation - all common issues when processing text fromonline. (2) Moreover, numbers and links were replaced by tags NUM and LINK,respectively. (3) Removing stop words. Conveniently, there is Python package -Natural Language Toolkit (NLTK) [12] that removes stop words with built in lists.(4) Lemmatization of each word. For Russian language was used lemmatization tool“Mystem” [23], which was developed by Yandex. For Kazakh language there is nolemmatization. In future works we plan to implement morphological lemmatizationtool for Kazakh language.

Next, each word is mapped to vector and for one document we get a sequence ofN dimensional vectors which will be given as input to LSTM recurrent neural network.

Long Short-Term Memory (LSTM) is a special type of recurrent neural networkswhich was invented to consider sequential dependencies as a set of words in some text.Furthermore, LSTM overcomes common problems of RNN as exploding gradient or

Fig. 1. The architectures of Word2Vec model. The CBOW architecture predicts the currentword based on the context (left). The skip-gram predicts surrounding words given the currentword (right). Taken from [3].

Table 1. Semantic similar words to Russian word - ‘man’

Words cos dist

Woman 0,6056Guy 0,4935Boy 0,4893Men 0,4632Person 0,4574Lady 0,4487Himself 0,4288Girl 0,4166His 0,3853He 0,3829


vanishing gradient. To overcome this drawback LSTM uses additional internal trans-formation, which operate with memory cell more cautious represented in Fig. 2. Fordetailed information how LSTM works you can refer to [20].

For sentiment classification task we compared several model architectures withLSTM, which are presented in Figs. 3, 4, 5 and 6. General idea for all schemes is thesame – input word vectors are processed via LSTM units, and then outputs from theseunits go further via vanilla neural networks or logistic regression unit. Below, in resultssection we give comparison of results of these various neural networks architectures.

Training algorithm was implemented using Theano [21] and Lasagne [22] packagesfor python language. C-extension for python (cython [15]) and GPU were used foracceleration and efficient calculations. In our experiments, using GPU gives up to 6–7times faster calculation compared with CPU usage with multithread. Abovementionedpackages give opportunity to easily implement various deep learning algorithms

Fig. 2. The architecture of LSTM. Instead of one output it uses three gates: input, forget, output.Taken from [24].

Fig. 3. Mean pooling from each output of LSTM followed by logistic regression (left).Stacked LSTM where the last layer is sliced and proceed to logistic regression unit (right).


Fig. 4. Stacked LSTM where the last layer is sliced and proceeded to multilayer perceptron(Neural Network) unit (left). Stacked two layer LSTM where the last layer is sliced andproceeded to logistic regression unit (right).

Fig. 5. Stacked two layer LSTM where the last layer is sliced and proceeded to multilayerperceptron (Neural Network) unit.


including LSTM, multilayer perceptron, etc. Also they have extension to use GPU tooptimize computations.

Sigmoid and tanh functions were used as LSTM internal activation functions.Similarly, softmax was used as activation function for logistic regression and NeuralNetworks units. Advantage of softmax functions is that it gives correct generalizationof the logistic sigmoid to the multinomial case:

hi ¼ eaiPN

j¼0 eaj

ð1Þ

We used categorical cross entropy to define the loss function which should beoptimized while training:

J ¼ �XN

i¼0

Xm

j¼0

yðiÞj � logðhðiÞj Þ ð2Þ

5 Results and Discussions

Table 2 summarizes the results on sentiment classification. As mentioned above thereare 30,000 news articles in Russian language and 10,000 news articles in Kazakhlanguage were chosen for train and test data.

It can be seen that the best model for Russian sentiment analysis is - “Stacked twolayer LSTMwith one hidden layer Neural Networks” (Fig. 5) and for Kazakh language is- “LSTMwithmean pooling and logistic regression unit” (Fig. 4). Another important factis that sentiment analysis for Kazakh language shows worse result. Probably, it can beexplained due to relatively small amount of training data set and lack of lemmatization.

Fig. 6. Stacked bidirectional LSTM where the last layers of each direction are concatenated andproceeded to multilayer perceptron unit.


The given work shows that deep recurrent neural networks can be efficientlyapplied to the task of sentiment classification. Particularly, LSTM shows stable resultseven for long sequential data as words or sentences in a news article. Additionally,word embedding helps extract semantic relations between words which have effect totraining process. Future works will be dedicated to improvement of sentiment classi-fication by studying deeply long term dependencies in the text document and byextracting syntax relations. Neural Turing Machines, adversarial neural networks willbe considered instead of or jointly with recurrent relation. Moreover, aspect basedsentiment classification task will be studied.

References

1. Chetviorkin, I., Braslavskiy, P., Loukachevich, N.: Sentiment analysis track at ROMIP2011. In: International Conference “Dialog 2012”: Computational Linguistics andIntellectual Technologies, Bekasovo, pp. 1–14 (2012)

2. Pak, A.A., Narynov, S.S., Zharmagambetov, A.S., Sagyndykova, S.N., Kenzhebayeva, Z.E.,Turemuratovich, I.: The method of synonyms extraction from unannotated corpus. In:DINWC 2015, Moscow, pp. 1–5 (2015)

3. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations invector space. In: Workshop at ICLR, Scottsdale, AZ, USA (2013)

4. Bo, P., Lee, L.: A sentimental education: sentiment analysis using subjectivitysummarization based on minimum cuts. In: ACL (2004)

5. Joachims, T.: Text categorization with support vector machines: learning with many relevantfeatures. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142.Springer, Heidelberg (1998)

6. Turney, P.D.: Thumbs up or thumbs down? semantic orientation applied to unsupervisedclassification of reviews. In: 40th Annual Meeting of the Association for ComputationalLinguistics (ACL 2002), Philadelphia, Pennsylvania, pp. 417–424 (2002)

Table 2. Results of sentiment classification

Method Average precision(ru/kz)

Average recall(ru/kz)

Accuracy(ru/kz)

Mean pooling + log. reg. (Fig. 3 -left)

80.2 %73.2 %

73.7 %72.2 %

76.3 %72.8 %

Stacked LSTM + log.reg (Fig. 3 -right)

81.3 %69.1 %

77.2 %61.3 %

80.8 %67.3 %

Stacked LSTM + NN (Fig. 4 -left)

78.2 %70.1 %

82 %72.6 %

82.8 %70.5 %

Stacked two LSTM + log.reg(Fig. 4 - right)

81.6 %70.4 %

85.7 %71.3 %

85.2 %70.9 %

Stacked two LSTM + NN(Fig. 5)

84.5 %71.1 %

86.4 %66.7 %

86.3 %69.8 %

Biderect. LSTM + NN (Fig. 6) 76.8 %62.9 %

69.3 %61.2 %

71.1 %62.3 %


7. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision.Technical report, Stanford (2009)

8. Furnkranz, J., Mitchell, T., Riloff, E.: A case study in using linguistic phrases for textcategorization on the WWW. In: AAAI/ICML Workshop on Learning for TextCategorization, pp. 5–12 (1998)

9. Caropreso, M.F., Matwin, S., Sebastiani, F.: A learner-independent evaluation of theusefulness of statistical phrases for automated text categorization. In: Chin, A.G. (ed.) TextDatabases and Document Management: Theory and Practice, pp. 78–102. Idea GroupPublishing, USA (2001)

10. Nastase, B., Shirabad, J.S., Caropreso, M.F.: Using dependency relations for textclassification. In: 19th Canadian Conference on Artificial Intelligence, Quebec City,pp. 12–25 (2006)

11. Gamon, M.: Sentiment classification on customer feedback data: noisy data, large featurevectors, and the role of linguistic analysis. In: COLING 2004, Geneva, pp. 841–847 (2004)

12. Natural Language Toolkit. http://www.nltk.org/13. Gensim: Topic modeling for humans. https://radimrehurek.com/gensim/14. Sci-kit: Machine learning in python. http://scikit-learn.org/stable/15. Cython: C-Extensions for Python. http://cython.org/16. Liu, B.: Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 5(1), 1–

167 (2012)17. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors

for sentiment analysis. In: 49th Annual Meeting of the Association for ComputationalLinguistics: Human Language Technologies, pp. 142–150. Association for ComputationalLinguistics (2011)

18. Tarasov, D.S.: Deep recurrent neural networks for multiple language aspect based sentimentanalysis of user reviews. In: Dialog 2015, Moskow (2015)

19. Socher, R., Perelygin, A., Jean, Y.W., Chuang, J., Manning, C.D, Ng, A.Y., Potts, C.:Recursive deep models for semantic compositionality over a sentiment treebank. In: Con-ference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1642–1656.Citeseer, Seattle (2013)

20. Hochreiter, S., Schmidhuber, J.: Long short-term memory. J. Neural Comput. 9(8),1735–1780 (1997)

21. Theano: Framework for python. http://deeplearning.net/software/theano/22. Lasagne: Framework for python. https://github.com/Lasagne/Lasagne23. Mystem: Morphology analysis tool. https://tech.yandex.ru/mystem/24. Understanding LSTM Networks. Colah’s personal blog. http://colah.github.io/posts/2015–08-

Understanding-LSTMs/


http://www.nltk.org/

https://radimrehurek.com/gensim/

http://scikit-learn.org/stable/

http://cython.org/

http://deeplearning.net/software/theano/

https://github.com/Lasagne/Lasagne

https://tech.yandex.ru/mystem/

http://colah.github.io/posts/2015%e2%80%9308-Understanding-LSTMs/

http://colah.github.io/posts/2015%e2%80%9308-Understanding-LSTMs/

On One Approach of Solving Sentiment Analysis Task for...

Documents

Transcript of On One Approach of Solving Sentiment Analysis Task for...