Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing...

13
Research Article Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing MultilevelInteractiveBidirectionalGRUand AttentionMechanism XiaodiWang , 1 XiaoliangChen , 1,2 MingweiTang, 1 TianYang, 1 andZhenWang 1 1 School of Computer and Software Engineering, Xihua University, Chengdu 610039, China 2 Department of Computer Science and Operations Research, University of Montreal, Montreal, QC H3C3J7, Canada Correspondence should be addressed to Xiaoliang Chen; [email protected] Received 5 April 2020; Accepted 16 June 2020; Published 8 July 2020 Academic Editor: Francisco R. Villatoro Copyright © 2020 Xiaodi Wang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. e aim of aspect-level sentiment analysis is to identify the sentiment polarity of a given target term in sentences. Existing neural network models provide a useful account of how to judge the polarity. However, context relative position information for the target terms is adversely ignored under the limitation of training datasets. Considering position features between words into the models can improve the accuracy of sentiment classification. Hence, this study proposes an improved classification model by combining multilevel interactive bidirectional Gated Recurrent Unit (GRU), attention mechanisms, and position features (MI- biGRU). Firstly, the position features of words in a sentence are initialized to enrich word embedding. Secondly, the approach extracts the features of target terms and context by using a well-constructed multilevel interactive bidirectional neural network. irdly, an attention mechanism is introduced so that the model can pay greater attention to those words that are important for sentiment analysis. Finally, four classic sentiment classification datasets are used to deal with aspect-level tasks. Experimental results indicate that there is a correlation between the multilevel interactive attention network and the position features. MI- biGRU can obviously improve the performance of classification. 1.Introduction Capturing and analyzing the sentiments implied in large-scale comment texts has become a central topic for natural language processing (NLP). e tasks of fine-grained sentiment classi- fication of the target terms in a given context are called aspect- level sentiment analysis, which have received considerable attention compared with acquiring traditional comprehensive sentiment polarity [1, 2]. A growing number of prestigious researchers and engineers around the world have post their opinions and reports on topics of sentiment classification online and offer them for free. ese technical contributions have been properly accepted and acclaimed for their obvious advantages in NLP tasks. However, further aspect-level sen- timent analysis is quite sensitive to current researchers. ere are many problems in sentiment classification of aspect-level, including classification, regression, and recognition. We mainly focus on classification issues [3]. Sentiment predictions for a target term in a text are important for our increased understanding of sentence se- mantics and user emotions behind the sentences. e typical feature of aspect-level sentiment analysis can be exemplified in studies using the follow sentence expression: “they use fancy ingredients, but even fancy ingredients do not make for good pizza unless someone knows how to get the crust right.” e sentiment polarity of the target terms “ingre- dients,” “pizza,” and “crust” were positive, negative, and neutral, respectively. However, one potential problem is that the predictive accuracy of polarity is much lower than ex- pected by the application, which is limited to complex sentence features and language environments. Traditional methods of comprehensive sentiment evaluation do not meet the requirements of fine-grained aspect-level tasks based on the target terms [4]. ere are few studies that have investigated the association between sentiment polarity and the position information of target terms. Hence, this paper Hindawi Discrete Dynamics in Nature and Society Volume 2020, Article ID 5824873, 13 pages https://doi.org/10.1155/2020/5824873

Transcript of Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing...

Page 1: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

Research ArticleAspect-Level SentimentAnalysisBasedonPositionFeaturesUsingMultilevel Interactive Bidirectional GRU andAttention Mechanism

Xiaodi Wang 1 Xiaoliang Chen 12 Mingwei Tang1 Tian Yang1 and Zhen Wang1

1School of Computer and Software Engineering Xihua University Chengdu 610039 China2Department of Computer Science and Operations Research University of Montreal Montreal QC H3C3J7 Canada

Correspondence should be addressed to Xiaoliang Chen xdxlchengmailcom

Received 5 April 2020 Accepted 16 June 2020 Published 8 July 2020

Academic Editor Francisco R Villatoro

Copyright copy 2020 Xiaodi Wang et al (is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

(e aim of aspect-level sentiment analysis is to identify the sentiment polarity of a given target term in sentences Existing neuralnetwork models provide a useful account of how to judge the polarity However context relative position information for thetarget terms is adversely ignored under the limitation of training datasets Considering position features between words into themodels can improve the accuracy of sentiment classification Hence this study proposes an improved classification model bycombining multilevel interactive bidirectional Gated Recurrent Unit (GRU) attention mechanisms and position features (MI-biGRU) Firstly the position features of words in a sentence are initialized to enrich word embedding Secondly the approachextracts the features of target terms and context by using a well-constructed multilevel interactive bidirectional neural network(irdly an attention mechanism is introduced so that the model can pay greater attention to those words that are important forsentiment analysis Finally four classic sentiment classification datasets are used to deal with aspect-level tasks Experimentalresults indicate that there is a correlation between the multilevel interactive attention network and the position features MI-biGRU can obviously improve the performance of classification

1 Introduction

Capturing and analyzing the sentiments implied in large-scalecomment texts has become a central topic for natural languageprocessing (NLP) (e tasks of fine-grained sentiment classi-fication of the target terms in a given context are called aspect-level sentiment analysis which have received considerableattention compared with acquiring traditional comprehensivesentiment polarity [1 2] A growing number of prestigiousresearchers and engineers around the world have post theiropinions and reports on topics of sentiment classificationonline and offer them for free (ese technical contributionshave been properly accepted and acclaimed for their obviousadvantages in NLP tasks However further aspect-level sen-timent analysis is quite sensitive to current researchers (ereare many problems in sentiment classification of aspect-levelincluding classification regression and recognitionWemainlyfocus on classification issues [3]

Sentiment predictions for a target term in a text areimportant for our increased understanding of sentence se-mantics and user emotions behind the sentences (e typicalfeature of aspect-level sentiment analysis can be exemplifiedin studies using the follow sentence expression ldquothey usefancy ingredients but even fancy ingredients do not makefor good pizza unless someone knows how to get the crustrightrdquo (e sentiment polarity of the target terms ldquoingre-dientsrdquo ldquopizzardquo and ldquocrustrdquo were positive negative andneutral respectively However one potential problem is thatthe predictive accuracy of polarity is much lower than ex-pected by the application which is limited to complexsentence features and language environments Traditionalmethods of comprehensive sentiment evaluation do notmeet the requirements of fine-grained aspect-level tasksbased on the target terms [4] (ere are few studies that haveinvestigated the association between sentiment polarity andthe position information of target terms Hence this paper

HindawiDiscrete Dynamics in Nature and SocietyVolume 2020 Article ID 5824873 13 pageshttpsdoiorg10115520205824873

proposes a multilevel interactive bidirectional attentionnetwork model integrating bidirectional GRU and positioninformation to improve the accuracy of aspect-level senti-ment predictions

Traditional published methods for processing aspect-level tasks are limited to the selection of feature sets (efocus of these studies such as bag-of-words and sentimentlexicons [5] is to manually label a large number of featuresScholars have long debated the waste of labour on manualmarking However the existing studies indicate that thequality of training models largely depends on the con-structed labelled featured set Recently investigators haveexamined the effects of deep learning compared with tra-ditional manual generation methods in NLP tasks [6 7] (eformer has a clear advantage

Recurrent neural network (RNN) can extract the es-sential features of word embedding by using a multilevelrecurrent mechanism and then generate a vector repre-sentation of the target sentences Most of sentiment clas-sification models using RNN can achieve acceptable resultsthrough well-established tuning steps [8] More recent at-tention on the sentiment classification tasks has focused onthe provision of RNN variants (e first method to improvethe models is to adjust their structures For example target-dependent long short term memory (TD-LSTM) [9] candivide a context into left and right parts according to targetterms (en the hidden states to deal with aspect-level tasksare generated by combining two LSTM models in structure(e second is characterized by a change in the input of themodels For example there are methods to associate thetarget term vectors with the context vectors as the wholeinput of the LSTM model which can realize aspect-leveltasks by enhancing the semantic features of the words [10]

Research on the method of neural networks on aspect-level tasks has been mostly restricted to limited performanceimprovement However few studies have been able to drawon systematic research into the importance of the words in asentence In other words we cannot effectively identifywhich words in a sentence are more indispensable andcannot accurately locate these key words in aspect-leveltasks Fortunately attention mechanisms which are widelyused in machine translation [11] image recognition [12]and reading comprehension [13 14] can solve this problemAttention mechanisms [15] can be utilized to measure theimportance of each word in a context to the target terms inwhich attentions are ultimately expressed as weight score(e model will focus more attention on the words with highweight score and extract more information from the wordsrelated to the target terms thus improving the performanceof classification Some scholars have invested in this domainand achieved excellent results such as AE-LSTM ATAE-LSTM [16] MemNet [17] and IAN [18] However theinfluence of position parameters of target terms on classi-fication performance has remained unclear [19 20] (isindicates a need to understand the actual contribution of theposition information

Researchers review that the sentiment polarity of a targetterm contained in a sentence is related to the context aroundit but not to those with greater distant A well-constructed

aspect-level model should allocate higher weight score to thecontexts that possess closer relative distance to the targetterm (e idea can be illustrated briefly by the followingsentence ldquothey use fancy ingredients but even fancy in-gredients do not make for good pizza unless someone knowshow to get the crust rightrdquo In this case the polarity ispositive negative and neutral when the target term is settingas the word ldquoingredientsrdquo ldquopizzardquo and ldquocrustrdquo respectivelyIn order to decide the polarity we should intuitively con-centrate on the words that are close to the target one andthen consider the other words far away from it Hence theword ldquofancyrdquo in the case compared to other words such asldquogoodrdquo and ldquogetrdquo will make a greater contribution to de-termine the polarity of target term ldquoingredientsrdquo Conse-quently adding position features can enrich word semanticsin the embedding process (is work attempts to illuminatethe fact that a model with position information can learnmore sentence features on aspect-level tasks

(is study proposes an improved aspect-level classifi-cation model by combining multilevel interactive bidirec-tional Gated Recurrent Unit attention mechanisms andposition features (MI-biGRU) (e function of the modelconsists of three parts

(1) Calculate the positional index of each word in thesentence based on the current target term and ex-press it in an embedding

(2) Extract semantic features of target words and contextusing multilevel bidirectional GRU neural network

(3) Use bidirectional attention mechanism to obtain theweight score matrix of hidden states and determinerelevance of each context to the target word

(e model not only extracts the abstract semantic fea-tures of sentences but also calculates the position features ofwords in parallel by a multilevel structure A vector rep-resentation with more features can be obtained according tothe bidirectional attention mechanism which can enhancethe performance of sentiment classification tasks On top ofthat bidirectional embeddings can be brought together totackle accurate sentiment classification at the fine-grainedlevel Finally the effectiveness of this model will be evaluatedby using four public aspect-level sentiment datasets (eexperimental results show that the proposed model canachieve good sentiment discrimination performance at as-pect-level on all datasets

(is paper is organized as follows Section 2 introducesthe related work Section 3 formulates the improved modelMI-biGRU that is composed of multilevel interactive bidi-rectional Gated Recurrent Unit attention mechanisms andposition features Section 4 deals with experiments andreporting results on aspect-level sentiment analysis Con-clusions and future work are presented in Section 5

2 Related Work

(is section introduces the development of sentimentanalysis in recent years (e general research can be dividedinto three parts traditional sentiment analysis methods

2 Discrete Dynamics in Nature and Society

neural network-based methods and applications of atten-tion mechanism in aspect-level tasks

21 Traditional Sentiment Analysis Methods Existing tra-ditional methods on sentiment classification are extensiveand focus particularly on machine learning technologieswhich solve two problems text representation and featureextraction First of all several studies have used supportvector machines (SVM) to deal with text representationresearch studies in sentiment classification tasks [21]According to the formulation of SVM all the words of thetext do not make a distinction between target terms andnormal context (ere are other relatively text representa-tion methods of the literature that is concerned with sen-timent words [22 23] tokens [24] or dependency pathdistance [24] Above methods are called coarse-grainedclassifications On top of that the majority of studies onfeature extraction have obtained the sentiment lexicon andbag-of-words features [25ndash27] (ese methods have beenplaying an increasingly important role in improving theperformance of classification Yet these existing types ofapproaches have given rise to a lot of heated debate (emodel training is heavily dependent on the features weextract Manually labelling features will inevitably take a lotof manpower and time resources As a result the classifi-cation performance is low because of the high dimension ofuseful information if the features are obtained from anunlabelled text

22 Neural Network-Based Sentiment Analysis MethodsUsing neural network-based methods has become increas-ingly popular among the sentiment classification tasks fortheir flexible structure and pleasant performance [28] Forexample all the models such as Recursive Neural Networks[29] Recursive Neural Tensor Networks [30] Tree-LSTMs[31] and Hierarchical LSTMs [32] more or less enhance theaccuracy of sentiment classification by different constructivemodel structures (e models above improve the accuracycompared with traditional machine learning However re-searchers have come to recognize their inadequacies Ig-noring to distinguish the target terms of a sentence willgreatly decrease the classification effect (erefore somescholars in academia have turned their attention to the targetterms Jiang et al performed a similar series of experimentsto show the significance of target terms for sentimentclassification tasks [5] Tang et al reviewed the literaturefrom the period and proposed two improved models TD-LSTM and TC-LSTM which can deal with the problem ofautomatic target term extraction context feature enhance-ment and classification performance improvement Zhanget al conducted a series of trials on sentiment analysis inwhich they constructed a neural network model with twogate mechanisms [33] (e mechanisms implement thefunctions of extracting grammatical and semantic infor-mation and the relationship information between the leftand right context for a target term respectively Finallyinformation extracted by two gate mechanisms is aggregatedfor sentiment classification Overall these studies highlight

the need for target terms but there is no reference to theposition of such terms and the relationship between positioninformation and classification performance

23 Application of Attention Mechanism in Aspect-LevelSentiment Analysis Deep learning technologies are originallyapplied in the field of images which have gradually turned toNLP area and achieved excellent results Attention mechanismsin deep learning are necessarily serve as an effective way tohighly accurate sentiment classification A fewNLP researcherswho have surveyed the intrinsic relevance between the contextand the target terms in sentences have been found For ex-ample Zeng et al designed an attention-based LSTM for aspect-level sentiment classification [10] which processes target termembedding and word embedding in the pretraining step si-multaneously (en the target term vector will be put into anattention network to calculate the term weight More recentattention has focused on aspect-level tasks with similar attentionmechanisms Tang et al designed a deep memory network [17]with reporting multiple computational layers Each layer is acontext-based attention model through which the relationshipweight from context to target terms can be obtained Ma et alsuggest the deep semantic association between context andtarget terms by proposing an interactive attention model [18](is model obtains the two-way weight and combines them toperform aspect-level sentiment classification

Most of the improved neural network models can achievebetter results compared with the original one However thesemethods ignore the role of position relationship betweencontext and target terms As a result the polarity of the targetterms must be affected under certain positional relationships(e study of Zeng et al subsequent offers some importantinsights into the application of position information in clas-sification tasks [34] For example understanding the distancebetween context and target term and how to present suchdistance by embedding representationwill help our aspect-levelwork (e work of Gu et al uses a position-aware bidirectionalattention network to investigate aspect-level sentiment analysis[20] which provides rich semantic features in embeddingwords expression

As noted above interactive network or position informationis particularly useful in studying aspect-level sentiment analysisSo far little attention has been paid to both of them simulta-neously Hence a combination of both interactive concept andposition parameter was used in this investigation (is studyproposes an improved aspect-level sentiment analysis model bycombining multilevel interactive bidirectional Gated RecurrentUnit attention mechanisms and position features (MI-biGRU)First of all the distance from the context to target term in ourmodel will be prepared according to the similar procedure usedby the scholars Zeng et al [34] On top of that the wordembedding with position information will be trained by using amultilevel bidirectional GRU neural network Finally a bidi-rectional interactive attentionmechanism is used to compute theweight matrix to identify the context possessing the semanticassociation with the target termMI-biGRU can perform amoreclassification accuracy for aspect-level tasks than previousmodels which will be shown in Section 4

Discrete Dynamics in Nature and Society 3

3 Model Description

(is section presents the details of the model MI-biGRU foraspect-level sentiment analysis In the previous work severaldefinitions with symbolic differences have been proposed forsentiment analysis Hence we need to provide the basicconcepts and notations of MI-biGRU classification involvedin this paper

A sentiment tricategory problem represented in MI-biGRU is associated with a three-tuple polarity set (positivenegative and neutral) Given a sentence with n words in-cluding context and target terms A target term is usuallydenoted by the word group composed of one or more ad-jacent words in context where the position of the first andlast word in the word group is called the start and endposition respectively (e target term embedding sequencecan be denoted by [e1a e2a em

a ] with m predeterminedtarget terms Notation [p1 p2 pn] represents the rela-tive distance embedding from each word wi

c i isin 1 2 nof a sentence to a target term

(e overall architecture of MI-biGRU can be illustratedin Figure 1 (is model possesses two inputs One is a wordembedding sequence [w1 w2 wn] that consists of acontext embedding sequence [e1c e2c en

c ] and a positionembedding sequence [p1 p2 pn] (e other is a targetterm embedding sequence [e1a e2a em

a ] (e goal of themodel is to extract enough semantic information fromtwo embedding sequences and combine them to performaspect-level sentiment classification Notations employed torepresent the components of MI-biGRU are described inTable 1 (e details of the model are divided into six stepsbased on their execution order

31 Position Representation Aspect-level tasks havebenefited a lot from position embedding representation formore valuable word features [34] (e concept of relativedistance between words serves to quantify the relevance of asentence word to a target term We are required to representthe position information by a embeddable vector patternwhich can be formalized as an integer vector or a matrixdepending on whether there concerns unique or multipletarget terms

First of all the word position index of a target term in asentence will be marked as the cardinal point ldquo0rdquo Discretespacing from the ith word wi

c in a sentence to the cardinalpoint is called the relative distance of wi

c which is denoted bypi and can be calculated by the formula

pi

as minus i1113868111386811138681113868

1113868111386811138681113868 i lt as

0 as lt ilt ae

ae minus i1113868111386811138681113868

1113868111386811138681113868 i gt ae

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

(1)

Extending this concept in a sentence with n words givesus the following position index list P [p1 p2 pn](is can be illustrated briefly by the following two examplesFirstly if the unique word ldquoquantityrdquo is applied as the targetterm in the sentence ldquothe quantity is also very good you willcome out satisfiedrdquo we can develop the position index list

[1 0 1 2 3 4 5 6 7 8 9] by setting a cardinal point ldquo0rdquo forthe second word ldquoquantityrdquo and deriving an increasinglypositive integer from the left or right direction for otherwords Secondly if the target term contains more than oneadjacent word all the internal words are assigned as thecardinal point ldquo0rdquo Other words in the sentence will obtainan increasingly positive integer from the start position of theterm to left direction or from the end position of the term toright direction (erefore the position index list[6 5 4 3 2 1 0 0 1 2 3 4 5 6 7] will be obtained if we setthe case sentence and the target term with ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo and ldquointerior decorationrdquo respectively

On top of that if multiple target terms are applied to asentence we can obtain a position index list sequence that iscalled the positionmatrix Assuming that a sentence has n wordsand m target terms let notationPi denote the list index of theith target term in a sentence Position matrix G is defined as

G

P1

P2

Pm

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

isin Rmtimesn (2)

where m refers to the number of target terms and n is thenumber of words in the sentence (en we use a positionembedding matrix P isin Rdptimesn to convert position index se-quence into a position embedding where dp refers to thedimension of position embedding P is initialized randomlyand updated during the model training process (e matrixis further exemplified in the same example ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo which contains two target terms ldquointerior decorationrdquoand ldquochefsrdquo We can obtain the position matrix

G P1

P21113890 1113891

6 5 4 3 2 1 0 0 1 2 3 4 5 6 7

14 13 12 11 10 9 8 7 6 5 4 3 2 1 01113890 1113891

(3)

(e position matrix has been playing an increasinglyimportant role in helping researchers get a better sense ofaspect-level tasks We can first observe the polarity of theemotional words which are near the target term and thenconsider the other words which are far away to judgewhether a sentence is positive or not For example aspresented in the case ldquothe quantity is also very good you willcome out satisfiedrdquo the distance of ldquogoodrdquo (4) is closer thanldquosatisfiedrdquo (9) by a simple numerical comparison accordingto the index list [1 0 1 2 3 4 5 6 7 8 9] and the approachwill give priority to ldquogoodrdquo instead of ldquosatisfiedrdquo when wejudge the sentiment polarity of the subject ldquoquantityrdquo of thesentence (is study suggests that adding position infor-mation to initialize word embedding can provide morefeatures to perform aspect-level sentiment classification

32 Word Representation One of the basic tasks of senti-ment analysis is to present each word in a given sentence byembedding operation A feasible approach is to embed each

4 Discrete Dynamics in Nature and Society

word in a low-dimensional real value vector through theword embedding matrix E isin Rdwtimesv where dw represents thedimension of word embedding and v denotes the size ofvocabulary Matrix E is generally initialized by randomnumber generation operation (en matrix weight will beupdated to reach a stable value in the process of model

training Another feasible method to obtain the matrix E isto pretrain it through the existing corpus [35]

(is study uses pretrained Glove (pretrained wordvectors of Glove can be obtained from httpnlpstanfordeduprojectsglove) from Stanford University to get wordembeddings Four sets of sequence symbols are applied to

Pool

Pool Pool

Pool

AttentionAttention

Attention layer

Hiddenstate layer 2

Hiddenstate layer 1

Sentiment classification

Softmax

Wordembedding layer

Final representation

GRU

GRU

GRU GRU

GRUGRU

GRU

GRU

GRU

GRU

GRU

GRU

GRU

Concatenate

ha21 ha22 ha2m hw21 hw22 hw2n

hw11 hw12 hw1n

GRUGRU

GRUGRU

GRU

GRU

GRUGRU GRU

GRU GRU

ha11 ha12 ha1m

ea1 ea2

ec1 ecn

eam

p1 pn

w1 w2 wn

+ + + + + +

+ + ++++

Aspect-level sentiment analysis using MI-biGRU

Figure 1 Architecture of MI-biGRU

Table 1 Component description of the proposed model

Notation Descriptionwi

c (e ith word in a contextw

ja (e jth word in a target term

pi Relative distance from the ith word to a target term in a sentencewi Concatenation of the ith context and position embeddingas Start position of a target termae End position of a target termei

c Embedding of the ith context in a sentenceei

a Embedding of the ith target term in a sentenceh1i

w First hidden layer vector formed from wi

h1ia First hidden layer vector formed from ei

a

h2iw Second hidden layer vector formed from h1i

w

h2ia Second hidden layer vector formed from h1i

a

Discrete Dynamics in Nature and Society 5

distinguish texts and their embedding form In a sentencethe context and its embedding with n words can be denotedby [w1

c w2c wn

c ] and [e1c e2c enc ] respectively (e m

target terms and their embeddings can be denoted by[w1

a w2a wm

a ] and [e1a e2a ema ] respectively On top of

that context vector [e1c e2c enc ] and obtained position

embedding [p1 p2 pn] will be concatenated to get finalword embedding representation of each word in a sentenceas [w1 w2 wn]

33 Introduction of Gate Recurrent Unit (GRU)Recurrent neural network is a widespread networkemployed in natural language processing in recent yearsOne advantage of RNN is the fact that it can process var-iable-length text sequence and extract key features of asentence However model performance of traditional RNNin the case of long sentences has been mostly restricted bythe problems of gradient disappearance and explosionduring the training process As a result RNN was unable tosend vital information in text back and forth

A great deal of previous research into RNN has focusedon model variants Much of the current scholars paysparticular attention to LSTM models and the GRU modelsBoth LSTM and GRU provide a gate mechanism so that theneural network can reserve the important information andforget those that are less relevant to the current state It hasbeen universally accepted that GRU has the advantages offewer necessary parameters and lower network complexitycompared with LSTM (erefore this paper plans to use theGRU model to extract the key features of word embedding

Details of GRU are illustrated according to the networkstructure shown in Figure 2 (e GRU simplifies four gatemechanisms of LSTM ie input gate output gate forget gateand cell state into two gates that are called reset gate andupdate gate At any time step t GRU includes three param-eters reset gate rt update gate zt and hidden state ht All theparameters are updated according to the following equationszt σ U

zmiddot xt + W

zmiddot htminus1( 1113857

rt σ Ur

middot xt + Wr

middot htminus1( 1113857

1113957ht tanh Uh

middot xt + Wh

middot htminus1 ⊙ rt( 11138571113872 1113873 ht 1 minus zt( 1113857⊙ htminus1 + zt ⊙ 1113957ht

(4)

(e symbolic meaning are described as follows xt de-notes the input word embedding at time t htminus1 representsthe hidden state at time t minus 1 Uz Ur Uh isin Rdwtimesdh andWz Wr Wh isin Rdhtimesdh denote weight matrices where dh

indicates the dimension of hidden state σ and tanh denotethe sigmoid and tanh function respectively Notation middot is thedot product and ⊙ is the elementwise multiplication

In this study we choose bidirectional GRU to obtain thevector representation of a hidden layer for the target termand context which can extract more comprehensive featurescompared with normal GRU (e bidirectional GRU con-sists of a forward hidden layer state h

ti

rarrisin Rdh and a backward

hidden layer state hti

larrisin Rdh First of all this work concat-

enates the hidden states from opposite directions to get the

hidden layer embedding hti [h

ti

rarr h

ti

larr] isin R2dh Some vital

features of a sentence that we required are included in hti On

top of that two hidden layer embeddings h1ia (1le ilem) and

h1jw (1le jle n) can be obtained by processing the target term

and context with bidirectional GRU where m and n denotesthe number of target terms and context in a sentence re-spectively Let notations h2i

a (1le ilem) and h2jw (1le jle n)

denote the hidden layer embeddings induced by h1ia and h

1jw

respectively Finally we can iteratively put obtained vectors[h11

a h12a h1m

a ] and [h11w h12

w h1nw ] into bidirectional

GRU and generate the following hidden layer embeddings[h21

a h22a h2m

a ] and [h21w h22

w h2nw ] respectively (e

latter carries higher level features than previous embeddings

34 Position-Based Multilevel Interactive Bidirectional At-tentionNetwork Interaction information between target termsand context is particularly useful in studying sentiment clas-sification (is study performs mean pooling on hidden layerembeddings [h11

a h12a h1m

a ] [h21a h22

a h2ma ]

[h11w h12

w h1nw ] and [h21

w h22w h2n

w ] in order to get in-teractive information respectively (e pooled results are thenaveraged as a part of input for the attentionmechanism and thefinal averaged embedding contains most of semantic features(e formulas used for average pooling are described as follows

wavg1 1113936

ni1h

1iw

n

aavg1 1113936

mi1h

1ia

m

wavg2 1113936

ni1h

2iw

n

aavg2 1113936

mi1h

2ia

m

wavg wavg1 + wavg21113872 1113873

2

aavg aavg1 + aavg21113872 1113873

2

(5)

σ σ

htndash1

rt zt ht

ht

1ndash

tanh

xt

~

Figure 2 Representation of GRU structure diagram

6 Discrete Dynamics in Nature and Society

(e bidirectional GRU can extract more informationcarried by words in a given sentence and convert them intohidden states However the words differ in their importanceto the target term We should capture the relevant contextfor different target terms and then design a strategy toimprove the accuracy of classification by increasing theintensity of the modelrsquos attention to these words Weightscore can be used to express the degree of model concern(e higher the score the greater the correlation betweentarget terms and context Hence an attention mechanism isdeveloped to calculate the weight score between differenttarget terms and context If the sentiment polarity of a targetterm is determined by the model we should pay greaterattention to the words that have a higher score

(is study calculates the attention weight score in op-posite directions One is from target terms to context andthe other is from context to target terms (erefore the two-way approach was chosen because we can get two weightscore matrices in order to improve the performance of ourmodel (e process of using the attention mechanism in themodel are described as shown in Figure 1 First of all thetarget term embedding matrix [h21

a h22a h2m

a ] and theaveraged context embedding wavg are calculated to obtainthe attention vector αi

αi exp f h2i

a wavg1113872 11138731113872 1113873

1113936mj1 exp f h

2ja wavg1113872 11138731113872 1113873

f h2ia wavg1113872 1113873 tanh w

Tavg middot Wm middot h

2ia + bm1113872 1113873

(6)

where f is a score function that calculates the importance ofh2i

a in the target term Wm isin R2dhtimes2dh and bm isin R1times1 are theweight matrix and the bias respectively notation middot is the dotproduct wT

avg is the transpose of wavg and tanh represents anonlinear activation function

On top of that the averaged target term embedding aavgand context embedding matrix [h21

w h22w h2n

w ] are appliedto calculate the attention vector βi for the context (eequation is described as follows

βi exp f h2i

w aavg1113872 11138731113872 1113873

1113936nj1 exp f h

2jw aavg1113872 11138731113872 1113873

(7)

If we obtain the attention weight vectors αi and βi thetarget term representation a and the context representationw can be deduced by the following equations

a 1113944m

i1αih

2ia w 1113944

n

i1βih

2iw (8)

where a isin R2dh and w isin R2dh denote the final word em-bedding representations of the target term and contextwhich will be processed in the output layer

35 Output Layer Target term and context representationsdescribed in Section 34 will be concatenated asd [a w] isin R4dh at the output layer (en a nonlineartransformation layer and a softmax classifier are prepared

according to equation (9) to calculate the sentiment prob-ability value

z tanh d middot Wn + bn( 1113857 (9)

where Wn isin R4dhtimesdc and bn isin Rdc are the weight matrix andthe bias respectively and dc represents the number ofclassifications Probability Pk of category was analyzed byusing the following softmax function

Pk exp zk( 1113857

1113936kj1 exp zk( 1113857

(10)

36 Model Training In order to improve the model per-formance for the tasks of aspect-level sentiment classifica-tion this approach deals with the optimization of trainingprocess including word embedding layer bidirectionalGRU neural network layer attention layer and nonlinearlayer (e crossentropy with L2 regularization is applied asthe loss function which is defined as follows

L minus 1113944S

i1yilog( 1113957yi) +

12λθ

2 (11)

where λ is a regularization coefficient θ2 represents the L2regulation yi denotes the correct sentiment polarity intraining dataset and 1113957yi denotes the predicted sentimentpolarity for a sentence by using the proposed model (eparameter Θ is updated according to the gradient calculatedby using a backpropagation method (e formula is asfollows

Θ Θ minus λzL(Θ)

zΘ (12)

where λ is the learning rate In the training process themethod designs dropout strategy to randomly remove somefeatures of the hidden layer in order to avoid overfitting

4 Experiments

Section 3 has shown the theoretical formulas and operationalsteps for theMI-biGRUmodel(is section has attempted toprovide a series of experiments relating to four public aspect-level sentiment classification datasets from different do-mains (e aim of the experiments is to test the feasibility ofapplying MI-biGRU to deal with aspect-level tasks andevaluate the effectiveness of the proposed MI-biGRUmodel

41 Experimental Setting

411 Dataset and Preprocess SemEvals are one of the mostpopular aspect-level comment dataset series which report alarge amount of customer comments from the domain ofrestaurant and laptop salesWe extract SemEval 2014 (httpaltqcriorgsemeval2014task4) SemEval2015 (httpaltqcriorgsemeval2015task12) and SemEval2016 (httpaltqcriorgsemeval2016task5) from the series as the ex-perimental data SemEval 2014 includes two datasets forrestaurant and laptop sales respectively SemEvals 2015 and

Discrete Dynamics in Nature and Society 7

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 2: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

proposes a multilevel interactive bidirectional attentionnetwork model integrating bidirectional GRU and positioninformation to improve the accuracy of aspect-level senti-ment predictions

Traditional published methods for processing aspect-level tasks are limited to the selection of feature sets (efocus of these studies such as bag-of-words and sentimentlexicons [5] is to manually label a large number of featuresScholars have long debated the waste of labour on manualmarking However the existing studies indicate that thequality of training models largely depends on the con-structed labelled featured set Recently investigators haveexamined the effects of deep learning compared with tra-ditional manual generation methods in NLP tasks [6 7] (eformer has a clear advantage

Recurrent neural network (RNN) can extract the es-sential features of word embedding by using a multilevelrecurrent mechanism and then generate a vector repre-sentation of the target sentences Most of sentiment clas-sification models using RNN can achieve acceptable resultsthrough well-established tuning steps [8] More recent at-tention on the sentiment classification tasks has focused onthe provision of RNN variants (e first method to improvethe models is to adjust their structures For example target-dependent long short term memory (TD-LSTM) [9] candivide a context into left and right parts according to targetterms (en the hidden states to deal with aspect-level tasksare generated by combining two LSTM models in structure(e second is characterized by a change in the input of themodels For example there are methods to associate thetarget term vectors with the context vectors as the wholeinput of the LSTM model which can realize aspect-leveltasks by enhancing the semantic features of the words [10]

Research on the method of neural networks on aspect-level tasks has been mostly restricted to limited performanceimprovement However few studies have been able to drawon systematic research into the importance of the words in asentence In other words we cannot effectively identifywhich words in a sentence are more indispensable andcannot accurately locate these key words in aspect-leveltasks Fortunately attention mechanisms which are widelyused in machine translation [11] image recognition [12]and reading comprehension [13 14] can solve this problemAttention mechanisms [15] can be utilized to measure theimportance of each word in a context to the target terms inwhich attentions are ultimately expressed as weight score(e model will focus more attention on the words with highweight score and extract more information from the wordsrelated to the target terms thus improving the performanceof classification Some scholars have invested in this domainand achieved excellent results such as AE-LSTM ATAE-LSTM [16] MemNet [17] and IAN [18] However theinfluence of position parameters of target terms on classi-fication performance has remained unclear [19 20] (isindicates a need to understand the actual contribution of theposition information

Researchers review that the sentiment polarity of a targetterm contained in a sentence is related to the context aroundit but not to those with greater distant A well-constructed

aspect-level model should allocate higher weight score to thecontexts that possess closer relative distance to the targetterm (e idea can be illustrated briefly by the followingsentence ldquothey use fancy ingredients but even fancy in-gredients do not make for good pizza unless someone knowshow to get the crust rightrdquo In this case the polarity ispositive negative and neutral when the target term is settingas the word ldquoingredientsrdquo ldquopizzardquo and ldquocrustrdquo respectivelyIn order to decide the polarity we should intuitively con-centrate on the words that are close to the target one andthen consider the other words far away from it Hence theword ldquofancyrdquo in the case compared to other words such asldquogoodrdquo and ldquogetrdquo will make a greater contribution to de-termine the polarity of target term ldquoingredientsrdquo Conse-quently adding position features can enrich word semanticsin the embedding process (is work attempts to illuminatethe fact that a model with position information can learnmore sentence features on aspect-level tasks

(is study proposes an improved aspect-level classifi-cation model by combining multilevel interactive bidirec-tional Gated Recurrent Unit attention mechanisms andposition features (MI-biGRU) (e function of the modelconsists of three parts

(1) Calculate the positional index of each word in thesentence based on the current target term and ex-press it in an embedding

(2) Extract semantic features of target words and contextusing multilevel bidirectional GRU neural network

(3) Use bidirectional attention mechanism to obtain theweight score matrix of hidden states and determinerelevance of each context to the target word

(e model not only extracts the abstract semantic fea-tures of sentences but also calculates the position features ofwords in parallel by a multilevel structure A vector rep-resentation with more features can be obtained according tothe bidirectional attention mechanism which can enhancethe performance of sentiment classification tasks On top ofthat bidirectional embeddings can be brought together totackle accurate sentiment classification at the fine-grainedlevel Finally the effectiveness of this model will be evaluatedby using four public aspect-level sentiment datasets (eexperimental results show that the proposed model canachieve good sentiment discrimination performance at as-pect-level on all datasets

(is paper is organized as follows Section 2 introducesthe related work Section 3 formulates the improved modelMI-biGRU that is composed of multilevel interactive bidi-rectional Gated Recurrent Unit attention mechanisms andposition features Section 4 deals with experiments andreporting results on aspect-level sentiment analysis Con-clusions and future work are presented in Section 5

2 Related Work

(is section introduces the development of sentimentanalysis in recent years (e general research can be dividedinto three parts traditional sentiment analysis methods

2 Discrete Dynamics in Nature and Society

neural network-based methods and applications of atten-tion mechanism in aspect-level tasks

21 Traditional Sentiment Analysis Methods Existing tra-ditional methods on sentiment classification are extensiveand focus particularly on machine learning technologieswhich solve two problems text representation and featureextraction First of all several studies have used supportvector machines (SVM) to deal with text representationresearch studies in sentiment classification tasks [21]According to the formulation of SVM all the words of thetext do not make a distinction between target terms andnormal context (ere are other relatively text representa-tion methods of the literature that is concerned with sen-timent words [22 23] tokens [24] or dependency pathdistance [24] Above methods are called coarse-grainedclassifications On top of that the majority of studies onfeature extraction have obtained the sentiment lexicon andbag-of-words features [25ndash27] (ese methods have beenplaying an increasingly important role in improving theperformance of classification Yet these existing types ofapproaches have given rise to a lot of heated debate (emodel training is heavily dependent on the features weextract Manually labelling features will inevitably take a lotof manpower and time resources As a result the classifi-cation performance is low because of the high dimension ofuseful information if the features are obtained from anunlabelled text

22 Neural Network-Based Sentiment Analysis MethodsUsing neural network-based methods has become increas-ingly popular among the sentiment classification tasks fortheir flexible structure and pleasant performance [28] Forexample all the models such as Recursive Neural Networks[29] Recursive Neural Tensor Networks [30] Tree-LSTMs[31] and Hierarchical LSTMs [32] more or less enhance theaccuracy of sentiment classification by different constructivemodel structures (e models above improve the accuracycompared with traditional machine learning However re-searchers have come to recognize their inadequacies Ig-noring to distinguish the target terms of a sentence willgreatly decrease the classification effect (erefore somescholars in academia have turned their attention to the targetterms Jiang et al performed a similar series of experimentsto show the significance of target terms for sentimentclassification tasks [5] Tang et al reviewed the literaturefrom the period and proposed two improved models TD-LSTM and TC-LSTM which can deal with the problem ofautomatic target term extraction context feature enhance-ment and classification performance improvement Zhanget al conducted a series of trials on sentiment analysis inwhich they constructed a neural network model with twogate mechanisms [33] (e mechanisms implement thefunctions of extracting grammatical and semantic infor-mation and the relationship information between the leftand right context for a target term respectively Finallyinformation extracted by two gate mechanisms is aggregatedfor sentiment classification Overall these studies highlight

the need for target terms but there is no reference to theposition of such terms and the relationship between positioninformation and classification performance

23 Application of Attention Mechanism in Aspect-LevelSentiment Analysis Deep learning technologies are originallyapplied in the field of images which have gradually turned toNLP area and achieved excellent results Attention mechanismsin deep learning are necessarily serve as an effective way tohighly accurate sentiment classification A fewNLP researcherswho have surveyed the intrinsic relevance between the contextand the target terms in sentences have been found For ex-ample Zeng et al designed an attention-based LSTM for aspect-level sentiment classification [10] which processes target termembedding and word embedding in the pretraining step si-multaneously (en the target term vector will be put into anattention network to calculate the term weight More recentattention has focused on aspect-level tasks with similar attentionmechanisms Tang et al designed a deep memory network [17]with reporting multiple computational layers Each layer is acontext-based attention model through which the relationshipweight from context to target terms can be obtained Ma et alsuggest the deep semantic association between context andtarget terms by proposing an interactive attention model [18](is model obtains the two-way weight and combines them toperform aspect-level sentiment classification

Most of the improved neural network models can achievebetter results compared with the original one However thesemethods ignore the role of position relationship betweencontext and target terms As a result the polarity of the targetterms must be affected under certain positional relationships(e study of Zeng et al subsequent offers some importantinsights into the application of position information in clas-sification tasks [34] For example understanding the distancebetween context and target term and how to present suchdistance by embedding representationwill help our aspect-levelwork (e work of Gu et al uses a position-aware bidirectionalattention network to investigate aspect-level sentiment analysis[20] which provides rich semantic features in embeddingwords expression

As noted above interactive network or position informationis particularly useful in studying aspect-level sentiment analysisSo far little attention has been paid to both of them simulta-neously Hence a combination of both interactive concept andposition parameter was used in this investigation (is studyproposes an improved aspect-level sentiment analysis model bycombining multilevel interactive bidirectional Gated RecurrentUnit attention mechanisms and position features (MI-biGRU)First of all the distance from the context to target term in ourmodel will be prepared according to the similar procedure usedby the scholars Zeng et al [34] On top of that the wordembedding with position information will be trained by using amultilevel bidirectional GRU neural network Finally a bidi-rectional interactive attentionmechanism is used to compute theweight matrix to identify the context possessing the semanticassociation with the target termMI-biGRU can perform amoreclassification accuracy for aspect-level tasks than previousmodels which will be shown in Section 4

Discrete Dynamics in Nature and Society 3

3 Model Description

(is section presents the details of the model MI-biGRU foraspect-level sentiment analysis In the previous work severaldefinitions with symbolic differences have been proposed forsentiment analysis Hence we need to provide the basicconcepts and notations of MI-biGRU classification involvedin this paper

A sentiment tricategory problem represented in MI-biGRU is associated with a three-tuple polarity set (positivenegative and neutral) Given a sentence with n words in-cluding context and target terms A target term is usuallydenoted by the word group composed of one or more ad-jacent words in context where the position of the first andlast word in the word group is called the start and endposition respectively (e target term embedding sequencecan be denoted by [e1a e2a em

a ] with m predeterminedtarget terms Notation [p1 p2 pn] represents the rela-tive distance embedding from each word wi

c i isin 1 2 nof a sentence to a target term

(e overall architecture of MI-biGRU can be illustratedin Figure 1 (is model possesses two inputs One is a wordembedding sequence [w1 w2 wn] that consists of acontext embedding sequence [e1c e2c en

c ] and a positionembedding sequence [p1 p2 pn] (e other is a targetterm embedding sequence [e1a e2a em

a ] (e goal of themodel is to extract enough semantic information fromtwo embedding sequences and combine them to performaspect-level sentiment classification Notations employed torepresent the components of MI-biGRU are described inTable 1 (e details of the model are divided into six stepsbased on their execution order

31 Position Representation Aspect-level tasks havebenefited a lot from position embedding representation formore valuable word features [34] (e concept of relativedistance between words serves to quantify the relevance of asentence word to a target term We are required to representthe position information by a embeddable vector patternwhich can be formalized as an integer vector or a matrixdepending on whether there concerns unique or multipletarget terms

First of all the word position index of a target term in asentence will be marked as the cardinal point ldquo0rdquo Discretespacing from the ith word wi

c in a sentence to the cardinalpoint is called the relative distance of wi

c which is denoted bypi and can be calculated by the formula

pi

as minus i1113868111386811138681113868

1113868111386811138681113868 i lt as

0 as lt ilt ae

ae minus i1113868111386811138681113868

1113868111386811138681113868 i gt ae

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

(1)

Extending this concept in a sentence with n words givesus the following position index list P [p1 p2 pn](is can be illustrated briefly by the following two examplesFirstly if the unique word ldquoquantityrdquo is applied as the targetterm in the sentence ldquothe quantity is also very good you willcome out satisfiedrdquo we can develop the position index list

[1 0 1 2 3 4 5 6 7 8 9] by setting a cardinal point ldquo0rdquo forthe second word ldquoquantityrdquo and deriving an increasinglypositive integer from the left or right direction for otherwords Secondly if the target term contains more than oneadjacent word all the internal words are assigned as thecardinal point ldquo0rdquo Other words in the sentence will obtainan increasingly positive integer from the start position of theterm to left direction or from the end position of the term toright direction (erefore the position index list[6 5 4 3 2 1 0 0 1 2 3 4 5 6 7] will be obtained if we setthe case sentence and the target term with ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo and ldquointerior decorationrdquo respectively

On top of that if multiple target terms are applied to asentence we can obtain a position index list sequence that iscalled the positionmatrix Assuming that a sentence has n wordsand m target terms let notationPi denote the list index of theith target term in a sentence Position matrix G is defined as

G

P1

P2

Pm

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

isin Rmtimesn (2)

where m refers to the number of target terms and n is thenumber of words in the sentence (en we use a positionembedding matrix P isin Rdptimesn to convert position index se-quence into a position embedding where dp refers to thedimension of position embedding P is initialized randomlyand updated during the model training process (e matrixis further exemplified in the same example ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo which contains two target terms ldquointerior decorationrdquoand ldquochefsrdquo We can obtain the position matrix

G P1

P21113890 1113891

6 5 4 3 2 1 0 0 1 2 3 4 5 6 7

14 13 12 11 10 9 8 7 6 5 4 3 2 1 01113890 1113891

(3)

(e position matrix has been playing an increasinglyimportant role in helping researchers get a better sense ofaspect-level tasks We can first observe the polarity of theemotional words which are near the target term and thenconsider the other words which are far away to judgewhether a sentence is positive or not For example aspresented in the case ldquothe quantity is also very good you willcome out satisfiedrdquo the distance of ldquogoodrdquo (4) is closer thanldquosatisfiedrdquo (9) by a simple numerical comparison accordingto the index list [1 0 1 2 3 4 5 6 7 8 9] and the approachwill give priority to ldquogoodrdquo instead of ldquosatisfiedrdquo when wejudge the sentiment polarity of the subject ldquoquantityrdquo of thesentence (is study suggests that adding position infor-mation to initialize word embedding can provide morefeatures to perform aspect-level sentiment classification

32 Word Representation One of the basic tasks of senti-ment analysis is to present each word in a given sentence byembedding operation A feasible approach is to embed each

4 Discrete Dynamics in Nature and Society

word in a low-dimensional real value vector through theword embedding matrix E isin Rdwtimesv where dw represents thedimension of word embedding and v denotes the size ofvocabulary Matrix E is generally initialized by randomnumber generation operation (en matrix weight will beupdated to reach a stable value in the process of model

training Another feasible method to obtain the matrix E isto pretrain it through the existing corpus [35]

(is study uses pretrained Glove (pretrained wordvectors of Glove can be obtained from httpnlpstanfordeduprojectsglove) from Stanford University to get wordembeddings Four sets of sequence symbols are applied to

Pool

Pool Pool

Pool

AttentionAttention

Attention layer

Hiddenstate layer 2

Hiddenstate layer 1

Sentiment classification

Softmax

Wordembedding layer

Final representation

GRU

GRU

GRU GRU

GRUGRU

GRU

GRU

GRU

GRU

GRU

GRU

GRU

Concatenate

ha21 ha22 ha2m hw21 hw22 hw2n

hw11 hw12 hw1n

GRUGRU

GRUGRU

GRU

GRU

GRUGRU GRU

GRU GRU

ha11 ha12 ha1m

ea1 ea2

ec1 ecn

eam

p1 pn

w1 w2 wn

+ + + + + +

+ + ++++

Aspect-level sentiment analysis using MI-biGRU

Figure 1 Architecture of MI-biGRU

Table 1 Component description of the proposed model

Notation Descriptionwi

c (e ith word in a contextw

ja (e jth word in a target term

pi Relative distance from the ith word to a target term in a sentencewi Concatenation of the ith context and position embeddingas Start position of a target termae End position of a target termei

c Embedding of the ith context in a sentenceei

a Embedding of the ith target term in a sentenceh1i

w First hidden layer vector formed from wi

h1ia First hidden layer vector formed from ei

a

h2iw Second hidden layer vector formed from h1i

w

h2ia Second hidden layer vector formed from h1i

a

Discrete Dynamics in Nature and Society 5

distinguish texts and their embedding form In a sentencethe context and its embedding with n words can be denotedby [w1

c w2c wn

c ] and [e1c e2c enc ] respectively (e m

target terms and their embeddings can be denoted by[w1

a w2a wm

a ] and [e1a e2a ema ] respectively On top of

that context vector [e1c e2c enc ] and obtained position

embedding [p1 p2 pn] will be concatenated to get finalword embedding representation of each word in a sentenceas [w1 w2 wn]

33 Introduction of Gate Recurrent Unit (GRU)Recurrent neural network is a widespread networkemployed in natural language processing in recent yearsOne advantage of RNN is the fact that it can process var-iable-length text sequence and extract key features of asentence However model performance of traditional RNNin the case of long sentences has been mostly restricted bythe problems of gradient disappearance and explosionduring the training process As a result RNN was unable tosend vital information in text back and forth

A great deal of previous research into RNN has focusedon model variants Much of the current scholars paysparticular attention to LSTM models and the GRU modelsBoth LSTM and GRU provide a gate mechanism so that theneural network can reserve the important information andforget those that are less relevant to the current state It hasbeen universally accepted that GRU has the advantages offewer necessary parameters and lower network complexitycompared with LSTM (erefore this paper plans to use theGRU model to extract the key features of word embedding

Details of GRU are illustrated according to the networkstructure shown in Figure 2 (e GRU simplifies four gatemechanisms of LSTM ie input gate output gate forget gateand cell state into two gates that are called reset gate andupdate gate At any time step t GRU includes three param-eters reset gate rt update gate zt and hidden state ht All theparameters are updated according to the following equationszt σ U

zmiddot xt + W

zmiddot htminus1( 1113857

rt σ Ur

middot xt + Wr

middot htminus1( 1113857

1113957ht tanh Uh

middot xt + Wh

middot htminus1 ⊙ rt( 11138571113872 1113873 ht 1 minus zt( 1113857⊙ htminus1 + zt ⊙ 1113957ht

(4)

(e symbolic meaning are described as follows xt de-notes the input word embedding at time t htminus1 representsthe hidden state at time t minus 1 Uz Ur Uh isin Rdwtimesdh andWz Wr Wh isin Rdhtimesdh denote weight matrices where dh

indicates the dimension of hidden state σ and tanh denotethe sigmoid and tanh function respectively Notation middot is thedot product and ⊙ is the elementwise multiplication

In this study we choose bidirectional GRU to obtain thevector representation of a hidden layer for the target termand context which can extract more comprehensive featurescompared with normal GRU (e bidirectional GRU con-sists of a forward hidden layer state h

ti

rarrisin Rdh and a backward

hidden layer state hti

larrisin Rdh First of all this work concat-

enates the hidden states from opposite directions to get the

hidden layer embedding hti [h

ti

rarr h

ti

larr] isin R2dh Some vital

features of a sentence that we required are included in hti On

top of that two hidden layer embeddings h1ia (1le ilem) and

h1jw (1le jle n) can be obtained by processing the target term

and context with bidirectional GRU where m and n denotesthe number of target terms and context in a sentence re-spectively Let notations h2i

a (1le ilem) and h2jw (1le jle n)

denote the hidden layer embeddings induced by h1ia and h

1jw

respectively Finally we can iteratively put obtained vectors[h11

a h12a h1m

a ] and [h11w h12

w h1nw ] into bidirectional

GRU and generate the following hidden layer embeddings[h21

a h22a h2m

a ] and [h21w h22

w h2nw ] respectively (e

latter carries higher level features than previous embeddings

34 Position-Based Multilevel Interactive Bidirectional At-tentionNetwork Interaction information between target termsand context is particularly useful in studying sentiment clas-sification (is study performs mean pooling on hidden layerembeddings [h11

a h12a h1m

a ] [h21a h22

a h2ma ]

[h11w h12

w h1nw ] and [h21

w h22w h2n

w ] in order to get in-teractive information respectively (e pooled results are thenaveraged as a part of input for the attentionmechanism and thefinal averaged embedding contains most of semantic features(e formulas used for average pooling are described as follows

wavg1 1113936

ni1h

1iw

n

aavg1 1113936

mi1h

1ia

m

wavg2 1113936

ni1h

2iw

n

aavg2 1113936

mi1h

2ia

m

wavg wavg1 + wavg21113872 1113873

2

aavg aavg1 + aavg21113872 1113873

2

(5)

σ σ

htndash1

rt zt ht

ht

1ndash

tanh

xt

~

Figure 2 Representation of GRU structure diagram

6 Discrete Dynamics in Nature and Society

(e bidirectional GRU can extract more informationcarried by words in a given sentence and convert them intohidden states However the words differ in their importanceto the target term We should capture the relevant contextfor different target terms and then design a strategy toimprove the accuracy of classification by increasing theintensity of the modelrsquos attention to these words Weightscore can be used to express the degree of model concern(e higher the score the greater the correlation betweentarget terms and context Hence an attention mechanism isdeveloped to calculate the weight score between differenttarget terms and context If the sentiment polarity of a targetterm is determined by the model we should pay greaterattention to the words that have a higher score

(is study calculates the attention weight score in op-posite directions One is from target terms to context andthe other is from context to target terms (erefore the two-way approach was chosen because we can get two weightscore matrices in order to improve the performance of ourmodel (e process of using the attention mechanism in themodel are described as shown in Figure 1 First of all thetarget term embedding matrix [h21

a h22a h2m

a ] and theaveraged context embedding wavg are calculated to obtainthe attention vector αi

αi exp f h2i

a wavg1113872 11138731113872 1113873

1113936mj1 exp f h

2ja wavg1113872 11138731113872 1113873

f h2ia wavg1113872 1113873 tanh w

Tavg middot Wm middot h

2ia + bm1113872 1113873

(6)

where f is a score function that calculates the importance ofh2i

a in the target term Wm isin R2dhtimes2dh and bm isin R1times1 are theweight matrix and the bias respectively notation middot is the dotproduct wT

avg is the transpose of wavg and tanh represents anonlinear activation function

On top of that the averaged target term embedding aavgand context embedding matrix [h21

w h22w h2n

w ] are appliedto calculate the attention vector βi for the context (eequation is described as follows

βi exp f h2i

w aavg1113872 11138731113872 1113873

1113936nj1 exp f h

2jw aavg1113872 11138731113872 1113873

(7)

If we obtain the attention weight vectors αi and βi thetarget term representation a and the context representationw can be deduced by the following equations

a 1113944m

i1αih

2ia w 1113944

n

i1βih

2iw (8)

where a isin R2dh and w isin R2dh denote the final word em-bedding representations of the target term and contextwhich will be processed in the output layer

35 Output Layer Target term and context representationsdescribed in Section 34 will be concatenated asd [a w] isin R4dh at the output layer (en a nonlineartransformation layer and a softmax classifier are prepared

according to equation (9) to calculate the sentiment prob-ability value

z tanh d middot Wn + bn( 1113857 (9)

where Wn isin R4dhtimesdc and bn isin Rdc are the weight matrix andthe bias respectively and dc represents the number ofclassifications Probability Pk of category was analyzed byusing the following softmax function

Pk exp zk( 1113857

1113936kj1 exp zk( 1113857

(10)

36 Model Training In order to improve the model per-formance for the tasks of aspect-level sentiment classifica-tion this approach deals with the optimization of trainingprocess including word embedding layer bidirectionalGRU neural network layer attention layer and nonlinearlayer (e crossentropy with L2 regularization is applied asthe loss function which is defined as follows

L minus 1113944S

i1yilog( 1113957yi) +

12λθ

2 (11)

where λ is a regularization coefficient θ2 represents the L2regulation yi denotes the correct sentiment polarity intraining dataset and 1113957yi denotes the predicted sentimentpolarity for a sentence by using the proposed model (eparameter Θ is updated according to the gradient calculatedby using a backpropagation method (e formula is asfollows

Θ Θ minus λzL(Θ)

zΘ (12)

where λ is the learning rate In the training process themethod designs dropout strategy to randomly remove somefeatures of the hidden layer in order to avoid overfitting

4 Experiments

Section 3 has shown the theoretical formulas and operationalsteps for theMI-biGRUmodel(is section has attempted toprovide a series of experiments relating to four public aspect-level sentiment classification datasets from different do-mains (e aim of the experiments is to test the feasibility ofapplying MI-biGRU to deal with aspect-level tasks andevaluate the effectiveness of the proposed MI-biGRUmodel

41 Experimental Setting

411 Dataset and Preprocess SemEvals are one of the mostpopular aspect-level comment dataset series which report alarge amount of customer comments from the domain ofrestaurant and laptop salesWe extract SemEval 2014 (httpaltqcriorgsemeval2014task4) SemEval2015 (httpaltqcriorgsemeval2015task12) and SemEval2016 (httpaltqcriorgsemeval2016task5) from the series as the ex-perimental data SemEval 2014 includes two datasets forrestaurant and laptop sales respectively SemEvals 2015 and

Discrete Dynamics in Nature and Society 7

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 3: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

neural network-based methods and applications of atten-tion mechanism in aspect-level tasks

21 Traditional Sentiment Analysis Methods Existing tra-ditional methods on sentiment classification are extensiveand focus particularly on machine learning technologieswhich solve two problems text representation and featureextraction First of all several studies have used supportvector machines (SVM) to deal with text representationresearch studies in sentiment classification tasks [21]According to the formulation of SVM all the words of thetext do not make a distinction between target terms andnormal context (ere are other relatively text representa-tion methods of the literature that is concerned with sen-timent words [22 23] tokens [24] or dependency pathdistance [24] Above methods are called coarse-grainedclassifications On top of that the majority of studies onfeature extraction have obtained the sentiment lexicon andbag-of-words features [25ndash27] (ese methods have beenplaying an increasingly important role in improving theperformance of classification Yet these existing types ofapproaches have given rise to a lot of heated debate (emodel training is heavily dependent on the features weextract Manually labelling features will inevitably take a lotof manpower and time resources As a result the classifi-cation performance is low because of the high dimension ofuseful information if the features are obtained from anunlabelled text

22 Neural Network-Based Sentiment Analysis MethodsUsing neural network-based methods has become increas-ingly popular among the sentiment classification tasks fortheir flexible structure and pleasant performance [28] Forexample all the models such as Recursive Neural Networks[29] Recursive Neural Tensor Networks [30] Tree-LSTMs[31] and Hierarchical LSTMs [32] more or less enhance theaccuracy of sentiment classification by different constructivemodel structures (e models above improve the accuracycompared with traditional machine learning However re-searchers have come to recognize their inadequacies Ig-noring to distinguish the target terms of a sentence willgreatly decrease the classification effect (erefore somescholars in academia have turned their attention to the targetterms Jiang et al performed a similar series of experimentsto show the significance of target terms for sentimentclassification tasks [5] Tang et al reviewed the literaturefrom the period and proposed two improved models TD-LSTM and TC-LSTM which can deal with the problem ofautomatic target term extraction context feature enhance-ment and classification performance improvement Zhanget al conducted a series of trials on sentiment analysis inwhich they constructed a neural network model with twogate mechanisms [33] (e mechanisms implement thefunctions of extracting grammatical and semantic infor-mation and the relationship information between the leftand right context for a target term respectively Finallyinformation extracted by two gate mechanisms is aggregatedfor sentiment classification Overall these studies highlight

the need for target terms but there is no reference to theposition of such terms and the relationship between positioninformation and classification performance

23 Application of Attention Mechanism in Aspect-LevelSentiment Analysis Deep learning technologies are originallyapplied in the field of images which have gradually turned toNLP area and achieved excellent results Attention mechanismsin deep learning are necessarily serve as an effective way tohighly accurate sentiment classification A fewNLP researcherswho have surveyed the intrinsic relevance between the contextand the target terms in sentences have been found For ex-ample Zeng et al designed an attention-based LSTM for aspect-level sentiment classification [10] which processes target termembedding and word embedding in the pretraining step si-multaneously (en the target term vector will be put into anattention network to calculate the term weight More recentattention has focused on aspect-level tasks with similar attentionmechanisms Tang et al designed a deep memory network [17]with reporting multiple computational layers Each layer is acontext-based attention model through which the relationshipweight from context to target terms can be obtained Ma et alsuggest the deep semantic association between context andtarget terms by proposing an interactive attention model [18](is model obtains the two-way weight and combines them toperform aspect-level sentiment classification

Most of the improved neural network models can achievebetter results compared with the original one However thesemethods ignore the role of position relationship betweencontext and target terms As a result the polarity of the targetterms must be affected under certain positional relationships(e study of Zeng et al subsequent offers some importantinsights into the application of position information in clas-sification tasks [34] For example understanding the distancebetween context and target term and how to present suchdistance by embedding representationwill help our aspect-levelwork (e work of Gu et al uses a position-aware bidirectionalattention network to investigate aspect-level sentiment analysis[20] which provides rich semantic features in embeddingwords expression

As noted above interactive network or position informationis particularly useful in studying aspect-level sentiment analysisSo far little attention has been paid to both of them simulta-neously Hence a combination of both interactive concept andposition parameter was used in this investigation (is studyproposes an improved aspect-level sentiment analysis model bycombining multilevel interactive bidirectional Gated RecurrentUnit attention mechanisms and position features (MI-biGRU)First of all the distance from the context to target term in ourmodel will be prepared according to the similar procedure usedby the scholars Zeng et al [34] On top of that the wordembedding with position information will be trained by using amultilevel bidirectional GRU neural network Finally a bidi-rectional interactive attentionmechanism is used to compute theweight matrix to identify the context possessing the semanticassociation with the target termMI-biGRU can perform amoreclassification accuracy for aspect-level tasks than previousmodels which will be shown in Section 4

Discrete Dynamics in Nature and Society 3

3 Model Description

(is section presents the details of the model MI-biGRU foraspect-level sentiment analysis In the previous work severaldefinitions with symbolic differences have been proposed forsentiment analysis Hence we need to provide the basicconcepts and notations of MI-biGRU classification involvedin this paper

A sentiment tricategory problem represented in MI-biGRU is associated with a three-tuple polarity set (positivenegative and neutral) Given a sentence with n words in-cluding context and target terms A target term is usuallydenoted by the word group composed of one or more ad-jacent words in context where the position of the first andlast word in the word group is called the start and endposition respectively (e target term embedding sequencecan be denoted by [e1a e2a em

a ] with m predeterminedtarget terms Notation [p1 p2 pn] represents the rela-tive distance embedding from each word wi

c i isin 1 2 nof a sentence to a target term

(e overall architecture of MI-biGRU can be illustratedin Figure 1 (is model possesses two inputs One is a wordembedding sequence [w1 w2 wn] that consists of acontext embedding sequence [e1c e2c en

c ] and a positionembedding sequence [p1 p2 pn] (e other is a targetterm embedding sequence [e1a e2a em

a ] (e goal of themodel is to extract enough semantic information fromtwo embedding sequences and combine them to performaspect-level sentiment classification Notations employed torepresent the components of MI-biGRU are described inTable 1 (e details of the model are divided into six stepsbased on their execution order

31 Position Representation Aspect-level tasks havebenefited a lot from position embedding representation formore valuable word features [34] (e concept of relativedistance between words serves to quantify the relevance of asentence word to a target term We are required to representthe position information by a embeddable vector patternwhich can be formalized as an integer vector or a matrixdepending on whether there concerns unique or multipletarget terms

First of all the word position index of a target term in asentence will be marked as the cardinal point ldquo0rdquo Discretespacing from the ith word wi

c in a sentence to the cardinalpoint is called the relative distance of wi

c which is denoted bypi and can be calculated by the formula

pi

as minus i1113868111386811138681113868

1113868111386811138681113868 i lt as

0 as lt ilt ae

ae minus i1113868111386811138681113868

1113868111386811138681113868 i gt ae

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

(1)

Extending this concept in a sentence with n words givesus the following position index list P [p1 p2 pn](is can be illustrated briefly by the following two examplesFirstly if the unique word ldquoquantityrdquo is applied as the targetterm in the sentence ldquothe quantity is also very good you willcome out satisfiedrdquo we can develop the position index list

[1 0 1 2 3 4 5 6 7 8 9] by setting a cardinal point ldquo0rdquo forthe second word ldquoquantityrdquo and deriving an increasinglypositive integer from the left or right direction for otherwords Secondly if the target term contains more than oneadjacent word all the internal words are assigned as thecardinal point ldquo0rdquo Other words in the sentence will obtainan increasingly positive integer from the start position of theterm to left direction or from the end position of the term toright direction (erefore the position index list[6 5 4 3 2 1 0 0 1 2 3 4 5 6 7] will be obtained if we setthe case sentence and the target term with ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo and ldquointerior decorationrdquo respectively

On top of that if multiple target terms are applied to asentence we can obtain a position index list sequence that iscalled the positionmatrix Assuming that a sentence has n wordsand m target terms let notationPi denote the list index of theith target term in a sentence Position matrix G is defined as

G

P1

P2

Pm

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

isin Rmtimesn (2)

where m refers to the number of target terms and n is thenumber of words in the sentence (en we use a positionembedding matrix P isin Rdptimesn to convert position index se-quence into a position embedding where dp refers to thedimension of position embedding P is initialized randomlyand updated during the model training process (e matrixis further exemplified in the same example ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo which contains two target terms ldquointerior decorationrdquoand ldquochefsrdquo We can obtain the position matrix

G P1

P21113890 1113891

6 5 4 3 2 1 0 0 1 2 3 4 5 6 7

14 13 12 11 10 9 8 7 6 5 4 3 2 1 01113890 1113891

(3)

(e position matrix has been playing an increasinglyimportant role in helping researchers get a better sense ofaspect-level tasks We can first observe the polarity of theemotional words which are near the target term and thenconsider the other words which are far away to judgewhether a sentence is positive or not For example aspresented in the case ldquothe quantity is also very good you willcome out satisfiedrdquo the distance of ldquogoodrdquo (4) is closer thanldquosatisfiedrdquo (9) by a simple numerical comparison accordingto the index list [1 0 1 2 3 4 5 6 7 8 9] and the approachwill give priority to ldquogoodrdquo instead of ldquosatisfiedrdquo when wejudge the sentiment polarity of the subject ldquoquantityrdquo of thesentence (is study suggests that adding position infor-mation to initialize word embedding can provide morefeatures to perform aspect-level sentiment classification

32 Word Representation One of the basic tasks of senti-ment analysis is to present each word in a given sentence byembedding operation A feasible approach is to embed each

4 Discrete Dynamics in Nature and Society

word in a low-dimensional real value vector through theword embedding matrix E isin Rdwtimesv where dw represents thedimension of word embedding and v denotes the size ofvocabulary Matrix E is generally initialized by randomnumber generation operation (en matrix weight will beupdated to reach a stable value in the process of model

training Another feasible method to obtain the matrix E isto pretrain it through the existing corpus [35]

(is study uses pretrained Glove (pretrained wordvectors of Glove can be obtained from httpnlpstanfordeduprojectsglove) from Stanford University to get wordembeddings Four sets of sequence symbols are applied to

Pool

Pool Pool

Pool

AttentionAttention

Attention layer

Hiddenstate layer 2

Hiddenstate layer 1

Sentiment classification

Softmax

Wordembedding layer

Final representation

GRU

GRU

GRU GRU

GRUGRU

GRU

GRU

GRU

GRU

GRU

GRU

GRU

Concatenate

ha21 ha22 ha2m hw21 hw22 hw2n

hw11 hw12 hw1n

GRUGRU

GRUGRU

GRU

GRU

GRUGRU GRU

GRU GRU

ha11 ha12 ha1m

ea1 ea2

ec1 ecn

eam

p1 pn

w1 w2 wn

+ + + + + +

+ + ++++

Aspect-level sentiment analysis using MI-biGRU

Figure 1 Architecture of MI-biGRU

Table 1 Component description of the proposed model

Notation Descriptionwi

c (e ith word in a contextw

ja (e jth word in a target term

pi Relative distance from the ith word to a target term in a sentencewi Concatenation of the ith context and position embeddingas Start position of a target termae End position of a target termei

c Embedding of the ith context in a sentenceei

a Embedding of the ith target term in a sentenceh1i

w First hidden layer vector formed from wi

h1ia First hidden layer vector formed from ei

a

h2iw Second hidden layer vector formed from h1i

w

h2ia Second hidden layer vector formed from h1i

a

Discrete Dynamics in Nature and Society 5

distinguish texts and their embedding form In a sentencethe context and its embedding with n words can be denotedby [w1

c w2c wn

c ] and [e1c e2c enc ] respectively (e m

target terms and their embeddings can be denoted by[w1

a w2a wm

a ] and [e1a e2a ema ] respectively On top of

that context vector [e1c e2c enc ] and obtained position

embedding [p1 p2 pn] will be concatenated to get finalword embedding representation of each word in a sentenceas [w1 w2 wn]

33 Introduction of Gate Recurrent Unit (GRU)Recurrent neural network is a widespread networkemployed in natural language processing in recent yearsOne advantage of RNN is the fact that it can process var-iable-length text sequence and extract key features of asentence However model performance of traditional RNNin the case of long sentences has been mostly restricted bythe problems of gradient disappearance and explosionduring the training process As a result RNN was unable tosend vital information in text back and forth

A great deal of previous research into RNN has focusedon model variants Much of the current scholars paysparticular attention to LSTM models and the GRU modelsBoth LSTM and GRU provide a gate mechanism so that theneural network can reserve the important information andforget those that are less relevant to the current state It hasbeen universally accepted that GRU has the advantages offewer necessary parameters and lower network complexitycompared with LSTM (erefore this paper plans to use theGRU model to extract the key features of word embedding

Details of GRU are illustrated according to the networkstructure shown in Figure 2 (e GRU simplifies four gatemechanisms of LSTM ie input gate output gate forget gateand cell state into two gates that are called reset gate andupdate gate At any time step t GRU includes three param-eters reset gate rt update gate zt and hidden state ht All theparameters are updated according to the following equationszt σ U

zmiddot xt + W

zmiddot htminus1( 1113857

rt σ Ur

middot xt + Wr

middot htminus1( 1113857

1113957ht tanh Uh

middot xt + Wh

middot htminus1 ⊙ rt( 11138571113872 1113873 ht 1 minus zt( 1113857⊙ htminus1 + zt ⊙ 1113957ht

(4)

(e symbolic meaning are described as follows xt de-notes the input word embedding at time t htminus1 representsthe hidden state at time t minus 1 Uz Ur Uh isin Rdwtimesdh andWz Wr Wh isin Rdhtimesdh denote weight matrices where dh

indicates the dimension of hidden state σ and tanh denotethe sigmoid and tanh function respectively Notation middot is thedot product and ⊙ is the elementwise multiplication

In this study we choose bidirectional GRU to obtain thevector representation of a hidden layer for the target termand context which can extract more comprehensive featurescompared with normal GRU (e bidirectional GRU con-sists of a forward hidden layer state h

ti

rarrisin Rdh and a backward

hidden layer state hti

larrisin Rdh First of all this work concat-

enates the hidden states from opposite directions to get the

hidden layer embedding hti [h

ti

rarr h

ti

larr] isin R2dh Some vital

features of a sentence that we required are included in hti On

top of that two hidden layer embeddings h1ia (1le ilem) and

h1jw (1le jle n) can be obtained by processing the target term

and context with bidirectional GRU where m and n denotesthe number of target terms and context in a sentence re-spectively Let notations h2i

a (1le ilem) and h2jw (1le jle n)

denote the hidden layer embeddings induced by h1ia and h

1jw

respectively Finally we can iteratively put obtained vectors[h11

a h12a h1m

a ] and [h11w h12

w h1nw ] into bidirectional

GRU and generate the following hidden layer embeddings[h21

a h22a h2m

a ] and [h21w h22

w h2nw ] respectively (e

latter carries higher level features than previous embeddings

34 Position-Based Multilevel Interactive Bidirectional At-tentionNetwork Interaction information between target termsand context is particularly useful in studying sentiment clas-sification (is study performs mean pooling on hidden layerembeddings [h11

a h12a h1m

a ] [h21a h22

a h2ma ]

[h11w h12

w h1nw ] and [h21

w h22w h2n

w ] in order to get in-teractive information respectively (e pooled results are thenaveraged as a part of input for the attentionmechanism and thefinal averaged embedding contains most of semantic features(e formulas used for average pooling are described as follows

wavg1 1113936

ni1h

1iw

n

aavg1 1113936

mi1h

1ia

m

wavg2 1113936

ni1h

2iw

n

aavg2 1113936

mi1h

2ia

m

wavg wavg1 + wavg21113872 1113873

2

aavg aavg1 + aavg21113872 1113873

2

(5)

σ σ

htndash1

rt zt ht

ht

1ndash

tanh

xt

~

Figure 2 Representation of GRU structure diagram

6 Discrete Dynamics in Nature and Society

(e bidirectional GRU can extract more informationcarried by words in a given sentence and convert them intohidden states However the words differ in their importanceto the target term We should capture the relevant contextfor different target terms and then design a strategy toimprove the accuracy of classification by increasing theintensity of the modelrsquos attention to these words Weightscore can be used to express the degree of model concern(e higher the score the greater the correlation betweentarget terms and context Hence an attention mechanism isdeveloped to calculate the weight score between differenttarget terms and context If the sentiment polarity of a targetterm is determined by the model we should pay greaterattention to the words that have a higher score

(is study calculates the attention weight score in op-posite directions One is from target terms to context andthe other is from context to target terms (erefore the two-way approach was chosen because we can get two weightscore matrices in order to improve the performance of ourmodel (e process of using the attention mechanism in themodel are described as shown in Figure 1 First of all thetarget term embedding matrix [h21

a h22a h2m

a ] and theaveraged context embedding wavg are calculated to obtainthe attention vector αi

αi exp f h2i

a wavg1113872 11138731113872 1113873

1113936mj1 exp f h

2ja wavg1113872 11138731113872 1113873

f h2ia wavg1113872 1113873 tanh w

Tavg middot Wm middot h

2ia + bm1113872 1113873

(6)

where f is a score function that calculates the importance ofh2i

a in the target term Wm isin R2dhtimes2dh and bm isin R1times1 are theweight matrix and the bias respectively notation middot is the dotproduct wT

avg is the transpose of wavg and tanh represents anonlinear activation function

On top of that the averaged target term embedding aavgand context embedding matrix [h21

w h22w h2n

w ] are appliedto calculate the attention vector βi for the context (eequation is described as follows

βi exp f h2i

w aavg1113872 11138731113872 1113873

1113936nj1 exp f h

2jw aavg1113872 11138731113872 1113873

(7)

If we obtain the attention weight vectors αi and βi thetarget term representation a and the context representationw can be deduced by the following equations

a 1113944m

i1αih

2ia w 1113944

n

i1βih

2iw (8)

where a isin R2dh and w isin R2dh denote the final word em-bedding representations of the target term and contextwhich will be processed in the output layer

35 Output Layer Target term and context representationsdescribed in Section 34 will be concatenated asd [a w] isin R4dh at the output layer (en a nonlineartransformation layer and a softmax classifier are prepared

according to equation (9) to calculate the sentiment prob-ability value

z tanh d middot Wn + bn( 1113857 (9)

where Wn isin R4dhtimesdc and bn isin Rdc are the weight matrix andthe bias respectively and dc represents the number ofclassifications Probability Pk of category was analyzed byusing the following softmax function

Pk exp zk( 1113857

1113936kj1 exp zk( 1113857

(10)

36 Model Training In order to improve the model per-formance for the tasks of aspect-level sentiment classifica-tion this approach deals with the optimization of trainingprocess including word embedding layer bidirectionalGRU neural network layer attention layer and nonlinearlayer (e crossentropy with L2 regularization is applied asthe loss function which is defined as follows

L minus 1113944S

i1yilog( 1113957yi) +

12λθ

2 (11)

where λ is a regularization coefficient θ2 represents the L2regulation yi denotes the correct sentiment polarity intraining dataset and 1113957yi denotes the predicted sentimentpolarity for a sentence by using the proposed model (eparameter Θ is updated according to the gradient calculatedby using a backpropagation method (e formula is asfollows

Θ Θ minus λzL(Θ)

zΘ (12)

where λ is the learning rate In the training process themethod designs dropout strategy to randomly remove somefeatures of the hidden layer in order to avoid overfitting

4 Experiments

Section 3 has shown the theoretical formulas and operationalsteps for theMI-biGRUmodel(is section has attempted toprovide a series of experiments relating to four public aspect-level sentiment classification datasets from different do-mains (e aim of the experiments is to test the feasibility ofapplying MI-biGRU to deal with aspect-level tasks andevaluate the effectiveness of the proposed MI-biGRUmodel

41 Experimental Setting

411 Dataset and Preprocess SemEvals are one of the mostpopular aspect-level comment dataset series which report alarge amount of customer comments from the domain ofrestaurant and laptop salesWe extract SemEval 2014 (httpaltqcriorgsemeval2014task4) SemEval2015 (httpaltqcriorgsemeval2015task12) and SemEval2016 (httpaltqcriorgsemeval2016task5) from the series as the ex-perimental data SemEval 2014 includes two datasets forrestaurant and laptop sales respectively SemEvals 2015 and

Discrete Dynamics in Nature and Society 7

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 4: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

3 Model Description

(is section presents the details of the model MI-biGRU foraspect-level sentiment analysis In the previous work severaldefinitions with symbolic differences have been proposed forsentiment analysis Hence we need to provide the basicconcepts and notations of MI-biGRU classification involvedin this paper

A sentiment tricategory problem represented in MI-biGRU is associated with a three-tuple polarity set (positivenegative and neutral) Given a sentence with n words in-cluding context and target terms A target term is usuallydenoted by the word group composed of one or more ad-jacent words in context where the position of the first andlast word in the word group is called the start and endposition respectively (e target term embedding sequencecan be denoted by [e1a e2a em

a ] with m predeterminedtarget terms Notation [p1 p2 pn] represents the rela-tive distance embedding from each word wi

c i isin 1 2 nof a sentence to a target term

(e overall architecture of MI-biGRU can be illustratedin Figure 1 (is model possesses two inputs One is a wordembedding sequence [w1 w2 wn] that consists of acontext embedding sequence [e1c e2c en

c ] and a positionembedding sequence [p1 p2 pn] (e other is a targetterm embedding sequence [e1a e2a em

a ] (e goal of themodel is to extract enough semantic information fromtwo embedding sequences and combine them to performaspect-level sentiment classification Notations employed torepresent the components of MI-biGRU are described inTable 1 (e details of the model are divided into six stepsbased on their execution order

31 Position Representation Aspect-level tasks havebenefited a lot from position embedding representation formore valuable word features [34] (e concept of relativedistance between words serves to quantify the relevance of asentence word to a target term We are required to representthe position information by a embeddable vector patternwhich can be formalized as an integer vector or a matrixdepending on whether there concerns unique or multipletarget terms

First of all the word position index of a target term in asentence will be marked as the cardinal point ldquo0rdquo Discretespacing from the ith word wi

c in a sentence to the cardinalpoint is called the relative distance of wi

c which is denoted bypi and can be calculated by the formula

pi

as minus i1113868111386811138681113868

1113868111386811138681113868 i lt as

0 as lt ilt ae

ae minus i1113868111386811138681113868

1113868111386811138681113868 i gt ae

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

(1)

Extending this concept in a sentence with n words givesus the following position index list P [p1 p2 pn](is can be illustrated briefly by the following two examplesFirstly if the unique word ldquoquantityrdquo is applied as the targetterm in the sentence ldquothe quantity is also very good you willcome out satisfiedrdquo we can develop the position index list

[1 0 1 2 3 4 5 6 7 8 9] by setting a cardinal point ldquo0rdquo forthe second word ldquoquantityrdquo and deriving an increasinglypositive integer from the left or right direction for otherwords Secondly if the target term contains more than oneadjacent word all the internal words are assigned as thecardinal point ldquo0rdquo Other words in the sentence will obtainan increasingly positive integer from the start position of theterm to left direction or from the end position of the term toright direction (erefore the position index list[6 5 4 3 2 1 0 0 1 2 3 4 5 6 7] will be obtained if we setthe case sentence and the target term with ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo and ldquointerior decorationrdquo respectively

On top of that if multiple target terms are applied to asentence we can obtain a position index list sequence that iscalled the positionmatrix Assuming that a sentence has n wordsand m target terms let notationPi denote the list index of theith target term in a sentence Position matrix G is defined as

G

P1

P2

Pm

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

isin Rmtimesn (2)

where m refers to the number of target terms and n is thenumber of words in the sentence (en we use a positionembedding matrix P isin Rdptimesn to convert position index se-quence into a position embedding where dp refers to thedimension of position embedding P is initialized randomlyand updated during the model training process (e matrixis further exemplified in the same example ldquoall the moneywent into the interior decoration none of it went to thechefsrdquo which contains two target terms ldquointerior decorationrdquoand ldquochefsrdquo We can obtain the position matrix

G P1

P21113890 1113891

6 5 4 3 2 1 0 0 1 2 3 4 5 6 7

14 13 12 11 10 9 8 7 6 5 4 3 2 1 01113890 1113891

(3)

(e position matrix has been playing an increasinglyimportant role in helping researchers get a better sense ofaspect-level tasks We can first observe the polarity of theemotional words which are near the target term and thenconsider the other words which are far away to judgewhether a sentence is positive or not For example aspresented in the case ldquothe quantity is also very good you willcome out satisfiedrdquo the distance of ldquogoodrdquo (4) is closer thanldquosatisfiedrdquo (9) by a simple numerical comparison accordingto the index list [1 0 1 2 3 4 5 6 7 8 9] and the approachwill give priority to ldquogoodrdquo instead of ldquosatisfiedrdquo when wejudge the sentiment polarity of the subject ldquoquantityrdquo of thesentence (is study suggests that adding position infor-mation to initialize word embedding can provide morefeatures to perform aspect-level sentiment classification

32 Word Representation One of the basic tasks of senti-ment analysis is to present each word in a given sentence byembedding operation A feasible approach is to embed each

4 Discrete Dynamics in Nature and Society

word in a low-dimensional real value vector through theword embedding matrix E isin Rdwtimesv where dw represents thedimension of word embedding and v denotes the size ofvocabulary Matrix E is generally initialized by randomnumber generation operation (en matrix weight will beupdated to reach a stable value in the process of model

training Another feasible method to obtain the matrix E isto pretrain it through the existing corpus [35]

(is study uses pretrained Glove (pretrained wordvectors of Glove can be obtained from httpnlpstanfordeduprojectsglove) from Stanford University to get wordembeddings Four sets of sequence symbols are applied to

Pool

Pool Pool

Pool

AttentionAttention

Attention layer

Hiddenstate layer 2

Hiddenstate layer 1

Sentiment classification

Softmax

Wordembedding layer

Final representation

GRU

GRU

GRU GRU

GRUGRU

GRU

GRU

GRU

GRU

GRU

GRU

GRU

Concatenate

ha21 ha22 ha2m hw21 hw22 hw2n

hw11 hw12 hw1n

GRUGRU

GRUGRU

GRU

GRU

GRUGRU GRU

GRU GRU

ha11 ha12 ha1m

ea1 ea2

ec1 ecn

eam

p1 pn

w1 w2 wn

+ + + + + +

+ + ++++

Aspect-level sentiment analysis using MI-biGRU

Figure 1 Architecture of MI-biGRU

Table 1 Component description of the proposed model

Notation Descriptionwi

c (e ith word in a contextw

ja (e jth word in a target term

pi Relative distance from the ith word to a target term in a sentencewi Concatenation of the ith context and position embeddingas Start position of a target termae End position of a target termei

c Embedding of the ith context in a sentenceei

a Embedding of the ith target term in a sentenceh1i

w First hidden layer vector formed from wi

h1ia First hidden layer vector formed from ei

a

h2iw Second hidden layer vector formed from h1i

w

h2ia Second hidden layer vector formed from h1i

a

Discrete Dynamics in Nature and Society 5

distinguish texts and their embedding form In a sentencethe context and its embedding with n words can be denotedby [w1

c w2c wn

c ] and [e1c e2c enc ] respectively (e m

target terms and their embeddings can be denoted by[w1

a w2a wm

a ] and [e1a e2a ema ] respectively On top of

that context vector [e1c e2c enc ] and obtained position

embedding [p1 p2 pn] will be concatenated to get finalword embedding representation of each word in a sentenceas [w1 w2 wn]

33 Introduction of Gate Recurrent Unit (GRU)Recurrent neural network is a widespread networkemployed in natural language processing in recent yearsOne advantage of RNN is the fact that it can process var-iable-length text sequence and extract key features of asentence However model performance of traditional RNNin the case of long sentences has been mostly restricted bythe problems of gradient disappearance and explosionduring the training process As a result RNN was unable tosend vital information in text back and forth

A great deal of previous research into RNN has focusedon model variants Much of the current scholars paysparticular attention to LSTM models and the GRU modelsBoth LSTM and GRU provide a gate mechanism so that theneural network can reserve the important information andforget those that are less relevant to the current state It hasbeen universally accepted that GRU has the advantages offewer necessary parameters and lower network complexitycompared with LSTM (erefore this paper plans to use theGRU model to extract the key features of word embedding

Details of GRU are illustrated according to the networkstructure shown in Figure 2 (e GRU simplifies four gatemechanisms of LSTM ie input gate output gate forget gateand cell state into two gates that are called reset gate andupdate gate At any time step t GRU includes three param-eters reset gate rt update gate zt and hidden state ht All theparameters are updated according to the following equationszt σ U

zmiddot xt + W

zmiddot htminus1( 1113857

rt σ Ur

middot xt + Wr

middot htminus1( 1113857

1113957ht tanh Uh

middot xt + Wh

middot htminus1 ⊙ rt( 11138571113872 1113873 ht 1 minus zt( 1113857⊙ htminus1 + zt ⊙ 1113957ht

(4)

(e symbolic meaning are described as follows xt de-notes the input word embedding at time t htminus1 representsthe hidden state at time t minus 1 Uz Ur Uh isin Rdwtimesdh andWz Wr Wh isin Rdhtimesdh denote weight matrices where dh

indicates the dimension of hidden state σ and tanh denotethe sigmoid and tanh function respectively Notation middot is thedot product and ⊙ is the elementwise multiplication

In this study we choose bidirectional GRU to obtain thevector representation of a hidden layer for the target termand context which can extract more comprehensive featurescompared with normal GRU (e bidirectional GRU con-sists of a forward hidden layer state h

ti

rarrisin Rdh and a backward

hidden layer state hti

larrisin Rdh First of all this work concat-

enates the hidden states from opposite directions to get the

hidden layer embedding hti [h

ti

rarr h

ti

larr] isin R2dh Some vital

features of a sentence that we required are included in hti On

top of that two hidden layer embeddings h1ia (1le ilem) and

h1jw (1le jle n) can be obtained by processing the target term

and context with bidirectional GRU where m and n denotesthe number of target terms and context in a sentence re-spectively Let notations h2i

a (1le ilem) and h2jw (1le jle n)

denote the hidden layer embeddings induced by h1ia and h

1jw

respectively Finally we can iteratively put obtained vectors[h11

a h12a h1m

a ] and [h11w h12

w h1nw ] into bidirectional

GRU and generate the following hidden layer embeddings[h21

a h22a h2m

a ] and [h21w h22

w h2nw ] respectively (e

latter carries higher level features than previous embeddings

34 Position-Based Multilevel Interactive Bidirectional At-tentionNetwork Interaction information between target termsand context is particularly useful in studying sentiment clas-sification (is study performs mean pooling on hidden layerembeddings [h11

a h12a h1m

a ] [h21a h22

a h2ma ]

[h11w h12

w h1nw ] and [h21

w h22w h2n

w ] in order to get in-teractive information respectively (e pooled results are thenaveraged as a part of input for the attentionmechanism and thefinal averaged embedding contains most of semantic features(e formulas used for average pooling are described as follows

wavg1 1113936

ni1h

1iw

n

aavg1 1113936

mi1h

1ia

m

wavg2 1113936

ni1h

2iw

n

aavg2 1113936

mi1h

2ia

m

wavg wavg1 + wavg21113872 1113873

2

aavg aavg1 + aavg21113872 1113873

2

(5)

σ σ

htndash1

rt zt ht

ht

1ndash

tanh

xt

~

Figure 2 Representation of GRU structure diagram

6 Discrete Dynamics in Nature and Society

(e bidirectional GRU can extract more informationcarried by words in a given sentence and convert them intohidden states However the words differ in their importanceto the target term We should capture the relevant contextfor different target terms and then design a strategy toimprove the accuracy of classification by increasing theintensity of the modelrsquos attention to these words Weightscore can be used to express the degree of model concern(e higher the score the greater the correlation betweentarget terms and context Hence an attention mechanism isdeveloped to calculate the weight score between differenttarget terms and context If the sentiment polarity of a targetterm is determined by the model we should pay greaterattention to the words that have a higher score

(is study calculates the attention weight score in op-posite directions One is from target terms to context andthe other is from context to target terms (erefore the two-way approach was chosen because we can get two weightscore matrices in order to improve the performance of ourmodel (e process of using the attention mechanism in themodel are described as shown in Figure 1 First of all thetarget term embedding matrix [h21

a h22a h2m

a ] and theaveraged context embedding wavg are calculated to obtainthe attention vector αi

αi exp f h2i

a wavg1113872 11138731113872 1113873

1113936mj1 exp f h

2ja wavg1113872 11138731113872 1113873

f h2ia wavg1113872 1113873 tanh w

Tavg middot Wm middot h

2ia + bm1113872 1113873

(6)

where f is a score function that calculates the importance ofh2i

a in the target term Wm isin R2dhtimes2dh and bm isin R1times1 are theweight matrix and the bias respectively notation middot is the dotproduct wT

avg is the transpose of wavg and tanh represents anonlinear activation function

On top of that the averaged target term embedding aavgand context embedding matrix [h21

w h22w h2n

w ] are appliedto calculate the attention vector βi for the context (eequation is described as follows

βi exp f h2i

w aavg1113872 11138731113872 1113873

1113936nj1 exp f h

2jw aavg1113872 11138731113872 1113873

(7)

If we obtain the attention weight vectors αi and βi thetarget term representation a and the context representationw can be deduced by the following equations

a 1113944m

i1αih

2ia w 1113944

n

i1βih

2iw (8)

where a isin R2dh and w isin R2dh denote the final word em-bedding representations of the target term and contextwhich will be processed in the output layer

35 Output Layer Target term and context representationsdescribed in Section 34 will be concatenated asd [a w] isin R4dh at the output layer (en a nonlineartransformation layer and a softmax classifier are prepared

according to equation (9) to calculate the sentiment prob-ability value

z tanh d middot Wn + bn( 1113857 (9)

where Wn isin R4dhtimesdc and bn isin Rdc are the weight matrix andthe bias respectively and dc represents the number ofclassifications Probability Pk of category was analyzed byusing the following softmax function

Pk exp zk( 1113857

1113936kj1 exp zk( 1113857

(10)

36 Model Training In order to improve the model per-formance for the tasks of aspect-level sentiment classifica-tion this approach deals with the optimization of trainingprocess including word embedding layer bidirectionalGRU neural network layer attention layer and nonlinearlayer (e crossentropy with L2 regularization is applied asthe loss function which is defined as follows

L minus 1113944S

i1yilog( 1113957yi) +

12λθ

2 (11)

where λ is a regularization coefficient θ2 represents the L2regulation yi denotes the correct sentiment polarity intraining dataset and 1113957yi denotes the predicted sentimentpolarity for a sentence by using the proposed model (eparameter Θ is updated according to the gradient calculatedby using a backpropagation method (e formula is asfollows

Θ Θ minus λzL(Θ)

zΘ (12)

where λ is the learning rate In the training process themethod designs dropout strategy to randomly remove somefeatures of the hidden layer in order to avoid overfitting

4 Experiments

Section 3 has shown the theoretical formulas and operationalsteps for theMI-biGRUmodel(is section has attempted toprovide a series of experiments relating to four public aspect-level sentiment classification datasets from different do-mains (e aim of the experiments is to test the feasibility ofapplying MI-biGRU to deal with aspect-level tasks andevaluate the effectiveness of the proposed MI-biGRUmodel

41 Experimental Setting

411 Dataset and Preprocess SemEvals are one of the mostpopular aspect-level comment dataset series which report alarge amount of customer comments from the domain ofrestaurant and laptop salesWe extract SemEval 2014 (httpaltqcriorgsemeval2014task4) SemEval2015 (httpaltqcriorgsemeval2015task12) and SemEval2016 (httpaltqcriorgsemeval2016task5) from the series as the ex-perimental data SemEval 2014 includes two datasets forrestaurant and laptop sales respectively SemEvals 2015 and

Discrete Dynamics in Nature and Society 7

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 5: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

word in a low-dimensional real value vector through theword embedding matrix E isin Rdwtimesv where dw represents thedimension of word embedding and v denotes the size ofvocabulary Matrix E is generally initialized by randomnumber generation operation (en matrix weight will beupdated to reach a stable value in the process of model

training Another feasible method to obtain the matrix E isto pretrain it through the existing corpus [35]

(is study uses pretrained Glove (pretrained wordvectors of Glove can be obtained from httpnlpstanfordeduprojectsglove) from Stanford University to get wordembeddings Four sets of sequence symbols are applied to

Pool

Pool Pool

Pool

AttentionAttention

Attention layer

Hiddenstate layer 2

Hiddenstate layer 1

Sentiment classification

Softmax

Wordembedding layer

Final representation

GRU

GRU

GRU GRU

GRUGRU

GRU

GRU

GRU

GRU

GRU

GRU

GRU

Concatenate

ha21 ha22 ha2m hw21 hw22 hw2n

hw11 hw12 hw1n

GRUGRU

GRUGRU

GRU

GRU

GRUGRU GRU

GRU GRU

ha11 ha12 ha1m

ea1 ea2

ec1 ecn

eam

p1 pn

w1 w2 wn

+ + + + + +

+ + ++++

Aspect-level sentiment analysis using MI-biGRU

Figure 1 Architecture of MI-biGRU

Table 1 Component description of the proposed model

Notation Descriptionwi

c (e ith word in a contextw

ja (e jth word in a target term

pi Relative distance from the ith word to a target term in a sentencewi Concatenation of the ith context and position embeddingas Start position of a target termae End position of a target termei

c Embedding of the ith context in a sentenceei

a Embedding of the ith target term in a sentenceh1i

w First hidden layer vector formed from wi

h1ia First hidden layer vector formed from ei

a

h2iw Second hidden layer vector formed from h1i

w

h2ia Second hidden layer vector formed from h1i

a

Discrete Dynamics in Nature and Society 5

distinguish texts and their embedding form In a sentencethe context and its embedding with n words can be denotedby [w1

c w2c wn

c ] and [e1c e2c enc ] respectively (e m

target terms and their embeddings can be denoted by[w1

a w2a wm

a ] and [e1a e2a ema ] respectively On top of

that context vector [e1c e2c enc ] and obtained position

embedding [p1 p2 pn] will be concatenated to get finalword embedding representation of each word in a sentenceas [w1 w2 wn]

33 Introduction of Gate Recurrent Unit (GRU)Recurrent neural network is a widespread networkemployed in natural language processing in recent yearsOne advantage of RNN is the fact that it can process var-iable-length text sequence and extract key features of asentence However model performance of traditional RNNin the case of long sentences has been mostly restricted bythe problems of gradient disappearance and explosionduring the training process As a result RNN was unable tosend vital information in text back and forth

A great deal of previous research into RNN has focusedon model variants Much of the current scholars paysparticular attention to LSTM models and the GRU modelsBoth LSTM and GRU provide a gate mechanism so that theneural network can reserve the important information andforget those that are less relevant to the current state It hasbeen universally accepted that GRU has the advantages offewer necessary parameters and lower network complexitycompared with LSTM (erefore this paper plans to use theGRU model to extract the key features of word embedding

Details of GRU are illustrated according to the networkstructure shown in Figure 2 (e GRU simplifies four gatemechanisms of LSTM ie input gate output gate forget gateand cell state into two gates that are called reset gate andupdate gate At any time step t GRU includes three param-eters reset gate rt update gate zt and hidden state ht All theparameters are updated according to the following equationszt σ U

zmiddot xt + W

zmiddot htminus1( 1113857

rt σ Ur

middot xt + Wr

middot htminus1( 1113857

1113957ht tanh Uh

middot xt + Wh

middot htminus1 ⊙ rt( 11138571113872 1113873 ht 1 minus zt( 1113857⊙ htminus1 + zt ⊙ 1113957ht

(4)

(e symbolic meaning are described as follows xt de-notes the input word embedding at time t htminus1 representsthe hidden state at time t minus 1 Uz Ur Uh isin Rdwtimesdh andWz Wr Wh isin Rdhtimesdh denote weight matrices where dh

indicates the dimension of hidden state σ and tanh denotethe sigmoid and tanh function respectively Notation middot is thedot product and ⊙ is the elementwise multiplication

In this study we choose bidirectional GRU to obtain thevector representation of a hidden layer for the target termand context which can extract more comprehensive featurescompared with normal GRU (e bidirectional GRU con-sists of a forward hidden layer state h

ti

rarrisin Rdh and a backward

hidden layer state hti

larrisin Rdh First of all this work concat-

enates the hidden states from opposite directions to get the

hidden layer embedding hti [h

ti

rarr h

ti

larr] isin R2dh Some vital

features of a sentence that we required are included in hti On

top of that two hidden layer embeddings h1ia (1le ilem) and

h1jw (1le jle n) can be obtained by processing the target term

and context with bidirectional GRU where m and n denotesthe number of target terms and context in a sentence re-spectively Let notations h2i

a (1le ilem) and h2jw (1le jle n)

denote the hidden layer embeddings induced by h1ia and h

1jw

respectively Finally we can iteratively put obtained vectors[h11

a h12a h1m

a ] and [h11w h12

w h1nw ] into bidirectional

GRU and generate the following hidden layer embeddings[h21

a h22a h2m

a ] and [h21w h22

w h2nw ] respectively (e

latter carries higher level features than previous embeddings

34 Position-Based Multilevel Interactive Bidirectional At-tentionNetwork Interaction information between target termsand context is particularly useful in studying sentiment clas-sification (is study performs mean pooling on hidden layerembeddings [h11

a h12a h1m

a ] [h21a h22

a h2ma ]

[h11w h12

w h1nw ] and [h21

w h22w h2n

w ] in order to get in-teractive information respectively (e pooled results are thenaveraged as a part of input for the attentionmechanism and thefinal averaged embedding contains most of semantic features(e formulas used for average pooling are described as follows

wavg1 1113936

ni1h

1iw

n

aavg1 1113936

mi1h

1ia

m

wavg2 1113936

ni1h

2iw

n

aavg2 1113936

mi1h

2ia

m

wavg wavg1 + wavg21113872 1113873

2

aavg aavg1 + aavg21113872 1113873

2

(5)

σ σ

htndash1

rt zt ht

ht

1ndash

tanh

xt

~

Figure 2 Representation of GRU structure diagram

6 Discrete Dynamics in Nature and Society

(e bidirectional GRU can extract more informationcarried by words in a given sentence and convert them intohidden states However the words differ in their importanceto the target term We should capture the relevant contextfor different target terms and then design a strategy toimprove the accuracy of classification by increasing theintensity of the modelrsquos attention to these words Weightscore can be used to express the degree of model concern(e higher the score the greater the correlation betweentarget terms and context Hence an attention mechanism isdeveloped to calculate the weight score between differenttarget terms and context If the sentiment polarity of a targetterm is determined by the model we should pay greaterattention to the words that have a higher score

(is study calculates the attention weight score in op-posite directions One is from target terms to context andthe other is from context to target terms (erefore the two-way approach was chosen because we can get two weightscore matrices in order to improve the performance of ourmodel (e process of using the attention mechanism in themodel are described as shown in Figure 1 First of all thetarget term embedding matrix [h21

a h22a h2m

a ] and theaveraged context embedding wavg are calculated to obtainthe attention vector αi

αi exp f h2i

a wavg1113872 11138731113872 1113873

1113936mj1 exp f h

2ja wavg1113872 11138731113872 1113873

f h2ia wavg1113872 1113873 tanh w

Tavg middot Wm middot h

2ia + bm1113872 1113873

(6)

where f is a score function that calculates the importance ofh2i

a in the target term Wm isin R2dhtimes2dh and bm isin R1times1 are theweight matrix and the bias respectively notation middot is the dotproduct wT

avg is the transpose of wavg and tanh represents anonlinear activation function

On top of that the averaged target term embedding aavgand context embedding matrix [h21

w h22w h2n

w ] are appliedto calculate the attention vector βi for the context (eequation is described as follows

βi exp f h2i

w aavg1113872 11138731113872 1113873

1113936nj1 exp f h

2jw aavg1113872 11138731113872 1113873

(7)

If we obtain the attention weight vectors αi and βi thetarget term representation a and the context representationw can be deduced by the following equations

a 1113944m

i1αih

2ia w 1113944

n

i1βih

2iw (8)

where a isin R2dh and w isin R2dh denote the final word em-bedding representations of the target term and contextwhich will be processed in the output layer

35 Output Layer Target term and context representationsdescribed in Section 34 will be concatenated asd [a w] isin R4dh at the output layer (en a nonlineartransformation layer and a softmax classifier are prepared

according to equation (9) to calculate the sentiment prob-ability value

z tanh d middot Wn + bn( 1113857 (9)

where Wn isin R4dhtimesdc and bn isin Rdc are the weight matrix andthe bias respectively and dc represents the number ofclassifications Probability Pk of category was analyzed byusing the following softmax function

Pk exp zk( 1113857

1113936kj1 exp zk( 1113857

(10)

36 Model Training In order to improve the model per-formance for the tasks of aspect-level sentiment classifica-tion this approach deals with the optimization of trainingprocess including word embedding layer bidirectionalGRU neural network layer attention layer and nonlinearlayer (e crossentropy with L2 regularization is applied asthe loss function which is defined as follows

L minus 1113944S

i1yilog( 1113957yi) +

12λθ

2 (11)

where λ is a regularization coefficient θ2 represents the L2regulation yi denotes the correct sentiment polarity intraining dataset and 1113957yi denotes the predicted sentimentpolarity for a sentence by using the proposed model (eparameter Θ is updated according to the gradient calculatedby using a backpropagation method (e formula is asfollows

Θ Θ minus λzL(Θ)

zΘ (12)

where λ is the learning rate In the training process themethod designs dropout strategy to randomly remove somefeatures of the hidden layer in order to avoid overfitting

4 Experiments

Section 3 has shown the theoretical formulas and operationalsteps for theMI-biGRUmodel(is section has attempted toprovide a series of experiments relating to four public aspect-level sentiment classification datasets from different do-mains (e aim of the experiments is to test the feasibility ofapplying MI-biGRU to deal with aspect-level tasks andevaluate the effectiveness of the proposed MI-biGRUmodel

41 Experimental Setting

411 Dataset and Preprocess SemEvals are one of the mostpopular aspect-level comment dataset series which report alarge amount of customer comments from the domain ofrestaurant and laptop salesWe extract SemEval 2014 (httpaltqcriorgsemeval2014task4) SemEval2015 (httpaltqcriorgsemeval2015task12) and SemEval2016 (httpaltqcriorgsemeval2016task5) from the series as the ex-perimental data SemEval 2014 includes two datasets forrestaurant and laptop sales respectively SemEvals 2015 and

Discrete Dynamics in Nature and Society 7

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 6: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

distinguish texts and their embedding form In a sentencethe context and its embedding with n words can be denotedby [w1

c w2c wn

c ] and [e1c e2c enc ] respectively (e m

target terms and their embeddings can be denoted by[w1

a w2a wm

a ] and [e1a e2a ema ] respectively On top of

that context vector [e1c e2c enc ] and obtained position

embedding [p1 p2 pn] will be concatenated to get finalword embedding representation of each word in a sentenceas [w1 w2 wn]

33 Introduction of Gate Recurrent Unit (GRU)Recurrent neural network is a widespread networkemployed in natural language processing in recent yearsOne advantage of RNN is the fact that it can process var-iable-length text sequence and extract key features of asentence However model performance of traditional RNNin the case of long sentences has been mostly restricted bythe problems of gradient disappearance and explosionduring the training process As a result RNN was unable tosend vital information in text back and forth

A great deal of previous research into RNN has focusedon model variants Much of the current scholars paysparticular attention to LSTM models and the GRU modelsBoth LSTM and GRU provide a gate mechanism so that theneural network can reserve the important information andforget those that are less relevant to the current state It hasbeen universally accepted that GRU has the advantages offewer necessary parameters and lower network complexitycompared with LSTM (erefore this paper plans to use theGRU model to extract the key features of word embedding

Details of GRU are illustrated according to the networkstructure shown in Figure 2 (e GRU simplifies four gatemechanisms of LSTM ie input gate output gate forget gateand cell state into two gates that are called reset gate andupdate gate At any time step t GRU includes three param-eters reset gate rt update gate zt and hidden state ht All theparameters are updated according to the following equationszt σ U

zmiddot xt + W

zmiddot htminus1( 1113857

rt σ Ur

middot xt + Wr

middot htminus1( 1113857

1113957ht tanh Uh

middot xt + Wh

middot htminus1 ⊙ rt( 11138571113872 1113873 ht 1 minus zt( 1113857⊙ htminus1 + zt ⊙ 1113957ht

(4)

(e symbolic meaning are described as follows xt de-notes the input word embedding at time t htminus1 representsthe hidden state at time t minus 1 Uz Ur Uh isin Rdwtimesdh andWz Wr Wh isin Rdhtimesdh denote weight matrices where dh

indicates the dimension of hidden state σ and tanh denotethe sigmoid and tanh function respectively Notation middot is thedot product and ⊙ is the elementwise multiplication

In this study we choose bidirectional GRU to obtain thevector representation of a hidden layer for the target termand context which can extract more comprehensive featurescompared with normal GRU (e bidirectional GRU con-sists of a forward hidden layer state h

ti

rarrisin Rdh and a backward

hidden layer state hti

larrisin Rdh First of all this work concat-

enates the hidden states from opposite directions to get the

hidden layer embedding hti [h

ti

rarr h

ti

larr] isin R2dh Some vital

features of a sentence that we required are included in hti On

top of that two hidden layer embeddings h1ia (1le ilem) and

h1jw (1le jle n) can be obtained by processing the target term

and context with bidirectional GRU where m and n denotesthe number of target terms and context in a sentence re-spectively Let notations h2i

a (1le ilem) and h2jw (1le jle n)

denote the hidden layer embeddings induced by h1ia and h

1jw

respectively Finally we can iteratively put obtained vectors[h11

a h12a h1m

a ] and [h11w h12

w h1nw ] into bidirectional

GRU and generate the following hidden layer embeddings[h21

a h22a h2m

a ] and [h21w h22

w h2nw ] respectively (e

latter carries higher level features than previous embeddings

34 Position-Based Multilevel Interactive Bidirectional At-tentionNetwork Interaction information between target termsand context is particularly useful in studying sentiment clas-sification (is study performs mean pooling on hidden layerembeddings [h11

a h12a h1m

a ] [h21a h22

a h2ma ]

[h11w h12

w h1nw ] and [h21

w h22w h2n

w ] in order to get in-teractive information respectively (e pooled results are thenaveraged as a part of input for the attentionmechanism and thefinal averaged embedding contains most of semantic features(e formulas used for average pooling are described as follows

wavg1 1113936

ni1h

1iw

n

aavg1 1113936

mi1h

1ia

m

wavg2 1113936

ni1h

2iw

n

aavg2 1113936

mi1h

2ia

m

wavg wavg1 + wavg21113872 1113873

2

aavg aavg1 + aavg21113872 1113873

2

(5)

σ σ

htndash1

rt zt ht

ht

1ndash

tanh

xt

~

Figure 2 Representation of GRU structure diagram

6 Discrete Dynamics in Nature and Society

(e bidirectional GRU can extract more informationcarried by words in a given sentence and convert them intohidden states However the words differ in their importanceto the target term We should capture the relevant contextfor different target terms and then design a strategy toimprove the accuracy of classification by increasing theintensity of the modelrsquos attention to these words Weightscore can be used to express the degree of model concern(e higher the score the greater the correlation betweentarget terms and context Hence an attention mechanism isdeveloped to calculate the weight score between differenttarget terms and context If the sentiment polarity of a targetterm is determined by the model we should pay greaterattention to the words that have a higher score

(is study calculates the attention weight score in op-posite directions One is from target terms to context andthe other is from context to target terms (erefore the two-way approach was chosen because we can get two weightscore matrices in order to improve the performance of ourmodel (e process of using the attention mechanism in themodel are described as shown in Figure 1 First of all thetarget term embedding matrix [h21

a h22a h2m

a ] and theaveraged context embedding wavg are calculated to obtainthe attention vector αi

αi exp f h2i

a wavg1113872 11138731113872 1113873

1113936mj1 exp f h

2ja wavg1113872 11138731113872 1113873

f h2ia wavg1113872 1113873 tanh w

Tavg middot Wm middot h

2ia + bm1113872 1113873

(6)

where f is a score function that calculates the importance ofh2i

a in the target term Wm isin R2dhtimes2dh and bm isin R1times1 are theweight matrix and the bias respectively notation middot is the dotproduct wT

avg is the transpose of wavg and tanh represents anonlinear activation function

On top of that the averaged target term embedding aavgand context embedding matrix [h21

w h22w h2n

w ] are appliedto calculate the attention vector βi for the context (eequation is described as follows

βi exp f h2i

w aavg1113872 11138731113872 1113873

1113936nj1 exp f h

2jw aavg1113872 11138731113872 1113873

(7)

If we obtain the attention weight vectors αi and βi thetarget term representation a and the context representationw can be deduced by the following equations

a 1113944m

i1αih

2ia w 1113944

n

i1βih

2iw (8)

where a isin R2dh and w isin R2dh denote the final word em-bedding representations of the target term and contextwhich will be processed in the output layer

35 Output Layer Target term and context representationsdescribed in Section 34 will be concatenated asd [a w] isin R4dh at the output layer (en a nonlineartransformation layer and a softmax classifier are prepared

according to equation (9) to calculate the sentiment prob-ability value

z tanh d middot Wn + bn( 1113857 (9)

where Wn isin R4dhtimesdc and bn isin Rdc are the weight matrix andthe bias respectively and dc represents the number ofclassifications Probability Pk of category was analyzed byusing the following softmax function

Pk exp zk( 1113857

1113936kj1 exp zk( 1113857

(10)

36 Model Training In order to improve the model per-formance for the tasks of aspect-level sentiment classifica-tion this approach deals with the optimization of trainingprocess including word embedding layer bidirectionalGRU neural network layer attention layer and nonlinearlayer (e crossentropy with L2 regularization is applied asthe loss function which is defined as follows

L minus 1113944S

i1yilog( 1113957yi) +

12λθ

2 (11)

where λ is a regularization coefficient θ2 represents the L2regulation yi denotes the correct sentiment polarity intraining dataset and 1113957yi denotes the predicted sentimentpolarity for a sentence by using the proposed model (eparameter Θ is updated according to the gradient calculatedby using a backpropagation method (e formula is asfollows

Θ Θ minus λzL(Θ)

zΘ (12)

where λ is the learning rate In the training process themethod designs dropout strategy to randomly remove somefeatures of the hidden layer in order to avoid overfitting

4 Experiments

Section 3 has shown the theoretical formulas and operationalsteps for theMI-biGRUmodel(is section has attempted toprovide a series of experiments relating to four public aspect-level sentiment classification datasets from different do-mains (e aim of the experiments is to test the feasibility ofapplying MI-biGRU to deal with aspect-level tasks andevaluate the effectiveness of the proposed MI-biGRUmodel

41 Experimental Setting

411 Dataset and Preprocess SemEvals are one of the mostpopular aspect-level comment dataset series which report alarge amount of customer comments from the domain ofrestaurant and laptop salesWe extract SemEval 2014 (httpaltqcriorgsemeval2014task4) SemEval2015 (httpaltqcriorgsemeval2015task12) and SemEval2016 (httpaltqcriorgsemeval2016task5) from the series as the ex-perimental data SemEval 2014 includes two datasets forrestaurant and laptop sales respectively SemEvals 2015 and

Discrete Dynamics in Nature and Society 7

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 7: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

(e bidirectional GRU can extract more informationcarried by words in a given sentence and convert them intohidden states However the words differ in their importanceto the target term We should capture the relevant contextfor different target terms and then design a strategy toimprove the accuracy of classification by increasing theintensity of the modelrsquos attention to these words Weightscore can be used to express the degree of model concern(e higher the score the greater the correlation betweentarget terms and context Hence an attention mechanism isdeveloped to calculate the weight score between differenttarget terms and context If the sentiment polarity of a targetterm is determined by the model we should pay greaterattention to the words that have a higher score

(is study calculates the attention weight score in op-posite directions One is from target terms to context andthe other is from context to target terms (erefore the two-way approach was chosen because we can get two weightscore matrices in order to improve the performance of ourmodel (e process of using the attention mechanism in themodel are described as shown in Figure 1 First of all thetarget term embedding matrix [h21

a h22a h2m

a ] and theaveraged context embedding wavg are calculated to obtainthe attention vector αi

αi exp f h2i

a wavg1113872 11138731113872 1113873

1113936mj1 exp f h

2ja wavg1113872 11138731113872 1113873

f h2ia wavg1113872 1113873 tanh w

Tavg middot Wm middot h

2ia + bm1113872 1113873

(6)

where f is a score function that calculates the importance ofh2i

a in the target term Wm isin R2dhtimes2dh and bm isin R1times1 are theweight matrix and the bias respectively notation middot is the dotproduct wT

avg is the transpose of wavg and tanh represents anonlinear activation function

On top of that the averaged target term embedding aavgand context embedding matrix [h21

w h22w h2n

w ] are appliedto calculate the attention vector βi for the context (eequation is described as follows

βi exp f h2i

w aavg1113872 11138731113872 1113873

1113936nj1 exp f h

2jw aavg1113872 11138731113872 1113873

(7)

If we obtain the attention weight vectors αi and βi thetarget term representation a and the context representationw can be deduced by the following equations

a 1113944m

i1αih

2ia w 1113944

n

i1βih

2iw (8)

where a isin R2dh and w isin R2dh denote the final word em-bedding representations of the target term and contextwhich will be processed in the output layer

35 Output Layer Target term and context representationsdescribed in Section 34 will be concatenated asd [a w] isin R4dh at the output layer (en a nonlineartransformation layer and a softmax classifier are prepared

according to equation (9) to calculate the sentiment prob-ability value

z tanh d middot Wn + bn( 1113857 (9)

where Wn isin R4dhtimesdc and bn isin Rdc are the weight matrix andthe bias respectively and dc represents the number ofclassifications Probability Pk of category was analyzed byusing the following softmax function

Pk exp zk( 1113857

1113936kj1 exp zk( 1113857

(10)

36 Model Training In order to improve the model per-formance for the tasks of aspect-level sentiment classifica-tion this approach deals with the optimization of trainingprocess including word embedding layer bidirectionalGRU neural network layer attention layer and nonlinearlayer (e crossentropy with L2 regularization is applied asthe loss function which is defined as follows

L minus 1113944S

i1yilog( 1113957yi) +

12λθ

2 (11)

where λ is a regularization coefficient θ2 represents the L2regulation yi denotes the correct sentiment polarity intraining dataset and 1113957yi denotes the predicted sentimentpolarity for a sentence by using the proposed model (eparameter Θ is updated according to the gradient calculatedby using a backpropagation method (e formula is asfollows

Θ Θ minus λzL(Θ)

zΘ (12)

where λ is the learning rate In the training process themethod designs dropout strategy to randomly remove somefeatures of the hidden layer in order to avoid overfitting

4 Experiments

Section 3 has shown the theoretical formulas and operationalsteps for theMI-biGRUmodel(is section has attempted toprovide a series of experiments relating to four public aspect-level sentiment classification datasets from different do-mains (e aim of the experiments is to test the feasibility ofapplying MI-biGRU to deal with aspect-level tasks andevaluate the effectiveness of the proposed MI-biGRUmodel

41 Experimental Setting

411 Dataset and Preprocess SemEvals are one of the mostpopular aspect-level comment dataset series which report alarge amount of customer comments from the domain ofrestaurant and laptop salesWe extract SemEval 2014 (httpaltqcriorgsemeval2014task4) SemEval2015 (httpaltqcriorgsemeval2015task12) and SemEval2016 (httpaltqcriorgsemeval2016task5) from the series as the ex-perimental data SemEval 2014 includes two datasets forrestaurant and laptop sales respectively SemEvals 2015 and

Discrete Dynamics in Nature and Society 7

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 8: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

2016 are related to the restaurant Each piece of data in thedatasets is a single sentence which contains commentstarget terms sentiment labels and position information Weremove the sentences which the target term is labelled withldquonullrdquo or ldquoconflictrdquo from the datasets and the remainingsentences possess a corresponding sentiment label for eachtarget term (e statistics of the datasets are provided inTable 2

412 Pretraining (is section presents the pretrainingprocess for word embedding matrix E that is generally set bya random initialization operation and the weight score isthen updated during the training process However E can bepretrained on some existing corpora (e benefit of thisapproach is that we can obtain the optimal parameters of themodel from the high-quality datasets Hence a pretrainedGlove from Stanford University was adopted in order toimprove the model performance As a result the parametersof word embedding and bidirectional GRU in this study areinitialized with the same parameters in corresponding layers

413 Hyperparameters Setting Other parameters exceptpretraining of word embedding are initialized by the sam-pling operations from uniform distribution U(minus01 01) inwhich all bias are set to zero Both dimensions of wordembedding and bidirectional GRU hidden state are set to300 (e dimension of position embedding is considered as100(e size of batch is placed at 128 We take 80 as the maxlength of a sentence (e coefficient of L2 regulation and thelearning rate is set to 10minus 5 and 00029 respectively (isexperiment uses dropout strategy with the dropout rate 05in order to avoid suffering from overfitting It is important toemphasize that using the same parameters for differentdatasets may not yield the best results However there will bea series of parameters that will optimize the execution of themodel on each dataset from a global perspective (ereforewe confirm the above parameters as the final hyper-parameters of the model through a large number of ex-periments In addition we use the Adam optimizer tooptimize all parameters Experiments have shown that theAdam optimizer performs better than other optimizers suchas SGD and RMSProp on our classification task

414 Evaluation (is section presents the performanceevaluation indictors for all baseline methods that aremen-tioned in Sections 42 and 43(e definition of ldquoAccuracyrdquo isas follows

Accuracy TP + TN

TP + TN + FP + FN (13)

As outlined in Table 3 the symbol TP is the abbreviationfor ldquoTrue Positiverdquo which refers to the fact that both thesentiment label and the model prediction are positive (esymbol FP is short for ldquoFalse Positiverdquo which means that thesentiment label is negative and the model prediction ispositive Similarly ldquoFalse Negativerdquo and ldquoTrue Negativerdquo arepresented as the symbols FN and TN respectively

A broader set of estimation indicators precision recalland F1-score [36 37] is also adopted in this study

Precision TP

TP + FP

Recall TP

TP + FN

F1 2TP

2TP + FP + FN2 middot Precision middot RecallPrecision + Recall

(14)

Historically the term ldquoPrecisionrdquo has been used to de-scribe the ratio of correctly predicted positive observationsto total predicted positive observations which is generallyunderstood as the ability to distinguish the negative samples(e higher the ldquoPrecisionrdquo the stronger the modelrsquos ability todistinguish negative samples Previous studies mostly de-fined ldquoRecallrdquo as the ratio of correctly predicted positiveobservations to all observations in the actual class [38]which reflects the modelrsquos ability to recognize positivesamples (e higher the recall the stronger the modelrsquosability to recognize positive samples (e term ldquoF1-scorerdquocombines ldquoPrecisionrdquo and ldquoRecallrdquo (e robustness of theclassification model is determined by ldquoF1-scorerdquo

42 ComparedModels To demonstrate the advantage of ourmethod on aspect-level sentiment classification we com-pared it with the following baselines

(i) LSTM [10] LSTM is a classic neural network whichlearns the sentiment classification labels by thetransformation from word embeddings to hiddenstates an average operation of the states and asoftmax operation LSTM has only been carried outin coarse-grained classification tasks and has notdealt with the aspect-level tasks

(ii) AE-LSTM [10] AE-LSTM is a variant of LSTMwhich adds the connection operation betweenhidden states and target term to generate an at-tention weight vector Existing studies use thisvector representation to determine the sentimentpolarity of a sentence

(iii) ATAE-LSTM [10] the model structure of ATAE-LSTM is similar to AE-LSTM except for the step ofword embedding initialization Adding the targetword embedding into each context embedding atthe initialization step can highlight the status of thetarget term in LSTM and obtain more sufficientfeatures

(iv) IAN [18] INA is a neural network with interactivestructure Firstly the model calculates the two-wayattention weights between the context and targetword to obtain their rich relational features Sec-ondly the information of two directions is con-catenated to perform aspect-level sentimentclassification

(v) MemNet [17] MemNet is an open model to per-form aspect-level sentiment classification by using

8 Discrete Dynamics in Nature and Society

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 9: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

the same attention mechanism multiple times (eweight matrix is optimized through multilayerinteraction which can extract high-quality abstractfeatures

(vi) PBAN [20] in contrast to above baselines PBANintroduces the relative distance between the contextand the target word in a sentence to perform aspect-level sentiment classification (e model focusesmore on the words that are close to the target term(e attention mechanism is also used in PBAN tocalculate the weight matrix

(vii) MI-biGRU the model is proposed in this paperMI-biGRU combines the concept of relative dis-tance and the improved GRU with interactivestructure to perform aspect-level tasks

43 Comparison of Aspect-Level Sentiment Analysis Model(is section presents an application result for all baselinemethods that are mentioned in Section 42 We evaluated theeffectiveness of our method in terms of aspect-level senti-ment classification results on four shared datasets (eexperiment chooses Accuracy and F1-score to evaluate allthese methods because Accuracy is the basic metric and F1-score measures both precision and recall of the classificationresults

As we can see from Table 4 the Accuracy and F1-scoreie 7428 and 6024 under dataset Restaurant14 of LSTM isthe lowest of all the models A common low score underdifferent datasets indicated that LSTM lacks the mechanismto process multiple target terms in a sentence althoughLSTM averages the hidden states Developing a model toprocess more than one target term in a sentence contributesa lot to improve classification performance An improve-ment over the baseline LSTM is observed in AE-LSTM andATAE-LSTM Particularly we can notice that the results onthe four datasets are better than the baseline LSTM byapproximately 2ndash7 since AE-LSTM and ATAE-LSTM addjudgments on target terms

Introducing interactive structure or attention mecha-nism to models ie IAN and MemNet in Table 4 is

necessarily served as an effective way to improve the as-sessment scores since the abstract features and relationshipsbetween target terms and context play a positive role inmodel performance In the process of initializing wordembedding abundant features can be learned if the relativedistance is considered As we can see from the model PBANin Table 4 with the added position information its scoressurpass most baseline methods except for 7973 and 8079 indatasets Restaurant14 and Restaurant16

(e proposed model MI-biGRU combines the conceptof relative distance interactive model structure and theword embedding initialization involving context and targetterms to perform aspect-level tasks Linking the positionvector with the context can enrich the input features of themodel

Recently IAN and MemNet models have examined thepositive effects of the attention mechanism between contextand target terms on aspect-level feature extraction Im-provement model MI-biGRU of bidirectional GRU neuralnetwork generates the attention weight matrix from thetarget term to context and from the context to target term byusing interactive attention mechanism twice Degree foundto be influencing classification effect of MI-biGRU is pre-sented in Table 4 We can notice that the classificationperformance on MI-biGRU is much better than otherbaseline methods

We achieve the best Accuracy and F1-score in datasetsRestaurant14 Laptop14 and Restaurant15 We can noticethat ATAE-LSTM obtained the best Accuracy 8517 inRestaurant16 (e reason why the accuracy of Restaurant16is not optimal on MI-biGRU may be the imbalance of dataIn dataset SemEval 2016 the amount of data with sentimentpolarity is very small in the restaurant domain for testing sothe model does not classify some comments correctlyHowever the optimal F1-score is also obtained by ourmodelwhich illustrates that MI-biGRU is capable of distinguishingpositive and negative samples(erefore we can come to theconclusion that our model can achieve the best performancein four public datasets

44Model Analysis ofMI-biGRU (is section illustrates therationality of each component in MI-biGRU by contrastexperiments (e dependence of model accuracy on theamount of data is described according to a regular datachange Experiments of different technical combinations areshown in Table 5

In order to assess the contribution of the three technicalelements ie bi-GRU structure position and interaction inthe process of word embedding measuring accuracy by

Table 2 Statistics of aspect-based sentiment analysis datasets

DatasetTrain Test

Positive Negative Neural Positive Negative NeuralRestaurant14 2164 807 637 728 196 196Laptop14 994 870 464 341 128 169Restaurant15 1178 382 50 439 328 35Restaurant16 1620 709 88 597 190 38

Table 3 Description of TP FP TN and FN

Predict classPositive Negative

Actual classPositive TP (True Positive) FN (False Negative)Negative FP (False Positive) TN (True Negative)

Discrete Dynamics in Nature and Society 9

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 10: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

element combinations of four datasets was used First of allit can be seen from the data in Table 5 that the accuracyadvantages of the model with two bidirectional GRUs (bi-GRUs) are not obviously compared with only one bi-GRUstructure especially in laptop domain of SemEval 2014 andrestaurant domain of SemEval 2015 (erefore increasingthe number of bi-GRU structures may not significantlyimprove model performance On top of that it is apparentfrom this table in which position information plays an in-creasingly important role in model accuracy since theperformance of a bi-GRU structure with position is greatlyenhanced compared to the independent bi-GRU modelFinally what is highlighted in the table is the model MI-biGRU that adds a multilevel interactive GRU layer based onposition information MI-biGRU achieved the best perfor-mance on four aspect-level sentiment classification datasets(e test findings in Table 5 illustrate the feasibility of ourideas on aspect-level sentiment analysis

Simple random data extraction experiments were uti-lized to analyze the dependence of model accuracy on thequantity of data We randomly take 20 40 60 and 80of the data from the dataset of SemEval 2014 in restaurantand laptop domain (e trends of Accuracy and F1-score ofbaselines LSTM IAN PBAN andMI-biGRU under datasetsRestaurant14 and Laptop14 are shown in Figures 3ndash6

(e trends of Figures 3ndash6 reveal that there has been asteady increase of Accuracy and F1-score of all the baselinemethods with the rise of data amount (e Accuracy and F1-score of MI-biGRU is low in the initial data amountHowever the obvious finding to emerge from the analysis isthat the Accuracy and F1-score of MI-biGRU at 60 of thedata amount has reached an approximate peak (is may bebecause rich data is provided to the model and the ad-vantages of position information and multilevel interactionmechanism are realized Compared with using a smallamount of data rich data can allow the model to learn morecomplete semantic features (erefore the performance of

sentiment analysis achieves better results than baselinemodels When the amount of data exceeds 60 our modelrsquosperformance enhances rapidly and finally reaches thehighest score (erefore from a holistic perspective MI-biGRU can achieve the best results in classification taskscompared to baseline models if we can reach a critical massof data

45 Case Study Working principle and high performance ofthe novel model can be further illustrated by a case study thatvisualizes the attention weight between target terms andcontext according to color shades (e darker color of thewords in Figure 7 indicates the greater attention weight

Table 4 Performance of different models on sentiment analysis datasets

DatasetRestaurant14 Laptop14 Restaurant15 Restaurant16

Acc F1 Acc F1 Acc F1 Acc F1LSTM 7428 6024 6645 6023 7536 4869 8312 5255AE-LSTM 7612 6108 6883 6208 7594 5011 8446 5628ATAE-LSTM 7723 6297 6861 6299 7654 5329 8517 5932IAN 7714 6453 7194 6651 7728 5241 8149 6257MemNet(9)lowast 8057 6549 7149 6837 7718 5336 8142 6049PBAN 7973 7108 7461 7082 7811 5673 8079 6185MI-biGRU 8166 7386 7525 7093 7917 5997 8254 6348(e number (9) after MemNet refers to nine computational layers that are adopted

Table 5 Model accuracy of different technical combinations

Restaurant14 Laptop14 Restaurant15 Restaurant16Single-level interactive bi-GRU 8138 7249 7799 8196Multilevel interactive bi-GRUs 7903 7278 7882 8207Position + single-level interactive bi-GRU 8141 7438 7858 8207Position +multilevel interactive bi-GRUs 8166 7525 7917 8254

Restaurant14

74

72

82

80

78

76

Accu

racy

()

20 40 60 80 100The amount of data used ()

PBANMI-biGRU

LSTMIAN

Figure 3 Accuracy trend of baseline models in restaurant domain

10 Discrete Dynamics in Nature and Society

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 11: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

which is more essential to judge the polarity of a target term(us the model will pay more attention to these words

(is study confirms that when the model judged thepolarity of a target term it paid more attention to the wordsaround it Target term ldquoweekendsrdquo and ldquoFrench foodrdquo areperfect examples in Figure 7 When we distinguish thepolarity of ldquoweekendsrdquo the words ldquobitrdquo and ldquopackedrdquo aremore critical than the words with bigger relative distances(e words such as ldquobestrdquo that are close to the target termldquoFrench foodrdquo will have a greater impact on polarityjudgment

One interesting finding was that the model may givesome words with small distance a low attention weight Forexample the model actually gives the words ldquobutrdquo and ldquoviberdquoa low weight when the target term is ldquoweekendsrdquo (isphenomenon is normal since the sentiment contribution ofthe same words in the sentence to different target terms isvaried MI-biGRU will automatically select surroundingwords that are worth giving more attention according to the

20 40 60 80 100e amount of data used ()

74

72

70

68

66

64

Accu

racy

()

Laptop14

LSTMIAN

PBANMI-biGRU

Figure 5 Accuracy trend of baseline models in laptop domain

725

700

675

650

625

600

575

550

F l-s

core

()

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

Restaurant14

Figure 4 F1-score trend of baseline models in restaurant domain

60

62

70

68

66

64

20 40 60 80 100The amount of data used ()

LSTMIAN

PBANMI-biGRU

58

56

Laptop14

F l-s

core

()

Figure 6 F1-score trend of baseline models in laptop domain

Discrete Dynamics in Nature and Society 11

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 12: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

specific target term and then judge the sentiment polarity toperform aspect-level sentiment classification

5 Conclusion and Future Work

(is paper puts forward a novel descriptive multilevel in-teractive bidirectional attention network (MI-biGRU) whichinvolves both bidirectional GRUs and position informationfor aspect-level sentiment analysis We refine the traditionalaspect-level models by considering the context target termsand relative distance in word embeddings In addition twobidirectional GRUs and interactive attention mechanismsare combined to extract abstract deep features in aspect-leveltasks (e experimental results on restaurant and laptopcomment tasks demonstrate our advantage over the tradi-tional sentiment classification methods

As a sentiment classification method MI-biGRU per-forms very well on comment context especially when acritical mass of aspect-level sentiment sentences is reachedExtracting the attention weight of words from the position-labelled context according to information interaction makesMI-biGRU more effective than other regular methods

Our future work focuses on finding the embeddingconclusions of the words with semantic relationshipsFurthermore we will figure out phrase-level sentimentmethod with position information

Data Availability

(e data used to support the findings of this study are in-cluded within the article by reporting a website link httpaltqcriorgsemeval2014task4 httpaltqcriorgsemeval2015task12 and httpaltqcriorgsemeval2016task5

Conflicts of Interest

(e authors declare that they have no conflicts of Interest

Acknowledgments

(is work was supported by the Chunhui Plan Cooperationand Research Project Ministry of Education of China (noZ2015100) National Natural Science Foundation (no61902324) Scientific Research Funds project of Science andTechnology Department of Sichuan Province (nos2016JY0244 2017JQ0059 2019GFW131 2020JY and2020GFW) Fund Project of Chengdu Science and Tech-nology Bureau (no 2017-RK00-00026-ZF) Fund of SichuanEducational Committee (nos 15ZB0134 and 17ZA0360) and

Foundation of Cyberspace Security Key Laboratory ofSichuan Higher Education Institutions (no sjzz2016-73)and University-sponsored Overseas Education Project ofXihua University

References

[1] B Liu Sentiment Analysis Mining Opinions Sentiments andEmotions Cambridge University Press Cambridge UK 2015

[2] K Schouten and F Frasincar ldquoSurvey on aspect-level sen-timent analysisrdquo IEEE Transactions on Knowledge and DataEngineering vol 28 no 3 pp 813ndash830 2016

[3] M Tsytsarau and T Palpanas ldquoSurvey on mining subjectivedata on the webrdquo Data Mining and Knowledge Discoveryvol 24 no 3 pp 478ndash514 2012

[4] Y Tay A T Luu and S C Hui ldquoLearning to attend via word-aspect associative fusion for aspect-based sentiment analysisrdquo2017 httpsarxivorgabs171205403

[5] L Jiang M Yu M Zhou et al ldquoTarget-dependent twittersentiment classificationrdquo in Proceedings of the Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies Association for Computational LinguisticsPortland OR USA June 2011

[6] K C Cho B Merrienboer C Gulcehre et al ldquoLearningphrase representations using rnn encoder-decoder for sta-tistical machine translationrdquo 2014 httpsarxivorgabs14061078

[7] M Schuster and K K Paliwal ldquoBidirectional recurrent neuralnetworksrdquo IEEE Transactions on Signal Processing vol 45no 11 pp 2673ndash2681 1997

[8] S W Lai L H Xu K Liu et al ldquoRecurrent convolutionalneural networks for text classificationrdquo in Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligencevol 333 pp 2267ndash2273 Austin TX USA January 2015

[9] D Tang B Qin X Feng and T Liu ldquoEffective LSTMs fortarget-dependent sentiment classificationrdquo 2015 httpsarxivorgabs151201100

[10] J Zeng X Ma and K Zhou ldquoEnhancing attention-basedLSTM with position context for aspect-level sentiment clas-sificationrdquo IEEE Access vol 7 pp 20462ndash20471 2019

[11] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo inProceedings of the ICLR San Diego CA USA May 2015

[12] V Mnih N Heess A Graves et al ldquoRecurrent models ofvisual attentionrdquo in Proceedings of the Advances in NeuralInformation Processing Systems Montreal Canada December2014

[13] Y M Cui Z P Chen S Wei et al ldquoAttention-over-attentionneural networks for reading comprehensionrdquo 2016 httpsarxivorgabs160704423

[14] K M Hermann T Kocisky E Grefenstette et al ldquoTeachingmachines to read and comprehendrdquo 2015 httpsarxivorgabs150603340

Target term Sentence Polarity

Negative

Positive

may be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

Itweekends

French foodmay be a bit packed on

the best

but the vibe is good and

it is French food you will find in the area

weekends

It

Figure 7 Visualized attention weights

12 Discrete Dynamics in Nature and Society

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13

Page 13: Aspect-LevelSentimentAnalysisBasedonPositionFeaturesUsing ...downloads.hindawi.com/journals/ddns/2020/5824873.pdf · embedding and word embedding in the pretraining step, si-multaneously.

[15] O Wallaart and F Frasincar ldquoA hybrid approach for aspect-based sentiment analysis using a lexicalized domain ontologyand attentional neural modelsrdquo in Be Semantic WebSpringer Cham Switzerland 2019

[16] Y Wang M Huang X Zhu and L Zhao ldquoAttention-basedLSTM for aspect-level sentiment classificationrdquo in Proceed-ings of the Conference on Empirical Methods in NaturalLanguage Processing Austin TX USA November 2016

[17] D Tang B Qin and T Liu ldquoAspect level sentiment classi-fication with deep memory networkrdquo in Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing pp 214ndash224 Austin TX USA November 2016

[18] D Ma S Li X Zhang and H Wang ldquoInteractive attentionnetworks for aspect-level sentiment classificationrdquo in Pro-ceedings of the Twenty-Sixth International Joint Conference onArtificial Intelligence Melbourne Australia August 2017

[19] B Liu X An and J X Huang ldquoUsing term location infor-mation to enhance probabilistic information retrievalrdquo inProceedings of the 38th International ACM SIGIR Conferenceon Research and Development in Information RetrievalSantiago CL USA August 2015

[20] S Gu L Zhang Y Hou et al ldquoA position-aware bidirectionalattention network for aspect-level sentiment analysisrdquo inProceedings of the 27th International Conference on Compu-tational Linguistics pp 774ndash784 Santa Fe NewMexico USAAugust 2018

[21] B Pang L Lee and S Vaithyanathan ldquo(umbs up Senti-ment classification using machine learning techniquesrdquo inProceedings of the Empirical Methods in Natural LanguageProcessing pp 79ndash86 Philadelphia PA USA July 2002

[22] D T Vo and Y Zhang ldquoTarget-dependent twitter sentimentclassification with rich automatic featuresrdquo in Proceedings ofthe Twenty-Fourth International Joint Conference on ArtificialIntelligence (IJCAI) AAAI Press Buenos Aires ArgentinaJuly 2015

[23] M S Akhtar D Gupta A Ekbal and P BhattacharyyaldquoFeature selection and ensemble construction a two-stepmethod for aspect based sentiment analysisrdquo Knowledge-Based Systems vol 125 no 1 pp 116ndash135 2017

[24] J Wagner P Arora S Cortes et al ldquoDCU aspect-basedpolarity classification for SemEval task 4rdquo in Proceedings ofthe 8th International Workshop on Semantic Evaluation(SemEval-2014) Dublin Ireland August 2014

[25] N Kaji and M Kitsuregawa ldquoBuilding lexicon for sentimentanalysis from massive collection of HTML documentsrdquo inProceedings of the 2007 Joint Conference on Empirical Methodsin Natural Language Processing and Computational NaturalLanguage Learning (EMNLP-CoNLL) Prague Czech Re-public June 2007

[26] V Prez-Rosas ldquoLearning sentiment lexicons in Spanishrdquo inProceedings of the Eighth International Conference on Lan-guage Resources and Evaluation (LREC-2012) IstanbulTurkey May 2012

[27] S M Mohammad S Kiritchenko and X Zhu ldquoNrc-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquo2013 httpsarxivorgabs13086242

[28] H H Dohaiha P Prasad A Maag and A Alsadoon ldquoDeeplearning for aspect-based sentiment analysis a comparativereviewrdquo Expert Systems with Applications vol 118 pp 272ndash299 2018

[29] R Socher J Pennington E H Huang et al ldquoSemi-supervisedrecursive autoencoders for predicting sentiment distribu-tionsrdquo in Proceedings of the Conference on Empirical Methods

in Natural Language Processing DBLP Edinburgh UK July2011

[30] R Socher A Perelygin Y W Jean et al ldquoRecursive deepmodels for semantic compositionality over a sentimenttreebankrdquo in Proceedings of the Conference on EmpiricalMethods in Natural Language Processing pp 1631ndash1642SeattleWA USA October 2013

[31] K S Tai R Socher and C D Manning ldquoImproved semanticrepresentations from tree-structured long short-termmemory networksrdquo in Proceedings of the 53rd AnnualMeetingof the Association for Computational Linguistics and the 7thInternational Joint Conference on Natural Language Pro-cessing (Volume 1 Long Papers) vol 5 no 1 p 36 BeijingChina July 2015

[32] S Ruder P Ghaffari and J G Breslin ldquoA hierarchical modelof reviews for aspect-based sentiment analysisrdquo in Proceedingsof the Conference on Empirical Methods in Natural LanguageProcessing Austin TX USA November 2016

[33] M Zhang Y Zhang and D T Vo ldquoGated neural networks fortargeted sentiment analysisrdquo in Proceedings of the AAAIpp 3087ndash3093 Phoenix AZ USA February 2016

[34] D Zeng K Liu S Lai et al ldquoRelation classification viaconvolutional deep neural networkrdquo in Proceedings ofCOLING 2014 the 25th International Conference on Com-putational Linguistics Technical Papers pp 2335ndash2344Dublin Ireland August 2014

[35] J Pennington R Socher and C Manning ldquoGlove globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) Doha Qatar October 2014

[36] B Huang Y Ou and K M Carley ldquoAspect level sentimentclassification with attention-over-attention neural networksrdquoin Social Cultural and Behavioral Modeling Springer ChamSwitzerland 2018

[37] S Wu Y Xu F Wu Z Yuan Y Huang and X Li ldquoAspect-based sentiment analysis via fusing multiple sources of textualknowledgerdquo Knowledge-Based Systems vol 183 Article ID104868 2019

[38] C D Manning P Raghavan and H Schtze Introductionto Information Retrieval 1 Cambridge University PressCambridge UK 2008

Discrete Dynamics in Nature and Society 13