
Research Article
A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text

Mujiono Sadikin,1,2 Mohamad Ivan Fanany,2 and T. Basaruddin2

1Faculty of Computer Science, Universitas Mercu Buana, Jl. Meruya Selatan No. 1, Kembangan, Jakarta Barat 11650, Indonesia
2Machine Learning and Computer Vision Laboratory, Faculty of Computer Science, Universitas Indonesia, Depok, West Java 16424, Indonesia

Correspondence should be addressed to Mujiono Sadikin; mujiono.sadikin@mercubuana.ac.id

Received 27 May 2016; Revised 8 August 2016; Accepted 18 September 2016

Academic Editor: Trong H. Duong

Copyright © 2016 Mujiono Sadikin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

One essential task in information extraction from the medical corpus is drug name recognition. Compared with text sources from other domains, medical text mining poses more challenges, for example, more unstructured text, the fast growth of new terms, a wide range of name variations for the same drug, the lack of labeled datasets and external knowledge, and multiple token representations for a single drug name. Although many approaches have been proposed to address the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation techniques to overcome some of those challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarities obtained from word embedding training. The first technique is evaluated with a standard NN model, that is, MLP. The second technique involves two deep network classifiers, that is, DBN and SAE. The third technique represents the sentence as a sequence and is evaluated with a recurrent NN model, that is, LSTM. In extracting the drug name entities, the third technique gives the best F-score performance compared to the state of the art, with its average F-score being 0.8645.

1. Introduction

The rapid growth of information technology provides rich text data resources in all areas, including the medical field. An abundant amount of medical text data can be used to obtain valuable information for the benefit of many purposes. The understanding of drug interactions, for example, is an important aspect of manufacturing new medicines or controlling drug distribution in the market. The process to produce a medicinal product is an expensive and complex task. In many recent cases, however, many drugs are withdrawn from the market when it is discovered that the interaction between the drugs is hazardous to health [1].

Information, or object, extraction from an unstructured text document is one of the most challenging studies in the text mining area. The difficulties of text information extraction keep increasing due to the increasing size of corpora, the continuous growth of human natural language, and the unstructured format of the data [2]. Among such valuable information are medical entities such as drug name, compound, and brand; disease names; and their relations, such as drug-drug interaction and drug-compound relation. We need a suitable method to extract such information. To exploit those abundant data resources, however, many problems have to be tackled, for example, large data size, unstructured format, choosing the right NLP technique, and the limited availability of annotated datasets.

More specific and valuable information contained in medical text data is the drug entity (drug name). Drug name recognition is a primary task of medical text data extraction since finding the drug is the essential element in solving other information extraction problems [3, 4]. Among the derivative works of drug name extraction are drug-drug interaction [5], drug adverse reaction [6], and other applications (information retrieval, decision support systems, drug development, or drug discovery) [7].

Compared to other NER (named entity recognition) tasks such as PERSON, LOCATION, EVENT, or TIME, drug name entity recognition faces more challenges. First, the drug name entities usually appear in unstructured texts [8], where the number of new entities grows quickly over time. Thus it is hard to create a dictionary which always includes the entire lexicon and is up-to-date [9]. Second, the naming of drugs also varies widely. Abbreviations and acronyms increase the difficulty of determining the concepts referred to by the terms. Third, many drug names contain a combination of nonword and word symbols [10]. Fourth, another problem in drug name extraction is that a single drug name might be represented by multiple tokens [11]. Due to the complexity of extracting multiple-token drugs, some researchers, such as [12], even ignore that case in the MedLine and DrugBank training, with the reason that multiple-token drugs make up only 1.8% of all drug names. This differs from other domains in that entity names in the biomedical field are usually longer. Fifth, in some cases the drug name is a combination of medical and general terms. Sixth, the lack of labelled datasets is another problem that has yet to be solved in extracting drug name entities.

This paper presents three data representation techniques to extract drug name entities contained in the sentences of medical texts. For the first and the second techniques, we created an instance of the dataset as a tuple formed from 5 vectors of words. In the first technique, the tuple was constructed from all sentences treated as one sequence, whereas in the second technique the tuple is made from each sentence treated as a sequence. The first and second techniques were evaluated with the standard MLP-NN model in the first experiment. In the second experiment, we use the second data representation technique, which is also applied to the other NN models, that is, DBN and SAE. The third data representation, which treats the text as sequential entities, was assessed with the recurrent NN model LSTM. These three data representation techniques are based on the word2vec value characteristics, that is, the cosine and the Euclidean distance between the vectors of words.

In the first and second techniques, we apply three different scenarios to select the most probable words which represent the drug name. The scenarios are based on the characteristics of the training data, that is, the drug word distribution, which is usually assumed to have a smaller frequency of appearance in the dataset sentences. The drug name candidate selections are as follows. In the first case, the whole test dataset is taken. In the second case, 2/3 of the test dataset is selected. In the third case, x/y (x < y) of the test dataset (where x and y are arbitrary integer numbers) is selected after clustering the test dataset into y clusters.

In the third experiment, based on the characteristics of the word vectors resulting from the trained word embedding, we formulate a sequence data representation applied to RNN-LSTM. We used the Euclidean distance of the current input to the previous input as an additional feature besides its vector of words. In this study, the vector of words is provided by the word embedding method proposed by Mikolov et al. [13].

Our main contributions in this study are:

(1) the new data representation techniques, which do not require any external knowledge or handcrafted features;

(2) the drug extraction techniques based on the word distribution contained in the training data.

Our proposed method is evaluated on the DrugBank and MedLine open medical datasets obtained from the SemEval 2013 Competition Task 9.1 (see https://www.cs.york.ac.uk/semeval-2013/task9/), which have also been used by [11, 12, 14]. Both medical texts are in English, and some sentences contain drug name entities. In extracting drug entity names from the datasets, our data representation techniques give the best performance with F-score values of 0.687 for MLP, 0.6700 for DBN, and 0.682 for SAE, whereas the third technique with LSTM gives the best F-score, that is, 0.9430. The average F-score of the third technique is 0.8645, which is the best performance compared to the other previous methods.

By applying the data representation techniques, our proposed approach provides at least three advantages:

(1) The capability to identify multiple tokens as a single name entity.

(2) The ability to deal with the absence of any external knowledge in certain languages.

(3) No need to construct any additional features such as character type identification, orthography features (lowercase or uppercase identification), or token position.

The rest of this paper is organized as follows. Section 2 explains some previous works dealing with name entity (and drug name) extraction from medical text sources. The framework, approach, and methodology to overcome the challenges of drug name extraction are presented in Section 3. The section also describes the dataset materials and experiment scenarios. Section 4 discusses the experiment results and their analysis, while Section 5 explains the achievements, the shortcomings, and the prospects of this study. The section also describes several potential explorations for future research.

2. Related Works

Entity recognition in biomedical text is an active research area, and many methods have been proposed. For example, Pal and Gosal [9] summarize their survey on various entity recognition approaches. The approaches can be categorized into three models: dictionary based, rule-based, and learning based methods [2, 8]. A dictionary based approach uses a list of terms (term collection) to assist in predicting which targeted entity will be included in the predicted group. Although their overall precision is more accurate, their recall is poor since they anticipate fewer new terms. The rule-based approach defines a certain rule which describes the pattern formation surrounding the targeted entity. This rule can be a syntactic term or a lexical term. Finally, the learning approach is usually based on statistical data characteristics to build a model using machine learning techniques. The model is capable of automatic learning based on positive, neutral, and negative training data.

Drug name extraction and classification are one of the challenges in the Semantic Evaluation Task (SemEval 2013). The best-reported performance for this challenge was 0.715 in F-score [15]. Until now, studies to extract drug names still continue, and many approaches have been proposed. CRF-based learning is the most common method utilized in clinical text information extraction. CRF is used by one of the best participants [11] in the SemEval challenges on clinical text (https://www.cs.york.ac.uk/semeval-2013/). As for the use of external knowledge aimed at increasing the performance, the authors of [11] use ChEBI (Chemical Entities of Biological Interest), which is a dictionary of small molecular entities. The best achieved performance is 0.57 in F-score (for the overall dataset).

A hybrid approach model which combines statistical learning and a dictionary based method is proposed by [16]. In their study, the authors utilize a word2vec representation, a CRF learning model, and DINTO, a drug ontology. With this word2vec representation, the targeted drug is treated as the current token in a context window which consists of three tokens on the left and three tokens on the right. Additional features are included in the data representation, such as POS tags, lemmas in the window context, and an orthography feature such as uppercase, lowercase, and mixed cap. The authors also used Wikipedia text as an additional resource to perform word2vec representation training. The best F-score value in extracting the drug name provided by the method is 0.72.

The result of CRF-based active learning applied to NER BIO (Beginning, Inside, Output) annotation tokens for extracting name entities in the clinical text is presented in [17]. The framework of this active learning approach is a sequential process: initial model generation, querying, training, and iteration. The CRF Algorithm BIO approach was also studied by Ben Abacha et al. [14]. The features for the CRF algorithm are formulated based on token, linguistic, and semantic features. The best F-score achieved by this proposed method is 0.72.

Korkontzelos et al. studied a combination of an aggregated classifier, a maximum entropy-multinomial classifier, and handcrafted features to extract drug entities [4]. They classified drug and nondrug based on the token feature formulation, such as token windows, the current token, and 8 other handcrafted features.

Another approach for discovering valuable information from clinical text data, which adopts an event-location extraction model, was examined by Bjorne et al. [12]. They use an SVM classifier to predict drug or nondrug entities, applied to the DrugBank dataset. The best performance achieved by their method is 0.6 in F-score. The drawback of their approach is that it only deals with single-token drug names.

To overcome the ambiguity problem in NER mined from a medical corpus, a segment representation method has also been proposed by Keretna et al. [8]. Their approach treats each word as belonging to one of three classes, that is, NE, not NE, and an ambiguous class. The ambiguity of the class member is determined by identifying whether the word appears in more than one context or not. If so, the word falls into the ambiguous class. After the three class segments are found, each word is then applied to the classifier learning. Related to their approach, in our previous work we proposed pattern learning that utilizes the regular expressions surrounding drug names and their compounds [18]. The performance of our method is quite good, with the average F-score being 0.81, but it has a limitation in dealing with more unstructured text data.

In summarizing the related previous works on drug name entity extraction, we noted some drawbacks which need to be addressed. In general, almost all state-of-the-art methods work based on ad hoc external knowledge, which is not always available. The requirement of handcrafted features is another difficult constraint since not all datasets contain such features. An additional challenge that remained unsolved by the previous works is the problem of multiple token representations for a single drug name. This study proposes new data representation techniques to handle those challenges.

Our proposed method is based only on the data distribution pattern and the vector of words characteristics, so there is no need for external knowledge or additional handcrafted features. To overcome the multiple tokens problem, we propose a new technique which treats a target entity as a set of tokens (a tuple) at once, rather than treating the target entity as a single token surrounded by other tokens, as used by [16] or [19]. By addressing a set of tokens as a single sample, our proposed method can predict whether a set of tokens is a drug name or not. In our first experiment, we evaluate the first and second data representation techniques and apply the MLP learning model. In our second experiment, we choose the second technique, which gave the best result with MLP, and apply it to two different machine learning methods, DBN and SAE. In our third experiment, we examined the third data representation technique, which utilizes the Euclidean distance between successive words in a certain sentence of medical text. The third data representation is then fed into an LSTM model. Based on the resulting F-score values, the third experiment gives the best performance.

3. Method and Material

3.1. Framework. In this study, using the word2vec value characteristics, we conducted three experiments based on different data representation techniques. The first and second experiments examine conventional tuple data representation, whereas the third experiment examines sequence data representation. We describe the organization of these three experiments in this section. In general, the proposed method to extract drug name entities in this study consists of two main phases. The first phase is a data representation step to formulate the feature representation. In the second phase, model training, testing, and their evaluation are conducted to assess the performance of the proposed method.

Figure 1: Proposed approach framework of the first experiment. The framework consists of four steps: (1) data representation formulation (preprocessing, word2vec training 1 on the dataset only or word2vec training 2 on the dataset + Wiki, and data structure formatting as one sequence for all sentences or one sequence for each sentence), (2) dataset labelling of the training and testing data, (3) candidate selection (include all, 2/3 of the distribution, or x/y part of y clusters, x < y), and (4) MLP training, testing, and evaluation of the predicted drugs.

The proposed method of the first experiment consists of 4 steps (see Figure 1). The first step is the data representation formulation. The output of the first step is the tuples of the training and testing datasets. The second step is dataset labelling, which is applied to both testing and training data. This step provides the label of each tuple. The third step is the candidate selection, which is performed to minimize the noise, since the actual drug target quantity is far less than that of nondrug names. In the last step, we performed the experiment with the MLP-NN model and evaluated its results. The detailed explanation of each step is given in Sections 3.4, 3.6, and 3.9, whereas Sections 3.2 and 3.3 describe the training data analysis as the foundation of this proposed method. As a part of the first experiment, we also evaluate the impact of using the Euclidean distance average as the model's regularization. This regularization term is described in Section 3.7.1.

The framework of the second experiment, which applies the DBN and SAE learning models to the second data representation technique, is illustrated in Figure 2. In general, the steps of the second experiment are similar to those of the first one, the differences being the data representation used and the learning models involved. In the second experiment, only the second technique is used, with DBN and SAE as the learning models.

The framework of the third experiment, using the LSTM, is illustrated in Figure 3. There are three steps in the third experiment. The first step is the sequence data representation formulation, which provides both the sequence training data and the testing data. The second step is data labelling, which generates the labels of the training and testing data. The LSTM experiment and its result evaluation are performed in the third step. The detailed description of these three steps is presented in Sections 3.4, 3.4.3, and 3.9 as well.

3.2. Training Data Analysis. Each of the sentences in the dataset contains four data types, that is, drug, group, brand, and drug-n. If the sentence contains none of those four types, the type value is null. In this study, we extracted drug and drug-n. Overall, in both the DrugBank and MedLine datasets, the quantity of drug name targets is far less than that of nondrug targets. Segura-Bedmar et al. [15] present the first basic statistics of the dataset. A more detailed exploration of the token distribution in the training dataset is described in this section. The MedLine sentences training dataset contains 25783 single tokens which consist of 4003 unique tokens. The token distribution is not uniform but is dominated by a small part of the unique tokens. If all of the unique tokens are arranged and ranked based on the most frequent appearances in the sentences, the quartile distribution has the result presented in Figure 4: Q1 represents token numbers 1 to 1001, whose total frequency is 20688; Q2 represents token numbers 1002 to 2002, whose total frequency is 2849; Q3 represents token numbers 2003 to 3002, whose total frequency is 1264; and Q4 represents token numbers 3003 to 4003, whose total frequency is 1000. The figure shows that the majority of appearances are dominated by only a small fraction of the total tokens.

Further analysis of the dataset tokens shows that most of the targeted drug name tokens rarely appear in the dataset. When we divide the token collection into three partitions based on their sum of frequency, as presented in Table 1, it is shown that all of the targeted drug name entities are contained in the 2/3 part with less frequent appearances of each token (a unique token in the same sum of frequency). A similar pattern of training data token distribution also emerged in the DrugBank dataset, as illustrated in Figure 5 and Table 2.

Figure 2: Proposed approach framework of the second experiment. The steps mirror those of Figure 1 (data representation formulation, dataset labelling, candidate selection, and experiment and evaluation), with word2vec training 2 on the dataset + Wiki, data structure formatting 2 (one sequence for each sentence), and DBN and SAE as the learning models.

Figure 3: Proposed approach framework of the third experiment. It consists of three steps: (1) data representation formulation (preprocessing, word2vec training 2 on the dataset + Wiki, and sequence data structure formatting), (2) dataset labelling, and (3) LSTM training, testing, and evaluation of the predicted drugs.

When we look into the specific token distributions, the position of most of the drug name targets is in the third part, since the most frequently appearing words in the first and second parts are the most common words, such as stop words ("of", "the", "a", "end", "to", "where", "as", "from", and similar words) and common words in the medical domain such as "administration", "patient", "effect", and "dose".
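The skewed token distribution described above can be checked with a short frequency count. The following is a minimal sketch, assuming the training sentences are already tokenized into lists of tokens; the variable names are illustrative and not from the original implementation.

from collections import Counter

def quartile_totals(sentences):
    """Sum token frequencies per quartile of the frequency-ranked vocabulary."""
    counts = Counter(tok for sent in sentences for tok in sent)
    ranked = [freq for _, freq in counts.most_common()]   # most frequent first
    q = len(ranked) // 4
    parts = [ranked[:q], ranked[q:2 * q], ranked[2 * q:3 * q], ranked[3 * q:]]
    return [sum(p) for p in parts]

# Toy example; on the MedLine training sentences this reproduces the heavily
# skewed Q1-Q4 totals reported in Figure 4.
sentences = [["the", "dose", "of", "dilantin"], ["the", "patient", "received", "dilantin"]]
print(quartile_totals(sentences))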

3.3. Word Embedding Analysis. To represent the dataset, we utilized the word embedding model proposed by Mikolov et al. [13]. We treated all of the sentences as a corpus after the training dataset and testing dataset were combined. The word2vec training model used was the CBOW (Continuous Bag of Words) model with context window length 5 and vector dimension 100. The result of the word2vec training is the representation of each word as a 100-dimensional row vector. Based on the row vectors, the similarities or dissimilarities between words can be estimated. The description below is the analysis summary of the word2vec representation result, which is used as a base reference for the data representation techniques and the experiment scenarios. Taking some samples of drug target and nondrug vector representations, it is shown that a drug word has more similarity (cosine distance) to another drug than to a nondrug word, and vice versa. Some of those samples are illustrated in Table 3.

Figure 4: Distribution of the MedLine train dataset tokens (total token frequency per quartile Q1-Q4).

Table 1: The frequency distribution and drug target token position, MedLine.

1/3 part | Σ sample | Σ frequency | Σ single tokens of drug entity
1 | 28 | 8661 | —
2 | 410 | 8510 | 50
3 | 3563 | 8612 | 262

Table 2: The frequency distribution and drug target token position, DrugBank.

1/3 part | Σ sample | Σ frequency | Σ single tokens of drug entity
1 | 27 | 33538 | —
2 | 332 | 33463 | 33
3 | 5501 | 33351 | 920

We also computed the Euclidean distance between all of the words. Table 4 shows the average Euclidean distance and cosine distance between drug-drug, drug-nondrug, and nondrug-nondrug pairs. These average distance values show us that, intuitively, it is feasible to group the collection of words into a drug group and a nondrug group based on their vector representation values.
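As an illustration of this analysis step, the sketch below trains a CBOW word2vec model with the stated parameters (window 5, dimension 100) and compares two words by cosine similarity and Euclidean distance. It is a minimal sketch assuming gensim 4.x; the paper does not state which word2vec implementation was used, and the toy corpus is a placeholder.

import numpy as np
from gensim.models import Word2Vec

# corpus: tokenized sentences (training and testing sentences combined)
corpus = [["phenytoin", "interacts", "with", "dilantin"],
          ["the", "patient", "received", "a", "dose", "of", "tegretol"]]

# CBOW (sg=0), context window 5, 100-dimensional vectors, as in Section 3.3
model = Word2Vec(sentences=corpus, vector_size=100, window=5, sg=0, min_count=1)

def euclidean(w1, w2):
    return float(np.linalg.norm(model.wv[w1] - model.wv[w2]))

print(model.wv.similarity("phenytoin", "dilantin"))   # cosine similarity (cf. Table 3)
print(euclidean("phenytoin", "dilantin"))             # Euclidean distance (cf. Table 4)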

3.4. Feature Representation, Data Formatting, and Data Labelling. Based on the training data and word embedding analysis, we formulate the feature representation and its data formatting. In the first and second techniques, we try to overcome the multiple tokens drawback left unsolved in [12] by formatting a single input data item as an N-gram model with N = 5 (one tuple of data consists of 5 tokens) to accommodate the maximum number of tokens which act as a single drug entity target name. The tuples were provided from the sentences of both training and testing data. Thus we have a set of tuples of training data and a set of tuples of testing data. Each tuple was treated as a single input.

To identify whether a single input is a nondrug or a drug target, we use a multiclassification approach which classifies the single input into one of six classes. Class 1 represents nondrug, whereas the other classes represent drug targets and also identify how many tokens (words) form the drug target. To identify which class a certain tuple belongs to, the following is determined. A drug tuple is a tuple whose first token (token-1) is of the drug type.

Figure 5: Distribution of the DrugBank train dataset tokens (total token frequency per quartile Q1-Q4).

If token-1 is not a drug, regardless of the remaining 4 tokens, then the tuple is classified as nondrug. This kind of tuple is identified as class 1. If token-1 is a drug and token-2 is not a drug, regardless of the last 3 tokens, the tuple will be identified as class 2, and so on.

Since we only extracted the drug entity, we ignored the other token types, whether group, brand, or another common token. To provide the label of each tuple, we only use the drug and drug-n types as the tuple reference list. In general, if the sequence of tokens in a tuple of the dataset contains a sequence which is exactly the same as one of the tuple reference list members, then the tuple is identified as a drug entity. The details of the algorithm used to provide the label of each tuple in both training data and testing data are described in Algorithm 1.

We proposed two techniques for constructing the tuple set of the sentences. The first technique treats all sentences as one sequence, whereas in the second technique each sentence is processed as one sequence. The first and the second techniques are evaluated with the MLP, DBN, and SAE models. The third technique treats the sentences of the dataset as a sequence in which the occurrence of the current token is influenced by the previous one. By treating the sentence as a sequence, not only in the data representation but also in the classification and recognition process, the most suitable model to be used is an RNN. We applied RNN-LSTM to the third technique.

3.4.1. First Technique. The first dataset formatting (one sequence for all sentences) is performed as follows. In the first step, all sentences in the dataset are formatted as a token sequence. Let the token sequence be

$$t_1 t_2 t_3 t_4 t_5 t_6 t_7 t_8 \cdots t_n \quad (1)$$

with $n$ being the number of tokens in the sequence; then the dataset format will be

$$t_1 t_2 t_3 t_4 t_5,\; t_2 t_3 t_4 t_5 t_6,\; \ldots,\; t_{n-4} t_{n-3} t_{n-2} t_{n-1} t_n \quad (2)$$

A sample of sentences and their drug names is presented in Table 5. Taken from the DrugBank training data, Table 5 shows the raw data of 3 samples with three relevant fields, that is, the sentence, the character drug position, and the drug name. Table 6 illustrates a portion of the dataset and its labels as the result of the raw data in Table 5.

Table 3: Some of the cosine distance similarities between two kinds of words.

Word 1 | Word 2 | Similarity (cosine dist.) | Remark
dilantin | tegretol | 0.75135758 | drug-drug
phenytoin | dilantin | 0.62360351 | drug-drug
phenytoin | tegretol | 0.51322415 | drug-drug
cholestyramine | dilantin | 0.24557819 | drug-drug
cholestyramine | phenytoin | 0.23701277 | drug-drug
administration | patients | 0.20459694 | nondrug-nondrug
tegretol | may | 0.11605539 | drug-nondrug
cholestyramine | patients | 0.08827197 | drug-nondrug
evaluated | end | 0.07379115 | nondrug-nondrug
within | controlled | 0.06111103 | nondrug-nondrug
cholestyramine | evaluated | 0.04024139 | drug-nondrug
dilantin | end | 0.02234770 | drug-nondrug

Table 4: The average Euclidean distance and cosine similarity between groups of words.

Word group | Euclidean dist. avg. | Cosine dist. avg.
drug-nondrug | 0.096113798 | 0.194855980
nondrug-nondrug | 0.094824332 | 0.604091044
drug-drug | 0.093840800 | 0.617929002

Table 5: Sample of DrugBank sentences and their drug name targets.

Sentence | Drug position | Drug name
modification of surface histidine residues abolishes the cytotoxic activity of clostridium difficile toxin a | 79–107 | clostridium difficile toxin a
antimicrobial activity of ganoderma lucidum extract alone and in combination with some antibiotics | 26–50 | ganoderma lucidum extract
on the other hand, surprisingly, green tea gallocatechins (−)-epigallocatechin-3-o-gallate and theasinensin a potently enhanced the promoter activity (182% and 247% activity at 1 μM, resp.) | 33–56 | green tea gallocatechins

Referring to the drug-n name field in the dataset, dataset number 6 is identified as a drug, whereas the others are classified as nondrug entities. The complete label illustration of the dataset provided by the first technique is presented in Table 7. As described in Section 3.4, the vector dimension for each token is 100. Therefore, a single data item is represented as a one-dimensional vector of length 100 * 5 = 500.
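A minimal sketch of this tuple construction is given below: slide a window of five tokens over the token sequence and concatenate the five 100-dimensional word vectors into one 500-dimensional input. The word_vec lookup is assumed to come from the trained word2vec model; the names are illustrative only.

import numpy as np

def make_tuples(tokens, word_vec, n=5):
    """Build overlapping n-gram tuples and their flattened word2vec features."""
    tuples, features = [], []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        tuples.append(window)
        # concatenate n word vectors -> n * 100 = 500-dimensional input
        features.append(np.concatenate([word_vec[t] for t in window]))
    return tuples, np.vstack(features)

tokens = "modification of surface histidine residues abolishes".split()
word_vec = {t: np.random.rand(100) for t in tokens}    # stand-in for trained vectors
tuples, X = make_tuples(tokens, word_vec)
print(tuples[0], X.shape)                              # first 5-gram, (2, 500)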

3.4.2. Second Technique. The second technique treats each sentence of the dataset as one sequence. With this treatment, we added the special character * as padding to the last part of the tuple when its length is less than 5. By applying the second technique, the first sentence of the sample provides a dataset as illustrated in Table 8.

3.4.3. Third Technique. Naturally, an NLP sentence is a sequence in which the occurrence of the current word is conditioned by the previous one. Based on the word2vec value analysis, it is shown that, intuitively, we can separate drug words and nondrug words by their Euclidean distance. Therefore, we used the Euclidean distance between the current word and the previous one to represent this influence. Thus each current input x_i is represented by [xv_i, xd_i], which is the concatenation of the word2vec value xv_i and its Euclidean distance to the previous word xd_i. Each x is a row vector of dimension 200; the first 100 values are its word2vec vector, and the remaining 100 values are the Euclidean distance to the previous word. For the first word, all values of xd_i are 0. With the LSTM model, the task of extracting the drug name from the medical text is a binary classification applied to each word of the sentence. We formulate the word sequence and its classes as described in Table 9. In this experiment, each word that represents a drug name is identified as class 1, such as "plenaxis", "cytochrome", and "p-450", whereas the other words are identified as class 0.
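The sketch below illustrates this sequence representation: each word becomes a 200-dimensional input whose first 100 values are its word2vec vector and whose last 100 values carry the Euclidean distance to the previous word. Tiling the scalar distance over the last 100 positions is our reading of the description and is an assumption; the original implementation may encode it differently.

import numpy as np

def sentence_to_sequence(tokens, word_vec, dim=100):
    """Return an array of shape (len(tokens), 2*dim): [word2vec | distance to previous]."""
    seq, prev = [], None
    for t in tokens:
        v = word_vec[t]
        d = 0.0 if prev is None else float(np.linalg.norm(v - prev))   # 0 for the first word
        seq.append(np.concatenate([v, np.full(dim, d)]))               # distance tiled over 100 dims
        prev = v
    return np.vstack(seq)

tokens = "drug interaction studies with plenaxis were performed".split()
word_vec = {t: np.random.rand(100) for t in tokens}   # stand-in for trained vectors
print(sentence_to_sequence(tokens, word_vec).shape)   # (7, 200)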

3.5. Wiki Sources. In this study, we also utilize Wikipedia as an additional text source in the word2vec training, as used by [16]. The Wiki text addition is used to evaluate the impact of the training data volume on improving the quality of the word vectors.

3.6. Candidate Selection. The tokens that are drug entity targets are only a tiny part of the total tokens. In the MedLine dataset, 171 of 2000 tokens (less than 10%) are drugs, whereas in DrugBank the number of drug tokens is 180 of 5252 [15]. So the major part of these tokens is nondrug and other noise such as stop words and special or numerical characters. Based on this fact, we propose a candidate selection step to eliminate this noise. We examine two mechanisms in the candidate selection. The first is based on the token distribution. The second is formed by selecting an x/y part of the clustering result of the test data. In the first scenario, we only used the 2/3 of the tokens which appear in the lower 2/3 part of the total token distribution. This is presented in Tables 1 and 2.


Result: labelled dataset
Input: array of tuples, array of drugs (the drug array contains the drug and drug-n types only)
Output: array of labels

label[] <= 1                          // initialization: every tuple starts as nondrug (class 1)
for each t in tuples do
    for each d in drugs do
        if length(d) = 1 and t[1] = d[1] then
            label[t] <= 2; break      // match: 1-token drug name
        else if length(d) = 2 and t[1] = d[1] and t[2] = d[2] then
            label[t] <= 3; break      // match: 2-token drug name
        else if length(d) = 3 and t[1] = d[1] and t[2] = d[2] and t[3] = d[3] then
            label[t] <= 4; break      // match: 3-token drug name
        else if length(d) = 4 and t[1] = d[1] and t[2] = d[2] and t[3] = d[3] and t[4] = d[4] then
            label[t] <= 5; break      // match: 4-token drug name
        else if length(d) = 5 and t[1] = d[1] and t[2] = d[2] and t[3] = d[3] and t[4] = d[4] and t[5] = d[5] then
            label[t] <= 6; break      // match: 5-token drug name
        end
    end
end

Algorithm 1: Dataset labelling.
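Algorithm 1 can be written compactly, since class k+1 simply encodes an exact match between the first k tokens of the tuple and a k-token entry of the drug reference list (drug and drug-n types only). The Python sketch below mirrors that logic; it is an illustrative rewrite, not the authors' original code.

def label_tuples(tuples, drug_list):
    """tuples: list of 5-token lists; drug_list: list of drug-name token lists (length 1..5)."""
    labels = []
    for t in tuples:
        label = 1                      # default: nondrug (class 1)
        for d in drug_list:
            k = len(d)
            if t[:k] == d:             # first k tokens match a k-token drug name
                label = k + 1          # classes 2..6 encode the drug-name length
                break
        labels.append(label)
    return labels

print(label_tuples([["clostridium", "difficile", "toxin", "a", "antimicrobial"]],
                   [["clostridium", "difficile", "toxin", "a"]]))   # [5], cf. Table 6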

Table 6: A portion of the dataset formulation as the result of the DrugBank sample with the first technique.

Dataset number | Token-1 | Token-2 | Token-3 | Token-4 | Token-5 | Label
1 | modification | of | surface | histidine | residues | 1
2 | of | surface | histidine | residues | abolishes | 1
3 | surface | histidine | residues | abolishes | the | 1
4 | histidine | residues | abolishes | the | cytotoxic | 1
5 | the | cytotoxic | activity | of | clostridium | 1
6 | clostridium | difficile | toxin | a | antimicrobial | 5
7 | difficile | toxin | a | antimicrobial | activity | 1


Table 7: First technique of data representation and its labels.

Token-1 | Token-2 | Token-3 | Token-4 | Token-5 | Label
"plenaxis" | "were" | "performed" | "cytochrome" | "p-450" | 2
"testosterone" | "concentrations" | "just" | "prior" | "to" | 2
"beta-adrenergic" | "antagonists" | "and" | "alpha-adrenergic" | "stimulants" | 3
"carbonic" | "anhydrase" | "inhibitors" | "concomitant" | "use" | 3
"sodium" | "polystyrene" | "sulfonate" | "should" | "be" | 4
"sodium" | "acid" | "phosphate" | "such" | "as" | 4
"clostridium" | "difficile" | "toxin" | "a" | "—" | 5
"nonsteroidal" | "anti" | "inflammatory" | "drugs" | "and" | 5
"casein" | "phosphopeptide-amorphous" | "calcium" | "phosphate" | "complex" | 6
"studies" | "with" | "plenaxis" | "were" | "performed" | 1
"were" | "performed" | "cytochrome" | "p-450" | "is" | 1

Table 8: Second technique of data representation and its labels.

Token-1 | Token-2 | Token-3 | Token-4 | Token-5 | Label
"modification" | "of" | "surface" | "histidine" | "residues" | 1
"of" | "surface" | "histidine" | "residues" | "abolishes" | 1
"surface" | "histidine" | "residues" | "abolishes" | "the" | 1
"histidine" | "residues" | "abolishes" | "the" | "cytotoxic" | 1
"the" | "cytotoxic" | "activity" | "of" | "clostridium" | 1
"clostridium" | "difficile" | "toxin" | "a" | "*" | 5
"difficile" | "toxin" | "a" | "*" | "*" | 1
"toxin" | "a" | "*" | "*" | "*" | 1
"a" | "*" | "*" | "*" | "*" | 1

In the second mechanism, we selected x/y (x < y) parts of the total tokens after the tokens are clustered into y clusters.
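A minimal sketch of the cluster-based selection is shown below: the test tuples are clustered into y groups with k-means and x of them are kept. Which clusters to keep is not specified in the text, so the sketch keeps the x clusters with the lowest mean token frequency, on the assumption that rarely occurring tokens are the most likely drug candidates; all names are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def select_candidates(X, freqs, x, y, random_state=0):
    """Keep samples from x of y clusters (here: the clusters with the rarest tokens)."""
    labels = KMeans(n_clusters=y, n_init=10, random_state=random_state).fit_predict(X)
    cluster_freq = [freqs[labels == c].mean() for c in range(y)]   # mean frequency per cluster
    keep = np.argsort(cluster_freq)[:x]                            # x lowest-frequency clusters
    mask = np.isin(labels, keep)
    return X[mask], mask

X = np.random.rand(200, 500)                       # stand-in tuple features
freqs = np.random.randint(1, 50, size=200)         # frequency of each tuple's first token
X_sel, mask = select_candidates(X, freqs, x=2, y=3)
print(X_sel.shape)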

3.7. Overview of NN Models

3.7.1. MLP. In the first experiment, we used a multilayer perceptron NN to train the model and evaluate the performance [20]. Given a training set of m examples, the overall cost function can be defined as

$$J(W,b) = \left[\frac{1}{m}\sum_{i=1}^{m} J\left(W,b;x^{(i)},y^{(i)}\right)\right] + \frac{\lambda}{2}\sum_{l=1}^{n_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(W_{ji}^{(l)}\right)^2,$$

$$J(W,b) = \left[\frac{1}{m}\sum_{i=1}^{m}\left(\frac{1}{2}\left\|h_{W,b}\left(x^{(i)}\right) - y^{(i)}\right\|^2\right)\right] + \frac{\lambda}{2}\sum_{l=1}^{n_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(W_{ji}^{(l)}\right)^2. \quad (3)$$

In the definition of J(W, b), the first term is an average sum-of-squares error term, whereas the second term is a regularization term, also called a weight decay term. In this experiment, we use three kinds of regularization: (0) L0 with λ = 0, (1) L1 with λ = 1, and (2) L2 with λ equal to the average Euclidean distance. We computed L2's λ based on the word embedding vector analysis showing that drug targets and nondrugs can be distinguished by looking at their Euclidean distance. Thus, for L2 regularization, the parameter is calculated as

$$\lambda = \frac{1}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\mathrm{dist}\left(x_i, x_j\right), \quad (4)$$

where dist(x_i, x_j) is the Euclidean distance between x_i and x_j. The model training and testing are implemented by modifying the code from [21], which can be downloaded at https://github.com/rasmusbergpalm/DeepLearnToolbox.
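Equation (4) can be computed directly from the word vectors; the sketch below is a straightforward O(n^2) implementation over a matrix of 100-dimensional vectors and divides the pair sum by n(n - 1), exactly as written in (4).

import numpy as np

def average_pairwise_distance(vectors):
    """lambda of equation (4): sum of pairwise Euclidean distances divided by n(n - 1)."""
    n = len(vectors)
    total = 0.0
    for i in range(n - 1):
        total += np.linalg.norm(vectors[i + 1:] - vectors[i], axis=1).sum()
    return total / (n * (n - 1))

vectors = np.random.rand(50, 100)   # stand-in for the 100-dimensional word2vec vectors
print(average_pairwise_distance(vectors))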

3.7.2. DBN. DBN is a learning model composed of two or more stacked RBMs [22, 23]. An RBM is an undirected graph learning model associated with Markov Random Fields (MRF). In the DBN, the RBMs act as feature extractors, where the pretraining process provides initial weight values to be fine-tuned in the discriminative process in the last layer. The last layer may be formed by logistic regression or any standard discriminative classifier [23]. RBM was originally developed for binary data observations [24, 25]. It is a popular type of unsupervised model for binary data [26, 27]. Some derivatives of the RBM model have also been proposed to tackle continuous/real values, as suggested in [28, 29].

Table 9: Third technique of data representation and its labels.

Sent1 | Class | 0 | 0 | 0 | 0 | 1 | 0 | 0
Sent1 | Word | "drug" | "interaction" | "studies" | "with" | "plenaxis" | "were" | "performed"
Sent2 | Class | 1 | 1 | 0 | 0 | 0 | 0 | 0
Sent2 | Word | "cytochrome" | "p-450" | "is" | "not" | "known" | "in" | "the"

3.7.3. SAE. An autoencoder (AE) neural network is one of the unsupervised learning algorithms. The NN tries to learn a function h(w, x) ≈ x. The autoencoder NN architecture also consists of input, hidden, and output layers. The particular characteristic of the autoencoder is that the target output is similar to the input. The interesting structure of the data is estimated by applying a certain constraint to the network which limits the number of hidden units. However, when the number of hidden units has to be larger, sparsity constraints can be imposed on the hidden units [30]. The sparsity constraint is used to enforce that the average value of the hidden unit activations is constrained to a certain value. As in the DBN model, after we trained the SAE, the trained weights were used to initialize the weights of the NN for the classification.

3.7.4. RNN-LSTM. An RNN (Recurrent Neural Network) is an NN which considers the previous input in determining the output of the current input. RNNs are powerful when applied to datasets with a sequential pattern, or when the current state input depends on the previous one, such as time series data or the sentences of NLP. An LSTM network is a special kind of RNN which also consists of 3 layers, that is, an input layer, a single recurrent hidden layer, and an output layer [31]. The main innovation of LSTM is that its hidden layer consists of one or more memory blocks. Each block includes one or more memory cells. In the standard form, the inputs are connected to all of the cells and gates, whereas the cells are connected to the outputs. The gates are connected to other gates and cells in the hidden layer. The single standard LSTM is a hidden layer with input, memory cell, and output gates [32, 33].

3.8. Dataset. To validate the proposed approach, we utilized the DrugBank and MedLine open datasets, which have also been used by previous researchers. Additionally, we used drug label documents from various drug producer and regulator Internet sites located in Indonesia:

(1) http://www.kalbemed.com,

(2) http://www.dechacare.com,

(3) http://infoobatindonesia.com/obat, and

(4) http://www.pom.go.id/webreg/index.php/home/produk/01.

The drug labels are written in Bahasa Indonesia, and their common contents are the drug name, drug components, indication, contraindication, dosage, and warning.

3.9. Evaluation. To evaluate the performance of the proposed method, we use commonly measured parameters in data mining, that is, precision, recall, and F-score. The computation formulas of these parameters are as follows. Let C = {C_1, C_2, C_3, ..., C_n} be the set of drug names extracted by this method and K = {K_1, K_2, K_3, ..., K_l} be the set of actual drug names in the document set D. Adopted from [18], the parameter computation formulas are

$$\mathrm{Precision}\left(K_i, C_j\right) = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}} = \frac{\left\|K_i \cap C_j\right\|}{\left\|C_j\right\|},$$

$$\mathrm{Recall}\left(K_i, C_j\right) = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}} = \frac{\left\|K_i \cap C_j\right\|}{\left\|K_i\right\|}, \quad (5)$$

where K_i, C_j, and K_i ∩ C_j denote the number of drug names in K, in C, and in both K and C, respectively. The F-score value is computed by the following formula:

$$F\text{-score}\left(K_i, C_j\right) = \frac{2 \times \mathrm{Precision}\left(K_i, C_j\right) \times \mathrm{Recall}\left(K_i, C_j\right)}{\mathrm{Precision}\left(K_i, C_j\right) + \mathrm{Recall}\left(K_i, C_j\right)}. \quad (6)$$
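These measures reduce to simple set operations between the extracted names and the gold names; the sketch below mirrors (5) and (6) and is illustrative only.

def precision_recall_fscore(extracted, actual):
    """Precision, recall, and F-score from sets of extracted and actual drug names."""
    extracted, actual = set(extracted), set(actual)
    tp = len(extracted & actual)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f

print(precision_recall_fscore({"dilantin", "tegretol", "aspirin"},
                              {"dilantin", "tegretol", "phenytoin"}))   # approx (0.667, 0.667, 0.667)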

4. Results and Discussion

4.1. MLP Learning Performance. The following experiments are part of the first experiment. These experiments are performed to evaluate the contribution of the three regularization settings described in Section 3.7.1. By arranging the sentences in the training dataset as 5-grams of words, the quantity of generated samples is as presented in Table 10. We train and test the MLP-NN learning model for all those test data compositions. The resulting model performances on both datasets, that is, MedLine and DrugBank, in the learning phase are shown in Figures 6 and 7. The NN learning parameters used for all experiments are 500 input nodes, two hidden layers where each layer has 100 nodes with sigmoid activation, and 6 output nodes with the softmax function; the learning rate = 1, momentum = 0.5, and epochs = 100. We used a minibatch scenario in the training with a batch size of 100. The errors presented in Figures 6 and 7 are the errors for the full batch, that is, the mean errors of all minibatches.
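For reference, the MLP configuration just described (500 inputs, two hidden layers of 100 sigmoid units, 6 softmax outputs, SGD with learning rate 1 and momentum 0.5, 100 epochs, minibatch size 100) can be expressed in a few lines of Keras. This is only an approximate sketch; the original experiments were run with the MATLAB DeepLearnToolbox code cited in Section 3.7.1, and the random data below is a placeholder.

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(500,)),              # 5 tokens x 100-dim word2vec
    keras.layers.Dense(100, activation="sigmoid"),
    keras.layers.Dense(100, activation="sigmoid"),
    keras.layers.Dense(6, activation="softmax"),   # classes 1..6, one-hot encoded
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1.0, momentum=0.5),
              loss="categorical_crossentropy", metrics=["accuracy"])

X = np.random.rand(1000, 500)                      # stand-in tuple features
y = keras.utils.to_categorical(np.random.randint(0, 6, 1000), num_classes=6)
model.fit(X, y, epochs=100, batch_size=100, verbose=0)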

Table 10: Dataset composition.

Dataset | Train | Test: all | Test: 2/3 part | Test: cluster
MedLine | 26500 | 10360 | 6673 | 5783
DrugBank | 100100 | 2000 | 1326 | 1933

Figure 6: Full batch training error of the MedLine dataset (training error over 100 epochs for the L0, L1, and L2 regularization settings).

The learning model performance shows different patterns between the MedLine and DrugBank datasets. For both datasets, L1 regularization tends to stabilize in the lower iterations, and its training error performance is always less than L0 or L2. The L0 and L2 training error performance patterns, however, show a slightly different behavior between MedLine and DrugBank. For the MedLine dataset, L0 and L2 produce different results for some of the iterations. Nevertheless, the training error performance of L0 and L2 for DrugBank is almost the same in every iteration. The different patterns are probably due to the variation in the quantity of training data. As illustrated in Table 10, the volume of DrugBank training data is almost four times the volume of the MedLine dataset. It can be concluded that for the larger dataset the contribution of the L2 regularization setting is not significant in achieving better performance. For the smaller dataset (MedLine), however, the performance is better even after only a few iterations.

4.2. Open Dataset Performance. In Tables 11, 12, 13, and 14, the numbering (1), (2), and (3) in the leftmost column indicates the candidate selection technique, with:

(i) (1): all of the test data being selected,

(ii) (2): 2/3 of the test data being selected,

(iii) (3): 2/3 part of 3 clusters for MedLine or 3/4 part of 4 clusters for DrugBank being selected.

4.2.1. MLP-NN Performance. In this first experiment, with two data representation techniques and three candidate selection scenarios, we have six experiment scenarios.

Figure 7: Full batch training error of the DrugBank dataset (training error over 100 epochs for the L0, L1, and L2 regularization settings).

The result of the experiment which applies the first data representation technique and the three candidate selection scenarios is presented in Table 11. In computing the F-score, we only select the predicted target provided by the lowest error (the minimum one). For the MedLine dataset, the best performance is shown by the L2 regularization setting, where the error is 0.041818 in the third candidate selection scenario, with F-score 0.439516, whereas for DrugBank it is achieved jointly by the L0 and L1 regularization settings, with an error test of 0.0802 in the second candidate selection scenario; the F-score was 0.641745. Overall, it can be concluded that the DrugBank experiments give the best F-score performance. The candidate selection scenarios also contributed to improving the performance, as we found that for MedLine and DrugBank the best achievement is provided by the third and second scenarios, respectively.

The next experimental scenario in the first experiment is performed to evaluate the impact of the data representation technique and of the addition of the Wiki source in the word2vec training. The results are presented in Tables 12 and 13. According to the results presented in Table 11, the L0 regularization gives the best F-score. Hence, we only used the L0 regularization for the next experimental scenario. Table 12 presents the impact of the data representation technique. Looking at the F-score, the second technique gives better results for both datasets, that is, MedLine and DrugBank.

Table 13 shows the result of adding the Wiki source to the word2vec training in providing the vector of word representations. These results confirm that the addition of training data improves the performance. It might be due to the fact that most of the targeted tokens, such as drug names, are uncommon words, whereas the words used in Wiki sentences are commonly used words. Hence, the addition of commonly used words makes the difference between the drug tokens and the nondrug tokens (the commonly used tokens) greater. For the MLP-NN experimental results, the 4th scenario, that is, the second data representation with 2/3 partition data selection on the DrugBank dataset, provides the best performance with 0.684646757 in F-score.

Table 11: The F-score performances of the three scenarios of experiments.

MedLine | Prec | Rec | F-score | Lx | Error test
(1) | 0.3564 | 0.5450 | 0.4310 | L0 | 0.0305
(2) | 0.3806 | 0.5023 | 0.4331 | L1, L2 | 0.0432
(3) | 0.3773 | 0.5266 | 0.4395 | L2 | 0.0418

DrugBank | Prec | Rec | F-score | Lx | Error test
(1) | 0.6312 | 0.5372 | 0.5805 | L0 | 0.07900
(2) | 0.6438 | 0.6398 | 0.6417 | L0, L2 | 0.0802
(3) | 0.6305 | 0.5380 | 0.5806 | L0 | 0.0776

Table 12: The F-score performance as an impact of the data representation technique (T1: one sequence of all sentences; T2: one sequence of each sentence).

MedLine | T1 Prec | T1 Rec | T1 F-score | T2 Prec | T2 Rec | T2 F-score
(1) | 0.3564 | 0.5450 | 0.4310 | 0.6515 | 0.6220 | 0.6364
(2) | 0.3806 | 0.5023 | 0.4331 | 0.6119 | 0.7377 | 0.6689
(3) | 0.3772 | 0.5266 | 0.4395 | 0.6143 | 0.656873 | 0.6348

DrugBank | T1 Prec | T1 Rec | T1 F-score | T2 Prec | T2 Rec | T2 F-score
(1) | 0.6438 | 0.5337 | 0.5836 | 0.7143 | 0.4962 | 0.5856
(2) | 0.6438 | 0.6398 | 0.6418 | 0.7182 | 0.5804 | 0.6420
(3) | 0.6306 | 0.5380 | 0.5807 | 0.5974 | 0.5476 | 0.5714

4.2.2. DBN and SAE Performance. In the second experiment, which involves the DBN and SAE learning models, we only use the experiment scenario that gave the best results in the first experiment. That scenario uses the second data representation technique with Wiki text as an additional source in the word2vec training step.

In the DBN experiment, we use two stacked RBMs with 500 visible nodes and 100 hidden nodes for the first and also the second RBM. The learning parameters used are as follows: momentum = 0 and alpha = 1. We used a minibatch scenario in the training with a batch size of 100. As for the RBM constraints, the range of the input data values is restricted to [0...1], as the original RBM was developed for the binary data type, whereas the range of the word vector values is [-1...1]. So we normalize the data values into the [0...1] range before performing the RBM training. In the last layer of the DBN, we use one layer of MLP with 100 hidden nodes and 6 output nodes with the softmax output function as the classifier.
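A rough equivalent of this pretraining step is sketched below with scikit-learn's BernoulliRBM: the word vector values are rescaled from [-1...1] to [0...1], two 100-unit RBMs are stacked, and the resulting features would then feed the one-layer MLP softmax classifier. It is an approximation for illustration; the exact toolbox and hyperparameters of the original experiment are not reproduced.

import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.random.uniform(-1, 1, size=(1000, 500))   # stand-in 500-dim tuple features
X01 = (X + 1.0) / 2.0                            # rescale [-1, 1] -> [0, 1] for the RBM

rbm1 = BernoulliRBM(n_components=100, batch_size=100, n_iter=20, random_state=0)
H1 = rbm1.fit_transform(X01)                     # first RBM: 500 visible -> 100 hidden
rbm2 = BernoulliRBM(n_components=100, batch_size=100, n_iter=20, random_state=0)
H2 = rbm2.fit_transform(H1)                      # second RBM: 100 -> 100

print(H2.shape)   # (1000, 100): features passed to the final MLP softmax classifier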

The SAE architecture used is two stacked AEs with the following node configuration. The first AE has 500 visible units, 100 hidden units, and 500 output units. The second AE has 100 visible units, 100 hidden units, and 100 output units. The learning parameters used for the first AE and the second AE, respectively, are as follows: activation function = sigmoid and tanh, learning rate = 1 and 2, momentum = 0.5 and 0.5, sparsity target = 0.05 and 0.05. A batch size of 100 is set for both AEs. In the SAE experiment, we use the same discriminative layer as in the DBN, that is, a one-layer MLP with 100 hidden nodes and 6 output nodes with the softmax activation function.

The experiment results are presented in Table 14.

Figure 8: The global LSTM network: a chain of single LSTM cells, each with two stacked hidden layers, taking inputs x_0, ..., x_n and producing outputs y_1, ..., y_n.

There is a difference in performance between the MedLine and the DrugBank datasets when feeding them into the MLP, DBN, and SAE models. The best results for the MedLine dataset are obtained when using the SAE. For DrugBank, the MLP gives the best results. The DBN gives a lower average performance for both datasets. The lower performance is probably due to the normalization of the word vector values to [0...1], whereas their original value range is in fact [-1...1]. The best performance over experiments 1 and 2 is given by SAE with the second scenario of candidate selection, as described in Section 3.6. Its F-score is 0.686192469.

4.2.3. LSTM Performance. The global LSTM network used is presented in Figure 8. Each single LSTM block consists of two stacked hidden layers and one input node, with the input dimension being 200, as described in Section 3.4.3. All hidden layers are fully connected. We used sigmoid as the output activation function, which is the most suitable for binary classification. We implemented a peephole connection LSTM variant, where the gate layers look at the cell state [34].

Table 13: The F-score performances as an impact of the Wiki addition to the word2vec training data (T1: one sequence of all sentences; T2: one sequence of each sentence).

MedLine | T1 Prec | T1 Rec | T1 F-score | T2 Prec | T2 Rec | T2 F-score
(1) | 0.5661 | 0.4582 | 0.5065 | 0.614 | 0.6495 | 0.6336
(2) | 0.5661 | 0.4946 | 0.5279 | 0.5972 | 0.7454 | 0.6631
(3) | 0.5714 | 0.4462 | 0.5011 | 0.6193 | 0.6927 | 0.6540

DrugBank | T1 Prec | T1 Rec | T1 F-score | T2 Prec | T2 Rec | T2 F-score
(1) | 0.6778 | 0.5460 | 0.6047 | 0.6973 | 0.6107 | 0.6511
(2) | 0.6776 | 0.6124 | 0.6433 | 0.6961 | 0.6736 | 0.6846
(3) | 0.7173 | 0.5574 | 0.6273 | 0.6976 | 0.6193 | 0.6561

Table 14: Experimental results of the three NN models.

MedLine | MLP Prec | MLP Rec | MLP F-score | DBN Prec | DBN Rec | DBN F-score | SAE Prec | SAE Rec | SAE F-score
(1) | 0.6515 | 0.6220 | 0.6364 | 0.5464 | 0.6866 | 0.6085 | 0.6728 | 0.6214 | 0.6461
(2) | 0.5972 | 0.7454 | 0.6631 | 0.6119 | 0.7377 | 0.6689 | 0.6504 | 0.7261 | 0.6862
(3) | 0.6193 | 0.6927 | 0.6540 | 0.6139 | 0.6575 | 0.6350 | 0.6738 | 0.6518 | 0.6626
Average | 0.6227 | 0.6867 | 0.6512 | 0.5907 | 0.6939 | 0.6375 | 0.6657 | 0.6665 | 0.6650

DrugBank | MLP Prec | MLP Rec | MLP F-score | DBN Prec | DBN Rec | DBN F-score | SAE Prec | SAE Rec | SAE F-score
(1) | 0.6973 | 0.6107 | 0.6512 | 0.6952 | 0.5847 | 0.6352 | 0.6081 | 0.6036 | 0.6059
(2) | 0.6961 | 0.6736 | 0.6847 | 0.6937 | 0.6479 | 0.6700 | 0.6836 | 0.6768 | 0.6802
(3) | 0.6976 | 0.6193 | 0.6561 | 0.6968 | 0.5929 | 0.6406 | 0.6033 | 0.6050 | 0.6042
Average | 0.6970 | 0.6345 | 0.664 | 0.6952 | 0.6085 | 0.6486 | 0.6317 | 0.6285 | 0.6301

In addition to the peephole connections, we also use coupled forget and input gates. The detailed architecture of a single LSTM block and the computation of each gate can be found in [33].
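The sketch below, adapted from the formulation described in [33, 34], illustrates one step of such an LSTM cell with peephole connections and coupled forget and input gates; the weight shapes and initialization are illustrative placeholders, not the trained JANNLab model.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, params):
        # One step of an LSTM cell with peephole connections and coupled forget/input gates.
        Wf, Wc, Wo, bf, bc, bo = params
        zf = np.concatenate([c_prev, h_prev, x_t])   # peephole: the forget gate sees the old cell state
        f_t = sigmoid(Wf @ zf + bf)
        z = np.concatenate([h_prev, x_t])
        c_tilde = np.tanh(Wc @ z + bc)
        c_t = f_t * c_prev + (1.0 - f_t) * c_tilde   # coupling: the input gate is 1 - forget gate
        zo = np.concatenate([c_t, h_prev, x_t])      # peephole: the output gate sees the new cell state
        o_t = sigmoid(Wo @ zo + bo)
        return o_t * np.tanh(c_t), c_t               # hidden output and new cell state

    n_in, n_hid = 200, 2                             # 200-d input, two hidden nodes as in the best setting
    rng = np.random.default_rng(0)
    params = (rng.normal(scale=0.1, size=(n_hid, 2 * n_hid + n_in)),   # Wf
              rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)),       # Wc
              rng.normal(scale=0.1, size=(n_hid, 2 * n_hid + n_in)),   # Wo
              np.zeros(n_hid), np.zeros(n_hid), np.zeros(n_hid))
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x_t in rng.uniform(-1.0, 1.0, size=(5, n_in)):                 # a five-word sentence
        h, c = lstm_step(x_t, h, c, params)
    print(h)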

The LSTM experiments were run with several different parameter settings; the results presented in this section are the best among all our experiments. Each piece of input data consists of two components: its word vector value and its Euclidean distance to the previous input. In treating both input components we adapt the Adding Problem experiment presented in [35]. We use the JANNLab tools [36], with some modifications to the input handling to conform with our data settings.
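A minimal sketch of this sequence representation follows: each word's 100-dimensional word2vec vector is concatenated with a 100-dimensional block carrying its Euclidean distance to the previous word (all zeros for the first word). Whether the scalar distance is repeated across the block or componentwise differences are used is not fully specified, so repeating the scalar is our assumption here.

    import numpy as np

    def sequence_inputs(word_vectors):
        # Build 200-d LSTM inputs: [word2vec (100) | distance block (100)].
        inputs, prev = [], None
        for vec in word_vectors:
            if prev is None:
                dist_block = np.zeros_like(vec)                     # first word: no previous input
            else:
                dist_block = np.full_like(vec, np.linalg.norm(vec - prev))
            inputs.append(np.concatenate([vec, dist_block]))
            prev = vec
        return np.stack(inputs)                                     # shape: (sentence length, 200)

    sentence = np.random.uniform(-1.0, 1.0, size=(7, 100))          # e.g., a seven-word sentence
    print(sequence_inputs(sentence).shape)                          # (7, 200)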

The best performance is obtained with an LSTM block architecture of a one-node input layer, a two-node hidden layer, and a one-node output layer. The parameters used are learning rate = 0.001, momentum = 0.9, epochs = 30, and input dimension = 200, with the time sequence frame set to 2. The complete treatment of the drug sentence as a sequence, both in representation and in recognition, is the best technique for extracting drug name entities, as shown by the F-score performance in Table 15.

As described in the related works section, many approaches to drug extraction have been proposed, and most of them utilize certain external knowledge to achieve the extraction objective. Table 16 summarizes their F-score performance. Among the state-of-the-art techniques, our third data representation technique applied to the LSTM model performs best. Moreover, our proposed method does not require any external knowledge.

Table 15: The F-score performance of the third data representation technique with RNN-LSTM.

              Prec      Rec       F-score
MedLine       1.0000    0.6474    0.7859
DrugBank      1.0000    0.8921    0.9430
Average                           0.8645
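As a quick sanity check of Table 15, the snippet below recomputes the F-scores from the reported precision and recall values and averages them; small differences from the printed table are due to rounding.

    def f_score(precision, recall):
        return 2 * precision * recall / (precision + recall)

    medline = f_score(1.0, 0.6474)     # ~0.786
    drugbank = f_score(1.0, 0.8921)    # ~0.943
    print(round(medline, 4), round(drugbank, 4), round((medline + drugbank) / 2, 4))   # average ~0.8645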

4.3. Drug Label Dataset Performance. As an additional experiment, we also use an Indonesian-language drug label corpus to evaluate the method's performance. For Indonesian drug labels we could not find any external knowledge source that could assist the extraction of the drug names contained in the labels. In the presence of this hindrance, we found our proposed method more suitable than the previous approaches. As the drug label texts are collected from various sites of drug distributors, producers, and government regulators, they do not come with predefined training and testing data as the DrugBank or MedLine datasets do. Another characteristic of these texts is that their sentences are more structured; although the texts come from various sources, all of them are the same kind of document (drug labels that might be generated by machine). After the data cleaning step (HTML tag removal, etc.), we annotated the dataset manually. The total quantity of the dataset after performing the data representation step described in Section 3.4 is 1,046,200.


Table 16: The F-score performance compared to the state of the art.

Approach                                                                  F-score   Remark
The best of SemEval 2013 [15]                                             0.7150    -
[11]                                                                      0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                               0.7200    With external knowledge (DINTO)
[14]                                                                      0.7200    Additional feature (BIO)
[12]                                                                      0.6000    Single token only
Ours: MLP, sentence sequence + Wiki (average)                             0.6580    Without external knowledge
Ours: DBN, sentence sequence + Wiki (average)                             0.6430    Without external knowledge
Ours: SAE, sentence sequence + Wiki (average)                             0.6480    Without external knowledge
Ours: LSTM, all-sentence sequence + Wiki + Euclidean distance (average)   0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration    Prec      Recall    F-score
1            0.9170    0.9667    0.9412
2            0.8849    0.9157    0.9000
3            0.9134    0.9619    0.9370
4            0.9298    0.9500    0.9398
5            0.9640    0.9570    0.9605
6            0.8857    0.9514    0.9178
7            0.9489    0.9689    0.9588
8            0.9622    0.9654    0.9638
9            0.9507    0.9601    0.9554
10           0.9516    0.9625    0.9570
Average      0.9308    0.9560    0.9431
Min          0.8849    0.9157    0.9000
Max          0.9640    0.9689    0.9638

In this experiment we perform a cross-validation scenario with 10 runs, each time randomly selecting 80% of the data for training and 20% for testing.
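The evaluation loop can be sketched as the repeated random holdout below; train_and_score is a hypothetical stand-in for the full annotation, training, and testing pipeline, and the aggregation mirrors the average, min, and max rows of Table 17.

    import random

    def repeated_holdout(samples, train_and_score, runs=10, train_fraction=0.8, seed=42):
        # Randomly split the data 80/20 `runs` times and collect the F-score of each run.
        rng = random.Random(seed)
        scores = []
        for _ in range(runs):
            shuffled = samples[:]
            rng.shuffle(shuffled)
            cut = int(len(shuffled) * train_fraction)
            train, test = shuffled[:cut], shuffled[cut:]
            scores.append(train_and_score(train, test))
        return sum(scores) / len(scores), min(scores), max(scores)

    # Hypothetical usage with a dummy scorer:
    # avg, lo, hi = repeated_holdout(list(range(1000)), lambda tr, te: 0.94)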

The experimental results for the drug label dataset show that all of the candidate selection scenarios provide an excellent F-score (above 0.9). This excellent performance is probably due to the more structured sentences in those texts. The best results of those ten runs are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments we studied various scenarios involving three parameters: the additional Wiki source, the data representation technique, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in the word2vec training enhances the quality of the resulting word vectors: through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, drug names, becomes greater (better distinguishing power).

One problem in mining drug name entities from medical text is the imbalanced quantity of drug tokens versus other tokens [15]. The targeted drug entities are only a small part of the total tokens, so the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios contribute by reducing the quantity of noise. Since the possibility of extracting noise is reduced, the recall and F-score values increase as well, as shown in the results of the first and second experiments.

The third experiment, which uses the LSTM model, does not apply the candidate selection scenarios, because the input dataset is treated as a sentence sequence. Consequently, the input dataset cannot be randomly divided (selected) in the way the tuples are treated in the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach in data representation and classification to extract drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity that remained unsolved in previous studies. This study also introduces techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequential pattern, that is, the current word is conditioned by the preceding words. Based on this sequence notion, treating medical text sentences with a sequential NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequential pattern, which is evaluated with nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentences as a sequence to provide the input of a sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique remain wide open. A first improvement could be the incorporation of additional handcrafted features, such as word position, the use of a capital letter at the beginning of a word, and the character type, as used in previous studies [16, 37]. As shown by the experiments on the drug label documents, the proposed method achieves excellent performance when applied to more structured text; thus, efforts to make the sentences of the DrugBank and MedLine datasets more structured could also be elaborated. Regarding the LSTM model and the sequence data representation of medical text sentences, our future study will tackle multiple entity extraction, such as drug group, drug brand, and drug compounds. Another task that could potentially be solved with


the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features. This addition gives a good F-score performance; the significance of embedding the Euclidean distance features, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004UN2R12HKP05002016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1-5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088-1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145-153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914-920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816-823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88-100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790-810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660-666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651-659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122-132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341-350, Association for Computational Linguistics, Atlanta, Ga, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64-72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11-18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365-3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3-6, 2012, Proceedings, pp. 14-36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064-1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, University of Toronto, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599-619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153-158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481-1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172-175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39-46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206-213, BioNLP, Montreal, Canada, 2012.


Page 2: Research Article A New Data Representation Based on ...downloads.hindawi.com/journals/cin/2016/3483528.pdfsyntactic term or lexical term. Finally, the learning approach is usually

2 Computational Intelligence and Neuroscience

retrieval decision support system drug development or drugdiscovery) [7]

Compared to other NER (name entity recognition) taskssuch as PERSON LOCATION EVENT or TIME drug nameentity recognition facesmore challenges First the drug nameentities are usually unstructured texts [8] where the numberof new entities is quickly growing over timeThus it is hard tocreate a dictionary which always includes the entire lexiconand is up-to-date [9] Second the naming of the drug alsowidely varies The abbreviation and acronym increase thedifficulties in determining the concepts referred to by theterms Third many drug names contain a combination ofnonword and word symbols [10] Fourth the other problemin drug name extraction is that a single drug name might berepresented by multiple tokens [11] Due to the complexity inextracting multiple tokens for drugs some researchers suchas [12] even ignore that case in the MedLine and DrugBanktraining with the reason that the multiple tokens drug isonly 18 of all drug names It is different with anotherdomain that is entity names in the biomedical field areusually longer Fifth in some cases the drug name is acombination of medical and general terms Sixth the lack ofthe labelled dataset is another problem it has yet to be solvedby extracting the drug name entities

This paper presents three data representation techniquesto extract drug name entities contained in the sentences ofmedical texts For the first and the second techniques wecreated an instance of the dataset as a tuple which is formedfrom 5 vectors of words In the first technique the tuplewas constructed from all sentences treated as a sequencewhereas in the second technique the tuple is made fromeach sentence treated as a sequence The first and secondtechniques were evaluated with the standardMLP-NNmodelwhich is performed in the first experiment In the secondexperiment we use the second data representation techniquewhich is also applied to the other NN model that is DBNand SAE The third data representation which assumes thetext as sequential entities was assessed with the recurrentNNmodel LSTM Those three data representation techniquesare based on the word2vec value characteristics that is theircosine and the Euclidean distance between the vectors ofwords

In the first and second techniques we apply three dif-ferent scenarios to select the most possible words whichrepresent the drug name The scenarios are based on thecharacteristics of training data that is drug words distri-bution that is usually assumed to have a smaller frequencyof appearance in the dataset sentences The drug namecandidate selections are as follows In the first case all testdataset is taken In the second case 23 of all test dataset isselected In the third case 119909119910 (119909 lt 119910) of the test dataset(where 119909 and 119910 are arbitrary integer numbers) are selectedafter clustering the test dataset into 119910 clusters

In the third experiment based on the characteristics ofthe resulting word vectors of the trained word embeddingwe formulate a sequence data representation applied to RNN-LSTMWe used the Euclidian distance of the current input tothe previous input as an additional feature besides its vector of

words In this study the vector of words is provided by wordembedding methods proposed by Mikolov et al [13]

Our main important contributions in this study are

(1) the new data representation techniques which donot require any external knowledge nor handcraftedfeatures

(2) the drug extraction techniques based on the wordsdistribution contained in the training data

Our proposed method is evaluated on DrugBank andMedLine medical open dataset obtained from SemEval 2013Competition task 91 see httpswwwcsyorkacuksemeval-2013task9 which is also used by [11 12 14] The format ofbothmedical texts is in Englishwhere some sentences containdrug name entities In extracting drug entity names fromthe dataset our data representation techniques give the bestperformance with F-score values 0687 for MLP 06700 forDBN and 0682 for SAE whereas the third technique withLSTM gives the best F-score that is 09430 The averageF-score of the third technique is 08645 that is the bestperformance compared to the other previous methods

By applying the data representation techniques our pro-posed approach provides at least three advantages

(1) The capability to identify multiple tokens as a singlename entity

(2) The ability to deal with the absence of any externalknowledge in certain languages

(3) No need to construct any additional features suchas characters type identification orthography feature(lowercase or uppercase identification) or token posi-tion

The rest of the sections of this paper are organized asfollows Section 2 explains some previous works dealing withname entity (and drug name as well) extraction frommedicaltext sources The framework approach and methodologyto overcome the challenges of drug name extraction arepresented in Section 3 The section also describes datasetmaterials and experiment scenarios Section 4 discusses theexperiment results and its analysis while Section 5 explainsthe achievement the shortcoming and the prospects of thisstudy The section also describes several potential explo-rations for future research

2 Related Works

The entity recognition in a biomedical text is an activeresearch and many methods have been proposed For exam-ple Pal and Gosal [9] summarize their survey on variousentity recognition approaches The approaches can be cate-gorized into three models dictionary based rule-based andlearning based methods [2 8] A dictionary based approachuses a list of terms (term collection) to assist in predictingwhich targeted entity will be included in the predicted groupAlthough their overall precision is more accurate their recallis poor since they anticipate less new terms The rule-basedapproach defines a certain rule which describes such pattern

Computational Intelligence and Neuroscience 3

formation surrounding the targeted entity This rule can be asyntactic term or lexical term Finally the learning approachis usually based on statistical data characteristics to builda model using machine learning techniques The model iscapable of automatic learning based on positive neutral andnegative training data

Drug name extraction and their classification are oneof the challenges in the Semantic Evaluation Task (SemEval2013) The best-reported performance for this challenge was715 in F-score [15] Until now the studies to extract drugnames still continue and many approaches have been pro-posed CRF-based learning is the most commonmethod uti-lized in the clinical text information extraction CRF is usedby one of the best [11] participants in SemEval challenges inthe clinical text ( httpswwwcsyorkacuksemeval-2013)As for the use of external knowledge aimed at increasing theperformance the author [11] uses ChEBI (Chemical Entitiesof Biological Interest) that is a dictionary of small molecularentitiesThe best achieved performance is 057 in F-score (forthe overall dataset)

A hybrid approach model which combines statisticallearning and dictionary based is proposed by [16] In theirstudy the author utilizes word2vec representation CRFlearning model and DINTO a drug ontology With thisword2vec representation targeted drug is treated as a currenttoken in context windows which consists of three tokens onthe left and three tokens on the right Additional featuresare included in the data representation such as pos tagslemma in the windows context and an orthography featureas uppercase lowercase and mixed capThe author also usedWikipedia text as an additional resource to performword2vecrepresentation training The best F-score value in extractingthe drug name provided by the method is 072

The result of CRF-based active learning which is appliedtoNERBIO (Beginning InsideOutput) annotation token forextracting name entity in the clinical text is presented in [17]The framework of this active learning approach is a sequentialprocess initial model generation querying training anditerationThe CRF Algorithm BIO approach was also studiedby Ben Abacha et al [14] The features for the CRF algorithmare formulated based on token and linguistics feature andsemantic feature The best F-score achieved by this proposedmethod is 072

Korkontzelos et al studied a combination of aggregatedclassifier maximum entropy-multinomial classifier andhandcrafted feature to extract drug entity [4] They classifieddrug and nondrug based on the token features formulationsuch as tokens windows the current token and 8 otherhandcrafted features

Another approach for discovering valuable informationfrom clinical text data that adopts event-location extractionmodel was examined by Bjorne et al [12] They use an SVMclassifier to predict drug or nondrug entity which is appliedto DrugBank datasetThe best performance achieved by theirmethod is 06 in F-score The drawback of their approach isthat it only deals with a single token drug name

To overcome the ambiguity problem in NERmined froma medical corpus a segment representation method has alsobeen proposed by Keretna et al [8] Their approach treats

each word as belonging to three classes that is NE not NEand an ambiguous class The ambiguity of the class memberis determined by identifying whether the word appears inmore than one context or not If so this word falls into theambiguous class After three class segments are found eachword is then applied to the classifier learning Related to theirapproach in our previous work we propose pattern learningthat utilizes the regular expression surrounding drug namesand their compounds [18] The performance of our methodis quite good with the average F-score being 081 but has alimitation in dealing with more unstructured text data

In summarizing the related previous works on drugname entity extraction we noted some drawbacks whichneed to be addressed In general almost all state-of-the-artmethods work based on ad hoc external knowledge whichis not always available The requirement of the handcraftedfeature is another difficult constraint since not all datasetscontain such feature An additional challenge that remainedunsolved by the previous works is the problem of multipletokens representation for a single drug name This studyproposes a new data representation technique to handle thosechallenges

Our proposed method is based only on the data distribu-tion pattern and vector of words characteristics so there is noneed for external knowledge nor additional handcrafted fea-tures To overcome themultiple tokens problemwe propose anew technique which treats a target entity as a set of tokens (atuple) at once rather than treating the target entity as a singletoken surrounded by other tokens such as those used by [16]or [19] By addressing a set of the tokens as a single sampleour proposed method can predict whether a set of tokensis a drug name or not In our first experiment we evaluatethe first and second data representation techniques and applyMLP learning model In our second scenario we choose thethe second technique which gave the best result with MLPand apply it to two different machine learningmethods DBNand SAE In our third experiment we examined the thirddata representation technique which utilizes the Euclidiandistance between successive words in a certain sentence ofmedical text The third data representation is then fed intoan LSTM model Based on the resulting F-score value thesecond experiment gives the best performance

3 Method and Material

31 Framework In this study using the word2vec valuecharacteristics we conducted three experiments based ondifferent data representation techniquesThe first and secondexperiment examine conventional tuple data representationwhereas the third experiment examines sequence data rep-resentation We describe the organization of these threeexperiments in this section In general the proposed methodto extract drug name entities in this study consists of twomain phases The first phase is a data representation to for-mulate the feature representation In the second phase modeltraining testing and their evaluation are then conducted toevaluate the performance of the proposed method

4 Computational Intelligence and Neuroscience

(4) Experiment ampevaluation

(2) Dataset labelling

(1) Data representation formulation

(3) Candidate selection

Preprocessing

Word2vec training 1data set only

Word2vec training 2data set + Wiki

Data structure formatting 1 one sequence for all

Data structure formatting 2 one sequence for

each sentence

Datatraining

Datatesting

Labelling

Traininglabel

Testinglabel

Train (drugNER)

Test (drugNER)

Trainingsentences

Testingsentences

S

Training (MLP)

Model (MLP)

Testing (MLP)

Evaluation

E(1) Include all (2) 23 of

distribution

Predicteddrug

(3) xy part of yclusters (xlt y)

Figure 1 Proposed approach framework of the first experiment

The proposedmethod of the first experiment consists of 4steps (see Figure 1) The first step is a data representation for-mulation The output of the first step is the tuples of trainingand testing datasetThe second step is dataset labelling whichis applied to both testing and training dataThe step providesthe label of each tupleThe third step is the candidate selectionwhich is performed to minimize the noises since the actualdrug target quantity is far less compared to nondrug nameIn the last step we performed the experiment with MLP-NNmodel and its result evaluation The detailed explanation ofeach step is explained in Sections 34 36 and 39 whereasSections 32 and 33 describe training data analysis as thefoundation of this proposed method As a part of the firstexperiment we also evaluate the impact of the usage of theEuclidean distance average as themodelrsquos regularizationThisregularization term is described in Section 371

The framework of the second experiment which involvesDBN and SAE learning model to the second data represen-tation technique is illustrated in Figure 2 In general thesteps of the second experiment are similar to the first onewith its differences being the data representation used andthe learning model involved In the second experiment thesecond technique is used only with DBN and SAE as thelearning model

The framework of the third experiment using the LSTMis illustrated in Figure 3 There are tree steps in the thirdexperiment The first step is sequence data representationformulation which provides both sequence training data andtesting dataThe second step is data labelling which generatesthe label of training and testing data LSTM experimentand its result evaluation are performed in the third stepThe detailed description of these three steps is presented inSections 34 343 and 39 as well

32 Training Data Analysis Each of the sentences in thedataset contains four data types that is drug group brandand drug-n If the sentence contains none of those four typesthe type value is null In the study we extracted drug anddrug-n Overall in both DrugBank andMedLine datasets thequantity of drug name target is far less compared to the non-drug target Segura-Bedmar et al [15] present the first basicstatistics of the dataset Amore detailed exploration regardingtoken distribution in the training dataset is described in thissection The MedLine sentences training dataset contains25783 single tokens which consist of 4003 unique tokensThose tokens distributions are not uniformbut are dominatedby a small part of some unique tokens If all of the uniquetokens are arranged and ranked based on the most frequentappearances in the sentences the quartile distribution willhave the following result presented in Figure 41198761 representstoken numbers 1 to 1001 whose total of frequency is 206881198762 represents token numbers 1002 to 2002 whose total offrequency is 28491198763 represents token numbers 2003 to 3002whose total of frequency is 1264 and 1198764 represents tokennumbers 3003 to 4003 whose total of frequency is 1000 Thefigure shows that the majority of appearances are dominatedby only a small amount of the total tokens

Further analysis of the dataset tokens shows that most ofthe drug names of the targeted token rarely appear in thedataset When we divide those token collections into threepartitions based on their sum of frequency as presented inTable 1 it is shown that all of the drug name entities targetedare contained in 23 part with less frequent appearances ofeach token (a unique token in the same sum of frequency)A similar pattern of training data token distribution alsoemerged in the DrugBank dataset as illustrated in Figure 5and Table 2 When we look into specific token distributions

Computational Intelligence and Neuroscience 5

(4) Experiment amp evaluation

(2) Dataset labelling

(1) Data representation formulation

(3) Candidate selection

Preprocessing

Data training

Data testing

Labelling

Training label

Testinglabel

Train (drug NER)

Test (drug NER)

Trainingsentences

Testingsentences

S

Training (DBN SAE)

(DBN SAE)

(DBN SAE)

Model

Testing

Evaluation

E

Predicted drug

(1) Include all (2) 23 ofdistribution

(3) xy part of y

Data structure formatting 2 one sequence for

each sentenceWord2vec training 2

data set + Wiki

clusters (xlt y)

Figure 2 Proposed approach framework of the second experiment

(3) Experiment ampevaluation

(2) Dataset labelling

(1) Data representation formulation

Preprocessing Word2vec training2 data set + Wiki

Sequence data structure formatting

Datatraining

Datatesting

LabellingTraining

label

Testinglabel

Train (drug NER)

Test (drug NER)

Trainingsentences

Testingsentences

S

Training (LSTM)

Model (LSTM)

Testing (LSTM)

Evaluation

E

Predicteddrug

Figure 3 Proposed approach framework of the third experiment

the position of most of the drug name targets is in the thirdpart since the most frequently appearing words in the firstand the secondparts are themost commonwords such as stopwords (ldquoofrdquo ldquotherdquo ldquoardquo ldquoendrdquo ldquotordquo ldquowhererdquo ldquoasrdquo ldquofromrdquo andsuch kind of words) and common words in medical domainsuch as ldquoadministratorrdquo ldquopatientrdquo ldquoeffectrdquo and ldquodoserdquo

33 Word Embedding Analysis To represent the dataset weutilized the word embedding model proposed by Mikolovet al [13] We treated all of the sentences as a corpus afterthe training dataset and testing dataset were combined Theused word2vec training model was the CBOW (Continuous

Bag Of Words) model with context window length 5 and thevector dimension 100 The result of the word2vec trainingis the representation of word in 100 dimension row vectorsBased on the row vector the similarities or dissimilaritiesbetween words can be estimatedThe description below is theanalysis summary of word2vec representation result which isused as a base reference for the data representation techniqueand the experiment scenarios By taking some sample of drugtargets and nondrug vector representation it is shown thatdrug word has more similarities (cosine distance) to anotherdrug than to nondrug and vice versa Some of those samplesare illustrated in Table 3 We also computed the Euclidean

6 Computational Intelligence and Neuroscience

20688

28491246 1000

0

5000

10000

15000

20000

25000

Q1 Q2 Q3 Q4

Figure 4 Distribution of MedLine train dataset token

Table 1 The frequency distribution and drug target token positionMedLine

13 Σ sample Σ frequency Σ single token of drug entity1 28 8661 mdash2 410 8510 503 3563 8612 262

Table 2The frequency distribution and drug target token positionDrugBank

13 Σ sample Σ frequency Σ single token of drug entity1 27 33538 mdash2 332 33463 333 5501 33351 920

distance between all of thewords Table 4 shows the average ofEuclidean distance and cosine distance between drug-drugdrug-nondrug and nondrug-nondrug These values of theaverage distance showus that intuitively it is feasible to groupthe collection of the words into drug group and nondruggroup based on their vector representations value

34 Feature Representation Data Formatting andData Label-ling Based on the training data and word embedding anal-ysis we formulate the feature representation and its dataformatting In the first and second techniques we try toovercome the multiple tokens drawback left unsolved in [12]by formatting single input data as an 119873-gram model with119873 = 5 (one tuple piece of data consists of 5 tokens) toaccommodate the maximum token which acts as a singledrug entity target name The tuples were provided from thesentences of both training and testing data Thus we have aset of tuples of training data and a set of tuples of testing dataEach tuple was treated as a single input

To identify a single input whether it is a nondrug or drugtarget we use a multiclassification approach which classifiesthe single input into one of six classes Class 1 representsnondrug whereas the other classes represent drug targetwhich also identified how many tokens (words) perform thedrug target To identify which class a certain tuple belongs tothe following is determinedThe drug tuple is the tuple whosefirst token (token-1) is the drug type If token-1 is not a drug

90550

6162 2199 14410

20000

40000

60000

80000

100000

Q1 Q2 Q3 Q4

Figure 5 Distribution of DrugBank train dataset token

regardless of whatever the rest of the 4 tokens are then thetuple is classified as no drugThis kind of tuple is identified asclass 1 If token-1 is a drug and token-2 is not a drug regardlessof the last 3 tokens the tuple will be identified as class 2 andso on

Since we only extracted the drug entity we ignored theother token types whether it is a group brand or anothercommon token To provide the label of each tuple we onlyuse the drug and drug-n types as the tuple reference listIn general if the sequence of token in each tuple in datasetcontains the sequence which is exactly the same with oneof tuple reference list members then the tuple in dataset isidentified as drug entity The detail of the algorithm usedto provide the label of each tuple in both training data andtesting data is described in Algorithm 1

We proposed two techniques in constructing the tupleset of the sentences The first technique treats all sentencesas one sequence whereas in the second technique eachsentence is processed as one sequence The first and thesecond techniques are evaluated with MLP DBN and SAEmodel The third technique treats the sentences of datasetas a sequence where the occurrence of the current token isinfluenced by the previous one By treating the sentence asa sequence not only in the data representation but also inthe classification and recognition process the most suitablemodel to be used is RNNWe appliedRNN-LSTM to the thirdtechnique

341 First Technique The first dataset formatting (one se-quence for all sentences) is performed as follows In the firststep all sentences in the dataset are formatted as a tokensequence Let the token sequence be

11990511199052119905311990541199055119905611990571199058 sdot sdot sdot 119905119899 (1)

with 119899 being number of tokens in the sequences then thedataset format will be

11990511199052119905311990541199055 11990521199053119905411990551199056 sdot sdot sdot 119905119899minus4119905119899minus3119905119899minus2119905119899minus1119905119899 (2)

A sample of sentences and their drug names are presentedin Table 5 Taken from DrugBank training data Table 5 isthe raw data of 3 samples with three relevant fields thatis sentences character drug position and the drug nameTable 6 illustrates a portion of the dataset and its label as theresult of the raw data in Table 5 Referring to the drug-n namefield in the dataset dataset number 6 is identified as a drug

Computational Intelligence and Neuroscience 7

Table 3 Some of the cosine distance similarities between two kinds of words

Word 1 Word 2 Similarities (cosine dist) Remarkdilantin tegretol 075135758 drug-drugphenytoin dilantin 062360351 drug-drugphenytoin tegretol 051322415 drug-drugcholestyramine dilantin 024557819 drug-drugcholestyramine phenytoin 023701277 drug-drugadministration patients 020459694 non-drug - non-drugtegretol may 011605539 drug - non-drugcholestyramine patients 008827197 drug - non-drugevaluated end 007379115 non-drug - non-drugwithin controlled 006111103 non-drug - non-drugcholestyramine evaluated 004024139 drug - non-drugdilantin end 002234770 drug - non-drug

Table 4 The average of Euclidean distance and cosine similaritiesbetween groups of words

Word group Euclidean dist avg Cosine dist avgdrug - non-drug 0096113798 0194855980non-drug - non-drug 0094824332 0604091044drug-drug 0093840800 0617929002

Table 5 Sample of DrugBank sentences and their drug name target

Sentence Drugposition Drug name

modification of surface histidineresidues abolishes the cytotoxicactivity of clostridium difficile toxin a

79ndash107 clostridiumdifficile toxin a

antimicrobial activity of ganodermalucidum extract alone and incombination with some antibiotics

26ndash50 ganodermalucidum extract

on the other hand surprisingly greentea gallocatechins(minus)-epigallocatechin-3-o-gallate andtheasinensin a potently enhanced thepromoter activity (182 and 247activity at 1 microm resp)

33ndash56 green teagallocatechins

whereas the others are classified as a nondrug entity Thecomplete label illustration of the dataset provided by the firsttechnique is presented in Table 7 As described in Section 34the value of vector dimension for each token is 100Thereforefor single data it is represented as 100 lowast 5 = 500 lengths of aone-dimensional vector

342 Second Technique The second technique is used fortreating one sequence that comes from each sentence of thedataset With this treatment we added special characters lowastas padding to the last part of the token when its datasetlength is less than 5 By applying the second technique thefirst sentence of the sample provided a dataset as illustratedin Table 8

343 Third Technique Naturally the NLP sentence is a se-quence in which the occurrence of the current word isconditioned by the previous one Based on the word2vecvalue analysis it is shown that intuitively we can separate thedrug word and nondrug word by their Euclidean distanceTherefore we used the Euclidean distance between the cur-rent words with the previous one to represent the influenceThus each current input 119909119894 is represented by [119909V119894119909119889119894] whichis the concatenation of word2vec value 119909V119894 and its Euclidiandistance to the previous one 119909119889119894 Each 119909 is the row vectorwith the dimension length being 200 the first 100 values areits word2vector and the rest of all 100 values are the Euclidiandistance to the previous For the first word all values of 119909119889119894are 0 With the LSTM model the task to extract the drugname from the medical data text is the binary classificationapplied to each word of the sentence We formulate theword sequence and its class as described in Table 9 In thisexperiment each word that represents the drug name isidentified as class 1 such as ldquoplenaxisrdquo ldquocytochromerdquo and ldquop-450rdquo whereas the other words are identified by class 0

35 Wiki Sources In this study we also utilize Wikipedia asthe additional text sources in word2vec training as used by[16] The Wiki text addition is used to evaluate the impact ofthe training data volume in improving the quality of wordrsquosvector

36 Candidates Selection The tokens as the drug entitiestarget are only a tiny part of the total tokens In MedLinedataset 171 of 2000 tokens (less than 10) are drugs whereasin DrugBank the number of drug tokens is 180 of 5252 [15]So the major part of these tokens is nondrug and other noisessuch as a stop word and special or numerical charactersBased on this fact we propose a candidate selection step toeliminate those noises We examine two mechanisms in thecandidate selection The first is based on token distributionThe second is formed by selecting 119909119910 part of the clusteringresult of data test In the first scenario we only used 23 of thetoken which appears in the lower 23 part of the total tokenThis is presented in Tables 1 and 2 However in the second

8 Computational Intelligence and Neuroscience

Result Labelled datasetInput array of tuple array of drugoutput array of label Array of drug contains list of drug and drug-n onlylabel[] lt= 1 Initializationfor each t in tuple do

for each d in drug doif length (d) = 1 then

if t[1] = d[1] thenmatch 1 token druglabel lt= 2 break exit from for each d in drug

elseend

elseif length (d) = 2 then

if t[1] = d[1] and t[2] = d[2] thenmatch 2 tokens druglabel lt= 3 break exit from for each d in drug

elseend

elseif length (d) = 3 then

if t[1] = d[1] and t[2] = d[2] and t[3] = d[3] thenlabel lt= 4 break exit from for each d in drug

elseend

elseif length (d) = 4 then

if t[1] = d[1] and t[2] = d[2] and t[3] = d[3] and t[4] = d[4] thenlabel lt= 5 break exit from for each d in drug

elseend

elseif length (d) = 5 then

if t[1] = d[1] and t[2] = d[2] and t[3] = d[3] and t[4] = d[4] and t[5] = d[5] thenlabel lt= 6 break exit from for each d in drug

elseend

elseend

endend

endend

endend

Algorithm 1 Dataset labelling

Table 6 A portion of the dataset formulation as the results of DrugBank sample with first technique

Dataset number Token-1 Token-2 Token-3 Token-4 Token-5 Label1 modification of surface histidine residues 12 of surface histidine residues abolishes 13 surface histidine residues abolishes the 14 histidine residues abolishes the cytotoxic 15 the cytotoxic activity of clostridium 16 clostridium difficile toxin a antimicrobial 57 difficile toxin a antimicrobial activity 1

Computational Intelligence and Neuroscience 9

Table 7 First technique of data representation and its label

Token-1 Token-2 Token-3 Token-4 Token-5 Labelldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo ldquocytochromerdquo ldquop-450rdquo 2ldquotestosteronerdquo ldquoconcentrationsrdquo ldquojustrdquo ldquopriorrdquo ldquotordquo 2ldquobeta-adrenergicrdquo ldquoantagonistsrdquo ldquoandrdquo ldquoalpha-adrenergicrdquo ldquostimulantsrdquo 3ldquocarbonicrdquo ldquoanhydraserdquo ldquoinhibitorsrdquo ldquoconcomitantrdquo ldquouserdquo 3ldquosodiumrdquo ldquopolystyrenerdquo ldquosulfonaterdquo ldquoshouldrdquo ldquoberdquo 4ldquosodiumrdquo ldquoacidrdquo ldquophosphaterdquo ldquosuchrdquo ldquoasrdquo 4ldquoclostridiumrdquo ldquodifficilerdquo ldquotoxinrdquo ldquoardquo ldquomdashrdquo 5ldquononsteroidalrdquo ldquoantirdquo ldquoinflammatoryrdquo ldquodrugsrdquo ldquoandrdquo 5ldquocaseinrdquo ldquophosphopeptide-amorphousrdquo ldquocalciumrdquo ldquophosphaterdquo ldquocomplexrdquo 6ldquostudiesrdquo ldquowithrdquo ldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo 1ldquowererdquo ldquoperformedrdquo ldquocytochromerdquo ldquop-450rdquo ldquoisrdquo 1

Table 8 Second technique of data representation and its label

Token-1 Token-2 Token-3 Token-4 Token-5 Labelldquomodificationrdquo ldquoofrdquo ldquosurfacerdquo ldquohistidinerdquo ldquoresiduesrdquo 1ldquoofrdquo ldquosurfacerdquo ldquohistidinerdquo ldquoresiduesrdquo ldquoabolishesrdquo 1surface histidine residues abolishes the 1ldquohistidinerdquo ldquoresiduesrdquo ldquoabolishesrdquo ldquotherdquo ldquocytotoxicrdquo 1ldquotherdquo ldquocytotoxicrdquo ldquoactivityrdquo ldquoofrdquo ldquoclostridiumrdquo 1ldquoclostridiumrdquo ldquodifficilerdquo ldquotoxinrdquo ldquoardquo ldquolowastrdquo 5ldquodifficilerdquo ldquotoxinrdquo ldquoardquo ldquolowastrdquo ldquolowastrdquo 1ldquoardquo ldquoatoxinrdquo ldquolowastrdquo ldquolowastrdquo ldquolowastrdquo 1ldquotoxicrdquo ldquolowastrdquo ldquolowastrdquo ldquolowastrdquo ldquolowastrdquo 1

mechanism we selected 119909119910 (119909 lt 119910) which is a part of totaltoken after the tokens are clustered into 119910 clusters

37 Overview of NN Model

371 MLP In the first experiment we used multilayer per-ceptron NN to train the model and evaluate the performance[20] Given a training set of119898 examples then the overall costfunction can be defined as

119869 (119882 119887) = [ 1119898119898sum119894=1

119869 (119882 119887 119909119894 119910119894)]

+ 1205822119899119897minus1sum119897=1

119904119897sum119894=1

119904119897minus1sum119895=1

(119882119895(119897)119894 )2

119869 (119882 119887) = [ 1119898119898sum119894=1

(12 10038171003817100381710038171003817ℎ119908119887 (119909(i)) minus 119910119894100381710038171003817100381710038172)]

+ 1205822119899119897minus1sum119897=1

119904119897sum119894=1

119904119897minus1sum119895=1

(119882119895(119897)119894 )2

(3)

In the definition of 119869(119882 119887) the first term is an averagesum-of-squares error term whereas the second term is aregularization term which is also called a weight decay termIn this experiment we use three kinds of regularization 0

L0 with 120582 = 0 1 L1 with 120582 = 1 and 2 with 120582 =the average of Euclidean distance We computed the L2rsquos120582 based on the word embedding vector analysis that drugtarget and nondrug can be distinguished by looking at theirEuclidean distanceThus for L2 regularization the parameteris calculated as

120582 = 1119899 lowast (119899 minus 1)119899minus1sum119894=1

119899sum119895=119894+1

dist (119909119894 119909119895) (4)

where dist(119909119894 119909119895) is the Euclidean distance of 119909119894 and 119909119895The model training and testing are implemented by

modifying the code from [21] which can be downloaded athttpsgithubcomrasmusbergpalmDeepLearnToolbox

372 DBN DBN is a learning model composed of two ormore stacked RBMs [22 23] An RBM is an undirectedgraph learningmodel which associates withMarkov RandomFields (MRF) In the DBN the RBM acts as feature extractorwhere the pretraining process provides initial weights valuesto be fine-tuned in the discriminative process in the lastlayer The last layer may be formed by logistic regressionor any standard discriminative classifiers [23] RBM wasoriginally developed for binary data observation [24 25] It isa popular type of unsupervisedmodel for binary data [26 27]Some derivatives of RBM models are also proposed to tacklecontinuousreal values suggested in [28 29]

10 Computational Intelligence and Neuroscience

Table 9 Third technique of data representation and its label

Sent1 Class 0 0 0 0 1 0 0Word ldquodrugrdquo ldquointeractionrdquo ldquostudiesrdquo ldquowithrdquo ldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo

Sent2 Class 1 1 0 0 0 0 0Word ldquocytochromerdquo ldquop-450rdquo ldquoisrdquo ldquonotrdquo ldquoknownrdquo ldquoinrdquo ldquotherdquo

373 SAE An autoencoder (AE) neural network is one ofthe unsupervised learning algorithmsTheNN tries to learn afunction ℎ(119908 119909) asymp 119909 The autoencoder NN architecture alsoconsists of input hidden and output layers The particularcharacteristic of the autoencoder is that the target output issimilar to the input The interesting structure of the data isestimated by applying a certain constraint to the networkwhich limits the number of hidden units However whenthe number of hidden units has to be larger it can beimposed with sparsity constraints on the hidden units [30]The sparsity constraint is used to enforce the average valueof hidden unit activation constrained to a certain valueAs used in the DBN model after we trained the SAE thetrained weight was used to initialize the weight of NN for theclassification

374 RNN-LSTM RNN (Recurrent Neural Network) is anNN which considers the previous input in determining theoutput of the current input RNN is powerful when it isapplied to the dataset with a sequential pattern or when thecurrent state input depends on the previous one such as thetime series data sentences of NLP An LSTM network is aspecial kind of RNN which also consists of 3 layers that isan input layer a single recurrent hidden layer and an outputlayer [31] The main innovation of LSTM is that its hiddenlayer consists of one or more memory blocks Each blockincludes one or morememory cells In the standard form theinputs are connected to all of the cells and gates whereas thecells are connected to the outputs The gates are connected toother gates and cells in the hidden layer The single standardLSTM is a hidden layer with input memory cell and outputgates [32 33]

38 Dataset To validate the proposed approach we utilizedDrugBank and MedLine open dataset which have also beenused by previous researchers Additionally we used druglabel documents from various drug producers and regulatorInternet sites located in Indonesia

(1) httpwwwkalbemedcom

(2) httpwwwdechacarecom

(3) httpinfoobatindonesiacomobat and

(4) httpwwwpomgoidwebregindexphphomeproduk01

The drug labels are written in Bahasa Indonesia and theircommon contents are drug name drug components indica-tion contraindication dosage and warning

39 Evaluation To evaluate the performance of the pro-posed method we use common measured parameters indata mining that is precision recall and F-score Thecomputation formula of these parameters is as follows Let119862 = 1198621 1198622 1198623 119862119899 be a set of the extracted drug nameof this method and 119870 = 1198701 1198702 1198703 119870119897 is set of actualdrug names in the document set 119863 Adopted from [18] theparameter computations formula is

Precision (119870119894 119862119895) = (TruePositive)(TruePositive + FalsePositive)= (10038171003817100381710038171003817119870119894 cap 11986211989510038171003817100381710038171003817)(1003817100381710038171003817100381711986211989510038171003817100381710038171003817)

Recall (119870119894 119862119895) = (TruePositive)(TruePositive + FalseNegative)= (10038171003817100381710038171003817119870119894 cap 11986211989510038171003817100381710038171003817)(10038171003817100381710038171198701198941003817100381710038171003817)

(5)

where 119870119894 119862119895 and 119870119894 cap 119862119895 denote the number of drugnames in 119870 in 119862 and in both 119870 and 119862 respectively The F-score value is computed by the following formula

119865-score (119870119894 119862119895)= (2 lowast Precision (119870119894 119862119895) lowast Recall (119870119894 119862119895))(TruePositive + FalsePositive) (6)

4 Results and Discussion

41 MLP Learning Performance The following experimentsare the part of the first experiment These experiments areperformed to evaluate the contribution of the three regular-ization settings as described in Section 371 By arranging thesentence in training dataset as 5-gram of words the quantityof generated sample is presented in Table 10 We do trainingand testing of the MLP-NN learning model for all those testdata compositionsThe result ofmodel performances on bothdatasets that is MedLine and DrugBank in learning phase isshown in Figures 6 and 7 The NN learning parameters thatare used for all experiments are 500 input nodes two hiddenlayerswhere each layer has 100 nodeswith sigmoid activationand 6 output nodes with softmax function the learning rate= 1 momentum = 05 and epochs = 100 We used minibatchscenario in the training with the batch size being 100 Thepresented errors in Figures 6 and 7 are the errors for full batchthat is the mean errors of all minibatches


Table 10: Dataset composition.

Dataset      Train     Test (all)   Test (2/3 part)   Test (cluster)
MedLine      26500     10360        6673              5783
DrugBank     100100    2000         1326              1933

Figure 6: Full batch training error of the MedLine dataset (training error versus iteration over 100 epochs for the L0, L1, and L2 regularization settings).

The learning model performance shows different patterns between the MedLine and DrugBank datasets. For both datasets, L1 regularization tends to stabilize in the lower iterations, and its training error is always less than that of L0 or L2. The L0 and L2 training error patterns, however, show slightly different behavior between MedLine and DrugBank. For the MedLine dataset, L0 and L2 produce different results for some of the iterations. Nevertheless, the training error performance of L0 and L2 for DrugBank is almost the same in every iteration. The different patterns are probably due to the variation in the quantity of training data. As illustrated in Table 10, the volume of DrugBank training data is almost four times the volume of the MedLine dataset. It can be concluded that, for the larger dataset, the contribution of the L2 regularization setting is not significant in achieving better performance. For the smaller dataset (MedLine), however, the performance is better even after only a few iterations.

4.2. Open Dataset Performance. In Tables 11, 12, 13, and 14, the numbers (1), (2), and (3) in the leftmost column indicate the candidate selection technique, with

(i) (1) all data tests being selected,

(ii) (2) 2/3 part of the data test being selected,

(iii) (3) 2/3 part of 3 clusters for MedLine or 3/4 part of 4 clusters for DrugBank being selected (a sketch of these selection mechanisms is given after this list).
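The following Python fragment sketches the two selection mechanisms under stated assumptions: the frequency-based selection keeps the tokens falling in the lower 2/3 of the cumulative frequency mass (cf. Tables 1 and 2), and the cluster-based selection keeps a subset of KMeans clusters of the token vectors. The rule for choosing which clusters to keep is our assumption and is not spelled out in the text.

import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def frequency_based_candidates(tokens):
    # Keep tokens from the lower 2/3 of the cumulative frequency mass,
    # where the drug-name targets are concentrated (Tables 1 and 2).
    counts = Counter(tokens)
    ranked = counts.most_common()                            # most frequent first
    cumulative = np.cumsum([f for _, f in ranked]) / sum(counts.values())
    return {t for (t, _), c in zip(ranked, cumulative) if c > 1.0 / 3.0}

def cluster_based_candidates(token_list, token_vectors, n_clusters, n_keep):
    # Cluster the word2vec vectors into n_clusters and keep the n_keep clusters
    # with the fewest members (assumed to hold the rarer, drug-like tokens).
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(token_vectors)
    sizes = Counter(labels)
    kept_clusters = {c for c, _ in sizes.most_common()[-n_keep:]}
    return {t for t, c in zip(token_list, labels) if c in kept_clusters}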

4.2.1. MLP-NN Performance. In this first experiment, for two data representation techniques and three candidate selection scenarios, we have six experiment scenarios.

Figure 7: Full batch training error of the DrugBank dataset (training error versus iteration over 100 epochs for the L0, L1, and L2 regularization settings).

The result of the experiment which applies the first data representation technique and the three candidate selection scenarios is presented in Table 11. In computing the F-score, we only select the predicted target provided by the lowest (minimum) error. For the MedLine dataset, the best performance is shown by the L2 regularization setting, where the error is 0.041818 in the third candidate selection scenario with an F-score of 0.439516, whereas for DrugBank it is achieved jointly by the L0 and L2 regularization settings with an error test of 0.0802 in the second candidate selection scenario, where the F-score is 0.641745. Overall, it can be concluded that the DrugBank experiments give the best F-score performance. The candidate selection scenarios also contribute to improving the performance, as we found that, for MedLine and DrugBank, the best achievement is provided by the third and second scenarios, respectively.

The next experimental scenario in the first experiment is performed to evaluate the impact of the data representation technique and of the addition of the Wiki source in word2vec training. The results are presented in Tables 12 and 13. According to the results presented in Table 11, the L0 regularization gives the best F-score; hence, we only used the L0 regularization for the next experimental scenario. Table 12 presents the impact of the data representation technique. Looking at the F-score, the second technique gives better results for both datasets, that is, MedLine and DrugBank.

Table 13 shows the result of adding the Wiki source into word2vec training to provide the vector word representation. These results confirm that the addition of training data improves the performance. It might be due to the fact that most of the targeted tokens, such as drug names, are uncommon words, whereas the words used in Wiki sentences are commonly used words. Hence, the addition of commonly used words makes the difference between drug tokens and nondrug tokens (the commonly used tokens) greater. For the MLP-NN experimental results, the 4th scenario, that is, the second data representation with 2/3 partition data selection on the DrugBank dataset, provides the best performance with an F-score of 0.684646757.


Table 11: The F-score performances of the three experiment scenarios.

MedLine     Prec      Rec       F-score   Lx       Error test
(1)         0.3564    0.5450    0.4310    L0       0.0305
(2)         0.3806    0.5023    0.4331    L1, L2   0.0432
(3)         0.3773    0.5266    0.4395    L2       0.0418

DrugBank    Prec      Rec       F-score   Lx       Error test
(1)         0.6312    0.5372    0.5805    L0       0.0790
(2)         0.6438    0.6398    0.6417    L0, L2   0.0802
(3)         0.6305    0.5380    0.5806    L0       0.0776

Table 12: The F-score performance as an impact of the data representation technique.

            (1) One seq. of all sentences        (2) One seq. of each sentence
MedLine     Prec      Rec       F-score          Prec      Rec       F-score
(1)         0.3564    0.5450    0.4310           0.6515    0.6220    0.6364
(2)         0.3806    0.5023    0.4331           0.6119    0.7377    0.6689
(3)         0.3772    0.5266    0.4395           0.6143    0.656873  0.6348

DrugBank    Prec      Rec       F-score          Prec      Rec       F-score
(1)         0.6438    0.5337    0.5836           0.7143    0.4962    0.5856
(2)         0.6438    0.6398    0.6418           0.7182    0.5804    0.6420
(3)         0.6306    0.5380    0.5807           0.5974    0.5476    0.5714

4.2.2. DBN and SAE Performance. In the second experiment, which involves the DBN and SAE learning models, we only use the experiment scenario that gave the best results in the first experiment. That best scenario uses the second data representation technique with Wiki text as an additional source in the word2vec training step.

In the DBN experiment, we use two stacked RBMs with 500 nodes in the visible layer and 100 nodes in the hidden layer for both the first and the second RBM. The learning parameters used are as follows: momentum = 0 and alpha = 1. We used a minibatch scenario in the training with a batch size of 100. As for the RBM constraints, the range of input data values is restricted to [0⋯1], since the original RBM was developed for binary data, whereas the range of word vector values is [−1⋯1]. We therefore normalize the data values into the [0⋯1] range before performing the RBM training. In the last layer of the DBN, we use one MLP layer with 100 hidden nodes and 6 output nodes with a softmax output function as the classifier.
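The text does not state the exact normalization formula; as a minimal sketch, assuming a simple linear rescaling of each vector component from [−1, 1] to [0, 1], it could be done as follows (the function name is ours).

import numpy as np

def rescale_to_unit_interval(word_vectors):
    # Map word2vec component values from the [-1, 1] range into [0, 1]
    # before feeding them to the binary-oriented RBM layers.
    return (np.asarray(word_vectors, dtype=float) + 1.0) / 2.0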

The SAE architecture used is two stacked AEs with the following node configuration. The first AE has 500 visible units, 100 hidden units, and 500 output units. The second AE has 100 visible units, 100 hidden units, and 100 output units. The learning parameters used for the first and the second AE, respectively, are as follows: activation function = sigmoid and tanh, learning rate = 1 and 2, momentum = 0.5 and 0.5, and sparsity target = 0.05 and 0.05. A batch size of 100 is set for both AEs. In the SAE experiment, we use the same discriminative layer as in the DBN, that is, one MLP layer with 100 hidden nodes and 6 output nodes with a softmax activation function.

The experiment results are presented in Table 14. There is a difference in performance between the MedLine and the DrugBank datasets when feeding them into the MLP, DBN, and SAE models.

Figure 8: The global LSTM network (single LSTM cells with two stacked hidden layers h1 and h2, unrolled over the inputs x0, x1, ..., xn and producing the outputs y1, y2, ..., yn).

The best results for the MedLine dataset are obtained when using the SAE. For DrugBank, the MLP gives the best results. The DBN gives a lower average performance for both datasets. The lower performance is probably due to the normalization of the word vector values to [0⋯1], whereas their original value range is in fact [−1⋯1]. The best performance over all of experiments 1 and 2 is given by the SAE with the second scenario of candidate selection, as described in Section 3.6; its F-score is 0.686192469.

4.2.3. LSTM Performance. The global LSTM network used is presented in Figure 8. Each single LSTM block consists of two stacked hidden layers and one input node, with each input dimension being 200, as described in Section 3.4.3. All hidden layers are fully connected. We used sigmoid as the output activation function, which is the most suitable for binary classification. We implemented the peephole-connection LSTM variant, where the gate layers look at the cell state [34].


Table 13: The F-score performances as an impact of the Wiki addition to the word2vec training data.

            (1) One seq. of all sentences        (2) One seq. of each sentence
MedLine     Prec      Rec       F-score          Prec      Rec       F-score
(1)         0.5661    0.4582    0.5065           0.6140    0.6495    0.6336
(2)         0.5661    0.4946    0.5279           0.5972    0.7454    0.6631
(3)         0.5714    0.4462    0.5011           0.6193    0.6927    0.6540

DrugBank    Prec      Rec       F-score          Prec      Rec       F-score
(1)         0.6778    0.5460    0.6047           0.6973    0.6107    0.6511
(2)         0.6776    0.6124    0.6433           0.6961    0.6736    0.6846
(3)         0.7173    0.5574    0.6273           0.6976    0.6193    0.6561

Table 14: Experimental results of the three NN models.

                 MLP                            DBN                            SAE
MedLine    Prec      Rec       F-score    Prec      Rec       F-score    Prec      Rec       F-score
(1)        0.6515    0.6220    0.6364     0.5464    0.6866    0.6085     0.6728    0.6214    0.6461
(2)        0.5972    0.7454    0.6631     0.6119    0.7377    0.6689     0.6504    0.7261    0.6862
(3)        0.6193    0.6927    0.6540     0.6139    0.6575    0.6350     0.6738    0.6518    0.6626
Average    0.6227    0.6867    0.6512     0.5907    0.6939    0.6375     0.6657    0.6665    0.6650

DrugBank   Prec      Rec       F-score    Prec      Rec       F-score    Prec      Rec       F-score
(1)        0.6973    0.6107    0.6512     0.6952    0.5847    0.6352     0.6081    0.6036    0.6059
(2)        0.6961    0.6736    0.6847     0.6937    0.6479    0.6700     0.6836    0.6768    0.6802
(3)        0.6976    0.6193    0.6561     0.6968    0.5929    0.6406     0.6033    0.6050    0.6042
Average    0.6970    0.6345    0.6640     0.6952    0.6085    0.6486     0.6317    0.6285    0.6301

In addition to implementing the peephole connections, we also use coupled forget and input gates. The detailed single LSTM architecture and the computation formula of each gate can be found in [33].

The LSTM experiments were run with several different parameter settings; the results presented in this section are the best among all our experiments. Each piece of input data consists of two components: its word vector value and its Euclidean distance to the previous input data. In treating both input data components, we adapt the Adding Problem experiment presented in [35]. We use the JANNLab tools [36] with some modifications in the input part to conform with our data settings.
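As an illustration of this input encoding, the sketch below builds the 200-dimensional inputs for one sentence. The text states that the last 100 values carry the Euclidean distance to the previous word; we assume here the element-wise absolute difference between the two word vectors (zeros for the first word), which is one plausible reading rather than the confirmed implementation.

import numpy as np

def build_lstm_inputs(sentence_vectors):
    # sentence_vectors: array of shape (n_words, 100) holding the word2vec rows
    # of one sentence. Returns an (n_words, 200) array [x_v_i ; x_d_i].
    vecs = np.asarray(sentence_vectors, dtype=float)
    dists = np.zeros_like(vecs)
    dists[1:] = np.abs(vecs[1:] - vecs[:-1])    # distance component to the previous word
    return np.concatenate([vecs, dists], axis=1)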

The best performance is obtained with an LSTM block architecture of a one-node input layer, a two-node hidden layer, and a one-node output layer. The parameters used are learning rate = 0.001, momentum = 0.9, epochs = 30, and input dimension = 200, with the time sequence frame set to 2. The complete treatment of the drug sentence as a sequence, both in representation and in recognition, is the best technique for extracting the drug name entities, as shown by the F-score performance in Table 15.

As described in the previous work section, many approaches related to drug extraction have been proposed. Most of them utilize certain external knowledge to achieve the extraction objective. Table 16 summarizes their F-score performance. Among the state-of-the-art techniques, our third data representation technique applied to the LSTM model performs best. Moreover, our proposed method does not require any external knowledge.

Table 15: The F-score performance of the third data representation technique with RNN-LSTM.

            Prec      Rec       F-score
MedLine     1.0000    0.6474    0.7859
DrugBank    1.0000    0.8921    0.9430
Average                         0.8645

4.3. Drug Label Dataset Performance. As an additional experiment, we also use an Indonesian-language drug label corpus to evaluate the method's performance. For the Indonesian drug labels, we could not find any external knowledge that could be used to assist the extraction of the drug names contained in the labels. Given this hindrance, we found our proposed method to be more suitable than the previous approaches. As the drug label texts are collected from various sites of drug distributors, producers, and government regulators, they do not clearly separate training data and testing data as in the DrugBank or MedLine datasets. Another characteristic of these texts is their more structured sentences. Although the texts come from various sources, all of them are a similar kind of document (drug labels that might be generated by machine). After the data cleaning step (HTML tag removal, etc.), we annotated the dataset manually. The total quantity of the dataset after performing the data representation step described in Section 3.4 is 1046200.


Table 16: The F-score performance compared to the state of the art.

Approach                                                                    F-score   Remark
The best of SemEval 2013 [15]                                               0.7150    -
[11]                                                                        0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                                 0.7200    With external knowledge (DINTO)
[14]                                                                        0.7200    Additional feature: BIO
[12]                                                                        0.6000    Single token only
Ours: MLP - sentence sequence + Wiki (average)                              0.6580    Without external knowledge
Ours: DBN - sentence sequence + Wiki (average)                              0.6430    Without external knowledge
Ours: SAE - sentence sequence + Wiki (average)                              0.6480    Without external knowledge
Ours: LSTM - all-sentence sequence + Wiki + Euclidean distance (average)    0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration   Prec       Recall     F-score
1           0.9170     0.9667     0.9412
2           0.8849     0.9157     0.9000
3           0.9134     0.9619     0.9370
4           0.9298     0.9500     0.9398
5           0.9640     0.9570     0.9605
6           0.8857     0.9514     0.9178
7           0.9489     0.9689     0.9588
8           0.9622     0.9654     0.9638
9           0.9507     0.9601     0.9554
10          0.9516     0.9625     0.9570
Average     0.93081    0.9560     0.9431
Min         0.8849     0.9157     0.9000
Max         0.9640     0.9689     0.9638

In this experiment, we perform a 10-times cross-validation scenario by randomly selecting 80% of the data for training and 20% for testing.
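A minimal sketch of this evaluation protocol, assuming ten independent random 80/20 splits (the sample count and the helper comments below are ours, purely for illustration):

import numpy as np
from sklearn.model_selection import ShuffleSplit

n_samples = 1000   # stand-in for the actual number of formatted drug-label samples
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
for run, (train_idx, test_idx) in enumerate(splitter.split(np.zeros(n_samples)), start=1):
    # train the classifier on train_idx and compute precision/recall/F-score on test_idx
    print(run, len(train_idx), len(test_idx))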

The experimental results for the drug label dataset show that all of the candidate selection scenarios provide an excellent F-score (above 0.9). The excellent F-score performance is probably due to the more structured sentences in those texts. The best results of those ten runs are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments, we studied various experiment scenarios which involve three investigated parameters: the additional Wiki source, the data representation techniques, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in word2vec training enhances the quality of the resulting word vectors. Through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, drug names, becomes greater (better distinguishing power).

One problem in mining drug name entities from medical text is the imbalanced quantity between drug tokens and other tokens [15]. Moreover, the targeted drug entities are only a small part of the total tokens; thus, the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios show their contribution in reducing the quantity of noise. Since the possibility of extracting the noise is reduced, the recall and F-score values increase as well, as shown in the first and second experiment results.

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence, so the input dataset cannot be randomly divided (selected) as in the tuple treatment of the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach to data representation and classification to extract drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity that remained unsolved in previous studies. This study also introduces some techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern, that is, the current word is conditioned by the previous words. Based on this sequence notion, the treatment of medical text sentences with a sequence NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequence pattern, which is evaluated with the nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentences as a sequence to provide data used as the input of the sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique are still wide open. A first improvement could be the incorporation of additional handcrafted features, such as word position, the use of a capital letter at the beginning of the word, and the character type, as also used in previous studies [16, 37]. As presented in the MLP experiments for the drug label documents, the proposed methods achieve excellent performance when applied to more structured text. Thus, efforts to make the sentences of the datasets, that is, DrugBank and MedLine, more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extraction, such as drug group, drug brand, and drug compounds.


Another task that can potentially be solved with the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features. This addition gives a good F-score performance; the significance of embedding the Euclidean distance features, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesian Ministry of Research and Higher Education, Contract no. 1004UN2R12HKP05002016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Ga, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, University of Toronto, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.




3.7.4. RNN-LSTM. An RNN (Recurrent Neural Network) is an NN which considers the previous input in determining the output of the current input. RNN is powerful when applied to datasets with a sequential pattern, that is, when the current input depends on the previous one, such as time series data or the sentences of NLP. An LSTM network is a special kind of RNN which also consists of 3 layers, that is, an input layer, a single recurrent hidden layer, and an output layer [31]. The main innovation of LSTM is that its hidden layer consists of one or more memory blocks. Each block includes one or more memory cells. In the standard form, the inputs are connected to all of the cells and gates, whereas the cells are connected to the outputs. The gates are connected to other gates and cells in the hidden layer. The standard single-LSTM hidden layer comprises input gates, memory cells, and output gates [32, 33].
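A minimal per-token sequence labeling sketch (ours, in Keras; the authors used the JANNLab Java framework, and the hidden size below is an assumption):

```python
from tensorflow.keras import layers, models

# X: (n_sentences, max_len, 200) padded [word2vec ; distance] sequences (assumed shapes)
# Y: (n_sentences, max_len, 1) per-token labels, 1 = drug token, 0 = otherwise
model = models.Sequential([
    layers.Input(shape=(None, 200)),                                 # variable-length sentences
    layers.Masking(mask_value=0.0),                                  # skip padded time steps
    layers.LSTM(64, return_sequences=True),                          # recurrent hidden layer
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),   # per-token binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X, Y, epochs=30, batch_size=32)
```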

3.8. Dataset. To validate the proposed approach, we utilized the DrugBank and MedLine open datasets, which have also been used by previous researchers. Additionally, we used drug label documents from various drug producer and regulator Internet sites located in Indonesia:

(1) http://www.kalbemed.com,
(2) http://www.dechacare.com,
(3) http://infoobatindonesia.com/obat, and
(4) http://www.pom.go.id/webreg/index.php/home/produk/01.

The drug labels are written in Bahasa Indonesia, and their common contents are drug name, drug components, indication, contraindication, dosage, and warning.

3.9. Evaluation. To evaluate the performance of the proposed method, we use commonly measured parameters in data mining, that is, precision, recall, and F-score. The computation of these parameters is as follows. Let C = {C_1, C_2, C_3, ..., C_n} be the set of drug names extracted by this method and K = {K_1, K_2, K_3, ..., K_l} be the set of actual drug names in the document set D. Adopted from [18], the parameters are computed as

\mathrm{Precision}(K_i, C_j) = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}} = \frac{\left| K_i \cap C_j \right|}{\left| C_j \right|},

\mathrm{Recall}(K_i, C_j) = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}} = \frac{\left| K_i \cap C_j \right|}{\left| K_i \right|},    (5)

where |K_i|, |C_j|, and |K_i ∩ C_j| denote the number of drug names in K, in C, and in both K and C, respectively. The F-score value is computed by the following formula:

F\text{-score}(K_i, C_j) = \frac{2 \cdot \mathrm{Precision}(K_i, C_j) \cdot \mathrm{Recall}(K_i, C_j)}{\mathrm{Precision}(K_i, C_j) + \mathrm{Recall}(K_i, C_j)}.    (6)
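A small sketch (ours) of these set-based metrics, applied directly to the extracted and the actual drug name sets:

```python
def precision_recall_fscore(extracted, actual):
    """Set-based precision, recall, and F-score for extracted vs. actual drug names."""
    extracted, actual = set(extracted), set(actual)
    tp = len(extracted & actual)                            # |K ∩ C|
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# precision_recall_fscore(["dilantin", "aspirin"], ["dilantin", "tegretol"]) -> (0.5, 0.5, 0.5)
```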

4. Results and Discussion

4.1. MLP Learning Performance. The following experiments are part of the first experiment. They are performed to evaluate the contribution of the three regularization settings described in Section 3.7.1. By arranging the sentences in the training dataset as 5-grams of words, the quantity of generated samples is as presented in Table 10. We trained and tested the MLP-NN learning model for all those test data compositions. The resulting model performance on both datasets, that is, MedLine and DrugBank, in the learning phase is shown in Figures 6 and 7. The NN learning parameters used for all experiments are 500 input nodes, two hidden layers where each layer has 100 nodes with sigmoid activation, and 6 output nodes with a softmax function; the learning rate = 1, momentum = 0.5, and epochs = 100. We used a minibatch scenario in the training with a batch size of 100. The errors presented in Figures 6 and 7 are the errors for the full batch, that is, the mean errors of all minibatches.
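For orientation, the following sketch (ours, written in Keras rather than the authors' MATLAB toolbox) mirrors this configuration; the λ values 0, 1, and the average distance of (4) correspond to the L0, L1, and L2 settings applied to the squared-weight penalty of (3):

```python
from tensorflow.keras import layers, models, optimizers, regularizers

def build_mlp(weight_decay=0.0):
    reg = regularizers.l2(weight_decay) if weight_decay else None   # lambda of (3), up to the 1/2 factor
    model = models.Sequential([
        layers.Input(shape=(500,)),                                  # 5 tokens x 100-dim word2vec
        layers.Dense(100, activation="sigmoid", kernel_regularizer=reg),
        layers.Dense(100, activation="sigmoid", kernel_regularizer=reg),
        layers.Dense(6, activation="softmax"),                       # classes 1..6
    ])
    model.compile(optimizer=optimizers.SGD(learning_rate=1.0, momentum=0.5),
                  loss="mse")                                        # sum-of-squares term of (3)
    return model

# model = build_mlp(weight_decay=l2_lambda(word_vectors))   # L2 setting; 0.0 for L0, 1.0 for L1
# model.fit(X_train, Y_train, epochs=100, batch_size=100)
```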


Table 10: Dataset composition.

Dataset     Train     Test: All   Test: 2/3 part   Test: Cluster
MedLine     26500     10360       6673             5783
DrugBank    100100    2000        1326             1933

[Figure 6: Full batch training error of the MedLine dataset. Training error over 100 epochs for the L0, L1, and L2 regularization settings; the error axis ranges from 0 to 0.3.]

The learning model performance shows different patterns between the MedLine and DrugBank datasets. For both datasets, L1 regularization tends to stabilize in the earlier iterations, and its training error is always lower than that of L0 or L2. The L0 and L2 training error patterns, however, show slightly different behavior between MedLine and DrugBank. For the MedLine dataset, L0 and L2 produce different results for some of the iterations. Nevertheless, the training error performance of L0 and L2 for DrugBank is almost the same in every iteration. The different patterns are probably due to the variation in the quantity of training data. As illustrated in Table 10, the volume of DrugBank training data is almost four times the volume of the MedLine dataset. It can be concluded that for the larger dataset the contribution of the L2 regularization setting is not significant in achieving better performance. For the smaller dataset (MedLine), however, it yields better performance even after only a few iterations.

4.2. Open Dataset Performance. In Tables 11, 12, 13, and 14, the numbering (1), (2), and (3) in the leftmost column indicates the candidate selection technique, with:

(i) (1) all data tests being selected;

(ii) (2) 2/3 part of the data test being selected;

(iii) (3) 2/3 part of 3 clusters for MedLine or 3/4 part of 4 clusters for DrugBank being selected (a clustering-based sketch follows this list).
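The cluster-based selection can be sketched as follows (our own illustration: k-means over the tuple features of the test data, keeping the smaller clusters on the assumption that frequent nondrug tokens dominate the largest cluster; the paper does not specify which clusters are kept):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_candidates(X_test, n_clusters=3, n_keep=2, random_state=0):
    """Keep the tuples belonging to n_keep of n_clusters k-means clusters.

    Assumption: the dropped clusters are the largest ones, which mostly hold
    frequent nondrug tokens (noise); drug names concentrate in the rarer part.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit_predict(X_test)
    counts = np.bincount(labels, minlength=n_clusters)
    keep = np.argsort(counts)[:n_keep]          # the n_keep smallest clusters
    mask = np.isin(labels, keep)
    return X_test[mask], mask

# X_kept, mask = select_candidates(X_test, n_clusters=3, n_keep=2)   # MedLine: 2/3 of 3 clusters
```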

[Figure 7: Full batch training error of the DrugBank dataset. Training error over 100 epochs for the L0, L1, and L2 regularization settings; the error axis ranges from 0 to 0.3.]

4.2.1. MLP-NN Performance. In this first experiment, with two data representation techniques and three candidate selection scenarios, we have six experiment scenarios. The result of the experiment which applies the first data representation technique and the three candidate selection scenarios is presented in Table 11. In computing the F-score, we only select the predicted target which is provided by the lowest error (the minimum one). For the MedLine dataset, the best performance is shown by the L2 regularization setting, where the error is 0.041818 in the third candidate selection scenario with F-score 0.439516, whereas for DrugBank it is achieved jointly by the L0 and L1 regularization settings with a test error of 0.0802 in the second candidate selection scenario; the F-score was 0.641745. Overall, it can be concluded that the DrugBank experiments give the best F-score performance. The candidate selection scenarios also contributed to improving the performance, as we found that for MedLine and DrugBank the best achievement is provided by the third and second scenarios, respectively.

The next experimental scenario in the first experiment is performed to evaluate the impact of the data representation technique and the addition of the Wiki source in word2vec training. The results are presented in Tables 12 and 13. According to the results presented in Table 11, the L0 regularization gives the best F-score; hence, we only used the L0 regularization for the next experimental scenario. Table 12 presents the impact of the data representation technique. Looking at the F-score, the second technique gives better results for both datasets, that is, MedLine and DrugBank.

Table 13 shows the result of adding the Wiki source into the word2vec training used to provide the vector word representation. These results confirm that the addition of training data improves the performance. This might be due to the fact that most of the targeted tokens, such as drug names, are uncommon words, whereas the words used in Wiki sentences are commonly used words. Hence, the addition of commonly used words makes the difference between drug tokens and nondrug tokens (the commonly used tokens) greater. For the MLP-NN experimental results, the 4th scenario, that is, the second data representation with 2/3 partition data selection on the DrugBank dataset, provides the best performance with an F-score of 0.684646757.


Table 11: The F-score performances of the three experiment scenarios.

MedLine     Prec     Rec      F-score   Lx       Error test
(1)         0.3564   0.5450   0.4310    L0       0.0305
(2)         0.3806   0.5023   0.4331    L1, L2   0.0432
(3)         0.3773   0.5266   0.4395    L2       0.0418

DrugBank    Prec     Rec      F-score   Lx       Error test
(1)         0.6312   0.5372   0.5805    L0       0.0790
(2)         0.6438   0.6398   0.6417    L0, L2   0.0802
(3)         0.6305   0.5380   0.5806    L0       0.0776

Table 12: The F-score performance as an impact of the data representation technique.

            (1) One seq. of all sentences      (2) One seq. of each sentence
MedLine     Prec     Rec      F-score          Prec     Rec        F-score
(1)         0.3564   0.5450   0.4310           0.6515   0.6220     0.6364
(2)         0.3806   0.5023   0.4331           0.6119   0.7377     0.6689
(3)         0.3772   0.5266   0.4395           0.6143   0.656873   0.6348
DrugBank    Prec     Rec      F-score          Prec     Rec        F-score
(1)         0.6438   0.5337   0.5836           0.7143   0.4962     0.5856
(2)         0.6438   0.6398   0.6418           0.7182   0.5804     0.6420
(3)         0.6306   0.5380   0.5807           0.5974   0.5476     0.5714

4.2.2. DBN and SAE Performance. In the second experiment, which involves the DBN and SAE learning models, we only use the experiment scenario that gave the best results in the first experiment. That scenario uses the second data representation technique with Wiki text as an additional source in the word2vec training step.

In the DBN experiment, we use two stacked RBMs with 500 visible nodes and 100 hidden nodes for the first as well as the second RBM. The learning parameters used are as follows: momentum = 0 and alpha = 1. We used a minibatch scenario in the training with a batch size of 100. As for the RBM constraint, the range of input data values is restricted to [0...1], since the original RBM was developed for binary data, whereas the range of the word vector values is [−1...1]. So we normalize the data values into the [0...1] range before performing the RBM training. In the last layer of the DBN, we use one MLP layer with 100 hidden nodes and 6 output nodes with a softmax output function as the classifier.
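One simple way (ours) to perform the [0...1] normalization mentioned above is per-feature min-max scaling:

```python
import numpy as np

def to_unit_range(X):
    """Per-feature min-max scaling of word-vector features from roughly [-1, 1] into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)   # avoid division by zero on constant features
    return (X - lo) / span
```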

The SAE architecture used is two stacked AEs with the following node configuration. The first AE has 500 visible units, 100 hidden units, and 500 output units. The second AE has 100 visible units, 100 hidden units, and 100 output units. The learning parameters used for the first and the second AE, respectively, are as follows: activation function = sigmoid and tanh, learning rate = 1 and 2, momentum = 0.5 and 0.5, sparsity target = 0.05 and 0.05. A batch size of 100 is set for both AEs. In the SAE experiment, we use the same discriminative layer as in the DBN, that is, one MLP layer with 100 hidden nodes and 6 output nodes with a softmax activation function.

The experimental results are presented in Table 14. There is a difference in performance between the MedLine and the DrugBank datasets when feeding them into the MLP, DBN, and SAE models. The best results for the MedLine dataset are obtained when using the SAE. For DrugBank, the MLP gives the best results. The DBN gives a lower average performance for both datasets. The lower performance is probably due to the normalization of the word vector values to [0...1], whereas their original value range is in fact [−1...1]. The best performance over experiments 1 and 2 is given by the SAE with the second candidate selection scenario as described in Section 3.6. Its F-score is 0.686192469.

[Figure 8: The global LSTM network. LSTM blocks with two stacked hidden layers (h1, h2) are unrolled over inputs x0, ..., xn producing outputs y1, ..., yn; an inset highlights a single LSTM cell.]

4.2.3. LSTM Performance. The global LSTM network used is presented in Figure 8. Each single LSTM block consists of two stacked hidden layers and one input node, with the input dimension being 200, as described in Section 3.4.3. All hidden layers are fully connected. We used sigmoid as the output activation function, which is the most suitable for binary classification. We implemented the peephole-connection LSTM variant, where the gate layers look at the cell state [34].


Table 13: The F-score performances as an impact of the Wiki addition to the word2vec training data.

            (1) One seq. of all sentences      (2) One seq. of each sentence
MedLine     Prec     Rec      F-score          Prec     Rec      F-score
(1)         0.5661   0.4582   0.5065           0.614    0.6495   0.6336
(2)         0.5661   0.4946   0.5279           0.5972   0.7454   0.6631
(3)         0.5714   0.4462   0.5011           0.6193   0.6927   0.6540
DrugBank    Prec     Rec      F-score          Prec     Rec      F-score
(1)         0.6778   0.5460   0.6047           0.6973   0.6107   0.6511
(2)         0.6776   0.6124   0.6433           0.6961   0.6736   0.6846
(3)         0.7173   0.5574   0.6273           0.6976   0.6193   0.6561

Table 14: Experimental results of the three NN models.

            MLP                               DBN                               SAE
MedLine     Prec     Rec      F-score         Prec     Rec      F-score         Prec     Rec      F-score
(1)         0.6515   0.6220   0.6364          0.5464   0.6866   0.6085          0.6728   0.6214   0.6461
(2)         0.5972   0.7454   0.6631          0.6119   0.7377   0.6689          0.6504   0.7261   0.6862
(3)         0.6193   0.6927   0.6540          0.6139   0.6575   0.6350          0.6738   0.6518   0.6626
Average     0.6227   0.6867   0.6512          0.5907   0.6939   0.6375          0.6657   0.6665   0.6650
DrugBank    Prec     Rec      F-score         Prec     Rec      F-score         Prec     Rec      F-score
(1)         0.6973   0.6107   0.6512          0.6952   0.5847   0.6352          0.6081   0.6036   0.6059
(2)         0.6961   0.6736   0.6847          0.6937   0.6479   0.6700          0.6836   0.6768   0.6802
(3)         0.6976   0.6193   0.6561          0.6968   0.5929   0.6406          0.6033   0.6050   0.6042
Average     0.6970   0.6345   0.664           0.6952   0.6085   0.6486          0.6317   0.6285   0.6301

In addition to implementing the peephole connections, we also couple the forget and input gates. The detailed single-LSTM architecture and the computation formula of each gate can be found in [33].

The LSTM experiments were implemented with several different parameter settings. The results presented in this section are the best among all our experiments. Each piece of input data consists of two components: its word vector value and its Euclidean distance to the previous input. In treating both input data components, we adapt the Adding Problem experiment as presented in [35]. We use the JANNLab tools [36], with some modifications in the input part to conform with our data settings.
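A sketch (ours) of assembling such inputs from 100-dimensional word vectors; tiling the scalar distance into a 100-value block to reach the 200-dimensional input is our assumption, since the exact encoding is not spelled out here:

```python
import numpy as np

def sentence_to_lstm_input(word_vectors):
    """Per-token inputs: [100-dim word2vec ; 100-dim block holding the distance to the previous token].

    word_vectors: (seq_len, 100) array; the first token gets distance 0.
    """
    seq_len, dim = word_vectors.shape
    dist = np.zeros(seq_len)
    dist[1:] = np.linalg.norm(word_vectors[1:] - word_vectors[:-1], axis=1)
    dist_block = np.repeat(dist[:, None], dim, axis=1)            # tiling is our assumption
    return np.concatenate([word_vectors, dist_block], axis=1)     # (seq_len, 200)
```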

The best performance is obtained with an LSTM block architecture of a one-node input layer, a two-node hidden layer, and a one-node output layer. The parameters used are learning rate = 0.001, momentum = 0.9, epochs = 30, and input dimension = 200, with the time sequence frame set to 2. The complete treatment of the drug sentence as a sequence, both in representation and in recognition, is the best technique for extracting the drug name entities, as shown by the F-score performance in Table 15.

As described in the previous work section, many approaches related to drug extraction have been proposed. Most of them utilize certain external knowledge to achieve the extraction objective. Table 16 summarizes their F-score performance. Among the state-of-the-art techniques, our third data representation technique applied to the LSTM model outperforms the others. Moreover, our proposed method does not require any external knowledge.

Table 15: The F-score performance of the third data representation technique with RNN-LSTM.

            Prec   Rec      F-score
MedLine     1      0.6474   0.7859
DrugBank    1      0.8921   0.9430
Average                     0.8645

4.3. Drug Label Dataset Performance. As an additional experiment, we also use an Indonesian-language drug label corpus to evaluate the method's performance. For the Indonesian drug labels, we could not find any external knowledge source that could assist the extraction of the drug names contained in the labels. In the presence of this hindrance, we found our proposed method to be more suitable than the previous approaches. As the drug label texts are collected from various sites of drug distributors, producers, and government regulators, they do not come with separate training and testing data as in the DrugBank or MedLine datasets. Another characteristic of these texts is that they contain more structured sentences. Although the texts come from various sources, all of them are a similar kind of document (drug labels that might be generated by machine). After the data cleaning step (HTML tag removal, etc.), we annotated the dataset manually.


Table 16: The F-score performance compared to the state of the art.

Approach                                                               F-score   Remark
The best of SemEval 2013 [15]                                          0.7150    -
[11]                                                                   0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                            0.7200    With external knowledge (DINTO)
[14]                                                                   0.7200    Additional feature: BIO
[12]                                                                   0.6000    Single token only
MLP - Sentence Sequence + Wiki (average), ours                         0.6580    Without external knowledge
DBN - Sentence Sequence + Wiki (average), ours                         0.6430    Without external knowledge
SAE - Sentence Sequence + Wiki (average), ours                         0.6480    Without external knowledge
LSTM - All Sentence Sequence + Wiki + Euclidean distance (average), ours   0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration   Prec      Recall   F-score
1           0.9170    0.9667   0.9412
2           0.8849    0.9157   0.9000
3           0.9134    0.9619   0.9370
4           0.9298    0.9500   0.9398
5           0.9640    0.9570   0.9605
6           0.8857    0.9514   0.9178
7           0.9489    0.9689   0.9588
8           0.9622    0.9654   0.9638
9           0.9507    0.9601   0.9554
10          0.9516    0.9625   0.9570
Average     0.93081   0.9560   0.9431
Min         0.8849    0.9157   0.9000
Max         0.9640    0.9689   0.9638

The total quantity of the dataset after performing the data representation step as described in Section 3.4 is 1,046,200. In this experiment, we perform a 10-run cross-validation scenario by randomly selecting 80% of the data for training and 20% for testing.
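The ten random 80/20 splits can be scripted along these lines (a sketch; build_mlp and evaluate_f_score are hypothetical helpers, the latter computing the F-score of Section 3.9 on the held-out part):

```python
from sklearn.model_selection import train_test_split

scores = []
for run in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, Y, test_size=0.2, random_state=run)
    model = build_mlp()                                   # hypothetical builder from the earlier sketch
    model.fit(X_tr, y_tr, epochs=100, batch_size=100, verbose=0)
    scores.append(evaluate_f_score(model, X_te, y_te))    # assumed helper: F-score of Section 3.9
print(sum(scores) / len(scores), min(scores), max(scores))
```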

The experimental results for the drug label dataset show that all of the candidate selection scenarios provide excellent F-scores (above 0.9). The excellent F-score performance is probably due to the more structured sentences in those texts. The best results of those ten experiments are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments, we studied various experiment scenarios involving three investigated parameters: the additional Wiki source, the data representation techniques, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in the word2vec training enhances the quality of the resulting word vectors. Through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, drug names, becomes greater (better distinguishing power).

One problem in mining drug name entities from medical text is the imbalanced quantity of drug tokens versus other tokens [15]. The targeted drug entities are only a small part of the total tokens; thus, the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios show their contribution in reducing the quantity of noise. Since the possibility of extracting the noise is reduced, the recall value and F-score value increase as well, as shown in the results of the first and second experiments.

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence. Hence, the input dataset cannot be randomly divided (selected) in the way the tuples are treated in the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach in data representation and classification to extract drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity that remained unsolved in previous studies. This study also introduces some techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern, that is, the current word is conditioned by the previous words. Based on this sequence notion, the treatment of medical text sentences with a sequential NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequence pattern, which is evaluated with the nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentences as a sequence to provide the input of the sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique are still wide open. A first improvement can be the incorporation of additional handcrafted features, such as the word position, the use of a capital letter at the beginning of the word, and the type of character, as used in previous studies [16, 37]. As presented in the MLP experiments on the drug label documents, the proposed method achieved excellent performance when applied to more structured text. Thus, the effort to make the sentences of the datasets, that is, DrugBank and MedLine, more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extraction, such as drug group, drug brand, and drug compounds. Another task with the potential to be solved with the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features. This addition gives a good F-score performance. The significance of embedding the Euclidean distance features, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004/UN2.R12/HKP.05.00/2016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.
[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.
[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.
[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.
[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.
[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.
[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.
[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.
[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.
[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.
[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.
[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.
[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.
[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.
[15] I. Segura-Bedmar, P. Martinez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Georgia, USA, 2013.
[16] I. Segura-Bedmar and P. Mart, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.
[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.
[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.
[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.
[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.
[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.
[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.
[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.
[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, University of Toronto, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.
[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.
[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.
[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.
[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.
[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs.
[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.
[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.
[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Poster Proceedings (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.
[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.



371 MLP In the first experiment we used multilayer per-ceptron NN to train the model and evaluate the performance[20] Given a training set of119898 examples then the overall costfunction can be defined as

119869 (119882 119887) = [ 1119898119898sum119894=1

119869 (119882 119887 119909119894 119910119894)]

+ 1205822119899119897minus1sum119897=1

119904119897sum119894=1

119904119897minus1sum119895=1

(119882119895(119897)119894 )2

119869 (119882 119887) = [ 1119898119898sum119894=1

(12 10038171003817100381710038171003817ℎ119908119887 (119909(i)) minus 119910119894100381710038171003817100381710038172)]

+ 1205822119899119897minus1sum119897=1

119904119897sum119894=1

119904119897minus1sum119895=1

(119882119895(119897)119894 )2

(3)

In the definition of 119869(119882 119887) the first term is an averagesum-of-squares error term whereas the second term is aregularization term which is also called a weight decay termIn this experiment we use three kinds of regularization 0

L0 with 120582 = 0 1 L1 with 120582 = 1 and 2 with 120582 =the average of Euclidean distance We computed the L2rsquos120582 based on the word embedding vector analysis that drugtarget and nondrug can be distinguished by looking at theirEuclidean distanceThus for L2 regularization the parameteris calculated as

120582 = 1119899 lowast (119899 minus 1)119899minus1sum119894=1

119899sum119895=119894+1

dist (119909119894 119909119895) (4)

where dist(119909119894 119909119895) is the Euclidean distance of 119909119894 and 119909119895The model training and testing are implemented by

modifying the code from [21] which can be downloaded athttpsgithubcomrasmusbergpalmDeepLearnToolbox

372 DBN DBN is a learning model composed of two ormore stacked RBMs [22 23] An RBM is an undirectedgraph learningmodel which associates withMarkov RandomFields (MRF) In the DBN the RBM acts as feature extractorwhere the pretraining process provides initial weights valuesto be fine-tuned in the discriminative process in the lastlayer The last layer may be formed by logistic regressionor any standard discriminative classifiers [23] RBM wasoriginally developed for binary data observation [24 25] It isa popular type of unsupervisedmodel for binary data [26 27]Some derivatives of RBM models are also proposed to tacklecontinuousreal values suggested in [28 29]

10 Computational Intelligence and Neuroscience

Table 9 Third technique of data representation and its label

Sent1 Class 0 0 0 0 1 0 0Word ldquodrugrdquo ldquointeractionrdquo ldquostudiesrdquo ldquowithrdquo ldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo

Sent2 Class 1 1 0 0 0 0 0Word ldquocytochromerdquo ldquop-450rdquo ldquoisrdquo ldquonotrdquo ldquoknownrdquo ldquoinrdquo ldquotherdquo

373 SAE An autoencoder (AE) neural network is one ofthe unsupervised learning algorithmsTheNN tries to learn afunction ℎ(119908 119909) asymp 119909 The autoencoder NN architecture alsoconsists of input hidden and output layers The particularcharacteristic of the autoencoder is that the target output issimilar to the input The interesting structure of the data isestimated by applying a certain constraint to the networkwhich limits the number of hidden units However whenthe number of hidden units has to be larger it can beimposed with sparsity constraints on the hidden units [30]The sparsity constraint is used to enforce the average valueof hidden unit activation constrained to a certain valueAs used in the DBN model after we trained the SAE thetrained weight was used to initialize the weight of NN for theclassification

374 RNN-LSTM RNN (Recurrent Neural Network) is anNN which considers the previous input in determining theoutput of the current input RNN is powerful when it isapplied to the dataset with a sequential pattern or when thecurrent state input depends on the previous one such as thetime series data sentences of NLP An LSTM network is aspecial kind of RNN which also consists of 3 layers that isan input layer a single recurrent hidden layer and an outputlayer [31] The main innovation of LSTM is that its hiddenlayer consists of one or more memory blocks Each blockincludes one or morememory cells In the standard form theinputs are connected to all of the cells and gates whereas thecells are connected to the outputs The gates are connected toother gates and cells in the hidden layer The single standardLSTM is a hidden layer with input memory cell and outputgates [32 33]

38 Dataset To validate the proposed approach we utilizedDrugBank and MedLine open dataset which have also beenused by previous researchers Additionally we used druglabel documents from various drug producers and regulatorInternet sites located in Indonesia

(1) httpwwwkalbemedcom

(2) httpwwwdechacarecom

(3) httpinfoobatindonesiacomobat and

(4) httpwwwpomgoidwebregindexphphomeproduk01

The drug labels are written in Bahasa Indonesia and theircommon contents are drug name drug components indica-tion contraindication dosage and warning

39 Evaluation To evaluate the performance of the pro-posed method we use common measured parameters indata mining that is precision recall and F-score Thecomputation formula of these parameters is as follows Let119862 = 1198621 1198622 1198623 119862119899 be a set of the extracted drug nameof this method and 119870 = 1198701 1198702 1198703 119870119897 is set of actualdrug names in the document set 119863 Adopted from [18] theparameter computations formula is

Precision(K_i, C_j) = TruePositive / (TruePositive + FalsePositive) = ‖K_i ∩ C_j‖ / ‖C_j‖,

Recall(K_i, C_j) = TruePositive / (TruePositive + FalseNegative) = ‖K_i ∩ C_j‖ / ‖K_i‖,   (5)

where ‖K_i‖, ‖C_j‖, and ‖K_i ∩ C_j‖ denote the number of drug names in K, in C, and in both K and C, respectively. The F-score value is computed by the following formula:

F-score(K_i, C_j) = (2 * Precision(K_i, C_j) * Recall(K_i, C_j)) / (Precision(K_i, C_j) + Recall(K_i, C_j)).   (6)
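The formulas in (5) and (6) can be checked with a short script. The sketch below is not the authors' evaluation code; it simply computes precision, recall, and F-score from a set of extracted drug names (C) and a set of actual drug names (K), with hypothetical name lists as input.

def precision_recall_fscore(extracted, actual):
    """Precision, recall, and F-score of extracted names (C) against actual names (K)."""
    C, K = set(extracted), set(actual)
    true_positive = len(K & C)                       # ||K ∩ C||
    precision = true_positive / len(C) if C else 0.0
    recall = true_positive / len(K) if K else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) > 0 else 0.0)
    return precision, recall, f_score

# Hypothetical example
print(precision_recall_fscore(["dilantin", "tegretol", "aspirin"],
                              ["dilantin", "tegretol", "phenytoin"]))
# -> (0.666..., 0.666..., 0.666...)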

4. Results and Discussion

4.1. MLP Learning Performance. The following experiments are part of the first experiment. They are performed to evaluate the contribution of the three regularization settings described in Section 3.7.1. By arranging the sentences in the training dataset as 5-grams of words, the quantity of generated samples is as presented in Table 10. We train and test the MLP-NN learning model on all those test data compositions. The resulting model performance on both datasets, that is, MedLine and DrugBank, in the learning phase is shown in Figures 6 and 7. The NN learning parameters used for all experiments are 500 input nodes; two hidden layers, where each layer has 100 nodes with sigmoid activation; and 6 output nodes with softmax function; the learning rate = 1, momentum = 0.5, and epochs = 100. We used a minibatch scenario in the training with the batch size being 100. The errors presented in Figures 6 and 7 are the errors for the full batch, that is, the mean errors of all minibatches.
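The MLP configuration above (500 inputs, two hidden layers of 100 sigmoid nodes, 6 softmax outputs, learning rate 1, momentum 0.5, 100 epochs, minibatch size 100) can be reproduced, for example, in Keras. The paper's experiments were run with an existing NN toolbox, so the sketch below is only an equivalent reconstruction of the stated topology and hyperparameters, with randomly generated placeholder data and without the L1/L2 weight-decay variants.

import numpy as np
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Placeholder data: 5-gram tuples as 500-dimensional vectors, 6 one-hot tuple classes
X_train = np.random.uniform(-1.0, 1.0, size=(1000, 500)).astype("float32")
y_train = np.eye(6)[np.random.randint(0, 6, size=1000)].astype("float32")

model = Sequential([
    Input(shape=(500,)),
    Dense(100, activation="sigmoid"),   # first hidden layer
    Dense(100, activation="sigmoid"),   # second hidden layer
    Dense(6, activation="softmax"),     # one node per tuple class
])
# Sum-of-squares style loss with plain SGD: learning rate 1, momentum 0.5
model.compile(optimizer=SGD(learning_rate=1.0, momentum=0.5), loss="mse")
model.fit(X_train, y_train, batch_size=100, epochs=100, verbose=0)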


Table 10: Dataset composition.

Dataset    Train     Test (all)   Test (2/3 part)   Test (cluster)
MedLine    26,500    10,360       6,673             5,783
DrugBank   100,100   2,000        1,326             1,933

Figure 6: Full batch training error of the MedLine dataset (training error from 0 to 0.3 versus iteration over 100 epochs, for the L0, L1, and L2 regularization settings).

The learning model performance shows different patterns between the MedLine and DrugBank datasets. For both datasets, L1 regularization tends to stabilize in the lower iterations, and its training error is always less than that of L0 or L2. The L0 and L2 training error patterns, however, show slightly different behavior between MedLine and DrugBank. For the MedLine dataset, L0 and L2 produce different results for some of the iterations. Nevertheless, the training error performance of L0 and L2 for DrugBank is almost the same in every iteration. The different patterns are probably due to the variation in the quantity of training data. As illustrated in Table 10, the volume of DrugBank training data is almost four times the volume of the MedLine dataset. It can be concluded that for the larger dataset the contribution of the L2 regularization setting is not significant in achieving better performance. For the smaller dataset (MedLine), however, the performance is better even after only a few iterations.

4.2. Open Dataset Performance. In Tables 11, 12, 13, and 14, numbering (1), (2), and (3) in the leftmost column indicates the candidate selection technique, with

(i) (1) all data tests being selected,

(ii) (2) 2/3 part of the data test being selected,

(iii) (3) 2/3 part of 3 clusters for MedLine or 3/4 part of 4 clusters for DrugBank.

4.2.1. MLP-NN Performance. In this first experiment, for two data representation techniques and three candidate selection scenarios, we have six experiment scenarios.

Figure 7: Full batch training error of the DrugBank dataset (training error from 0 to 0.3 versus iteration over 100 epochs, for the L0, L1, and L2 regularization settings).

The result of the experiment which applies the first data representation technique and three candidate selection scenarios is presented in Table 11. In computing the F-score, we only select the predicted target which is provided by the lowest error (the minimum one). For the MedLine dataset, the best performance is shown by the L2 regularization setting, where the error is 0.041818 in the third candidate selection scenario, with F-score 0.439516, whereas for DrugBank it is achieved jointly by the L0 and L1 regularization settings with an error test of 0.0802 in the second candidate selection scenario, where the F-score is 0.641745. Overall, it can be concluded that the DrugBank experiments give the best F-score performance. The candidate selection scenarios also contributed to improving the performance, as we found that for MedLine and DrugBank the best achievement is provided by the third and second scenarios, respectively.

The next experimental scenario in the first experiment is performed to evaluate the impact of the data representation technique and the addition of the Wiki source in word2vec training. The results are presented in Tables 12 and 13. According to the results presented in Table 11, the L0 regularization gives the best F-score; hence, we only used the L0 regularization for the next experimental scenario. Table 12 presents the impact of the data representation technique. Looking at the F-score, the second technique gives better results for both datasets, that is, MedLine and DrugBank.

Table 13 shows the result of adding the Wiki source into the word2vec training that provides the vector of word representation. These results confirm that the addition of training data improves the performance. It might be due to the fact that most of the targeted tokens, such as drug names, are uncommon words, whereas the words used in Wiki's sentences are commonly used words. Hence, the addition of commonly used words makes the difference between the drug token and the nondrug token (the commonly used token) become greater. For the MLP-NN experimental results, the 4th scenario, that is, the second data representation with 2/3 partition data selection on the DrugBank dataset, provides the best performance, with an F-score of 0.6846.


Table 11: The F-score performances of the three experiment scenarios.

MedLine     Prec     Rec      F-score   Lx       Error test
(1)         0.3564   0.5450   0.4310    L0       0.0305
(2)         0.3806   0.5023   0.4331    L1, L2   0.0432
(3)         0.3773   0.5266   0.4395    L2       0.0418

DrugBank    Prec     Rec      F-score   Lx       Error test
(1)         0.6312   0.5372   0.5805    L0       0.0790
(2)         0.6438   0.6398   0.6417    L0, L2   0.0802
(3)         0.6305   0.5380   0.5806    L0       0.0776

Table 12: The F-score performance as an impact of the data representation technique.

            (1) One seq. of all sentences      (2) One seq. of each sentence
MedLine     Prec     Rec      F-score          Prec     Rec      F-score
(1)         0.3564   0.5450   0.4310           0.6515   0.6220   0.6364
(2)         0.3806   0.5023   0.4331           0.6119   0.7377   0.6689
(3)         0.3772   0.5266   0.4395           0.6143   0.6569   0.6348

DrugBank    Prec     Rec      F-score          Prec     Rec      F-score
(1)         0.6438   0.5337   0.5836           0.7143   0.4962   0.5856
(2)         0.6438   0.6398   0.6418           0.7182   0.5804   0.6420
(3)         0.6306   0.5380   0.5807           0.5974   0.5476   0.5714

4.2.2. DBN and SAE Performance. In the second experiment, which involves the DBN and SAE learning models, we only use the experiment scenario that gives the best results in the first experiment. The best experiment scenario uses the second data representation technique with Wiki text as an additional source in the word2vec training step.

In the DBN experiment, we use two stacked RBMs with 500 visible units and 100 hidden units for the first and also the second RBM. The learning parameters used are as follows: momentum = 0 and alpha = 1. We used a minibatch scenario in the training with a batch size of 100. As for RBM constraints, the range of the input data value is restricted to [0, 1], as the original RBM was developed for the binary data type, whereas the range of the word vector values is [-1, 1]. So we normalize the data values into the [0, 1] range before performing the RBM training. In the last layer of the DBN, we use one layer of MLP with 100 hidden nodes and 6 output nodes with softmax output function as the classifier.
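Because the RBM visible units expect values in [0, 1] while the word vectors lie in [-1, 1], the data are rescaled before pretraining. The exact mapping is not given in the paper; a simple linear rescaling, as sketched below, is one plausible choice.

import numpy as np

def to_unit_range(vectors):
    """Map word-embedding values from [-1, 1] to [0, 1] (linear rescaling assumed)."""
    return (np.asarray(vectors, dtype=float) + 1.0) / 2.0

print(to_unit_range([[-1.0, -0.2, 0.0, 0.7, 1.0]]))   # [[0.   0.4  0.5  0.85 1.  ]]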

The SAE architecture used is two stacked AEs with the following node configuration. The first AE has 500 visible units, 100 hidden units, and 500 output units. The second AE has 100 visible units, 100 hidden units, and 100 output units. The learning parameters used for the first and the second AE, respectively, are as follows: activation function = sigmoid and tanh, learning rate = 1 and 2, momentum = 0.5 and 0.5, and sparsity target = 0.05 and 0.05. A batch size of 100 is set for both AEs. In the SAE experiment, we use the same discriminative layer as in the DBN, that is, a one-layer MLP with 100 hidden nodes and 6 output nodes with softmax activation function.

The experiment results are presented in Table 14.

Figure 8: The global LSTM network (a chain of single LSTM cells with two stacked hidden layers h1 and h2, inputs x_0, x_1, ..., x_n, and outputs y_1, y_2, ..., y_n).

There is a difference in performance between the MedLine and the DrugBank datasets when feeding them into the MLP, DBN, and SAE models. The best results for the MedLine dataset are obtained when using the SAE. For DrugBank, the MLP gives the best results. The DBN gives a lower average performance on both datasets. The lower performance is probably due to the normalization of the word vector values to [0, 1], whereas their original value range is in fact [-1, 1]. The best performance over all of experiments 1 and 2 is given by the SAE with the second scenario of candidate selection as described in Section 3.6; its F-score is 0.6862.

4.2.3. LSTM Performance. The global LSTM network used is presented in Figure 8. Each single LSTM block consists of two stacked hidden layers and one input node, with each input dimension being 200, as described in Section 3.4.3. All hidden layers are fully connected. We used sigmoid as the output activation function, which is the most suitable for binary classification. We implemented the peephole-connection LSTM variant, where the gate layers look at the cell state [34].


Table 13: The F-score performances as an impact of the Wiki addition to the word2vec training data.

            (1) One seq. of all sentences      (2) One seq. of each sentence
MedLine     Prec     Rec      F-score          Prec     Rec      F-score
(1)         0.5661   0.4582   0.5065           0.6140   0.6495   0.6336
(2)         0.5661   0.4946   0.5279           0.5972   0.7454   0.6631
(3)         0.5714   0.4462   0.5011           0.6193   0.6927   0.6540

DrugBank    Prec     Rec      F-score          Prec     Rec      F-score
(1)         0.6778   0.5460   0.6047           0.6973   0.6107   0.6511
(2)         0.6776   0.6124   0.6433           0.6961   0.6736   0.6846
(3)         0.7173   0.5574   0.6273           0.6976   0.6193   0.6561

Table 14: Experimental results of the three NN models.

            MLP                          DBN                          SAE
MedLine     Prec     Rec      F-score    Prec     Rec      F-score    Prec     Rec      F-score
(1)         0.6515   0.6220   0.6364     0.5464   0.6866   0.6085     0.6728   0.6214   0.6461
(2)         0.5972   0.7454   0.6631     0.6119   0.7377   0.6689     0.6504   0.7261   0.6862
(3)         0.6193   0.6927   0.6540     0.6139   0.6575   0.6350     0.6738   0.6518   0.6626
Average     0.6227   0.6867   0.6512     0.5907   0.6939   0.6375     0.6657   0.6665   0.6650

DrugBank    Prec     Rec      F-score    Prec     Rec      F-score    Prec     Rec      F-score
(1)         0.6973   0.6107   0.6512     0.6952   0.5847   0.6352     0.6081   0.6036   0.6059
(2)         0.6961   0.6736   0.6847     0.6937   0.6479   0.6700     0.6836   0.6768   0.6802
(3)         0.6976   0.6193   0.6561     0.6968   0.5929   0.6406     0.6033   0.6050   0.6042
Average     0.6970   0.6345   0.6640     0.6952   0.6085   0.6486     0.6317   0.6285   0.6301

In addition to implementing the peephole connection, we also use coupled forget and input gates. The detailed single-LSTM architecture and each gate's formula computation can be found in [33].

The LSTM experiments were run with several different parameter settings; the results presented in this section are the best among all our experiments. Each piece of input data consists of two components: its word vector value and its Euclidean distance to the previous input data. In treating both input data components, we adapt the Adding Problem Experiment as presented in [35]. We use the JANNLab tools [36] with some modifications in the data-entry part to conform with our data settings.

The best performance is obtained with an LSTM block architecture of a one-node input layer, a two-node hidden layer, and a one-node output layer. The parameters used are learning rate = 0.001, momentum = 0.9, epochs = 30, and input dimension = 200, with the time sequence frame set to 2. The complete treatment of the drug sentence as a sequence, both in representation and in recognition, to extract the drug name entities is the best technique, as shown by the F-score performance in Table 15.
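The 200-dimensional LSTM input described above (the 100-dimensional word2vec vector plus the Euclidean distance to the previous word) can be assembled as in the following sketch. Tiling the scalar distance over the second 100-value block is an assumption about the exact layout, and the random vectors stand in for real word2vec output.

import numpy as np

def sentence_to_lstm_inputs(word_vectors):
    """Concatenate each word's vector with its Euclidean distance to the previous word."""
    inputs, prev = [], None
    for v in word_vectors:
        v = np.asarray(v, dtype=float)
        dist = 0.0 if prev is None else float(np.linalg.norm(v - prev))
        inputs.append(np.concatenate([v, np.full(v.shape, dist)]))   # 100 + 100 values
        prev = v
    return np.vstack(inputs)   # shape: (sentence_length, 200)

# Toy usage: a 7-word sentence with random 100-dimensional "word vectors"
sentence = np.random.uniform(-1.0, 1.0, size=(7, 100))
print(sentence_to_lstm_inputs(sentence).shape)   # (7, 200)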

As described in the previous work section, many approaches related to drug extraction have been proposed. Most of them utilize certain external knowledge to achieve the extraction objective. Table 16 summarizes their F-score performance. Among the state-of-the-art techniques, our third data representation technique applied to the LSTM model performs best. Also, our proposed method does not require any external knowledge.

Table 15: The F-score performance of the third data representation technique with RNN-LSTM.

            Prec     Rec      F-score
MedLine     1.0000   0.6474   0.7859
DrugBank    1.0000   0.8921   0.9430
Average                       0.8645

4.3. Drug Label Dataset Performance. As an additional experiment, we also use an Indonesian-language drug label corpus to evaluate the method's performance. Regarding the Indonesian drug labels, we could not find any external knowledge that can be used to assist the extraction of the drug names contained in the drug label. In the presence of this hindrance, we found our proposed method more suitable than other previous approaches. As the drug label texts are collected from various sites of drug distributors, producers, and government regulators, they do not clearly separate training data and testing data as in the DrugBank or MedLine datasets. The other characteristic of these texts is the more structured sentences contained in the data. Although the texts come from various sources, all of them are a similar kind of document (drug labels that might be generated by machine). After the data cleaning step (HTML tag removal, etc.), we annotated the dataset manually.


Table 16: The F-score performance compared to the state of the art.

Approach                                                                    F-score   Remark
The best of SemEval 2013 [15]                                               0.7150    —
[11]                                                                        0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                                 0.7200    With external knowledge (DINTO)
[14]                                                                        0.7200    Additional feature (BIO)
[12]                                                                        0.6000    Single token only
MLP, sentence sequence + Wiki (average), ours                               0.6580    Without external knowledge
DBN, sentence sequence + Wiki (average), ours                               0.6430    Without external knowledge
SAE, sentence sequence + Wiki (average), ours                               0.6480    Without external knowledge
LSTM, all-sentence sequence + Wiki + Euclidean distance (average), ours     0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration   Prec     Recall   F-score
1           0.9170   0.9667   0.9412
2           0.8849   0.9157   0.9000
3           0.9134   0.9619   0.9370
4           0.9298   0.9500   0.9398
5           0.9640   0.9570   0.9605
6           0.8857   0.9514   0.9178
7           0.9489   0.9689   0.9588
8           0.9622   0.9654   0.9638
9           0.9507   0.9601   0.9554
10          0.9516   0.9625   0.9570
Average     0.9308   0.9560   0.9431
Min         0.8849   0.9157   0.9000
Max         0.9640   0.9689   0.9638

The total quantity of the dataset after performing the data representation step described in Section 3.4 is 1,046,200. In this experiment, we perform a 10-run cross-validation scenario by randomly selecting 80% of the data for training and 20% for testing.
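The repeated random splitting can be sketched as follows; the index-based splitter below is only an illustration of the 10 runs with an 80%/20% division, not the authors' code.

import numpy as np

def random_splits(n_samples, n_runs=10, train_frac=0.8, seed=42):
    """Yield (train_idx, test_idx) pairs for repeated random 80/20 splits."""
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * n_samples)
    for _ in range(n_runs):
        perm = rng.permutation(n_samples)
        yield perm[:n_train], perm[n_train:]

for run, (train_idx, test_idx) in enumerate(random_splits(1000), start=1):
    print(run, len(train_idx), len(test_idx))   # e.g. "1 800 200"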

The experimental result for the drug label dataset shows that all of the candidate selection scenarios provide an excellent F-score (above 0.9). The excellent F-score performance is probably due to the more structured sentences in those texts. The best results of those ten runs are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments, we studied various experiment scenarios, which involve three investigated parameters: the additional Wiki source, the data representation techniques, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in word2vec training enhances the quality of the resulting word2vec vectors. Through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, drug names, becomes greater (better distinguishing power).

One problem in mining drug name entities from medical text is the imbalanced quantity between drug tokens and other tokens [15]. Also, the targeted drug entities are only a small part of the total tokens; thus the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios show their contribution in reducing the quantity of noise. Since the possibility of extracting the noise is reduced, the recall value and F-score value increase as well, as shown in the results of the first and second experiments.

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence; the input dataset therefore cannot be randomly divided (selected) in the way the tuples are treated in the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach in data representation and classification to extract drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity that remained unsolved in previous studies. This study also introduces some techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern, that is, the current word is conditioned by the previous words. Based on this sequence notion, the treatment of medical text sentences with a sequence NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequence pattern, which is evaluated with the nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentences as a sequence to provide the data used as input of the sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique are still wide open. A first improvement can be the incorporation of additional handcrafted features, such as the word's position, the use of capital case at the beginning of the word, and the type of character, as used in previous studies [16, 37]. As presented in the MLP experiments for the drug label documents, the proposed methods achieved excellent performance when applied to more structured text. Thus, the effort to make the sentences of the datasets, that is, DrugBank and MedLine, more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extraction, such as drug group, drug brand, and drug compounds.


Another task that can potentially be solved with the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features. Such addition gives a good F-score performance. The significance of embedding the Euclidean distance features, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004/UN2.R12/HKP.05.00/2016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Ga, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3-6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, University of Toronto, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM Can Solve Hard," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.



Table 9 Third technique of data representation and its label

Sent1 Class 0 0 0 0 1 0 0Word ldquodrugrdquo ldquointeractionrdquo ldquostudiesrdquo ldquowithrdquo ldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo

Sent2 Class 1 1 0 0 0 0 0Word ldquocytochromerdquo ldquop-450rdquo ldquoisrdquo ldquonotrdquo ldquoknownrdquo ldquoinrdquo ldquotherdquo

373 SAE An autoencoder (AE) neural network is one ofthe unsupervised learning algorithmsTheNN tries to learn afunction ℎ(119908 119909) asymp 119909 The autoencoder NN architecture alsoconsists of input hidden and output layers The particularcharacteristic of the autoencoder is that the target output issimilar to the input The interesting structure of the data isestimated by applying a certain constraint to the networkwhich limits the number of hidden units However whenthe number of hidden units has to be larger it can beimposed with sparsity constraints on the hidden units [30]The sparsity constraint is used to enforce the average valueof hidden unit activation constrained to a certain valueAs used in the DBN model after we trained the SAE thetrained weight was used to initialize the weight of NN for theclassification

374 RNN-LSTM RNN (Recurrent Neural Network) is anNN which considers the previous input in determining theoutput of the current input RNN is powerful when it isapplied to the dataset with a sequential pattern or when thecurrent state input depends on the previous one such as thetime series data sentences of NLP An LSTM network is aspecial kind of RNN which also consists of 3 layers that isan input layer a single recurrent hidden layer and an outputlayer [31] The main innovation of LSTM is that its hiddenlayer consists of one or more memory blocks Each blockincludes one or morememory cells In the standard form theinputs are connected to all of the cells and gates whereas thecells are connected to the outputs The gates are connected toother gates and cells in the hidden layer The single standardLSTM is a hidden layer with input memory cell and outputgates [32 33]

38 Dataset To validate the proposed approach we utilizedDrugBank and MedLine open dataset which have also beenused by previous researchers Additionally we used druglabel documents from various drug producers and regulatorInternet sites located in Indonesia

(1) httpwwwkalbemedcom

(2) httpwwwdechacarecom

(3) httpinfoobatindonesiacomobat and

(4) httpwwwpomgoidwebregindexphphomeproduk01

The drug labels are written in Bahasa Indonesia and theircommon contents are drug name drug components indica-tion contraindication dosage and warning

39 Evaluation To evaluate the performance of the pro-posed method we use common measured parameters indata mining that is precision recall and F-score Thecomputation formula of these parameters is as follows Let119862 = 1198621 1198622 1198623 119862119899 be a set of the extracted drug nameof this method and 119870 = 1198701 1198702 1198703 119870119897 is set of actualdrug names in the document set 119863 Adopted from [18] theparameter computations formula is

Precision (119870119894 119862119895) = (TruePositive)(TruePositive + FalsePositive)= (10038171003817100381710038171003817119870119894 cap 11986211989510038171003817100381710038171003817)(1003817100381710038171003817100381711986211989510038171003817100381710038171003817)

Recall (119870119894 119862119895) = (TruePositive)(TruePositive + FalseNegative)= (10038171003817100381710038171003817119870119894 cap 11986211989510038171003817100381710038171003817)(10038171003817100381710038171198701198941003817100381710038171003817)

(5)

where 119870119894 119862119895 and 119870119894 cap 119862119895 denote the number of drugnames in 119870 in 119862 and in both 119870 and 119862 respectively The F-score value is computed by the following formula

119865-score (119870119894 119862119895)= (2 lowast Precision (119870119894 119862119895) lowast Recall (119870119894 119862119895))(TruePositive + FalsePositive) (6)

4 Results and Discussion

41 MLP Learning Performance The following experimentsare the part of the first experiment These experiments areperformed to evaluate the contribution of the three regular-ization settings as described in Section 371 By arranging thesentence in training dataset as 5-gram of words the quantityof generated sample is presented in Table 10 We do trainingand testing of the MLP-NN learning model for all those testdata compositionsThe result ofmodel performances on bothdatasets that is MedLine and DrugBank in learning phase isshown in Figures 6 and 7 The NN learning parameters thatare used for all experiments are 500 input nodes two hiddenlayerswhere each layer has 100 nodeswith sigmoid activationand 6 output nodes with softmax function the learning rate= 1 momentum = 05 and epochs = 100 We used minibatchscenario in the training with the batch size being 100 Thepresented errors in Figures 6 and 7 are the errors for full batchthat is the mean errors of all minibatches

Computational Intelligence and Neuroscience 11

Table 10 Dataset composition

Dataset Train TestAll 23 part Cluster

MedLine 26500 10360 6673 5783DrugBank 100100 2000 1326 1933

Iteration of 100 epochs0 10 20 30 40 50 60 70 80 90 100

Erro

r

0

005

01

015

02

025

03

L1L0L2

Figure 6 Full batch training error of MedLine dataset

The learningmodel performance shows different patternsbetweenMedLine and DrugBank datasets For both datasetsL1 regularization tends to stabilize in the lower iterationand its training error performance is always less than L0or L2 The L0 and L2 training error performance patternhowever shows a slight different behavior between MedLineand DrugBank For the MedLine dataset L0 and L2 producedifferent results for some of the iterations Neverthelessthe training error performance of L0 and L2 for DrugBankis almost the same in every iteration Different patternresults are probably due to the variation in the quantityof training data As illustrated in Table 10 the volume ofDrugBank training data is almost four times the volume ofthe MedLine dataset It can be concluded that for largerdataset the contribution of L2 regularization setting is nottoo significant in achieving better performance For smallerdataset (MedLine) however the performance is better evenafter only few iterations

42 Open Dataset Performance In Tables 11 12 13 and 14numbering (1) numbering (2) and numbering (3) in themost left column indicate the candidate selection techniquewith

(i) (1) all data tests being selected

(ii) (2) 23 part of data test being selected

(iii) (3) 23 part of 3 clusters for MedLine or 34 part of 4clusters for DrugBank

421 MLP-NN Performance In this first experiment for twodata representation techniques and three candidate selectionscenarios we have six experiment scenarios The result

Iteration of 100 epochs0 10 20 30 40 50 60 70 80 90 100

Erro

r

0

005

01

015

02

025

03

L1L0L2

Figure 7 Full batch training error of DrugBank dataset

of the experiment which applies the first data representa-tion technique and three candidate selection scenarios ispresented in Table 11 In computing the F-score we onlyselect the predicted target which is provided by the lowesterror (the minimum one) For MedLine dataset the bestperformance is shown by L2 regularization setting where theerror is 0041818 in third candidate selection scenario with F-score 0439516 whereas the DrugBank is achieved togetherby L0 and L1 regularization setting with an error test of00802 in second candidate selection scenario the F-scorewas 0641745 Overall it can be concluded that DrugBankexperiments give the best F-score performance The candi-date selection scenarios also contributed to improving theperformance as we found that for both of MedLine andDrugBank the best achievement is provided by the secondand third scenarios respectively

The next experimental scenario in the first experiment isperformed to evaluate the impact of the data representationtechnique and the addition of Wiki source in word2vectraining The results are presented in Tables 12 and 13According to the obtained results presented in Table 11 theL0 regularization gives the best F-score Hence accordinglywe only used the L0 regularization for the next experimentalscenario Table 12 presents the impact of the data representa-tion technique Looking at the F-score the second techniquegives better results for both datasets that is the MedLine andDrugBank

Table 13 shows the result of adding the Wiki sourceinto word2vec training in providing the vector of wordrepresentation These results confirm that the addition oftraining datawill improve the performance Itmight be due tothe fact thatmost of the targeted tokens such as drug name areuncommon words whereas the words that are used in Wikirsquossentence are commonly used words Hence the addition ofcommonly used words will make the difference between drugtoken and the nondrug token (the commonly used token)become greater For the MLP-NN experimental results the4th scenario that is the second data representation with 23partition data selection in DrugBank dataset provides thebest performance with 0684646757 in F-score

12 Computational Intelligence and Neuroscience

Table 11 The F-score performances of three of scenarios experiments

MedLine Prec Rec F-score Lx Error test(1) 03564 05450 04310 L0 00305(2) 03806 05023 04331 L1 L2 00432(3) 03773 05266 04395 L2 00418DrugBank Prec Rec F-score Lx Error test(1) 06312 05372 05805 L0 007900(2) 06438 06398 06417 L0 L2 00802(3) 06305 05380 05806 L0 00776

Table 12 The F-score performance as an impact of data representation technique

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 03564 05450 04310 06515 06220 06364(2) 03806 05023 04331 06119 07377 06689(3) 03772 05266 04395 06143 0656873 06348DrugBank Prec Rec F-score Prec Rec F-score(1) 06438 05337 05836 07143 04962 05856(2) 06438 06398 06418 07182 05804 06420(3) 06306 05380 05807 05974 05476 05714

422 DBN and SAE Performance In the second experimentwhich involves DBN and SAE learning model we only usethe experiment scenario that gives the best results in the firstexperiment The best experiment scenario uses the seconddata representation technique withWiki text as an additionalsource in the word2vec training step

In the DBN experiment we use two stacked RBMs with500 nodes of visible unit and 100 nodes of the hidden layerfor the first and also the second RBMs The used learningparameters are as follows momentum = 0 and alpha = 1 Weusedminibatch scenario in the training with the batch size of100 As for RBM constraints the range of input data value isrestricted to [0 sdot sdot sdot 1] as the original RBM which is developedfor binary data type whereas the range of vector ofword valueis [minus1 sdot sdot sdot 1] Sowe normalize the data value into [0 sdot sdot sdot 1] rangebefore performing the RBM training In the last layer of DBNwe use one layer of MLP with 100 hidden nodes and 6 outputnodes with softmax output function as classifier

The SAE architecture used is two stacked AEs with the following node configuration. The first AE has 500 visible units, 100 hidden units, and 500 output units. The second AE has 100 visible units, 100 hidden units, and 100 output units. The learning parameters used for the first and the second AE, respectively, are as follows: activation function = sigmoid and tanh, learning rate = 1 and 2, momentum = 0.5 and 0.5, and sparsity target = 0.05 and 0.05. A batch size of 100 is set for both AEs. In the SAE experiment, we use the same discriminative layer as in the DBN, that is, one MLP layer with 100 hidden nodes and 6 output nodes with a softmax activation function.
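The stacked-autoencoder pipeline can be sketched with Keras as below. This is not our implementation (the experiments used the DeepLearnToolbox); the layer sizes, activations, learning rates, momentum, and batch size follow the description above, while the sparsity constraint is omitted and the dummy data and epoch counts are placeholders only.

    import numpy as np
    from tensorflow.keras import layers, models, optimizers

    X = np.random.rand(1000, 500)            # 5-gram tuples: 5 tokens x 100-dim vectors
    y = np.random.randint(0, 6, size=1000)   # 6 output classes, dummy labels

    # First AE: 500 -> 100 -> 500, sigmoid, learning rate 1, momentum 0.5
    ae1 = models.Sequential([
        layers.Dense(100, activation="sigmoid", input_shape=(500,)),
        layers.Dense(500, activation="sigmoid")])
    ae1.compile(optimizers.SGD(learning_rate=1.0, momentum=0.5), loss="mse")
    ae1.fit(X, X, batch_size=100, epochs=5, verbose=0)
    H1 = models.Sequential([ae1.layers[0]]).predict(X, verbose=0)

    # Second AE: 100 -> 100 -> 100, tanh, learning rate 2, momentum 0.5
    ae2 = models.Sequential([
        layers.Dense(100, activation="tanh", input_shape=(100,)),
        layers.Dense(100, activation="tanh")])
    ae2.compile(optimizers.SGD(learning_rate=2.0, momentum=0.5), loss="mse")
    ae2.fit(H1, H1, batch_size=100, epochs=5, verbose=0)

    # Discriminative part: the two pretrained encoders, one 100-node MLP layer,
    # and a 6-way softmax output, fine-tuned on the class labels.
    clf = models.Sequential([ae1.layers[0], ae2.layers[0],
                             layers.Dense(100, activation="sigmoid"),
                             layers.Dense(6, activation="softmax")])
    clf.compile("sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    clf.fit(X, y, batch_size=100, epochs=5, verbose=0)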

The experiment results are presented in Table 14. There is a difference in performance between the MedLine and the DrugBank datasets when feeding them into the MLP, DBN, and SAE models.

Figure 8: The global LSTM network (a single LSTM cell unrolled over the inputs x0, x1, ..., xn, with two stacked hidden layers h1 and h2 producing the outputs y1, y2, ..., yn).

The best results for the MedLine dataset are obtained when using the SAE. For DrugBank, the MLP gives the best results. The DBN gives a lower average performance for both datasets. The lower performance is probably due to the normalization of the word vector values to [0, 1], whereas their original value range is in fact [-1, 1]. The best performance across experiments 1 and 2 is given by the SAE with the second scenario of candidate selection as described in Section 3.6; its F-score is 0.686192469.

4.2.3. LSTM Performance. The global LSTM network used is presented in Figure 8. Each single LSTM block consists of two stacked hidden layers and one input node, with each input dimension being 200, as described in Section 3.4.3. All hidden layers are fully connected. We used sigmoid as the output activation function, which is the most suitable for binary classification. We implemented the peephole connection LSTM variant, where the gate layers look at the cell state [34].


Table 13: The F-score performances as an impact of the Wiki addition to the word2vec training data.

                (1) One seq. of all sentences     (2) One seq. of each sentence
MedLine      Prec      Rec       F-score          Prec      Rec       F-score
(1)          0.5661    0.4582    0.5065           0.614     0.6495    0.6336
(2)          0.5661    0.4946    0.5279           0.5972    0.7454    0.6631
(3)          0.5714    0.4462    0.5011           0.6193    0.6927    0.6540
DrugBank     Prec      Rec       F-score          Prec      Rec       F-score
(1)          0.6778    0.5460    0.6047           0.6973    0.6107    0.6511
(2)          0.6776    0.6124    0.6433           0.6961    0.6736    0.6846
(3)          0.7173    0.5574    0.6273           0.6976    0.6193    0.6561

Table 14: Experimental results of the three NN models.

                MLP                             DBN                             SAE
MedLine      Prec     Rec      F-score       Prec     Rec      F-score       Prec     Rec      F-score
(1)          0.6515   0.6220   0.6364        0.5464   0.6866   0.6085        0.6728   0.6214   0.6461
(2)          0.5972   0.7454   0.6631        0.6119   0.7377   0.6689        0.6504   0.7261   0.6862
(3)          0.6193   0.6927   0.6540        0.6139   0.6575   0.6350        0.6738   0.6518   0.6626
Average      0.6227   0.6867   0.6512        0.5907   0.6939   0.6375        0.6657   0.6665   0.6650
DrugBank     Prec     Rec      F-score       Prec     Rec      F-score       Prec     Rec      F-score
(1)          0.6973   0.6107   0.6512        0.6952   0.5847   0.6352        0.6081   0.6036   0.6059
(2)          0.6961   0.6736   0.6847        0.6937   0.6479   0.6700        0.6836   0.6768   0.6802
(3)          0.6976   0.6193   0.6561        0.6968   0.5929   0.6406        0.6033   0.6050   0.6042
Average      0.6970   0.6345   0.664         0.6952   0.6085   0.6486        0.6317   0.6285   0.6301

In addition to the peephole connections, we also use coupled forget and input gates. The detailed single-LSTM architecture and the computation formula for each gate can be found in [33].

The LSTM experiments were implemented with several different parameter settings. The results presented in this section are the best among all our experiments. Each piece of input data consists of two components: its word vector value and its Euclidean distance to the previous input data. In treating both input data components, we adapt the Adding Problem experiment as presented in [35]. We use the Jannlab tools [36], with some modifications to the data entry part to conform with our data settings.
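The construction of these two-component inputs can be sketched as follows. Each 200-dimensional input concatenates the 100 word2vec values with 100 values carrying the Euclidean distance to the previous token; in this sketch the scalar distance is simply repeated over the second half (an assumption on our part; an element-wise difference would be an equally plausible reading), and the first token keeps zeros there.

    import numpy as np

    def sentence_to_lstm_inputs(vectors):
        # vectors: (T, 100) array of word2vec values for the T tokens of one sentence.
        # Returns (T, 200): [word vector | distance-to-previous features].
        T, d = vectors.shape
        X = np.zeros((T, 2 * d))
        X[:, :d] = vectors
        for i in range(1, T):
            dist = np.linalg.norm(vectors[i] - vectors[i - 1])  # Euclidean distance
            X[i, d:] = dist        # scalar repeated over the second 100 positions (assumption)
        return X                   # the first token keeps zeros in the distance part

    sent = np.random.uniform(-1, 1, size=(7, 100))   # e.g. a 7-token sentence
    print(sentence_to_lstm_inputs(sent).shape)       # (7, 200)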

The best performance is obtained with an LSTM block architecture of a one-node input layer, a two-node hidden layer, and a one-node output layer. The parameters used are learning rate = 0.001, momentum = 0.9, epochs = 30, and input dimension = 200, with the time sequence frame set to 2. The complete treatment of the drug sentence as a sequence, both in representation and in recognition, to extract the drug name entities is the best technique, as shown by the F-score performance in Table 15.
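A Keras stand-in for this setup is sketched below; it is not the JANNLab configuration we used (standard Keras LSTM cells have no peephole connections, and the hidden sizes and batch size here are placeholders), but the learning rate, momentum, epoch count, 200-dimensional inputs, time frame of 2, and sigmoid output follow the text.

    import numpy as np
    from tensorflow.keras import layers, models, optimizers

    timesteps, dim = 2, 200                               # time-sequence frame of 2, 200-dim inputs
    X = np.random.uniform(-1, 1, (500, timesteps, dim))   # dummy sequences
    y = np.random.randint(0, 2, (500, 1))                 # 1 = drug token, 0 = other

    model = models.Sequential([
        layers.LSTM(32, return_sequences=True, input_shape=(timesteps, dim)),
        layers.LSTM(32),                          # two stacked recurrent layers (sizes are placeholders)
        layers.Dense(1, activation="sigmoid")])   # binary decision per input word
    model.compile(optimizers.SGD(learning_rate=0.001, momentum=0.9),
                  loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=30, batch_size=100, verbose=0)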

As described in the previous work section, many approaches related to drug name extraction have been proposed. Most of them utilize certain external knowledge to achieve the extraction objective. Table 16 summarizes their F-score performance. Among the state-of-the-art techniques, our third data representation technique applied to the LSTM model performs best. Moreover, our proposed method does not require any external knowledge.

Table 15: The F-score performance of the third data representation technique with RNN-LSTM.

             Prec     Rec      F-score
MedLine      1        0.6474   0.7859
DrugBank     1        0.8921   0.9430
Average                        0.8645
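As a quick sanity check, the F-scores in Table 15 follow directly from the harmonic mean of precision and recall, and the reported average is the mean of the two per-dataset F-scores:

    def f_score(p, r):
        return 2 * p * r / (p + r)

    print(f_score(1.0, 0.6474))      # 0.7859...  (MedLine row)
    print(f_score(1.0, 0.8921))      # 0.9429...  (DrugBank row, 0.9430 when rounded)
    print((0.7859 + 0.9430) / 2)     # 0.86445    (reported average, 0.8645)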

4.3. Drug Label Dataset Performance. As an additional experiment, we also use an Indonesian-language drug label corpus to evaluate the method's performance. For the Indonesian drug labels, we could not find any external knowledge that can be used to assist the extraction of the drug names contained in the labels. In the presence of this hindrance, we found our proposed method more suitable than the previous approaches. As the drug label texts are collected from various sites of drug distributors, producers, and government regulators, they do not come with predefined training and testing data as in the DrugBank or MedLine datasets. The other characteristic of these texts is that they contain more structured sentences. Although the texts come from various sources, all of them are similar kinds of documents (drug labels that might be generated by machine). After the data cleaning step (HTML tag removal, etc.), we annotated the dataset manually. The total quantity of the dataset after performing the data representation step described in Section 3.4 is 1,046,200.


Table 16: The F-score performance compared to the state of the art.

Approach                                                                    F-score   Remark
The Best of SemEval 2013 [15]                                               0.7150    —
[11]                                                                        0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                                 0.7200    With external knowledge (DINTO)
[14]                                                                        0.7200    Additional feature (BIO)
[12]                                                                        0.6000    Single token only
Ours: MLP - sentence sequence + Wiki (average)                              0.6580    Without external knowledge
Ours: DBN - sentence sequence + Wiki (average)                              0.6430    Without external knowledge
Ours: SAE - sentence sequence + Wiki (average)                              0.6480    Without external knowledge
Ours: LSTM - all sentence sequence + Wiki + Euclidean distance (average)    0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration   Prec      Recall    F-score
1           0.9170    0.9667    0.9412
2           0.8849    0.9157    0.9000
3           0.9134    0.9619    0.9370
4           0.9298    0.9500    0.9398
5           0.9640    0.9570    0.9605
6           0.8857    0.9514    0.9178
7           0.9489    0.9689    0.9588
8           0.9622    0.9654    0.9638
9           0.9507    0.9601    0.9554
10          0.9516    0.9625    0.9570
Average     0.93081   0.9560    0.9431
Min         0.8849    0.9157    0.9000
Max         0.9640    0.9689    0.9638

In this experiment, we perform a 10-run evaluation scenario by randomly selecting 80% of the data for training and 20% for testing.
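This repeated random 80/20 evaluation can be expressed compactly as below; the function and variable names are illustrative only.

    import numpy as np

    def random_splits(n_samples, n_runs=10, train_frac=0.8, seed=0):
        # Repeated random sub-sampling: yields (train_idx, test_idx) index pairs
        # for 10 independent 80/20 splits of the drug-label tuples.
        rng = np.random.default_rng(seed)
        n_train = int(train_frac * n_samples)
        for _ in range(n_runs):
            perm = rng.permutation(n_samples)
            yield perm[:n_train], perm[n_train:]

    for run, (train_idx, test_idx) in enumerate(random_splits(10000), start=1):
        pass  # train on train_idx, report precision/recall/F-score on test_idx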

The experimental results for the drug label dataset show that all of the candidate selection scenarios provide an excellent F-score (above 0.9). The excellent F-score performance is probably due to the more structured sentences in those texts. The best results of those ten runs are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments, we studied various experiment scenarios which involve three investigated parameters: the additional Wiki source, the data representation techniques, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in word2vec training enhances the quality of the resulting word vectors. Through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, the drug names, becomes greater (better distinguishing power).

One problem in mining drug name entities from medical text is the imbalanced quantity between drug tokens and other tokens [15]. The targeted drug entities are only a small part of the total tokens; thus, the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios show their contribution to reducing the quantity of noise. Since the possibility of extracting the noise is reduced, the recall value and the F-score value increase as well, as shown in the first and second experiment results.
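A sketch of the frequency-based candidate selection (the second scenario) is given below: the most frequent tokens, which are almost never drug names, are dropped and only the rarer two-thirds of the distinct tokens are kept as candidates. The 2/3 threshold follows this study; the helper itself is an illustration.

    from collections import Counter

    def candidate_tokens(tokens, keep_fraction=2 / 3):
        # Keep the rarer ~2/3 of the distinct tokens as drug-name candidates;
        # the most frequent third (stop words, punctuation, common words) is
        # treated as noise and dropped before classification.
        counts = Counter(tokens)
        ranked = sorted(counts, key=counts.get)        # rarest first
        return set(ranked[:int(len(ranked) * keep_fraction)])

    tokens = ["the", "of", "phenytoin", "the", "dilantin", "is", "the", "of"]
    print(candidate_tokens(tokens))   # rare tokens such as 'phenytoin' and 'dilantin'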

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence. Hence, the input dataset cannot be randomly divided (selected) as with the tuple treatment in the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach to data representation and classification to extract drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity, which remained unsolved in previous studies. This study also introduces some techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern, that is, the current word is conditioned by the previous words. Based on this sequence notion, the treatment of medical text sentences with a sequence NN model gives better results. In this study, we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequence pattern, which is evaluated with the nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentences as a sequence to provide data used as the input of the sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique are still wide open. A first improvement can be the incorporation of additional handcrafted features, such as the word position, the use of a capital letter at the beginning of the word, and the type of character, as also used in previous studies [16, 37]. As presented in the MLP experiments for the drug label documents, the proposed methods achieved excellent performance when applied to more structured text. Thus, the effort to make the sentences of the datasets, that is, DrugBank and MedLine, more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extraction, such as drug group, drug brand, and drug compounds. Another task that can potentially be solved with


the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features. This addition gives a good F-score performance. The significance of embedding the Euclidean distance features, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004UN2R12HKP05002016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1-5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088-1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145-153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914-920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816-823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88-100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790-810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660-666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651-659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122-132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341-350, Association for Computational Linguistics, Atlanta, Georgia, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64-72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11-18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365-3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3-6, 2012, Proceedings, pp. 14-36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064-1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, Department of Computer Science, Toronto University, UTML-TR-11, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599-619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153-158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481-1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172-175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM Can Solve Hard," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "Jannlab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39-46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206-213, BioNLP, Montreal, Canada, 2012.


Page 6: Research Article A New Data Representation Based on ...downloads.hindawi.com/journals/cin/2016/3483528.pdfsyntactic term or lexical term. Finally, the learning approach is usually

6 Computational Intelligence and Neuroscience


of the experiment which applies the first data representa-tion technique and three candidate selection scenarios ispresented in Table 11 In computing the F-score we onlyselect the predicted target which is provided by the lowesterror (the minimum one) For MedLine dataset the bestperformance is shown by L2 regularization setting where theerror is 0041818 in third candidate selection scenario with F-score 0439516 whereas the DrugBank is achieved togetherby L0 and L1 regularization setting with an error test of00802 in second candidate selection scenario the F-scorewas 0641745 Overall it can be concluded that DrugBankexperiments give the best F-score performance The candi-date selection scenarios also contributed to improving theperformance as we found that for both of MedLine andDrugBank the best achievement is provided by the secondand third scenarios respectively

The next experimental scenario in the first experiment isperformed to evaluate the impact of the data representationtechnique and the addition of Wiki source in word2vectraining The results are presented in Tables 12 and 13According to the obtained results presented in Table 11 theL0 regularization gives the best F-score Hence accordinglywe only used the L0 regularization for the next experimentalscenario Table 12 presents the impact of the data representa-tion technique Looking at the F-score the second techniquegives better results for both datasets that is the MedLine andDrugBank

Table 13 shows the result of adding the Wiki sourceinto word2vec training in providing the vector of wordrepresentation These results confirm that the addition oftraining datawill improve the performance Itmight be due tothe fact thatmost of the targeted tokens such as drug name areuncommon words whereas the words that are used in Wikirsquossentence are commonly used words Hence the addition ofcommonly used words will make the difference between drugtoken and the nondrug token (the commonly used token)become greater For the MLP-NN experimental results the4th scenario that is the second data representation with 23partition data selection in DrugBank dataset provides thebest performance with 0684646757 in F-score

12 Computational Intelligence and Neuroscience

Table 11 The F-score performances of three of scenarios experiments

MedLine Prec Rec F-score Lx Error test(1) 03564 05450 04310 L0 00305(2) 03806 05023 04331 L1 L2 00432(3) 03773 05266 04395 L2 00418DrugBank Prec Rec F-score Lx Error test(1) 06312 05372 05805 L0 007900(2) 06438 06398 06417 L0 L2 00802(3) 06305 05380 05806 L0 00776

Table 12 The F-score performance as an impact of data representation technique

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 03564 05450 04310 06515 06220 06364(2) 03806 05023 04331 06119 07377 06689(3) 03772 05266 04395 06143 0656873 06348DrugBank Prec Rec F-score Prec Rec F-score(1) 06438 05337 05836 07143 04962 05856(2) 06438 06398 06418 07182 05804 06420(3) 06306 05380 05807 05974 05476 05714

422 DBN and SAE Performance In the second experimentwhich involves DBN and SAE learning model we only usethe experiment scenario that gives the best results in the firstexperiment The best experiment scenario uses the seconddata representation technique withWiki text as an additionalsource in the word2vec training step

In the DBN experiment we use two stacked RBMs with500 nodes of visible unit and 100 nodes of the hidden layerfor the first and also the second RBMs The used learningparameters are as follows momentum = 0 and alpha = 1 Weusedminibatch scenario in the training with the batch size of100 As for RBM constraints the range of input data value isrestricted to [0 sdot sdot sdot 1] as the original RBM which is developedfor binary data type whereas the range of vector ofword valueis [minus1 sdot sdot sdot 1] Sowe normalize the data value into [0 sdot sdot sdot 1] rangebefore performing the RBM training In the last layer of DBNwe use one layer of MLP with 100 hidden nodes and 6 outputnodes with softmax output function as classifier

The used SAE architecture is two stacked AEs with thefollowing nodes configuration The first AE has 500 unitsof visible unit 100 hidden layers and 500 output layersThe second AE has 100 nodes of visible unit 100 nodes ofhidden unit and 100 nodes of output unit The used learningparameters for first SAE and the second SAE respectively areas follows activation function = sigmoid and tanh learningrate = 1 and 2 momentum = 05 and 05 sparsity target =005 and 005 The batch size of 100 is set for both of AEsIn the SAE experiment we use the same discriminative layeras DBN that is one layer MLP with 100 hidden nodes and 6output nodes with softmax activation function

The experiments results are presented in Table 14 Thereis a difference in performances when using the MedLine andthe DrugBank datasets when feeding them into MLP DBN

A single LSTM cell

ℎ20

ℎ10

ℎ21

ℎ11

ℎ22

ℎ12

ℎ23

ℎ13

ℎ2n

ℎ1n

x0 x1 x2 x3 xn

y1 y2 y3 ynmiddot middot middot

middot middot middot

middot middot middot

middot middot middot

Figure 8 The global LSTM network

and SAE models The best results for the MedLine datasetare obtained when using the SAE For the DrugBank theMLP gives the best results The DBN gives lower averageperformance for both datasets The lower performance isprobably due to the normalization on theword vector value to[0 sdot sdot sdot 1] whereas their original value range is in fact between[minus1 sdot sdot sdot 1] The best performance for all experiments 1 and 2 isgiven by SAE with the second scenario of candidate selectionas described in Section 36 Its F-score is 0686192469

423 LSTM Performance The global LSTM network usedis presented in Figure 8 Each single LSTM block consistsof two stacked hidden layers and one input node with eachinput dimension being 200 as described in Section 343 Allhidden layers are fully connected We used sigmoid as anoutput activation function which is the most suitable forbinary classification We implemented a peepholes connec-tion LSTM variant where its gate layers look at the cell state[34] In addition to implementing the peepholes connection

Computational Intelligence and Neuroscience 13

Table 13 The F-score performances as an impact of the Wiki addition of word2vec training data

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 05661 04582 05065 0614 06495 06336(2) 05661 04946 05279 05972 07454 06631(3) 05714 04462 05011 06193 06927 06540DrugBank Prec Rec F-score Prec Rec F-score(1) 06778 05460 06047 06973 06107 06511(2) 06776 06124 06433 06961 06736 06846(3) 07173 05574 06273 06976 06193 06561

Table 14 Experimental results of three NN models

Dataset MLP DBN SAEMedLine Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06515 06220 06364 05464 06866 06085 06728 06214 06461(2) 05972 07454 06631 06119 07377 06689 06504 07261 06862(3) 06193 06927 06540 06139 06575 06350 06738 06518 06626Average 06227 06867 06512 05907 06939 06375 06657 06665 06650DrugBank Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06973 06107 06512 06952 05847 06352 06081 06036 06059(2) 06961 06736 06847 06937 06479 06700 06836 06768 06802(3) 06976 06193 06561 06968 05929 06406 06033 06050 06042Average 06970 06345 0664 06952 06085 06486 06317 06285 06301

we also use a couple of forget and input gates The detailedsingle LSTM architecture and each gate formula computationcan be referred to in [33]

The LSTM experiments were implemented with severaldifferent parameter settings Their results presented in thissection are the best among all our experiments Each pieceof input data consists of two components its word vectorvalue and its Euclidian distance to the previous input dataIn treating both input data components we adapt the AddingProblem Experiment as presented in [35]We use the Jannlabtools [36] with some modifications in the part of entry toconform with our data settings

The best achieved performance is obtained with LSTMblock architecture of one node input layer two nodes hiddenlayer and one node output layer The used parameters arelearning rate = 0001 momentum = 09 epoch = 30 andinput dimension = 200 with the time sequence frame set to2 The complete treatment of drug sentence as a sequenceboth in representation and in recognition to extract the drugname entities is the best technique as shown by F-scoreperformance in Table 15

As described in previous work section there are manyapproaches related to drug extraction that have been pro-posed Most of them utilize certain external knowledge toachieve the extraction objective Table 16 summarizes theirF-score performance Among the state-of-the-art techniquesour third data representation technique applied to the LSTMmodel is outperforming Also our proposedmethod does notrequire any external knowledge

Table 15 The F-score performance of third data representationtechnique with RNN-LSTM

Prec Rec F-scoreMedLine 1 06474 07859DrugBank 1 08921 09430Average 08645

43 Drug Label Dataset Performance As additional exper-iment we also use Indonesian language drug label cor-pus to evaluate the methodrsquos performance Regarding theIndonesian drug label we could not find any certain externalknowledge that can be used to assist the extraction of thedrug name contained in the drug label In the presenceof this hindrance we found our proposed method is moresuitable than any other previous approaches As the drug labeltexts are collected from various sites of drug distributorsproducers and government regulators they do does notclearly contain training data and testing data as inDrugBanksor MedLine datasets The other characteristics of these textsare the more structured sentences contained in the dataAlthough the texts are coming from various sources allof them are similar kind of document (the drug label thatmight be generated by machine) After the data cleaningstep (HTML tag removal etc) we annotated the datasetmanually The total quantity of dataset after performingthe data representation step as described in Section 34 is

14 Computational Intelligence and Neuroscience

Table 16The F-score performance compared to the state of the art

Approach F-score RemarkThe Best of SemEval 2013 [15] 07150 mdash

[11] 05700With externalknowledgeChEBI

[16] + Wiki 07200With externalknowledgeDINTO

[14] 07200 Additionalfeature BIO

[12] 06000 Single token onlyMLP-SentenceSequence + Wiki(average)Ours 06580 Without external

knowledgeDBN-SentenceSequence + Wiki(average)Ours 06430 Without external

knowledgeSAE-SentenceSequence + Wiki(average)Ours 06480 Without external

knowledgeLSTM-AllSentenceSequence + Wiki +EuclidianDistance (average)Ours 08645 Without external

knowledge

Table 17 The best performance of 10 executions on drug labelcorpus

Iteration Prec Recall F-score1 09170 09667 094122 08849 09157 090003 09134 09619 093704 09298 09500 093985 09640 09570 096056 08857 09514 091787 09489 09689 095888 09622 09654 096389 09507 09601 0955410 09516 09625 09570Average 093081 09560 09431Min 08849 09157 09000Max 09640 09689 09638

1046200 In this experiment we perform 10 times cross-validation scenario by randomly selecting 80 data for thetraining data and 20 data for testing

The experimental result for drug label dataset shows thatall of the candidate selection scenarios provide excellentF-score (above 09) The excellent F-score performance isprobably due to the more structured sentences in those textsThe best results of those ten experiments are presented inTable 17

44 Choosing the Best Scenario In the first and secondexperiments we studied various experiment scenarioswhich involve three investigated parameters additional Wikisource data representation techniques and drug target can-didate selection In general the Wiki addition contributes toimproving the F-score performance The additional source

in word2vec training enhances the quality of the resultingword2vec Through the addition of common words fromWiki the difference between the common words and theuncommon words that is drug name becomes greater(better distinguishing power)

One problem in mining drug name entity from medicaltext is the imbalanced quantity between drug token and othertokens [15] Also the targeted drug entities are only a smallpart of the total tokens Thus majority of tokens are noiseIn dealing with this problem the second and third candidateselection scenarios show their contribution to reduce thequantity of noise Since the possibility of extracting the noisesis reduced then the recall value and F-score value increase aswell as shown in the first and second experiments results

The third experiment which uses LSTM model does notapply the candidate selection scenario because the inputdataset is treated as sentence sequence So the input datasetcan not be randomly divided (selected) as the tuple treatmentin the first and second experiments

5 Conclusion and Future Works

This study proposes a new approach in the data representa-tion and classification to extract drug name entities containedin the sentences of medical text documents The suggestedapproach solves the problem of multiple tokens for a singleentity that remained unsolved in previous studies This studyalso introduces some techniques to tackle the absence ofspecific external knowledge Naturally the words containedin the sentence follow a certain sequence pattern that isthe current word is conditioned by other previous wordsBased on the sequence notion the treatment of medical textsentences which apply the sequence NN model gives betterresults In this study we presented three data representationtechniquesThe first and second techniques treat the sentenceas a nonsequence pattern which is evaluated with the non-sequential NN classifier (MLP DBN and SAE) whereas thethird technique treats the sentences as a sequence to providedata that is used as the input of the sequential NN classifierthat is LSTM The performance of the application of LSTMmodels for the sequence data representation with the averageF-score being 08645 rendered the best result compared tothe state of the art

Some opportunities to improve the performance of the proposed technique are still widely open. A first improvement can be the incorporation of additional handcrafted features, such as the word position, the use of a capital letter at the beginning of the word, and the character type, as also used in previous studies [16, 37]. As presented in the MLP experiments on the drug label documents, the proposed methods achieved excellent performance when applied to more structured text; thus, the effort to make the sentences of the DrugBank and MedLine datasets more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extractions, such as drug group, drug brand, and drug compounds. Another task that can potentially be solved with the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features, and this addition gives a good F-score performance. The significance of embedding the Euclidean distance features, however, needs to be explored further (see the sketch below).
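As an illustration of how such combined 200-dimensional inputs can be assembled, the following sketch uses a per-dimension difference to the previous word's vector as the second half of each input; this is an assumption made purely for illustration, since the text only states that the second 100 values encode the Euclidean distance to the previous token.

```python
import numpy as np

def lstm_inputs(word_vectors):
    """Build 200-dimensional inputs: the first 100 values are the word2vec
    vector, the second 100 encode the change from the previous word
    (all zeros for the first word of the sentence)."""
    inputs, prev = [], None
    for v in word_vectors:                     # each v: 100-dim numpy array
        delta = np.zeros_like(v) if prev is None else v - prev
        inputs.append(np.concatenate([v, delta]))
        prev = v
    return np.stack(inputs)
```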

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004/UN2.R12/HKP.05.00/2016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Ga, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, University of Toronto, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.



Table 16The F-score performance compared to the state of the art

Approach F-score RemarkThe Best of SemEval 2013 [15] 07150 mdash

[11] 05700With externalknowledgeChEBI

[16] + Wiki 07200With externalknowledgeDINTO

[14] 07200 Additionalfeature BIO

[12] 06000 Single token onlyMLP-SentenceSequence + Wiki(average)Ours 06580 Without external

knowledgeDBN-SentenceSequence + Wiki(average)Ours 06430 Without external

knowledgeSAE-SentenceSequence + Wiki(average)Ours 06480 Without external

knowledgeLSTM-AllSentenceSequence + Wiki +EuclidianDistance (average)Ours 08645 Without external

knowledge

Table 17 The best performance of 10 executions on drug labelcorpus

Iteration Prec Recall F-score1 09170 09667 094122 08849 09157 090003 09134 09619 093704 09298 09500 093985 09640 09570 096056 08857 09514 091787 09489 09689 095888 09622 09654 096389 09507 09601 0955410 09516 09625 09570Average 093081 09560 09431Min 08849 09157 09000Max 09640 09689 09638

1046200 In this experiment we perform 10 times cross-validation scenario by randomly selecting 80 data for thetraining data and 20 data for testing

The experimental result for drug label dataset shows thatall of the candidate selection scenarios provide excellentF-score (above 09) The excellent F-score performance isprobably due to the more structured sentences in those textsThe best results of those ten experiments are presented inTable 17

44 Choosing the Best Scenario In the first and secondexperiments we studied various experiment scenarioswhich involve three investigated parameters additional Wikisource data representation techniques and drug target can-didate selection In general the Wiki addition contributes toimproving the F-score performance The additional source

in word2vec training enhances the quality of the resultingword2vec Through the addition of common words fromWiki the difference between the common words and theuncommon words that is drug name becomes greater(better distinguishing power)

One problem in mining drug name entity from medicaltext is the imbalanced quantity between drug token and othertokens [15] Also the targeted drug entities are only a smallpart of the total tokens Thus majority of tokens are noiseIn dealing with this problem the second and third candidateselection scenarios show their contribution to reduce thequantity of noise Since the possibility of extracting the noisesis reduced then the recall value and F-score value increase aswell as shown in the first and second experiments results

The third experiment which uses LSTM model does notapply the candidate selection scenario because the inputdataset is treated as sentence sequence So the input datasetcan not be randomly divided (selected) as the tuple treatmentin the first and second experiments

5 Conclusion and Future Works

This study proposes a new approach in the data representa-tion and classification to extract drug name entities containedin the sentences of medical text documents The suggestedapproach solves the problem of multiple tokens for a singleentity that remained unsolved in previous studies This studyalso introduces some techniques to tackle the absence ofspecific external knowledge Naturally the words containedin the sentence follow a certain sequence pattern that isthe current word is conditioned by other previous wordsBased on the sequence notion the treatment of medical textsentences which apply the sequence NN model gives betterresults In this study we presented three data representationtechniquesThe first and second techniques treat the sentenceas a nonsequence pattern which is evaluated with the non-sequential NN classifier (MLP DBN and SAE) whereas thethird technique treats the sentences as a sequence to providedata that is used as the input of the sequential NN classifierthat is LSTM The performance of the application of LSTMmodels for the sequence data representation with the averageF-score being 08645 rendered the best result compared tothe state of the art

Some opportunities to improve the performance of theproposed technique are still widely opened The first stepimprovement can be the incorporation of additional hand-crafted features such as the words position the use of capitalcase at the beginning of the word and the type of characteras also used in the previous studies [16 37] As presented inthe MLP experiments for drug label document the proposedmethods achieved excellent performance when applied to themore structured text Thus the effort to make the sentenceof the dataset that is DrugBank and MedLine to be morestructured can also be elaborated Regarding the LSTMmodel and the sequence data representation for the sentencesof medical text our future study will tackle the multipleentity extractions such as drug group drug brand and drugcompounds Another task that is potential to be solved with

Computational Intelligence and Neuroscience 15

the LSTMmodel is the drug-drug interaction extraction Ourexperiments also utilize the Euclidean distance measure inaddition to the word2vec features Such addition gives a goodF-score performance The significance of embedding theEuclidean distance features however needs to be exploredfurther

Competing Interests

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work is supported by Higher Education Science andTechnology Development Grant funded by Indonesia Min-istry of Research and Higher Education Contract no1004UN2R12HKP05002016

References

[1] M Sadikin and I Wasito ldquoTranslation and classification algo-rithm of FDA-Drugs to DOEN2011 class therapy to estimatedrug-drug interactionrdquo in Proceedings of the The 2nd Interna-tional Conference on Information Systems for Business Competi-tiveness (ICISBC rsquo13) pp 1ndash5 Semarang Indonesia 2013

[2] H Tang and J Ye ldquoA survey for information extractionmethodrdquoTech Rep 2007

[3] S Zhang and N Elhadad ldquoUnsupervised biomedical namedentity recognition experiments with clinical and biologicaltextsrdquo Journal of Biomedical Informatics vol 46 no 6 pp 1088ndash1098 2013

[4] I Korkontzelos D Piliouras A W Dowsey and S AnaniadouldquoBoosting drug named entity recognition using an aggregateclassifierrdquo Artificial Intelligence in Medicine vol 65 no 2 pp145ndash153 2015

[5] M Herrero-Zazo I Segura-Bedmar P Martınez and TDeclerck ldquoThe DDI corpus an annotated corpus with phar-macological substances and drug-drug interactionsrdquo Journal ofBiomedical Informatics vol 46 no 5 pp 914ndash920 2013

[6] H Sampathkumar X-W Chen and B Luo ldquoMining adversedrug reactions from online healthcare forums using HiddenMarkovModelrdquoBMCMedical Informatics andDecisionMakingvol 14 article 91 2014

[7] I Segura-Bedmar P Martınez and M Segura-Bedmar ldquoDrugname recognition and classification in biomedical texts A casestudy outlining approaches underpinning automated systemsrdquoDrug Discovery Today vol 13 no 17-18 pp 816ndash823 2008

[8] S Keretna C P Lim D Creighton and K B ShabanldquoEnhancingmedical named entity recognitionwith an extendedsegment representation techniquerdquo Computer Methods andPrograms in Bimoedicine vol 119 no 2 pp 88ndash100 2015

[9] G Pal and S Gosal ldquoA survey of biological entity recognitionapproachesrdquo International Journal on Recent and InnovationTrends in Computing and Communication vol 3 no 9 2015

[10] S Liu B Tang Q Chen andXWang ldquoDrug name recognitionapproaches and resourcesrdquo Information vol 6 no 4 pp 790ndash810 2015

[11] T Grego and F M Couto ldquoLASIGE using conditional randomfields and ChEBI ontologyrdquo in Proceedings of the 7th Interna-tional Workshop on Semantic Evaluation (SemEval rsquo13) vol 2pp 660ndash666 2013

[12] J Bjorne S Kaewphan and T Salakoski ldquoUTurku drug namedentity recognition and drug-drug interaction extraction usingSVM classification and domain knowledgerdquo in Proceedingsof the 2nd Joint Conferernce on Lexical and ComputationalSemantic vol 2 pp 651ndash659 Atlanta Ga USA 2013

[13] T Mikolov G Corrado K Chen and J Dean ldquoEfficientestimation of word representations in vector spacerdquo httpsarxivorgabs13013781

[14] A Ben AbachaM FM Chowdhury A Karanasiou YMrabetA Lavelli and P Zweigenbaum ldquoTextmining for pharmacovig-ilance using machine learning for drug name recognition anddrug-drug interaction extraction and classificationrdquo Journal ofBiomedical Informatics vol 58 pp 122ndash132 2015

[15] I Segura-Bedmar PMartinez andMHerrero-Zazo ldquoSemeval-2013 task 9 extraction of drug-drug interactions from biomed-ical texts (ddiextraction 2013)rdquo in Proceedings of the 7thInternational Workshop on Semantic Evaluation (SemEval rsquo13)vol 2 pp 341ndash350 Association for Computational LinguisticsAtlanta Georgia USA 2013

[16] I Segura-bedmar and P Mart ldquoExploring word embeddingfor drug name recognitionrdquo in Proceedings of the The 6thInternational Workshop on Health Text Mining and InformationAnalysis pp 64ndash72 Lisbon Portugal September 2015

[17] Y Chen T A Lasko Q Mei J C Denny and H Xu ldquoA study ofactive learningmethods for named entity recognition in clinicaltextrdquo Journal of Biomedical Informatics vol 58 pp 11ndash18 2015

[18] M Sadikin and I Wasito ldquoToward object interaction miningby starting with object extraction based on pattern learningmethodrdquo in Proceedings of the Pattern Learning Method Asia-PacificMaterials Science and Information Technology Conference(APMSIT rsquo14) Shanghai China December 2014

[19] Q-C Bui P M A Sloot E M VanMulligen and J A Kors ldquoAnovel feature-based approach to extract drug-drug interactionsfrom biomedical textrdquo Bioinformatics vol 30 no 23 pp 3365ndash3371 2014

[20] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquo Naturevol 521 no 7553 pp 436ndash444 2015

[21] R B Palm ldquoPrediction as a candidate for learning deephierarchical models of datardquo 2012

[22] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[23] A Fischer and C Igel ldquoProgress in pattern recognitionimage analysis computer vision and applicationsrdquo in 17thIberoamerican Congress CIARP 2012 Buenos Aires ArgentinaSeptember 3ndash6 2012 Proceedings An Introduction to RestrictedBoltzmann Machines pp 14ndash36 Springer Berlin Germany2012

[24] T Tieleman ldquoTraining restricted Boltzmann machines usingapproximations to the likelihood gradientrdquo in Proceedings of the25th International Conference on Machine Learning (ICML rsquo08)pp 1064ndash1071 2008

[25] G E Dahl R P Adams and H Larochelle ldquoTraining re-stricted boltzmann machines on word observationsrdquo httpsarxivorgabs12025695

[26] Y Tang and I Sutskever Data Normalization in the Learningof Restricted Boltzmann Machines Department of Computer

16 Computational Intelligence and Neuroscience

Science Toronto University UTML-TR-11 Toronto Canada2011 httpwwwcstorontoedusimtangpapersRbmZMpdf

[27] G Hinton ldquoA practical guide to training restricted Boltz-mann machinesrdquo in Neural Networks Tricks of the Trade GMontavon G B Orr and K-R Muller Eds vol 7700 ofLectureNotes inComputer Science pp 599ndash619 Springer BerlinGermany 2nd edition 2012

[28] H Chen and A F Murray ldquoContinuous restricted Boltz-mannmachine with an implementable training algorithmrdquo IEEProceedingsmdashVision Image and Signal Processing vol 150 no 3pp 153ndash158 2003

[29] MWellingM Rosen-Zvi andG E Hinton ldquoExponential fam-ily harmoniums with an application to information retrievalrdquoinAdvances in Neural Information Processing Systems (NIPS) 17pp 1481ndash1488 2005

[30] A Ng Deep Learning Tutorial httpufldlstanfordedututori-alunsupervisedAutoencoders

[31] S Hochreiter and J Schmidhuber ldquoLong short-term memoryrdquoNeural Computation vol 9 no 8 pp 1735ndash1780 1997

[32] J Hammerton ldquoNamed entity recognition with long short-term memoryrdquo in Proceedings of the 7th Conference on NaturalLanguage Learning atHLT-NAACL (CONLL rsquo03) vol 4 pp 172ndash175 2003

[33] C Olah ldquoUnderstanding LSTM Networksrdquo 2015 httpcolahgithubioposts2015-08-Understanding-LSTMs

[34] F A Gers and J Schmidhuber ldquoRecurrent nets that time andcountrdquo in Proceedings of the IEEE-INNS-ENNS InternationalJoint Conference on Neural Networks (IJCNN rsquo00) vol 3 p 6Como Italy July 2000

[35] S Hochreiter ldquoLSTM Can Solve Hardrdquo in Advances in NeuralInformation Processing Systems Neural Information ProcessingSystems (NIPS) Denver Colo USA 1996

[36] S Otte D Krechel and M Liwicki ldquoJannlab neural networkframework for javardquo in Proceedings of the Poster ProceedingsConference (MLDM rsquo13) pp 39ndash46 IBAI New York NY USA2013

[37] R Boyce and G Gardner ldquoUsing natural language processingto identify pharmacokinetic drug-drug interactions describedin drug package insertsrdquo in Proceedings of the Workshop onBiomedical Natural Language Processing (BioNLP rsquo12) pp 206ndash213 BioNLP Montreal Canada 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article A New Data Representation Based on ...downloads.hindawi.com/journals/cin/2016/3483528.pdfsyntactic term or lexical term. Finally, the learning approach is usually

8 Computational Intelligence and Neuroscience

Result Labelled datasetInput array of tuple array of drugoutput array of label Array of drug contains list of drug and drug-n onlylabel[] lt= 1 Initializationfor each t in tuple do

for each d in drug doif length (d) = 1 then

if t[1] = d[1] thenmatch 1 token druglabel lt= 2 break exit from for each d in drug

elseend

elseif length (d) = 2 then

if t[1] = d[1] and t[2] = d[2] thenmatch 2 tokens druglabel lt= 3 break exit from for each d in drug

elseend

elseif length (d) = 3 then

if t[1] = d[1] and t[2] = d[2] and t[3] = d[3] thenlabel lt= 4 break exit from for each d in drug

elseend

elseif length (d) = 4 then

if t[1] = d[1] and t[2] = d[2] and t[3] = d[3] and t[4] = d[4] thenlabel lt= 5 break exit from for each d in drug

elseend

elseif length (d) = 5 then

if t[1] = d[1] and t[2] = d[2] and t[3] = d[3] and t[4] = d[4] and t[5] = d[5] thenlabel lt= 6 break exit from for each d in drug

elseend

elseend

endend

endend

endend

Algorithm 1 Dataset labelling

Table 6 A portion of the dataset formulation as the results of DrugBank sample with first technique

Dataset number Token-1 Token-2 Token-3 Token-4 Token-5 Label1 modification of surface histidine residues 12 of surface histidine residues abolishes 13 surface histidine residues abolishes the 14 histidine residues abolishes the cytotoxic 15 the cytotoxic activity of clostridium 16 clostridium difficile toxin a antimicrobial 57 difficile toxin a antimicrobial activity 1

Computational Intelligence and Neuroscience 9

Table 7 First technique of data representation and its label

Token-1 Token-2 Token-3 Token-4 Token-5 Labelldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo ldquocytochromerdquo ldquop-450rdquo 2ldquotestosteronerdquo ldquoconcentrationsrdquo ldquojustrdquo ldquopriorrdquo ldquotordquo 2ldquobeta-adrenergicrdquo ldquoantagonistsrdquo ldquoandrdquo ldquoalpha-adrenergicrdquo ldquostimulantsrdquo 3ldquocarbonicrdquo ldquoanhydraserdquo ldquoinhibitorsrdquo ldquoconcomitantrdquo ldquouserdquo 3ldquosodiumrdquo ldquopolystyrenerdquo ldquosulfonaterdquo ldquoshouldrdquo ldquoberdquo 4ldquosodiumrdquo ldquoacidrdquo ldquophosphaterdquo ldquosuchrdquo ldquoasrdquo 4ldquoclostridiumrdquo ldquodifficilerdquo ldquotoxinrdquo ldquoardquo ldquomdashrdquo 5ldquononsteroidalrdquo ldquoantirdquo ldquoinflammatoryrdquo ldquodrugsrdquo ldquoandrdquo 5ldquocaseinrdquo ldquophosphopeptide-amorphousrdquo ldquocalciumrdquo ldquophosphaterdquo ldquocomplexrdquo 6ldquostudiesrdquo ldquowithrdquo ldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo 1ldquowererdquo ldquoperformedrdquo ldquocytochromerdquo ldquop-450rdquo ldquoisrdquo 1

Table 8: Second technique of data representation and its label.

Token-1 | Token-2 | Token-3 | Token-4 | Token-5 | Label
"modification" | "of" | "surface" | "histidine" | "residues" | 1
"of" | "surface" | "histidine" | "residues" | "abolishes" | 1
"surface" | "histidine" | "residues" | "abolishes" | "the" | 1
"histidine" | "residues" | "abolishes" | "the" | "cytotoxic" | 1
"the" | "cytotoxic" | "activity" | "of" | "clostridium" | 1
"clostridium" | "difficile" | "toxin" | "a" | "*" | 5
"difficile" | "toxin" | "a" | "*" | "*" | 1
"a" | "atoxin" | "*" | "*" | "*" | 1
"toxic" | "*" | "*" | "*" | "*" | 1

mechanism, we selected x/y (x < y), that is, a part of the total tokens, after the tokens are clustered into y clusters.

3.7. Overview of NN Model

3.7.1. MLP. In the first experiment, we used a multilayer perceptron NN to train the model and evaluate the performance [20]. Given a training set of m examples, the overall cost function can be defined as

J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} J\left(W, b; x^{(i)}, y^{(i)}\right) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W_{ji}^{(l)} \right)^2

J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\left(x^{(i)}\right) - y^{(i)} \right\|^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W_{ji}^{(l)} \right)^2.    (3)

In the definition of J(W, b), the first term is an average sum-of-squares error term, whereas the second term is a regularization term, also called a weight decay term. In this experiment we use three regularization settings: (0) L0, with λ = 0; (1) L1, with λ = 1; and (2) L2, with λ set to the average Euclidean distance between word vectors. We computed L2's λ based on the word embedding vector analysis showing that drug targets and nondrug words can be distinguished by their Euclidean distance. Thus, for L2 regularization, the parameter is calculated as

\lambda = \frac{1}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{dist}(x_i, x_j),    (4)

where dist(x_i, x_j) is the Euclidean distance between x_i and x_j.

The model training and testing are implemented by modifying the code from [21], which can be downloaded at https://github.com/rasmusbergpalm/DeepLearnToolbox.
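As a concrete illustration of (4), the sketch below (ours, not the authors' MATLAB code) computes λ as the average pairwise Euclidean distance over a matrix of word embedding vectors, following the formula literally: summing dist(x_i, x_j) for i < j and dividing by n(n − 1).

```python
import numpy as np

def average_pairwise_distance(vectors):
    """Compute lambda of Eq. (4): sum Euclidean distances over all
    unordered pairs of word vectors and divide by n * (n - 1).

    vectors: (n, d) array, one row per word embedding.
    """
    vectors = np.asarray(vectors)
    n = len(vectors)
    total = 0.0
    for i in range(n - 1):
        # distances from vector i to all later vectors j > i
        diffs = vectors[i + 1:] - vectors[i]
        total += np.sqrt((diffs ** 2).sum(axis=1)).sum()
    return total / (n * (n - 1))

# Example with random 100-dimensional embeddings (placeholder data)
rng = np.random.default_rng(0)
lam = average_pairwise_distance(rng.normal(size=(500, 100)))
print(lam)
```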

3.7.2. DBN. A DBN is a learning model composed of two or more stacked RBMs [22, 23]. An RBM is an undirected graphical learning model associated with Markov Random Fields (MRF). In the DBN, the RBMs act as feature extractors whose pretraining provides initial weight values to be fine-tuned in the discriminative process in the last layer. The last layer may be formed by logistic regression or any standard discriminative classifier [23]. The RBM was originally developed for binary data observations [24, 25] and is a popular type of unsupervised model for binary data [26, 27]. Some derivatives of the RBM model have also been proposed to handle continuous/real values, as suggested in [28, 29].


Table 9: Third technique of data representation and its label.

Sent1 | Word  | "drug" | "interaction" | "studies" | "with" | "plenaxis" | "were" | "performed"
Sent1 | Class | 0 | 0 | 0 | 0 | 1 | 0 | 0
Sent2 | Word  | "cytochrome" | "p-450" | "is" | "not" | "known" | "in" | "the"
Sent2 | Class | 1 | 1 | 0 | 0 | 0 | 0 | 0
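In this third representation each token in a sentence receives a binary class (1 = part of a drug name, 0 = otherwise), as in Table 9. A minimal sketch of producing such per-token labels is given below; it is our illustration only, marking every token covered by an occurrence of any drug name given as a list of token lists.

```python
def tag_sentence(tokens, drug_tokens_list):
    """Per-token binary labels: 1 if the token is covered by an
    occurrence of any drug name (each drug given as a token list)."""
    labels = [0] * len(tokens)
    for drug in drug_tokens_list:
        n = len(drug)
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == drug:
                for k in range(start, start + n):
                    labels[k] = 1  # token belongs to a drug mention
    return labels

sent = ["drug", "interaction", "studies", "with", "plenaxis", "were", "performed"]
print(tag_sentence(sent, [["plenaxis"]]))  # [0, 0, 0, 0, 1, 0, 0], as in Table 9
```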

3.7.3. SAE. An autoencoder (AE) neural network is an unsupervised learning algorithm in which the NN tries to learn a function h(w, x) ≈ x. The autoencoder NN architecture also consists of input, hidden, and output layers; its particular characteristic is that the target output is similar to the input. The interesting structure of the data is estimated by applying a certain constraint to the network which limits the number of hidden units. When the number of hidden units has to be larger, sparsity constraints can instead be imposed on the hidden units [30]. The sparsity constraint enforces the average hidden unit activation to stay close to a certain value. As in the DBN model, after training the SAE, the trained weights were used to initialize the weights of the NN for the classification.
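The following minimal PyTorch sketch illustrates the autoencoder idea described above (reconstructing the input, h(w, x) ≈ x). The layer sizes are placeholders, and the sparsity penalty is shown only as a simple L1 term on the hidden activations, which is one common way to realize such a constraint, not necessarily the authors' exact choice.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One AE layer: encode a 500-dim input into 100 hidden units and
    reconstruct it; the training target equals the input (illustrative sizes)."""
    def __init__(self, n_in=500, n_hidden=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

model = SparseAutoencoder()
opt = torch.optim.SGD(model.parameters(), lr=1.0, momentum=0.5)
x = torch.rand(100, 500)                    # one minibatch of 100 samples in [0, 1]
recon, h = model(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * h.abs().mean()  # reconstruction + sparsity
loss.backward()
opt.step()
```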

3.7.4. RNN-LSTM. A Recurrent Neural Network (RNN) is an NN that considers previous inputs when determining the output for the current input. RNNs are powerful when applied to datasets with a sequential pattern, or when the current input depends on the previous ones, such as time series data or the sentences of NLP. An LSTM network is a special kind of RNN that also consists of 3 layers, that is, an input layer, a single recurrent hidden layer, and an output layer [31]. The main innovation of LSTM is that its hidden layer consists of one or more memory blocks, each of which includes one or more memory cells. In the standard form the inputs are connected to all of the cells and gates, whereas the cells are connected to the outputs. The gates are connected to other gates and cells in the hidden layer. The single standard LSTM is a hidden layer with input, memory cell, and output gates [32, 33].

3.8. Dataset. To validate the proposed approach, we utilized the DrugBank and MedLine open datasets, which have also been used by previous researchers. Additionally, we used drug label documents from various drug producer and regulator Internet sites located in Indonesia:

(1) http://www.kalbemed.com,

(2) http://www.dechacare.com,

(3) http://infoobatindonesia.com/obat, and

(4) http://www.pom.go.id/webreg/index.php/home/produk/01.

The drug labels are written in Bahasa Indonesia, and their common contents are drug name, drug components, indication, contraindication, dosage, and warning.

3.9. Evaluation. To evaluate the performance of the proposed method we use commonly measured parameters in data mining, that is, precision, recall, and F-score. These parameters are computed as follows. Let C = {C_1, C_2, C_3, ..., C_n} be the set of drug names extracted by this method and K = {K_1, K_2, K_3, ..., K_l} the set of actual drug names in the document set D. Adopted from [18], the parameters are computed as

\text{Precision}(K_i, C_j) = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}} = \frac{|K_i \cap C_j|}{|C_j|},

\text{Recall}(K_i, C_j) = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}} = \frac{|K_i \cap C_j|}{|K_i|},    (5)

where |K_i|, |C_j|, and |K_i ∩ C_j| denote the number of drug names in K, in C, and in both K and C, respectively. The F-score value is computed by the following formula:

\text{F-score}(K_i, C_j) = \frac{2 \cdot \text{Precision}(K_i, C_j) \cdot \text{Recall}(K_i, C_j)}{\text{Precision}(K_i, C_j) + \text{Recall}(K_i, C_j)}.    (6)
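A small Python sketch of these metrics, treating the extracted and gold drug-name collections as Python sets (the function and variable names are ours):

```python
def precision_recall_fscore(extracted, actual):
    """Compute precision, recall, and F-score of Eqs. (5)-(6)
    from a set of extracted drug names and a set of actual ones."""
    true_positive = len(extracted & actual)
    precision = true_positive / len(extracted) if extracted else 0.0
    recall = true_positive / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore

# Example with hypothetical names
C = {"plenaxis", "testosterone", "aspirin"}      # extracted
K = {"plenaxis", "testosterone", "warfarin"}     # actual
print(precision_recall_fscore(C, K))             # (0.667, 0.667, 0.667)
```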

4. Results and Discussion

4.1. MLP Learning Performance. The following experiments are part of the first experiment. They were performed to evaluate the contribution of the three regularization settings described in Section 3.7.1. By arranging the sentences in the training dataset as 5-grams of words, the quantity of generated samples is as presented in Table 10. We trained and tested the MLP-NN learning model on all of these test data compositions. The model performance on both datasets, that is, MedLine and DrugBank, in the learning phase is shown in Figures 6 and 7. The NN learning parameters used for all experiments are 500 input nodes; two hidden layers, each with 100 nodes and sigmoid activation; and 6 output nodes with a softmax function; the learning rate = 1, momentum = 0.5, and epochs = 100. We used a minibatch scenario in the training with a batch size of 100. The errors presented in Figures 6 and 7 are the errors for the full batch, that is, the mean errors over all minibatches.
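The authors trained this MLP with a modified DeepLearnToolbox; as a rough, framework-agnostic sketch of the described architecture (500 inputs, two sigmoid hidden layers of 100 nodes, 6 softmax outputs, learning rate 1, momentum 0.5, minibatches of 100), one could write the following, for example in PyTorch. Cross-entropy is used here for brevity instead of the paper's sum-of-squares cost, and the weight decay argument stands in for the λ regularization; this is an illustration under those assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(500, 100), nn.Sigmoid(),   # hidden layer 1
    nn.Linear(100, 100), nn.Sigmoid(),   # hidden layer 2
    nn.Linear(100, 6),                   # 6 output classes (labels 1-6)
)
optimizer = torch.optim.SGD(mlp.parameters(), lr=1.0, momentum=0.5,
                            weight_decay=0.0)   # set to lambda for the L2 setting
criterion = nn.CrossEntropyLoss()               # softmax + log-loss in one step

x = torch.rand(100, 500)                 # one minibatch of 5-gram feature vectors
y = torch.randint(0, 6, (100,))          # placeholder labels
optimizer.zero_grad()
loss = criterion(mlp(x), y)
loss.backward()
optimizer.step()
```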


Table 10: Dataset composition.

Dataset  | Train  | Test (all) | Test (2/3 part) | Test (cluster)
MedLine  | 26500  | 10360      | 6673            | 5783
DrugBank | 100100 | 2000       | 1326            | 1933

Figure 6: Full batch training error on the MedLine dataset (training error versus iteration over 100 epochs for the L0, L1, and L2 regularization settings).

The learning model performance shows different patterns between the MedLine and DrugBank datasets. For both datasets, L1 regularization tends to stabilize at lower iterations, and its training error is always less than that of L0 or L2. The L0 and L2 training error patterns, however, show slightly different behavior between MedLine and DrugBank: for the MedLine dataset, L0 and L2 produce different results in some of the iterations, whereas for DrugBank the training errors of L0 and L2 are almost the same in every iteration. The different patterns are probably due to the variation in the quantity of training data; as shown in Table 10, the volume of DrugBank training data is almost four times that of the MedLine dataset. It can be concluded that, for the larger dataset, the contribution of the L2 regularization setting is not significant in achieving better performance; for the smaller dataset (MedLine), however, the performance is better even after only a few iterations.

4.2. Open Dataset Performance. In Tables 11, 12, 13, and 14, the numbering (1), (2), and (3) in the leftmost column indicates the candidate selection technique, with

(i) (1) all data tests being selected,

(ii) (2) 2/3 part of the data test being selected,

(iii) (3) 2/3 part of 3 clusters for MedLine or 3/4 part of 4 clusters for DrugBank.

4.2.1. MLP-NN Performance. In this first experiment, with two data representation techniques and three candidate selection scenarios, we have six experiment scenarios. The result

Figure 7: Full batch training error on the DrugBank dataset (training error versus iteration over 100 epochs for the L0, L1, and L2 regularization settings).

of the experiment that applies the first data representation technique and the three candidate selection scenarios is presented in Table 11. In computing the F-score, we only select the predicted target provided by the lowest (minimum) error. For the MedLine dataset, the best performance is shown by the L2 regularization setting, where the error is 0.0418 in the third candidate selection scenario with an F-score of 0.4395, whereas for DrugBank it is achieved jointly by the L0 and L1 regularization settings with a test error of 0.0802 in the second candidate selection scenario; the F-score was 0.6417. Overall, it can be concluded that the DrugBank experiments give the best F-score performance. The candidate selection scenarios also contributed to improving the performance, as we found that for MedLine and DrugBank the best achievement is provided by the second and third scenarios (the third for MedLine and the second for DrugBank).

The next experimental scenario in the first experiment evaluates the impact of the data representation technique and of adding the Wiki source to the word2vec training. The results are presented in Tables 12 and 13. According to the results in Table 11, the L0 regularization gives the best F-score; hence we only used L0 regularization for the next experimental scenario. Table 12 presents the impact of the data representation technique. Looking at the F-score, the second technique gives better results for both datasets, that is, MedLine and DrugBank.

Table 13 shows the result of adding the Wiki source to the word2vec training used to provide the word vector representation. These results confirm that the addition of training data improves the performance. This might be due to the fact that most of the targeted tokens, such as drug names, are uncommon words, whereas the words used in Wiki sentences are commonly used words. Hence, the addition of commonly used words makes the difference between drug tokens and nondrug tokens (the commonly used tokens) greater. For the MLP-NN experimental results, the 4th scenario, that is, the second data representation with 2/3-partition data selection on the DrugBank dataset, provides the best performance with an F-score of 0.6846.


Table 11: The F-score performances of the three scenarios of experiments.

MedLine  | Prec   | Rec    | F-score | Lx     | Error test
(1)      | 0.3564 | 0.5450 | 0.4310  | L0     | 0.0305
(2)      | 0.3806 | 0.5023 | 0.4331  | L1, L2 | 0.0432
(3)      | 0.3773 | 0.5266 | 0.4395  | L2     | 0.0418
DrugBank | Prec   | Rec    | F-score | Lx     | Error test
(1)      | 0.6312 | 0.5372 | 0.5805  | L0     | 0.0790
(2)      | 0.6438 | 0.6398 | 0.6417  | L0, L2 | 0.0802
(3)      | 0.6305 | 0.5380 | 0.5806  | L0     | 0.0776

Table 12: The F-score performance as an impact of the data representation technique (Prec / Rec / F-score).

MedLine  | (1) One seq of all sentences | (2) One seq of each sentence
(1)      | 0.3564 / 0.5450 / 0.4310     | 0.6515 / 0.6220 / 0.6364
(2)      | 0.3806 / 0.5023 / 0.4331     | 0.6119 / 0.7377 / 0.6689
(3)      | 0.3772 / 0.5266 / 0.4395     | 0.6143 / 0.6569 / 0.6348
DrugBank | (1) One seq of all sentences | (2) One seq of each sentence
(1)      | 0.6438 / 0.5337 / 0.5836     | 0.7143 / 0.4962 / 0.5856
(2)      | 0.6438 / 0.6398 / 0.6418     | 0.7182 / 0.5804 / 0.6420
(3)      | 0.6306 / 0.5380 / 0.5807     | 0.5974 / 0.5476 / 0.5714

4.2.2. DBN and SAE Performance. In the second experiment, which involves the DBN and SAE learning models, we only use the experiment scenario that gave the best results in the first experiment, that is, the second data representation technique with Wiki text as an additional source in the word2vec training step.

In the DBN experiment we use two stacked RBMs with 500 visible nodes and 100 hidden nodes for both the first and the second RBM. The learning parameters used are momentum = 0 and alpha = 1. We used a minibatch scenario in the training with a batch size of 100. As an RBM constraint, the range of input data values is restricted to [0, 1], since the original RBM was developed for binary data, whereas the range of word vector values is [−1, 1]. We therefore normalize the data values to the [0, 1] range before performing the RBM training. In the last layer of the DBN we use one MLP layer with 100 hidden nodes and 6 output nodes with a softmax output function as the classifier.
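The [−1, 1] to [0, 1] rescaling mentioned above can be done with a simple affine map. The exact normalization used is not specified in the text, so the one-liner below is only one straightforward possibility:

```python
import numpy as np

def to_unit_interval(word_vectors):
    """Map word2vec values from [-1, 1] to [0, 1] before RBM training
    (affine rescaling; an assumption, not necessarily the authors' choice)."""
    return (np.asarray(word_vectors) + 1.0) / 2.0

print(to_unit_interval([-1.0, 0.0, 0.5, 1.0]))  # [0.   0.5  0.75 1.  ]
```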

The SAE architecture used is two stacked AEs with the following node configuration. The first AE has 500 visible units, 100 hidden units, and 500 output units. The second AE has 100 visible units, 100 hidden units, and 100 output units. The learning parameters for the first and second AE, respectively, are activation function = sigmoid and tanh; learning rate = 1 and 2; momentum = 0.5 and 0.5; sparsity target = 0.05 and 0.05. A batch size of 100 is set for both AEs. In the SAE experiment we use the same discriminative layer as for the DBN, that is, one MLP layer with 100 hidden nodes and 6 output nodes with a softmax activation function.

The experimental results are presented in Table 14. There is a difference in performance between the MedLine and the DrugBank datasets when feeding them into the MLP, DBN, and SAE models. The best results for the MedLine dataset are obtained with the SAE; for DrugBank, the MLP gives the best results. The DBN gives lower average performance on both datasets, probably due to the normalization of the word vector values to [0, 1], whereas their original value range is in fact [−1, 1]. The best performance over experiments 1 and 2 is given by the SAE with the second candidate selection scenario described in Section 3.6; its F-score is 0.6862.

Figure 8: The global LSTM network (a single LSTM cell per time step; inputs x_0, ..., x_n feed two stacked hidden layers h1, h2 that produce outputs y_1, ..., y_n).

4.2.3. LSTM Performance. The global LSTM network used is presented in Figure 8. Each single LSTM block consists of two stacked hidden layers and one input node, with the input dimension being 200, as described in Section 3.4.3. All hidden layers are fully connected. We used a sigmoid output activation function, which is the most suitable for binary classification. We implemented the peephole-connection LSTM variant, in which the gate layers look at the cell state [34]. In addition to implementing the peephole connections,


Table 13: The F-score performances as an impact of the Wiki addition to the word2vec training data (Prec / Rec / F-score).

MedLine  | (1) One seq of all sentences | (2) One seq of each sentence
(1)      | 0.5661 / 0.4582 / 0.5065     | 0.6140 / 0.6495 / 0.6336
(2)      | 0.5661 / 0.4946 / 0.5279     | 0.5972 / 0.7454 / 0.6631
(3)      | 0.5714 / 0.4462 / 0.5011     | 0.6193 / 0.6927 / 0.6540
DrugBank | (1) One seq of all sentences | (2) One seq of each sentence
(1)      | 0.6778 / 0.5460 / 0.6047     | 0.6973 / 0.6107 / 0.6511
(2)      | 0.6776 / 0.6124 / 0.6433     | 0.6961 / 0.6736 / 0.6846
(3)      | 0.7173 / 0.5574 / 0.6273     | 0.6976 / 0.6193 / 0.6561

Table 14: Experimental results of the three NN models (Prec / Rec / F-score).

MedLine  | MLP                      | DBN                      | SAE
(1)      | 0.6515 / 0.6220 / 0.6364 | 0.5464 / 0.6866 / 0.6085 | 0.6728 / 0.6214 / 0.6461
(2)      | 0.5972 / 0.7454 / 0.6631 | 0.6119 / 0.7377 / 0.6689 | 0.6504 / 0.7261 / 0.6862
(3)      | 0.6193 / 0.6927 / 0.6540 | 0.6139 / 0.6575 / 0.6350 | 0.6738 / 0.6518 / 0.6626
Average  | 0.6227 / 0.6867 / 0.6512 | 0.5907 / 0.6939 / 0.6375 | 0.6657 / 0.6665 / 0.6650
DrugBank | MLP                      | DBN                      | SAE
(1)      | 0.6973 / 0.6107 / 0.6512 | 0.6952 / 0.5847 / 0.6352 | 0.6081 / 0.6036 / 0.6059
(2)      | 0.6961 / 0.6736 / 0.6847 | 0.6937 / 0.6479 / 0.6700 | 0.6836 / 0.6768 / 0.6802
(3)      | 0.6976 / 0.6193 / 0.6561 | 0.6968 / 0.5929 / 0.6406 | 0.6033 / 0.6050 / 0.6042
Average  | 0.6970 / 0.6345 / 0.6640 | 0.6952 / 0.6085 / 0.6486 | 0.6317 / 0.6285 / 0.6301

we also use coupled forget and input gates. The detailed single-LSTM architecture and the formula for each gate can be found in [33].

The LSTM experiments were run with several different parameter settings; the results presented in this section are the best among all our experiments. Each piece of input data consists of two components: its word vector value and its Euclidean distance to the previous input. In treating both input components we adapt the Adding Problem experiment presented in [35]. We use the JANNLab tools [36] with some modifications to the input part to conform with our data settings.

The best performance is obtained with an LSTM block architecture of a one-node input layer, a two-node hidden layer, and a one-node output layer. The parameters used are learning rate = 0.001, momentum = 0.9, epochs = 30, and input dimension = 200, with the time sequence frame set to 2. The complete treatment of the drug sentence as a sequence, both in representation and in recognition, to extract the drug name entities is the best technique, as shown by the F-score performance in Table 15.
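To make the two-component input concrete, the sketch below assembles, for each token, its word2vec vector together with the Euclidean distance to the previous token's vector, in the spirit of the Adding-Problem-style input described above. The dimensions and the exact way the two components are combined here are our illustrative assumptions, not the paper's specification.

```python
import numpy as np

def build_sequence(word_vectors):
    """For each token t, concatenate its embedding with the Euclidean
    distance to token t-1 (0.0 for the first token).

    word_vectors: (T, d) array for one sentence; returns (T, d + 1)."""
    word_vectors = np.asarray(word_vectors)
    dists = np.zeros(len(word_vectors))
    dists[1:] = np.linalg.norm(word_vectors[1:] - word_vectors[:-1], axis=1)
    return np.hstack([word_vectors, dists[:, None]])

# Example: a 7-token sentence with 199-dim embeddings -> 7 x 200 inputs
seq = build_sequence(np.random.rand(7, 199))
print(seq.shape)  # (7, 200)
```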

As described in the previous work section, many approaches to drug extraction have been proposed, and most of them utilize certain external knowledge to achieve the extraction objective. Table 16 summarizes their F-score performance. Among the state-of-the-art techniques, our third data representation technique applied to the LSTM model performs best. Moreover, our proposed method does not require any external knowledge.

Table 15: The F-score performance of the third data representation technique with RNN-LSTM.

         | Prec | Rec    | F-score
MedLine  | 1    | 0.6474 | 0.7859
DrugBank | 1    | 0.8921 | 0.9430
Average  |      |        | 0.8645

4.3. Drug Label Dataset Performance. As an additional experiment, we also use an Indonesian-language drug label corpus to evaluate the method's performance. For the Indonesian drug labels we could not find any external knowledge that could be used to assist the extraction of the drug names they contain; in the presence of this hindrance, we found our proposed method more suitable than previous approaches. As the drug label texts are collected from various sites of drug distributors, producers, and government regulators, they do not come with predefined training and testing data as the DrugBank or MedLine datasets do. Another characteristic of these texts is that their sentences are more structured. Although the texts come from various sources, all of them are a similar kind of document (drug labels that might be generated by machine). After the data cleaning step (HTML tag removal, etc.), we annotated the dataset manually. The total quantity of the dataset after performing the data representation step described in Section 3.4 is


Table 16: The F-score performance compared to the state of the art.

Approach | F-score | Remark
The best of SemEval 2013 [15] | 0.7150 | —
[11] | 0.5700 | With external knowledge: ChEBI
[16] + Wiki | 0.7200 | With external knowledge: DINTO
[14] | 0.7200 | Additional feature: BIO
[12] | 0.6000 | Single token only
Ours: MLP, sentence sequence + Wiki (average) | 0.6580 | Without external knowledge
Ours: DBN, sentence sequence + Wiki (average) | 0.6430 | Without external knowledge
Ours: SAE, sentence sequence + Wiki (average) | 0.6480 | Without external knowledge
Ours: LSTM, all sentence sequence + Wiki + Euclidean distance (average) | 0.8645 | Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration | Prec   | Recall | F-score
1         | 0.9170 | 0.9667 | 0.9412
2         | 0.8849 | 0.9157 | 0.9000
3         | 0.9134 | 0.9619 | 0.9370
4         | 0.9298 | 0.9500 | 0.9398
5         | 0.9640 | 0.9570 | 0.9605
6         | 0.8857 | 0.9514 | 0.9178
7         | 0.9489 | 0.9689 | 0.9588
8         | 0.9622 | 0.9654 | 0.9638
9         | 0.9507 | 0.9601 | 0.9554
10        | 0.9516 | 0.9625 | 0.9570
Average   | 0.9308 | 0.9560 | 0.9431
Min       | 0.8849 | 0.9157 | 0.9000
Max       | 0.9640 | 0.9689 | 0.9638

1,046,200. In this experiment we perform a 10-times cross-validation scenario by randomly selecting 80% of the data for training and 20% for testing.
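The repeated random 80/20 split can be sketched as follows (the split function, seed handling, and names are our assumptions, not the paper's code):

```python
import numpy as np

def random_splits(n_samples, n_runs=10, train_frac=0.8, seed=0):
    """Yield (train_idx, test_idx) index arrays for repeated random splits."""
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * n_samples)
    for _ in range(n_runs):
        perm = rng.permutation(n_samples)
        yield perm[:n_train], perm[n_train:]

for run, (train_idx, test_idx) in enumerate(random_splits(1000), start=1):
    # train and evaluate the model on this split, then record P/R/F per run
    print(run, len(train_idx), len(test_idx))
```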

The experimental results on the drug label dataset show that all of the candidate selection scenarios provide an excellent F-score (above 0.9). The excellent F-score performance is probably due to the more structured sentences in those texts. The best results of the ten runs are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments we studied various experiment scenarios involving three investigated parameters: the additional Wiki source, the data representation technique, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in the word2vec training enhances the quality of the resulting word vectors: through the addition of common words from Wiki, the difference between common words and uncommon words, that is, drug names, becomes greater (better distinguishing power).

One problem in mining drug name entities from medical text is the imbalanced quantity of drug tokens versus other tokens [15]; the targeted drug entities are only a small part of the total tokens, so the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios help reduce the quantity of noise. Since the possibility of extracting noise is reduced, the recall and F-score values increase as well, as shown in the results of the first and second experiments.

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence; it therefore cannot be randomly divided (selected) in the way the tuples are treated in the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach to data representation and classification to extract drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity that remained unsolved in previous studies. This study also introduces techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern, that is, the current word is conditioned by the previous words. Based on this sequence notion, treating the medical text sentences with a sequence NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequence pattern, evaluated with nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentence as a sequence that provides the input of a sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique are still wide open. A first improvement could be the incorporation of additional handcrafted features, such as word position, the use of a capital letter at the beginning of a word, and the character type, as used in previous studies [16, 37]. As shown in the MLP experiments on the drug label documents, the proposed method achieved excellent performance when applied to more structured text; thus, an effort to make the sentences of the datasets, that is, DrugBank and MedLine, more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extraction, such as drug group, drug brand, and drug compounds. Another task that can potentially be solved with


the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features; this addition gives a good F-score performance. The significance of embedding the Euclidean distance feature, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by a Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004UN2R12HKP05002016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Georgia, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, Toronto University, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.


Page 9: Research Article A New Data Representation Based on ...downloads.hindawi.com/journals/cin/2016/3483528.pdfsyntactic term or lexical term. Finally, the learning approach is usually

Computational Intelligence and Neuroscience 9

Table 7 First technique of data representation and its label

Token-1 Token-2 Token-3 Token-4 Token-5 Labelldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo ldquocytochromerdquo ldquop-450rdquo 2ldquotestosteronerdquo ldquoconcentrationsrdquo ldquojustrdquo ldquopriorrdquo ldquotordquo 2ldquobeta-adrenergicrdquo ldquoantagonistsrdquo ldquoandrdquo ldquoalpha-adrenergicrdquo ldquostimulantsrdquo 3ldquocarbonicrdquo ldquoanhydraserdquo ldquoinhibitorsrdquo ldquoconcomitantrdquo ldquouserdquo 3ldquosodiumrdquo ldquopolystyrenerdquo ldquosulfonaterdquo ldquoshouldrdquo ldquoberdquo 4ldquosodiumrdquo ldquoacidrdquo ldquophosphaterdquo ldquosuchrdquo ldquoasrdquo 4ldquoclostridiumrdquo ldquodifficilerdquo ldquotoxinrdquo ldquoardquo ldquomdashrdquo 5ldquononsteroidalrdquo ldquoantirdquo ldquoinflammatoryrdquo ldquodrugsrdquo ldquoandrdquo 5ldquocaseinrdquo ldquophosphopeptide-amorphousrdquo ldquocalciumrdquo ldquophosphaterdquo ldquocomplexrdquo 6ldquostudiesrdquo ldquowithrdquo ldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo 1ldquowererdquo ldquoperformedrdquo ldquocytochromerdquo ldquop-450rdquo ldquoisrdquo 1

Table 8 Second technique of data representation and its label

Token-1 Token-2 Token-3 Token-4 Token-5 Labelldquomodificationrdquo ldquoofrdquo ldquosurfacerdquo ldquohistidinerdquo ldquoresiduesrdquo 1ldquoofrdquo ldquosurfacerdquo ldquohistidinerdquo ldquoresiduesrdquo ldquoabolishesrdquo 1surface histidine residues abolishes the 1ldquohistidinerdquo ldquoresiduesrdquo ldquoabolishesrdquo ldquotherdquo ldquocytotoxicrdquo 1ldquotherdquo ldquocytotoxicrdquo ldquoactivityrdquo ldquoofrdquo ldquoclostridiumrdquo 1ldquoclostridiumrdquo ldquodifficilerdquo ldquotoxinrdquo ldquoardquo ldquolowastrdquo 5ldquodifficilerdquo ldquotoxinrdquo ldquoardquo ldquolowastrdquo ldquolowastrdquo 1ldquoardquo ldquoatoxinrdquo ldquolowastrdquo ldquolowastrdquo ldquolowastrdquo 1ldquotoxicrdquo ldquolowastrdquo ldquolowastrdquo ldquolowastrdquo ldquolowastrdquo 1

mechanism we selected 119909119910 (119909 lt 119910) which is a part of totaltoken after the tokens are clustered into 119910 clusters

37 Overview of NN Model

371 MLP In the first experiment we used multilayer per-ceptron NN to train the model and evaluate the performance[20] Given a training set of119898 examples then the overall costfunction can be defined as

119869 (119882 119887) = [ 1119898119898sum119894=1

119869 (119882 119887 119909119894 119910119894)]

+ 1205822119899119897minus1sum119897=1

119904119897sum119894=1

119904119897minus1sum119895=1

(119882119895(119897)119894 )2

119869 (119882 119887) = [ 1119898119898sum119894=1

(12 10038171003817100381710038171003817ℎ119908119887 (119909(i)) minus 119910119894100381710038171003817100381710038172)]

+ 1205822119899119897minus1sum119897=1

119904119897sum119894=1

119904119897minus1sum119895=1

(119882119895(119897)119894 )2

(3)

In the definition of 119869(119882 119887) the first term is an averagesum-of-squares error term whereas the second term is aregularization term which is also called a weight decay termIn this experiment we use three kinds of regularization 0

L0 with 120582 = 0 1 L1 with 120582 = 1 and 2 with 120582 =the average of Euclidean distance We computed the L2rsquos120582 based on the word embedding vector analysis that drugtarget and nondrug can be distinguished by looking at theirEuclidean distanceThus for L2 regularization the parameteris calculated as

120582 = 1119899 lowast (119899 minus 1)119899minus1sum119894=1

119899sum119895=119894+1

dist (119909119894 119909119895) (4)

where dist(119909119894 119909119895) is the Euclidean distance of 119909119894 and 119909119895The model training and testing are implemented by

modifying the code from [21] which can be downloaded athttpsgithubcomrasmusbergpalmDeepLearnToolbox

372 DBN DBN is a learning model composed of two ormore stacked RBMs [22 23] An RBM is an undirectedgraph learningmodel which associates withMarkov RandomFields (MRF) In the DBN the RBM acts as feature extractorwhere the pretraining process provides initial weights valuesto be fine-tuned in the discriminative process in the lastlayer The last layer may be formed by logistic regressionor any standard discriminative classifiers [23] RBM wasoriginally developed for binary data observation [24 25] It isa popular type of unsupervisedmodel for binary data [26 27]Some derivatives of RBM models are also proposed to tacklecontinuousreal values suggested in [28 29]

10 Computational Intelligence and Neuroscience

Table 9 Third technique of data representation and its label

Sent1 Class 0 0 0 0 1 0 0Word ldquodrugrdquo ldquointeractionrdquo ldquostudiesrdquo ldquowithrdquo ldquoplenaxisrdquo ldquowererdquo ldquoperformedrdquo

Sent2 Class 1 1 0 0 0 0 0Word ldquocytochromerdquo ldquop-450rdquo ldquoisrdquo ldquonotrdquo ldquoknownrdquo ldquoinrdquo ldquotherdquo

373 SAE An autoencoder (AE) neural network is one ofthe unsupervised learning algorithmsTheNN tries to learn afunction ℎ(119908 119909) asymp 119909 The autoencoder NN architecture alsoconsists of input hidden and output layers The particularcharacteristic of the autoencoder is that the target output issimilar to the input The interesting structure of the data isestimated by applying a certain constraint to the networkwhich limits the number of hidden units However whenthe number of hidden units has to be larger it can beimposed with sparsity constraints on the hidden units [30]The sparsity constraint is used to enforce the average valueof hidden unit activation constrained to a certain valueAs used in the DBN model after we trained the SAE thetrained weight was used to initialize the weight of NN for theclassification

374 RNN-LSTM RNN (Recurrent Neural Network) is anNN which considers the previous input in determining theoutput of the current input RNN is powerful when it isapplied to the dataset with a sequential pattern or when thecurrent state input depends on the previous one such as thetime series data sentences of NLP An LSTM network is aspecial kind of RNN which also consists of 3 layers that isan input layer a single recurrent hidden layer and an outputlayer [31] The main innovation of LSTM is that its hiddenlayer consists of one or more memory blocks Each blockincludes one or morememory cells In the standard form theinputs are connected to all of the cells and gates whereas thecells are connected to the outputs The gates are connected toother gates and cells in the hidden layer The single standardLSTM is a hidden layer with input memory cell and outputgates [32 33]

38 Dataset To validate the proposed approach we utilizedDrugBank and MedLine open dataset which have also beenused by previous researchers Additionally we used druglabel documents from various drug producers and regulatorInternet sites located in Indonesia

(1) httpwwwkalbemedcom

(2) httpwwwdechacarecom

(3) httpinfoobatindonesiacomobat and

(4) httpwwwpomgoidwebregindexphphomeproduk01

The drug labels are written in Bahasa Indonesia and theircommon contents are drug name drug components indica-tion contraindication dosage and warning

39 Evaluation To evaluate the performance of the pro-posed method we use common measured parameters indata mining that is precision recall and F-score Thecomputation formula of these parameters is as follows Let119862 = 1198621 1198622 1198623 119862119899 be a set of the extracted drug nameof this method and 119870 = 1198701 1198702 1198703 119870119897 is set of actualdrug names in the document set 119863 Adopted from [18] theparameter computations formula is

Precision (119870119894 119862119895) = (TruePositive)(TruePositive + FalsePositive)= (10038171003817100381710038171003817119870119894 cap 11986211989510038171003817100381710038171003817)(1003817100381710038171003817100381711986211989510038171003817100381710038171003817)

Recall (119870119894 119862119895) = (TruePositive)(TruePositive + FalseNegative)= (10038171003817100381710038171003817119870119894 cap 11986211989510038171003817100381710038171003817)(10038171003817100381710038171198701198941003817100381710038171003817)

(5)

where 119870119894 119862119895 and 119870119894 cap 119862119895 denote the number of drugnames in 119870 in 119862 and in both 119870 and 119862 respectively The F-score value is computed by the following formula

119865-score (119870119894 119862119895)= (2 lowast Precision (119870119894 119862119895) lowast Recall (119870119894 119862119895))(TruePositive + FalsePositive) (6)

4 Results and Discussion

41 MLP Learning Performance The following experimentsare the part of the first experiment These experiments areperformed to evaluate the contribution of the three regular-ization settings as described in Section 371 By arranging thesentence in training dataset as 5-gram of words the quantityof generated sample is presented in Table 10 We do trainingand testing of the MLP-NN learning model for all those testdata compositionsThe result ofmodel performances on bothdatasets that is MedLine and DrugBank in learning phase isshown in Figures 6 and 7 The NN learning parameters thatare used for all experiments are 500 input nodes two hiddenlayerswhere each layer has 100 nodeswith sigmoid activationand 6 output nodes with softmax function the learning rate= 1 momentum = 05 and epochs = 100 We used minibatchscenario in the training with the batch size being 100 Thepresented errors in Figures 6 and 7 are the errors for full batchthat is the mean errors of all minibatches

Computational Intelligence and Neuroscience 11

Table 10 Dataset composition

Dataset Train TestAll 23 part Cluster

MedLine 26500 10360 6673 5783DrugBank 100100 2000 1326 1933

Iteration of 100 epochs0 10 20 30 40 50 60 70 80 90 100

Erro

r

0

005

01

015

02

025

03

L1L0L2

Figure 6 Full batch training error of MedLine dataset

The learningmodel performance shows different patternsbetweenMedLine and DrugBank datasets For both datasetsL1 regularization tends to stabilize in the lower iterationand its training error performance is always less than L0or L2 The L0 and L2 training error performance patternhowever shows a slight different behavior between MedLineand DrugBank For the MedLine dataset L0 and L2 producedifferent results for some of the iterations Neverthelessthe training error performance of L0 and L2 for DrugBankis almost the same in every iteration Different patternresults are probably due to the variation in the quantityof training data As illustrated in Table 10 the volume ofDrugBank training data is almost four times the volume ofthe MedLine dataset It can be concluded that for largerdataset the contribution of L2 regularization setting is nottoo significant in achieving better performance For smallerdataset (MedLine) however the performance is better evenafter only few iterations

42 Open Dataset Performance In Tables 11 12 13 and 14numbering (1) numbering (2) and numbering (3) in themost left column indicate the candidate selection techniquewith

(i) (1) all data tests being selected

(ii) (2) 23 part of data test being selected

(iii) (3) 23 part of 3 clusters for MedLine or 34 part of 4clusters for DrugBank

421 MLP-NN Performance In this first experiment for twodata representation techniques and three candidate selectionscenarios we have six experiment scenarios The result

Iteration of 100 epochs0 10 20 30 40 50 60 70 80 90 100

Erro

r

0

005

01

015

02

025

03

L1L0L2

Figure 7 Full batch training error of DrugBank dataset

of the experiment which applies the first data representa-tion technique and three candidate selection scenarios ispresented in Table 11 In computing the F-score we onlyselect the predicted target which is provided by the lowesterror (the minimum one) For MedLine dataset the bestperformance is shown by L2 regularization setting where theerror is 0041818 in third candidate selection scenario with F-score 0439516 whereas the DrugBank is achieved togetherby L0 and L1 regularization setting with an error test of00802 in second candidate selection scenario the F-scorewas 0641745 Overall it can be concluded that DrugBankexperiments give the best F-score performance The candi-date selection scenarios also contributed to improving theperformance as we found that for both of MedLine andDrugBank the best achievement is provided by the secondand third scenarios respectively

The next experimental scenario in the first experiment isperformed to evaluate the impact of the data representationtechnique and the addition of Wiki source in word2vectraining The results are presented in Tables 12 and 13According to the obtained results presented in Table 11 theL0 regularization gives the best F-score Hence accordinglywe only used the L0 regularization for the next experimentalscenario Table 12 presents the impact of the data representa-tion technique Looking at the F-score the second techniquegives better results for both datasets that is the MedLine andDrugBank

Table 13 shows the result of adding the Wiki sourceinto word2vec training in providing the vector of wordrepresentation These results confirm that the addition oftraining datawill improve the performance Itmight be due tothe fact thatmost of the targeted tokens such as drug name areuncommon words whereas the words that are used in Wikirsquossentence are commonly used words Hence the addition ofcommonly used words will make the difference between drugtoken and the nondrug token (the commonly used token)become greater For the MLP-NN experimental results the4th scenario that is the second data representation with 23partition data selection in DrugBank dataset provides thebest performance with 0684646757 in F-score

12 Computational Intelligence and Neuroscience

Table 11 The F-score performances of three of scenarios experiments

MedLine Prec Rec F-score Lx Error test(1) 03564 05450 04310 L0 00305(2) 03806 05023 04331 L1 L2 00432(3) 03773 05266 04395 L2 00418DrugBank Prec Rec F-score Lx Error test(1) 06312 05372 05805 L0 007900(2) 06438 06398 06417 L0 L2 00802(3) 06305 05380 05806 L0 00776

Table 12 The F-score performance as an impact of data representation technique

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 03564 05450 04310 06515 06220 06364(2) 03806 05023 04331 06119 07377 06689(3) 03772 05266 04395 06143 0656873 06348DrugBank Prec Rec F-score Prec Rec F-score(1) 06438 05337 05836 07143 04962 05856(2) 06438 06398 06418 07182 05804 06420(3) 06306 05380 05807 05974 05476 05714

422 DBN and SAE Performance In the second experimentwhich involves DBN and SAE learning model we only usethe experiment scenario that gives the best results in the firstexperiment The best experiment scenario uses the seconddata representation technique withWiki text as an additionalsource in the word2vec training step

In the DBN experiment we use two stacked RBMs with500 nodes of visible unit and 100 nodes of the hidden layerfor the first and also the second RBMs The used learningparameters are as follows momentum = 0 and alpha = 1 Weusedminibatch scenario in the training with the batch size of100 As for RBM constraints the range of input data value isrestricted to [0 sdot sdot sdot 1] as the original RBM which is developedfor binary data type whereas the range of vector ofword valueis [minus1 sdot sdot sdot 1] Sowe normalize the data value into [0 sdot sdot sdot 1] rangebefore performing the RBM training In the last layer of DBNwe use one layer of MLP with 100 hidden nodes and 6 outputnodes with softmax output function as classifier

The used SAE architecture is two stacked AEs with thefollowing nodes configuration The first AE has 500 unitsof visible unit 100 hidden layers and 500 output layersThe second AE has 100 nodes of visible unit 100 nodes ofhidden unit and 100 nodes of output unit The used learningparameters for first SAE and the second SAE respectively areas follows activation function = sigmoid and tanh learningrate = 1 and 2 momentum = 05 and 05 sparsity target =005 and 005 The batch size of 100 is set for both of AEsIn the SAE experiment we use the same discriminative layeras DBN that is one layer MLP with 100 hidden nodes and 6output nodes with softmax activation function

The experiments results are presented in Table 14 Thereis a difference in performances when using the MedLine andthe DrugBank datasets when feeding them into MLP DBN

A single LSTM cell

ℎ20

ℎ10

ℎ21

ℎ11

ℎ22

ℎ12

ℎ23

ℎ13

ℎ2n

ℎ1n

x0 x1 x2 x3 xn

y1 y2 y3 ynmiddot middot middot

middot middot middot

middot middot middot

middot middot middot

Figure 8 The global LSTM network

and SAE models The best results for the MedLine datasetare obtained when using the SAE For the DrugBank theMLP gives the best results The DBN gives lower averageperformance for both datasets The lower performance isprobably due to the normalization on theword vector value to[0 sdot sdot sdot 1] whereas their original value range is in fact between[minus1 sdot sdot sdot 1] The best performance for all experiments 1 and 2 isgiven by SAE with the second scenario of candidate selectionas described in Section 36 Its F-score is 0686192469

423 LSTM Performance The global LSTM network usedis presented in Figure 8 Each single LSTM block consistsof two stacked hidden layers and one input node with eachinput dimension being 200 as described in Section 343 Allhidden layers are fully connected We used sigmoid as anoutput activation function which is the most suitable forbinary classification We implemented a peepholes connec-tion LSTM variant where its gate layers look at the cell state[34] In addition to implementing the peepholes connection

Computational Intelligence and Neuroscience 13

Table 13 The F-score performances as an impact of the Wiki addition of word2vec training data

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 05661 04582 05065 0614 06495 06336(2) 05661 04946 05279 05972 07454 06631(3) 05714 04462 05011 06193 06927 06540DrugBank Prec Rec F-score Prec Rec F-score(1) 06778 05460 06047 06973 06107 06511(2) 06776 06124 06433 06961 06736 06846(3) 07173 05574 06273 06976 06193 06561

Table 14 Experimental results of three NN models

Dataset MLP DBN SAEMedLine Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06515 06220 06364 05464 06866 06085 06728 06214 06461(2) 05972 07454 06631 06119 07377 06689 06504 07261 06862(3) 06193 06927 06540 06139 06575 06350 06738 06518 06626Average 06227 06867 06512 05907 06939 06375 06657 06665 06650DrugBank Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06973 06107 06512 06952 05847 06352 06081 06036 06059(2) 06961 06736 06847 06937 06479 06700 06836 06768 06802(3) 06976 06193 06561 06968 05929 06406 06033 06050 06042Average 06970 06345 0664 06952 06085 06486 06317 06285 06301

we also use a couple of forget and input gates The detailedsingle LSTM architecture and each gate formula computationcan be referred to in [33]

The LSTM experiments were implemented with severaldifferent parameter settings Their results presented in thissection are the best among all our experiments Each pieceof input data consists of two components its word vectorvalue and its Euclidian distance to the previous input dataIn treating both input data components we adapt the AddingProblem Experiment as presented in [35]We use the Jannlabtools [36] with some modifications in the part of entry toconform with our data settings

The best achieved performance is obtained with LSTMblock architecture of one node input layer two nodes hiddenlayer and one node output layer The used parameters arelearning rate = 0001 momentum = 09 epoch = 30 andinput dimension = 200 with the time sequence frame set to2 The complete treatment of drug sentence as a sequenceboth in representation and in recognition to extract the drugname entities is the best technique as shown by F-scoreperformance in Table 15

As described in previous work section there are manyapproaches related to drug extraction that have been pro-posed Most of them utilize certain external knowledge toachieve the extraction objective Table 16 summarizes theirF-score performance Among the state-of-the-art techniquesour third data representation technique applied to the LSTMmodel is outperforming Also our proposedmethod does notrequire any external knowledge

Table 15 The F-score performance of third data representationtechnique with RNN-LSTM

Prec Rec F-scoreMedLine 1 06474 07859DrugBank 1 08921 09430Average 08645

43 Drug Label Dataset Performance As additional exper-iment we also use Indonesian language drug label cor-pus to evaluate the methodrsquos performance Regarding theIndonesian drug label we could not find any certain externalknowledge that can be used to assist the extraction of thedrug name contained in the drug label In the presenceof this hindrance we found our proposed method is moresuitable than any other previous approaches As the drug labeltexts are collected from various sites of drug distributorsproducers and government regulators they do does notclearly contain training data and testing data as inDrugBanksor MedLine datasets The other characteristics of these textsare the more structured sentences contained in the dataAlthough the texts are coming from various sources allof them are similar kind of document (the drug label thatmight be generated by machine) After the data cleaningstep (HTML tag removal etc) we annotated the datasetmanually The total quantity of dataset after performingthe data representation step as described in Section 34 is


Table 16: The F-score performance compared to the state of the art.

Approach                                                                  F-score   Remark
The Best of SemEval 2013 [15]                                             0.7150    -
[11]                                                                      0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                               0.7200    With external knowledge (DINTO)
[14]                                                                      0.7200    Additional feature (BIO)
[12]                                                                      0.6000    Single token only
MLP-Sentence Sequence + Wiki (average), Ours                              0.6580    Without external knowledge
DBN-Sentence Sequence + Wiki (average), Ours                              0.6430    Without external knowledge
SAE-Sentence Sequence + Wiki (average), Ours                              0.6480    Without external knowledge
LSTM-All Sentence Sequence + Wiki + Euclidean Distance (average), Ours    0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration   Prec      Recall    F-score
1           0.9170    0.9667    0.9412
2           0.8849    0.9157    0.9000
3           0.9134    0.9619    0.9370
4           0.9298    0.9500    0.9398
5           0.9640    0.9570    0.9605
6           0.8857    0.9514    0.9178
7           0.9489    0.9689    0.9588
8           0.9622    0.9654    0.9638
9           0.9507    0.9601    0.9554
10          0.9516    0.9625    0.9570
Average     0.93081   0.9560    0.9431
Min         0.8849    0.9157    0.9000
Max         0.9640    0.9689    0.9638

The total quantity of the dataset after performing the data representation step described in Section 3.4 is 1046200. In this experiment, we perform a 10-run validation scenario, each run randomly selecting 80% of the data for training and 20% for testing.
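The evaluation protocol can be sketched as follows; the `train_and_score` callback is hypothetical and stands in for the full train-and-evaluate pipeline on one split.

```python
import numpy as np

def repeated_holdout(samples, labels, train_and_score, runs=10,
                     train_frac=0.8, seed=0):
    """Repeat a random 80/20 split `runs` times and collect the scores.

    `train_and_score` trains a model on the training split and returns
    (precision, recall, f_score) measured on the held-out 20%.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(runs):
        order = rng.permutation(len(samples))
        cut = int(train_frac * len(samples))
        tr, te = order[:cut], order[cut:]
        scores.append(train_and_score(samples[tr], labels[tr],
                                      samples[te], labels[te]))
    return np.array(scores)   # one row per run: precision, recall, F-score
```

Averaging the returned rows, together with their minimum and maximum, yields the Average, Min, and Max summary reported in Table 17.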

The experimental results for the drug label dataset show that all of the candidate selection scenarios provide an excellent F-score (above 0.9). The excellent F-score performance is probably due to the more structured sentences in those texts. The best results of those ten runs are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments, we studied various experimental scenarios involving three investigated parameters: the additional Wiki source, the data representation technique, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in the word2vec training enhances the quality of the resulting word vectors. Through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, drug names, becomes greater (better distinguishing power).

One problem in mining drug name entities from medical text is the imbalanced quantity between drug tokens and other tokens [15]. The targeted drug entities are only a small part of the total tokens; thus the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios contribute to reducing the quantity of noise. Since the possibility of extracting the noise is reduced, the recall and F-score values increase as well, as shown in the results of the first and second experiments.

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence. Hence, the input dataset cannot be randomly divided (selected) as in the tuple treatment of the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach to data representation and classification for extracting drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity that remained unsolved in previous studies. This study also introduces some techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern; that is, the current word is conditioned by the previous words. Based on this sequence notion, treating medical text sentences with a sequence NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequence pattern, evaluated with the nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentence as a sequence that provides the input of the sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique remain wide open. A first improvement could be the incorporation of additional handcrafted features, such as the word position, the use of a capital letter at the beginning of a word, and the character type, as used in previous studies [16, 37]. As presented in the MLP experiments on the drug label documents, the proposed methods achieve excellent performance when applied to more structured text. Thus, the effort to make the sentences of the DrugBank and MedLine datasets more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extractions, such as drug group, drug brand, and drug compound. Another task that can potentially be solved with the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features, and this addition gives good F-score performance. The significance of embedding the Euclidean distance features, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004/UN2.R12/HKP.05.00/2016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.
[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.
[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.
[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.
[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.
[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.
[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.
[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.
[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.
[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.
[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.
[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.
[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.
[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.
[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Georgia, USA, 2013.
[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.
[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.
[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.
[19] Q.-C. Bui, P. M. A. Sloot, E. M. van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.
[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.
[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.
[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.
[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.
[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, Department of Computer Science, Toronto University, UTML-TR-11, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.
[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.
[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.
[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.
[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.
[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.
[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.
[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.
[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.





Page 11: Research Article A New Data Representation Based on ...downloads.hindawi.com/journals/cin/2016/3483528.pdfsyntactic term or lexical term. Finally, the learning approach is usually

Computational Intelligence and Neuroscience 11

Table 10 Dataset composition

Dataset Train TestAll 23 part Cluster

MedLine 26500 10360 6673 5783DrugBank 100100 2000 1326 1933

Iteration of 100 epochs0 10 20 30 40 50 60 70 80 90 100

Erro

r

0

005

01

015

02

025

03

L1L0L2

Figure 6 Full batch training error of MedLine dataset

The learningmodel performance shows different patternsbetweenMedLine and DrugBank datasets For both datasetsL1 regularization tends to stabilize in the lower iterationand its training error performance is always less than L0or L2 The L0 and L2 training error performance patternhowever shows a slight different behavior between MedLineand DrugBank For the MedLine dataset L0 and L2 producedifferent results for some of the iterations Neverthelessthe training error performance of L0 and L2 for DrugBankis almost the same in every iteration Different patternresults are probably due to the variation in the quantityof training data As illustrated in Table 10 the volume ofDrugBank training data is almost four times the volume ofthe MedLine dataset It can be concluded that for largerdataset the contribution of L2 regularization setting is nottoo significant in achieving better performance For smallerdataset (MedLine) however the performance is better evenafter only few iterations

42 Open Dataset Performance In Tables 11 12 13 and 14numbering (1) numbering (2) and numbering (3) in themost left column indicate the candidate selection techniquewith

(i) (1) all data tests being selected

(ii) (2) 23 part of data test being selected

(iii) (3) 23 part of 3 clusters for MedLine or 34 part of 4clusters for DrugBank

421 MLP-NN Performance In this first experiment for twodata representation techniques and three candidate selectionscenarios we have six experiment scenarios The result

Iteration of 100 epochs0 10 20 30 40 50 60 70 80 90 100

Erro

r

0

005

01

015

02

025

03

L1L0L2

Figure 7 Full batch training error of DrugBank dataset

of the experiment which applies the first data representa-tion technique and three candidate selection scenarios ispresented in Table 11 In computing the F-score we onlyselect the predicted target which is provided by the lowesterror (the minimum one) For MedLine dataset the bestperformance is shown by L2 regularization setting where theerror is 0041818 in third candidate selection scenario with F-score 0439516 whereas the DrugBank is achieved togetherby L0 and L1 regularization setting with an error test of00802 in second candidate selection scenario the F-scorewas 0641745 Overall it can be concluded that DrugBankexperiments give the best F-score performance The candi-date selection scenarios also contributed to improving theperformance as we found that for both of MedLine andDrugBank the best achievement is provided by the secondand third scenarios respectively

The next experimental scenario in the first experiment isperformed to evaluate the impact of the data representationtechnique and the addition of Wiki source in word2vectraining The results are presented in Tables 12 and 13According to the obtained results presented in Table 11 theL0 regularization gives the best F-score Hence accordinglywe only used the L0 regularization for the next experimentalscenario Table 12 presents the impact of the data representa-tion technique Looking at the F-score the second techniquegives better results for both datasets that is the MedLine andDrugBank

Table 13 shows the result of adding the Wiki sourceinto word2vec training in providing the vector of wordrepresentation These results confirm that the addition oftraining datawill improve the performance Itmight be due tothe fact thatmost of the targeted tokens such as drug name areuncommon words whereas the words that are used in Wikirsquossentence are commonly used words Hence the addition ofcommonly used words will make the difference between drugtoken and the nondrug token (the commonly used token)become greater For the MLP-NN experimental results the4th scenario that is the second data representation with 23partition data selection in DrugBank dataset provides thebest performance with 0684646757 in F-score

12 Computational Intelligence and Neuroscience

Table 11 The F-score performances of three of scenarios experiments

MedLine Prec Rec F-score Lx Error test(1) 03564 05450 04310 L0 00305(2) 03806 05023 04331 L1 L2 00432(3) 03773 05266 04395 L2 00418DrugBank Prec Rec F-score Lx Error test(1) 06312 05372 05805 L0 007900(2) 06438 06398 06417 L0 L2 00802(3) 06305 05380 05806 L0 00776

Table 12 The F-score performance as an impact of data representation technique

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 03564 05450 04310 06515 06220 06364(2) 03806 05023 04331 06119 07377 06689(3) 03772 05266 04395 06143 0656873 06348DrugBank Prec Rec F-score Prec Rec F-score(1) 06438 05337 05836 07143 04962 05856(2) 06438 06398 06418 07182 05804 06420(3) 06306 05380 05807 05974 05476 05714

422 DBN and SAE Performance In the second experimentwhich involves DBN and SAE learning model we only usethe experiment scenario that gives the best results in the firstexperiment The best experiment scenario uses the seconddata representation technique withWiki text as an additionalsource in the word2vec training step

In the DBN experiment we use two stacked RBMs with500 nodes of visible unit and 100 nodes of the hidden layerfor the first and also the second RBMs The used learningparameters are as follows momentum = 0 and alpha = 1 Weusedminibatch scenario in the training with the batch size of100 As for RBM constraints the range of input data value isrestricted to [0 sdot sdot sdot 1] as the original RBM which is developedfor binary data type whereas the range of vector ofword valueis [minus1 sdot sdot sdot 1] Sowe normalize the data value into [0 sdot sdot sdot 1] rangebefore performing the RBM training In the last layer of DBNwe use one layer of MLP with 100 hidden nodes and 6 outputnodes with softmax output function as classifier

The used SAE architecture is two stacked AEs with thefollowing nodes configuration The first AE has 500 unitsof visible unit 100 hidden layers and 500 output layersThe second AE has 100 nodes of visible unit 100 nodes ofhidden unit and 100 nodes of output unit The used learningparameters for first SAE and the second SAE respectively areas follows activation function = sigmoid and tanh learningrate = 1 and 2 momentum = 05 and 05 sparsity target =005 and 005 The batch size of 100 is set for both of AEsIn the SAE experiment we use the same discriminative layeras DBN that is one layer MLP with 100 hidden nodes and 6output nodes with softmax activation function

The experiments results are presented in Table 14 Thereis a difference in performances when using the MedLine andthe DrugBank datasets when feeding them into MLP DBN

A single LSTM cell

ℎ20

ℎ10

ℎ21

ℎ11

ℎ22

ℎ12

ℎ23

ℎ13

ℎ2n

ℎ1n

x0 x1 x2 x3 xn

y1 y2 y3 ynmiddot middot middot

middot middot middot

middot middot middot

middot middot middot

Figure 8 The global LSTM network

and SAE models The best results for the MedLine datasetare obtained when using the SAE For the DrugBank theMLP gives the best results The DBN gives lower averageperformance for both datasets The lower performance isprobably due to the normalization on theword vector value to[0 sdot sdot sdot 1] whereas their original value range is in fact between[minus1 sdot sdot sdot 1] The best performance for all experiments 1 and 2 isgiven by SAE with the second scenario of candidate selectionas described in Section 36 Its F-score is 0686192469

423 LSTM Performance The global LSTM network usedis presented in Figure 8 Each single LSTM block consistsof two stacked hidden layers and one input node with eachinput dimension being 200 as described in Section 343 Allhidden layers are fully connected We used sigmoid as anoutput activation function which is the most suitable forbinary classification We implemented a peepholes connec-tion LSTM variant where its gate layers look at the cell state[34] In addition to implementing the peepholes connection

Computational Intelligence and Neuroscience 13

Table 13 The F-score performances as an impact of the Wiki addition of word2vec training data

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 05661 04582 05065 0614 06495 06336(2) 05661 04946 05279 05972 07454 06631(3) 05714 04462 05011 06193 06927 06540DrugBank Prec Rec F-score Prec Rec F-score(1) 06778 05460 06047 06973 06107 06511(2) 06776 06124 06433 06961 06736 06846(3) 07173 05574 06273 06976 06193 06561

Table 14 Experimental results of three NN models

Dataset MLP DBN SAEMedLine Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06515 06220 06364 05464 06866 06085 06728 06214 06461(2) 05972 07454 06631 06119 07377 06689 06504 07261 06862(3) 06193 06927 06540 06139 06575 06350 06738 06518 06626Average 06227 06867 06512 05907 06939 06375 06657 06665 06650DrugBank Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06973 06107 06512 06952 05847 06352 06081 06036 06059(2) 06961 06736 06847 06937 06479 06700 06836 06768 06802(3) 06976 06193 06561 06968 05929 06406 06033 06050 06042Average 06970 06345 0664 06952 06085 06486 06317 06285 06301

we also use coupled forget and input gates. The detailed single-LSTM architecture and the computation formula for each gate can be found in [33].

The LSTM experiments were run with several different parameter settings; the results presented in this section are the best among all our experiments. Each piece of input data consists of two components: its word vector value and its Euclidean distance to the previous input. In treating both input components we adapt the Adding Problem experiment as presented in [35]. We use the Jannlab tools [36] with some modifications in the input-handling part to conform with our data settings.
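As a rough illustration of the two-component input described above (not the authors' exact packing scheme), the following NumPy sketch pairs each word vector with its Euclidean distance to the previous word in the sentence; the vector dimensionality and the zero distance for the first word are assumptions.

```python
import numpy as np

def sentence_to_lstm_input(word_vectors):
    """Pair each word vector with its Euclidean distance to the previous word.

    word_vectors: array of shape (n_words, dim) from the trained word2vec model.
    Returns a list of (vector, distance) pairs; the first word gets distance 0.
    """
    pairs, prev = [], None
    for vec in word_vectors:
        dist = 0.0 if prev is None else float(np.linalg.norm(vec - prev))
        pairs.append((vec, dist))
        prev = vec
    return pairs

# Example with random stand-in vectors for a 5-word sentence (dim 200, per the text)
demo = np.random.RandomState(0).uniform(-1, 1, size=(5, 200))
for i, (_, d) in enumerate(sentence_to_lstm_input(demo)):
    print(f"word {i}: distance to previous = {d:.4f}")
```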

The best performance is obtained with an LSTM block architecture of a one-node input layer, a two-node hidden layer, and a one-node output layer. The parameters used are learning rate = 0.001, momentum = 0.9, epochs = 30, and input dimension = 200, with the time sequence frame set to 2. The complete treatment of the drug sentence as a sequence, both in representation and in recognition, to extract the drug name entities is the best technique, as shown by the F-score performance in Table 15.
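The authors ran these experiments with the Jannlab Java library; as a rough, non-equivalent illustration, the sketch below wires a two-layer stacked LSTM with a sigmoid output in tensorflow.keras, reusing the quoted learning rate, momentum, epochs, input dimension, and time frame. The hidden-layer widths, the toy data, and the use of a standard (non-peephole) LSTM cell are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

TIME_STEPS, INPUT_DIM = 2, 200  # time-sequence frame of 2, input dimension 200 (from the text)

inp = keras.Input(shape=(TIME_STEPS, INPUT_DIM))
h1 = layers.LSTM(8, return_sequences=True)(inp)  # first stacked hidden layer (width assumed)
h2 = layers.LSTM(8)(h1)                          # second stacked hidden layer (width assumed)
out = layers.Dense(1, activation="sigmoid")(h2)  # binary drug / non-drug decision
model = keras.Model(inp, out)

model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
              loss="binary_crossentropy", metrics=["accuracy"])

# Toy stand-in sequences: 100 two-step frames of 200-dimensional inputs
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, TIME_STEPS, INPUT_DIM)).astype("float32")
y = rng.randint(0, 2, size=(100, 1))
model.fit(X, y, epochs=30, batch_size=32, verbose=0)
print("training accuracy:", model.evaluate(X, y, verbose=0)[1])
```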

As described in the previous work section, many approaches related to drug extraction have been proposed, and most of them utilize certain external knowledge to achieve the extraction objective. Table 16 summarizes their F-score performance. Among the state-of-the-art techniques, our third data representation technique applied to the LSTM model outperforms the others. Moreover, our proposed method does not require any external knowledge.

Table 15: The F-score performance of the third data representation technique with RNN-LSTM.

            Prec      Rec       F-score
MedLine     1.0000    0.6474    0.7859
DrugBank    1.0000    0.8921    0.9430
Average                         0.8645
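As a quick consistency check against Table 15 (the numbers are taken from the table; the formula is the standard F1 definition): F = 2PR/(P + R), so F_MedLine = 2(1.0000)(0.6474)/1.6474 ≈ 0.7859 and F_DrugBank = 2(1.0000)(0.8921)/1.8921 ≈ 0.9430, and their mean (0.7859 + 0.9430)/2 ≈ 0.8645 matches the reported average.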

4.3. Drug Label Dataset Performance. As an additional experiment, we also use an Indonesian-language drug label corpus to evaluate the method's performance. For the Indonesian drug labels we could not find any specific external knowledge that can be used to assist the extraction of the drug names contained in the labels. Given this hindrance, we found our proposed method more suitable than the previous approaches. As the drug label texts are collected from various sites of drug distributors, producers, and government regulators, they do not come with clearly separated training and testing data as the DrugBank or MedLine datasets do. Another characteristic of these texts is that their sentences are more structured. Although the texts come from various sources, all of them are a similar kind of document (drug labels, which might be generated by machine). After the data cleaning step (HTML tag removal, etc.), we annotated the dataset manually. The total quantity of the dataset after performing the data representation step described in Section 3.4 is 1046200.


Table 16: The F-score performance compared to the state of the art.

Approach                                                                F-score   Remark
The best of SemEval 2013 [15]                                           0.7150    -
[11]                                                                    0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                             0.7200    With external knowledge (DINTO)
[14]                                                                    0.7200    Additional feature: BIO
[12]                                                                    0.6000    Single token only
Ours: MLP, sentence sequence + Wiki (average)                           0.6580    Without external knowledge
Ours: DBN, sentence sequence + Wiki (average)                           0.6430    Without external knowledge
Ours: SAE, sentence sequence + Wiki (average)                           0.6480    Without external knowledge
Ours: LSTM, all-sentence sequence + Wiki + Euclidean distance (avg.)    0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration   Prec      Recall    F-score
1           0.9170    0.9667    0.9412
2           0.8849    0.9157    0.9000
3           0.9134    0.9619    0.9370
4           0.9298    0.9500    0.9398
5           0.9640    0.9570    0.9605
6           0.8857    0.9514    0.9178
7           0.9489    0.9689    0.9588
8           0.9622    0.9654    0.9638
9           0.9507    0.9601    0.9554
10          0.9516    0.9625    0.9570
Average     0.9308    0.9560    0.9431
Min         0.8849    0.9157    0.9000
Max         0.9640    0.9689    0.9638

In this experiment we perform a 10-times cross-validation scenario by randomly selecting 80% of the data for training and 20% for testing.
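A minimal sketch of this repeated random 80/20 splitting, assuming scikit-learn (the data arrays and the downstream training step are stand-ins), is:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# X, y: the represented drug-label tuples and their labels (random stand-ins here)
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(1000, 200))
y = rng.randint(0, 2, size=1000)

# Ten random 80/20 splits, as in the drug-label experiment
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
for i, (train_idx, test_idx) in enumerate(splitter.split(X), start=1):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ... train the chosen NN model on (X_train, y_train) and compute
    #     precision / recall / F-score on (X_test, y_test) ...
    print(f"iteration {i}: {len(train_idx)} train / {len(test_idx)} test examples")
```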

The experimental results for the drug label dataset show that all of the candidate selection scenarios provide an excellent F-score (above 0.9). The excellent F-score performance is probably due to the more structured sentences in those texts. The best results of those ten experiments are presented in Table 17.

4.4. Choosing the Best Scenario. In the first and second experiments we studied various experiment scenarios involving three investigated parameters: the additional Wiki source, the data representation technique, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance.

The additional source in the word2vec training enhances the quality of the resulting word vectors. Through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, the drug names, becomes greater (better distinguishing power).
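A hedged sketch of this step, assuming the gensim implementation of word2vec (the tokenized sentences, file handling, and vector size below are illustrative placeholders, not the authors' setup), is:

```python
from gensim.models import Word2Vec

# Tokenized sentences from the medical corpus plus an additional Wikipedia text dump
medical_sentences = [["interaction", "of", "warfarin", "and", "aspirin"],
                     ["take", "ibuprofen", "with", "food"]]
wiki_sentences = [["the", "city", "is", "located", "on", "the", "coast"],
                  ["music", "is", "an", "art", "form"]]
corpus = medical_sentences + wiki_sentences

# Train word2vec on the combined corpus; the larger common-word background
# sharpens the contrast between common words and rare drug-name tokens
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=10)

vec = model.wv["warfarin"]  # embedding for a drug-name token
print(vec.shape)
```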

One problem in mining drug name entities from medical text is the imbalanced quantity of drug tokens versus other tokens [15]. The targeted drug entities are only a small part of the total tokens; thus the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios contribute by reducing the quantity of noise. Since the possibility of extracting noise is reduced, the recall and F-score values increase as well, as shown in the results of the first and second experiments.

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence. Hence, the input dataset cannot be randomly divided (selected) in the way the tuples are treated in the first and second experiments.

5. Conclusion and Future Works

This study proposes a new approach to data representation and classification for extracting drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity, which remained unsolved in previous studies. This study also introduces some techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern; that is, the current word is conditioned by the previous words. Based on this sequence notion, treating medical text sentences with a sequential NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequential pattern, which is evaluated with the nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentence as a sequence that provides the input of the sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequential data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.

Some opportunities to improve the performance of the proposed technique are still wide open. A first improvement could be the incorporation of additional handcrafted features, such as the word position, the use of a capital letter at the beginning of the word, and the character type, as used in previous studies [16, 37]. As shown by the MLP experiments on the drug label documents, the proposed methods achieve excellent performance when applied to more structured text; thus, an effort to make the sentences of the DrugBank and MedLine datasets more structured could also be elaborated. Regarding the LSTM model and the sequential data representation of medical text sentences, our future study will tackle multiple entity extraction, such as drug group, drug brand, and drug compound. Another task that could potentially be solved with


the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features, and this addition gives a good F-score performance. The significance of embedding the Euclidean distance features, however, needs to be explored further.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004UN2R12HKP05002016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Ga, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, University of Toronto, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings: Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Poster Proceedings Conference (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.


12 Computational Intelligence and Neuroscience

Table 11 The F-score performances of three of scenarios experiments

MedLine Prec Rec F-score Lx Error test(1) 03564 05450 04310 L0 00305(2) 03806 05023 04331 L1 L2 00432(3) 03773 05266 04395 L2 00418DrugBank Prec Rec F-score Lx Error test(1) 06312 05372 05805 L0 007900(2) 06438 06398 06417 L0 L2 00802(3) 06305 05380 05806 L0 00776

Table 12 The F-score performance as an impact of data representation technique

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 03564 05450 04310 06515 06220 06364(2) 03806 05023 04331 06119 07377 06689(3) 03772 05266 04395 06143 0656873 06348DrugBank Prec Rec F-score Prec Rec F-score(1) 06438 05337 05836 07143 04962 05856(2) 06438 06398 06418 07182 05804 06420(3) 06306 05380 05807 05974 05476 05714

422 DBN and SAE Performance In the second experimentwhich involves DBN and SAE learning model we only usethe experiment scenario that gives the best results in the firstexperiment The best experiment scenario uses the seconddata representation technique withWiki text as an additionalsource in the word2vec training step

In the DBN experiment we use two stacked RBMs with500 nodes of visible unit and 100 nodes of the hidden layerfor the first and also the second RBMs The used learningparameters are as follows momentum = 0 and alpha = 1 Weusedminibatch scenario in the training with the batch size of100 As for RBM constraints the range of input data value isrestricted to [0 sdot sdot sdot 1] as the original RBM which is developedfor binary data type whereas the range of vector ofword valueis [minus1 sdot sdot sdot 1] Sowe normalize the data value into [0 sdot sdot sdot 1] rangebefore performing the RBM training In the last layer of DBNwe use one layer of MLP with 100 hidden nodes and 6 outputnodes with softmax output function as classifier

The used SAE architecture is two stacked AEs with thefollowing nodes configuration The first AE has 500 unitsof visible unit 100 hidden layers and 500 output layersThe second AE has 100 nodes of visible unit 100 nodes ofhidden unit and 100 nodes of output unit The used learningparameters for first SAE and the second SAE respectively areas follows activation function = sigmoid and tanh learningrate = 1 and 2 momentum = 05 and 05 sparsity target =005 and 005 The batch size of 100 is set for both of AEsIn the SAE experiment we use the same discriminative layeras DBN that is one layer MLP with 100 hidden nodes and 6output nodes with softmax activation function

The experiments results are presented in Table 14 Thereis a difference in performances when using the MedLine andthe DrugBank datasets when feeding them into MLP DBN

A single LSTM cell

ℎ20

ℎ10

ℎ21

ℎ11

ℎ22

ℎ12

ℎ23

ℎ13

ℎ2n

ℎ1n

x0 x1 x2 x3 xn

y1 y2 y3 ynmiddot middot middot

middot middot middot

middot middot middot

middot middot middot

Figure 8 The global LSTM network

and SAE models The best results for the MedLine datasetare obtained when using the SAE For the DrugBank theMLP gives the best results The DBN gives lower averageperformance for both datasets The lower performance isprobably due to the normalization on theword vector value to[0 sdot sdot sdot 1] whereas their original value range is in fact between[minus1 sdot sdot sdot 1] The best performance for all experiments 1 and 2 isgiven by SAE with the second scenario of candidate selectionas described in Section 36 Its F-score is 0686192469

423 LSTM Performance The global LSTM network usedis presented in Figure 8 Each single LSTM block consistsof two stacked hidden layers and one input node with eachinput dimension being 200 as described in Section 343 Allhidden layers are fully connected We used sigmoid as anoutput activation function which is the most suitable forbinary classification We implemented a peepholes connec-tion LSTM variant where its gate layers look at the cell state[34] In addition to implementing the peepholes connection

Computational Intelligence and Neuroscience 13

Table 13 The F-score performances as an impact of the Wiki addition of word2vec training data

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 05661 04582 05065 0614 06495 06336(2) 05661 04946 05279 05972 07454 06631(3) 05714 04462 05011 06193 06927 06540DrugBank Prec Rec F-score Prec Rec F-score(1) 06778 05460 06047 06973 06107 06511(2) 06776 06124 06433 06961 06736 06846(3) 07173 05574 06273 06976 06193 06561

Table 14 Experimental results of three NN models

Dataset MLP DBN SAEMedLine Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06515 06220 06364 05464 06866 06085 06728 06214 06461(2) 05972 07454 06631 06119 07377 06689 06504 07261 06862(3) 06193 06927 06540 06139 06575 06350 06738 06518 06626Average 06227 06867 06512 05907 06939 06375 06657 06665 06650DrugBank Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06973 06107 06512 06952 05847 06352 06081 06036 06059(2) 06961 06736 06847 06937 06479 06700 06836 06768 06802(3) 06976 06193 06561 06968 05929 06406 06033 06050 06042Average 06970 06345 0664 06952 06085 06486 06317 06285 06301

we also use a couple of forget and input gates The detailedsingle LSTM architecture and each gate formula computationcan be referred to in [33]

The LSTM experiments were implemented with severaldifferent parameter settings Their results presented in thissection are the best among all our experiments Each pieceof input data consists of two components its word vectorvalue and its Euclidian distance to the previous input dataIn treating both input data components we adapt the AddingProblem Experiment as presented in [35]We use the Jannlabtools [36] with some modifications in the part of entry toconform with our data settings

The best achieved performance is obtained with LSTMblock architecture of one node input layer two nodes hiddenlayer and one node output layer The used parameters arelearning rate = 0001 momentum = 09 epoch = 30 andinput dimension = 200 with the time sequence frame set to2 The complete treatment of drug sentence as a sequenceboth in representation and in recognition to extract the drugname entities is the best technique as shown by F-scoreperformance in Table 15

As described in previous work section there are manyapproaches related to drug extraction that have been pro-posed Most of them utilize certain external knowledge toachieve the extraction objective Table 16 summarizes theirF-score performance Among the state-of-the-art techniquesour third data representation technique applied to the LSTMmodel is outperforming Also our proposedmethod does notrequire any external knowledge

Table 15 The F-score performance of third data representationtechnique with RNN-LSTM

Prec Rec F-scoreMedLine 1 06474 07859DrugBank 1 08921 09430Average 08645

43 Drug Label Dataset Performance As additional exper-iment we also use Indonesian language drug label cor-pus to evaluate the methodrsquos performance Regarding theIndonesian drug label we could not find any certain externalknowledge that can be used to assist the extraction of thedrug name contained in the drug label In the presenceof this hindrance we found our proposed method is moresuitable than any other previous approaches As the drug labeltexts are collected from various sites of drug distributorsproducers and government regulators they do does notclearly contain training data and testing data as inDrugBanksor MedLine datasets The other characteristics of these textsare the more structured sentences contained in the dataAlthough the texts are coming from various sources allof them are similar kind of document (the drug label thatmight be generated by machine) After the data cleaningstep (HTML tag removal etc) we annotated the datasetmanually The total quantity of dataset after performingthe data representation step as described in Section 34 is

14 Computational Intelligence and Neuroscience

Table 16The F-score performance compared to the state of the art

Approach F-score RemarkThe Best of SemEval 2013 [15] 07150 mdash

[11] 05700With externalknowledgeChEBI

[16] + Wiki 07200With externalknowledgeDINTO

[14] 07200 Additionalfeature BIO

[12] 06000 Single token onlyMLP-SentenceSequence + Wiki(average)Ours 06580 Without external

knowledgeDBN-SentenceSequence + Wiki(average)Ours 06430 Without external

knowledgeSAE-SentenceSequence + Wiki(average)Ours 06480 Without external

knowledgeLSTM-AllSentenceSequence + Wiki +EuclidianDistance (average)Ours 08645 Without external

knowledge

Table 17 The best performance of 10 executions on drug labelcorpus

Iteration Prec Recall F-score1 09170 09667 094122 08849 09157 090003 09134 09619 093704 09298 09500 093985 09640 09570 096056 08857 09514 091787 09489 09689 095888 09622 09654 096389 09507 09601 0955410 09516 09625 09570Average 093081 09560 09431Min 08849 09157 09000Max 09640 09689 09638

1046200 In this experiment we perform 10 times cross-validation scenario by randomly selecting 80 data for thetraining data and 20 data for testing

The experimental result for drug label dataset shows thatall of the candidate selection scenarios provide excellentF-score (above 09) The excellent F-score performance isprobably due to the more structured sentences in those textsThe best results of those ten experiments are presented inTable 17

44 Choosing the Best Scenario In the first and secondexperiments we studied various experiment scenarioswhich involve three investigated parameters additional Wikisource data representation techniques and drug target can-didate selection In general the Wiki addition contributes toimproving the F-score performance The additional source

in word2vec training enhances the quality of the resultingword2vec Through the addition of common words fromWiki the difference between the common words and theuncommon words that is drug name becomes greater(better distinguishing power)

One problem in mining drug name entity from medicaltext is the imbalanced quantity between drug token and othertokens [15] Also the targeted drug entities are only a smallpart of the total tokens Thus majority of tokens are noiseIn dealing with this problem the second and third candidateselection scenarios show their contribution to reduce thequantity of noise Since the possibility of extracting the noisesis reduced then the recall value and F-score value increase aswell as shown in the first and second experiments results

The third experiment which uses LSTM model does notapply the candidate selection scenario because the inputdataset is treated as sentence sequence So the input datasetcan not be randomly divided (selected) as the tuple treatmentin the first and second experiments

5 Conclusion and Future Works

This study proposes a new approach in the data representa-tion and classification to extract drug name entities containedin the sentences of medical text documents The suggestedapproach solves the problem of multiple tokens for a singleentity that remained unsolved in previous studies This studyalso introduces some techniques to tackle the absence ofspecific external knowledge Naturally the words containedin the sentence follow a certain sequence pattern that isthe current word is conditioned by other previous wordsBased on the sequence notion the treatment of medical textsentences which apply the sequence NN model gives betterresults In this study we presented three data representationtechniquesThe first and second techniques treat the sentenceas a nonsequence pattern which is evaluated with the non-sequential NN classifier (MLP DBN and SAE) whereas thethird technique treats the sentences as a sequence to providedata that is used as the input of the sequential NN classifierthat is LSTM The performance of the application of LSTMmodels for the sequence data representation with the averageF-score being 08645 rendered the best result compared tothe state of the art

Some opportunities to improve the performance of theproposed technique are still widely opened The first stepimprovement can be the incorporation of additional hand-crafted features such as the words position the use of capitalcase at the beginning of the word and the type of characteras also used in the previous studies [16 37] As presented inthe MLP experiments for drug label document the proposedmethods achieved excellent performance when applied to themore structured text Thus the effort to make the sentenceof the dataset that is DrugBank and MedLine to be morestructured can also be elaborated Regarding the LSTMmodel and the sequence data representation for the sentencesof medical text our future study will tackle the multipleentity extractions such as drug group drug brand and drugcompounds Another task that is potential to be solved with

Computational Intelligence and Neuroscience 15

the LSTMmodel is the drug-drug interaction extraction Ourexperiments also utilize the Euclidean distance measure inaddition to the word2vec features Such addition gives a goodF-score performance The significance of embedding theEuclidean distance features however needs to be exploredfurther

Competing Interests

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work is supported by Higher Education Science andTechnology Development Grant funded by Indonesia Min-istry of Research and Higher Education Contract no1004UN2R12HKP05002016

References

[1] M Sadikin and I Wasito ldquoTranslation and classification algo-rithm of FDA-Drugs to DOEN2011 class therapy to estimatedrug-drug interactionrdquo in Proceedings of the The 2nd Interna-tional Conference on Information Systems for Business Competi-tiveness (ICISBC rsquo13) pp 1ndash5 Semarang Indonesia 2013

[2] H Tang and J Ye ldquoA survey for information extractionmethodrdquoTech Rep 2007

[3] S Zhang and N Elhadad ldquoUnsupervised biomedical namedentity recognition experiments with clinical and biologicaltextsrdquo Journal of Biomedical Informatics vol 46 no 6 pp 1088ndash1098 2013

[4] I Korkontzelos D Piliouras A W Dowsey and S AnaniadouldquoBoosting drug named entity recognition using an aggregateclassifierrdquo Artificial Intelligence in Medicine vol 65 no 2 pp145ndash153 2015

[5] M Herrero-Zazo I Segura-Bedmar P Martınez and TDeclerck ldquoThe DDI corpus an annotated corpus with phar-macological substances and drug-drug interactionsrdquo Journal ofBiomedical Informatics vol 46 no 5 pp 914ndash920 2013

[6] H Sampathkumar X-W Chen and B Luo ldquoMining adversedrug reactions from online healthcare forums using HiddenMarkovModelrdquoBMCMedical Informatics andDecisionMakingvol 14 article 91 2014

[7] I Segura-Bedmar P Martınez and M Segura-Bedmar ldquoDrugname recognition and classification in biomedical texts A casestudy outlining approaches underpinning automated systemsrdquoDrug Discovery Today vol 13 no 17-18 pp 816ndash823 2008

[8] S Keretna C P Lim D Creighton and K B ShabanldquoEnhancingmedical named entity recognitionwith an extendedsegment representation techniquerdquo Computer Methods andPrograms in Bimoedicine vol 119 no 2 pp 88ndash100 2015

[9] G Pal and S Gosal ldquoA survey of biological entity recognitionapproachesrdquo International Journal on Recent and InnovationTrends in Computing and Communication vol 3 no 9 2015

[10] S Liu B Tang Q Chen andXWang ldquoDrug name recognitionapproaches and resourcesrdquo Information vol 6 no 4 pp 790ndash810 2015

[11] T Grego and F M Couto ldquoLASIGE using conditional randomfields and ChEBI ontologyrdquo in Proceedings of the 7th Interna-tional Workshop on Semantic Evaluation (SemEval rsquo13) vol 2pp 660ndash666 2013

[12] J Bjorne S Kaewphan and T Salakoski ldquoUTurku drug namedentity recognition and drug-drug interaction extraction usingSVM classification and domain knowledgerdquo in Proceedingsof the 2nd Joint Conferernce on Lexical and ComputationalSemantic vol 2 pp 651ndash659 Atlanta Ga USA 2013

[13] T Mikolov G Corrado K Chen and J Dean ldquoEfficientestimation of word representations in vector spacerdquo httpsarxivorgabs13013781

[14] A Ben AbachaM FM Chowdhury A Karanasiou YMrabetA Lavelli and P Zweigenbaum ldquoTextmining for pharmacovig-ilance using machine learning for drug name recognition anddrug-drug interaction extraction and classificationrdquo Journal ofBiomedical Informatics vol 58 pp 122ndash132 2015

[15] I Segura-Bedmar PMartinez andMHerrero-Zazo ldquoSemeval-2013 task 9 extraction of drug-drug interactions from biomed-ical texts (ddiextraction 2013)rdquo in Proceedings of the 7thInternational Workshop on Semantic Evaluation (SemEval rsquo13)vol 2 pp 341ndash350 Association for Computational LinguisticsAtlanta Georgia USA 2013

[16] I Segura-bedmar and P Mart ldquoExploring word embeddingfor drug name recognitionrdquo in Proceedings of the The 6thInternational Workshop on Health Text Mining and InformationAnalysis pp 64ndash72 Lisbon Portugal September 2015

[17] Y Chen T A Lasko Q Mei J C Denny and H Xu ldquoA study ofactive learningmethods for named entity recognition in clinicaltextrdquo Journal of Biomedical Informatics vol 58 pp 11ndash18 2015

[18] M Sadikin and I Wasito ldquoToward object interaction miningby starting with object extraction based on pattern learningmethodrdquo in Proceedings of the Pattern Learning Method Asia-PacificMaterials Science and Information Technology Conference(APMSIT rsquo14) Shanghai China December 2014

[19] Q-C Bui P M A Sloot E M VanMulligen and J A Kors ldquoAnovel feature-based approach to extract drug-drug interactionsfrom biomedical textrdquo Bioinformatics vol 30 no 23 pp 3365ndash3371 2014

[20] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquo Naturevol 521 no 7553 pp 436ndash444 2015

[21] R B Palm ldquoPrediction as a candidate for learning deephierarchical models of datardquo 2012

[22] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[23] A Fischer and C Igel ldquoProgress in pattern recognitionimage analysis computer vision and applicationsrdquo in 17thIberoamerican Congress CIARP 2012 Buenos Aires ArgentinaSeptember 3ndash6 2012 Proceedings An Introduction to RestrictedBoltzmann Machines pp 14ndash36 Springer Berlin Germany2012

[24] T Tieleman ldquoTraining restricted Boltzmann machines usingapproximations to the likelihood gradientrdquo in Proceedings of the25th International Conference on Machine Learning (ICML rsquo08)pp 1064ndash1071 2008

[25] G E Dahl R P Adams and H Larochelle ldquoTraining re-stricted boltzmann machines on word observationsrdquo httpsarxivorgabs12025695

[26] Y Tang and I Sutskever Data Normalization in the Learningof Restricted Boltzmann Machines Department of Computer

16 Computational Intelligence and Neuroscience

Science Toronto University UTML-TR-11 Toronto Canada2011 httpwwwcstorontoedusimtangpapersRbmZMpdf

[27] G Hinton ldquoA practical guide to training restricted Boltz-mann machinesrdquo in Neural Networks Tricks of the Trade GMontavon G B Orr and K-R Muller Eds vol 7700 ofLectureNotes inComputer Science pp 599ndash619 Springer BerlinGermany 2nd edition 2012

[28] H Chen and A F Murray ldquoContinuous restricted Boltz-mannmachine with an implementable training algorithmrdquo IEEProceedingsmdashVision Image and Signal Processing vol 150 no 3pp 153ndash158 2003

[29] MWellingM Rosen-Zvi andG E Hinton ldquoExponential fam-ily harmoniums with an application to information retrievalrdquoinAdvances in Neural Information Processing Systems (NIPS) 17pp 1481ndash1488 2005

[30] A Ng Deep Learning Tutorial httpufldlstanfordedututori-alunsupervisedAutoencoders

[31] S Hochreiter and J Schmidhuber ldquoLong short-term memoryrdquoNeural Computation vol 9 no 8 pp 1735ndash1780 1997

[32] J Hammerton ldquoNamed entity recognition with long short-term memoryrdquo in Proceedings of the 7th Conference on NaturalLanguage Learning atHLT-NAACL (CONLL rsquo03) vol 4 pp 172ndash175 2003

[33] C Olah ldquoUnderstanding LSTM Networksrdquo 2015 httpcolahgithubioposts2015-08-Understanding-LSTMs

[34] F A Gers and J Schmidhuber ldquoRecurrent nets that time andcountrdquo in Proceedings of the IEEE-INNS-ENNS InternationalJoint Conference on Neural Networks (IJCNN rsquo00) vol 3 p 6Como Italy July 2000

[35] S Hochreiter ldquoLSTM Can Solve Hardrdquo in Advances in NeuralInformation Processing Systems Neural Information ProcessingSystems (NIPS) Denver Colo USA 1996

[36] S Otte D Krechel and M Liwicki ldquoJannlab neural networkframework for javardquo in Proceedings of the Poster ProceedingsConference (MLDM rsquo13) pp 39ndash46 IBAI New York NY USA2013

[37] R Boyce and G Gardner ldquoUsing natural language processingto identify pharmacokinetic drug-drug interactions describedin drug package insertsrdquo in Proceedings of the Workshop onBiomedical Natural Language Processing (BioNLP rsquo12) pp 206ndash213 BioNLP Montreal Canada 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 13: Research Article A New Data Representation Based on ...downloads.hindawi.com/journals/cin/2016/3483528.pdfsyntactic term or lexical term. Finally, the learning approach is usually

Computational Intelligence and Neuroscience 13

Table 13 The F-score performances as an impact of the Wiki addition of word2vec training data

Dataset (1) One seq of all sentences (2) One seq of each sentenceMedLine Prec Rec F-score Prec Rec F-score(1) 05661 04582 05065 0614 06495 06336(2) 05661 04946 05279 05972 07454 06631(3) 05714 04462 05011 06193 06927 06540DrugBank Prec Rec F-score Prec Rec F-score(1) 06778 05460 06047 06973 06107 06511(2) 06776 06124 06433 06961 06736 06846(3) 07173 05574 06273 06976 06193 06561

Table 14 Experimental results of three NN models

Dataset MLP DBN SAEMedLine Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06515 06220 06364 05464 06866 06085 06728 06214 06461(2) 05972 07454 06631 06119 07377 06689 06504 07261 06862(3) 06193 06927 06540 06139 06575 06350 06738 06518 06626Average 06227 06867 06512 05907 06939 06375 06657 06665 06650DrugBank Prec Rec F-score Prec Rec F-score Prec Rec F-score(1) 06973 06107 06512 06952 05847 06352 06081 06036 06059(2) 06961 06736 06847 06937 06479 06700 06836 06768 06802(3) 06976 06193 06561 06968 05929 06406 06033 06050 06042Average 06970 06345 0664 06952 06085 06486 06317 06285 06301

we also use a couple of forget and input gates The detailedsingle LSTM architecture and each gate formula computationcan be referred to in [33]

The LSTM experiments were implemented with severaldifferent parameter settings Their results presented in thissection are the best among all our experiments Each pieceof input data consists of two components its word vectorvalue and its Euclidian distance to the previous input dataIn treating both input data components we adapt the AddingProblem Experiment as presented in [35]We use the Jannlabtools [36] with some modifications in the part of entry toconform with our data settings

The best achieved performance is obtained with LSTMblock architecture of one node input layer two nodes hiddenlayer and one node output layer The used parameters arelearning rate = 0001 momentum = 09 epoch = 30 andinput dimension = 200 with the time sequence frame set to2 The complete treatment of drug sentence as a sequenceboth in representation and in recognition to extract the drugname entities is the best technique as shown by F-scoreperformance in Table 15

As described in previous work section there are manyapproaches related to drug extraction that have been pro-posed Most of them utilize certain external knowledge toachieve the extraction objective Table 16 summarizes theirF-score performance Among the state-of-the-art techniquesour third data representation technique applied to the LSTMmodel is outperforming Also our proposedmethod does notrequire any external knowledge

Table 15 The F-score performance of third data representationtechnique with RNN-LSTM

Prec Rec F-scoreMedLine 1 06474 07859DrugBank 1 08921 09430Average 08645

43 Drug Label Dataset Performance As additional exper-iment we also use Indonesian language drug label cor-pus to evaluate the methodrsquos performance Regarding theIndonesian drug label we could not find any certain externalknowledge that can be used to assist the extraction of thedrug name contained in the drug label In the presenceof this hindrance we found our proposed method is moresuitable than any other previous approaches As the drug labeltexts are collected from various sites of drug distributorsproducers and government regulators they do does notclearly contain training data and testing data as inDrugBanksor MedLine datasets The other characteristics of these textsare the more structured sentences contained in the dataAlthough the texts are coming from various sources allof them are similar kind of document (the drug label thatmight be generated by machine) After the data cleaningstep (HTML tag removal etc) we annotated the datasetmanually The total quantity of dataset after performingthe data representation step as described in Section 34 is

14 Computational Intelligence and Neuroscience

Table 16The F-score performance compared to the state of the art

Approach F-score RemarkThe Best of SemEval 2013 [15] 07150 mdash

[11] 05700With externalknowledgeChEBI

[16] + Wiki 07200With externalknowledgeDINTO

[14] 07200 Additionalfeature BIO

[12] 06000 Single token onlyMLP-SentenceSequence + Wiki(average)Ours 06580 Without external

knowledgeDBN-SentenceSequence + Wiki(average)Ours 06430 Without external

knowledgeSAE-SentenceSequence + Wiki(average)Ours 06480 Without external

knowledgeLSTM-AllSentenceSequence + Wiki +EuclidianDistance (average)Ours 08645 Without external

knowledge

Table 17 The best performance of 10 executions on drug labelcorpus

Iteration Prec Recall F-score1 09170 09667 094122 08849 09157 090003 09134 09619 093704 09298 09500 093985 09640 09570 096056 08857 09514 091787 09489 09689 095888 09622 09654 096389 09507 09601 0955410 09516 09625 09570Average 093081 09560 09431Min 08849 09157 09000Max 09640 09689 09638

1046200 In this experiment we perform 10 times cross-validation scenario by randomly selecting 80 data for thetraining data and 20 data for testing

The experimental result for drug label dataset shows thatall of the candidate selection scenarios provide excellentF-score (above 09) The excellent F-score performance isprobably due to the more structured sentences in those textsThe best results of those ten experiments are presented inTable 17

44 Choosing the Best Scenario In the first and secondexperiments we studied various experiment scenarioswhich involve three investigated parameters additional Wikisource data representation techniques and drug target can-didate selection In general the Wiki addition contributes toimproving the F-score performance The additional source

in word2vec training enhances the quality of the resultingword2vec Through the addition of common words fromWiki the difference between the common words and theuncommon words that is drug name becomes greater(better distinguishing power)

One problem in mining drug name entity from medicaltext is the imbalanced quantity between drug token and othertokens [15] Also the targeted drug entities are only a smallpart of the total tokens Thus majority of tokens are noiseIn dealing with this problem the second and third candidateselection scenarios show their contribution to reduce thequantity of noise Since the possibility of extracting the noisesis reduced then the recall value and F-score value increase aswell as shown in the first and second experiments results

The third experiment which uses LSTM model does notapply the candidate selection scenario because the inputdataset is treated as sentence sequence So the input datasetcan not be randomly divided (selected) as the tuple treatmentin the first and second experiments

5 Conclusion and Future Works

This study proposes a new approach in the data representa-tion and classification to extract drug name entities containedin the sentences of medical text documents The suggestedapproach solves the problem of multiple tokens for a singleentity that remained unsolved in previous studies This studyalso introduces some techniques to tackle the absence ofspecific external knowledge Naturally the words containedin the sentence follow a certain sequence pattern that isthe current word is conditioned by other previous wordsBased on the sequence notion the treatment of medical textsentences which apply the sequence NN model gives betterresults In this study we presented three data representationtechniquesThe first and second techniques treat the sentenceas a nonsequence pattern which is evaluated with the non-sequential NN classifier (MLP DBN and SAE) whereas thethird technique treats the sentences as a sequence to providedata that is used as the input of the sequential NN classifierthat is LSTM The performance of the application of LSTMmodels for the sequence data representation with the averageF-score being 08645 rendered the best result compared tothe state of the art

Some opportunities to improve the performance of theproposed technique are still widely opened The first stepimprovement can be the incorporation of additional hand-crafted features such as the words position the use of capitalcase at the beginning of the word and the type of characteras also used in the previous studies [16 37] As presented inthe MLP experiments for drug label document the proposedmethods achieved excellent performance when applied to themore structured text Thus the effort to make the sentenceof the dataset that is DrugBank and MedLine to be morestructured can also be elaborated Regarding the LSTMmodel and the sequence data representation for the sentencesof medical text our future study will tackle the multipleentity extractions such as drug group drug brand and drugcompounds Another task that is potential to be solved with

Computational Intelligence and Neuroscience 15

the LSTMmodel is the drug-drug interaction extraction Ourexperiments also utilize the Euclidean distance measure inaddition to the word2vec features Such addition gives a goodF-score performance The significance of embedding theEuclidean distance features however needs to be exploredfurther

Competing Interests

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work is supported by Higher Education Science andTechnology Development Grant funded by Indonesia Min-istry of Research and Higher Education Contract no1004UN2R12HKP05002016

References

[1] M Sadikin and I Wasito ldquoTranslation and classification algo-rithm of FDA-Drugs to DOEN2011 class therapy to estimatedrug-drug interactionrdquo in Proceedings of the The 2nd Interna-tional Conference on Information Systems for Business Competi-tiveness (ICISBC rsquo13) pp 1ndash5 Semarang Indonesia 2013

[2] H Tang and J Ye ldquoA survey for information extractionmethodrdquoTech Rep 2007

[3] S Zhang and N Elhadad ldquoUnsupervised biomedical namedentity recognition experiments with clinical and biologicaltextsrdquo Journal of Biomedical Informatics vol 46 no 6 pp 1088ndash1098 2013

[4] I Korkontzelos D Piliouras A W Dowsey and S AnaniadouldquoBoosting drug named entity recognition using an aggregateclassifierrdquo Artificial Intelligence in Medicine vol 65 no 2 pp145ndash153 2015

[5] M Herrero-Zazo I Segura-Bedmar P Martınez and TDeclerck ldquoThe DDI corpus an annotated corpus with phar-macological substances and drug-drug interactionsrdquo Journal ofBiomedical Informatics vol 46 no 5 pp 914ndash920 2013

[6] H Sampathkumar X-W Chen and B Luo ldquoMining adversedrug reactions from online healthcare forums using HiddenMarkovModelrdquoBMCMedical Informatics andDecisionMakingvol 14 article 91 2014

[7] I Segura-Bedmar P Martınez and M Segura-Bedmar ldquoDrugname recognition and classification in biomedical texts A casestudy outlining approaches underpinning automated systemsrdquoDrug Discovery Today vol 13 no 17-18 pp 816ndash823 2008

[8] S Keretna C P Lim D Creighton and K B ShabanldquoEnhancingmedical named entity recognitionwith an extendedsegment representation techniquerdquo Computer Methods andPrograms in Bimoedicine vol 119 no 2 pp 88ndash100 2015

[9] G Pal and S Gosal ldquoA survey of biological entity recognitionapproachesrdquo International Journal on Recent and InnovationTrends in Computing and Communication vol 3 no 9 2015

[10] S Liu B Tang Q Chen andXWang ldquoDrug name recognitionapproaches and resourcesrdquo Information vol 6 no 4 pp 790ndash810 2015

[11] T Grego and F M Couto ldquoLASIGE using conditional randomfields and ChEBI ontologyrdquo in Proceedings of the 7th Interna-tional Workshop on Semantic Evaluation (SemEval rsquo13) vol 2pp 660ndash666 2013

[12] J Bjorne S Kaewphan and T Salakoski ldquoUTurku drug namedentity recognition and drug-drug interaction extraction usingSVM classification and domain knowledgerdquo in Proceedingsof the 2nd Joint Conferernce on Lexical and ComputationalSemantic vol 2 pp 651ndash659 Atlanta Ga USA 2013



Table 16: The F-score performance compared to the state of the art.

Approach                                                                  F-score   Remark
The Best of SemEval 2013 [15]                                             0.7150    -
[11]                                                                      0.5700    With external knowledge (ChEBI)
[16] + Wiki                                                               0.7200    With external knowledge (DINTO)
[14]                                                                      0.7200    Additional feature (BIO)
[12]                                                                      0.6000    Single token only
Ours: MLP, sentence sequence + Wiki (average)                             0.6580    Without external knowledge
Ours: DBN, sentence sequence + Wiki (average)                             0.6430    Without external knowledge
Ours: SAE, sentence sequence + Wiki (average)                             0.6480    Without external knowledge
Ours: LSTM, all sentence sequence + Wiki + Euclidean distance (average)   0.8645    Without external knowledge

Table 17: The best performance of 10 executions on the drug label corpus.

Iteration   Precision   Recall    F-score
1           0.9170      0.9667    0.9412
2           0.8849      0.9157    0.9000
3           0.9134      0.9619    0.9370
4           0.9298      0.9500    0.9398
5           0.9640      0.9570    0.9605
6           0.8857      0.9514    0.9178
7           0.9489      0.9689    0.9588
8           0.9622      0.9654    0.9638
9           0.9507      0.9601    0.9554
10          0.9516      0.9625    0.9570
Average     0.9308      0.9560    0.9431
Min         0.8849      0.9157    0.9000
Max         0.9640      0.9689    0.9638

In this experiment, we perform a ten-run cross-validation scenario by randomly selecting 80% of the data for training and 20% for testing.

The experimental results for the drug label dataset show that all of the candidate selection scenarios provide an excellent F-score (above 0.9). This excellent performance is probably due to the more structured sentences in those texts. The best results of the ten experiments are presented in Table 17.
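
The evaluation protocol behind Table 17 can be summarized in a short sketch. The following Python fragment is only an illustration of the loop described above (ten runs, a random 80%/20% split, and averaged precision, recall, and F-score); the `evaluate` function is a hypothetical placeholder for training and scoring whichever classifier is under test.

```python
import random

def evaluate(train_set, test_set):
    """Hypothetical placeholder: train a classifier on train_set, score it on
    test_set, and return (precision, recall, f_score)."""
    raise NotImplementedError

def repeated_holdout(samples, runs=10, train_ratio=0.8, seed=42):
    """Repeat a random 80/20 split `runs` times and average the scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    averages = [sum(run[i] for run in scores) / len(scores) for i in range(3)]
    return {"precision": averages[0], "recall": averages[1], "f_score": averages[2]}
```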

4.4. Choosing the Best Scenario. In the first and second experiments, we studied various experiment scenarios that involve three investigated parameters: the additional Wiki source, the data representation techniques, and the drug target candidate selection. In general, the Wiki addition contributes to improving the F-score performance. The additional source in word2vec training enhances the quality of the resulting word2vec model. Through the addition of common words from Wiki, the difference between the common words and the uncommon words, that is, drug names, becomes greater (better distinguishing power).
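
As an illustration of this training setup, the sketch below trains word2vec on medical sentences together with added Wiki sentences using gensim (4.x API). The toy corpora, the vector size, and the other hyperparameters are assumptions for demonstration, not the exact configuration used in our experiments.

```python
from gensim.models import Word2Vec

# Toy stand-ins for the two corpora; in practice these would be the tokenized
# DrugBank/MedLine sentences and the added Wikipedia (common-word) sentences.
medical_sentences = [
    ["the", "interaction", "of", "warfarin", "and", "aspirin", "was", "studied"],
    ["cimetidine", "may", "increase", "the", "serum", "concentration"],
]
wiki_sentences = [
    ["the", "city", "was", "founded", "in", "the", "nineteenth", "century"],
    ["a", "river", "flows", "through", "the", "valley"],
]

# Train on the combined corpus; adding common Wiki words is intended to pull
# ordinary words together so that rare drug tokens stand out more clearly.
model = Word2Vec(
    sentences=medical_sentences + wiki_sentences,
    vector_size=100,   # embedding dimension (assumed setting)
    window=5,
    min_count=1,
    sg=1,              # skip-gram
    epochs=50,
)

print(model.wv.similarity("warfarin", "aspirin"))
```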

One problem in mining drug name entities from medical text is the imbalanced quantity between drug tokens and other tokens [15]. Moreover, the targeted drug entities are only a small part of the total tokens; thus, the majority of tokens are noise. In dealing with this problem, the second and third candidate selection scenarios show their contribution to reducing the quantity of noise. Since the possibility of extracting the noise is reduced, the recall and F-score values increase as well, as shown in the results of the first and second experiments.
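
Purely as a hedged illustration of how a distance-based filter can discard common-word noise before classification (not the exact selection rule used in our experiments), the sketch below keeps only tokens whose embedding lies far from the centroid of frequent common words. The threshold and the choice of centroid are assumptions.

```python
import numpy as np

def select_candidates(tokens, wv, common_words, threshold=3.0):
    """Keep tokens whose embedding lies far from the centroid of common words.

    `wv` maps a token to its embedding vector, `common_words` is a list of
    frequent non-drug words, and `threshold` is an assumed cut-off; tokens
    missing from the vocabulary are simply skipped.
    """
    centroid = np.mean([wv[w] for w in common_words if w in wv], axis=0)
    candidates = []
    for tok in tokens:
        if tok not in wv:
            continue
        if np.linalg.norm(wv[tok] - centroid) > threshold:
            candidates.append(tok)
    return candidates
```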

The third experiment, which uses the LSTM model, does not apply the candidate selection scenario because the input dataset is treated as a sentence sequence. Consequently, the input dataset cannot be randomly divided (selected) in the way the tuples are treated in the first and second experiments.
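
A minimal sketch of this sentence-as-sequence representation is shown below: each sentence becomes a matrix of word2vec vectors with one drug/non-drug label per token, keeping the token order intact for a sequential classifier. The zero-vector fallback for out-of-vocabulary tokens is an assumed choice.

```python
import numpy as np

def sentence_to_sequence(tokens, labels, wv, dim=100):
    """Represent one sentence as a (T, dim) word-vector matrix plus a label per token.

    `labels[i]` is 1 if tokens[i] is part of a drug name and 0 otherwise;
    unknown tokens fall back to a zero vector (an assumption for illustration).
    """
    vectors = [wv[t] if t in wv else np.zeros(dim) for t in tokens]
    return np.stack(vectors), np.asarray(labels, dtype=np.int64)
```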

5. Conclusion and Future Works

This study proposes a new approach in data representation and classification to extract drug name entities contained in the sentences of medical text documents. The suggested approach solves the problem of multiple tokens for a single entity that remained unsolved in previous studies. This study also introduces some techniques to tackle the absence of specific external knowledge. Naturally, the words contained in a sentence follow a certain sequence pattern; that is, the current word is conditioned by the previous words. Based on this sequence notion, the treatment of medical text sentences with a sequence NN model gives better results. In this study we presented three data representation techniques. The first and second techniques treat the sentence as a nonsequence pattern, which is evaluated with the nonsequential NN classifiers (MLP, DBN, and SAE), whereas the third technique treats the sentences as a sequence to provide the input of the sequential NN classifier, that is, LSTM. The application of the LSTM model to the sequence data representation, with an average F-score of 0.8645, rendered the best result compared to the state of the art.
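
Our experiments used the JANNLab framework [36]; the PyTorch sketch below is only an equivalent-in-spirit illustration of a per-token LSTM classifier over such sentence sequences, with assumed layer sizes rather than the configuration reported here.

```python
import torch
import torch.nn as nn

class DrugTokenLSTM(nn.Module):
    """Illustrative per-token classifier: word-vector sequence in, drug/other label out."""

    def __init__(self, input_dim=100, hidden_dim=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):             # x: (batch, seq_len, input_dim)
        out, _ = self.lstm(x)         # out: (batch, seq_len, hidden_dim)
        return self.classifier(out)   # logits per token

# Minimal usage with random data standing in for word2vec sentence sequences.
model = DrugTokenLSTM()
dummy = torch.randn(4, 12, 100)                     # 4 sentences, 12 tokens each
logits = model(dummy)                               # (4, 12, 2)
targets = torch.zeros(4 * 12, dtype=torch.long)     # dummy per-token labels
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 2), targets)
```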

Some opportunities to improve the performance of the proposed technique are still wide open. The first improvement can be the incorporation of additional handcrafted features, such as the word position, the use of a capital letter at the beginning of the word, and the character type, as also used in previous studies [16, 37]. As presented in the MLP experiments for the drug label document, the proposed methods achieved excellent performance when applied to more structured text. Thus, the effort to make the sentences of the datasets, that is, DrugBank and MedLine, more structured can also be elaborated. Regarding the LSTM model and the sequence data representation for the sentences of medical text, our future study will tackle multiple entity extractions, such as drug group, drug brand, and drug compounds.


Another task that can potentially be solved with the LSTM model is drug-drug interaction extraction. Our experiments also utilize the Euclidean distance measure in addition to the word2vec features. This addition gives a good F-score performance. The significance of embedding the Euclidean distance features, however, needs to be explored further.
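
As a rough illustration of this feature combination, the sketch below appends Euclidean distances to a set of reference vectors (for example, centroids of known drug words and of common words, which are assumptions here) to each token's word2vec vector.

```python
import numpy as np

def add_distance_features(token_vec, reference_vecs):
    """Append Euclidean distances to a set of reference vectors.

    `reference_vecs` is assumed to be, e.g., [drug_centroid, common_centroid];
    the result concatenates the original embedding with one distance per reference.
    """
    distances = [np.linalg.norm(token_vec - ref) for ref in reference_vecs]
    return np.concatenate([token_vec, np.asarray(distances)])
```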

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Higher Education Science and Technology Development Grant funded by the Indonesia Ministry of Research and Higher Education, Contract no. 1004UN2R12HKP05002016.

References

[1] M. Sadikin and I. Wasito, "Translation and classification algorithm of FDA-Drugs to DOEN2011 class therapy to estimate drug-drug interaction," in Proceedings of the 2nd International Conference on Information Systems for Business Competitiveness (ICISBC '13), pp. 1–5, Semarang, Indonesia, 2013.

[2] H. Tang and J. Ye, "A survey for information extraction method," Tech. Rep., 2007.

[3] S. Zhang and N. Elhadad, "Unsupervised biomedical named entity recognition: experiments with clinical and biological texts," Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1088–1098, 2013.

[4] I. Korkontzelos, D. Piliouras, A. W. Dowsey, and S. Ananiadou, "Boosting drug named entity recognition using an aggregate classifier," Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 145–153, 2015.

[5] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, and T. Declerck, "The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 914–920, 2013.

[6] H. Sampathkumar, X.-W. Chen, and B. Luo, "Mining adverse drug reactions from online healthcare forums using Hidden Markov Model," BMC Medical Informatics and Decision Making, vol. 14, article 91, 2014.

[7] I. Segura-Bedmar, P. Martínez, and M. Segura-Bedmar, "Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems," Drug Discovery Today, vol. 13, no. 17-18, pp. 816–823, 2008.

[8] S. Keretna, C. P. Lim, D. Creighton, and K. B. Shaban, "Enhancing medical named entity recognition with an extended segment representation technique," Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.

[9] G. Pal and S. Gosal, "A survey of biological entity recognition approaches," International Journal on Recent and Innovation Trends in Computing and Communication, vol. 3, no. 9, 2015.

[10] S. Liu, B. Tang, Q. Chen, and X. Wang, "Drug name recognition: approaches and resources," Information, vol. 6, no. 4, pp. 790–810, 2015.

[11] T. Grego and F. M. Couto, "LASIGE: using conditional random fields and ChEBI ontology," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 660–666, 2013.

[12] J. Björne, S. Kaewphan, and T. Salakoski, "UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge," in Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, vol. 2, pp. 651–659, Atlanta, Ga, USA, 2013.

[13] T. Mikolov, G. Corrado, K. Chen, and J. Dean, "Efficient estimation of word representations in vector space," https://arxiv.org/abs/1301.3781.

[14] A. Ben Abacha, M. F. M. Chowdhury, A. Karanasiou, Y. Mrabet, A. Lavelli, and P. Zweigenbaum, "Text mining for pharmacovigilance: using machine learning for drug name recognition and drug-drug interaction extraction and classification," Journal of Biomedical Informatics, vol. 58, pp. 122–132, 2015.

[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, "SemEval-2013 Task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013)," in Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval '13), vol. 2, pp. 341–350, Association for Computational Linguistics, Atlanta, Ga, USA, 2013.

[16] I. Segura-Bedmar and P. Martínez, "Exploring word embedding for drug name recognition," in Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis, pp. 64–72, Lisbon, Portugal, September 2015.

[17] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, and H. Xu, "A study of active learning methods for named entity recognition in clinical text," Journal of Biomedical Informatics, vol. 58, pp. 11–18, 2015.

[18] M. Sadikin and I. Wasito, "Toward object interaction mining by starting with object extraction based on pattern learning method," in Proceedings of the Asia-Pacific Materials Science and Information Technology Conference (APMSIT '14), Shanghai, China, December 2014.

[19] Q.-C. Bui, P. M. A. Sloot, E. M. Van Mulligen, and J. A. Kors, "A novel feature-based approach to extract drug-drug interactions from biomedical text," Bioinformatics, vol. 30, no. 23, pp. 3365–3371, 2014.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data," 2012.

[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[23] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, Proceedings, pp. 14–36, Springer, Berlin, Germany, 2012.

[24] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1064–1071, 2008.

[25] G. E. Dahl, R. P. Adams, and H. Larochelle, "Training restricted Boltzmann machines on word observations," https://arxiv.org/abs/1202.5695.

[26] Y. Tang and I. Sutskever, Data Normalization in the Learning of Restricted Boltzmann Machines, UTML-TR-11, Department of Computer Science, University of Toronto, Toronto, Canada, 2011, http://www.cs.toronto.edu/~tang/papers/RbmZM.pdf.

[27] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[28] H. Chen and A. F. Murray, "Continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.

[29] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems (NIPS) 17, pp. 1481–1488, 2005.

[30] A. Ng, Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/.

[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[32] J. Hammerton, "Named entity recognition with long short-term memory," in Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL (CONLL '03), vol. 4, pp. 172–175, 2003.

[33] C. Olah, "Understanding LSTM Networks," 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

[34] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN '00), vol. 3, p. 6, Como, Italy, July 2000.

[35] S. Hochreiter, "LSTM can solve hard long time lag problems," in Advances in Neural Information Processing Systems, Neural Information Processing Systems (NIPS), Denver, Colo, USA, 1996.

[36] S. Otte, D. Krechel, and M. Liwicki, "JANNLab neural network framework for Java," in Proceedings of the Poster Proceedings Conference (MLDM '13), pp. 39–46, IBAI, New York, NY, USA, 2013.

[37] R. Boyce and G. Gardner, "Using natural language processing to identify pharmacokinetic drug-drug interactions described in drug package inserts," in Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP '12), pp. 206–213, BioNLP, Montreal, Canada, 2012.
