Research Article Metaheuristic Algorithms for Convolution Neural...

14
Research Article Metaheuristic Algorithms for Convolution Neural Network L. M. Rasdi Rere, 1,2 Mohamad Ivan Fanany, 1 and Aniati Murni Arymurthy 1 1 Machine Learning and Computer Vision Laboratory, Faculty of Computer Science, Universitas Indonesia, Depok 16424, Indonesia 2 Computer System Laboratory, STMIK Jakarta STI&K, Jakarta 12140, Indonesia Correspondence should be addressed to L. M. Rasdi Rere; [email protected] Received 29 January 2016; Revised 15 April 2016; Accepted 10 May 2016 Academic Editor: Martin Hagan Copyright © 2016 L. M. Rasdi Rere et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. A typical modern optimization technique is usually either heuristic or metaheuristic. is technique has managed to solve some optimization problems in the research area of science, engineering, and industry. However, implementation strategy of metaheuristic for accuracy improvement on convolution neural networks (CNN), a famous deep learning method, is still rarely investigated. Deep learning relates to a type of machine learning technique, where its aim is to move closer to the goal of artificial intelligence of creating a machine that could successfully perform any intellectual tasks that can be carried out by a human. In this paper, we propose the implementation strategy of three popular metaheuristic approaches, that is, simulated annealing, differential evolution, and harmony search, to optimize CNN. e performances of these metaheuristic methods in optimizing CNN on classifying MNIST and CIFAR dataset were evaluated and compared. Furthermore, the proposed methods are also compared with the original CNN. Although the proposed methods show an increase in the computation time, their accuracy has also been improved (up to 7.14 percent). 1. Introduction Deep learning (DL) is mainly motivated by the research of artificial intelligent, in which the general goal is to imitate the ability of human brain to observe, analyze, learn, and make a decision, especially for complex problem [1]. is technique is in the intersection amongst the research area of signal processing, neural network, graphical modeling, opti- mization, and pattern recognition. e current reputation of DL is implicitly due to drastically improve the abilities of chip processing, significantly decrease the cost of computing hardware, and advanced research in machine learning and signal processing [2]. In general, the model of DL technique can be classi- fied into discriminative, generative, and hybrid models [2]. Discriminative models, for instance, are CNN, deep neural network, and recurrent neural network. Some examples of generative models are deep belief networks (DBN), restricted Boltzmann machine, regularized autoencoders, and deep Boltzmann machines. On the other hand, hybrid model refers to the deep architecture using the combination of a discriminative and generative model. An example of this model is DBN to pretrain deep CNN, which can improve the performance of deep CNN over random initialization. Among all of the hybrid DL techniques, metaheuristic opti- mization for training a CNN is the focus of this paper. Although the sound character of DL has to solve a variety of learning tasks, training is difficult [3–5]. Some examples of successful methods for training DL are stochastic gradient descent, conjugate gradient, Hessian-free optimization, and Krylov subspace descent. Stochastic gradient descent is easy to implement and also fast in the process for a case with many training samples. However, this method needs several manual tuning scheme to make its parameters optimal, and also its process is princi- pally sequential; as a result, it was difficult to parallelize them with graphics processing unit (GPU). Conjugate gradient, on the other hand, is easier to check for convergence as well as more stable to train. Nevertheless, CG is slow, so that it needs multiple CPUs and availability of a vast number of RAMs [6]. Hessian-free optimization has been applied to train deep autoencoders [7], proficient in handling underfitting problem, and more efficient than pretraining + fine tuning Hindawi Publishing Corporation Computational Intelligence and Neuroscience Volume 2016, Article ID 1537325, 13 pages http://dx.doi.org/10.1155/2016/1537325

Transcript of Research Article Metaheuristic Algorithms for Convolution Neural...

Page 1: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Research ArticleMetaheuristic Algorithms for Convolution Neural Network

L M Rasdi Rere12 Mohamad Ivan Fanany1 and Aniati Murni Arymurthy1

1Machine Learning and Computer Vision Laboratory Faculty of Computer Science Universitas Indonesia Depok 16424 Indonesia2Computer System Laboratory STMIK Jakarta STIampK Jakarta 12140 Indonesia

Correspondence should be addressed to L M Rasdi Rere laodemohammaduiacid

Received 29 January 2016 Revised 15 April 2016 Accepted 10 May 2016

Academic Editor Martin Hagan

Copyright copy 2016 L M Rasdi Rere et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

A typical modern optimization technique is usually either heuristic or metaheuristic This technique has managed to solvesome optimization problems in the research area of science engineering and industry However implementation strategy ofmetaheuristic for accuracy improvement on convolution neural networks (CNN) a famous deep learning method is still rarelyinvestigated Deep learning relates to a type of machine learning technique where its aim is to move closer to the goal of artificialintelligence of creating a machine that could successfully perform any intellectual tasks that can be carried out by a human In thispaper we propose the implementation strategy of three popular metaheuristic approaches that is simulated annealing differentialevolution and harmony search to optimize CNN The performances of these metaheuristic methods in optimizing CNN onclassifying MNIST and CIFAR dataset were evaluated and compared Furthermore the proposed methods are also comparedwith the original CNN Although the proposed methods show an increase in the computation time their accuracy has also beenimproved (up to 714 percent)

1 Introduction

Deep learning (DL) is mainly motivated by the research ofartificial intelligent in which the general goal is to imitatethe ability of human brain to observe analyze learn andmake a decision especially for complex problem [1] Thistechnique is in the intersection amongst the research area ofsignal processing neural network graphical modeling opti-mization and pattern recognition The current reputation ofDL is implicitly due to drastically improve the abilities ofchip processing significantly decrease the cost of computinghardware and advanced research in machine learning andsignal processing [2]

In general the model of DL technique can be classi-fied into discriminative generative and hybrid models [2]Discriminative models for instance are CNN deep neuralnetwork and recurrent neural network Some examples ofgenerative models are deep belief networks (DBN) restrictedBoltzmann machine regularized autoencoders and deepBoltzmann machines On the other hand hybrid modelrefers to the deep architecture using the combination of adiscriminative and generative model An example of this

model is DBN to pretrain deep CNN which can improvethe performance of deep CNN over random initializationAmong all of the hybrid DL techniques metaheuristic opti-mization for training a CNN is the focus of this paper

Although the sound character of DL has to solve a varietyof learning tasks training is difficult [3ndash5] Some examplesof successful methods for training DL are stochastic gradientdescent conjugate gradient Hessian-free optimization andKrylov subspace descent

Stochastic gradient descent is easy to implement and alsofast in the process for a case with many training samplesHowever this method needs several manual tuning schemeto make its parameters optimal and also its process is princi-pally sequential as a result it was difficult to parallelize themwith graphics processing unit (GPU) Conjugate gradient onthe other hand is easier to check for convergence as well asmore stable to train Nevertheless CG is slow so that it needsmultiple CPUs and availability of a vast number of RAMs [6]

Hessian-free optimization has been applied to traindeep autoencoders [7] proficient in handling underfittingproblem and more efficient than pretraining + fine tuning

Hindawi Publishing CorporationComputational Intelligence and NeuroscienceVolume 2016 Article ID 1537325 13 pageshttpdxdoiorg10115520161537325

2 Computational Intelligence and Neuroscience

proposed by Hinton and Salakhutdinov [8] On the otherside Krylov subspace descent is more robust and simplerthanHessian-free optimization as well as looks like it is betterfor the classification performance and optimization speedHowever Krylov subspace descent needs more memory thanHessian-free optimization [9]

In fact techniques of modern optimization are heuristicor metaheuristic These optimization techniques have beenapplied to solve any optimization problems in the researcharea of science engineering and even industry [10]Howeverresearch on metaheuristic to optimize DL method is rarelyconducted Onework is the combination of genetic algorithm(GA) and CNN proposed by You and Pu [11] Their modelselects the CNN characteristic by the process of recombina-tion and mutation on GA in which the model of CNN existsas individual in the algorithm of GA Besides in recombina-tion process only the layers weights and threshold value ofC1 (convolution in the first layer) and C3 (convolution in thethird layer) are changed inCNNmodel Another work is fine-tuning CNN using harmony search (HS) by Rosa et al [12]

In this paper we compared the performance of threemetaheuristic algorithms that is simulated annealing (SA)differential evolution (DE) andHS for optimizing CNNThestrategy employed is looking for the best value of the fitnessfunction on the last layer usingmetaheuristic algorithm thenthe results will be used again to calculate the weights andbiases in the previous layer In the case of testing the per-formance of the proposed methods we use MNIST (MixedNational Institute of Standards and Technology) datasetThis dataset comprises images of digital handwritten digitscontaining 60000 training data items and 10000 testing dataitems All of the images have been centered and standardizedwith the size of 28 times 28 pixels Each pixel of the image isrepresented by 0 for black and 255 for white and in betweenare different shades of gray [13]

This paper is organized as follows Section 1 is an intro-duction Section 2 explains the used metaheuristic algo-rithms Section 3 describes the convolution neural networksSection 4 gives a description of the proposed methodsSection 5 presents the result of simulation and Section 6 isthe conclusion

2 Metaheuristic Algorithms

Metaheuristic is well known as an efficient method for hardoptimization problems that is the problems that cannotbe solved optimally using deterministic approach within areasonable time limit Metaheuristic methods work for threemain purposes for fast solving of problem for solving largeproblems and for making a more robust algorithm Thesemethods are also simple to design as well as flexible and easyto implement [14]

In general metaheuristic algorithms use the combinationof rules and randomization to duplicate the phenomena ofnature The biological system imitations of metaheuristicalgorithm for instance are evolution strategy GA and DEPhenomena of ethology for examples are particle swarmoptimization (PSO) bee colony optimization (BCO) bac-terial foraging optimization algorithms (BFOA) and ant

colony optimization (ACO) Phenomena of physics are SAmicrocanonical annealing and threshold accepting method[15] Another form of metaheuristic is inspired by musicphenomena such as HS algorithm [16]

Classification of metaheuristic algorithm can also bedivided into single-solution-based and population-basedmetaheuristic algorithm Some of the examples for single-solution-based metaheuristic are the noising method tabusearch SA TA and guided local search In the case of meta-heuristic based on population it can be classified into swarmintelligent and evolutionary computation The general termof swarm intelligent is inspired by the collective behaviorof social insect colonies or animal societies Examples ofthese algorithms are GP GA ES and DE On the other sidethe algorithm for evolutionary computation takes inspirationfrom the principles of Darwinian for developing adaptationinto their environment Some examples of these algorithmsare PSO BCO ACO and BFOA [15] Among all thesemetaheuristic algorithms SA DE and HS are used in thispaper

21 Simulated Annealing Algorithm SA is a technique ofrandom search for the problem of global optimization Itmimics the process of annealing in material processing [10]This technique was firstly proposed in 1983 by Kirkpatrick etal [17]

The principle idea of SA is using random search whichnot only allows changes that improve the fitness functionbut also maintains some changes that are not ideal Asan example in minimum optimization problem any betterchanges that decrease the fitness function value 119891(119909) will beaccepted but some changes that increase 119891(119909) will also beaccepted with a transition probability (119901) as follows

119901 = exp(minusΔ119864119896119879

) (1)

where Δ119864 is the energy level changes 119896 is Boltzmannrsquosconstant and 119879 is temperature for controlling the process ofannealingThis equation is based on the Boltzmann distribu-tion in physics [10] The following is standard procedure ofSA for optimization problems

(1) Generate the solution vector the initial solution vectoris randomly selected and then the fitness function iscalculated

(2) Initialize the temperature if the temperature value istoo high it will take a long time to reach convergencewhereas a too small value can cause the system tomissthe global optimum

(3) Select a new solution a new solution is randomlyselected from the neighborhood of the current solu-tion

(4) Evaluate a new solution a new solution is acceptedas a new current solution depending on its fitnessfunction

(5) Decrease the temperature during the search processthe temperature is periodically decreased

Computational Intelligence and Neuroscience 3

(6) Stop or repeat the computation is stopped when thetermination criterion is satisfied Otherwise steps (2)and (6) are repeated

22 Differential Evolution Algorithm Differential evolutionwas firstly proposed by Price and Storn in 1995 to solvethe Chebyshev polynomial problem [15] This algorithm iscreated based on individuals difference exploiting randomsearch in the space of solution and finally operates theprocedure of mutation crossover and selection to obtain thesuitable individual in system [18]

There are some types in DE including the classical formDErand1bin it indicates that in the process ofmutation thetarget vector is randomly selected and only a single differentvector is applied The acronym of bin shows that crossoverprocess is organized by a rule of binomial decision Theprocedure of DE algorithm is shown by the following steps

(1) Determining parameter setting population size is thenumber of individuals Mutation factor (F) controlsthe magnification of the two individual differences toavoid search stagnation Crossover rate (CR) decideshowmany consecutive genes of themutated vector arecopied to the offspring

(2) Initialization of population the population is pro-duced by randomly generating the vectors in thesuitable search range

(3) Evaluation of individual each individual is evaluatedby calculating their objective function

(4) Mutation operation mutation adds identical variableto one or more vector parameters In this operationthree auxiliary parents (1199091199011

119872 1199091199012

119872 1199091199013

119872) are selected

randomly in which they will participate in mutationoperation to create a mutated individual 119909mut

119872as

follows

119909mut119872= 1199091199011

119872+ F (1199091199012

119872minus 1199091199013

119872) (2)

where 1199011 1199012 1199013 isin 1 2 119873 and119873 = 1199011 = 1199012 =1199013

(5) Combination operation recombination (crossover) isapplied after mutation operation

(6) Selection operation this operation determineswhether the offspring in the next generation shouldbecome a member of the population or not

(7) Stopping criterion the current generation is substi-tuted by the new generation until the criterion oftermination is satisfied

23 Harmony Search Algorithm Harmony search algorithmis proposed by Geem et al in 2001 [19] This algorithm isinspired by the musical process of searching for a perfectstate of harmony Like harmony in music solution vector ofoptimization and improvisation from the musician are anal-ogous to structures of local and global search in optimizationtechniques

In improvisation of the music the players sound anypitch in the possible range together that can create onevector of harmony In the case of pitches creating a realharmony this experience is stored in the memory of eachplayer and they have the opportunity to create better harmonynext time [16] There are three possible alternatives whenone pitch is improvised by a musician any one pitch isplayed from herhis memory a nearby pitch is played fromherhis memory and an entirely random pitch is played withthe range of possible sound If these options are used foroptimization they have three equivalent components the useof harmony memory pitch adjusting and randomization InHS algorithm these rules are correlated with two relevantparameters that is harmony consideration rate (HMCR) andpitch adjusting rate (PAR) The procedure of HS algorithmcan be summarized into five steps as follows [16]

(1) Initialize the problem and parameters in this algo-rithm the problem can be maximum or minimumoptimization and the relevant parameters areHMCRPAR size of harmony memory and terminationcriterion

(2) Initialize harmony memory the harmony memory(HM) is usually initialized as a matrix that is createdrandomly as a vector of solution and arranged basedon the objective function

(3) Improve a new harmony a vector of new harmonyis produced from HM based on HMCR PAR andrandomization Selection of new value is based onHMCR parameter by range 0 through 1The vector ofnew harmony is observed to decide whether it shouldbe pitch-adjusted using PAR parameter The processof pitch adjusting is executed only after a value isselected from HM

(4) Update harmony memory the new harmony substi-tutes the worst harmony in terms of the value of thefitness function in which the fitness function of newharmony is better than worst harmony

(5) Repeat (3) and (4) until satisfying the terminationcriterion in the case of meeting the terminationcriterion the computation is ended Alternativelyprocess (3) and (4) are reiterated In the end thevector of the best HM is nominated and is reflectedas the best solution for the problem

3 Convolution Neural Network

CNN is a variant of the standard multilayer perceptron(MLP) A substantial advantage of this method especially forpattern recognition compared with conventional approachesis due to its capability in reducing the dimension of dataextracting the feature sequentially and classifying one struc-ture of network [20] The basic architecture model of CNNwas inspired in 1962 from visual cortex proposed by Hubeland Wiesel

In 1980 Fukushimas Neocognitron created the firstcomputation of this model and then in 1989 following theidea of Fukushima LeCun et al found the state-of-the-art

4 Computational Intelligence and Neuroscience

Input

ConvolutionsGaussian connection

Subsampling

C1 feature maps

Output10

Full connectionConvolutions Subsampling Full connection

S2 f maps C5 layer120 F6 layer

84814 times 1432 times 32

828 times 28

S4 f maps 165 times 5C3 f maps 1010 times 10

Figure 1 Architecture of CNN by LeCun et al (LeNet-5)

performance on a number of tasks for pattern recognitionusing error gradient method [21]

The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps

In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)

In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]

Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896

119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh

function as follows

119872119896

119894119895= tanh ((119882119896 times 119909)

119894119895+ 119887119896) (3)

The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899

119894is

the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation

119886119895 = tanh(120573 sum119873times119873

119886119899times119899

119894+ 119887) (4)

After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes

4 Design of Proposed Methods

The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset

In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper

119910 =

1

2

(

sum119873

119894=119873(119900 minus 119906)

2

119873

)

05

(5)

where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved

41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly

Computational Intelligence and Neuroscience 5

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)

if 119891(1199091015840) le 119891(119909) then119909 larr 119909

1015840else

119909 larr 1199091015840 with a transition probability (119901)

endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer

end

Algorithm 1 CNNSA

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894

119872in population 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor each of individual 119909119894

119872in population 119875119872 do

select auxiliary parents 1199091119872 1199092

119872 1199093

119872

create offspring 119909child119872

using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909

child119872 119909119894

119872)

end119872 = 119872 + 1update 119909 for all layer

end

Algorithm 2 CNNDE

Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN

Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod

42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion

Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of

one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551

Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method

43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory

In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN

6 Computational Intelligence and Neuroscience

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894

119872 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor number of search do

if rand ltHMCR then119909119894

new from HMelse

if rand lt PAR then119909119894

new = 119909119894

new + Δ119909

else119909119894

new = 119909119894

119872+ rand

endend

end119909119894

new = 119909min + rand(119909max minus 119909min)

end

Algorithm 3 CNNHS

Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006

(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN

Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod

5 Simulation and Results

In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2

In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the

values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)

As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]

All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well

Computational Intelligence and Neuroscience 7

Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436

Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004

Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768

Figure 2 Example of some images fromMNIST dataset

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

2 Computational Intelligence and Neuroscience

proposed by Hinton and Salakhutdinov [8] On the otherside Krylov subspace descent is more robust and simplerthanHessian-free optimization as well as looks like it is betterfor the classification performance and optimization speedHowever Krylov subspace descent needs more memory thanHessian-free optimization [9]

In fact techniques of modern optimization are heuristicor metaheuristic These optimization techniques have beenapplied to solve any optimization problems in the researcharea of science engineering and even industry [10]Howeverresearch on metaheuristic to optimize DL method is rarelyconducted Onework is the combination of genetic algorithm(GA) and CNN proposed by You and Pu [11] Their modelselects the CNN characteristic by the process of recombina-tion and mutation on GA in which the model of CNN existsas individual in the algorithm of GA Besides in recombina-tion process only the layers weights and threshold value ofC1 (convolution in the first layer) and C3 (convolution in thethird layer) are changed inCNNmodel Another work is fine-tuning CNN using harmony search (HS) by Rosa et al [12]

In this paper we compared the performance of threemetaheuristic algorithms that is simulated annealing (SA)differential evolution (DE) andHS for optimizing CNNThestrategy employed is looking for the best value of the fitnessfunction on the last layer usingmetaheuristic algorithm thenthe results will be used again to calculate the weights andbiases in the previous layer In the case of testing the per-formance of the proposed methods we use MNIST (MixedNational Institute of Standards and Technology) datasetThis dataset comprises images of digital handwritten digitscontaining 60000 training data items and 10000 testing dataitems All of the images have been centered and standardizedwith the size of 28 times 28 pixels Each pixel of the image isrepresented by 0 for black and 255 for white and in betweenare different shades of gray [13]

This paper is organized as follows Section 1 is an intro-duction Section 2 explains the used metaheuristic algo-rithms Section 3 describes the convolution neural networksSection 4 gives a description of the proposed methodsSection 5 presents the result of simulation and Section 6 isthe conclusion

2 Metaheuristic Algorithms

Metaheuristic is well known as an efficient method for hardoptimization problems that is the problems that cannotbe solved optimally using deterministic approach within areasonable time limit Metaheuristic methods work for threemain purposes for fast solving of problem for solving largeproblems and for making a more robust algorithm Thesemethods are also simple to design as well as flexible and easyto implement [14]

In general metaheuristic algorithms use the combinationof rules and randomization to duplicate the phenomena ofnature The biological system imitations of metaheuristicalgorithm for instance are evolution strategy GA and DEPhenomena of ethology for examples are particle swarmoptimization (PSO) bee colony optimization (BCO) bac-terial foraging optimization algorithms (BFOA) and ant

colony optimization (ACO) Phenomena of physics are SAmicrocanonical annealing and threshold accepting method[15] Another form of metaheuristic is inspired by musicphenomena such as HS algorithm [16]

Classification of metaheuristic algorithm can also bedivided into single-solution-based and population-basedmetaheuristic algorithm Some of the examples for single-solution-based metaheuristic are the noising method tabusearch SA TA and guided local search In the case of meta-heuristic based on population it can be classified into swarmintelligent and evolutionary computation The general termof swarm intelligent is inspired by the collective behaviorof social insect colonies or animal societies Examples ofthese algorithms are GP GA ES and DE On the other sidethe algorithm for evolutionary computation takes inspirationfrom the principles of Darwinian for developing adaptationinto their environment Some examples of these algorithmsare PSO BCO ACO and BFOA [15] Among all thesemetaheuristic algorithms SA DE and HS are used in thispaper

21 Simulated Annealing Algorithm SA is a technique ofrandom search for the problem of global optimization Itmimics the process of annealing in material processing [10]This technique was firstly proposed in 1983 by Kirkpatrick etal [17]

The principle idea of SA is using random search whichnot only allows changes that improve the fitness functionbut also maintains some changes that are not ideal Asan example in minimum optimization problem any betterchanges that decrease the fitness function value 119891(119909) will beaccepted but some changes that increase 119891(119909) will also beaccepted with a transition probability (119901) as follows

119901 = exp(minusΔ119864119896119879

) (1)

where Δ119864 is the energy level changes 119896 is Boltzmannrsquosconstant and 119879 is temperature for controlling the process ofannealingThis equation is based on the Boltzmann distribu-tion in physics [10] The following is standard procedure ofSA for optimization problems

(1) Generate the solution vector the initial solution vectoris randomly selected and then the fitness function iscalculated

(2) Initialize the temperature if the temperature value istoo high it will take a long time to reach convergencewhereas a too small value can cause the system tomissthe global optimum

(3) Select a new solution a new solution is randomlyselected from the neighborhood of the current solu-tion

(4) Evaluate a new solution a new solution is acceptedas a new current solution depending on its fitnessfunction

(5) Decrease the temperature during the search processthe temperature is periodically decreased

Computational Intelligence and Neuroscience 3

(6) Stop or repeat the computation is stopped when thetermination criterion is satisfied Otherwise steps (2)and (6) are repeated

22 Differential Evolution Algorithm Differential evolutionwas firstly proposed by Price and Storn in 1995 to solvethe Chebyshev polynomial problem [15] This algorithm iscreated based on individuals difference exploiting randomsearch in the space of solution and finally operates theprocedure of mutation crossover and selection to obtain thesuitable individual in system [18]

There are some types in DE including the classical formDErand1bin it indicates that in the process ofmutation thetarget vector is randomly selected and only a single differentvector is applied The acronym of bin shows that crossoverprocess is organized by a rule of binomial decision Theprocedure of DE algorithm is shown by the following steps

(1) Determining parameter setting population size is thenumber of individuals Mutation factor (F) controlsthe magnification of the two individual differences toavoid search stagnation Crossover rate (CR) decideshowmany consecutive genes of themutated vector arecopied to the offspring

(2) Initialization of population the population is pro-duced by randomly generating the vectors in thesuitable search range

(3) Evaluation of individual each individual is evaluatedby calculating their objective function

(4) Mutation operation mutation adds identical variableto one or more vector parameters In this operationthree auxiliary parents (1199091199011

119872 1199091199012

119872 1199091199013

119872) are selected

randomly in which they will participate in mutationoperation to create a mutated individual 119909mut

119872as

follows

119909mut119872= 1199091199011

119872+ F (1199091199012

119872minus 1199091199013

119872) (2)

where 1199011 1199012 1199013 isin 1 2 119873 and119873 = 1199011 = 1199012 =1199013

(5) Combination operation recombination (crossover) isapplied after mutation operation

(6) Selection operation this operation determineswhether the offspring in the next generation shouldbecome a member of the population or not

(7) Stopping criterion the current generation is substi-tuted by the new generation until the criterion oftermination is satisfied

23 Harmony Search Algorithm Harmony search algorithmis proposed by Geem et al in 2001 [19] This algorithm isinspired by the musical process of searching for a perfectstate of harmony Like harmony in music solution vector ofoptimization and improvisation from the musician are anal-ogous to structures of local and global search in optimizationtechniques

In improvisation of the music the players sound anypitch in the possible range together that can create onevector of harmony In the case of pitches creating a realharmony this experience is stored in the memory of eachplayer and they have the opportunity to create better harmonynext time [16] There are three possible alternatives whenone pitch is improvised by a musician any one pitch isplayed from herhis memory a nearby pitch is played fromherhis memory and an entirely random pitch is played withthe range of possible sound If these options are used foroptimization they have three equivalent components the useof harmony memory pitch adjusting and randomization InHS algorithm these rules are correlated with two relevantparameters that is harmony consideration rate (HMCR) andpitch adjusting rate (PAR) The procedure of HS algorithmcan be summarized into five steps as follows [16]

(1) Initialize the problem and parameters in this algo-rithm the problem can be maximum or minimumoptimization and the relevant parameters areHMCRPAR size of harmony memory and terminationcriterion

(2) Initialize harmony memory the harmony memory(HM) is usually initialized as a matrix that is createdrandomly as a vector of solution and arranged basedon the objective function

(3) Improve a new harmony a vector of new harmonyis produced from HM based on HMCR PAR andrandomization Selection of new value is based onHMCR parameter by range 0 through 1The vector ofnew harmony is observed to decide whether it shouldbe pitch-adjusted using PAR parameter The processof pitch adjusting is executed only after a value isselected from HM

(4) Update harmony memory the new harmony substi-tutes the worst harmony in terms of the value of thefitness function in which the fitness function of newharmony is better than worst harmony

(5) Repeat (3) and (4) until satisfying the terminationcriterion in the case of meeting the terminationcriterion the computation is ended Alternativelyprocess (3) and (4) are reiterated In the end thevector of the best HM is nominated and is reflectedas the best solution for the problem

3 Convolution Neural Network

CNN is a variant of the standard multilayer perceptron(MLP) A substantial advantage of this method especially forpattern recognition compared with conventional approachesis due to its capability in reducing the dimension of dataextracting the feature sequentially and classifying one struc-ture of network [20] The basic architecture model of CNNwas inspired in 1962 from visual cortex proposed by Hubeland Wiesel

In 1980 Fukushimas Neocognitron created the firstcomputation of this model and then in 1989 following theidea of Fukushima LeCun et al found the state-of-the-art

4 Computational Intelligence and Neuroscience

Input

ConvolutionsGaussian connection

Subsampling

C1 feature maps

Output10

Full connectionConvolutions Subsampling Full connection

S2 f maps C5 layer120 F6 layer

84814 times 1432 times 32

828 times 28

S4 f maps 165 times 5C3 f maps 1010 times 10

Figure 1 Architecture of CNN by LeCun et al (LeNet-5)

performance on a number of tasks for pattern recognitionusing error gradient method [21]

The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps

In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)

In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]

Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896

119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh

function as follows

119872119896

119894119895= tanh ((119882119896 times 119909)

119894119895+ 119887119896) (3)

The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899

119894is

the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation

119886119895 = tanh(120573 sum119873times119873

119886119899times119899

119894+ 119887) (4)

After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes

4 Design of Proposed Methods

The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset

In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper

119910 =

1

2

(

sum119873

119894=119873(119900 minus 119906)

2

119873

)

05

(5)

where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved

41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly

Computational Intelligence and Neuroscience 5

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)

if 119891(1199091015840) le 119891(119909) then119909 larr 119909

1015840else

119909 larr 1199091015840 with a transition probability (119901)

endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer

end

Algorithm 1 CNNSA

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894

119872in population 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor each of individual 119909119894

119872in population 119875119872 do

select auxiliary parents 1199091119872 1199092

119872 1199093

119872

create offspring 119909child119872

using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909

child119872 119909119894

119872)

end119872 = 119872 + 1update 119909 for all layer

end

Algorithm 2 CNNDE

Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN

Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod

42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion

Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of

one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551

Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method

43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory

In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN

6 Computational Intelligence and Neuroscience

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894

119872 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor number of search do

if rand ltHMCR then119909119894

new from HMelse

if rand lt PAR then119909119894

new = 119909119894

new + Δ119909

else119909119894

new = 119909119894

119872+ rand

endend

end119909119894

new = 119909min + rand(119909max minus 119909min)

end

Algorithm 3 CNNHS

Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006

(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN

Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod

5 Simulation and Results

In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2

In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the

values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)

As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]

All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well

Computational Intelligence and Neuroscience 7

Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436

Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004

Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768

Figure 2 Example of some images fromMNIST dataset

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Computational Intelligence and Neuroscience 3

(6) Stop or repeat the computation is stopped when thetermination criterion is satisfied Otherwise steps (2)and (6) are repeated

22 Differential Evolution Algorithm Differential evolutionwas firstly proposed by Price and Storn in 1995 to solvethe Chebyshev polynomial problem [15] This algorithm iscreated based on individuals difference exploiting randomsearch in the space of solution and finally operates theprocedure of mutation crossover and selection to obtain thesuitable individual in system [18]

There are some types in DE including the classical formDErand1bin it indicates that in the process ofmutation thetarget vector is randomly selected and only a single differentvector is applied The acronym of bin shows that crossoverprocess is organized by a rule of binomial decision Theprocedure of DE algorithm is shown by the following steps

(1) Determining parameter setting population size is thenumber of individuals Mutation factor (F) controlsthe magnification of the two individual differences toavoid search stagnation Crossover rate (CR) decideshowmany consecutive genes of themutated vector arecopied to the offspring

(2) Initialization of population the population is pro-duced by randomly generating the vectors in thesuitable search range

(3) Evaluation of individual each individual is evaluatedby calculating their objective function

(4) Mutation operation mutation adds identical variableto one or more vector parameters In this operationthree auxiliary parents (1199091199011

119872 1199091199012

119872 1199091199013

119872) are selected

randomly in which they will participate in mutationoperation to create a mutated individual 119909mut

119872as

follows

119909mut119872= 1199091199011

119872+ F (1199091199012

119872minus 1199091199013

119872) (2)

where 1199011 1199012 1199013 isin 1 2 119873 and119873 = 1199011 = 1199012 =1199013

(5) Combination operation recombination (crossover) isapplied after mutation operation

(6) Selection operation this operation determineswhether the offspring in the next generation shouldbecome a member of the population or not

(7) Stopping criterion the current generation is substi-tuted by the new generation until the criterion oftermination is satisfied

23 Harmony Search Algorithm Harmony search algorithmis proposed by Geem et al in 2001 [19] This algorithm isinspired by the musical process of searching for a perfectstate of harmony Like harmony in music solution vector ofoptimization and improvisation from the musician are anal-ogous to structures of local and global search in optimizationtechniques

In improvisation of the music the players sound anypitch in the possible range together that can create onevector of harmony In the case of pitches creating a realharmony this experience is stored in the memory of eachplayer and they have the opportunity to create better harmonynext time [16] There are three possible alternatives whenone pitch is improvised by a musician any one pitch isplayed from herhis memory a nearby pitch is played fromherhis memory and an entirely random pitch is played withthe range of possible sound If these options are used foroptimization they have three equivalent components the useof harmony memory pitch adjusting and randomization InHS algorithm these rules are correlated with two relevantparameters that is harmony consideration rate (HMCR) andpitch adjusting rate (PAR) The procedure of HS algorithmcan be summarized into five steps as follows [16]

(1) Initialize the problem and parameters in this algo-rithm the problem can be maximum or minimumoptimization and the relevant parameters areHMCRPAR size of harmony memory and terminationcriterion

(2) Initialize harmony memory the harmony memory(HM) is usually initialized as a matrix that is createdrandomly as a vector of solution and arranged basedon the objective function

(3) Improve a new harmony a vector of new harmonyis produced from HM based on HMCR PAR andrandomization Selection of new value is based onHMCR parameter by range 0 through 1The vector ofnew harmony is observed to decide whether it shouldbe pitch-adjusted using PAR parameter The processof pitch adjusting is executed only after a value isselected from HM

(4) Update harmony memory the new harmony substi-tutes the worst harmony in terms of the value of thefitness function in which the fitness function of newharmony is better than worst harmony

(5) Repeat (3) and (4) until satisfying the terminationcriterion in the case of meeting the terminationcriterion the computation is ended Alternativelyprocess (3) and (4) are reiterated In the end thevector of the best HM is nominated and is reflectedas the best solution for the problem

3 Convolution Neural Network

CNN is a variant of the standard multilayer perceptron(MLP) A substantial advantage of this method especially forpattern recognition compared with conventional approachesis due to its capability in reducing the dimension of dataextracting the feature sequentially and classifying one struc-ture of network [20] The basic architecture model of CNNwas inspired in 1962 from visual cortex proposed by Hubeland Wiesel

In 1980 Fukushimas Neocognitron created the firstcomputation of this model and then in 1989 following theidea of Fukushima LeCun et al found the state-of-the-art

4 Computational Intelligence and Neuroscience

Input

ConvolutionsGaussian connection

Subsampling

C1 feature maps

Output10

Full connectionConvolutions Subsampling Full connection

S2 f maps C5 layer120 F6 layer

84814 times 1432 times 32

828 times 28

S4 f maps 165 times 5C3 f maps 1010 times 10

Figure 1 Architecture of CNN by LeCun et al (LeNet-5)

performance on a number of tasks for pattern recognitionusing error gradient method [21]

The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps

In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)

In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]

Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896

119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh

function as follows

119872119896

119894119895= tanh ((119882119896 times 119909)

119894119895+ 119887119896) (3)

The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899

119894is

the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation

119886119895 = tanh(120573 sum119873times119873

119886119899times119899

119894+ 119887) (4)

After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes

4 Design of Proposed Methods

The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset

In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper

119910 =

1

2

(

sum119873

119894=119873(119900 minus 119906)

2

119873

)

05

(5)

where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved

41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly

Computational Intelligence and Neuroscience 5

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)

if 119891(1199091015840) le 119891(119909) then119909 larr 119909

1015840else

119909 larr 1199091015840 with a transition probability (119901)

endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer

end

Algorithm 1 CNNSA

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894

119872in population 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor each of individual 119909119894

119872in population 119875119872 do

select auxiliary parents 1199091119872 1199092

119872 1199093

119872

create offspring 119909child119872

using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909

child119872 119909119894

119872)

end119872 = 119872 + 1update 119909 for all layer

end

Algorithm 2 CNNDE

Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN

Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod

42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion

Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of

one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551

Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method

43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory

In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN

6 Computational Intelligence and Neuroscience

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894

119872 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor number of search do

if rand ltHMCR then119909119894

new from HMelse

if rand lt PAR then119909119894

new = 119909119894

new + Δ119909

else119909119894

new = 119909119894

119872+ rand

endend

end119909119894

new = 119909min + rand(119909max minus 119909min)

end

Algorithm 3 CNNHS

Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006

(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN

Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod

5 Simulation and Results

In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2

In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the

values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)

As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]

All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well

Computational Intelligence and Neuroscience 7

Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436

Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004

Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768

Figure 2 Example of some images fromMNIST dataset

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

4 Computational Intelligence and Neuroscience

Input

ConvolutionsGaussian connection

Subsampling

C1 feature maps

Output10

Full connectionConvolutions Subsampling Full connection

S2 f maps C5 layer120 F6 layer

84814 times 1432 times 32

828 times 28

S4 f maps 165 times 5C3 f maps 1010 times 10

Figure 1 Architecture of CNN by LeCun et al (LeNet-5)

performance on a number of tasks for pattern recognitionusing error gradient method [21]

The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps

In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)

In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]

Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896

119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh

function as follows

119872119896

119894119895= tanh ((119882119896 times 119909)

119894119895+ 119887119896) (3)

The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899

119894is

the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation

119886119895 = tanh(120573 sum119873times119873

119886119899times119899

119894+ 119887) (4)

After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes

4 Design of Proposed Methods

The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset

In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper

119910 =

1

2

(

sum119873

119894=119873(119900 minus 119906)

2

119873

)

05

(5)

where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved

41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly

Computational Intelligence and Neuroscience 5

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)

if 119891(1199091015840) le 119891(119909) then119909 larr 119909

1015840else

119909 larr 1199091015840 with a transition probability (119901)

endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer

end

Algorithm 1 CNNSA

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894

119872in population 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor each of individual 119909119894

119872in population 119875119872 do

select auxiliary parents 1199091119872 1199092

119872 1199093

119872

create offspring 119909child119872

using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909

child119872 119909119894

119872)

end119872 = 119872 + 1update 119909 for all layer

end

Algorithm 2 CNNDE

Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN

Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod

42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion

Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of

one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551

Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method

43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory

In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN

6 Computational Intelligence and Neuroscience

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894

119872 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor number of search do

if rand ltHMCR then119909119894

new from HMelse

if rand lt PAR then119909119894

new = 119909119894

new + Δ119909

else119909119894

new = 119909119894

119872+ rand

endend

end119909119894

new = 119909min + rand(119909max minus 119909min)

end

Algorithm 3 CNNHS

Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006

(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN

Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod

5 Simulation and Results

In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2

In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the

values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)

As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]

All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well

Computational Intelligence and Neuroscience 7

Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436

Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004

Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768

Figure 2 Example of some images fromMNIST dataset

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Computational Intelligence and Neuroscience 5

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)

if 119891(1199091015840) le 119891(119909) then119909 larr 119909

1015840else

119909 larr 1199091015840 with a transition probability (119901)

endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer

end

Algorithm 1 CNNSA

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894

119872in population 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor each of individual 119909119894

119872in population 119875119872 do

select auxiliary parents 1199091119872 1199092

119872 1199093

119872

create offspring 119909child119872

using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909

child119872 119909119894

119872)

end119872 = 119872 + 1update 119909 for all layer

end

Algorithm 2 CNNDE

Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN

Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod

42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion

Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of

one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551

Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method

43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory

In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN

6 Computational Intelligence and Neuroscience

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894

119872 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor number of search do

if rand ltHMCR then119909119894

new from HMelse

if rand lt PAR then119909119894

new = 119909119894

new + Δ119909

else119909119894

new = 119909119894

119872+ rand

endend

end119909119894

new = 119909min + rand(119909max minus 119909min)

end

Algorithm 3 CNNHS

Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006

(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN

Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod

5 Simulation and Results

In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2

In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the

values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)

As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]

All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well

Computational Intelligence and Neuroscience 7

Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436

Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004

Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768

Figure 2 Example of some images fromMNIST dataset

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

6 Computational Intelligence and Neuroscience

Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894

119872 119908 and 119887 on the last layer

while termination criterion is not satisfied dofor number of search do

if rand ltHMCR then119909119894

new from HMelse

if rand lt PAR then119909119894

new = 119909119894

new + Δ119909

else119909119894

new = 119909119894

119872+ rand

endend

end119909119894

new = 119909min + rand(119909max minus 119909min)

end

Algorithm 3 CNNHS

Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006

(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN

Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod

5 Simulation and Results

In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2

In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the

values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)

As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]

All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well

Computational Intelligence and Neuroscience 7

Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436

Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004

Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768

Figure 2 Example of some images fromMNIST dataset

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Computational Intelligence and Neuroscience 7

Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436

Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation

1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004

Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c

Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation

1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768

Figure 2 Example of some images fromMNIST dataset

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

8 Computational Intelligence and Neuroscience

CNN CNNSACNNDE CNNHS

0

4

8

12

16

20Er

ror (

)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

02

04

06

08

1

Erro

r (

)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)

CNNCNNSA

CNNDECNNHS

0

300

600

900

1200

1500

1800

Tim

e (s)

2 3 4 5 6 7 8 9 101Number of epochs

(a)

CNNCNNSA

CNNDECNNHS

0

2

4

6

8

10Ti

me (

s)

2 3 4 5 6 7 8 9 101Number of epochs

(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

5

10

15

20

25

Erro

r (

)

(a)

CNNCNNSA

CNNDECNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

03

06

09

12

15

Erro

r (

)

(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Computational Intelligence and Neuroscience 9

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

minus200

200

600

1000

1400

1800

2200

2600

3000Ti

me (

s)

(a)

CNN CNNSACNNDE CNNHS

2 3 4 5 6 7 8 9 101Number of epochs

0

20

40

60

80

100

120

Tim

e (s)

(b)

Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)

CNN CNNSA

CNNHSCNNDE

0

05

1

15

Erro

r (

)

0005

01015

02025

03035

04045

05Er

ror (

)

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s) times10

4times10

4

times104

times104

1 2 3 4 5 6 70Time (s)

1 2 3 4 5 6 70Time (s)

0005

01015

02025

03035

04045

05

Erro

r (

)

0005

01015

02025

03035

04045

05

Erro

r (

)

Figure 7 Error versus computation time for 100 epochs

as Figure 6 for computational time and its standard devia-tion

The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484

for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)

The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

10 Computational Intelligence and Neuroscience

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck

Figure 8 Example of some images from CIFAR dataset

CNNSACNN

002040608

112141618

222242628

3

Obj

ectiv

e

0

002

004

006

008

01

012O

bjec

tive

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

ValobjTrainobj

ValobjTrainobj

Figure 9 CNN versus CNNSA for objective

CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)

In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000

images Some example images of this dataset are showed inFigure 8

The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Computational Intelligence and Neuroscience 11

CNNSACNN

0

0001

0002

0003

0004

0005

0006

0007

0008

0009

001

Erro

r (

)

0

002

004

006

008

01

012

014

016

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop1errorValtop1error

Traintop1errorValtop1error

Figure 10 CNN versus CNNSA for top-1 error

CNNSACNN

0

01

02

03

04

05

06

07

08

09

1

11

12

Erro

r (

)

0

0002

0004

0006

0008

001

0012

0014

0016

0018

002

Erro

r (

)

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs

Traintop5errorValtop5error

Traintop5errorValtop5error

Figure 11 CNN versus CNNSA for top-5 error

6 Conclusion

This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch

It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN

For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

12 Computational Intelligence and Neuroscience

Table 5 Comparison of CNN and CNNSA for train

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186

Table 6 Comparison of CNN and CNNSA for validation

Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error

1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095

recurrent neural network deep belief network and AlexNet(a newer variant of CNN)

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)

References

[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015

[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013

[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012

[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 13: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Computational Intelligence and Neuroscience 13

[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015

[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011

[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010

[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012

[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010

[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015

[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015

[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014

[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009

[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013

[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005

[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983

[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011

[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001

[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010

[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012

[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012

[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 14: Research Article Metaheuristic Algorithms for Convolution Neural …downloads.hindawi.com/journals/cin/2016/1537325.pdf · Deep learning (DL) is mainly motivated by the research of

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014