Research Article Metaheuristic Algorithms for Convolution Neural...
Transcript of Research Article Metaheuristic Algorithms for Convolution Neural...
Research ArticleMetaheuristic Algorithms for Convolution Neural Network
L M Rasdi Rere12 Mohamad Ivan Fanany1 and Aniati Murni Arymurthy1
1Machine Learning and Computer Vision Laboratory Faculty of Computer Science Universitas Indonesia Depok 16424 Indonesia2Computer System Laboratory STMIK Jakarta STIampK Jakarta 12140 Indonesia
Correspondence should be addressed to L M Rasdi Rere laodemohammaduiacid
Received 29 January 2016 Revised 15 April 2016 Accepted 10 May 2016
Academic Editor Martin Hagan
Copyright copy 2016 L M Rasdi Rere et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited
A typical modern optimization technique is usually either heuristic or metaheuristic This technique has managed to solvesome optimization problems in the research area of science engineering and industry However implementation strategy ofmetaheuristic for accuracy improvement on convolution neural networks (CNN) a famous deep learning method is still rarelyinvestigated Deep learning relates to a type of machine learning technique where its aim is to move closer to the goal of artificialintelligence of creating a machine that could successfully perform any intellectual tasks that can be carried out by a human In thispaper we propose the implementation strategy of three popular metaheuristic approaches that is simulated annealing differentialevolution and harmony search to optimize CNN The performances of these metaheuristic methods in optimizing CNN onclassifying MNIST and CIFAR dataset were evaluated and compared Furthermore the proposed methods are also comparedwith the original CNN Although the proposed methods show an increase in the computation time their accuracy has also beenimproved (up to 714 percent)
1 Introduction
Deep learning (DL) is mainly motivated by the research ofartificial intelligent in which the general goal is to imitatethe ability of human brain to observe analyze learn andmake a decision especially for complex problem [1] Thistechnique is in the intersection amongst the research area ofsignal processing neural network graphical modeling opti-mization and pattern recognition The current reputation ofDL is implicitly due to drastically improve the abilities ofchip processing significantly decrease the cost of computinghardware and advanced research in machine learning andsignal processing [2]
In general the model of DL technique can be classi-fied into discriminative generative and hybrid models [2]Discriminative models for instance are CNN deep neuralnetwork and recurrent neural network Some examples ofgenerative models are deep belief networks (DBN) restrictedBoltzmann machine regularized autoencoders and deepBoltzmann machines On the other hand hybrid modelrefers to the deep architecture using the combination of adiscriminative and generative model An example of this
model is DBN to pretrain deep CNN which can improvethe performance of deep CNN over random initializationAmong all of the hybrid DL techniques metaheuristic opti-mization for training a CNN is the focus of this paper
Although the sound character of DL has to solve a varietyof learning tasks training is difficult [3ndash5] Some examplesof successful methods for training DL are stochastic gradientdescent conjugate gradient Hessian-free optimization andKrylov subspace descent
Stochastic gradient descent is easy to implement and alsofast in the process for a case with many training samplesHowever this method needs several manual tuning schemeto make its parameters optimal and also its process is princi-pally sequential as a result it was difficult to parallelize themwith graphics processing unit (GPU) Conjugate gradient onthe other hand is easier to check for convergence as well asmore stable to train Nevertheless CG is slow so that it needsmultiple CPUs and availability of a vast number of RAMs [6]
Hessian-free optimization has been applied to traindeep autoencoders [7] proficient in handling underfittingproblem and more efficient than pretraining + fine tuning
Hindawi Publishing CorporationComputational Intelligence and NeuroscienceVolume 2016 Article ID 1537325 13 pageshttpdxdoiorg10115520161537325
2 Computational Intelligence and Neuroscience
proposed by Hinton and Salakhutdinov [8] On the otherside Krylov subspace descent is more robust and simplerthanHessian-free optimization as well as looks like it is betterfor the classification performance and optimization speedHowever Krylov subspace descent needs more memory thanHessian-free optimization [9]
In fact techniques of modern optimization are heuristicor metaheuristic These optimization techniques have beenapplied to solve any optimization problems in the researcharea of science engineering and even industry [10]Howeverresearch on metaheuristic to optimize DL method is rarelyconducted Onework is the combination of genetic algorithm(GA) and CNN proposed by You and Pu [11] Their modelselects the CNN characteristic by the process of recombina-tion and mutation on GA in which the model of CNN existsas individual in the algorithm of GA Besides in recombina-tion process only the layers weights and threshold value ofC1 (convolution in the first layer) and C3 (convolution in thethird layer) are changed inCNNmodel Another work is fine-tuning CNN using harmony search (HS) by Rosa et al [12]
In this paper we compared the performance of threemetaheuristic algorithms that is simulated annealing (SA)differential evolution (DE) andHS for optimizing CNNThestrategy employed is looking for the best value of the fitnessfunction on the last layer usingmetaheuristic algorithm thenthe results will be used again to calculate the weights andbiases in the previous layer In the case of testing the per-formance of the proposed methods we use MNIST (MixedNational Institute of Standards and Technology) datasetThis dataset comprises images of digital handwritten digitscontaining 60000 training data items and 10000 testing dataitems All of the images have been centered and standardizedwith the size of 28 times 28 pixels Each pixel of the image isrepresented by 0 for black and 255 for white and in betweenare different shades of gray [13]
This paper is organized as follows Section 1 is an intro-duction Section 2 explains the used metaheuristic algo-rithms Section 3 describes the convolution neural networksSection 4 gives a description of the proposed methodsSection 5 presents the result of simulation and Section 6 isthe conclusion
2 Metaheuristic Algorithms
Metaheuristic is well known as an efficient method for hardoptimization problems that is the problems that cannotbe solved optimally using deterministic approach within areasonable time limit Metaheuristic methods work for threemain purposes for fast solving of problem for solving largeproblems and for making a more robust algorithm Thesemethods are also simple to design as well as flexible and easyto implement [14]
In general metaheuristic algorithms use the combinationof rules and randomization to duplicate the phenomena ofnature The biological system imitations of metaheuristicalgorithm for instance are evolution strategy GA and DEPhenomena of ethology for examples are particle swarmoptimization (PSO) bee colony optimization (BCO) bac-terial foraging optimization algorithms (BFOA) and ant
colony optimization (ACO) Phenomena of physics are SAmicrocanonical annealing and threshold accepting method[15] Another form of metaheuristic is inspired by musicphenomena such as HS algorithm [16]
Classification of metaheuristic algorithm can also bedivided into single-solution-based and population-basedmetaheuristic algorithm Some of the examples for single-solution-based metaheuristic are the noising method tabusearch SA TA and guided local search In the case of meta-heuristic based on population it can be classified into swarmintelligent and evolutionary computation The general termof swarm intelligent is inspired by the collective behaviorof social insect colonies or animal societies Examples ofthese algorithms are GP GA ES and DE On the other sidethe algorithm for evolutionary computation takes inspirationfrom the principles of Darwinian for developing adaptationinto their environment Some examples of these algorithmsare PSO BCO ACO and BFOA [15] Among all thesemetaheuristic algorithms SA DE and HS are used in thispaper
21 Simulated Annealing Algorithm SA is a technique ofrandom search for the problem of global optimization Itmimics the process of annealing in material processing [10]This technique was firstly proposed in 1983 by Kirkpatrick etal [17]
The principle idea of SA is using random search whichnot only allows changes that improve the fitness functionbut also maintains some changes that are not ideal Asan example in minimum optimization problem any betterchanges that decrease the fitness function value 119891(119909) will beaccepted but some changes that increase 119891(119909) will also beaccepted with a transition probability (119901) as follows
119901 = exp(minusΔ119864119896119879
) (1)
where Δ119864 is the energy level changes 119896 is Boltzmannrsquosconstant and 119879 is temperature for controlling the process ofannealingThis equation is based on the Boltzmann distribu-tion in physics [10] The following is standard procedure ofSA for optimization problems
(1) Generate the solution vector the initial solution vectoris randomly selected and then the fitness function iscalculated
(2) Initialize the temperature if the temperature value istoo high it will take a long time to reach convergencewhereas a too small value can cause the system tomissthe global optimum
(3) Select a new solution a new solution is randomlyselected from the neighborhood of the current solu-tion
(4) Evaluate a new solution a new solution is acceptedas a new current solution depending on its fitnessfunction
(5) Decrease the temperature during the search processthe temperature is periodically decreased
Computational Intelligence and Neuroscience 3
(6) Stop or repeat the computation is stopped when thetermination criterion is satisfied Otherwise steps (2)and (6) are repeated
22 Differential Evolution Algorithm Differential evolutionwas firstly proposed by Price and Storn in 1995 to solvethe Chebyshev polynomial problem [15] This algorithm iscreated based on individuals difference exploiting randomsearch in the space of solution and finally operates theprocedure of mutation crossover and selection to obtain thesuitable individual in system [18]
There are some types in DE including the classical formDErand1bin it indicates that in the process ofmutation thetarget vector is randomly selected and only a single differentvector is applied The acronym of bin shows that crossoverprocess is organized by a rule of binomial decision Theprocedure of DE algorithm is shown by the following steps
(1) Determining parameter setting population size is thenumber of individuals Mutation factor (F) controlsthe magnification of the two individual differences toavoid search stagnation Crossover rate (CR) decideshowmany consecutive genes of themutated vector arecopied to the offspring
(2) Initialization of population the population is pro-duced by randomly generating the vectors in thesuitable search range
(3) Evaluation of individual each individual is evaluatedby calculating their objective function
(4) Mutation operation mutation adds identical variableto one or more vector parameters In this operationthree auxiliary parents (1199091199011
119872 1199091199012
119872 1199091199013
119872) are selected
randomly in which they will participate in mutationoperation to create a mutated individual 119909mut
119872as
follows
119909mut119872= 1199091199011
119872+ F (1199091199012
119872minus 1199091199013
119872) (2)
where 1199011 1199012 1199013 isin 1 2 119873 and119873 = 1199011 = 1199012 =1199013
(5) Combination operation recombination (crossover) isapplied after mutation operation
(6) Selection operation this operation determineswhether the offspring in the next generation shouldbecome a member of the population or not
(7) Stopping criterion the current generation is substi-tuted by the new generation until the criterion oftermination is satisfied
23 Harmony Search Algorithm Harmony search algorithmis proposed by Geem et al in 2001 [19] This algorithm isinspired by the musical process of searching for a perfectstate of harmony Like harmony in music solution vector ofoptimization and improvisation from the musician are anal-ogous to structures of local and global search in optimizationtechniques
In improvisation of the music the players sound anypitch in the possible range together that can create onevector of harmony In the case of pitches creating a realharmony this experience is stored in the memory of eachplayer and they have the opportunity to create better harmonynext time [16] There are three possible alternatives whenone pitch is improvised by a musician any one pitch isplayed from herhis memory a nearby pitch is played fromherhis memory and an entirely random pitch is played withthe range of possible sound If these options are used foroptimization they have three equivalent components the useof harmony memory pitch adjusting and randomization InHS algorithm these rules are correlated with two relevantparameters that is harmony consideration rate (HMCR) andpitch adjusting rate (PAR) The procedure of HS algorithmcan be summarized into five steps as follows [16]
(1) Initialize the problem and parameters in this algo-rithm the problem can be maximum or minimumoptimization and the relevant parameters areHMCRPAR size of harmony memory and terminationcriterion
(2) Initialize harmony memory the harmony memory(HM) is usually initialized as a matrix that is createdrandomly as a vector of solution and arranged basedon the objective function
(3) Improve a new harmony a vector of new harmonyis produced from HM based on HMCR PAR andrandomization Selection of new value is based onHMCR parameter by range 0 through 1The vector ofnew harmony is observed to decide whether it shouldbe pitch-adjusted using PAR parameter The processof pitch adjusting is executed only after a value isselected from HM
(4) Update harmony memory the new harmony substi-tutes the worst harmony in terms of the value of thefitness function in which the fitness function of newharmony is better than worst harmony
(5) Repeat (3) and (4) until satisfying the terminationcriterion in the case of meeting the terminationcriterion the computation is ended Alternativelyprocess (3) and (4) are reiterated In the end thevector of the best HM is nominated and is reflectedas the best solution for the problem
3 Convolution Neural Network
CNN is a variant of the standard multilayer perceptron(MLP) A substantial advantage of this method especially forpattern recognition compared with conventional approachesis due to its capability in reducing the dimension of dataextracting the feature sequentially and classifying one struc-ture of network [20] The basic architecture model of CNNwas inspired in 1962 from visual cortex proposed by Hubeland Wiesel
In 1980 Fukushimas Neocognitron created the firstcomputation of this model and then in 1989 following theidea of Fukushima LeCun et al found the state-of-the-art
4 Computational Intelligence and Neuroscience
Input
ConvolutionsGaussian connection
Subsampling
C1 feature maps
Output10
Full connectionConvolutions Subsampling Full connection
S2 f maps C5 layer120 F6 layer
84814 times 1432 times 32
828 times 28
S4 f maps 165 times 5C3 f maps 1010 times 10
Figure 1 Architecture of CNN by LeCun et al (LeNet-5)
performance on a number of tasks for pattern recognitionusing error gradient method [21]
The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps
In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)
In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]
Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896
119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh
function as follows
119872119896
119894119895= tanh ((119882119896 times 119909)
119894119895+ 119887119896) (3)
The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899
119894is
the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation
119886119895 = tanh(120573 sum119873times119873
119886119899times119899
119894+ 119887) (4)
After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes
4 Design of Proposed Methods
The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset
In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper
119910 =
1
2
(
sum119873
119894=119873(119900 minus 119906)
2
119873
)
05
(5)
where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved
41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly
Computational Intelligence and Neuroscience 5
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)
if 119891(1199091015840) le 119891(119909) then119909 larr 119909
1015840else
119909 larr 1199091015840 with a transition probability (119901)
endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer
end
Algorithm 1 CNNSA
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894
119872in population 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor each of individual 119909119894
119872in population 119875119872 do
select auxiliary parents 1199091119872 1199092
119872 1199093
119872
create offspring 119909child119872
using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909
child119872 119909119894
119872)
end119872 = 119872 + 1update 119909 for all layer
end
Algorithm 2 CNNDE
Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN
Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod
42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion
Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of
one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551
Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method
43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory
In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN
6 Computational Intelligence and Neuroscience
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894
119872 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor number of search do
if rand ltHMCR then119909119894
new from HMelse
if rand lt PAR then119909119894
new = 119909119894
new + Δ119909
else119909119894
new = 119909119894
119872+ rand
endend
end119909119894
new = 119909min + rand(119909max minus 119909min)
end
Algorithm 3 CNNHS
Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006
(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN
Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod
5 Simulation and Results
In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2
In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the
values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)
As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]
All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well
Computational Intelligence and Neuroscience 7
Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436
Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004
Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768
Figure 2 Example of some images fromMNIST dataset
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
2 Computational Intelligence and Neuroscience
proposed by Hinton and Salakhutdinov [8] On the otherside Krylov subspace descent is more robust and simplerthanHessian-free optimization as well as looks like it is betterfor the classification performance and optimization speedHowever Krylov subspace descent needs more memory thanHessian-free optimization [9]
In fact techniques of modern optimization are heuristicor metaheuristic These optimization techniques have beenapplied to solve any optimization problems in the researcharea of science engineering and even industry [10]Howeverresearch on metaheuristic to optimize DL method is rarelyconducted Onework is the combination of genetic algorithm(GA) and CNN proposed by You and Pu [11] Their modelselects the CNN characteristic by the process of recombina-tion and mutation on GA in which the model of CNN existsas individual in the algorithm of GA Besides in recombina-tion process only the layers weights and threshold value ofC1 (convolution in the first layer) and C3 (convolution in thethird layer) are changed inCNNmodel Another work is fine-tuning CNN using harmony search (HS) by Rosa et al [12]
In this paper we compared the performance of threemetaheuristic algorithms that is simulated annealing (SA)differential evolution (DE) andHS for optimizing CNNThestrategy employed is looking for the best value of the fitnessfunction on the last layer usingmetaheuristic algorithm thenthe results will be used again to calculate the weights andbiases in the previous layer In the case of testing the per-formance of the proposed methods we use MNIST (MixedNational Institute of Standards and Technology) datasetThis dataset comprises images of digital handwritten digitscontaining 60000 training data items and 10000 testing dataitems All of the images have been centered and standardizedwith the size of 28 times 28 pixels Each pixel of the image isrepresented by 0 for black and 255 for white and in betweenare different shades of gray [13]
This paper is organized as follows Section 1 is an intro-duction Section 2 explains the used metaheuristic algo-rithms Section 3 describes the convolution neural networksSection 4 gives a description of the proposed methodsSection 5 presents the result of simulation and Section 6 isthe conclusion
2 Metaheuristic Algorithms
Metaheuristic is well known as an efficient method for hardoptimization problems that is the problems that cannotbe solved optimally using deterministic approach within areasonable time limit Metaheuristic methods work for threemain purposes for fast solving of problem for solving largeproblems and for making a more robust algorithm Thesemethods are also simple to design as well as flexible and easyto implement [14]
In general metaheuristic algorithms use the combinationof rules and randomization to duplicate the phenomena ofnature The biological system imitations of metaheuristicalgorithm for instance are evolution strategy GA and DEPhenomena of ethology for examples are particle swarmoptimization (PSO) bee colony optimization (BCO) bac-terial foraging optimization algorithms (BFOA) and ant
colony optimization (ACO) Phenomena of physics are SAmicrocanonical annealing and threshold accepting method[15] Another form of metaheuristic is inspired by musicphenomena such as HS algorithm [16]
Classification of metaheuristic algorithm can also bedivided into single-solution-based and population-basedmetaheuristic algorithm Some of the examples for single-solution-based metaheuristic are the noising method tabusearch SA TA and guided local search In the case of meta-heuristic based on population it can be classified into swarmintelligent and evolutionary computation The general termof swarm intelligent is inspired by the collective behaviorof social insect colonies or animal societies Examples ofthese algorithms are GP GA ES and DE On the other sidethe algorithm for evolutionary computation takes inspirationfrom the principles of Darwinian for developing adaptationinto their environment Some examples of these algorithmsare PSO BCO ACO and BFOA [15] Among all thesemetaheuristic algorithms SA DE and HS are used in thispaper
21 Simulated Annealing Algorithm SA is a technique ofrandom search for the problem of global optimization Itmimics the process of annealing in material processing [10]This technique was firstly proposed in 1983 by Kirkpatrick etal [17]
The principle idea of SA is using random search whichnot only allows changes that improve the fitness functionbut also maintains some changes that are not ideal Asan example in minimum optimization problem any betterchanges that decrease the fitness function value 119891(119909) will beaccepted but some changes that increase 119891(119909) will also beaccepted with a transition probability (119901) as follows
119901 = exp(minusΔ119864119896119879
) (1)
where Δ119864 is the energy level changes 119896 is Boltzmannrsquosconstant and 119879 is temperature for controlling the process ofannealingThis equation is based on the Boltzmann distribu-tion in physics [10] The following is standard procedure ofSA for optimization problems
(1) Generate the solution vector the initial solution vectoris randomly selected and then the fitness function iscalculated
(2) Initialize the temperature if the temperature value istoo high it will take a long time to reach convergencewhereas a too small value can cause the system tomissthe global optimum
(3) Select a new solution a new solution is randomlyselected from the neighborhood of the current solu-tion
(4) Evaluate a new solution a new solution is acceptedas a new current solution depending on its fitnessfunction
(5) Decrease the temperature during the search processthe temperature is periodically decreased
Computational Intelligence and Neuroscience 3
(6) Stop or repeat the computation is stopped when thetermination criterion is satisfied Otherwise steps (2)and (6) are repeated
22 Differential Evolution Algorithm Differential evolutionwas firstly proposed by Price and Storn in 1995 to solvethe Chebyshev polynomial problem [15] This algorithm iscreated based on individuals difference exploiting randomsearch in the space of solution and finally operates theprocedure of mutation crossover and selection to obtain thesuitable individual in system [18]
There are some types in DE including the classical formDErand1bin it indicates that in the process ofmutation thetarget vector is randomly selected and only a single differentvector is applied The acronym of bin shows that crossoverprocess is organized by a rule of binomial decision Theprocedure of DE algorithm is shown by the following steps
(1) Determining parameter setting population size is thenumber of individuals Mutation factor (F) controlsthe magnification of the two individual differences toavoid search stagnation Crossover rate (CR) decideshowmany consecutive genes of themutated vector arecopied to the offspring
(2) Initialization of population the population is pro-duced by randomly generating the vectors in thesuitable search range
(3) Evaluation of individual each individual is evaluatedby calculating their objective function
(4) Mutation operation mutation adds identical variableto one or more vector parameters In this operationthree auxiliary parents (1199091199011
119872 1199091199012
119872 1199091199013
119872) are selected
randomly in which they will participate in mutationoperation to create a mutated individual 119909mut
119872as
follows
119909mut119872= 1199091199011
119872+ F (1199091199012
119872minus 1199091199013
119872) (2)
where 1199011 1199012 1199013 isin 1 2 119873 and119873 = 1199011 = 1199012 =1199013
(5) Combination operation recombination (crossover) isapplied after mutation operation
(6) Selection operation this operation determineswhether the offspring in the next generation shouldbecome a member of the population or not
(7) Stopping criterion the current generation is substi-tuted by the new generation until the criterion oftermination is satisfied
23 Harmony Search Algorithm Harmony search algorithmis proposed by Geem et al in 2001 [19] This algorithm isinspired by the musical process of searching for a perfectstate of harmony Like harmony in music solution vector ofoptimization and improvisation from the musician are anal-ogous to structures of local and global search in optimizationtechniques
In improvisation of the music the players sound anypitch in the possible range together that can create onevector of harmony In the case of pitches creating a realharmony this experience is stored in the memory of eachplayer and they have the opportunity to create better harmonynext time [16] There are three possible alternatives whenone pitch is improvised by a musician any one pitch isplayed from herhis memory a nearby pitch is played fromherhis memory and an entirely random pitch is played withthe range of possible sound If these options are used foroptimization they have three equivalent components the useof harmony memory pitch adjusting and randomization InHS algorithm these rules are correlated with two relevantparameters that is harmony consideration rate (HMCR) andpitch adjusting rate (PAR) The procedure of HS algorithmcan be summarized into five steps as follows [16]
(1) Initialize the problem and parameters in this algo-rithm the problem can be maximum or minimumoptimization and the relevant parameters areHMCRPAR size of harmony memory and terminationcriterion
(2) Initialize harmony memory the harmony memory(HM) is usually initialized as a matrix that is createdrandomly as a vector of solution and arranged basedon the objective function
(3) Improve a new harmony a vector of new harmonyis produced from HM based on HMCR PAR andrandomization Selection of new value is based onHMCR parameter by range 0 through 1The vector ofnew harmony is observed to decide whether it shouldbe pitch-adjusted using PAR parameter The processof pitch adjusting is executed only after a value isselected from HM
(4) Update harmony memory the new harmony substi-tutes the worst harmony in terms of the value of thefitness function in which the fitness function of newharmony is better than worst harmony
(5) Repeat (3) and (4) until satisfying the terminationcriterion in the case of meeting the terminationcriterion the computation is ended Alternativelyprocess (3) and (4) are reiterated In the end thevector of the best HM is nominated and is reflectedas the best solution for the problem
3 Convolution Neural Network
CNN is a variant of the standard multilayer perceptron(MLP) A substantial advantage of this method especially forpattern recognition compared with conventional approachesis due to its capability in reducing the dimension of dataextracting the feature sequentially and classifying one struc-ture of network [20] The basic architecture model of CNNwas inspired in 1962 from visual cortex proposed by Hubeland Wiesel
In 1980 Fukushimas Neocognitron created the firstcomputation of this model and then in 1989 following theidea of Fukushima LeCun et al found the state-of-the-art
4 Computational Intelligence and Neuroscience
Input
ConvolutionsGaussian connection
Subsampling
C1 feature maps
Output10
Full connectionConvolutions Subsampling Full connection
S2 f maps C5 layer120 F6 layer
84814 times 1432 times 32
828 times 28
S4 f maps 165 times 5C3 f maps 1010 times 10
Figure 1 Architecture of CNN by LeCun et al (LeNet-5)
performance on a number of tasks for pattern recognitionusing error gradient method [21]
The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps
In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)
In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]
Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896
119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh
function as follows
119872119896
119894119895= tanh ((119882119896 times 119909)
119894119895+ 119887119896) (3)
The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899
119894is
the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation
119886119895 = tanh(120573 sum119873times119873
119886119899times119899
119894+ 119887) (4)
After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes
4 Design of Proposed Methods
The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset
In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper
119910 =
1
2
(
sum119873
119894=119873(119900 minus 119906)
2
119873
)
05
(5)
where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved
41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly
Computational Intelligence and Neuroscience 5
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)
if 119891(1199091015840) le 119891(119909) then119909 larr 119909
1015840else
119909 larr 1199091015840 with a transition probability (119901)
endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer
end
Algorithm 1 CNNSA
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894
119872in population 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor each of individual 119909119894
119872in population 119875119872 do
select auxiliary parents 1199091119872 1199092
119872 1199093
119872
create offspring 119909child119872
using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909
child119872 119909119894
119872)
end119872 = 119872 + 1update 119909 for all layer
end
Algorithm 2 CNNDE
Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN
Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod
42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion
Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of
one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551
Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method
43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory
In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN
6 Computational Intelligence and Neuroscience
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894
119872 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor number of search do
if rand ltHMCR then119909119894
new from HMelse
if rand lt PAR then119909119894
new = 119909119894
new + Δ119909
else119909119894
new = 119909119894
119872+ rand
endend
end119909119894
new = 119909min + rand(119909max minus 119909min)
end
Algorithm 3 CNNHS
Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006
(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN
Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod
5 Simulation and Results
In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2
In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the
values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)
As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]
All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well
Computational Intelligence and Neuroscience 7
Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436
Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004
Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768
Figure 2 Example of some images fromMNIST dataset
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 3
(6) Stop or repeat the computation is stopped when thetermination criterion is satisfied Otherwise steps (2)and (6) are repeated
22 Differential Evolution Algorithm Differential evolutionwas firstly proposed by Price and Storn in 1995 to solvethe Chebyshev polynomial problem [15] This algorithm iscreated based on individuals difference exploiting randomsearch in the space of solution and finally operates theprocedure of mutation crossover and selection to obtain thesuitable individual in system [18]
There are some types in DE including the classical formDErand1bin it indicates that in the process ofmutation thetarget vector is randomly selected and only a single differentvector is applied The acronym of bin shows that crossoverprocess is organized by a rule of binomial decision Theprocedure of DE algorithm is shown by the following steps
(1) Determining parameter setting population size is thenumber of individuals Mutation factor (F) controlsthe magnification of the two individual differences toavoid search stagnation Crossover rate (CR) decideshowmany consecutive genes of themutated vector arecopied to the offspring
(2) Initialization of population the population is pro-duced by randomly generating the vectors in thesuitable search range
(3) Evaluation of individual each individual is evaluatedby calculating their objective function
(4) Mutation operation mutation adds identical variableto one or more vector parameters In this operationthree auxiliary parents (1199091199011
119872 1199091199012
119872 1199091199013
119872) are selected
randomly in which they will participate in mutationoperation to create a mutated individual 119909mut
119872as
follows
119909mut119872= 1199091199011
119872+ F (1199091199012
119872minus 1199091199013
119872) (2)
where 1199011 1199012 1199013 isin 1 2 119873 and119873 = 1199011 = 1199012 =1199013
(5) Combination operation recombination (crossover) isapplied after mutation operation
(6) Selection operation this operation determineswhether the offspring in the next generation shouldbecome a member of the population or not
(7) Stopping criterion the current generation is substi-tuted by the new generation until the criterion oftermination is satisfied
23 Harmony Search Algorithm Harmony search algorithmis proposed by Geem et al in 2001 [19] This algorithm isinspired by the musical process of searching for a perfectstate of harmony Like harmony in music solution vector ofoptimization and improvisation from the musician are anal-ogous to structures of local and global search in optimizationtechniques
In improvisation of the music the players sound anypitch in the possible range together that can create onevector of harmony In the case of pitches creating a realharmony this experience is stored in the memory of eachplayer and they have the opportunity to create better harmonynext time [16] There are three possible alternatives whenone pitch is improvised by a musician any one pitch isplayed from herhis memory a nearby pitch is played fromherhis memory and an entirely random pitch is played withthe range of possible sound If these options are used foroptimization they have three equivalent components the useof harmony memory pitch adjusting and randomization InHS algorithm these rules are correlated with two relevantparameters that is harmony consideration rate (HMCR) andpitch adjusting rate (PAR) The procedure of HS algorithmcan be summarized into five steps as follows [16]
(1) Initialize the problem and parameters in this algo-rithm the problem can be maximum or minimumoptimization and the relevant parameters areHMCRPAR size of harmony memory and terminationcriterion
(2) Initialize harmony memory the harmony memory(HM) is usually initialized as a matrix that is createdrandomly as a vector of solution and arranged basedon the objective function
(3) Improve a new harmony a vector of new harmonyis produced from HM based on HMCR PAR andrandomization Selection of new value is based onHMCR parameter by range 0 through 1The vector ofnew harmony is observed to decide whether it shouldbe pitch-adjusted using PAR parameter The processof pitch adjusting is executed only after a value isselected from HM
(4) Update harmony memory the new harmony substi-tutes the worst harmony in terms of the value of thefitness function in which the fitness function of newharmony is better than worst harmony
(5) Repeat (3) and (4) until satisfying the terminationcriterion in the case of meeting the terminationcriterion the computation is ended Alternativelyprocess (3) and (4) are reiterated In the end thevector of the best HM is nominated and is reflectedas the best solution for the problem
3 Convolution Neural Network
CNN is a variant of the standard multilayer perceptron(MLP) A substantial advantage of this method especially forpattern recognition compared with conventional approachesis due to its capability in reducing the dimension of dataextracting the feature sequentially and classifying one struc-ture of network [20] The basic architecture model of CNNwas inspired in 1962 from visual cortex proposed by Hubeland Wiesel
In 1980 Fukushimas Neocognitron created the firstcomputation of this model and then in 1989 following theidea of Fukushima LeCun et al found the state-of-the-art
4 Computational Intelligence and Neuroscience
Input
ConvolutionsGaussian connection
Subsampling
C1 feature maps
Output10
Full connectionConvolutions Subsampling Full connection
S2 f maps C5 layer120 F6 layer
84814 times 1432 times 32
828 times 28
S4 f maps 165 times 5C3 f maps 1010 times 10
Figure 1 Architecture of CNN by LeCun et al (LeNet-5)
performance on a number of tasks for pattern recognitionusing error gradient method [21]
The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps
In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)
In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]
Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896
119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh
function as follows
119872119896
119894119895= tanh ((119882119896 times 119909)
119894119895+ 119887119896) (3)
The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899
119894is
the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation
119886119895 = tanh(120573 sum119873times119873
119886119899times119899
119894+ 119887) (4)
After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes
4 Design of Proposed Methods
The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset
In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper
119910 =
1
2
(
sum119873
119894=119873(119900 minus 119906)
2
119873
)
05
(5)
where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved
41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly
Computational Intelligence and Neuroscience 5
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)
if 119891(1199091015840) le 119891(119909) then119909 larr 119909
1015840else
119909 larr 1199091015840 with a transition probability (119901)
endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer
end
Algorithm 1 CNNSA
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894
119872in population 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor each of individual 119909119894
119872in population 119875119872 do
select auxiliary parents 1199091119872 1199092
119872 1199093
119872
create offspring 119909child119872
using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909
child119872 119909119894
119872)
end119872 = 119872 + 1update 119909 for all layer
end
Algorithm 2 CNNDE
Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN
Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod
42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion
Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of
one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551
Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method
43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory
In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN
6 Computational Intelligence and Neuroscience
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894
119872 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor number of search do
if rand ltHMCR then119909119894
new from HMelse
if rand lt PAR then119909119894
new = 119909119894
new + Δ119909
else119909119894
new = 119909119894
119872+ rand
endend
end119909119894
new = 119909min + rand(119909max minus 119909min)
end
Algorithm 3 CNNHS
Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006
(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN
Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod
5 Simulation and Results
In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2
In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the
values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)
As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]
All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well
Computational Intelligence and Neuroscience 7
Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436
Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004
Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768
Figure 2 Example of some images fromMNIST dataset
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
4 Computational Intelligence and Neuroscience
Input
ConvolutionsGaussian connection
Subsampling
C1 feature maps
Output10
Full connectionConvolutions Subsampling Full connection
S2 f maps C5 layer120 F6 layer
84814 times 1432 times 32
828 times 28
S4 f maps 165 times 5C3 f maps 1010 times 10
Figure 1 Architecture of CNN by LeCun et al (LeNet-5)
performance on a number of tasks for pattern recognitionusing error gradient method [21]
The classical CNN by LeCun et al is an extension oftraditional MLP based on three ideas local receptive fieldsweights sharing and spatialtemporal subsampling Theseideas can be organized into two types of layers which areconvolution layers and subsampling layers As is showedin Figure 1 the processing layers contain three convolutionlayers C1 C3 andC5 combined in betweenwith two subsam-pling layers S2 and S4 and output layer F6These convolutionand subsampling layers are structured into planes calledfeatures maps
In convolution layer each neuron is linked locally to asmall input region (local receptive field) in the precedinglayer All neurons with similar feature maps obtain datafrom different input regions until the whole of plane input isskimmed but the same of weights is shared (weights sharing)
In subsampling layer the feature maps are spatiallydownsampled in which the size of the map is reduced by afactor 2 As an example the feature map in layer C3 of size10 times 10 is subsampled to a conforming feature map of size5times 5 in the subsequent layer S4The last layer is F6 that is theprocess of classification [21]
Principally a convolution layer is correlated with somefeature maps the size of the kernel and connections to theprevious layer Each feature map is the results of a sum ofconvolution from the maps of the previous layer by theircorresponding kernel and a linear filter Adding a bias termand applying it to a nonlinear function the 119896th feature map119872119896
119894119895with theweights119882119896 and bias 119887119896 is obtained using the tanh
function as follows
119872119896
119894119895= tanh ((119882119896 times 119909)
119894119895+ 119887119896) (3)
The purpose of a subsampling layer is to reach spatialinvariance by reducing the resolution of feature maps inwhich each pooled feature map relates to one feature map ofthe preceding layer The subsampling function where 119886119899times119899
119894is
the inputs 120573 is a trainable scalar and 119887 is trainable bias isgiven by the following equation
119886119895 = tanh(120573 sum119873times119873
119886119899times119899
119894+ 119887) (4)
After several convolutions and subsampling the laststructure is classification layer This layer works as an inputfor a series of fully connected layers that will execute theclassification task It has one output neuron every class labeland in the case of MNIST dataset this layer contains tenneurons corresponding to their classes
4 Design of Proposed Methods
The architecture of this proposed method refers to a simpleCNN structure (LeNet-5) not a complex structure likeAlexNet [22] We use two variations of design structure Firstis i-6c-2s-12c-2s where the number of C1 is 6 and that of C3is 12 Second is i-8c-2s-16c-2s where the number of C1 is 8and that of C3 is 18 The kernel size of all convolution layersis 5 times 5 and the scale of subsampling is 2 This architectureis designed for recognizing handwritten digits from MNISTdataset
In this proposed method SA DE and HS algorithm areused to train CNN (CNNSA CNNDE and CNNHS) to findthe condition of best accuracy and also tominimize estimatederror and indicator of network complexityThis objective canbe realized by computing the lost function of vector solutionor the standard error on the training set The following is thelost function used in this paper
119910 =
1
2
(
sum119873
119894=119873(119900 minus 119906)
2
119873
)
05
(5)
where 119900 is the expected output 119906 is the real output and119873 issome training samples In the case of termination criteriontwo situations are used in this method The first is whenthe maximum iteration has been reached and the secondis when the loss function is less than a certain constantBoth conditions mean that the most optimal state has beenachieved
41 Design of CNNSA Method Principally algorithm onCNN computes the values of weight and bias in which onthe last layer they are used to calculate the lost functionThesevalues of weight and bias in the last layer are used as solutionvector denoted as 119909 to be optimized in SA algorithm byadding Δ119909 randomly
Computational Intelligence and Neuroscience 5
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)
if 119891(1199091015840) le 119891(119909) then119909 larr 119909
1015840else
119909 larr 1199091015840 with a transition probability (119901)
endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer
end
Algorithm 1 CNNSA
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894
119872in population 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor each of individual 119909119894
119872in population 119875119872 do
select auxiliary parents 1199091119872 1199092
119872 1199093
119872
create offspring 119909child119872
using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909
child119872 119909119894
119872)
end119872 = 119872 + 1update 119909 for all layer
end
Algorithm 2 CNNDE
Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN
Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod
42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion
Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of
one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551
Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method
43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory
In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN
6 Computational Intelligence and Neuroscience
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894
119872 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor number of search do
if rand ltHMCR then119909119894
new from HMelse
if rand lt PAR then119909119894
new = 119909119894
new + Δ119909
else119909119894
new = 119909119894
119872+ rand
endend
end119909119894
new = 119909min + rand(119909max minus 119909min)
end
Algorithm 3 CNNHS
Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006
(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN
Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod
5 Simulation and Results
In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2
In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the
values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)
As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]
All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well
Computational Intelligence and Neuroscience 7
Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436
Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004
Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768
Figure 2 Example of some images fromMNIST dataset
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 5
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)solution vector (119909) 119908 and 119887 on the last layerwhile termination criterion is not satisfied dofor number of 1199091015840 do1199091015840= 119909 + Δ119909 119891(1199091015840)
if 119891(1199091015840) le 119891(119909) then119909 larr 119909
1015840else
119909 larr 1199091015840 with a transition probability (119901)
endenddecrease the temperature 119879 = 119888 times 119879update 119909 for all layer
end
Algorithm 1 CNNSA
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)individual 119909119894
119872in population 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor each of individual 119909119894
119872in population 119875119872 do
select auxiliary parents 1199091119872 1199092
119872 1199093
119872
create offspring 119909child119872
using mutation and recombination119875119872+1 = 119875119872+1cup Best (119909
child119872 119909119894
119872)
end119872 = 119872 + 1update 119909 for all layer
end
Algorithm 2 CNNDE
Δ119909 is the essential aspect of this proposed methodSelection in the proper of this value will significantly increasethe accuracy For example in CNNSA to one epoch if Δ119909 =00008 times rand then the accuracy is 8812 in which this valueis 573 greater than the original CNN (8239) However ifΔ119909 = 00001 times rand its accuracy is 8579 and its value is only340 greater than the original CNN
Furthermore this solution vector is updated based onSA algorithm When the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystem Algorithm 1 is the CNNSA algorithm of the proposedmethod
42 Design of CNNDEMethod At the first time this methodcomputes all the values of weight and bias The values ofweight and bias on the last layer (119909) are used to calculate thelost function and then by adding Δ119909 randomly these newvalues are used to initialize the individuals in the popula-tion
Similar to CNNSAmethod selection in the proper of Δ119909will significantly increase the value of accuracy In the case of
one epoch in CNNDE as an example if Δ119909 = 00008 times randthen the accuracy is 8630 in which this value is 391 greaterthan the original CNN (8239) However if Δ119909 = 000001 timesrand its accuracy is 8551
Furthermore these individuals in the population areupdated based on the DE algorithm When the terminationcriterion is satisfied all of weights and biases are updated forall layers in the system Algorithm 2 is the CNNDE algorithmof the proposed method
43 Design of CNNHSMethod At the first time like CNNSAand CNNDE this method computes all the values of weightand bias The values of weight and bias on the last layer (119909)are used to calculate the lost function and then by adding Δ119909randomly these new values are used to initialize the harmonymemory
In this method Δ119909 is also an important aspect whileselection of the proper of Δ119909 will significantly increase thevalue of accuracy For example of one epoch in CNNHS (i-8c-2s-16c-2s) if Δ119909 = 00008 times rand then the accuracy is8723 in which this value is 714 greater than the original CNN
6 Computational Intelligence and Neuroscience
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894
119872 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor number of search do
if rand ltHMCR then119909119894
new from HMelse
if rand lt PAR then119909119894
new = 119909119894
new + Δ119909
else119909119894
new = 119909119894
119872+ rand
endend
end119909119894
new = 119909min + rand(119909max minus 119909min)
end
Algorithm 3 CNNHS
Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006
(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN
Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod
5 Simulation and Results
In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2
In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the
values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)
As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]
All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well
Computational Intelligence and Neuroscience 7
Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436
Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004
Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768
Figure 2 Example of some images fromMNIST dataset
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
6 Computational Intelligence and Neuroscience
Result accuracy timeinitialization and set-up i-6c-2s-12c-2scalculation process weights (119908) biases (119887) lost function 119891(119909)harmony memory 119909119894
119872 119908 and 119887 on the last layer
while termination criterion is not satisfied dofor number of search do
if rand ltHMCR then119909119894
new from HMelse
if rand lt PAR then119909119894
new = 119909119894
new + Δ119909
else119909119894
new = 119909119894
119872+ rand
endend
end119909119894
new = 119909min + rand(119909max minus 119909min)
end
Algorithm 3 CNNHS
Table 1 Accuracy and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8239 na 8812 039 8630 033 8723 0952 8906 na 9277 043 9133 019 9120 0333 9113 na 9461 031 9345 028 9324 0404 9233 na 9557 016 9463 044 9377 0125 9311 na 9629 014 9515 015 9489 0336 9367 na 9661 018 9567 020 9517 0437 9425 na 9672 012 9628 020 9565 0208 9477 na 9699 011 9659 011 9608 0249 9537 na 9711 006 9668 017 9616 01110 9545 na 9737 014 9686 010 9698 006
(8009) However if Δ119909 = 000001 times rand its accuracy is8023 the value is only 014 greater than CNN
Furthermore this harmony memory is updated based ontheHS algorithmWhen the termination criterion is satisfiedall of weights and biases are updated for all layers in thesystemAlgorithm 3 is theCNNHSalgorithmof the proposedmethod
5 Simulation and Results
In this paper the primary goal is to improve the accuracyof original CNN by using SA DE and HS algorithm Thiscan be performed by minimizing the classification task errortested on theMNIST dataset Some of the example images forMNIST dataset are shown in Figure 2
In CNNSA experiment the size of neighborhood was set= 10 and maximum of iteration (maxit) = 10 In CNNDE thepopulation size = 10 andmaxit = 10 In CNNHS the harmonymemory size = 10 and maxit = 10 Since it is difficult to makesure of the control of parameter in all of the experiment the
values of 119888 = 05 for SA F = 08 and CR = 03 for DE andHMCR= 08 and PAR = 03 for HSWe also set the parameterof CNN that is the learning rate (120572 = 1) and the batch size(100)
As for the epoch parameter the number of epochs is 1 to10 for every experiment All of the experiment was imple-mented in MATLAB-R2011a on a personal computer withprocessor Intel Core i7-4500u 8GB RAM running memoryWindows 10 with five separate runtimes The original pro-gram of this simulation is DeepLearn Toolbox from Palm[23]
All of the experiment results of the proposedmethods arecompared with the experiment result from the original CNNThese results for the design of i-6c-2s-12c-2s are summarizedin Table 1 for accuracy Table 2 for the computational timeand Figure 3 for error and its standard deviation as well asFigure 4 for computational time and its standard deviationThe results for the design of i-8c-2s-16c-2s are summarizedin Table 3 for accuracy Table 4 for the computational timeand Figure 5 for error and its standard deviation as well
Computational Intelligence and Neuroscience 7
Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436
Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004
Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768
Figure 2 Example of some images fromMNIST dataset
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 7
Table 2 Computation time and its standard deviation for design i-2s-6c-2s-12c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 9321 na 11748 112 13858 090 16092 0852 22505 na 24343 990 27808 166 37059 5873 31884 na 35649 196 41464 243 41413 0634 37944 na 47983 195 55139 228 55451 0735 47904 na 59635 408 53321 142 69290 2906 57638 na 72148 148 64030 619 82956 1957 67657 na 83955 119 74489 398 96818 1978 76824 na 96069 174 85274 448 11052 1399 85585 na 108218 254 95789 578 124554 49610 95454 na 120252 208 13731 151 162313 436
Table 3 Accuracy and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSAccuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation Accuracy Standard deviation
1 8009 na 8636 076 8478 124 8723 0572 8904 na 9118 025 9163 030 9215 0553 9098 na 9356 020 9367 017 9369 0314 9227 na 9469 016 9486 043 9463 0205 9317 na 9551 012 9557 004 9530 0166 9379 na 9623 008 9620 014 9580 0257 9474 na 9652 008 9652 032 9571 0248 9522 na 9695 007 9668 019 9640 0139 9554 na 9718 008 9710 000 9684 02710 9605 na 9735 002 9732 004 9677 004
Table 4 Computation time and its standard deviation for design i-2s-8c-2s-16c
Epoch CNN CNNSA CNNDE CNNHSTime Standard deviation Time Standard deviation Time Standard deviation Time Standard deviation
1 14502 na 17508 064 28954 078 19610 1352 32362 na 35355 269 58696 1256 39543 0603 52016 na 61471 1210 86882 402 597391 1834 69280 na 71853 3105 118549 3495 79443 1705 72905 na 88564 1253 145164 499 102372 12516 87917 na 105130 130 104526 4062 125593 32547 130821 na 127103 2555 155467 786 162730 64568 145506 na 153330 855 1539 10675 177392 22519 139262 na 172650 3231 157352 1842 212332 957610 151174 na 205440 3585 261962 3737 235490 8768
Figure 2 Example of some images fromMNIST dataset
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
8 Computational Intelligence and Neuroscience
CNN CNNSACNNDE CNNHS
0
4
8
12
16
20Er
ror (
)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
02
04
06
08
1
Erro
r (
)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 3 Error and its standard deviation (i-6c-2s-12c-2s)
CNNCNNSA
CNNDECNNHS
0
300
600
900
1200
1500
1800
Tim
e (s)
2 3 4 5 6 7 8 9 101Number of epochs
(a)
CNNCNNSA
CNNDECNNHS
0
2
4
6
8
10Ti
me (
s)
2 3 4 5 6 7 8 9 101Number of epochs
(b)Figure 4 Computation time and its standard deviation (i-6c-2s-12c-2s)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
5
10
15
20
25
Erro
r (
)
(a)
CNNCNNSA
CNNDECNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
03
06
09
12
15
Erro
r (
)
(b)Figure 5 Error and its standard deviation (i-8c-2s-16c-2s)
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 9
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
minus200
200
600
1000
1400
1800
2200
2600
3000Ti
me (
s)
(a)
CNN CNNSACNNDE CNNHS
2 3 4 5 6 7 8 9 101Number of epochs
0
20
40
60
80
100
120
Tim
e (s)
(b)
Figure 6 Computation time and its standard deviation (i-8c-2s-16c-2s)
CNN CNNSA
CNNHSCNNDE
0
05
1
15
Erro
r (
)
0005
01015
02025
03035
04045
05Er
ror (
)
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s) times10
4times10
4
times104
times104
1 2 3 4 5 6 70Time (s)
1 2 3 4 5 6 70Time (s)
0005
01015
02025
03035
04045
05
Erro
r (
)
0005
01015
02025
03035
04045
05
Erro
r (
)
Figure 7 Error versus computation time for 100 epochs
as Figure 6 for computational time and its standard devia-tion
The experiments of original CNN are conducted at onlyone time for each epoch because the value of its accuracy willnot change if the experiment is repeated with the same con-dition In general the tests conducted showed that the higherthe epoch value the better the accuracy For example in oneepoch compared to CNN (8239) the accuracy increased to573 for CNNSA (8812) 391 for CNNDE (8630) and 484
for CNNHS (8723) While in 5 epochs compared to CNN(9311) the increase of accuracy is 318 for CNNSA (9629)204 for CNNDE (9415) and 178 for CNNHS (9489) Inthe case of 100 epochs as shown in Figure 7 the increase inaccuracy compared to CNN (9865) is only 016 for CNNSA(9881) 013 for CNNDE (9878) and 009 for CNNHS(9874)
The experiment results show that CNNSA presents thebest accuracy for all epochs Accuracy improvement of
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
10 Computational Intelligence and Neuroscience
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
Figure 8 Example of some images from CIFAR dataset
CNNSACNN
002040608
112141618
222242628
3
Obj
ectiv
e
0
002
004
006
008
01
012O
bjec
tive
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
ValobjTrainobj
ValobjTrainobj
Figure 9 CNN versus CNNSA for objective
CNNSA compared to the original CNN varies for eachepoch with a range of values between 174 (9 epochs) and 573(1 epoch) The computation time for the proposed methodcompared to the original CNN is in the range of 101 times(CNNSA two epochs 246244) up to 170 times (CNNHSnine epochs 1246856)
In addition we also test our proposed method withCIFAR10 (Canadian Institute for Advanced Research)dataset This dataset consists of 60000 color images inwhich the size of every image is 32 times 32 There are fivebatches for training composed of 50000 images and onebatch of test images consists of 10000 images The CIFAR10dataset is divided into ten classes where each class has 6000
images Some example images of this dataset are showed inFigure 8
The experiment of CIFAR10 dataset was conducted inMATLAB-R2014a We use the number of epochs 1 to 15for this experiment The original program is MatConvNetfrom [24] In this paper the program was modified withSA algorithm The results can be seen in Figure 9 forobjective Figure 10 for top-1 error and Figure 11 for a top-5 error Table 5 for Comparison of CNN and CNNSAfor train as well as Table 6 for Comparison of CNN andCNNSA for validation In general these results show thatCNNSA works better than original CNN for CIFAR10dataset
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 11
CNNSACNN
0
0001
0002
0003
0004
0005
0006
0007
0008
0009
001
Erro
r (
)
0
002
004
006
008
01
012
014
016
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop1errorValtop1error
Traintop1errorValtop1error
Figure 10 CNN versus CNNSA for top-1 error
CNNSACNN
0
01
02
03
04
05
06
07
08
09
1
11
12
Erro
r (
)
0
0002
0004
0006
0008
001
0012
0014
0016
0018
002
Erro
r (
)
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
2 3 4 5 6 7 8 9 10 11 12 13 14 151Number of epochs
Traintop5errorValtop5error
Traintop5errorValtop5error
Figure 11 CNN versus CNNSA for top-5 error
6 Conclusion
This paper shows that SA DE and HS algorithms improvethe accuracy of the CNN Although there is an increasein computation time error of the proposed method issmaller than the original CNN for all variations of theepoch
It is possible to validate the performance of this proposedmethod on other benchmark datasets such as ORL INRIAHollywood II and ImageNet This strategy can also bedeveloped for other metaheuristic algorithms such as ACOPSO and BCO to optimize CNN
For the future study metaheuristic algorithms appliedto the other DL methods need to be explored such as the
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
12 Computational Intelligence and Neuroscience
Table 5 Comparison of CNN and CNNSA for train
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 14835700 010676 052868 0009493 000092 0001682 10443820 003664 036148 0013218 000094 0001883 09158232 002686 031518 0010585 000094 000174 08279042 002176 028358 0008023 000096 0001885 07749367 001966 026404 0009285 000106 0001866 07314783 001750 025076 0013674 000102 0001757 06968027 001566 023968 0117740 00009 0001688 06654411 001398 022774 0011239 00011 000189 06440073 001320 021978 0011338 000106 0001810 06213060 001312 020990 0009957 000116 0001911 06024042 001184 020716 0008434 000096 00017612 05786811 001090 019954 0009425 00011 00019213 05684009 001068 019548 0012485 000082 0001814 05486258 000994 018914 0012108 000098 00018415 05347288 000986 018446 0009675 00011 000186
Table 6 Comparison of CNN and CNNSA for validation
Epoch CNN CNNSAObjective Top-1 error Top-5 error Objective Top-1 error Top-5 error
1 1148227 00466 03959 0034091 00039 000872 0985902 00300 03422 0061806 00044 000913 0873938 00255 02997 0054007 00050 000914 0908667 00273 03053 0054711 00051 000915 0799778 00226 02669 0043632 00044 000916 0772151 00209 02614 0071143 00057 000917 0784206 00210 02593 0065040 00050 000958 0732094 00170 02474 0048466 00061 000959 0761574 00217 02532 0056708 00056 0009110 0763323 00207 02515 0044423 00048 0008611 0720129 00165 02352 0047963 00041 0008712 0700847 00167 02338 0063033 00055 0008713 0729708 00194 02389 0068989 00052 0009614 0747789 00192 02431 0056425 00049 0009115 0723088 00182 02355 0052753 00052 00095
recurrent neural network deep belief network and AlexNet(a newer variant of CNN)
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work is supported by Higher Education Center ofExcellence Research Grant funded by Indonesian Ministryof Research Technology andHigher Education (Contract no1068UN2R12HKP05002016)
References
[1] M M Najafabadi F Villanustre T M Khoshgoftaar N SeliyaR Wald and E Muharemagic ldquoDeep learning applications andchallenges in big data analyticsrdquo Journal of Big Data vol 2 no1 pp 1ndash21 2015
[2] L Deng and D Yu Deep Learning Methods and ApplicationFoundation and Trends in Signal Processing Redmond WashUSA 2013
[3] J L Sweeney Deep learning using genetic algorithms [MSthesis] Department of Computer Science Rochester Instituteof Technology Rochester NY USA 2012
[4] P O Glauner Comparison of training methods for deep neuralnetworks [MS thesis] 2015
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 13
[5] L M R Rere M I Fanany and A M Arymurthy ldquoSimulatedannealing algorithm for deep learningrdquo Procedia ComputerScience vol 72 pp 137ndash144 2015
[6] Q V Le J Ngiam A Coates A Lahiri B Prochnow and A YNg ldquoOn optimization methods for deep learningrdquo in Proceed-ings of the 28th International Conference on Machine Learning(ICML rsquo11) pp 265ndash272 Bellevue Wash USA July 2011
[7] J Martens ldquoDeep learning via hessian-free optimizationrdquo inProceedings of the 27th International Conference on MachineLearning Haifa Israel 2010
[8] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[9] O Vinyal and D Poyey ldquoKrylov subspace descent for deeplearningrdquo in Proceedings of the 15th International Conference onArtificial Intelligent and Statistics (AISTATS) La Palma Spain2012
[10] X-S Yang Engineering Optimization An Introduction withMetaheuristic Application John Wiley amp Sons Hoboken NJUSA 2010
[11] Z You and Y Pu ldquoThe genetic convolutional neural networkmodel based on random samplerdquo International Journal of u- ande-Service Science andTechnology vol 8 no 11 pp 317ndash326 2015
[12] G Rosa J Papa A Marana W Scheire and D Cox ldquoFine-tuning convolutional neural networks using harmony searchrdquoin Progress in Pattern Recognition Image Analysis ComputerVision and Applications A Pardo and J Kittler Eds vol 9423of Lecture Notes in Computer Science pp 683ndash690 2015
[13] LiSA-Lab Deep Learning Tutorial Release 01 University ofMontreal Montreal Canada 2014
[14] E-G Talbi Metaheuristics From Design to ImplementationJohn Wiley amp Sons Hoboken NJ USA 2009
[15] I Boussaıd J Lepagnot and P Siarry ldquoA survey on optimizationmetaheuristicsrdquo Information Sciences vol 237 pp 82ndash117 2013
[16] K S Lee and Z W Geem ldquoA new meta-heuristic algorithm forcontinuous engineering optimization harmony search theoryand practicerdquo Computer Methods in Applied Mechanics andEngineering vol 194 no 36ndash38 pp 3902ndash3933 2005
[17] S Kirkpatrick C D Gelatt Jr and M P Vecchi ldquoOptimizationby simulated annealingrdquo Science vol 220 no 4598 pp 671ndash6801983
[18] N Noman D Bollegala and H Iba ldquoAn adaptive differentialevolution algorithmrdquo in Proceedings of the IEEE Congress ofEvolutionary Computation (CEC rsquo11) pp 2229ndash2236 June 2011
[19] Z W Geem J H Kim and G V Loganathan ldquoA new heuristicoptimization algorithm harmony searchrdquo Simulation vol 76no 2 pp 60ndash68 2001
[20] Y Bengio ldquoLearning deep architectures for AIrdquo Foundation andTrends in Machine Learning vol 2 no 1 pp 1ndash127 2009
[21] Y LeCun K Kavukcuoglu and C Farabet ldquoConvolutionalnetworks and applications in visionrdquo in Proceedings of the IEEEInternational Symposium on Circuits and Systems pp 253ndash256Paris France June 2010
[22] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo in Pro-ceedings of the 26th Annual Conference on Neural InformationProcessing Systems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe NevUSA December 2012
[23] R Palm Prediction as a candidate for learning deep hierarchicalmodels of data [MS thesis] Technical University of Denmark2012
[24] A Vedaldi and K Lenc ldquoMatConvNet convolutional neuralnetworks for matlabrdquo in Proceedings of the 23rd ACM Inter-national Conference on Multimedia pp 689ndash692 BrisbaneAustralia October 2015
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014