Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 ›...

18
Research Article A Parallel Framework for Multipoint Spiral Search in ab Initio Protein Structure Prediction Mahmood A. Rashid, 1,2 Swakkhar Shatabda, 1,2 M. A. Hakim Newton, 1 Md Tamjidul Hoque, 3 and Abdul Sattar 1,2 1 Institute for Integrated & Intelligent Systems, Science 2 (N34) 1.45, 170 Kessels Road, Nathan, QLD 4111, Australia 2 Queensland Research Lab, National ICT Australia, Level 8, Y Block, 2 George Street, Brisbane, QLD 4000, Australia 3 Computer Science, 2000 Lakeshore Drive, Math 308, New Orleans, LA 70148, USA Correspondence should be addressed to Mahmood A. Rashid; [email protected] Received 31 October 2013; Revised 4 February 2014; Accepted 6 February 2014; Published 16 March 2014 Academic Editor: Rita Casadio Copyright © 2014 Mahmood A. Rashid et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Protein structure prediction is computationally a very challenging problem. A large number of existing search algorithms attempt to solve the problem by exploring possible structures and finding the one with the minimum free energy. However, these algorithms perform poorly on large sized proteins due to an astronomically wide search space. In this paper, we present a multipoint spiral search framework that uses parallel processing techniques to expedite exploration by starting from different points. In our approach, a set of random initial solutions are generated and distributed to different threads. We allow each thread to run for a predefined period of time. e improved solutions are stored threadwise. When the threads finish, the solutions are merged together and the duplicates are removed. A selected distinct set of solutions are then split to different threads again. In our ab initio protein structure prediction method, we use the three-dimensional face-centred-cubic lattice for structure-backbone mapping. We use both the low resolution hydrophobic-polar energy model and the high-resolution 20 × 20 energy model for search guiding. e experimental results show that our new parallel framework significantly improves the results obtained by the state-of-the-art single-point search approaches for both energy models on three-dimensional face-centred-cubic lattice. We also experimentally show the effectiveness of mixing energy models within parallel threads. 1. Introduction Proteins are essentially linear chain of amino acids. ey adopt specific folded three-dimensional structures to per- form specific tasks. e function of a given protein is determined by its native structure, which has the lowest possible free energy level. Nevertheless, misfolded proteins cause many critical diseases such as Alzheimer’s disease, Parkinson’s disease, and cancer [1, 2]. Protein structures are important in drug design and biotechnology. Protein structure prediction (PSP) is computationally a very hard problem [3]. Given a protein’s amino acid sequence, the problem is to find a three-dimensional structure of the protein such that the total interaction energy amongst the amino acids in the sequence is minimised. e protein fold- ing process that leads to such structures involves very com- plex molecular dynamics [4] and unknown energy factors. To deal with the complexity in a hierarchical fashion, researchers have used discretised lattice-based structures and simplified energy models [57] for PSP. However, the complexity of the simplified problem still remains challenging. ere are a large number of existing search algorithms that attempt to solve the PSP problem by exploring fea- sible structures called conformations. For population-based approaches, a genetic algorithm (GA + [8]) reportedly pro- duces the state-of-the-art results using hydrophobic-polar (HP) energy model. On the other hand, for local search approaches, spiral search (SS-Tabu) [9], which is a tabu-based Hindawi Publishing Corporation Advances in Bioinformatics Volume 2014, Article ID 985968, 17 pages http://dx.doi.org/10.1155/2014/985968

Transcript of Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 ›...

Page 1: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Research ArticleA Parallel Framework for Multipoint Spiral Search inab Initio Protein Structure Prediction

Mahmood A Rashid12 Swakkhar Shatabda12 M A Hakim Newton1

Md Tamjidul Hoque3 and Abdul Sattar12

1 Institute for Integrated amp Intelligent Systems Science 2 (N34) 145 170 Kessels Road Nathan QLD 4111 Australia2 Queensland Research Lab National ICT Australia Level 8 Y Block 2 George Street Brisbane QLD 4000 Australia3 Computer Science 2000 Lakeshore Drive Math 308 New Orleans LA 70148 USA

Correspondence should be addressed to Mahmood A Rashid mahmoodrashidgmailcom

Received 31 October 2013 Revised 4 February 2014 Accepted 6 February 2014 Published 16 March 2014

Academic Editor Rita Casadio

Copyright copy 2014 Mahmood A Rashid et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

Protein structure prediction is computationally a very challenging problem A large number of existing search algorithms attemptto solve the problem by exploring possible structures and finding the one with theminimum free energy However these algorithmsperform poorly on large sized proteins due to an astronomically wide search space In this paper we present a multipoint spiralsearch framework that uses parallel processing techniques to expedite exploration by starting fromdifferent points In our approacha set of random initial solutions are generated and distributed to different threads We allow each thread to run for a predefinedperiod of time The improved solutions are stored threadwise When the threads finish the solutions are merged together and theduplicates are removed A selected distinct set of solutions are then split to different threads again In our ab initio protein structureprediction method we use the three-dimensional face-centred-cubic lattice for structure-backbone mapping We use both the lowresolution hydrophobic-polar energy model and the high-resolution 20 times 20 energy model for search guiding The experimentalresults show that our new parallel framework significantly improves the results obtained by the state-of-the-art single-point searchapproaches for both energy models on three-dimensional face-centred-cubic lattice We also experimentally show the effectivenessof mixing energy models within parallel threads

1 Introduction

Proteins are essentially linear chain of amino acids Theyadopt specific folded three-dimensional structures to per-form specific tasks The function of a given protein isdetermined by its native structure which has the lowestpossible free energy level Nevertheless misfolded proteinscause many critical diseases such as Alzheimerrsquos diseaseParkinsonrsquos disease and cancer [1 2] Protein structures areimportant in drug design and biotechnology

Protein structure prediction (PSP) is computationally avery hard problem [3] Given a proteinrsquos amino acid sequencethe problem is to find a three-dimensional structure of theprotein such that the total interaction energy amongst the

amino acids in the sequence is minimised The protein fold-ing process that leads to such structures involves very com-plexmolecular dynamics [4] and unknown energy factors Todeal with the complexity in a hierarchical fashion researchershave used discretised lattice-based structures and simplifiedenergy models [5ndash7] for PSP However the complexity of thesimplified problem still remains challenging

There are a large number of existing search algorithmsthat attempt to solve the PSP problem by exploring fea-sible structures called conformations For population-basedapproaches a genetic algorithm (GA+ [8]) reportedly pro-duces the state-of-the-art results using hydrophobic-polar(HP) energy model On the other hand for local searchapproaches spiral search (SS-Tabu) [9] which is a tabu-based

Hindawi Publishing CorporationAdvances in BioinformaticsVolume 2014 Article ID 985968 17 pageshttpdxdoiorg1011552014985968

2 Advances in Bioinformatics

local search produces the best results using HP model Bothalgorithms use three-dimensional (3D) face-centred-cubic(FCC) lattice for conformation representation

The approaches used in [10ndash13] produced the state-of-the-art results using the high resolution Berrera 20 times 20

energy matrix (henceforth referred to as BM energy model)Nevertheless the challenges in PSP largely remain in thefact that the energy function that needs to be minimisedin order to obtain the native structure of a given protein isnot clearly known A high resolution 20 times 20 energy model(such as BM) could better capture the behaviour of the actualenergy function than a low resolution energy model (such asHP) However the fine grained details of the high resolutioninteraction energy matrix are often not very informativefor guiding the search Pairwise contributions that have lowmagnitudes could be dominated by the accumulated pairwisecontributions having large magnitudes In contrast a lowresolution energy model could effectively bias the searchtowards certain promising directions particularly emphasis-ing on the pairwise contributions with large magnitudes

In a collaborative human team each member may workindividually on hisher own way to solve a problem Theymay meet together occasionally to discuss the possible waysthey could find and may then refocus only on the moreviable options in the next iterationWe envisage this approachto be useful in finding a suitable solution when there areenormously many alternatives that are very close to eachother We therefore try this in the context of conformationalsearch for protein structure prediction

In this paper we present a multithreaded search tech-nique that runs SS-Tabu in each thread that is guided byeither HP energy or by 20 times 20 BM energy model The searchstarts with a set of random initial solutions by distributingthese solutions to different threads We allow each thread torun for a predefined period of time The interim improvedsolutions are stored threadwise and merged together whenall threads have finished their execution After removing theduplicates from the merged solutions a selected distinct setof solutions is then considered for next iteration In ourapproach multipoint start first helps find some promisingresults For the next set of solutions to be distributed themost promising solutions from the merged list are selectedTherefore multipoint parallelism reduces the search space byexploring the vicinities of the promising solutions recursivelyIn our parallel local search we use both the HP energymodel and 20 times 20 BM energy model on the 3D FCClattice space The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches for thesimilar models

The rest of the paper is organized as follows Section 2describes the background of the protein structure Section 3presents the related work Section 41 presents the SS-Tabualgorithm used in the parallel search approach Section 4describes our parallel framework in detail Section 5 discussesand analyses the experimental results and finally Section 6presents the conclusions and outlines the future work

1 0 minus1

minus1 minus1 0

minus1 minus1 0

ZX

Y

0

0 1 1

0 1 minus11 1 0

1 0 1

1 minus1 0

0 0

0 minus1 minus1

0 minus1 1

minus1 0 minus1

minus1 0 1

Figure 1 A unit 3D FCC lattice with 12 basis vectors on theCartesian coordinates

2 Background

There are three computational approaches for protein struc-ture prediction These are homology modeling [14] proteinthreading [15 16] and ab initio methods [17 18] Predictionquality of homology modeling and protein threading dependson the sequential similarity of previously known proteinstructures However our work is based on the ab initioapproach that only depends on the amino acid sequence ofthe target protein Levinthalrsquos paradox [19] and Anfinsenrsquoshypothesis [20] are the basis of ab initio methods for PSPThe idea was originated in 1970 when it was demonstratedthat all information needed to fold a protein resides in itsamino acid sequence In our simplified protein structureprediction model we use 3D FCC lattice for conformationmapping HP and 20 times 20 BM energy models for confor-mation evaluation and the spiral search algorithm [9] (SS-Tabu) in a parallel framework for conformation search Thesimplified models (lattice model and energy models) andlocal search are described below

21 Simplified Model In this research we use 3D FCC latticepoints for conformation mapping to generate backbone ofprotein structures We use the HP and 20 times 20 BM energymodel for conformation evaluation The 3D FCC lattice theHP energymodel andBMenergymodel are briefly describedbelow

211 3D FCC Lattice The FCC lattice has the highestpacking density compared to the other existing lattices[21] The hexagonal close packed (HCP) lattice also knownas cuboctahedron was used in [22] In HCP each latticepoint has 12 neighbours that correspond to 12 basis verticeswith real-numbered coordinates which causes the loss ofstructural precision for PSP In FCC each lattice point has 12neighbours as shown in Figure 1

Advances in Bioinformatics 3

Figure 1 shows the 12 basis vectors with respect to theorigin The basis vectors are presented below denoting as sdot sdot sdot

= (1 1 0) = (0 1 1)

= (1 0 1) = (minus1 1 0)

= (0 minus1 1) = (minus1 0 1)

= (1 minus1 0) = (0 1 minus1)

119868 = (1 0 minus1) 119869 = (minus1 minus1 0)

= (0 minus1 minus1) = (minus1 0 minus1)

(1)

In simplified PSP conformations are mapped on thelattice by a sequence of basis vectors or by the relative vectorsthat are relative to the previous basis vectors in the sequence

212 HP Energy Model The 20 amino acid monomersare the building block of protein polymers These aminoacids are broadly divided into two categories based on theirhydrophobicity (a) hydrophobic amino acids (Gly Ala ProVal Leu Ile Met Phe Tyr Trp) denoted by H and (b)hydrophilic or polar amino acids (Ser Thr Cys Asn GlnLys His Arg Asp Glu) denoted by P In the HP model [23]when two nonconsecutive hydrophobic amino acids becometopologically neighbours they contribute a certain amountof negative energy which for simplicity is shown as minus1 inTable 1 The total energy (119864) of a conformation based on theHP model becomes the sum of the contributions of all pairsof nonconsecutive hydrophobic amino acids as follows

119864 = sum

119894lt119895minus1

119888119894119895sdot 119890119894119895 (2)

Here 119888119894119895

= 1 if amino acids 119894 and 119895 are nonconsecutiveneighbours on the lattice otherwise 0 and 119890

119894119895= minus1 if 119894th and

119895th amino acids are hydrophobic otherwise 0

22 BM Energy Model By analysing crystallised proteinstructures Miyazawa and Jernigan [24] in 1985 statisticallydeduced a 20 times 20 energy matrix that considers residuecontact propensities between the amino acids By calculat-ing empirical contact energies on the basis of informationavailable from selected protein structures and following thequasichemical approximation Berrera et al [25] in 2003deduced another 20 times 20 energy matrix In this work weuse the latter model and denote it by BM energy modelTable 2 shows the BM energy model with amino acid namesat the left-most column and the bottom-most row and theinteraction energy values in the cells The amino acid namesthat have boldface are hydrophobic We draw lines in Table 2to show groupings based on H-H H-P and P-P interactionsIn the context of this work it is worth noting that mostenergy contributions that have large magnitudes are fromH-H interactions followed by those from H-P interactions

The total energy 119864bm (shown in (3)) of a conformationbased on the BMenergymodel is the sumof the contributions

over all pairs of nonconsecutive amino acids that are one unitlattice distance apart

119864bm = sum

119894lt119895minus1

c119894119895sdot 119890119894119895 (3)

Here c119894119895= 1 if amino acids at positions 119894 and 119895 in the sequence

are nonconsecutive neighbours on the lattice otherwise 0and 119890119894119895is the empirical energy value between the 119894th and 119895th

amino acid pair specified in the matrix for the BMmodel

23 Local Search Starting from an initial solution localsearch algorithms move from one solution to another to finda better solution Local search algorithms are well known forefficiently producing high quality solutions [9 26 27] whichare difficult for systematic search approaches However theyare incomplete [28] and suffer from revisitation and stagna-tion Restarting the whole or parts of a solution remains thetypical approach to deal with such situations

24 Tabu Metaheuristic Tabu metaheuristic [29 30]enhances the performance of local search algorithms Itmaintains a short-term memory storage to remember thelocal changes of a solution Then any further local changesfor those stored positions are forbidden for a certain numberof subsequent iterations (known as tabu tenure)

3 Related Work

There are a large number of existing search algorithms thatattempt to solve the PSP problem by exploring feasiblestructures on different energy models In this section weexplore the works related to HP and 20times20 energy models asbelow

31 HP Energy-Based Approaches Different types of meta-heuristic have been used in solving the simplified PSP prob-lem These include Monte Carlo Simulation [31] SimulatedAnnealing [32] Genetic Algorithms (GA) [33 34] TabuSearch with GA [35] Tabu Search with Hill Climbing [36]Ant Colony Optimisation [37] Immune Algorithms [38]Tabu-based Stochastic Local Search [26 27] and ConstraintProgramming [39]

The Bioinformatics Group headed by Rolf Backofenapplied Constraint Programming [40ndash42] using exact andcomplete algorithms Their exact and complete algorithmswork efficiently if similar hydrophobic core exists in therepository

Cebrian et al [26] used tabu-based local search andShatabda et al [27] used memory-based local search withtabu heuristic and achieved the state-of-the-art results How-ever Dotu et al [39] used constraint programming andfound promising results but only for smaller sized (length lt100 amino acids) proteins Besides local search Unger andMoult [33] applied population-based genetic algorithms toPSP and found their method to be more promising thanthe Monte Carlo-based methods [31] They used absoluteencodings on the square and cubic lattices for HP energy

4 Advances in Bioinformatics

Table 1 HP energy model [23]

H PH minus1 0P 0 0

model Later Patton [43] used relative encodings to representconformations and a penalty method to enforce the self-avoiding walk constraint GAs have been used byHoque et al[22] for cubic and 3DHCP latticesThey used DFS-generatedpathways [44] in GA crossover for protein structure predic-tion They also introduced a twin-removal operator [45] toremove duplicates from the population to prevent the searchfrom stalling Ullah et al in [12 46] combined local searchwith constraint programming They used a 20 times 20 energymodel [25] on FCC lattice and found promising resultsIn another hybrid approach [47] tabu metaheuristic wascombined with genetic algorithms in two-dimensional HPmodel to observe crossover and mutation rates over time

However for the simplified model (HP energy modeland 3D FCC lattice) that is used in this paper a newgenetic algorithm GA+ [8] and a tabu-based local searchalgorithm Spiral Search [9] currently produce the state-of-the-art results

32 Empirical 20 times 20 Matrix Energy Based ApproachesA constraint programming technique was used in [48] byDal Palu et al to predict tertiary structures of real pro-teins using secondary structure information They also usedconstraint programming with different heuristics in [49]and a constraint solver named COLA [50] that is highlyoptimized for protein structure prediction In another work[51] a fragment assembly method was utilised with empiricalenergy potentials to optimise protein structures Amongother successful approaches a population-based local search[52] and a population-based genetic algorithm [13] were usedwith empirical energy functions

In a hybrid approach Ullah and Steinofel [12] applied aconstraint programming-based large neighbourhood searchtechnique on top of the output of COLA solver The hybridapproach produced the state-of-the-art results for severalsmall sized (less than 75 amino acids) benchmark proteins

In another work Ullah et al [46] proposed a two stageoptimisation approach combining constraint programmingand local search The first stage of the approach producedcompact optimal structures by using the CPSP tools based onthe HP model In the second stage those compact structureswere used as the input of a simulated annealing-based localsearch that is guided by the BM energy model

In a recent work [10] Shatabda et al presented a mixedheuristic local search algorithm for PSP and produced thestate-of-the-art results using BM energy model on 3D FCClattice The mixed heuristic local search in each iterationrandomly selects a heuristic from a given number of heuris-tics designed by the authors The selected heuristics are thenused in evaluating the generated neighbouring solutions ofthe current solution Although the heuristics themselves areweaker than the BMenergy their collective use in the random

mixing fashion produces results better than the BM energyitself

33 Parallel Approaches Vargas and Lopes [53] proposedan Artificial Bee Colony algorithm based on two parallelapproaches (master slave and a hybrid hierarchical) forprotein structure prediction using the 3D HP model withsidechains They showed that the parallel methods achieveda good level of efficiency while compared with the sequentialversion A comparative study of parallel metaheuristics wasconducted by Trantar et al [54] using a genetic algorithm asimulated annealing algorithm and a random searchmethodin grid environments for protein structure prediction Inanother work [55] they applied a parallel hybrid geneticalgorithm in order to efficiently deal with the PSP problemusing the computational grid They experimentally showedthe effectiveness of a computational grid-based approachAll-atom force field-based protein structure prediction usingparallel particle swarm optimization approach was proposedby Kandov in [56] He showed that asynchronous parallelisa-tion speeds up the simulation better than the synchronousone and reduces the effective time for predictions signifi-cantly Among others Calvo et al in [57 58] applied a par-allel multiobjective evolutionary approach and found linearspeedups in structure prediction for benchmark proteins andRobles et al in [59] applied parallel approach in local search topredict secondary structure of a protein from its amino acidsequence

4 Our Approach

The driving force of our parallel search framework is SS-Tabu[9] that has two versions (i) the existing algorithm designedfor HP model (as shown in Algorithm 1 and described inSection 41) and (ii) the customised spiral search algorithmdesigned for 20 times 20 BM energy model (as shown inAlgorithm 5 and described in Section 42) We feed the twoversions of spiral search algorithms in different threads indifferent combinations The variations are described in theexperimental results section

41 SS-Tabu Spiral Search SS-Tabu is a hydrophobic coredirected local search [9] that works in a spiral fashion Thisalgorithm (the pseudocode in Algorithm 1) is the basis ofthe proposed parallel local search framework SS-Tabu iscomposed of H and P move selections random-walk [60]and relay-restart [9] However this algorithm is furthercustomised for detailed 20 times 20 energy model as describedin Section 42 Both versions of SS-Tabu are used in paral-lel threads with different combinations within the parallelframework The features of existing SS-Tabu are described inAlgorithm 1

411 Applying Diagonal Move In a tabu-guided local search(see Algorithm 1) we use the diagonal move operator (shownin Figure 2) to build H-core A diagonal move displaces 119894thamino acid from its position to another position on the latticewithout changing the position of its succeeding (119894 + 1)th and

Advances in Bioinformatics 5

Table2Th

e20times20BM

energy

mod

elby

Berrerae

tal[25]

Cysminus3477

Met

minus224

minus19

01Ph

eminus2424

minus2304

minus246

7Ile

minus241

minus2286

minus253

minus2691

Leuminus2343

minus2208

minus2491

minus2647

minus2501

Valminus2258

minus2079

minus2391

minus2568

minus244

7minus2385

Trp

minus208

minus209

minus2286

minus2303

minus2222

minus2097

minus18

67Ty

rminus18

92minus18

34minus19

63minus19

98minus19

19minus17

9minus18

34minus13

35Ala

minus17

minus15

17minus17

5minus18

72minus17

28minus17

31minus15

65minus13

18minus1119

Gly

minus1101

minus0897

minus10

34minus0885

minus0767

minus0756

minus1142

minus0818

minus029

0219

Thr

minus12

43minus0999

minus12

37minus13

6minus12

02minus12

4minus10

77minus0892

minus0717

minus0311

minus0617

Ser

minus13

06minus0893

minus1178

minus10

37minus0959

minus0933

minus1145

minus0859

minus0607

minus0261

minus0548

minus0519

Gln

minus0835

minus072

minus0807

minus0778

minus0729

minus064

2minus0997

minus0687

minus0323

0033

minus0342

minus026

0054

Asn

minus0788

minus0658

minus079

minus0669

minus0524

minus0673

minus0884

minus067

minus0371

minus023

minus0463

minus0423

minus0253

minus0367

Glu

minus0179

minus0209

minus0419

minus0439

minus0366

minus0335

minus0624

minus0453

minus0039

044

3minus0192

minus0161

0179

016

0933

Asp

minus0616

minus040

9minus0482

minus0402

minus0291

minus0298

minus0613

minus0631

minus0235

minus0097

minus0382

minus0521

0022

minus0344

0634

0179

His

minus14

99minus12

52minus13

3minus12

34minus1176

minus1118

minus13

83minus12

22minus064

6minus0325

minus072

minus0639

minus029

minus0455

minus0324

minus066

4minus10

78Arg

minus0771

minus0611

minus0805

minus0854

minus0758

minus066

4minus0912

minus0745

minus0327

minus005

minus0247

minus0264

minus004

2minus0114

minus0374

minus0584

minus0307

02

Lys

minus0112

minus0146

minus027

minus0253

minus0222

minus02

minus0391

minus0349

0196

0589

0155

0223

0334

0271

minus0057

minus0176

0388

0815

1339

Prominus1196

minus0788

minus10

76minus0991

minus0771

minus0886

minus12

78minus10

67minus0374

minus004

2minus0222

minus0199

minus0035

minus0018

0257

0189

minus0346

minus0023

0661

0129

Cys

Met

Phe

IleLe

uVa

lTrp

Tyr

Ala

Gly

Thr

Ser

Gln

Asn

Glu

Asp

His

Arg

Lys

Pro

6 Advances in Bioinformatics

(1) H and P are hydrophobic and polar amino acids(2) maxIter terminates the iteration(3) maxRetry sets the time of relay-restart(4) maxRW sets the time of random-walk(5) initTabuList()(6) for (119894 = 1 to maxIter) do(7) mv larr997888 selectMoveForH()(8) if (mv = null) then(9) applyMove(mv)(10) updateTabuList(i)(11) else(12) mv larr997888 selectMoveForP()(13) if (mv = null) then(14) applyMove(mv)(15) evaluate(AA) AAmdashamino acid array(16) if (improved) then(17) retry++(18) else(19) improvedList larr997888 addTopOfList()(20) retry = 0(21) rw = 0(22) if retry ge maxRetry then(23) relayRestart(improvedList)(24) resetTabuList()(25) rw++(26) if rw ge maxRW then(27) randomWalk(maxPull)(28) resetTabuList()

Algorithm 1 SpiralSearchHP(C)

A

D

D

AB B

C CD G G

E EF F

Figure 2 Diagonal move operator For easy understanding thefigures are presented in 2D space

preceding (119894 minus 1)th amino acids in the sequence The move isjust a corner-flip to an unoccupied lattice point

412 Forming H-Core Protein structures have hydrophobiccores (H-core) that hide the hydrophobic amino acids fromwater and expose the polar amino acids to the surface to bein contact with the surrounding water molecules [61] H-coreformation is an important objective for HP-based proteinstructure prediction models In our work we repeatedly usethe diagonal-move to aid forming the H-core We maintaina tabu list to control the amino acids from getting involvedin the diagonal moves SS-Tabu performs a series of diagonalmoves on a given conformation to build the H-core aroundthe hydrophobic core centre (HCC) as shown in Figure 3TheCartesian distance between theHCCand the current positionor a new position is denoted by 119889

1and 119889

2 respectively The

d1

d1

d2

d2

HCC

HCC

HCC

Figure 3 Spiral search comprising a series of diagonal moves withtabu metaheuristics For simplification and easy understanding thefigures are presented in 2D space

diagonal move squeezes the conformation and quickly formsthe H-core in a spiral fashion

413 Selecting Moves for HP Model In H-move selectionalgorithm (Algorithm 2) the HCC is calculated (Line 2) byfinding arithmetic means of 119909 119910 and 119911 coordinates of allhydrophobic amino acids using (4) The selection is guidedby the Cartesian distance 119889

119894(as shown in (5)) between

HCC and the hydrophobic amino acids in the sequence For

Advances in Bioinformatics 7

(1) H denotes hydrophobic amino acids(2) hcclarr997888 findHCoreCentre(S)(3) for (119894 = 1 to seqLength) do(4) if ((type[i]=ldquoHrdquo) and (nottabuList[i])) then(5) cfnlarr997888 findCommonFreeNeigh(119894 + 1 119894 minus 1)(6) mvlocal larr997888 findShortestMove(hcc cfn)(7) moveListsdotadd(mv)(8) mvglobal larr997888 findShortestMove(moveList)(9) return mv

Algorithm 2 The pseudocode of H-move selection selectMoveForH()

the 119894th hydrophobic amino acid the common topologicalneighbours of the (119894 minus 1)th and (119894 + 1)th amino acidsare computed The topological neighbours (TN) of a latticepoint are the points at unit lattice-distance apart from itFrom the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the HCC using (5) Then thepoint with the shortest distance is picked This point is listedin the possible H-move list for 119894th hydrophobic amino acidif its current distance from HCC is greater than that ofthe selected point When all hydrophobic amino acids aretraversed and the feasible shortest distances are listed inH-move list the amino acid having the shortest distancein H-move list is chosen to apply the diagonal move onit (Algorithm 1 Line 9) A tabu list is maintained for eachhydrophobic amino acid to control the selection priorityamongst them For each successful move the tabu list isupdated for the respective amino acid The process stopswhen no H-move is found In this situation the control istransferred to select and apply P-moves Consider

119909hcc =1

119899ℎ

119899ℎ

sum

119894=1

119909119894 119910hcc =

1

119899ℎ

119899ℎ

sum

119894=1

119910119894 119911hcc =

1

119899h

119899ℎ

sum

119894=1

119911119894

(4)

where 119899ℎis the number of H amino acids in the protein

Consider

119889119894= radic(119909

119894minus 119909hcc)

2

+ (119910119894minus 119910hcc)

2

+ (119911119894minus 119911hcc)

2

(5)

However in P-move selection (Algorithm 1 Line 12) thesame kind of diagonal moves is applied as H-move For each119894th polar amino acid all free lattice points that are commonneighbours of lattice points occupied by (119894 minus 1)th and (119894 +1)th amino acids are listed From the list a point is selectedrandomly to complete a diagonalmove (Algorithm 1 Line 14)for the respective polar amino acid No hydrophobic-core-center is calculated no Cartesian distance is measured andno tabu list is maintained for P-move After one try for eachpolar amino acid the control is returned to select and applyH-moves

414 Handling Stagnation For hard optimisation problemssuch as protein structure prediction local search algorithmsoften face stagnation In HP model-based conformational

search stagnation is encountered when a premature H-coreis formed Handling the stagnations is a challenging issuefor conformational search algorithms (eg GA LS) Thushandling such situation intelligently is important to proceedfurther To deal with stagnation in SS-Tabu random-walk[60] and relay-restart techniques are used on an on-demandbasis

Random-Walk Premature H-cores are observed at local min-ima To escape local minima a random-walk [60] algorithm(Algorithm 1 Line 27) is applied This algorithm uses pullmoves [62] to break the premature H-cores and to creatediversity

Relay-Restart When the search stagnation situation arises anew relay-restart technique (Algorithm 1 Line 23) is appliedinstead of a fresh restart or restarting from the current bestsolution [26 27] We use relay-restart when random-walkfails to escape from the local minima The relay-restart startsfrom an improving solution We maintain an improvingsolution list that contains all the improving solutions after theinitialisation

415 Further ImplementationDetails Like other search algo-rithms SS-Tabu requires initialisation It also needs evalua-tion of the solution in each iteration It starts with a randomlygenerated or parameterised initial solution and enhances itin a spiral fashion Further it needs to maintain a tabu meta-heuristic to guide the local search

Tabu Tenure Intuitively we use different tabu-tenure valuesbased on the number of hydrophobic amino acids (hCount)in the sequenceWe calculate tabu-tenure using the followingformula

tenure = (10 + ℎCount10

) (6)

The tabu-tenure calculated using (6) is used at Lines 5 24 and28 in Algorithm 1 during initialising and resetting tabu-list

Evaluation After each iteration the conformation is evalu-ated by counting the H-H contacts (topological neighbours)where the two amino acids are nonconsecutive The pseu-docode in Algorithm 3 presents the algorithm of calculatingthe free energy of a given conformation Note that the energy

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

2 Advances in Bioinformatics

local search produces the best results using HP model Bothalgorithms use three-dimensional (3D) face-centred-cubic(FCC) lattice for conformation representation

The approaches used in [10ndash13] produced the state-of-the-art results using the high resolution Berrera 20 times 20

energy matrix (henceforth referred to as BM energy model)Nevertheless the challenges in PSP largely remain in thefact that the energy function that needs to be minimisedin order to obtain the native structure of a given protein isnot clearly known A high resolution 20 times 20 energy model(such as BM) could better capture the behaviour of the actualenergy function than a low resolution energy model (such asHP) However the fine grained details of the high resolutioninteraction energy matrix are often not very informativefor guiding the search Pairwise contributions that have lowmagnitudes could be dominated by the accumulated pairwisecontributions having large magnitudes In contrast a lowresolution energy model could effectively bias the searchtowards certain promising directions particularly emphasis-ing on the pairwise contributions with large magnitudes

In a collaborative human team each member may workindividually on hisher own way to solve a problem Theymay meet together occasionally to discuss the possible waysthey could find and may then refocus only on the moreviable options in the next iterationWe envisage this approachto be useful in finding a suitable solution when there areenormously many alternatives that are very close to eachother We therefore try this in the context of conformationalsearch for protein structure prediction

In this paper we present a multithreaded search tech-nique that runs SS-Tabu in each thread that is guided byeither HP energy or by 20 times 20 BM energy model The searchstarts with a set of random initial solutions by distributingthese solutions to different threads We allow each thread torun for a predefined period of time The interim improvedsolutions are stored threadwise and merged together whenall threads have finished their execution After removing theduplicates from the merged solutions a selected distinct setof solutions is then considered for next iteration In ourapproach multipoint start first helps find some promisingresults For the next set of solutions to be distributed themost promising solutions from the merged list are selectedTherefore multipoint parallelism reduces the search space byexploring the vicinities of the promising solutions recursivelyIn our parallel local search we use both the HP energymodel and 20 times 20 BM energy model on the 3D FCClattice space The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches for thesimilar models

The rest of the paper is organized as follows Section 2describes the background of the protein structure Section 3presents the related work Section 41 presents the SS-Tabualgorithm used in the parallel search approach Section 4describes our parallel framework in detail Section 5 discussesand analyses the experimental results and finally Section 6presents the conclusions and outlines the future work

1 0 minus1

minus1 minus1 0

minus1 minus1 0

ZX

Y

0

0 1 1

0 1 minus11 1 0

1 0 1

1 minus1 0

0 0

0 minus1 minus1

0 minus1 1

minus1 0 minus1

minus1 0 1

Figure 1 A unit 3D FCC lattice with 12 basis vectors on theCartesian coordinates

2 Background

There are three computational approaches for protein struc-ture prediction These are homology modeling [14] proteinthreading [15 16] and ab initio methods [17 18] Predictionquality of homology modeling and protein threading dependson the sequential similarity of previously known proteinstructures However our work is based on the ab initioapproach that only depends on the amino acid sequence ofthe target protein Levinthalrsquos paradox [19] and Anfinsenrsquoshypothesis [20] are the basis of ab initio methods for PSPThe idea was originated in 1970 when it was demonstratedthat all information needed to fold a protein resides in itsamino acid sequence In our simplified protein structureprediction model we use 3D FCC lattice for conformationmapping HP and 20 times 20 BM energy models for confor-mation evaluation and the spiral search algorithm [9] (SS-Tabu) in a parallel framework for conformation search Thesimplified models (lattice model and energy models) andlocal search are described below

21 Simplified Model In this research we use 3D FCC latticepoints for conformation mapping to generate backbone ofprotein structures We use the HP and 20 times 20 BM energymodel for conformation evaluation The 3D FCC lattice theHP energymodel andBMenergymodel are briefly describedbelow

211 3D FCC Lattice The FCC lattice has the highestpacking density compared to the other existing lattices[21] The hexagonal close packed (HCP) lattice also knownas cuboctahedron was used in [22] In HCP each latticepoint has 12 neighbours that correspond to 12 basis verticeswith real-numbered coordinates which causes the loss ofstructural precision for PSP In FCC each lattice point has 12neighbours as shown in Figure 1

Advances in Bioinformatics 3

Figure 1 shows the 12 basis vectors with respect to theorigin The basis vectors are presented below denoting as sdot sdot sdot

= (1 1 0) = (0 1 1)

= (1 0 1) = (minus1 1 0)

= (0 minus1 1) = (minus1 0 1)

= (1 minus1 0) = (0 1 minus1)

119868 = (1 0 minus1) 119869 = (minus1 minus1 0)

= (0 minus1 minus1) = (minus1 0 minus1)

(1)

In simplified PSP conformations are mapped on thelattice by a sequence of basis vectors or by the relative vectorsthat are relative to the previous basis vectors in the sequence

212 HP Energy Model The 20 amino acid monomersare the building block of protein polymers These aminoacids are broadly divided into two categories based on theirhydrophobicity (a) hydrophobic amino acids (Gly Ala ProVal Leu Ile Met Phe Tyr Trp) denoted by H and (b)hydrophilic or polar amino acids (Ser Thr Cys Asn GlnLys His Arg Asp Glu) denoted by P In the HP model [23]when two nonconsecutive hydrophobic amino acids becometopologically neighbours they contribute a certain amountof negative energy which for simplicity is shown as minus1 inTable 1 The total energy (119864) of a conformation based on theHP model becomes the sum of the contributions of all pairsof nonconsecutive hydrophobic amino acids as follows

119864 = sum

119894lt119895minus1

119888119894119895sdot 119890119894119895 (2)

Here 119888119894119895

= 1 if amino acids 119894 and 119895 are nonconsecutiveneighbours on the lattice otherwise 0 and 119890

119894119895= minus1 if 119894th and

119895th amino acids are hydrophobic otherwise 0

22 BM Energy Model By analysing crystallised proteinstructures Miyazawa and Jernigan [24] in 1985 statisticallydeduced a 20 times 20 energy matrix that considers residuecontact propensities between the amino acids By calculat-ing empirical contact energies on the basis of informationavailable from selected protein structures and following thequasichemical approximation Berrera et al [25] in 2003deduced another 20 times 20 energy matrix In this work weuse the latter model and denote it by BM energy modelTable 2 shows the BM energy model with amino acid namesat the left-most column and the bottom-most row and theinteraction energy values in the cells The amino acid namesthat have boldface are hydrophobic We draw lines in Table 2to show groupings based on H-H H-P and P-P interactionsIn the context of this work it is worth noting that mostenergy contributions that have large magnitudes are fromH-H interactions followed by those from H-P interactions

The total energy 119864bm (shown in (3)) of a conformationbased on the BMenergymodel is the sumof the contributions

over all pairs of nonconsecutive amino acids that are one unitlattice distance apart

119864bm = sum

119894lt119895minus1

c119894119895sdot 119890119894119895 (3)

Here c119894119895= 1 if amino acids at positions 119894 and 119895 in the sequence

are nonconsecutive neighbours on the lattice otherwise 0and 119890119894119895is the empirical energy value between the 119894th and 119895th

amino acid pair specified in the matrix for the BMmodel

23 Local Search Starting from an initial solution localsearch algorithms move from one solution to another to finda better solution Local search algorithms are well known forefficiently producing high quality solutions [9 26 27] whichare difficult for systematic search approaches However theyare incomplete [28] and suffer from revisitation and stagna-tion Restarting the whole or parts of a solution remains thetypical approach to deal with such situations

24 Tabu Metaheuristic Tabu metaheuristic [29 30]enhances the performance of local search algorithms Itmaintains a short-term memory storage to remember thelocal changes of a solution Then any further local changesfor those stored positions are forbidden for a certain numberof subsequent iterations (known as tabu tenure)

3 Related Work

There are a large number of existing search algorithms thatattempt to solve the PSP problem by exploring feasiblestructures on different energy models In this section weexplore the works related to HP and 20times20 energy models asbelow

31 HP Energy-Based Approaches Different types of meta-heuristic have been used in solving the simplified PSP prob-lem These include Monte Carlo Simulation [31] SimulatedAnnealing [32] Genetic Algorithms (GA) [33 34] TabuSearch with GA [35] Tabu Search with Hill Climbing [36]Ant Colony Optimisation [37] Immune Algorithms [38]Tabu-based Stochastic Local Search [26 27] and ConstraintProgramming [39]

The Bioinformatics Group headed by Rolf Backofenapplied Constraint Programming [40ndash42] using exact andcomplete algorithms Their exact and complete algorithmswork efficiently if similar hydrophobic core exists in therepository

Cebrian et al [26] used tabu-based local search andShatabda et al [27] used memory-based local search withtabu heuristic and achieved the state-of-the-art results How-ever Dotu et al [39] used constraint programming andfound promising results but only for smaller sized (length lt100 amino acids) proteins Besides local search Unger andMoult [33] applied population-based genetic algorithms toPSP and found their method to be more promising thanthe Monte Carlo-based methods [31] They used absoluteencodings on the square and cubic lattices for HP energy

4 Advances in Bioinformatics

Table 1 HP energy model [23]

H PH minus1 0P 0 0

model Later Patton [43] used relative encodings to representconformations and a penalty method to enforce the self-avoiding walk constraint GAs have been used byHoque et al[22] for cubic and 3DHCP latticesThey used DFS-generatedpathways [44] in GA crossover for protein structure predic-tion They also introduced a twin-removal operator [45] toremove duplicates from the population to prevent the searchfrom stalling Ullah et al in [12 46] combined local searchwith constraint programming They used a 20 times 20 energymodel [25] on FCC lattice and found promising resultsIn another hybrid approach [47] tabu metaheuristic wascombined with genetic algorithms in two-dimensional HPmodel to observe crossover and mutation rates over time

However for the simplified model (HP energy modeland 3D FCC lattice) that is used in this paper a newgenetic algorithm GA+ [8] and a tabu-based local searchalgorithm Spiral Search [9] currently produce the state-of-the-art results

32 Empirical 20 times 20 Matrix Energy Based ApproachesA constraint programming technique was used in [48] byDal Palu et al to predict tertiary structures of real pro-teins using secondary structure information They also usedconstraint programming with different heuristics in [49]and a constraint solver named COLA [50] that is highlyoptimized for protein structure prediction In another work[51] a fragment assembly method was utilised with empiricalenergy potentials to optimise protein structures Amongother successful approaches a population-based local search[52] and a population-based genetic algorithm [13] were usedwith empirical energy functions

In a hybrid approach Ullah and Steinofel [12] applied aconstraint programming-based large neighbourhood searchtechnique on top of the output of COLA solver The hybridapproach produced the state-of-the-art results for severalsmall sized (less than 75 amino acids) benchmark proteins

In another work Ullah et al [46] proposed a two stageoptimisation approach combining constraint programmingand local search The first stage of the approach producedcompact optimal structures by using the CPSP tools based onthe HP model In the second stage those compact structureswere used as the input of a simulated annealing-based localsearch that is guided by the BM energy model

In a recent work [10] Shatabda et al presented a mixedheuristic local search algorithm for PSP and produced thestate-of-the-art results using BM energy model on 3D FCClattice The mixed heuristic local search in each iterationrandomly selects a heuristic from a given number of heuris-tics designed by the authors The selected heuristics are thenused in evaluating the generated neighbouring solutions ofthe current solution Although the heuristics themselves areweaker than the BMenergy their collective use in the random

mixing fashion produces results better than the BM energyitself

33 Parallel Approaches Vargas and Lopes [53] proposedan Artificial Bee Colony algorithm based on two parallelapproaches (master slave and a hybrid hierarchical) forprotein structure prediction using the 3D HP model withsidechains They showed that the parallel methods achieveda good level of efficiency while compared with the sequentialversion A comparative study of parallel metaheuristics wasconducted by Trantar et al [54] using a genetic algorithm asimulated annealing algorithm and a random searchmethodin grid environments for protein structure prediction Inanother work [55] they applied a parallel hybrid geneticalgorithm in order to efficiently deal with the PSP problemusing the computational grid They experimentally showedthe effectiveness of a computational grid-based approachAll-atom force field-based protein structure prediction usingparallel particle swarm optimization approach was proposedby Kandov in [56] He showed that asynchronous parallelisa-tion speeds up the simulation better than the synchronousone and reduces the effective time for predictions signifi-cantly Among others Calvo et al in [57 58] applied a par-allel multiobjective evolutionary approach and found linearspeedups in structure prediction for benchmark proteins andRobles et al in [59] applied parallel approach in local search topredict secondary structure of a protein from its amino acidsequence

4 Our Approach

The driving force of our parallel search framework is SS-Tabu[9] that has two versions (i) the existing algorithm designedfor HP model (as shown in Algorithm 1 and described inSection 41) and (ii) the customised spiral search algorithmdesigned for 20 times 20 BM energy model (as shown inAlgorithm 5 and described in Section 42) We feed the twoversions of spiral search algorithms in different threads indifferent combinations The variations are described in theexperimental results section

41 SS-Tabu Spiral Search SS-Tabu is a hydrophobic coredirected local search [9] that works in a spiral fashion Thisalgorithm (the pseudocode in Algorithm 1) is the basis ofthe proposed parallel local search framework SS-Tabu iscomposed of H and P move selections random-walk [60]and relay-restart [9] However this algorithm is furthercustomised for detailed 20 times 20 energy model as describedin Section 42 Both versions of SS-Tabu are used in paral-lel threads with different combinations within the parallelframework The features of existing SS-Tabu are described inAlgorithm 1

411 Applying Diagonal Move In a tabu-guided local search(see Algorithm 1) we use the diagonal move operator (shownin Figure 2) to build H-core A diagonal move displaces 119894thamino acid from its position to another position on the latticewithout changing the position of its succeeding (119894 + 1)th and

Advances in Bioinformatics 5

Table2Th

e20times20BM

energy

mod

elby

Berrerae

tal[25]

Cysminus3477

Met

minus224

minus19

01Ph

eminus2424

minus2304

minus246

7Ile

minus241

minus2286

minus253

minus2691

Leuminus2343

minus2208

minus2491

minus2647

minus2501

Valminus2258

minus2079

minus2391

minus2568

minus244

7minus2385

Trp

minus208

minus209

minus2286

minus2303

minus2222

minus2097

minus18

67Ty

rminus18

92minus18

34minus19

63minus19

98minus19

19minus17

9minus18

34minus13

35Ala

minus17

minus15

17minus17

5minus18

72minus17

28minus17

31minus15

65minus13

18minus1119

Gly

minus1101

minus0897

minus10

34minus0885

minus0767

minus0756

minus1142

minus0818

minus029

0219

Thr

minus12

43minus0999

minus12

37minus13

6minus12

02minus12

4minus10

77minus0892

minus0717

minus0311

minus0617

Ser

minus13

06minus0893

minus1178

minus10

37minus0959

minus0933

minus1145

minus0859

minus0607

minus0261

minus0548

minus0519

Gln

minus0835

minus072

minus0807

minus0778

minus0729

minus064

2minus0997

minus0687

minus0323

0033

minus0342

minus026

0054

Asn

minus0788

minus0658

minus079

minus0669

minus0524

minus0673

minus0884

minus067

minus0371

minus023

minus0463

minus0423

minus0253

minus0367

Glu

minus0179

minus0209

minus0419

minus0439

minus0366

minus0335

minus0624

minus0453

minus0039

044

3minus0192

minus0161

0179

016

0933

Asp

minus0616

minus040

9minus0482

minus0402

minus0291

minus0298

minus0613

minus0631

minus0235

minus0097

minus0382

minus0521

0022

minus0344

0634

0179

His

minus14

99minus12

52minus13

3minus12

34minus1176

minus1118

minus13

83minus12

22minus064

6minus0325

minus072

minus0639

minus029

minus0455

minus0324

minus066

4minus10

78Arg

minus0771

minus0611

minus0805

minus0854

minus0758

minus066

4minus0912

minus0745

minus0327

minus005

minus0247

minus0264

minus004

2minus0114

minus0374

minus0584

minus0307

02

Lys

minus0112

minus0146

minus027

minus0253

minus0222

minus02

minus0391

minus0349

0196

0589

0155

0223

0334

0271

minus0057

minus0176

0388

0815

1339

Prominus1196

minus0788

minus10

76minus0991

minus0771

minus0886

minus12

78minus10

67minus0374

minus004

2minus0222

minus0199

minus0035

minus0018

0257

0189

minus0346

minus0023

0661

0129

Cys

Met

Phe

IleLe

uVa

lTrp

Tyr

Ala

Gly

Thr

Ser

Gln

Asn

Glu

Asp

His

Arg

Lys

Pro

6 Advances in Bioinformatics

(1) H and P are hydrophobic and polar amino acids(2) maxIter terminates the iteration(3) maxRetry sets the time of relay-restart(4) maxRW sets the time of random-walk(5) initTabuList()(6) for (119894 = 1 to maxIter) do(7) mv larr997888 selectMoveForH()(8) if (mv = null) then(9) applyMove(mv)(10) updateTabuList(i)(11) else(12) mv larr997888 selectMoveForP()(13) if (mv = null) then(14) applyMove(mv)(15) evaluate(AA) AAmdashamino acid array(16) if (improved) then(17) retry++(18) else(19) improvedList larr997888 addTopOfList()(20) retry = 0(21) rw = 0(22) if retry ge maxRetry then(23) relayRestart(improvedList)(24) resetTabuList()(25) rw++(26) if rw ge maxRW then(27) randomWalk(maxPull)(28) resetTabuList()

Algorithm 1 SpiralSearchHP(C)

A

D

D

AB B

C CD G G

E EF F

Figure 2 Diagonal move operator For easy understanding thefigures are presented in 2D space

preceding (119894 minus 1)th amino acids in the sequence The move isjust a corner-flip to an unoccupied lattice point

412 Forming H-Core Protein structures have hydrophobiccores (H-core) that hide the hydrophobic amino acids fromwater and expose the polar amino acids to the surface to bein contact with the surrounding water molecules [61] H-coreformation is an important objective for HP-based proteinstructure prediction models In our work we repeatedly usethe diagonal-move to aid forming the H-core We maintaina tabu list to control the amino acids from getting involvedin the diagonal moves SS-Tabu performs a series of diagonalmoves on a given conformation to build the H-core aroundthe hydrophobic core centre (HCC) as shown in Figure 3TheCartesian distance between theHCCand the current positionor a new position is denoted by 119889

1and 119889

2 respectively The

d1

d1

d2

d2

HCC

HCC

HCC

Figure 3 Spiral search comprising a series of diagonal moves withtabu metaheuristics For simplification and easy understanding thefigures are presented in 2D space

diagonal move squeezes the conformation and quickly formsthe H-core in a spiral fashion

413 Selecting Moves for HP Model In H-move selectionalgorithm (Algorithm 2) the HCC is calculated (Line 2) byfinding arithmetic means of 119909 119910 and 119911 coordinates of allhydrophobic amino acids using (4) The selection is guidedby the Cartesian distance 119889

119894(as shown in (5)) between

HCC and the hydrophobic amino acids in the sequence For

Advances in Bioinformatics 7

(1) H denotes hydrophobic amino acids(2) hcclarr997888 findHCoreCentre(S)(3) for (119894 = 1 to seqLength) do(4) if ((type[i]=ldquoHrdquo) and (nottabuList[i])) then(5) cfnlarr997888 findCommonFreeNeigh(119894 + 1 119894 minus 1)(6) mvlocal larr997888 findShortestMove(hcc cfn)(7) moveListsdotadd(mv)(8) mvglobal larr997888 findShortestMove(moveList)(9) return mv

Algorithm 2 The pseudocode of H-move selection selectMoveForH()

the 119894th hydrophobic amino acid the common topologicalneighbours of the (119894 minus 1)th and (119894 + 1)th amino acidsare computed The topological neighbours (TN) of a latticepoint are the points at unit lattice-distance apart from itFrom the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the HCC using (5) Then thepoint with the shortest distance is picked This point is listedin the possible H-move list for 119894th hydrophobic amino acidif its current distance from HCC is greater than that ofthe selected point When all hydrophobic amino acids aretraversed and the feasible shortest distances are listed inH-move list the amino acid having the shortest distancein H-move list is chosen to apply the diagonal move onit (Algorithm 1 Line 9) A tabu list is maintained for eachhydrophobic amino acid to control the selection priorityamongst them For each successful move the tabu list isupdated for the respective amino acid The process stopswhen no H-move is found In this situation the control istransferred to select and apply P-moves Consider

119909hcc =1

119899ℎ

119899ℎ

sum

119894=1

119909119894 119910hcc =

1

119899ℎ

119899ℎ

sum

119894=1

119910119894 119911hcc =

1

119899h

119899ℎ

sum

119894=1

119911119894

(4)

where 119899ℎis the number of H amino acids in the protein

Consider

119889119894= radic(119909

119894minus 119909hcc)

2

+ (119910119894minus 119910hcc)

2

+ (119911119894minus 119911hcc)

2

(5)

However in P-move selection (Algorithm 1 Line 12) thesame kind of diagonal moves is applied as H-move For each119894th polar amino acid all free lattice points that are commonneighbours of lattice points occupied by (119894 minus 1)th and (119894 +1)th amino acids are listed From the list a point is selectedrandomly to complete a diagonalmove (Algorithm 1 Line 14)for the respective polar amino acid No hydrophobic-core-center is calculated no Cartesian distance is measured andno tabu list is maintained for P-move After one try for eachpolar amino acid the control is returned to select and applyH-moves

414 Handling Stagnation For hard optimisation problemssuch as protein structure prediction local search algorithmsoften face stagnation In HP model-based conformational

search stagnation is encountered when a premature H-coreis formed Handling the stagnations is a challenging issuefor conformational search algorithms (eg GA LS) Thushandling such situation intelligently is important to proceedfurther To deal with stagnation in SS-Tabu random-walk[60] and relay-restart techniques are used on an on-demandbasis

Random-Walk Premature H-cores are observed at local min-ima To escape local minima a random-walk [60] algorithm(Algorithm 1 Line 27) is applied This algorithm uses pullmoves [62] to break the premature H-cores and to creatediversity

Relay-Restart When the search stagnation situation arises anew relay-restart technique (Algorithm 1 Line 23) is appliedinstead of a fresh restart or restarting from the current bestsolution [26 27] We use relay-restart when random-walkfails to escape from the local minima The relay-restart startsfrom an improving solution We maintain an improvingsolution list that contains all the improving solutions after theinitialisation

415 Further ImplementationDetails Like other search algo-rithms SS-Tabu requires initialisation It also needs evalua-tion of the solution in each iteration It starts with a randomlygenerated or parameterised initial solution and enhances itin a spiral fashion Further it needs to maintain a tabu meta-heuristic to guide the local search

Tabu Tenure Intuitively we use different tabu-tenure valuesbased on the number of hydrophobic amino acids (hCount)in the sequenceWe calculate tabu-tenure using the followingformula

tenure = (10 + ℎCount10

) (6)

The tabu-tenure calculated using (6) is used at Lines 5 24 and28 in Algorithm 1 during initialising and resetting tabu-list

Evaluation After each iteration the conformation is evalu-ated by counting the H-H contacts (topological neighbours)where the two amino acids are nonconsecutive The pseu-docode in Algorithm 3 presents the algorithm of calculatingthe free energy of a given conformation Note that the energy

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 3

Figure 1 shows the 12 basis vectors with respect to theorigin The basis vectors are presented below denoting as sdot sdot sdot

= (1 1 0) = (0 1 1)

= (1 0 1) = (minus1 1 0)

= (0 minus1 1) = (minus1 0 1)

= (1 minus1 0) = (0 1 minus1)

119868 = (1 0 minus1) 119869 = (minus1 minus1 0)

= (0 minus1 minus1) = (minus1 0 minus1)

(1)

In simplified PSP conformations are mapped on thelattice by a sequence of basis vectors or by the relative vectorsthat are relative to the previous basis vectors in the sequence

212 HP Energy Model The 20 amino acid monomersare the building block of protein polymers These aminoacids are broadly divided into two categories based on theirhydrophobicity (a) hydrophobic amino acids (Gly Ala ProVal Leu Ile Met Phe Tyr Trp) denoted by H and (b)hydrophilic or polar amino acids (Ser Thr Cys Asn GlnLys His Arg Asp Glu) denoted by P In the HP model [23]when two nonconsecutive hydrophobic amino acids becometopologically neighbours they contribute a certain amountof negative energy which for simplicity is shown as minus1 inTable 1 The total energy (119864) of a conformation based on theHP model becomes the sum of the contributions of all pairsof nonconsecutive hydrophobic amino acids as follows

119864 = sum

119894lt119895minus1

119888119894119895sdot 119890119894119895 (2)

Here 119888119894119895

= 1 if amino acids 119894 and 119895 are nonconsecutiveneighbours on the lattice otherwise 0 and 119890

119894119895= minus1 if 119894th and

119895th amino acids are hydrophobic otherwise 0

22 BM Energy Model By analysing crystallised proteinstructures Miyazawa and Jernigan [24] in 1985 statisticallydeduced a 20 times 20 energy matrix that considers residuecontact propensities between the amino acids By calculat-ing empirical contact energies on the basis of informationavailable from selected protein structures and following thequasichemical approximation Berrera et al [25] in 2003deduced another 20 times 20 energy matrix In this work weuse the latter model and denote it by BM energy modelTable 2 shows the BM energy model with amino acid namesat the left-most column and the bottom-most row and theinteraction energy values in the cells The amino acid namesthat have boldface are hydrophobic We draw lines in Table 2to show groupings based on H-H H-P and P-P interactionsIn the context of this work it is worth noting that mostenergy contributions that have large magnitudes are fromH-H interactions followed by those from H-P interactions

The total energy 119864bm (shown in (3)) of a conformationbased on the BMenergymodel is the sumof the contributions

over all pairs of nonconsecutive amino acids that are one unitlattice distance apart

119864bm = sum

119894lt119895minus1

c119894119895sdot 119890119894119895 (3)

Here c119894119895= 1 if amino acids at positions 119894 and 119895 in the sequence

are nonconsecutive neighbours on the lattice otherwise 0and 119890119894119895is the empirical energy value between the 119894th and 119895th

amino acid pair specified in the matrix for the BMmodel

23 Local Search Starting from an initial solution localsearch algorithms move from one solution to another to finda better solution Local search algorithms are well known forefficiently producing high quality solutions [9 26 27] whichare difficult for systematic search approaches However theyare incomplete [28] and suffer from revisitation and stagna-tion Restarting the whole or parts of a solution remains thetypical approach to deal with such situations

24 Tabu Metaheuristic Tabu metaheuristic [29 30]enhances the performance of local search algorithms Itmaintains a short-term memory storage to remember thelocal changes of a solution Then any further local changesfor those stored positions are forbidden for a certain numberof subsequent iterations (known as tabu tenure)

3 Related Work

There are a large number of existing search algorithms thatattempt to solve the PSP problem by exploring feasiblestructures on different energy models In this section weexplore the works related to HP and 20times20 energy models asbelow

31 HP Energy-Based Approaches Different types of meta-heuristic have been used in solving the simplified PSP prob-lem These include Monte Carlo Simulation [31] SimulatedAnnealing [32] Genetic Algorithms (GA) [33 34] TabuSearch with GA [35] Tabu Search with Hill Climbing [36]Ant Colony Optimisation [37] Immune Algorithms [38]Tabu-based Stochastic Local Search [26 27] and ConstraintProgramming [39]

The Bioinformatics Group headed by Rolf Backofenapplied Constraint Programming [40ndash42] using exact andcomplete algorithms Their exact and complete algorithmswork efficiently if similar hydrophobic core exists in therepository

Cebrian et al [26] used tabu-based local search andShatabda et al [27] used memory-based local search withtabu heuristic and achieved the state-of-the-art results How-ever Dotu et al [39] used constraint programming andfound promising results but only for smaller sized (length lt100 amino acids) proteins Besides local search Unger andMoult [33] applied population-based genetic algorithms toPSP and found their method to be more promising thanthe Monte Carlo-based methods [31] They used absoluteencodings on the square and cubic lattices for HP energy

4 Advances in Bioinformatics

Table 1 HP energy model [23]

H PH minus1 0P 0 0

model Later Patton [43] used relative encodings to representconformations and a penalty method to enforce the self-avoiding walk constraint GAs have been used byHoque et al[22] for cubic and 3DHCP latticesThey used DFS-generatedpathways [44] in GA crossover for protein structure predic-tion They also introduced a twin-removal operator [45] toremove duplicates from the population to prevent the searchfrom stalling Ullah et al in [12 46] combined local searchwith constraint programming They used a 20 times 20 energymodel [25] on FCC lattice and found promising resultsIn another hybrid approach [47] tabu metaheuristic wascombined with genetic algorithms in two-dimensional HPmodel to observe crossover and mutation rates over time

However for the simplified model (HP energy modeland 3D FCC lattice) that is used in this paper a newgenetic algorithm GA+ [8] and a tabu-based local searchalgorithm Spiral Search [9] currently produce the state-of-the-art results

32 Empirical 20 times 20 Matrix Energy Based ApproachesA constraint programming technique was used in [48] byDal Palu et al to predict tertiary structures of real pro-teins using secondary structure information They also usedconstraint programming with different heuristics in [49]and a constraint solver named COLA [50] that is highlyoptimized for protein structure prediction In another work[51] a fragment assembly method was utilised with empiricalenergy potentials to optimise protein structures Amongother successful approaches a population-based local search[52] and a population-based genetic algorithm [13] were usedwith empirical energy functions

In a hybrid approach Ullah and Steinofel [12] applied aconstraint programming-based large neighbourhood searchtechnique on top of the output of COLA solver The hybridapproach produced the state-of-the-art results for severalsmall sized (less than 75 amino acids) benchmark proteins

In another work Ullah et al [46] proposed a two stageoptimisation approach combining constraint programmingand local search The first stage of the approach producedcompact optimal structures by using the CPSP tools based onthe HP model In the second stage those compact structureswere used as the input of a simulated annealing-based localsearch that is guided by the BM energy model

In a recent work [10] Shatabda et al presented a mixedheuristic local search algorithm for PSP and produced thestate-of-the-art results using BM energy model on 3D FCClattice The mixed heuristic local search in each iterationrandomly selects a heuristic from a given number of heuris-tics designed by the authors The selected heuristics are thenused in evaluating the generated neighbouring solutions ofthe current solution Although the heuristics themselves areweaker than the BMenergy their collective use in the random

mixing fashion produces results better than the BM energyitself

33 Parallel Approaches Vargas and Lopes [53] proposedan Artificial Bee Colony algorithm based on two parallelapproaches (master slave and a hybrid hierarchical) forprotein structure prediction using the 3D HP model withsidechains They showed that the parallel methods achieveda good level of efficiency while compared with the sequentialversion A comparative study of parallel metaheuristics wasconducted by Trantar et al [54] using a genetic algorithm asimulated annealing algorithm and a random searchmethodin grid environments for protein structure prediction Inanother work [55] they applied a parallel hybrid geneticalgorithm in order to efficiently deal with the PSP problemusing the computational grid They experimentally showedthe effectiveness of a computational grid-based approachAll-atom force field-based protein structure prediction usingparallel particle swarm optimization approach was proposedby Kandov in [56] He showed that asynchronous parallelisa-tion speeds up the simulation better than the synchronousone and reduces the effective time for predictions signifi-cantly Among others Calvo et al in [57 58] applied a par-allel multiobjective evolutionary approach and found linearspeedups in structure prediction for benchmark proteins andRobles et al in [59] applied parallel approach in local search topredict secondary structure of a protein from its amino acidsequence

4 Our Approach

The driving force of our parallel search framework is SS-Tabu[9] that has two versions (i) the existing algorithm designedfor HP model (as shown in Algorithm 1 and described inSection 41) and (ii) the customised spiral search algorithmdesigned for 20 times 20 BM energy model (as shown inAlgorithm 5 and described in Section 42) We feed the twoversions of spiral search algorithms in different threads indifferent combinations The variations are described in theexperimental results section

41 SS-Tabu Spiral Search SS-Tabu is a hydrophobic coredirected local search [9] that works in a spiral fashion Thisalgorithm (the pseudocode in Algorithm 1) is the basis ofthe proposed parallel local search framework SS-Tabu iscomposed of H and P move selections random-walk [60]and relay-restart [9] However this algorithm is furthercustomised for detailed 20 times 20 energy model as describedin Section 42 Both versions of SS-Tabu are used in paral-lel threads with different combinations within the parallelframework The features of existing SS-Tabu are described inAlgorithm 1

411 Applying Diagonal Move In a tabu-guided local search(see Algorithm 1) we use the diagonal move operator (shownin Figure 2) to build H-core A diagonal move displaces 119894thamino acid from its position to another position on the latticewithout changing the position of its succeeding (119894 + 1)th and

Advances in Bioinformatics 5

Table2Th

e20times20BM

energy

mod

elby

Berrerae

tal[25]

Cysminus3477

Met

minus224

minus19

01Ph

eminus2424

minus2304

minus246

7Ile

minus241

minus2286

minus253

minus2691

Leuminus2343

minus2208

minus2491

minus2647

minus2501

Valminus2258

minus2079

minus2391

minus2568

minus244

7minus2385

Trp

minus208

minus209

minus2286

minus2303

minus2222

minus2097

minus18

67Ty

rminus18

92minus18

34minus19

63minus19

98minus19

19minus17

9minus18

34minus13

35Ala

minus17

minus15

17minus17

5minus18

72minus17

28minus17

31minus15

65minus13

18minus1119

Gly

minus1101

minus0897

minus10

34minus0885

minus0767

minus0756

minus1142

minus0818

minus029

0219

Thr

minus12

43minus0999

minus12

37minus13

6minus12

02minus12

4minus10

77minus0892

minus0717

minus0311

minus0617

Ser

minus13

06minus0893

minus1178

minus10

37minus0959

minus0933

minus1145

minus0859

minus0607

minus0261

minus0548

minus0519

Gln

minus0835

minus072

minus0807

minus0778

minus0729

minus064

2minus0997

minus0687

minus0323

0033

minus0342

minus026

0054

Asn

minus0788

minus0658

minus079

minus0669

minus0524

minus0673

minus0884

minus067

minus0371

minus023

minus0463

minus0423

minus0253

minus0367

Glu

minus0179

minus0209

minus0419

minus0439

minus0366

minus0335

minus0624

minus0453

minus0039

044

3minus0192

minus0161

0179

016

0933

Asp

minus0616

minus040

9minus0482

minus0402

minus0291

minus0298

minus0613

minus0631

minus0235

minus0097

minus0382

minus0521

0022

minus0344

0634

0179

His

minus14

99minus12

52minus13

3minus12

34minus1176

minus1118

minus13

83minus12

22minus064

6minus0325

minus072

minus0639

minus029

minus0455

minus0324

minus066

4minus10

78Arg

minus0771

minus0611

minus0805

minus0854

minus0758

minus066

4minus0912

minus0745

minus0327

minus005

minus0247

minus0264

minus004

2minus0114

minus0374

minus0584

minus0307

02

Lys

minus0112

minus0146

minus027

minus0253

minus0222

minus02

minus0391

minus0349

0196

0589

0155

0223

0334

0271

minus0057

minus0176

0388

0815

1339

Prominus1196

minus0788

minus10

76minus0991

minus0771

minus0886

minus12

78minus10

67minus0374

minus004

2minus0222

minus0199

minus0035

minus0018

0257

0189

minus0346

minus0023

0661

0129

Cys

Met

Phe

IleLe

uVa

lTrp

Tyr

Ala

Gly

Thr

Ser

Gln

Asn

Glu

Asp

His

Arg

Lys

Pro

6 Advances in Bioinformatics

(1) H and P are hydrophobic and polar amino acids(2) maxIter terminates the iteration(3) maxRetry sets the time of relay-restart(4) maxRW sets the time of random-walk(5) initTabuList()(6) for (119894 = 1 to maxIter) do(7) mv larr997888 selectMoveForH()(8) if (mv = null) then(9) applyMove(mv)(10) updateTabuList(i)(11) else(12) mv larr997888 selectMoveForP()(13) if (mv = null) then(14) applyMove(mv)(15) evaluate(AA) AAmdashamino acid array(16) if (improved) then(17) retry++(18) else(19) improvedList larr997888 addTopOfList()(20) retry = 0(21) rw = 0(22) if retry ge maxRetry then(23) relayRestart(improvedList)(24) resetTabuList()(25) rw++(26) if rw ge maxRW then(27) randomWalk(maxPull)(28) resetTabuList()

Algorithm 1 SpiralSearchHP(C)

A

D

D

AB B

C CD G G

E EF F

Figure 2 Diagonal move operator For easy understanding thefigures are presented in 2D space

preceding (119894 minus 1)th amino acids in the sequence The move isjust a corner-flip to an unoccupied lattice point

412 Forming H-Core Protein structures have hydrophobiccores (H-core) that hide the hydrophobic amino acids fromwater and expose the polar amino acids to the surface to bein contact with the surrounding water molecules [61] H-coreformation is an important objective for HP-based proteinstructure prediction models In our work we repeatedly usethe diagonal-move to aid forming the H-core We maintaina tabu list to control the amino acids from getting involvedin the diagonal moves SS-Tabu performs a series of diagonalmoves on a given conformation to build the H-core aroundthe hydrophobic core centre (HCC) as shown in Figure 3TheCartesian distance between theHCCand the current positionor a new position is denoted by 119889

1and 119889

2 respectively The

d1

d1

d2

d2

HCC

HCC

HCC

Figure 3 Spiral search comprising a series of diagonal moves withtabu metaheuristics For simplification and easy understanding thefigures are presented in 2D space

diagonal move squeezes the conformation and quickly formsthe H-core in a spiral fashion

413 Selecting Moves for HP Model In H-move selectionalgorithm (Algorithm 2) the HCC is calculated (Line 2) byfinding arithmetic means of 119909 119910 and 119911 coordinates of allhydrophobic amino acids using (4) The selection is guidedby the Cartesian distance 119889

119894(as shown in (5)) between

HCC and the hydrophobic amino acids in the sequence For

Advances in Bioinformatics 7

(1) H denotes hydrophobic amino acids(2) hcclarr997888 findHCoreCentre(S)(3) for (119894 = 1 to seqLength) do(4) if ((type[i]=ldquoHrdquo) and (nottabuList[i])) then(5) cfnlarr997888 findCommonFreeNeigh(119894 + 1 119894 minus 1)(6) mvlocal larr997888 findShortestMove(hcc cfn)(7) moveListsdotadd(mv)(8) mvglobal larr997888 findShortestMove(moveList)(9) return mv

Algorithm 2 The pseudocode of H-move selection selectMoveForH()

the 119894th hydrophobic amino acid the common topologicalneighbours of the (119894 minus 1)th and (119894 + 1)th amino acidsare computed The topological neighbours (TN) of a latticepoint are the points at unit lattice-distance apart from itFrom the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the HCC using (5) Then thepoint with the shortest distance is picked This point is listedin the possible H-move list for 119894th hydrophobic amino acidif its current distance from HCC is greater than that ofthe selected point When all hydrophobic amino acids aretraversed and the feasible shortest distances are listed inH-move list the amino acid having the shortest distancein H-move list is chosen to apply the diagonal move onit (Algorithm 1 Line 9) A tabu list is maintained for eachhydrophobic amino acid to control the selection priorityamongst them For each successful move the tabu list isupdated for the respective amino acid The process stopswhen no H-move is found In this situation the control istransferred to select and apply P-moves Consider

119909hcc =1

119899ℎ

119899ℎ

sum

119894=1

119909119894 119910hcc =

1

119899ℎ

119899ℎ

sum

119894=1

119910119894 119911hcc =

1

119899h

119899ℎ

sum

119894=1

119911119894

(4)

where 119899ℎis the number of H amino acids in the protein

Consider

119889119894= radic(119909

119894minus 119909hcc)

2

+ (119910119894minus 119910hcc)

2

+ (119911119894minus 119911hcc)

2

(5)

However in P-move selection (Algorithm 1 Line 12) thesame kind of diagonal moves is applied as H-move For each119894th polar amino acid all free lattice points that are commonneighbours of lattice points occupied by (119894 minus 1)th and (119894 +1)th amino acids are listed From the list a point is selectedrandomly to complete a diagonalmove (Algorithm 1 Line 14)for the respective polar amino acid No hydrophobic-core-center is calculated no Cartesian distance is measured andno tabu list is maintained for P-move After one try for eachpolar amino acid the control is returned to select and applyH-moves

414 Handling Stagnation For hard optimisation problemssuch as protein structure prediction local search algorithmsoften face stagnation In HP model-based conformational

search stagnation is encountered when a premature H-coreis formed Handling the stagnations is a challenging issuefor conformational search algorithms (eg GA LS) Thushandling such situation intelligently is important to proceedfurther To deal with stagnation in SS-Tabu random-walk[60] and relay-restart techniques are used on an on-demandbasis

Random-Walk Premature H-cores are observed at local min-ima To escape local minima a random-walk [60] algorithm(Algorithm 1 Line 27) is applied This algorithm uses pullmoves [62] to break the premature H-cores and to creatediversity

Relay-Restart When the search stagnation situation arises anew relay-restart technique (Algorithm 1 Line 23) is appliedinstead of a fresh restart or restarting from the current bestsolution [26 27] We use relay-restart when random-walkfails to escape from the local minima The relay-restart startsfrom an improving solution We maintain an improvingsolution list that contains all the improving solutions after theinitialisation

415 Further ImplementationDetails Like other search algo-rithms SS-Tabu requires initialisation It also needs evalua-tion of the solution in each iteration It starts with a randomlygenerated or parameterised initial solution and enhances itin a spiral fashion Further it needs to maintain a tabu meta-heuristic to guide the local search

Tabu Tenure Intuitively we use different tabu-tenure valuesbased on the number of hydrophobic amino acids (hCount)in the sequenceWe calculate tabu-tenure using the followingformula

tenure = (10 + ℎCount10

) (6)

The tabu-tenure calculated using (6) is used at Lines 5 24 and28 in Algorithm 1 during initialising and resetting tabu-list

Evaluation After each iteration the conformation is evalu-ated by counting the H-H contacts (topological neighbours)where the two amino acids are nonconsecutive The pseu-docode in Algorithm 3 presents the algorithm of calculatingthe free energy of a given conformation Note that the energy

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

4 Advances in Bioinformatics

Table 1 HP energy model [23]

H PH minus1 0P 0 0

model Later Patton [43] used relative encodings to representconformations and a penalty method to enforce the self-avoiding walk constraint GAs have been used byHoque et al[22] for cubic and 3DHCP latticesThey used DFS-generatedpathways [44] in GA crossover for protein structure predic-tion They also introduced a twin-removal operator [45] toremove duplicates from the population to prevent the searchfrom stalling Ullah et al in [12 46] combined local searchwith constraint programming They used a 20 times 20 energymodel [25] on FCC lattice and found promising resultsIn another hybrid approach [47] tabu metaheuristic wascombined with genetic algorithms in two-dimensional HPmodel to observe crossover and mutation rates over time

However for the simplified model (HP energy modeland 3D FCC lattice) that is used in this paper a newgenetic algorithm GA+ [8] and a tabu-based local searchalgorithm Spiral Search [9] currently produce the state-of-the-art results

32 Empirical 20 times 20 Matrix Energy Based ApproachesA constraint programming technique was used in [48] byDal Palu et al to predict tertiary structures of real pro-teins using secondary structure information They also usedconstraint programming with different heuristics in [49]and a constraint solver named COLA [50] that is highlyoptimized for protein structure prediction In another work[51] a fragment assembly method was utilised with empiricalenergy potentials to optimise protein structures Amongother successful approaches a population-based local search[52] and a population-based genetic algorithm [13] were usedwith empirical energy functions

In a hybrid approach Ullah and Steinofel [12] applied aconstraint programming-based large neighbourhood searchtechnique on top of the output of COLA solver The hybridapproach produced the state-of-the-art results for severalsmall sized (less than 75 amino acids) benchmark proteins

In another work Ullah et al [46] proposed a two stageoptimisation approach combining constraint programmingand local search The first stage of the approach producedcompact optimal structures by using the CPSP tools based onthe HP model In the second stage those compact structureswere used as the input of a simulated annealing-based localsearch that is guided by the BM energy model

In a recent work [10] Shatabda et al presented a mixedheuristic local search algorithm for PSP and produced thestate-of-the-art results using BM energy model on 3D FCClattice The mixed heuristic local search in each iterationrandomly selects a heuristic from a given number of heuris-tics designed by the authors The selected heuristics are thenused in evaluating the generated neighbouring solutions ofthe current solution Although the heuristics themselves areweaker than the BMenergy their collective use in the random

mixing fashion produces results better than the BM energyitself

33 Parallel Approaches Vargas and Lopes [53] proposedan Artificial Bee Colony algorithm based on two parallelapproaches (master slave and a hybrid hierarchical) forprotein structure prediction using the 3D HP model withsidechains They showed that the parallel methods achieveda good level of efficiency while compared with the sequentialversion A comparative study of parallel metaheuristics wasconducted by Trantar et al [54] using a genetic algorithm asimulated annealing algorithm and a random searchmethodin grid environments for protein structure prediction Inanother work [55] they applied a parallel hybrid geneticalgorithm in order to efficiently deal with the PSP problemusing the computational grid They experimentally showedthe effectiveness of a computational grid-based approachAll-atom force field-based protein structure prediction usingparallel particle swarm optimization approach was proposedby Kandov in [56] He showed that asynchronous parallelisa-tion speeds up the simulation better than the synchronousone and reduces the effective time for predictions signifi-cantly Among others Calvo et al in [57 58] applied a par-allel multiobjective evolutionary approach and found linearspeedups in structure prediction for benchmark proteins andRobles et al in [59] applied parallel approach in local search topredict secondary structure of a protein from its amino acidsequence

4 Our Approach

The driving force of our parallel search framework is SS-Tabu[9] that has two versions (i) the existing algorithm designedfor HP model (as shown in Algorithm 1 and described inSection 41) and (ii) the customised spiral search algorithmdesigned for 20 times 20 BM energy model (as shown inAlgorithm 5 and described in Section 42) We feed the twoversions of spiral search algorithms in different threads indifferent combinations The variations are described in theexperimental results section

41 SS-Tabu Spiral Search SS-Tabu is a hydrophobic coredirected local search [9] that works in a spiral fashion Thisalgorithm (the pseudocode in Algorithm 1) is the basis ofthe proposed parallel local search framework SS-Tabu iscomposed of H and P move selections random-walk [60]and relay-restart [9] However this algorithm is furthercustomised for detailed 20 times 20 energy model as describedin Section 42 Both versions of SS-Tabu are used in paral-lel threads with different combinations within the parallelframework The features of existing SS-Tabu are described inAlgorithm 1

411 Applying Diagonal Move In a tabu-guided local search(see Algorithm 1) we use the diagonal move operator (shownin Figure 2) to build H-core A diagonal move displaces 119894thamino acid from its position to another position on the latticewithout changing the position of its succeeding (119894 + 1)th and

Advances in Bioinformatics 5

Table2Th

e20times20BM

energy

mod

elby

Berrerae

tal[25]

Cysminus3477

Met

minus224

minus19

01Ph

eminus2424

minus2304

minus246

7Ile

minus241

minus2286

minus253

minus2691

Leuminus2343

minus2208

minus2491

minus2647

minus2501

Valminus2258

minus2079

minus2391

minus2568

minus244

7minus2385

Trp

minus208

minus209

minus2286

minus2303

minus2222

minus2097

minus18

67Ty

rminus18

92minus18

34minus19

63minus19

98minus19

19minus17

9minus18

34minus13

35Ala

minus17

minus15

17minus17

5minus18

72minus17

28minus17

31minus15

65minus13

18minus1119

Gly

minus1101

minus0897

minus10

34minus0885

minus0767

minus0756

minus1142

minus0818

minus029

0219

Thr

minus12

43minus0999

minus12

37minus13

6minus12

02minus12

4minus10

77minus0892

minus0717

minus0311

minus0617

Ser

minus13

06minus0893

minus1178

minus10

37minus0959

minus0933

minus1145

minus0859

minus0607

minus0261

minus0548

minus0519

Gln

minus0835

minus072

minus0807

minus0778

minus0729

minus064

2minus0997

minus0687

minus0323

0033

minus0342

minus026

0054

Asn

minus0788

minus0658

minus079

minus0669

minus0524

minus0673

minus0884

minus067

minus0371

minus023

minus0463

minus0423

minus0253

minus0367

Glu

minus0179

minus0209

minus0419

minus0439

minus0366

minus0335

minus0624

minus0453

minus0039

044

3minus0192

minus0161

0179

016

0933

Asp

minus0616

minus040

9minus0482

minus0402

minus0291

minus0298

minus0613

minus0631

minus0235

minus0097

minus0382

minus0521

0022

minus0344

0634

0179

His

minus14

99minus12

52minus13

3minus12

34minus1176

minus1118

minus13

83minus12

22minus064

6minus0325

minus072

minus0639

minus029

minus0455

minus0324

minus066

4minus10

78Arg

minus0771

minus0611

minus0805

minus0854

minus0758

minus066

4minus0912

minus0745

minus0327

minus005

minus0247

minus0264

minus004

2minus0114

minus0374

minus0584

minus0307

02

Lys

minus0112

minus0146

minus027

minus0253

minus0222

minus02

minus0391

minus0349

0196

0589

0155

0223

0334

0271

minus0057

minus0176

0388

0815

1339

Prominus1196

minus0788

minus10

76minus0991

minus0771

minus0886

minus12

78minus10

67minus0374

minus004

2minus0222

minus0199

minus0035

minus0018

0257

0189

minus0346

minus0023

0661

0129

Cys

Met

Phe

IleLe

uVa

lTrp

Tyr

Ala

Gly

Thr

Ser

Gln

Asn

Glu

Asp

His

Arg

Lys

Pro

6 Advances in Bioinformatics

(1) H and P are hydrophobic and polar amino acids(2) maxIter terminates the iteration(3) maxRetry sets the time of relay-restart(4) maxRW sets the time of random-walk(5) initTabuList()(6) for (119894 = 1 to maxIter) do(7) mv larr997888 selectMoveForH()(8) if (mv = null) then(9) applyMove(mv)(10) updateTabuList(i)(11) else(12) mv larr997888 selectMoveForP()(13) if (mv = null) then(14) applyMove(mv)(15) evaluate(AA) AAmdashamino acid array(16) if (improved) then(17) retry++(18) else(19) improvedList larr997888 addTopOfList()(20) retry = 0(21) rw = 0(22) if retry ge maxRetry then(23) relayRestart(improvedList)(24) resetTabuList()(25) rw++(26) if rw ge maxRW then(27) randomWalk(maxPull)(28) resetTabuList()

Algorithm 1 SpiralSearchHP(C)

A

D

D

AB B

C CD G G

E EF F

Figure 2 Diagonal move operator For easy understanding thefigures are presented in 2D space

preceding (119894 minus 1)th amino acids in the sequence The move isjust a corner-flip to an unoccupied lattice point

412 Forming H-Core Protein structures have hydrophobiccores (H-core) that hide the hydrophobic amino acids fromwater and expose the polar amino acids to the surface to bein contact with the surrounding water molecules [61] H-coreformation is an important objective for HP-based proteinstructure prediction models In our work we repeatedly usethe diagonal-move to aid forming the H-core We maintaina tabu list to control the amino acids from getting involvedin the diagonal moves SS-Tabu performs a series of diagonalmoves on a given conformation to build the H-core aroundthe hydrophobic core centre (HCC) as shown in Figure 3TheCartesian distance between theHCCand the current positionor a new position is denoted by 119889

1and 119889

2 respectively The

d1

d1

d2

d2

HCC

HCC

HCC

Figure 3 Spiral search comprising a series of diagonal moves withtabu metaheuristics For simplification and easy understanding thefigures are presented in 2D space

diagonal move squeezes the conformation and quickly formsthe H-core in a spiral fashion

413 Selecting Moves for HP Model In H-move selectionalgorithm (Algorithm 2) the HCC is calculated (Line 2) byfinding arithmetic means of 119909 119910 and 119911 coordinates of allhydrophobic amino acids using (4) The selection is guidedby the Cartesian distance 119889

119894(as shown in (5)) between

HCC and the hydrophobic amino acids in the sequence For

Advances in Bioinformatics 7

(1) H denotes hydrophobic amino acids(2) hcclarr997888 findHCoreCentre(S)(3) for (119894 = 1 to seqLength) do(4) if ((type[i]=ldquoHrdquo) and (nottabuList[i])) then(5) cfnlarr997888 findCommonFreeNeigh(119894 + 1 119894 minus 1)(6) mvlocal larr997888 findShortestMove(hcc cfn)(7) moveListsdotadd(mv)(8) mvglobal larr997888 findShortestMove(moveList)(9) return mv

Algorithm 2 The pseudocode of H-move selection selectMoveForH()

the 119894th hydrophobic amino acid the common topologicalneighbours of the (119894 minus 1)th and (119894 + 1)th amino acidsare computed The topological neighbours (TN) of a latticepoint are the points at unit lattice-distance apart from itFrom the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the HCC using (5) Then thepoint with the shortest distance is picked This point is listedin the possible H-move list for 119894th hydrophobic amino acidif its current distance from HCC is greater than that ofthe selected point When all hydrophobic amino acids aretraversed and the feasible shortest distances are listed inH-move list the amino acid having the shortest distancein H-move list is chosen to apply the diagonal move onit (Algorithm 1 Line 9) A tabu list is maintained for eachhydrophobic amino acid to control the selection priorityamongst them For each successful move the tabu list isupdated for the respective amino acid The process stopswhen no H-move is found In this situation the control istransferred to select and apply P-moves Consider

119909hcc =1

119899ℎ

119899ℎ

sum

119894=1

119909119894 119910hcc =

1

119899ℎ

119899ℎ

sum

119894=1

119910119894 119911hcc =

1

119899h

119899ℎ

sum

119894=1

119911119894

(4)

where 119899ℎis the number of H amino acids in the protein

Consider

119889119894= radic(119909

119894minus 119909hcc)

2

+ (119910119894minus 119910hcc)

2

+ (119911119894minus 119911hcc)

2

(5)

However in P-move selection (Algorithm 1 Line 12) thesame kind of diagonal moves is applied as H-move For each119894th polar amino acid all free lattice points that are commonneighbours of lattice points occupied by (119894 minus 1)th and (119894 +1)th amino acids are listed From the list a point is selectedrandomly to complete a diagonalmove (Algorithm 1 Line 14)for the respective polar amino acid No hydrophobic-core-center is calculated no Cartesian distance is measured andno tabu list is maintained for P-move After one try for eachpolar amino acid the control is returned to select and applyH-moves

414 Handling Stagnation For hard optimisation problemssuch as protein structure prediction local search algorithmsoften face stagnation In HP model-based conformational

search stagnation is encountered when a premature H-coreis formed Handling the stagnations is a challenging issuefor conformational search algorithms (eg GA LS) Thushandling such situation intelligently is important to proceedfurther To deal with stagnation in SS-Tabu random-walk[60] and relay-restart techniques are used on an on-demandbasis

Random-Walk Premature H-cores are observed at local min-ima To escape local minima a random-walk [60] algorithm(Algorithm 1 Line 27) is applied This algorithm uses pullmoves [62] to break the premature H-cores and to creatediversity

Relay-Restart When the search stagnation situation arises anew relay-restart technique (Algorithm 1 Line 23) is appliedinstead of a fresh restart or restarting from the current bestsolution [26 27] We use relay-restart when random-walkfails to escape from the local minima The relay-restart startsfrom an improving solution We maintain an improvingsolution list that contains all the improving solutions after theinitialisation

415 Further ImplementationDetails Like other search algo-rithms SS-Tabu requires initialisation It also needs evalua-tion of the solution in each iteration It starts with a randomlygenerated or parameterised initial solution and enhances itin a spiral fashion Further it needs to maintain a tabu meta-heuristic to guide the local search

Tabu Tenure Intuitively we use different tabu-tenure valuesbased on the number of hydrophobic amino acids (hCount)in the sequenceWe calculate tabu-tenure using the followingformula

tenure = (10 + ℎCount10

) (6)

The tabu-tenure calculated using (6) is used at Lines 5 24 and28 in Algorithm 1 during initialising and resetting tabu-list

Evaluation After each iteration the conformation is evalu-ated by counting the H-H contacts (topological neighbours)where the two amino acids are nonconsecutive The pseu-docode in Algorithm 3 presents the algorithm of calculatingthe free energy of a given conformation Note that the energy

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 5

Table2Th

e20times20BM

energy

mod

elby

Berrerae

tal[25]

Cysminus3477

Met

minus224

minus19

01Ph

eminus2424

minus2304

minus246

7Ile

minus241

minus2286

minus253

minus2691

Leuminus2343

minus2208

minus2491

minus2647

minus2501

Valminus2258

minus2079

minus2391

minus2568

minus244

7minus2385

Trp

minus208

minus209

minus2286

minus2303

minus2222

minus2097

minus18

67Ty

rminus18

92minus18

34minus19

63minus19

98minus19

19minus17

9minus18

34minus13

35Ala

minus17

minus15

17minus17

5minus18

72minus17

28minus17

31minus15

65minus13

18minus1119

Gly

minus1101

minus0897

minus10

34minus0885

minus0767

minus0756

minus1142

minus0818

minus029

0219

Thr

minus12

43minus0999

minus12

37minus13

6minus12

02minus12

4minus10

77minus0892

minus0717

minus0311

minus0617

Ser

minus13

06minus0893

minus1178

minus10

37minus0959

minus0933

minus1145

minus0859

minus0607

minus0261

minus0548

minus0519

Gln

minus0835

minus072

minus0807

minus0778

minus0729

minus064

2minus0997

minus0687

minus0323

0033

minus0342

minus026

0054

Asn

minus0788

minus0658

minus079

minus0669

minus0524

minus0673

minus0884

minus067

minus0371

minus023

minus0463

minus0423

minus0253

minus0367

Glu

minus0179

minus0209

minus0419

minus0439

minus0366

minus0335

minus0624

minus0453

minus0039

044

3minus0192

minus0161

0179

016

0933

Asp

minus0616

minus040

9minus0482

minus0402

minus0291

minus0298

minus0613

minus0631

minus0235

minus0097

minus0382

minus0521

0022

minus0344

0634

0179

His

minus14

99minus12

52minus13

3minus12

34minus1176

minus1118

minus13

83minus12

22minus064

6minus0325

minus072

minus0639

minus029

minus0455

minus0324

minus066

4minus10

78Arg

minus0771

minus0611

minus0805

minus0854

minus0758

minus066

4minus0912

minus0745

minus0327

minus005

minus0247

minus0264

minus004

2minus0114

minus0374

minus0584

minus0307

02

Lys

minus0112

minus0146

minus027

minus0253

minus0222

minus02

minus0391

minus0349

0196

0589

0155

0223

0334

0271

minus0057

minus0176

0388

0815

1339

Prominus1196

minus0788

minus10

76minus0991

minus0771

minus0886

minus12

78minus10

67minus0374

minus004

2minus0222

minus0199

minus0035

minus0018

0257

0189

minus0346

minus0023

0661

0129

Cys

Met

Phe

IleLe

uVa

lTrp

Tyr

Ala

Gly

Thr

Ser

Gln

Asn

Glu

Asp

His

Arg

Lys

Pro

6 Advances in Bioinformatics

(1) H and P are hydrophobic and polar amino acids(2) maxIter terminates the iteration(3) maxRetry sets the time of relay-restart(4) maxRW sets the time of random-walk(5) initTabuList()(6) for (119894 = 1 to maxIter) do(7) mv larr997888 selectMoveForH()(8) if (mv = null) then(9) applyMove(mv)(10) updateTabuList(i)(11) else(12) mv larr997888 selectMoveForP()(13) if (mv = null) then(14) applyMove(mv)(15) evaluate(AA) AAmdashamino acid array(16) if (improved) then(17) retry++(18) else(19) improvedList larr997888 addTopOfList()(20) retry = 0(21) rw = 0(22) if retry ge maxRetry then(23) relayRestart(improvedList)(24) resetTabuList()(25) rw++(26) if rw ge maxRW then(27) randomWalk(maxPull)(28) resetTabuList()

Algorithm 1 SpiralSearchHP(C)

A

D

D

AB B

C CD G G

E EF F

Figure 2 Diagonal move operator For easy understanding thefigures are presented in 2D space

preceding (119894 minus 1)th amino acids in the sequence The move isjust a corner-flip to an unoccupied lattice point

412 Forming H-Core Protein structures have hydrophobiccores (H-core) that hide the hydrophobic amino acids fromwater and expose the polar amino acids to the surface to bein contact with the surrounding water molecules [61] H-coreformation is an important objective for HP-based proteinstructure prediction models In our work we repeatedly usethe diagonal-move to aid forming the H-core We maintaina tabu list to control the amino acids from getting involvedin the diagonal moves SS-Tabu performs a series of diagonalmoves on a given conformation to build the H-core aroundthe hydrophobic core centre (HCC) as shown in Figure 3TheCartesian distance between theHCCand the current positionor a new position is denoted by 119889

1and 119889

2 respectively The

d1

d1

d2

d2

HCC

HCC

HCC

Figure 3 Spiral search comprising a series of diagonal moves withtabu metaheuristics For simplification and easy understanding thefigures are presented in 2D space

diagonal move squeezes the conformation and quickly formsthe H-core in a spiral fashion

413 Selecting Moves for HP Model In H-move selectionalgorithm (Algorithm 2) the HCC is calculated (Line 2) byfinding arithmetic means of 119909 119910 and 119911 coordinates of allhydrophobic amino acids using (4) The selection is guidedby the Cartesian distance 119889

119894(as shown in (5)) between

HCC and the hydrophobic amino acids in the sequence For

Advances in Bioinformatics 7

(1) H denotes hydrophobic amino acids(2) hcclarr997888 findHCoreCentre(S)(3) for (119894 = 1 to seqLength) do(4) if ((type[i]=ldquoHrdquo) and (nottabuList[i])) then(5) cfnlarr997888 findCommonFreeNeigh(119894 + 1 119894 minus 1)(6) mvlocal larr997888 findShortestMove(hcc cfn)(7) moveListsdotadd(mv)(8) mvglobal larr997888 findShortestMove(moveList)(9) return mv

Algorithm 2 The pseudocode of H-move selection selectMoveForH()

the 119894th hydrophobic amino acid the common topologicalneighbours of the (119894 minus 1)th and (119894 + 1)th amino acidsare computed The topological neighbours (TN) of a latticepoint are the points at unit lattice-distance apart from itFrom the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the HCC using (5) Then thepoint with the shortest distance is picked This point is listedin the possible H-move list for 119894th hydrophobic amino acidif its current distance from HCC is greater than that ofthe selected point When all hydrophobic amino acids aretraversed and the feasible shortest distances are listed inH-move list the amino acid having the shortest distancein H-move list is chosen to apply the diagonal move onit (Algorithm 1 Line 9) A tabu list is maintained for eachhydrophobic amino acid to control the selection priorityamongst them For each successful move the tabu list isupdated for the respective amino acid The process stopswhen no H-move is found In this situation the control istransferred to select and apply P-moves Consider

119909hcc =1

119899ℎ

119899ℎ

sum

119894=1

119909119894 119910hcc =

1

119899ℎ

119899ℎ

sum

119894=1

119910119894 119911hcc =

1

119899h

119899ℎ

sum

119894=1

119911119894

(4)

where 119899ℎis the number of H amino acids in the protein

Consider

119889119894= radic(119909

119894minus 119909hcc)

2

+ (119910119894minus 119910hcc)

2

+ (119911119894minus 119911hcc)

2

(5)

However in P-move selection (Algorithm 1 Line 12) thesame kind of diagonal moves is applied as H-move For each119894th polar amino acid all free lattice points that are commonneighbours of lattice points occupied by (119894 minus 1)th and (119894 +1)th amino acids are listed From the list a point is selectedrandomly to complete a diagonalmove (Algorithm 1 Line 14)for the respective polar amino acid No hydrophobic-core-center is calculated no Cartesian distance is measured andno tabu list is maintained for P-move After one try for eachpolar amino acid the control is returned to select and applyH-moves

414 Handling Stagnation For hard optimisation problemssuch as protein structure prediction local search algorithmsoften face stagnation In HP model-based conformational

search stagnation is encountered when a premature H-coreis formed Handling the stagnations is a challenging issuefor conformational search algorithms (eg GA LS) Thushandling such situation intelligently is important to proceedfurther To deal with stagnation in SS-Tabu random-walk[60] and relay-restart techniques are used on an on-demandbasis

Random-Walk Premature H-cores are observed at local min-ima To escape local minima a random-walk [60] algorithm(Algorithm 1 Line 27) is applied This algorithm uses pullmoves [62] to break the premature H-cores and to creatediversity

Relay-Restart When the search stagnation situation arises anew relay-restart technique (Algorithm 1 Line 23) is appliedinstead of a fresh restart or restarting from the current bestsolution [26 27] We use relay-restart when random-walkfails to escape from the local minima The relay-restart startsfrom an improving solution We maintain an improvingsolution list that contains all the improving solutions after theinitialisation

415 Further ImplementationDetails Like other search algo-rithms SS-Tabu requires initialisation It also needs evalua-tion of the solution in each iteration It starts with a randomlygenerated or parameterised initial solution and enhances itin a spiral fashion Further it needs to maintain a tabu meta-heuristic to guide the local search

Tabu Tenure Intuitively we use different tabu-tenure valuesbased on the number of hydrophobic amino acids (hCount)in the sequenceWe calculate tabu-tenure using the followingformula

tenure = (10 + ℎCount10

) (6)

The tabu-tenure calculated using (6) is used at Lines 5 24 and28 in Algorithm 1 during initialising and resetting tabu-list

Evaluation After each iteration the conformation is evalu-ated by counting the H-H contacts (topological neighbours)where the two amino acids are nonconsecutive The pseu-docode in Algorithm 3 presents the algorithm of calculatingthe free energy of a given conformation Note that the energy

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

6 Advances in Bioinformatics

(1) H and P are hydrophobic and polar amino acids(2) maxIter terminates the iteration(3) maxRetry sets the time of relay-restart(4) maxRW sets the time of random-walk(5) initTabuList()(6) for (119894 = 1 to maxIter) do(7) mv larr997888 selectMoveForH()(8) if (mv = null) then(9) applyMove(mv)(10) updateTabuList(i)(11) else(12) mv larr997888 selectMoveForP()(13) if (mv = null) then(14) applyMove(mv)(15) evaluate(AA) AAmdashamino acid array(16) if (improved) then(17) retry++(18) else(19) improvedList larr997888 addTopOfList()(20) retry = 0(21) rw = 0(22) if retry ge maxRetry then(23) relayRestart(improvedList)(24) resetTabuList()(25) rw++(26) if rw ge maxRW then(27) randomWalk(maxPull)(28) resetTabuList()

Algorithm 1 SpiralSearchHP(C)

A

D

D

AB B

C CD G G

E EF F

Figure 2 Diagonal move operator For easy understanding thefigures are presented in 2D space

preceding (119894 minus 1)th amino acids in the sequence The move isjust a corner-flip to an unoccupied lattice point

412 Forming H-Core Protein structures have hydrophobiccores (H-core) that hide the hydrophobic amino acids fromwater and expose the polar amino acids to the surface to bein contact with the surrounding water molecules [61] H-coreformation is an important objective for HP-based proteinstructure prediction models In our work we repeatedly usethe diagonal-move to aid forming the H-core We maintaina tabu list to control the amino acids from getting involvedin the diagonal moves SS-Tabu performs a series of diagonalmoves on a given conformation to build the H-core aroundthe hydrophobic core centre (HCC) as shown in Figure 3TheCartesian distance between theHCCand the current positionor a new position is denoted by 119889

1and 119889

2 respectively The

d1

d1

d2

d2

HCC

HCC

HCC

Figure 3 Spiral search comprising a series of diagonal moves withtabu metaheuristics For simplification and easy understanding thefigures are presented in 2D space

diagonal move squeezes the conformation and quickly formsthe H-core in a spiral fashion

413 Selecting Moves for HP Model In H-move selectionalgorithm (Algorithm 2) the HCC is calculated (Line 2) byfinding arithmetic means of 119909 119910 and 119911 coordinates of allhydrophobic amino acids using (4) The selection is guidedby the Cartesian distance 119889

119894(as shown in (5)) between

HCC and the hydrophobic amino acids in the sequence For

Advances in Bioinformatics 7

(1) H denotes hydrophobic amino acids(2) hcclarr997888 findHCoreCentre(S)(3) for (119894 = 1 to seqLength) do(4) if ((type[i]=ldquoHrdquo) and (nottabuList[i])) then(5) cfnlarr997888 findCommonFreeNeigh(119894 + 1 119894 minus 1)(6) mvlocal larr997888 findShortestMove(hcc cfn)(7) moveListsdotadd(mv)(8) mvglobal larr997888 findShortestMove(moveList)(9) return mv

Algorithm 2 The pseudocode of H-move selection selectMoveForH()

the 119894th hydrophobic amino acid the common topologicalneighbours of the (119894 minus 1)th and (119894 + 1)th amino acidsare computed The topological neighbours (TN) of a latticepoint are the points at unit lattice-distance apart from itFrom the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the HCC using (5) Then thepoint with the shortest distance is picked This point is listedin the possible H-move list for 119894th hydrophobic amino acidif its current distance from HCC is greater than that ofthe selected point When all hydrophobic amino acids aretraversed and the feasible shortest distances are listed inH-move list the amino acid having the shortest distancein H-move list is chosen to apply the diagonal move onit (Algorithm 1 Line 9) A tabu list is maintained for eachhydrophobic amino acid to control the selection priorityamongst them For each successful move the tabu list isupdated for the respective amino acid The process stopswhen no H-move is found In this situation the control istransferred to select and apply P-moves Consider

119909hcc =1

119899ℎ

119899ℎ

sum

119894=1

119909119894 119910hcc =

1

119899ℎ

119899ℎ

sum

119894=1

119910119894 119911hcc =

1

119899h

119899ℎ

sum

119894=1

119911119894

(4)

where 119899ℎis the number of H amino acids in the protein

Consider

119889119894= radic(119909

119894minus 119909hcc)

2

+ (119910119894minus 119910hcc)

2

+ (119911119894minus 119911hcc)

2

(5)

However in P-move selection (Algorithm 1 Line 12) thesame kind of diagonal moves is applied as H-move For each119894th polar amino acid all free lattice points that are commonneighbours of lattice points occupied by (119894 minus 1)th and (119894 +1)th amino acids are listed From the list a point is selectedrandomly to complete a diagonalmove (Algorithm 1 Line 14)for the respective polar amino acid No hydrophobic-core-center is calculated no Cartesian distance is measured andno tabu list is maintained for P-move After one try for eachpolar amino acid the control is returned to select and applyH-moves

414 Handling Stagnation For hard optimisation problemssuch as protein structure prediction local search algorithmsoften face stagnation In HP model-based conformational

search stagnation is encountered when a premature H-coreis formed Handling the stagnations is a challenging issuefor conformational search algorithms (eg GA LS) Thushandling such situation intelligently is important to proceedfurther To deal with stagnation in SS-Tabu random-walk[60] and relay-restart techniques are used on an on-demandbasis

Random-Walk Premature H-cores are observed at local min-ima To escape local minima a random-walk [60] algorithm(Algorithm 1 Line 27) is applied This algorithm uses pullmoves [62] to break the premature H-cores and to creatediversity

Relay-Restart When the search stagnation situation arises anew relay-restart technique (Algorithm 1 Line 23) is appliedinstead of a fresh restart or restarting from the current bestsolution [26 27] We use relay-restart when random-walkfails to escape from the local minima The relay-restart startsfrom an improving solution We maintain an improvingsolution list that contains all the improving solutions after theinitialisation

415 Further ImplementationDetails Like other search algo-rithms SS-Tabu requires initialisation It also needs evalua-tion of the solution in each iteration It starts with a randomlygenerated or parameterised initial solution and enhances itin a spiral fashion Further it needs to maintain a tabu meta-heuristic to guide the local search

Tabu Tenure Intuitively we use different tabu-tenure valuesbased on the number of hydrophobic amino acids (hCount)in the sequenceWe calculate tabu-tenure using the followingformula

tenure = (10 + ℎCount10

) (6)

The tabu-tenure calculated using (6) is used at Lines 5 24 and28 in Algorithm 1 during initialising and resetting tabu-list

Evaluation After each iteration the conformation is evalu-ated by counting the H-H contacts (topological neighbours)where the two amino acids are nonconsecutive The pseu-docode in Algorithm 3 presents the algorithm of calculatingthe free energy of a given conformation Note that the energy

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 7

(1) H denotes hydrophobic amino acids(2) hcclarr997888 findHCoreCentre(S)(3) for (119894 = 1 to seqLength) do(4) if ((type[i]=ldquoHrdquo) and (nottabuList[i])) then(5) cfnlarr997888 findCommonFreeNeigh(119894 + 1 119894 minus 1)(6) mvlocal larr997888 findShortestMove(hcc cfn)(7) moveListsdotadd(mv)(8) mvglobal larr997888 findShortestMove(moveList)(9) return mv

Algorithm 2 The pseudocode of H-move selection selectMoveForH()

the 119894th hydrophobic amino acid the common topologicalneighbours of the (119894 minus 1)th and (119894 + 1)th amino acidsare computed The topological neighbours (TN) of a latticepoint are the points at unit lattice-distance apart from itFrom the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the HCC using (5) Then thepoint with the shortest distance is picked This point is listedin the possible H-move list for 119894th hydrophobic amino acidif its current distance from HCC is greater than that ofthe selected point When all hydrophobic amino acids aretraversed and the feasible shortest distances are listed inH-move list the amino acid having the shortest distancein H-move list is chosen to apply the diagonal move onit (Algorithm 1 Line 9) A tabu list is maintained for eachhydrophobic amino acid to control the selection priorityamongst them For each successful move the tabu list isupdated for the respective amino acid The process stopswhen no H-move is found In this situation the control istransferred to select and apply P-moves Consider

119909hcc =1

119899ℎ

119899ℎ

sum

119894=1

119909119894 119910hcc =

1

119899ℎ

119899ℎ

sum

119894=1

119910119894 119911hcc =

1

119899h

119899ℎ

sum

119894=1

119911119894

(4)

where 119899ℎis the number of H amino acids in the protein

Consider

119889119894= radic(119909

119894minus 119909hcc)

2

+ (119910119894minus 119910hcc)

2

+ (119911119894minus 119911hcc)

2

(5)

However in P-move selection (Algorithm 1 Line 12) thesame kind of diagonal moves is applied as H-move For each119894th polar amino acid all free lattice points that are commonneighbours of lattice points occupied by (119894 minus 1)th and (119894 +1)th amino acids are listed From the list a point is selectedrandomly to complete a diagonalmove (Algorithm 1 Line 14)for the respective polar amino acid No hydrophobic-core-center is calculated no Cartesian distance is measured andno tabu list is maintained for P-move After one try for eachpolar amino acid the control is returned to select and applyH-moves

414 Handling Stagnation For hard optimisation problemssuch as protein structure prediction local search algorithmsoften face stagnation In HP model-based conformational

search stagnation is encountered when a premature H-coreis formed Handling the stagnations is a challenging issuefor conformational search algorithms (eg GA LS) Thushandling such situation intelligently is important to proceedfurther To deal with stagnation in SS-Tabu random-walk[60] and relay-restart techniques are used on an on-demandbasis

Random-Walk Premature H-cores are observed at local min-ima To escape local minima a random-walk [60] algorithm(Algorithm 1 Line 27) is applied This algorithm uses pullmoves [62] to break the premature H-cores and to creatediversity

Relay-Restart When the search stagnation situation arises anew relay-restart technique (Algorithm 1 Line 23) is appliedinstead of a fresh restart or restarting from the current bestsolution [26 27] We use relay-restart when random-walkfails to escape from the local minima The relay-restart startsfrom an improving solution We maintain an improvingsolution list that contains all the improving solutions after theinitialisation

415 Further ImplementationDetails Like other search algo-rithms SS-Tabu requires initialisation It also needs evalua-tion of the solution in each iteration It starts with a randomlygenerated or parameterised initial solution and enhances itin a spiral fashion Further it needs to maintain a tabu meta-heuristic to guide the local search

Tabu Tenure Intuitively we use different tabu-tenure valuesbased on the number of hydrophobic amino acids (hCount)in the sequenceWe calculate tabu-tenure using the followingformula

tenure = (10 + ℎCount10

) (6)

The tabu-tenure calculated using (6) is used at Lines 5 24 and28 in Algorithm 1 during initialising and resetting tabu-list

Evaluation After each iteration the conformation is evalu-ated by counting the H-H contacts (topological neighbours)where the two amino acids are nonconsecutive The pseu-docode in Algorithm 3 presents the algorithm of calculatingthe free energy of a given conformation Note that the energy

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

8 Advances in Bioinformatics

(1) for (119894 = 1 to seqLength minus 1) do(2) for (119896 = 119894 + 2 to seqLength minus 1) do(3) if AAType[119894] = AAType[119896] = H then(4) nodeI larr997888 AA[119894](5) nodeJ larr997888 AA[119896](6) sqrD larr997888 getSqrDist(nodeI nodeJ)(7) if sqrD = 2 then(8) fitness larr997888 fitness ndash 1(9) return fitness

Algorithm 3 evaluate(AA)

value is negation of the H-H contact count For 20 times 20 BMenergy model the pairwise contact potentials are found inmatrix presented in Table 2

Initialisation Our algorithm starts with a feasible set ofconformations known as populationWe generate each initialconformation following a randomly generated self-avoidingwalk (SAW) on FCC lattice points The pseudocode of thealgorithm is presented in Algorithm 4 It places the firstamino acid at (0 0 0) It then randomly selects a basis vectorto place the successive amino acid at a neighbouring freelattice pointThemapping proceeds until a self-avoidingwalkis found for the whole protein sequence

42 BM Model Adopted Spiral Search The basic differencebetween the HP energy based original spiral search (SS-Tabu[9]) and the BM energy guided adopted spiral search lies onthemove selection criteria In former version of spiral searchthe amino acids are divided into two groups (H and P) Themoves are selected based on these two properties of the aminoacids that are guided by the distance of H amino acid fromthe HCC However to adopt 20 times 20 BM energy modelall 20 amino acids need to be taken into consideration andthe move selection criteria are guided by the distance of anyamino acid from the core centre (CC) of the current structure(Algorithm 5) The CC and the distance are calculated using(7) and (8) respectively

421 Selecting Moves for BM (20 times 20) Model In moveselection (Algorithm 5Line 6) theCC is calculated by findingarithmetic means of 119909 119910 and 119911 coordinates of all aminoacids using (7) The selection is guided by the Cartesiandistance 119889

119894(as shown in (5)) between CC and the amino

acids in the sequence For the 119894th amino acid the commontopological neighbours of the (119894 minus 1)th and (119894 + 1)th aminoacids are computed The topological neighbours (TN) of alattice point are the points at unit lattice distance apart fromit From the common neighbours the unoccupied points areidentifiedThe Cartesian distance of all unoccupied commonneighbours is calculated from the CC using (8) Then thepoint with the shortest distance is picked This point is listedin the possible move list for 119894th amino acid if its currentdistance from CC is greater than that of the selected pointWhen all amino acids are traversed and the feasible shortestdistances are listed in move list the amino acid having the

Table 3 Combination of SS-Tabu variations amongst differentthreads

Combinations HP guide SS-Tabu BM guide SS-Tabu1 (PSSB4H0) 0 thread 4 threads2 (PSSB3H1) 1 thread 3 threads3 (PSSB2H2) 2 threads 2 threads4 (PSSB1H3) 3 threads 1 thread5 (PSSB0H4) 4 threads 0 thread

shortest distance in move list is chosen to apply the diagonalmove on it (Algorithm 5 Line 8) A tabu list is maintainedfor each amino acid to control the selection priority amongstthem For each successful move the tabu list is updated(Algorithm 5 Line 9) for the respective amino acid

119909cc =1

119899

119899

sum

119894=1

119909119894 119910cc =

1

119899

119899

sum

119894=1

119910119894 119911cc =

1

119899

119899

sum

119894=1

119911119894 (7)

where 119899 is the number of amino acids in the protein Consider

119889119894= radic(119909

119894minus 119909cc)

2

+ (119910119894minus 119910cc)

2

+ (119911119894minus 119911cc)

2

(8)

43 Parallel Framework In our implemented prototype weuse four parallel threads The two versions of SS-Tabu aredistributed amongst the four threads as shown in Table 3

Figure 4 shows the architecture of our parallel searchalgorithm In this framework the search starts with a set ofrandomly generated initial solutions (Line 2 in Algorithm 6)The solutions are then divided in subsets (Line 4 inAlgorithm 6) and are distributed to different threads

We allow each thread to run for a predefined period oftime The improved solutions are stored threadwise and aremerged together (Line 9 in Algorithm 6) when all threadsfinish After removing the duplicates (Line 10 in Algorithm 6)from the merged solutions a selected distinct set of solutionsare taken (Line 11 in Algorithm 6) for the next iteration Theiterative process continues until the terminating criteria (Line3 in Algorithm 6) are satisfied

5 Experimental Results and Analyses

We conduct our experiments on two different sets of bench-mark proteins HP benchmarks and 20times20 benchmarksThe

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 9

(1) AAmdashamino acid array of the protein(2) SAWmdashSelf-avoiding-walk(3) basisVec[12] larr997888 getTwelveBasisVectors()(4) AA[0] larr997888 AminoAcid(0 0 0)(5) while (SAW) do(6) for (119894 = 1 to seqLength minus 1) do(7) 119896 larr997888 getRandom(12)(8) basis larr997888 basisVec[119896](9) node larr997888 AA[119894 minus 1] + basis(10) if isFree(node) then(11) AA[119894] larr997888 AminoAcid(node)(12) else(13) SAW larr997888 false

(14) break

(15) return AA[ ]

Algorithm 4 initialise()

(1) maxIter terminates the iteration(2) maxRetry sets the time of relay-restart(3) maxRW sets the time of random-walk(4) initTabuList()(5) for (119894 = 1 to maxIter) do(6) mv larr997888 selectMove()(7) if (mv = null) then(8) applyMove(mv)(9) updateTabuList(i)(10) evalute(AA) AAmdashamino acid array(11) if (improved) then(12) retry++(13) else(14) improvedList larr997888 addTopOfList()(15) retry = 0(16) rw = 0(17) if retry ge maxRetry then(18) relayRestart(improvedList)(19) resetTabuList()(20) rw++(21) if rw ge maxRW then(22) randomWalk(maxPull)(23) resetTabuList()

Algorithm 5 SpiralSearchBM(C)

(1) thrmdashThread(2) currSet larr997888 initialise()(3) for (119894 = 1 to repeat) do(4) subSet larr997888 genSubSet(currSet)(5) for (119894 = 1 to thCount) do(6) thr[119894] = createSSThread(subSet[i] time)(7) thr[119894]sdotstart()(8) if (noAliveThread) then(9) mrgLst = mergeImprovedLists()(10) distinctLst = removeDuplicate(mrgLst)(11) currSet larr997888 genCurrSet(distinctLst)

Algorithm 6 SSParallel(time repeat)

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

10 Advances in Bioinformatics

Table 4 For 9medium sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search ( SS-Tabu ) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 05 hrs times 4 = 2 hrs 2 hrs times 1 = 2 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F90 1 90 minus168 minus168 minus166 minus168 minus167 0 minus168 minus166 0F90 2 90 minus168 minus168 minus166 minus167 minus164 50 minus168 minus165 33F90 3 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus164 33F90 4 90 minus168 minus168 minus166 minus168 minus165 33 minus168 minus165 33F90 5 90 minus167 minus167 minus165 minus167 minus165 0 minus167 minus166 0S1 135 minus357 minus355 minus350 minus355 minus347 30 minus355 minus348 22S2 151 minus360 minus356 minus351 minus354 minus347 31 minus356 minus349 18S3 162 minus367 minus360 minus354 minus359 minus350 26 minus361 minus349 28S4 164 minus370 minus364 minus358 minus358 minus350 40 minus364 minus352 33

Table 5 For 12 large sized proteins the three different sets of excremental datamdash(i) our parallel local search framework (PSS) (ii) the tabuguided spiral search (SS-Tabu) and (iii) the genetic algorithms (GA+) The RI Columns present the relative improvements of parallel localsearch over the single-thread local search and the genetic algorithm The RI is calculated on the average energy values

Our approach The current state-of-the-art approaches(Four threads) (Single thread)

Protein Info 125 hrs times 4 = 5 hrs 5 hrs times 1 = 5 hrsPSS SS-Tabu [9] GA+ [8]

Seq Size LBFE Best Avg (119864119905) Best Avg (119864

119903) RI Best Avg (119864

119903) RI

F180 1 180 minus378 minus359 minus344 minus357 minus340 11 minus351 minus341 8F180 2 180 minus381 minus364 minus352 minus359 minus345 19 minus362 minus346 17F180 3 180 minus378 minus368 minus356 minus362 minus353 12 minus361 minus350 21R1 200 minus384 minus366 minus353 minus359 minus345 21 minus355 minus346 18R2 200 minus383 minus368 minus355 minus358 minus346 24 minus360 minus346 24R3 200 minus385 minus369 minus353 minus365 minus345 20 minus363 minus344 223mse 179 minus323 minus296 minus285 minus289 minus280 12 minus290 minus279 143mr7 189 minus355 minus332 minus319 minus328 minus313 14 minus328 minus316 83mqz 215 minus474 minus430 minus414 minus420 minus402 17 minus427 minus410 63no6 229 minus455 minus429 minus407 minus411 minus391 25 minus420 minus400 133no3 258 minus494 minus422 minus404 minus412 minus393 11 minus421 minus402 23on7 279 na minus516 minus500 minus512 minus485 na minus515 minus485 na

rest of this section will present the experimental results indetail

51 Experiment Setup

511 Implementation The parallel spiral search frameworkhas been implemented in Java 60 using Java standard APIsCurrently the source code is not available publicly due tothe legal bindings However an executable version of theapplication could be requested to the corresponding author

512 Execution We ran our experiments on the NICTA(NICTA website httpwwwnictacomau) cluster Thecluster consists of a number of identical Dell PowerEdge R415computers each equipped with 2 times AMD 6-Core Opteron

4184 processors 28 GHz clock speed 3M L26M L3 Cache64GB memory and running Rocks OS (a Linux variant forcluster) The experimental results presented in this paper areobtained from 50 different runs of identical settings for eachprotein when using HP benchmarks and 20 different runsof identical settings for each protein when using 20 times 20

benchmarks

52 Experimental Results on HP Benchmark The experi-mental results on HP benchmarks are presented in Tables4 and 5 Amongst the sequences F90 S F180 and Rinstances are taken from Peter Clote laboratory web-site (Peter Clote Lab httpbioinformaticsbceduclotelabFCCproteinStructure)These instances have been used in [89 26 27 39] for evaluating different algorithmsMoreover we

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 11

Initialsolutions

Distinctsolutions

p

p

pn

pn

pn

Thread 1spiral search

Thread 2spiral search

spiral searchThread n

Removeduplicates

Improvedsolutions

solutions

Improvedsolutions

Improvedsolutions

MergedCreate (n)

subsets

ge pn

ge pn

ge pn

ge pn

Figure 4 Parallel spiral search framework

use other six larger sequences that are taken from the CASP(CASP website httppredictioncenterorgcasp9targetlistcgi) competition The corresponding CASP target IDs forproteins 3mse 3mr7 3mqz 3no6 3no3 and 3on7 are T0521T0520 T0525 T0516 T0570 and T0563 These CASP targetsare also used in [27] To fit in the HPmodel the CASP targetsare converted to HP sequences based on the hydrophobicproperties of the constituent amino acids The lower boundsof the free energy values (in Column LBFEof Tables 4 and 5)are obtained from [26 27] however there are some unknownvalues (presented as 119899119886) of lower bounds of free energy forlarge sequences

521 Results on Medium Sized HP Benchmark Proteins InTable 4 we present three different sets of result obtainedfrom (i) our parallel local search framework that runs onfour parallel threads (30 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (2 hoursrun) and(iii) a genetic algorithm (GA+) that runs on a single thread(2 hoursrun) In the table the Size column presents thenumber of amino acids in the sequences and the 119871119861119865119864column shows the known lower bounds of free energy forthe corresponding protein sequences in Column 119868119863The bestand average free energy values for three different algorithmsare presented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

522 Results on Large Sized HP Benchmark Proteins InTable 5 we present three different sets of result obtained from(i) our parallel local search framework that runs on fourparallel threads (1 hour 15 minutesrun) (ii) a local search(SS-Tabu) that runs on a single thread (5 hoursrun) and (iii)a genetic algorithm (GA+) that runs on a single thread (5hoursrun) In the table the Size column presents the number

of amino acids in the sequences and the LBFEcolumn showsthe known lower bounds of free energy for the correspondingprotein sequences in Column ID However a lower boundof free energy for protein 3on7 is not known The best andaverage free energy values for three different algorithms arepresented in the table under the specific column headers(PSS SS-Tabu andGA+)TheRIColumns present the relativeimprovements of parallel local search over the single-threadlocal search and the genetic algorithmThe bold-faced valuesindicate better performance in comparison to the otheralgorithms for corresponding proteins

523 Relative Improvement on HP Benchmark The difficultyof improving energy level is increased as the improved energylevel approaches to the lower bound of free energy Forexample if the lower bound of free energy of a protein isminus100 the efforts to improve energy level from minus80 to minus85are much less than that to improve energy level from minus95 tominus100 though the change in energy is the same (minus5) RelativeImprovement (RI) explains how close our predicted resultsare to the lower bound of free energy with respect to theenergy obtained from the state-of-the-art approaches

RI =119864119905minus 119864119903

119864l minus 119864119903lowast 100 (9)

In Tables 4 and 5 we also present a comparison ofimprovements () on average conformation quality (in termsof free energy levels) We compare PSS (target) with SS-Tabuand GA+ (references) For each protein the RI of the target(119905) with respect to the reference (119903) is calculated using theformula in (9) where 119864

119905and 119864

119903denote the average energy

values achieved by the target and the reference respectivelyand 119864l is the lower bound of free energy for the protein inthe HP model We present the relative improvements onlyfor the proteins having known lower bounds of free energyvalues We test our new approach on 16 different proteins of

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

12 Advances in Bioinformatics

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

Real time (minutes)

SSTGASSP

+

(a)

minus360

minus350

minus340

minus330

minus320

minus310

minus300

0 50 100 150 200 250 300

Aver

age e

nerg

y (minus

ve)

CPU time (minutes)

SST

SSPGA+

(b)

Figure 5 Search progress for protein R1 with (a) real time and (b) CPU time of 4 threads (4x real time) SST GA+ and SSP representtabu-based spiral search [9] genetic algorithms [8] and multipoint parallel spiral search respectively

various lengths The bold-faced values are the minimum andthe maximum improvements for the same column

Improvement with respect to SS-Tabu The experimentalresults in Tables 4 and 5 at column RI under SS-Tabushow that our PSS is able to improve the search quality interms of minimising the free energy level over all the 16proteins considered for the test The relative improvementswith respect to SS-Tabu range from 0 to 50

Improvement with respect to119866119860+The experimental results inTables 4 and 5 at column RI (relative improvement) underGA+ show that our PSS is able to improve the search qualityin terms of minimising the free energy level over all 16proteins considered for the test The relative improvementswith respect to GA+ range from 0 to 33

524 Search Progress We compare the search progressesof SS-Tabu GA+ and PSS on the basis of real executiontime Figure 5(a) shows the average energy values obtainedwith times by the algorithms for protein R1 The graphshows that the progress of PSS stops at 75 minutes (125hours) As we mentioned earlier we run parallel threads(four threads) in our PSS for 125 hours to keep total CPUtime equal to five (125 times 4 = 5) hours From the graphit is clear that multipoint local search with four parallelthreads dramatically outperforms the local search and geneticalgorithms within (14)th of the execution time

However in Figure 5(b) we compare the search pro-gresses of SS-Tabu GA+ and PSS over CPU time The CPUtime of PSS is calculated by summing up the individual timesof all threads (time per thread times 4) in different instances

525 Comments onOurHP-BasedMethod In Tables 4 and 5the Columns LBFE represent the lower bound of free energySome of these values are taken from the literatures andothers are obtained running exact and complete algorithmsbased CPSP-tools [42] However we do not compare ourexperimental results with results obtained from CPSP toolsbecause of a fundamental conceptual difference betweenour approaches and Will and Backofen [63 64] WillrsquosHPstruct algorithm [65] proceeds with threading an inputHP sequence onto hydrophobic cores from a collection ofprecomputed and stored H-cores On the other hand ouralgorithms compute H-cores on the fly like Yue-Dill CHCCmethod [61 66] HPstruct requires a precomputed set of H-cores for the number of H amino acids in the given sequenceTherefore CPSP tools cannot find structure without theavailability of a precomputed optimal H-core

53 Experimental Results on 20 times 20 Benchmark Besides HPenergy model we apply our parallel framework on standard20 times 20 benchmark proteins The protein instances used inour experiments are taken from the literature (as shown inTable 6) The first seven proteins 4RXN 1ENH 4PTI 2IGD1YPA 1R69 and 1CTF are taken from [12] and the next fiveproteins 3MX7 3NBM CMQO 3MRO and 3PNX from [10]In Table 7 we present eight sets of experimental results Theapproaches are described below

(1) LS-Tabu is heuristically guided local search based ontabu metaheuristic The result presented in Table 7under Column LS-Tabu is the output of 20 differentruns of LS-Tabu [10] in an identical setting over 60

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 13: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 13

Table 6 The benchmark proteins used in our experiments

ID Length Sequence4RXN 54 MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE

1ENH 54 RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI

4PTI 58 RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

2IGD 61 MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE

1YPA 64 MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG

1R69 69 SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR

1CTF 74 AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

3MX7 90 MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFA

YEYTLEINGKSLKKYM

3NBM 108 SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAE

RLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ

3MQO 120 PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDR

IMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA

3MRO 142 SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALK

AHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL

3PNX 160GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAE

ELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLK

NALESDLQLFI

minutes duration The algorithm runs on a singlethread using Berrera et al 20 times 20 energy model

(2) SS-Tabu is core directed local search based ontabu metaheuristic works in an spiral fashion Theresult presented in Table 7 under Column SS-Tabuis the output of 20 different runs of LS-Tabu [9] inan identical setting over 60 minutes duration Thealgorithm runs on a single thread using Berrera et al20 times 20 energy model

(3) PSSB4H0 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threadsthe SS-Tabu is guided by Berrera et al 20 times 20 energymodel The parallel threads are terminated after 15minutes Therefore the total CPU time remains (15 times4-threads) the same as the SS-Tabu or LS-Tabu

(4) PSSB3H1 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(5) PSSB2H2 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threadsthe SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 2 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(6) PSSB1H3 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in 3 threads

the SS-Tabu is guided by Berrera et al 20 times 20

energy model and in other 3 threads the SS-Tabu isguided by HP energy model The parallel threads areterminated after 15 minutes Therefore the total CPUtime remains (15times4-threads) the same as the SS-Tabuor LS-Tabu

(7) PSSB0H4 is a variant of parallel spiral search runningin 4 threads In this variant of PSS in all 4 threads theSS-Tabu is guided by HP energy model The parallelthreads are terminated after 15minutesTherefore thetotal CPU time remains (15 times 4-threads) the same asthe SS-Tabu or LS-Tabu

(8) 119866119860+ is population-based genetic algorithm that useshydrophobic-core directed macromutation operatorand random-walk-based stagnation recovery tech-nique in addition to the regular GA operators Theresult presented in Table 7 under Column GA+ is theoutput of 20 different runs of GA+ [11] in an identicalsetting over 60 minutes durationThe algorithm runson a single thread using both HP and BM energymodels in a mixing manner

54 Energy Values on 20 times 20 Benchmark In Table 7 theenergy columns show the energy values obtained fromdifferent approaches on 12 benchmark proteins (Table 6)Although the searches are guided by both HP and BM energymodels the energy values are calculated by applying Berreraet al 20 times 20 energy matrix The experimental results showthat amongst the parallel spiral search variants PSSB1H3(6 out of 12 proteins) and PSSB0H4 (6 out of 12 proteins)produce better results in comparison to the other variantsin terms of lowest interaction energies However the GA+

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 14: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

14 Advances in Bioinformatics

Table7Th

ebestandaveragecontacte

nergieso

btainedfro

m8different

approaches

usingBe

rreraet

al[25]20times20energy

matrix

Row

wise

bold-fa

cedvalues

arethewinners

forthe

correspo

ndingproteins

amon

gstthe

varia

ntso

fspiralsearch(bothsin

glea

ndparallelframew

orks)and

bold-italic-fa

cedvalues

arethe

winnersforthe

correspo

ndingproteins

amon

gstall8

approachesFor

both

energy

andRM

SDvaluesthe

lower

theb

etter

Com

parin

gall-a

tomicinteractionenergy

andRM

SDvalues

State-of-th

e-art

Spira

lsearch

Parallelspiralsearch(PSS)v

ariantso

nenergy

mod

elmixing

State-of-th

e-art

CPUtim

e1hr

CPUtim

e1hr

CPUtim

e1hr

(4-th

readstimes

15minutes))

CPUtim

e1hr

Proteindetails

1timesbr-th

read

1timesbr-th

read

4timesbr-th

reads

3timesbr-th

reads

2timesbr-th

reads

1timesbr-th

read

0timesbr-th

read

brandhp

based

0timeshp

-thread

0timeshp

-thread

0timeshp

-thread

1timeshp

-thread

2timeshp

-threads

3timeshp

-threads

4timeshp

-threads

singlethreaded

LS-Tabu[10]

SS-Tabu

PSSB

4H0

PSSB

3H1

PSSB

2H2

PSSB

1H3

PSSB

0H4

GA

+[11]

Seq

Size

HEn

ergy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

Energy

RMSD

4RXN

5427

minus15632

629

minus15011

600

minus14222

523

minus15447

521

minus15694

511

minus157

519

minus1572

5517

minus16272

541

1ENH

5419

minus14669

661

minus14301

588

minus1292

3511

minus14688

509

minus14776

502

minus14858

493

minus14839

490

minus1516

5522

4PTI

5832

minus19842

707

minus19077

699

minus17552

641

minus19605

627

minus19733

638

minus19842

638

minus1983

637

minus20

456

646

2IGD

6125

minus17419

933

minus16387

850

minus1510

972

6minus17174

726

minus17283

728

minus17389

733

minus17402

743

minus17683

781

1YPA

6438

minus23998

753

minus23610

686

minus21460

600

minus24519

605

minus24843

593

minus24735

577

minus24

854

586

minus25309

629

1R69

6930

minus20417

647

minus19114

565

minus17561

514

minus20322

492

minus20481

485

minus20588

488

minus20

706

478

minus20

879

517

1CTF

7442

minus21381

723

minus1978

5563

minus1791

8523

minus21838

521

minus2201

506

minus22223

502

minus2216

7506

minus22542

528

3MX7

9044

minus3115

6818

minus30089

862

minus2574

978

7minus3219

4800

minus32409

78minus32623

770

minus32555

764

minus32545

794

3NBM

108

56minus4019

9858

minus38012

695

minus3297

688

minus40

95612

minus40

674

606

minus41296

606

minus41118

600

minus41925

646

3MQO

120

68minus45527

886

minus4224

752

minus33674

739

minus4613

8698

minus46502

683

minus46

927

677

minus46

738

667

minus47

278

684

3MRO

142

6343029

1002

minus39714

961

minus31385

874

minus44

523

811

minus45068

793

minus44

859

785

minus45204

771

minus4477

7872

3PNX

160

84minus57113

938

minus50229

955

minus38349

905

minus58668

873

minus59385

838

minus59599

845

minus60

018

839

minus59225

851

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 15: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 15

performs better in comparison to the parallel spiral searchvariants for 9 out of 12 proteins

55 RMSD Values on 20 times 20 Benchmark The RMSD isfrequently used to measure the differences between valuespredicted by a model and the values actually observed Wecompare the predicted structures obtained by our approachwith the state-of-the-art approaches bymeasuring the RMSDwith respect to the native structures from PDB For any givenstructure the RMSD is calculated using (10) The averagedistance between two120572-Carbons in a native structure is 38 ATo calculate RMSD the distance between two neighbourlattice points (radic2 for FCC lattice) is considered as 38 AConsider

RMSD =radicsum119899minus1

119894=1sum119899

119895=119894+1(119889119901

119894119895minus 119889119899

119894119895)2

119899 lowast (119899 minus 1) 2

(10)

where 119889119901119894119895and 119889119899

119894119895denote the distances between 119894th and 119895th

amino acids respectively in the predicted structure and thenative structure of the protein

In Table 7 the RMSD columns show the root-mean-square deviation (RMSD) values obtained from differentapproaches on 12 benchmark proteins (Table 6) The exper-imental results show that amongst the parallel spiral searchvariants PSSB1H3 (7 out of 12 proteins) produces betterresults in comparison to other variants in terms of lowestRMSD values However when compared with GA+ theparallel variants perform better for 11 out of 12 proteins

56 Effect of Mixing Energy Models The best hydrophobiccores do not always correspond to the best structures in termsof RMSD values [67 68] These observations inspired us tomix the energy models The approaches presented in Table 7are guided by BM HP or both energy models Howeverthe conformations are always evaluated using BM modelThe experimental results show that when the variants areguided by HP or both BM and HPmodels (such as PSSB3H1PSSB2H2 PSSB1H3 and PSSB0H4) it performs better thanthe variant guided by BM model (such as PSSB4H0) There-fore from the observation of RMSD values it is clear thatHP model works as a better guidance heuristic whereas BMmodel works as better model for evaluating conformations

6 Conclusion

In this paper we present a multipoint parallel local searchframework that runs tabu-based local search (spiral search[9]) in parallel threads In our ab initio protein structureprediction method we develop two versions of SS-Tabu thatuses hydrophobic-polar energy model and 20 times 20 Berrera etal [25] energymodel separately on face-centred-cubic latticeCollaboration and negotiation play vital roles in dealing withreal world challenges In our research we try to adopt thisanalogy by considering each thread as a collaborator Weallow each thread to run for a predefined period of timeThe threads are met in an assembly point when they finish

their execution and donate or accept better solutions toproceed with The PSS starts with a set of random initialsolutions by distributing a subset of solutions to differentthreads which are running different combinations of twoversions of SS-Tabu The interim improved solutions arestored threadwise and merged together when the threadsfinish After removing the duplicates from the merged solu-tions a selected distinct set of solutions is considered forthe next iteration In our approach multipoint start helpsfind some promising solutions For the next working set ofsolutions from the merged list the most promising solutionsare selected Therefore multipoint parallelism reduces thesearch space by exploring around the promising solutions inevery iteration The experimental results show that our newapproach significantly improves over the results obtained bythe state-of-the-art single-point search approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Mahmood A Rashid conceived the idea of applying SpiralSearch in a parallel framework M A Hakim NewtonSwakkhar Shatabda Md Tamjidul Hoque and Abdul Sattarhelped Mahmood A Rashid in modeling implementingand testing the approach All authors equally participated inanalysing the test results to improve the approach and weresignificantly involved in the process of writing and reviewingthe paper

Acknowledgments

The authors would like to express their great appreciationto the people managing the Cluster Computing Services atNational ICT Australia (NICTA) and Griffith UniversityMd Tamjidul Hoque acknowledges the Louisiana Board ofRegents through the Board of Regents Support Fund LEQSF(2013-16)-RD-A-19 NICTA the sponsor of the article forpublication is funded by the Australian Government asrepresented by the Department of Broadband Communica-tions and the Digital Economy and the Australian ResearchCouncil through the ICT Centre of Excellence program

References

[1] A Smith ldquoProtein misfoldingrdquoNature Reviews Drug Discoveryvol 426 no 6968 p 78102 2003

[2] C M Dobson ldquoProtein folding and misfoldingrdquo Nature vol426 no 6968 pp 884ndash890 2003

[3] ldquoSo muchmore to knowrdquoThe Science vol 309 no 5731 pp 78ndash102 2005

[4] R Bonneau and D Baker ldquoAb initio protein structure predic-tion progress and prospectsrdquo Annual Review of Biophysics andBiomolecular Structure vol 30 pp 173ndash189 2001

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 16: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

16 Advances in Bioinformatics

[5] C A Rohl C EM Strauss KM SMisura andD Baker ldquoPro-tein structure prediction using rosettardquoMethods in Enzymologyvol 383 pp 66ndash93 2004

[6] J Lee S Wu and Y Zhang ldquoAb initio protein structure predic-tionrdquo in From Protein Structure to Function with Bioinformaticspp 3ndash25 2009

[7] Y Xia E S Huang M Levitt and R Samudrala ldquoAb initioconstruction of protein tertiary structures using a hierarchicalapproachrdquo Journal of Molecular Biology vol 300 no 1 pp 171ndash185 2000

[8] M A Rashid M T Hoque M A H Newton D Pham and ASattar ldquoA new genetic algorithm for simplified protein structurepredictionrdquo in AI 2012 Advances in Artificial Intelligence Lec-ture Notes in Computer Science pp 107ndash119 Springer BerlinGermany 2012

[9] M A Rashid M A H Newton M T Hoque S ShatabdaD Pham and A Sattar ldquoSpiral search a hydrophobic-coredirected local search for simplifiedPSPon 3DFCC latticerdquoBMCBioinformatics vol 14 supplement 2 article S16 2013

[10] S Shatabda and M A H Newton ldquoSattar a mixed heuristiclocal search for protein structure predictionrdquo in Proceedings ofthe 27th AAAI Conference on Artificial Intelligence AAAI Press2013

[11] M A Rashid M A H Newton M T Hoque and A SattarldquoMixing energy models in genetic algorithms for on-latticeprotein structure predictionrdquo BioMed Research Internationalvol 2013 Article ID 924137 15 pages 2013

[12] A D Ullah and K Steinhofel ldquoA hybrid approach to proteinfolding problem integrating constraint programming with localsearchrdquo BMC Bioinformatics vol 11 supplement article S392010

[13] S R D Torres D C B Romero L F N Vasquez and Y J PArdila ldquoA novel ab-initio genetic-based approach for proteinfolding predictionrdquo in Proceedings of the 9th Annual Conferenceon Genetic and Evolutionary Computation (GECCO rsquo07) pp393ndash400 ACM 2007

[14] Y Zhang and J Skolnick ldquoThe protein structure predictionproblem could be solved using the current PDB libraryrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 4 pp 1029ndash1034 2005

[15] J U Bowie R Luthy and D Eisenberg ldquoA method to identifyprotein sequences that fold into a known three-dimensionalstructurerdquo Science vol 253 no 5016 pp 164ndash170 1991

[16] A Torda ldquoProtein threadingrdquo in The Proteomics ProtocolsHandbook pp 921ndash938 2005

[17] K T Simons R Bonneau I Ruczinski and D Baker ldquoAbinitio protein structure prediction of CASP III targets usingROSETTArdquo Proteins supplement 3 pp 171ndash176 1999

[18] D Baker and A Sali ldquoProtein structure prediction and struc-tural genomicsrdquo Science vol 294 no 5540 pp 93ndash96 2001

[19] C Levinthal ldquoAre there pathways for protein foldingrdquo Journalof Medical Physics vol 65 pp 44ndash45 1968

[20] C B Anfinsen ldquoPrinciples that govern the folding of proteinchainsrdquo Science vol 181 no 4096 pp 223ndash230 1973

[21] T C Hales ldquoA proof of the Kepler conjecturerdquo Annals ofMathematics vol 162 no 3 pp 1065ndash1185 2005

[22] M T Hoque M Chetty and A Sattar ldquoProtein folding predic-tion in 3D FCC HP lattice model using genetic algorithmrdquo inProceedings of the IEEE Congress on Evolutionary ComputationThe Annals of Mathematics pp 4138ndash4145 2007

[23] K F Lau and K A Dill ldquoLattice statistical mechanics modelof the conformational and sequence spaces of proteinsrdquoMacro-molecules vol 22 no 10 pp 3986ndash3997 1989

[24] S Miyazawa and R L Jernigan ldquoEstimation of effective inter-residue contact energies from protein crystal structures Quasi-chemical approximationrdquo Macromolecules vol 18 no 3 pp534ndash552 1985

[25] M Berrera H Molinari and F Fogolari ldquoAmino acid empiricalcontact energy definitions for fold recognition in the space ofcontact mapsrdquo BMC Bioinformatics vol 4 article 8 2003

[26] M Cebrian I IDotu P V Hentenryck and P Clote ldquoProteinstructure prediction on the face centered cubic lattice by localsearchrdquo in Proceedings of the 23rd AAAI Conference on ArtificialIntelligence pp 241ndash246 AAAI Press July 2008

[27] S Shatabda M A H Newton D N Pham and A SattarldquoMemory-based local search for simplified protein structurepredictionrdquo in Proceedings of the 3rd ACM Conference onBioinformatics Computational Biology and Biomedicine (BCBrsquo12) ACM Orlando Fla USA 2012

[28] B Berger and T Leighton ldquoProtein folding in the hydrophobic-hydrophilic (HP) model is NP-completerdquo Journal of Computa-tional Biology vol 5 no 1 pp 27ndash40 1998

[29] F Glover andM Laguna Tabu Search vol 1 Kluwer Academic1998

[30] F Glover ldquoTabu search Part Irdquo ORSA Journal on Computingvol 1 no 3 pp 190ndash206 1989

[31] C Thachuk A Shmygelska and H H Hoos ldquoA replicaexchange Monte Carlo algorithm for protein folding in the HPmodelrdquo BMC Bioinformatics vol 8 article 342 2007

[32] A-A Tantar N Melab and E-G Talbi ldquoA grid-based geneticalgorithm combined with an adaptive simulated annealing forprotein structure predictionrdquo Soft Computing vol 12 no 12 pp1185ndash1198 2008

[33] R Unger and J Moult ldquoA genetic algorithm for 3D proteinfolding simulationsrdquo in Proceedings of the 5th InternationalConference on Genetic Algorithms Soft Computing-A Fusionof Foundations Methodologies and Applications pp 581ndash588Morgan Kaufmann Publishers 1993

[34] M T Hoque Genetic Algorithm for ab initio protein structureprediction based on low resolution models [PhD thesis] Gipp-sland School of Information Technology Monash UniversityMonash Australia 2007

[35] H J Bockenhauer A Z M D Ullah L Kapsokalivas and KSteinhofel ldquoA local move set for protein folding in triangularlattice modelsrdquo in Algorithms in Bioinformatics K A Crandalland J Lagergren Eds vol 5251 of Lecture Notes in ComputerScience pp 369ndash381 Springer 2008

[36] G W Klau N Lesh J Marks and M Mitzenmacher ldquoHuman-guided tabu searchrdquo in Proceedings of the 18th National Confer-ence onArtificial Intelligence (AAAIrsquo 02) pp 41ndash47 August 2002

[37] C Blum ldquoAnt colony optimization Introduction and recenttrendsrdquo Physics of Life Reviews vol 2 no 4 pp 353ndash373 2005

[38] V Cutello G Nicosia M Pavone and J Timmis ldquoAn immunealgorithm for protein structure prediction on lattice modelsrdquoIEEE Transactions on Evolutionary Computation vol 11 no 1pp 101ndash117 2007

[39] I DotuM Cebrian P VanHentenryck and P Clote ldquoOn latticeprotein structure prediction revisitedrdquo IEEEACM Transactionson Computational Biology and Bioinformatics vol 8 no 6 pp1620ndash1632 2011

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 17: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Advances in Bioinformatics 17

[40] R Backofen and S Will ldquoA constraint-based approach to fastand exact structure prediction in three-dimensional proteinmodelsrdquo Constraints vol 11 no 1 pp 5ndash30 2006

[41] M Mann S Will and R Backofen ldquoCPSP-tools exact andcomplete algorithms for high-throughput 3D lattice proteinstudiesrdquo BMC Bioinformatics vol 9 article 230 2008

[42] M Mann C Smith M Rabbath M Edwards S Will andR Backofen ldquoCPSP-web-tools a server for 3D lattice proteinstudiesrdquo Bioinformatics vol 25 no 5 pp 676ndash677 2009

[43] A L Patton W F Punch III and E D Goodman ldquoA standardGA approach to native protein conformation predictionrdquo inProceedings of the 6th International Conference on GeneticAlgorithms

[44] M T Hoque M Chetty A Lewis A Sattar and V M AveryldquoDFS-generated pathways in GA crossover for protein structurepredictionrdquo Neurocomputing vol 73 no 13ndash15 pp 2308ndash23162010

[45] M T Hoque M Chetty A Lewis and A Sattar ldquoTwin removalin genetic algorithms for protein structure prediction using low-resolution modelrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 1 pp 234ndash245 2011

[46] A D Ullah L Kapsokalivas M Mann and K SteinhofelldquoProtein folding simulation by two-stage optimizationrdquo inComputational Intelligence and Intelligent Systems Z Cai ZLi Z Kang and Y Liu Eds vol 51 of Communicationsin Computer and Information Science pp 138ndash145 SpringerBerlin Germany 2009

[47] T Jiang Q Cui G Shi and S Ma ldquoProtein folding simulationsof the hydrophobic-hydrophilic model by combining tabusearchwith genetic algorithmsrdquo Journal of Chemical Physics vol119 no 8 pp 4592ndash4596 2003

[48] A Dal Palu A Dovier and F Fogolari ldquoConstraint logicprogramming approach to protein structure predictionrdquo BMCBioinformatics vol 5 article 186 2004

[49] A Dal Palu A Dovier and E Pontelli ldquoHeuristics opti-mizations and parallelism for protein structure prediction inCLP(FD)rdquo in Proceedings of the 7th ACMSIGPLANConferenceon Principles and Practice of Declarative Programming (PPDPrsquo05) pp 230ndash241 July 2005

[50] A Dal Palu A Dovier and E Pontelli ldquoA constraint solver fordiscrete lattices its parallelization and application to proteinstructure predictionrdquo Software vol 37 no 13 pp 1405ndash14492007

[51] A Dal Palu A Dovier F Fogolari and E Pontelli ldquoExploringprotein fragment assembly using CLPrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence vol3 pp 2590ndash2595 AAAI Press 2011

[52] L Kapsokalivas X Gan A A Albrecht and K SteinhofelldquoPopulation-based local search for protein folding simulation intheMJ energy model and cubic latticesrdquo Computational Biologyand Chemistry vol 33 no 4 pp 283ndash294 2009

[53] C Vargas Benitez and H Lopes ldquoParallel artificial bee colonyalgorithm approaches for protein structure prediction using the3DHP-SC modelrdquo in Intelligent Distributed Computing IV vol315 of of Studies in Computational Intelligence pp 255ndash264Springer Berlin Germany 2010

[54] A-A Tantar N Melab and E-G Talbi ldquoA comparative studyof parallel metaheuristics for protein structure prediction onthe computational gridrdquo in Proceedings of the 21st InternationalParallel and Distributed Processing Symposium (IPDPS rsquo07)March 2007

[55] A-A Tantar N Melab E-G Talbi B Parent and D HorvathldquoA parallel hybrid genetic algorithm for protein structureprediction on the computational gridrdquo Future Generation Com-puter Systems vol 23 no 3 pp 398ndash409 2007

[56] I Kondov ldquoProtein structure prediction using distributedparallel particle swarm optimizationrdquo Natural Computing vol12 pp 29ndash41 2013

[57] J C Calvo and J Ortega ldquoParallel protein structure predictionby multiobjective optimizationrdquo in Proceedings of the 17thEuromicro International Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo09) pp 268ndash275 February2009

[58] J C Calvo J Ortega and M Anguita ldquoComparison of parallelmulti-objective approaches to protein structure predictionrdquoJournal of Supercomputing vol 58 no 2 pp 253ndash260 2011

[59] V Robles M Perez V Herves J Pena and P LarranagaldquoParallel stochastic search for protein secondary structurepredictionrdquo in Parallel Processing and Applied Mathematicsvol 3019 of Lecture Notes in Computer Science pp 1162ndash1169Springer Berlin Germany 2004

[60] M A Rashid S Shatabda M A H Newton M T Hoque DN Pham and A Sattar ldquoRandom-walk a stagnation recoverytechnique for simplified protein structure predictionrdquo in Pro-ceedings of the ACM Conference on Bioinformatics Computa-tional Biology and Biomedicine (BCB rsquo12) pp 620ndash622 ACM2012

[61] K Yue and K A Dill ldquoSequence-structure relationships inproteins and copolymersrdquo Physical Review E vol 48 no 3 pp2267ndash2278 1993

[62] N Lesh M Mitzenmacher and S Whitesides ldquoA complete andeffective move set for simplified protein foldingrdquo in Proceedingsof the 7th Annual International Conference on Research inComputational Molecular Biology pp 188ndash195 April 2003

[63] S Will ldquoConstraint-based hydrophobic core construction forprotein structure prediction in the face-centered-cubic latticerdquoPacific Symposium on Biocomputing Pacific Symposium onBiocomputing pp 661ndash672 2002

[64] R Backofen and S Will ldquoA constraint-based approach to struc-ture prediction for simplified protein models that outperformsother existingmethodsrdquo in Logic Programming pp 49ndash71 2003

[65] S Will Exact constraint-based structure prediction in simpleprotein models [PhD thesis] University of Jena 2005

[66] K Yue K M Fiebig P D Thomas H S Chan E IShakhnovich and K A Dill ldquoA test of lattice protein foldingalgorithmsrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 92 no 1 pp 325ndash329 1995

[67] S Shatabda M H Newton M A Rashid D N Pham and ASattar ldquoHow good are simplified models for protein structurepredictionrdquo Advances in Bioinformatics In press

[68] S Shatabda M A H Newton and A Sattar ldquoSimplifiedlattice models for protein structure prediction how good aretheyrdquo in Proceedings of the 27th AAAI Conference on ArtificialIntelligence 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 18: Research Article A Parallel Framework for …downloads.hindawi.com › archive › 2014 › 985968.pdfResearch Article A Parallel Framework for Multipoint Spiral Search in ab Initio

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology