Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using...

16
2018 Vol. 4 No. 1:1 iMedPub Journals www.imedpub.com Research Article DOI: 10.21767/2470-6973.100026 Chemical informatics ISSN 2470-6973 1 © Under License of Creative Commons Attribution 3.0 License | This arcle is available in: hp://cheminformacs.imedpub.com/ Inna Slynko 1,2 , Jan KF Dreher 1 and Andreas H Göller 1* 1 Bayer Pharma AG, Medicinal Chemistry - Computaonal Chemistry, Wuppertal, Germany 2 Grünenthal GmbH, Computaonal Chemistry, Aachen, Germany *Corresponding author: Andreas H Göller [email protected] Tel: +49202365442 Bayer Pharma AG, Medicinal Chemistry - Computaonal Chemistry, Wuppertal, Germany. Citation: Slynko I, Dreher JKF, Göller AH (2018) Every Jack has His Jill: Finding a Target for Your Combinatorial Library. Chem Inform Vol. 4 No. 1:1. Introducon The chemical library belongs to the biggest research assets of any pharmaceucal company. Such screening libraries are typically between one to five million compounds [1]. Whether the full library or only subsets are tested in HTS campaigns and how such subsets are composed depends on target areas, assay designs and company’s strategy. HTS and especially in vitro and in vitro assays of individual compounds are costly in terms of substance consumpon. Therefore, all libraries bleed out. Instead of resynthesizing old compounds, companies set up campaigns to evolve the libraries into new chemical space following one of three strategies, namely, buying from chemical catalogs, buying readily available proprietary compounds or designing novel proprietary chemistry. Typical design concept for novel libraries is to create structurally diverse compounds with Lipinski drug-like [2] or lead-like [3,4] properes. Since chemical space is almost infinite with approximately 10 60 compounds with a molecular weight lower than 500 Da, [5] and currently only about 10 to 20 million compounds relevant to drug discovery are covered by commercial sources and proprietary repositories, the queson arises: which of numerous imaginary libraries are relevant and which not? Received: December 21, 2017; Accepted: December 26, 2017; Published: January 01, 2018 Abstract Pharmaceucal companies regularly run campaigns to evolve their proprietary chemical libraries which are among their most valuable assets. Ulmate goal with those library expansions is to address novel chemical space with maximal fit to pharmaceucally relevant targets which is beyond just applying property or drug- likeness filters. In this work we present a structured and highly automated process to idenfy putave biological targets starng from any chemistry-driven virtual or exisng compound library. Mulple ligand similarity searches are performed in ChEMBL ligand space, linking library compounds to targets from ChEMBL database. The results are presented to the computaonal chemist in a highly intuive and interacve manner. For a set of targets selected by a scienst, holo crystal structures are automacally retrieved and prepared for docking. The co- crystallized ligand, ChEMBL compounds and combinatorial library are then docked by an automac procedure. The scienst finally is provided with a holisc picture of library-target fit hypotheses to draw his conclusions about relevant targets, library adjustments, library re-designs and ideas for completely new virtual libraries. Keywords: Library design; Target fishing; Automated workflow Every Jack has His Jill: Finding a Target for Your Combinatorial Library One way to address the queson of target relevance is to start from known chemical maer and to apply core modificaons like changes of ring size or type, or shiſting nitrogen and funconal groups. Alternavely, one can design libraries purely chemistry- driven, based on aracve chemical scaffolds, synthesis routes or concepts like escaping from flatland, [6] giving diversity and serendipity a chance. Combined with IP space analysis both routes can yield viable libraries. We were now interested if it would be possible to find the right target or target family for a subset of our internal library designs, which were originally driven by feasible chemistry and aracve novelty. Or otherwise, if it would be possible to derive a raonale how to modify such a library design in order to tailor the respecve library to a specific target or target family. We expect that a library designed with a target family in mind possesses a higher chance to hit the relevant chemical space, especially, since there are many indicaons for existence of privileged scaffolds [7].

Transcript of Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using...

Page 1: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

iMedPub Journals www.imedpub.com

Research Article

DOI: 10.21767/2470-6973.100026

Chemical informaticsISSN 2470-6973

1© Under License of Creative Commons Attribution 3.0 License | This article is available in: http://cheminformatics.imedpub.com/

Inna Slynko1,2, Jan KF Dreher1 and Andreas H Göller1*

1 BayerPharmaAG,MedicinalChemistry-ComputationalChemistry,Wuppertal,Germany

2 GrünenthalGmbH,ComputationalChemistry,Aachen,Germany

*Corresponding author: AndreasHGöller

[email protected]

Tel: +49202365442

BayerPharmaAG,MedicinalChemistry-ComputationalChemistry,Wuppertal,Germany.

Citation: SlynkoI,DreherJKF,GöllerAH(2018)EveryJackhasHisJill:FindingaTargetforYourCombinatorialLibrary.ChemInformVol.4No.1:1.

IntroductionThe chemical library belongs to the biggest research assetsof any pharmaceutical company. Such screening libraries aretypically between one to fivemillion compounds [1].Whetherthefull libraryoronlysubsetsaretestedinHTScampaignsandhowsuchsubsetsarecomposeddependsontargetareas,assaydesignsandcompany’sstrategy.HTSandespecially in vitroandin vitro assays of individual compounds are costly in terms ofsubstanceconsumption.Therefore,alllibrariesbleedout.Insteadof resynthesizingoldcompounds, companies setupcampaignstoevolvethelibrariesintonewchemicalspacefollowingoneofthreestrategies,namely,buyingfromchemicalcatalogs,buyingreadily available proprietary compounds or designing novelproprietarychemistry.Typicaldesignconceptfornovel librariesistocreatestructurallydiversecompoundswithLipinskidrug-like[2]orlead-like[3,4]properties.

Since chemical space is almost infinitewithapproximately1060 compoundswithamolecularweightlowerthan500Da,[5]andcurrentlyonlyabout10to20millioncompoundsrelevanttodrugdiscovery are covered by commercial sources and proprietaryrepositories, thequestionarises:whichofnumerous imaginarylibrariesarerelevantandwhichnot?

Received: December21,2017; Accepted: December26,2017;Published: January01,2018

AbstractPharmaceutical companies regularly run campaigns to evolve their proprietarychemicallibrarieswhichareamongtheirmostvaluableassets.Ultimategoalwiththose libraryexpansions is toaddressnovelchemicalspacewithmaximalfit topharmaceuticallyrelevanttargetswhichisbeyondjustapplyingpropertyordrug-likenessfilters.Inthisworkwepresentastructuredandhighlyautomatedprocessto identifyputativebiological targets starting fromany chemistry-drivenvirtualorexistingcompound library.Multiple ligand similarity searchesareperformedin ChEMBL ligand space, linking library compounds to targets from ChEMBLdatabase. The results are presented to the computational chemist in a highlyintuitiveandinteractivemanner.Forasetoftargetsselectedbyascientist,holocrystalstructuresareautomaticallyretrievedandpreparedfordocking.Theco-crystallizedligand,ChEMBLcompoundsandcombinatoriallibraryarethendockedbyanautomaticprocedure.Thescientistfinallyisprovidedwithaholisticpictureoflibrary-targetfithypothesestodrawhisconclusionsaboutrelevanttargets,libraryadjustments,libraryre-designsandideasforcompletelynewvirtuallibraries.

Keywords: Librarydesign;Targetfishing;Automatedworkflow

Every Jack has His Jill: Finding a Target for Your Combinatorial Library

Onewaytoaddressthequestionoftargetrelevance is tostartfromknownchemicalmatterandtoapplycoremodificationslikechangesofringsizeortype,orshiftingnitrogenandfunctionalgroups.Alternatively,onecandesignlibrariespurelychemistry-driven,basedonattractive chemical scaffolds, synthesis routesor concepts likeescaping fromflatland, [6]givingdiversityandserendipity a chance. Combined with IP space analysis bothroutescanyieldviablelibraries.

We were now interested if it would be possible to find theright targetor target family for a subsetof our internal librarydesigns,whichwereoriginallydrivenbyfeasiblechemistryandattractivenovelty.Orotherwise,ifitwouldbepossibletoderivearationalehowtomodifysuchalibrarydesigninordertotailortherespectivelibrarytoaspecifictargetortargetfamily.Weexpectthatalibrarydesignedwithatargetfamilyinmindpossessesahigherchancetohittherelevantchemicalspace,especially,sincetherearemanyindicationsforexistenceofprivilegedscaffolds[7].

Page 2: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

2 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

We know also that computational methods, especially highthroughput methods like structure- or ligand-based virtualscreening or target-family likeness filters, are far from perfectandwillatmaximumprovidecertainenrichments.Therefore,wedecided to combine computationalmethods,whichprovideuswithahighdegreeofautomationandthroughput,withoptimumuseof expert knowledgeandguidance.Nevertheless,wehavetostressthatstartinglibraries,aswellaslibrariesdesignedwithhelpof theprocessdescribed in thisarticle,have tobestrictlynovel,which requires interventionof an expert and cannot beautomated.

Hencethequestionarises:howtofindthematchingtargetforthelibrary,oratleastforsomelibrarycompounds.Inthelastyearsmanyresearchers lookedintothistopicmostlyfromadifferentperspective,namely,how to control target selectivityof a leadcompound and avoid adverse effects, [8-10] how to identifyhidden opportunities in drug repurposing projects, [11-13] orhowtosupportthedifficultbutpromisingdesignofmultitargetdrugs[14-16].Despiteotherrationalefortargetfishingpresentedhere, additional information on potential off-target activity orselectivity of compounds from a starting library is a welcomeside-product.

Computational target prediction methods published to date[13,17] can be classified as ligand-based, network-based, side-effect-based,orprotein-structure-baseddependingonthedataused [18]. Ligand-based methods connect similarity measureswithbindingprofiles for similar compounds inorder topredictpotential targets. Network-based methods incorporate theknowledgeaboutligandandtargetinteractions,whicharethenrepresented as networks. Side-effect-based approaches utilizetheinformationaboutoff-targetactivitiesofsimilardrugs.

Potentialtargetscanalsobepredictedbyproteinstructure-basedmethodsincludingdocking,protein-ligandinteractionsorproteinbindingsitecomparisons,butthisisatediousmanualproceduresolelybasedonprofoundexpertknowledge.

Quite new is the inverse approach-to create ligand bioactivityfingerprints encoding the hit status of compounds from HTScampaigns [19,20]. In combination with conventional ligandfingerprintsthoseallowtoidentifychemicallysimilarligandsthatshouldhavesimilarbioactivityprofiles.

Ligand-based methods are fast and easy to use, but they arelimitedtosearchspacesofhighlysimilarcompounds.Toacertainextent,theyareabletoextrapolateintonewchemicalspaceviascaffoldhopping.

Docking,ontheotherhand, isdependentontheavailabilityofproteincrystalstructures.Forabouthalfofthetargetsrelevanttopharmaceuticalresearchtherearenocrystalstructuresavailable.Docking,inprinciple,canidentifynewchemicalmatter,butitischallenging with respect to protein pre-processing and ligandranking[18].

Pharmacophore methods, finally, are somewhere in between.To someextent they canextrapolateby scaffoldor substituenthopping. On the other hand, pharmacophore methods oftenprovidetheuserwithanoverwhelmingmanifoldofhypotheses

thatwithoutdetailedSARknowledgecannotbeseparated intomeaningfulandchancemodels.

Weightingtheprosandconsoftheformerconceptswedecidedfor a hybrid approach. We filter down the published - highlyincomplete and sparsely populated - pharmacological universebyfastligand-basedmethodstoamanageablesubset.Wethenprocess auser-selected subsetof the ligandhit sets related tospecific targets by docking. Our approach is as far as possibleautomated for efficient identification of potential biologicaltargets with co-crystal structures. The general process startswith multiple automated ligand-based similarity searches intheChEMBL [21]database,which contains chemical structuresof smallmoleculeswith their associated biological test resultsand targets.Consequently,groupingofhitsbasedonbiologicaltarget,extractionofstructures fromProteinDataBank [22]viathe accession codes and automated docking simulations areperformed.

The approach is novel in theway howmultiple computationalmethods are combined in an efficient process, providing thecomputational chemist with a holistic picture of potential hitsbasedontheavailableknowledge.Itisimplementedinawaytoautomatethetediousmanualwork,toprovideanexpertwiththecapabilitytointeractwithresultsandtoallowhimtoconcentrateondecision-making.

Despiteahighdegreeofautomationofthisprocess,thecrucialstepwillalwaysbethefinalone,wheretherealvalueisgeneratedbythemodelingexpert,whowillmakedecisionsbasedonvisualinspectionandhisexperienceinordertoadjustthecombinatoriallibrary to selected target(s) by adding, replacing or removingchemicalsubstituents,orexchangingascaffold.Asaresult,oneormorenoveltargetedlibrariescanbedesigned.

Byourapproachwewillloseallthosetargetsourlibrarywouldshow some activity on but where the published ligands aretoo dissimilar in 2Dmetrics. A part of those targets could be“rescued”bydirectdockingintothecompletecrystallizedtargetspace,buteventhenwewouldstillmisssometargetsduetotheshortcomingsofrigidreceptordocking.

Wedonotaimfortheidentificationofacompletetargetomeforour library,but for the identificationof targets thatfit into thepathwaysofourmedical indications.Wewill thereforenotaimforthehighest-rankedtarget,butfortheonebestfittingtoourprojectportfolio.

It is also important to understand that we do not describe aprocessofautomatedligand-andtarget-basedvirtualscreening.Instead, thesimilaritysearchesareappliedasacoarsefilter toidentifytargetsfromwhichtheexpertselectstargetsofinterest.Dockingisappliedtoconfirmtargetfitbasedonposeconsistencybetween cocrystallized ligand, ChEMBL hits and docked librarycompounds and has to be seen as a sharper filter to finallyidentifythemostappropriatetargetforourlibrary.Inthispaperwepresentaconceptandafirstimplementationoftheprocessthat can be easily adjusted to individual needs, like adding acorporate database of chemical structures and biological data,extending the range of similarity search methods, exchangingprotein preparation anddockingmethodor adding automated

Page 3: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

3© Under License of Creative Commons Attribution 3.0 License

Chemical informaticsISSN 2470-6973

pharmacophoremodeling.Thoughimplementedinacommercialsoftware solution, the described protocol can be also realizedusingothertoolsandsoftware.

Methods and Process DescriptionThe basic concept of our target-fishing approach relies on the“similarityprinciple”, [23]according towhichsimilarmoleculesexert similar biological activities. Therefore, a combinatoriallibraryinitswhole,itssubsetsorindividualcompounds,thataresimilar to known actives, should be able to point at targets ofinterest.Promisingtargets,whichwereidentifiedindirectlyusingligand similarity, are then selected for further investigation viaautomateddocking.Conceptually,thisresemblestheprocessofexperimentaltargetvalidationusingchemicalprobes.

The automated protocol constructed and executed using theworkflow software Pipeline Pilot [24] can be summarized intofour steps, namely, database preparation, similarity search,analysisanddocking,asitisshownschematicallyinFigure 1.Inafifthstep,thecomputationalchemistswillvisuallyinspectresultsanddrawinformeddecisions.

Database preparationFrom the broad range of data from the scientific literature,includingbiologicalactivitiesfordrug-likebioactivecompoundsasavailableinthepublicdatabaseChEMBL[25],informationaboutchemical structures, identifiers, assays and targets is extractedand saved into the appropriate file formats for the similaritysearchesinstep2.(ThedatainthisworkwerebasedonChEMBLversion14(releasefromJuly2012)comprisingalmost14millionexperimentalresultsforabout1.9millioncompounds,whereasthe current release 23 from May 2017 contains around 2.1millioncompounds).ThedatabasestructureofChEMBLconsistsof about 50 tables, which are mapped by primary keys andcontain information about compound, source, drugproperties,experimentaldata,target,mechanismofbinding,etc.Inordertoaccessthemostimportantentitytypesfromthedatabase,SQLquerieswereconstructedand implemented inPipelinePilot toextractthedataaboutcompounds,targets,assaysandactivities,aswellasadjustablefiltersforparameterslikeorganism,activitytype,activitythresholdandconfidencescore.

FurtherinvestigationoftheChEMBLdatabaserevealedthattherearemorethan3000differentactivitytypesmeasuredinhundredsofdifferentunits.Amongthemthetop-representedactivitytypes,whichwereusedinourstudy,arepotency,EC50,IC50,inhibition,Ki. Moreover, grouping of compounds by organisms revealed1621speciesonwhichtheyweretested.Thus,weimplementedanumberofdefaultfiltersforthemostrepresentedactivitytypes(IC50,EC50,Ki,Kd),units(M,nM,µM,mM)andorganisms(human,mouseandrat)aswellasfortheactivitythreshold(10µM).ThosefilterscanbeeasilysetviaPipelinePilotprotocolcheckboxesandvariables.

Toensureasmuchaspossiblethattargetsareassignedtocorrectassays,onlyrecordswithChEMBLconfidencescorehigherthan7 were selected. The confidence score is assigned during themanual curation process by the data extractors and reflectsassay-targetrelationships.Itrangesfrom0to9,where0meansuncurateddataand9equalstohighdegreeofconfidence.

Theapplicationofabove-mentionedfiltersreducedtheamountof ChEMBL entries from 12.3 to 3.8 million, which represents764,419unique registeredmolecules. The compounds and theinformationabout targetsandassaysweresaved intoseparatefiles.Thus,alladditional informationwas joinedtocompoundsafter the similarity search.We apply a predefined hierarchicalfilestructureforthepurposesofdocumentationandtofacilitatefurtherre-analysesand follow-upstudies.Finally, theextractedChEMBL data were converted into the appropriate structureformatsrequiredforthechosensimilaritymethodsasdescribedinthenextstep.

Similarity searchForeachcompoundofacombinatorial libraryligand-basedvirtualscreens against database compounds are performed. Multiplemethodologies are applied to make maximum use of differentsimilaritymeasures.FinalhitlistsarecombinedbyMAX-rankvotingasdescribedbyBaberetal.[26]andWhittleetal.[27].

Inthisworkweimplementedthreeapproaches,namely(i)atom-

Schematic visualization of the five workflow stepsstarting from database preparation and ending withdocking results and visual interpretation. Here PP isPipelinePilotandSQL-StructuredQueryLanguage.

Figure 1

Page 4: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

4 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

based circular fingerprints ECFP4, (ii) non-linear Feature Treedescriptor FTrees and (iii) DBTOP topomer search similarities.Eachofthemrepresentsstructuralandpharmacophoricfeaturesinadifferentandcomplementaryway.

The extended connectivity fingerprints ECFP4 describe thepresenceorabsenceofoverlappingparticularsubstructures[28].Thenumber4inthenamecorrespondstotheeffectivediameterof the largest feature, thus the largestpossible fragmenthasawidthof4bonds. TheTanimotocoefficient isusedasdistancemetricforscoring.

DBTOP fromCertara is a 3D similarity searchwheremolecularstructures are compared as sets of fragments (so-calledtopomers),whichare characterizedbyCoMFA-like steric shapeand pharmacophoric features [29]. One single rule-basedconformation is generated for each fragment and oriented byopenvalencebond,whiletherestisorientedagainusingarule-based scheme. Aligned fragments are then compared by theirfields until the minimum topomeric difference between twomoleculesisidentified.

The BioSolveIT FTrees method calculates the feature treedescriptor, which represents hydrophobic fragments andfunctionalgroupsofthemoleculeandthewaythesegroupsarelinkedtogether[30].Thedescriptorsoftwomoleculesarethencomparedtoeachother.

ECFP4 and FTrees are available as Pipeline Pilot components,whereasDBTOPwasrunfromthecommandlineusingPipelinePilot“RunonServer”component.Sinceweaimfortargetfishingandideageneration,weacceptlowoverallligandsimilaritiesandthereforelimithitlistsoftheindividualsearchesbythemaximumnumbersofhitsandnotbysimilaritythresholds.

TheimplementedPipelinePilotprotocolallowsausertoselectsimilarity searchmethods via checkboxes and to set individualparameters for similarity threshold or number of top-hits tosave.Itautomaticallycombinesresultsofsimilaritysearchesandreportshits,theirsimilarityscoresaswellastargets,activityandassaydata.

Analysis and selectionHitsaregroupedbasedonthetargetsagainstwhichtheyshowactivity.TheresultsarepresentedasPipelinePilotHTMLreportcomprisedof an interactivebar chart, representing top targetsandnumbersofhitspertarget(Figure 2a).

Therankingorderimplementedhereisdisputable,sincecurrentlytargets are sorted by number of hits identified,which yields acertainbiastowardstargetswithhighernumbersofcongenericcompounds reported. Since the rank score bears a certain riskofmissinginterestingtargetswithsmallhitclusters,theuser isabletosetathreshold forthenumberof targetsretrieved.Uptonow,foreach input librarywewereableto identifyasetofinteresting targets. Nevertheless, alternate scoring schemestakingintoaccount,forinstance,overallnumbersofcompoundstested,activityranges,andnumbersofcongenericserieswillbeevaluated.

Forconvenientoverviewthebarchartisequippedwithtooltips

andhyperlinks,showingthefulltargetnameandhitcountsforthedifferentsimilaritymeasures.Sincewearesolelyinterestedin targets with crystal structures, information about proteinstructure availability is also retrieved from RCSB Protein DataBank[22]andsummarizedinthetablenexttothebarchart,seeFigure 2a.

Furthermore,aclickonanybarofthechartexecutesaPipelinePilotsub-protocol,whichprovidesasecondHTMLreport(Figure 2b) containing table and attached structure grid view withdetailed information about the hits, e.g., chemical structure,activitydata,assayresultsorspeciesonwhichtheyweretested.Moreover,thetableareaandthegridviewarecross-linkedandpossesstooltipscontainingchemicalstructureanddetailedassayinformation. This gives the user a quick overview of a certaintargetand its compoundsaswell asassistswith further targetselection.Thedesiredtargetscanbepreselectedfordockinginthenextstepusingcheckboxes.

DockingAutomated docking of library compounds, ChEMBL hits andcocrystallized ligand into the selected targets is performed. AllavailablePDBstructuresforuser-selectedtargetsaredownloadedbytheworkflow,i.e.,oftenmultiplecrystalstructurespertarget.Forinstance,theamountofstructuresdepositedinPDBforcyclin-dependentkinase2ismorethan300.Thisposesaquestionhowtoprioritize the crystal structures fordocking inanautomatedway.Onequality criterion fora crystal structure,which canbeeasily accessed, is its resolution. On the other hand, dockingmaybestillnotsuccessful,whenitisdoneintoawrongproteinconformation. Since residues of apo-structure (without boundligand)mayoccupypartsof thebindingpocket,wedecided tolimitourdockingtoholo-structures(ligand-bound).Furthermore,the presence of a ligand simplifies automated grid generation.Thus,topNholo-structureswiththebestresolutionareselectedfor each target,whereN is a number specifiedby the user. Incaseofmultiplechains,alwayschainAissavedforeachstructurein order to simplify structural alignment. Alternative selectionschemes could include target selection by ligand similarity orpocketshapediversity.

Ligand preparation was done in two steps. First, protonationstatesatpH7.4forco-crystallizedligand,ChEMBLhitsandlibrarycompoundswerecalculatedusingthepKamoduleco-developedbyBayerandSimulationPlus [31]and implementedasPipelinePilotcomponent“ADMETpredictor”[32],whileringconformers,tautomersandstereoisomersweregeneratedusingSchrödingerLigPreputility,release9.8.

An automatic docking procedure was applied using theSchrödinger script XGlide.py (version 3.7; v45017). The scriptperforms automatic protein alignment and preparation, gridgeneration, re-docking of crystal structure ligands as well asdocking of other compounds (here, library compounds andChEMBL hits). For each selected target a separate directoryis created containing subdirectories for crystal structures,prepared ligands and docking results. The script is executedfromthecommandlineusingPipelinePilotcomponent“RunonServer”.Thefollowingdockingparameterswereapplied:protein

Page 5: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

5© Under License of Creative Commons Attribution 3.0 License

Chemical informaticsISSN 2470-6973

alignment, preparation and grid generation were turned on;ligandpreparationwassettofalse,Glidestandardprecision(SP)wasselectedasthescoringfunction.Theresultsforeachtargetweresavedasposeviewerfiles,whichattheendarecopiedintoonefolderfortheanalysis.

InspectionThefinalstep intheprocess isby intentionnotautomatic,andprobablycanneverbe.Thecomputationalchemistloadsdockingposesfortargetsofinterestforvisualizationandanalysis.Inthefirststepheinspectsthequalityofre-dockingofco-crystallized

ligands and identifies commonalities and differences in thebindingmodestoindividualcrystalstructuresofeachtarget.Inthesecondstep,heinspectsdockingofChEMBLhitstoverifytheinteractionhotspots.Third,heanalyzesthe librarycompoundswith good and bad docking scores and judges the plausibilityof the bindingmodes obtained. Finally, hewill either considerbiological testing of library compounds on targets of interest,or modifying the library proposals in order to optimize theirinteractions to a certain target, or generation of a completelynewlibraryproposal.

b)

TheexampleofPipelinePilotprotocolresults:a)HTMLreportwithcross-linkedbarchartand20top-rankedtargetsderivedfromthecombinationofthreesimilaritysearchmethods(DBTOP,ECFP4andFTrees)usingthedesignedoxadiazoleslibraryasastartingpoint,seeResultsSectionformoredetails;b)exampleofanHTMLreportwithcross-linkedtableandgridviewofthehitsforonespecifictarget,hereforHTH-typetranscriptionalregulatorEthR.

Figure 2

Page 6: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

6 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

ResultsValidation of similarity search processOfthethreepurelyautomatedtechnicalsteps,namelydatabasepreparation, similarity search and docking including gridpreparation, themost critical one for the overall performanceis the identification of targets via the similarity searches. WethereforepreformedaretrospectivestudyinChEMBLtotestforthe performance of finding targets via searches with librariesknowntobeactiveonthosetargets.

For this, we extracted ChEMBL data for compounds tested onall species with reported IC50, EC50, Ki, Kd and activity units ofnM or µM. No activity threshold filter was set. The appliedfiltersreducedthenumberofChEMBLentriesto851,915whichconstitute327,520uniquemoleculesand17568differentDOC_IDs [Fussnote einfügen: DOC_ID, TARGET_ID,MOLECULE_ID allhavethesameidentifiernameCHEMBL_IDindifferenttablesoftheChEMLdatabase]. From those,633 setsbasedon identicalDOC_IDwerederivedcontainingbetween100and150moleculeseach,representingourchemicallibraries.Thisisjustifiedbythefactthatcompoundsfromonepublicationnormallymoreorlessrepresents a congeneric series. The 633 sets are connected to264 different TARGET_IDs.We finally selected 22 DOC_ID setswhichsharetheirTARGET IDwith5to7otherdocuments (thedistribution runs between 1 and 13 different documents perTARGET_ID).

Thissetupallowsustoperform-usingthecompoundsfromonedocument-a“library-based”similaritysearch.Bythosesimilaritysearchesweshouldthenbeabletore-findthetargetthesearchlibraryisknowntobeactiveon,onlybasedonsimilarityofthelibrarycompoundstothecompoundsinotherdocumentsonthetarget.

Detailed results are provided in Table S1 in ESI. The mediannumbersofdocumentsidentifiedare45forthecombinedsearchand10,43,and7forECFP-4,DBTOPandFtrees,respectively.1

We are thus always able to identify the targets of the librarycompoundseventhoughthemediansimilaritiestotheChEMBLcompoundsareasexpectedquitelowwith0.33forECFP-4,139forDBTOPand0.88 for Ftrees.With twoexceptionsall targetswere identifiedby all threemethods. Tyrosine-protein_kinase_SYK (CHEMBL2599) was not found by ECFP-4 and Ftrees andCytochrome_P450_2D6(CHEMBL289)byECFP-4.

Thus, we are consistently able to identify the target we werelookingfor,butnotalwaysatrank1.Nevertheless,meanranksofthetesttargetsare3.5forECFP-4,8.7forDBTOP,5.8forFtreesand1.4 for theconsensus rank,whichalways ranks the searchtargetrank1or2.

Thehit rate andespecially the rankingof the search targets isevenbetterthantheexpectedoutcome,i.e.,thatthesimilarity

1Duringthestep-wisepreparationofthelibrarysetsonlyrepresentativesubsets were kept via first occurance filters. This resulted in datareductionandthereforethefinalnumbersofDOC-IDspertargetwerealwayslowerthanthenumbersintheunfiltereddataset.Theseresultsinhighernumbersofdocumentsretrieved.

searches are performed to identify a shortlist of targets forselectionbytheexpert,nottoidentifytherank1targets.

Process application examplesTheprocessdescribedinMethodsandProcessDescriptionwasdeveloped to identify potential targets for existing chemistry-drivencombinatoriallibraryproposalsandtomodifytheproposalsinawaythattheycandirectlycontributetoearlyprojectsatBayerPharmaceuticalsGlobalDrugDiscovery.Theprocessisappliedtoin-house libraries that areproprietaryand cannotbedisclosedhere.Therefore,wehadtodesignaproofofconceptcasestudyforthispublication.Thedownsideofthisapproachisthatwearenotabletopresentexperimentaldataforourprospectivelibraryproposals(thestartinglibraryorthederivativesforthetargetswehit).AsastartingpointwechoseapublicationfromtheJournalof Medicinal Chemistry from 2012 which describes structure-baseddrugdesignforaseriesofpotent1,2,4-oxadiazoles,whichtargetM.tuberculosistranscriptionalrepressorEthR(seeFigure 3aforexamples)[33].Wedesignedacombinatoriallibrary,thatissimilarbutdistincttothepublishedcompoundsfromChEMBL,with theaim todemonstrate that thedevelopedmethodologyisable(i)torecoverthecompoundsfromthepublicationandtoshowthatEthRproteincanbeidentifiedamongthetoptargets,(ii)toidentifypotentialnewtargetsforourexamplelibrary,(iii)toprovideexamplesoftarget-fishing-basedlibrarymodificationsand(iv)toprovideexamplesoftheshort-comingsofsuchafullyautomatic approach and to highlight the importance of expertinteraction.

In particular, we introduced three changes to our library withrespecttothelibraryfromthepublication.First,wemodifiedthepiperidineringtoacyclo-hexyl,i.e.,shiftedthenitrogenbyoneposition.Second,wereplacedthealiphaticlipophilicsidechainbyvariousR2groupsofdifferentsize,polarityandchargestate,connected via nitrogen or amide bonds. Third, we introducedalternative lipophilic R1 groups at the only point of variationfromthepublishedlibrary.CoredefinitionsandexamplesforthepublicationandthelibrarycompoundsareshowninFigures 3a and 3b,respectively.

Exampleofdatabasepreparation: Step1of theworkflow is tosearchforsimilarcompoundsandtheirassociatedtargetsusingthedesignedlibraryasareference.Fornow,ChEMBLdatawereextractedforcompoundstestedonallspecieswithreportedIC50,EC50,Ki,KdandactivityunitsofnMorµM.Noactivitythresholdfilterwasset.TheappliedfiltersreducedthenumberofChEMBLentriesto851,915whichconstitute327,520uniquemolecules.

Example of similarity search: In step 2, similarity searches areperformed.Weusedall threecurrently implementedmethods,namelyDBTOP,ECFP4andFTreesasdescribedinMethods.400highestrankhitsweresavedforeachmetric,andadditionallyaconsensus rankwas calculated. Thediagram inFigure 2a givesthelistofthetop20targets,associatedwiththeresultsofthesimilaritysearches,asinteractivebarchart.Table S1ofSupportingInformation provides more detailed information about targetranks according to the three similarity search methods andconsensus rank; additionally, it lists the numbers of identifiedhitsandPDBstructuresforeachtarget.Thetargets inTable S1

Page 7: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

7© Under License of Creative Commons Attribution 3.0 License

Chemical informaticsISSN 2470-6973

aresortedbydescendingnumberofligandsidentifiedbyECFP4similaritysearch.Asmentionedearlier,theimplementedrankingbynumberofhitspertargetmaybebiasedtowardsthetargetswithlargecongenericseries.

Exampleofanalysis:Step3isthefirstoftwoexpertinterventionsteps. Target selection could be done automatically based ontheir ranks, butmanual selectionwill allow to concentrate ontargetsrelevantinthecontextofacompany’sresearchportfolio.

The transcriptional repressor EthR was ranked number 7 bythe consensus score, which combines the results of the threesimilaritysearchmethods.ThescoringaccordingtoECFP4methodrankedEthRonpositionthree.ECFP4wasabletoidentifyall33

compounds fromthepublication,whereasFTrees foundonly2and DBTOP none, underlining the necessity to apply multipleligand-based search methods to obtain the complete picture.DBTOP is based on steric and pharmacophoric fields of thewholemoleculeandthereforeismoresusceptibletolargersizedifferencesbetweenqueryanddatabasemoleculesthanFTrees,which abstracts themolecular fragments into pharmacophoricrepresentations, or ECFP4 circular fingerprints, where the hitsaredominatedbyoccurrencesoffragmentfeatures.Dependingonlibrary,contributionsofdifferentmethodswilldiffer.Somein-house library screens, for instance,weredominatedbyDBTOPhits.Itisapriorinotobviouswhichsimilaritymetricwilldominateintheconsensushitlist.

Schematicrepresentationofcoresandexamplecompoundsfor:a)M.tuberculosistranscriptionalrepressorEthR inhibitorseriesfromthepublicationofFlipoetal.[28];b)combinatorialoxadiazolelibrary.

Figure 3

Page 8: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

8 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

Figure 4showstheexamplehitsidentifiedbydifferentsimilaritysearchmethodstounderlinethisassumption.Itisworthtonotethatsimilarityscores(see Table S2ofSupportingInformation)asexpectedarequitelow,pointingoutthatchemicalmodificationofthelibrarycompoundsguidedbythefinaldockingstepmightbeneeded.

Asexpected,thenumbersofhitsforthedifferentsearchmethodsdiffer.Butinaddition,alsothenumbersofPDBstructuresretrieveddiffer.Forinstance,33PDBstructuresof11-beta-hydroxysteroiddehydrogenase 1 are found using only ECFP4 (see Table S1),whereas thecombinationof threesimilaritymethodsretrieved38PDBentries.ThereasonforthisliesinthefactthatallECFP4hitsareannotatedwithUniProt[34] identifierP28845(human)whereas thecombinationofECFP4andFTrees resulted inhits,whichweretestedonhumanandmouse11-beta-hydroxysteroiddehydrogenase 1 (UniProt identifiers P28845 and P50172,respectively). While the human sequence shares 79% identitytothemouseorthologue,there ishigh levelofconservationofamino acids in the binding site. All ECFP4 hits share the sameoxadiazole motif while FTrees identified two additional motifs(Figure 5).Again,itisstronglyemphasizedthatitisadvantageoustoemploymultipleligandsimilaritymetrics.

Our proof-of-concept target EthR is rank seven by consensusscoreandrankthreebyECFP4similaritysearch.Inthefollowingwewill analyze the two top-ranked targets inmoredetail (seealsoTable S1),togetherwithourtargetofinterest,EthR(whichwould resemble the real-life situationwithsometargets in thelistnotbeingrelevantforthecurrentportfolio.

Example of docking: For step four we selected the twotop-ranked targets for docking, namely, top-ranked targetmetabotropic glutamate receptor 5, second-ranked receptorsmoothened homolog, and our proof-of-concept target HTH-type transcriptional regulator EthR which is ranked seventh.The docking of our library compounds, ChEMBL hits and co-crystallized ligands was performed using the fully automatedXGlide procedure as described in Methods. A maximum of 2crystal structures per target were retrieved automatically. WehadtoextendthesetbyonemorestructureinthecaseofEthR,asdescribedinthefollowing.

Ourdecisionobjectivefortargetfit iscorrectre-dockingoftheco-crystallizedligand,consistentdockingoftheChEMBLhitsand

finally consistentdockingof the librarycompoundsor similarlydecoratedsubsetsthereof.Weprovidedockingscoresasameansoffurtherconfirmationofconsistentplacement,butnotasafilterordesigncriterionperse.Highdockingscoresareastronghintfor important interactions to the targetmatched,whereas lowscoresarenotalwayscorrelatedtoweakbindinginteractions.

Helix-Turn-Helix-type (HTH-type) transcriptional regulator EthRCurrentlythereare23proteinstructureentriesinRSCBproteindata bank based on UniProt ID accession code P9WMC1(Mycobacterium tuberculosis).Sincethenumberofstructuresfordockingisactuallyacompromisebetweenexpectedinformationgain and effort, two structures for dockingwere automaticallyselectedfromthe17holo-structuresavailable,basedoncrystalstructureresolution.Bydefault,weprocesstwodifferentcrystalstructures sincemodelling experience tells that usingmultipletarget structures for rigid docking reduces the risk of missingimportant target information. We later added one additionalstructure, namely 3O8H, due to its different pocket shape andligand-bindingmode.

ThehitsfoundbyECFP4arebothagonistsandantagonistswithbestEC50of60nMand IC50of400nM,respectively, i.e.,highlyactivecompounds.

G1M:Anexamplewherelibraryfitswellintothetarget:Dockingof the library compounds into the first crystal structure 3G1Mwith a resolution of 1.7 Å yields in high docking scores andposescomparable to theco-crystallized ligand (IC50of500nM,retrieved from PDB Bind [35]). An additional hydrogen bondtoAsn176 canbeobservedbetweenEthR and someof librarycompounds containing tertiary amineor amide linker attachedto the oxadiazole-cyclohexane core (an example can be seenin Figure 6). In contrast, the co-crystallized ligand, which hasan oxadiazole-piperidine scaffold, is missing a hydrogen bonddonor at this position. Moreover, the analysis of the bindingpocket around the ligand can provide further suggestions forcompoundmodifications,e.g.,forextendedinteractionsintothehydrophobicpocketformedbyMet102,Val152,Leu90.

3Q0W: differences in protein conformation and incompletebinding site setup: In contrast, docking into the second EthRstructure (co-crystallized ligand has Ki of 400nM [35]) led to

AnexampleoftwoChEMBLhitsobtainedbysimilaritysearchforthedesignedoxadiazolelibraryusingdifferentsimilaritymethods-compound7(inhibitorofanandamideaminohydrolase)wasidentifiedbyDBTOPsimilarityandcompound8(inhibitorofcytochromeP450)byFTreesandECFP4.

Figure 4

Page 9: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

9© Under License of Creative Commons Attribution 3.0 License

Chemical informaticsISSN 2470-6973

low-scoredposesforour librarycompounds. It turnedoutthata cocrystallized glycerolmolecule, that had not been removedbytheautomatedproteinpreparation,wassituateddeepinthebindingsite,establishinghydrogenbondtoAsn176andblockingligandentry.

After its removal, docking of all compounds was possible.Nevertheless, the poses are still quite inconsistent. The amidemoietyforabouthalfoftheposesislocateddeepinthepocketandmakeshydrogenbondstoAsn176andAsn179,analogouslytothe3G1MdockingsshowninFigure 6,andfortheotherhalfit points out of the pocket. Such differences can be explainedby conformational flexibility of theprotein,which canbe seen

in comparison of the two EthR crystal structures (PDB codes3G1Mand3Q0W,thesuperimpositionisshowninFigure S1,seeSupporting Information).Slightbutpronounceddifferencescanbeobservedat the loop region (residuesAsn93-Asp98),wheretheflipofPro94isaccompaniedbynarrowingtheentrychannel,which sterically hinders theplacementof substituents towardsthisloopin3G1M.

3O8H: Alternate binding mode: Our library was intentionallydesignedtobechemicallysimilartotheEthRinhibitorBDM41906[33](PDBID:3SFI).3G1Mand3SFIhavethesameoverallshape,the library compounds dock consistently into both pockets(resultsarenotshown).

ExemplaryECFP4hit9forhumantarget11-beta-hydroxysteroiddehydrogenase1andstructurallydifferentFTreeshits10and11formouseprotein.

Figure 5

HTH-typetranscriptionalregulatorEthR(3G1M,lightgreyrepresentation)withco-crystallizedligand(cyan)anddockingsolutionforonelibrarycompound(magenta,glideSPscore=-12.40).

Figure 6

Page 10: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

10 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

Nevertheless, closer inspection of EthR structures revealed asecondsetofcrystalstructureswithconsiderablylargerbindingpocket.SuchpocketenlargementismainlycausedbytheflipofsidechainsofThr121,Gln125,Trp138andPhe184(seeTable S3 forcomparisonofavailableEthRcrystalstructures).

Figure 7a shows thealignmentof 3G1Mand3O8Halongwithinteraction volumes generated by SiteMap [36]. As expected,cross-docking of the 3O8H ligand (IC50=580 nM) into the rigid3G1M receptor, without taking into account any induced fiteffect, yields a completely different and wrong bindingmode,where the aromatic sulfonamide is pointing out of the pocket(see Figure 7b).

About two thirds of the library members dock consistently toBDM41906. About one third, due to the pronounced pocketdifferences,dockinconsistently.Librarymembersfrombothsetsignoretheadditionalcavityavailablein3O8H.

Insummary,wewereinfactabletoidentifyEthRasapotentiallyinterestingtargetbasedonligandsimilarityanddockingresults

a)

ab)

a)Alignmentofcrystalstructures3G1M(greenresidues)and3O8H(cyanresidues),proteinribbonsaredepictedinlightgrey.Aminoacidswithdifferent side chainorientations responsible for change inpocket shapeare shownas sticks. SiteMap [31]generatedsurfacesareshowninbluemeshfor3G1Mandmagentameshfor3O8H;b)Overlayofthecrystallized3G1Mligand(green),thecrystallized_3O8Hligand(cyan)andthedockingposeofthe3O8Hligandin3G1M(magenta).

Figure 7

forourdesignedoxadiazolelibrary.Wehavealsodemonstrated,thatfurtheroptimizationstrategylargelydependsonthechoiceofEthRcrystalstructure,sincethepocketresiduesarethesubjectof conformational changes. Based on the docking results frombothpocketshapes,wegainedworthwhileadditionalinformationaboutflexibleandrigidsubpocketsandkeyinteractionfeatures.If it were for our library extension campaign, we would now,basedonthetargetinformation,slightlyoptimizethedecorationoftheinitiallibraryandadditionallydesignasecondlibrarythattargetsthedeepcavityavailablein3O8H.Wewouldcross-checkthedesignforIPspaceandifnecessaryiterativelyadjusttocreatenovelty.

Metabotropic glutamate receptor 5ThehighestrankedtargetaccordingtoECFP4,themetabotropicglutamate receptor 5, is a class C G-protein-coupled receptorresponding to the neurotransmitter glutamate. There is onlyone holo structure (PDB ID 4OO9) identified in PDB for thetransmembrane ligand-binding domain, since earlier structural

Page 11: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

11© Under License of Creative Commons Attribution 3.0 License

Chemical informaticsISSN 2470-6973

studieshadbeenrestricted to theamino-terminalextracellulardomain, providing little understanding of the membrane-spanningsignal transductiondomain.4OO9 isco-crystallized incomplexwiththenegativeallostericmodulator,mavoglurant.

The similarity searches for the library compounds identifiedin total 160 agonists and antagonists of the metabotropicglutamatereceptor5usingconsensusscoring,withbestaffinityvalues of EC50=5 nM, IC50=130 nM, and Ki=150 nM. The ECFP4method ranked this target at the top position, while FTreesrankeditatthepositionthreewith149and25inhibitorsbeingidentified, respectively.Therewerenometabotropicglutamatereceptor5inhibitorsamongtop25targetsidentifiedbyDBTOPmethod.Thehits representdifferentstructuralclusterssuchas

Representativehitsformetabotropicglutamatereceptor5.Figure 8

Example docking solution of library compound (magenta ligand) usingmanual grid set up (glideSP=-10.33) intometabotropicglutamatereceptor5structure4OO9(proteinisshownaslightgreycartoon).Thecrystalstructureligandisshownascyansticks.

Figure 9

piperidine-amides,piperidine-sulfonamidesandspiro-hexyl-4,5-dihydrooxazoles(seeFigure 8forexamples).

4OO9: Failure of the automatic procedure: All steps of theautomaticworkflowtechnicallyproceededwellandcompoundsweresuccessfullydocked.However,a closer lookat thecrystalstructure 4OO9 revealed that during automated proteinpreparation and docking, the docking grid was positionedaroundaco-crystallizedsmallorganicmoleculecomingfromtheexperimentalconditions,namelyoleicacid,andnotaroundtheallosteric modulator mavoglurant [37]. Thus, the docking wasperformed into thewrongpocket (seeFigure S2 of SupportingInformation).

Page 12: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

12 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

Theexampleof 4OO9 shows that the fully automateddockingprocedure has its drawbacks. Protein preparation and dockingsetup require user inspection and in certain cases manualcorrection.Theeffortneverthelessisacceptable,sincetheexpertshould be knowledgeable of the target in order to understandandtojudgetheobservedligandinteractions.Ontheotherhand,onecouldimplementamechanismtoretrieveinformationaboutthe actual ligand and its binding mode, and use it during theproteinpreparationstep.

Dockingintothemanuallypreparedbindingsiterevealsthatourlibrarycompoundsexhibitnumerousinteractionstothereceptorsimilartothecrystalstructureligand,e.g.,hydrogenbondstoAsn-747andSer-809,andextendtheir interactionsdeeper intothepocketlinedoutbyArg-648andVal-740(seeFigure 9),whichcanbefurtheranalyzedtoguidepossiblecompoundmodifications.

Smoothened homologTheSmoothened(SMO)receptorisakeysignaltransducerintheHedgehog (Hh) signalling pathway. SMO is classified as a classF (frizzled) G-protein-coupled receptor (GPCR). It contains theconserved seven-transmembrane helical fold common to theclass A GPCRs and an unusually complex arrangement of longextracellularloopsstabilizedbyfourdisulphidebonds.

Thesimilaritysearchforthelibrarycompoundsidentifiedoverall111SMO inhibitorswithabest IC50of16nM,usingconsensusscoring. 103 hits arose from ECFP4 search and 20 hits fromFTrees. All hits are piperidine-amides or piperidine-ureas, butagainFTreeswasabletodetectmorediversecompounds.

4JKV:Optimizingthelibraryforthetarget:The2.5Åresolutioncrystal structure of the human SMO receptor contains the

transmembranedomaintogetherwithantagonistLY294068015(seeFigures 10 and 11),whichbindstheextracellularendoftheseven-transmembrane-helixbundleviaextensivecontactstotheloops.RedockingreproducedthebindingmodewithanRMSDof0.52Å.

DockingintoSmoothenedhomologresultedinmanywell-scoredsolutions for the ChEMBL hits and library compounds withconsistentdockingposes(seeFigure 11 forexamples).

Insteadoftheconservedhydrogenbondbetweenthecarbonylgroupof ligandandAsn219,which isobservedformostof theChEMBLinhibitorsandLY2940680,the librarycompoundsformoneortwo(e.g.,ligandswithpositivelychargedaliphaticringlikecompound16,seeFigures 10 and 11)additionalhydrogenbondswiththebackbonecarbonylofTyr394.

Onthedownside,mostofthelibrarycompoundsdonotpi-stacktoPhe484.Exceptionsare,forinstance,sulfonamideslike17(seeFigure 11). One of oxadiazole nitrogens of library compoundsusuallyparticipatesinhydrogenbondingtoArg400analogoustothephtalazinenitrogens in thecrystal structure [38],butnoneof the librarycompounds isable tofill thehydrophobicpocketoccupiedbythephtalazinecore.

The observations about key interactions of LY2940680 15 andthe ChEMBL compounds provide us ideas for possible librarymodificationsinordertotargetSMObindingpocketoptimally.

Figure 11bshowsanexampleofhybridcompound18whichhasthe oxadiazole replaced by phtalazine core while keeping thelarger p-cyanophenyl and the positively charged pyrrolidino-amide. The Glide SP docking score for 18 (-12.39) is virtuallyidentical to 15 (-12.66). Alternate hetero-bicycle replacements

SMOinhibitorcrystalstructure ligandLY294068015(PDBcode:4JKV), tworepresentative librarycompounds16and17andthemodifiedhybridcompound18.

Figure 10

Page 13: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

13© Under License of Creative Commons Attribution 3.0 License

Chemical informaticsISSN 2470-6973

a)

a)

Dockingintothesmoothened(SMO)receptor(4JKV:proteinisshownaslightgreycartoon).a)redockedLY294068015(cyansticks)anddockingposesoftwolibrarycompounds16and17(greenandmagenta);b)LY294068015(cyan)andthedockingposeofmodifiedhybrid18(greensticks),thephtalazinecoresofbothcompoundsarewell-aligned.

Figure 11

Page 14: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

14 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

alsofitwell.Aromaticheadgroupslinkedviasulfonamideoramidelikeincompound17(glideSP=-11.47),ontheotherhand,areabletoparticipateinpi-stackingwithPhe484.Overall,fromthepointofviewoftarget interactions,there isabunchofoptions,withsimilarbutalsobetterLipinskipropertiesthanLY2940680havingamolecularweightof512DaandanAlogPof4.56.

ConclusionWhen planning for the extension of a compound library, oneis confronted with a universe of synthesis options. The onlylimitations, thus, are one’s own creativity, lab and budgetresourcesinordertotransformtheideasintochemicallibraries.Therefore, a rational concept to explore the options and pickthelibrarieswithacertainprobabilitytohitbiologicaltargetsisdesirable.Expertknowledgecanguidetheplanningprocedure.However, in such case the library design can be limited to theperson’s experience around the projects he has worked on.Metricslikeligandefficiencyallowtostayinanattractivepropertyprofile range but do not assist the selection of compoundsamenabletotargetfamiliesofinterest.Althoughdrug-likenessortargetclass-likenessscorestakeintoaccountoverallsimilaritytoknowndrugsoractives, substructureorglobalpharmacophorefeaturesarequiteroughestimatesoftargetfamilyfit.

Inthispaperwethereforedescribetheproductiveimplementationofaconceptaiming,ononehand,toidentifyputativetargetsfora chemistry-driven libraryproposal and, on theother hand, toidentifyoptions forcompoundmodifications inorder tocreatenewlibrariesbetterfittingtocertaintargets.Thisarticlereportsonadesignedvirtualcombinatoriallibraryandhitsidentifiedbytheworkflow,aswellasonlibrarymodificationideaswithoutthedesirableproofofsynthesisandexperimentaltesting.

The real in-house examples cannot be disclosed here, andthe examples shownwill not trigger any synthesis and testingat Bayer. Currently we are still not able to provide significantstatisticsaboutsuccessratesofthedescribedworkflowduetoitsnoveltyandthelongturn-aroundtimesfortheprocessoflibrarydesign,out-sourcedchemicalrealization,registrationandtesting.

Our workflow aims to automate all tedious time-consumingtechnicalstepsandallowtoconcentrateonrationaldesign.Wealways start with a chemistry-driven library carefully checkedfornoveltyandendupwithaproposalthatagainischeckedfornoveltyasapartofthedesignprocedure.

Theworkflowisdivided intoasetofprotocols implemented inPipeline Pilot that control the crucial steps, run automaticallyand require minimum user interaction which is productivelyused at Bayer Drug Discovery. The implementation describedin this paper compares a virtual compound library to thechemicalspacerepresentedinChEMBLbymultipleligand-basedsimilarity metrics, retrieves ligand and target information andpresentstheresults inan intuitiverepresentationtoanexpert,who then decides whether to proceed with ligand design fortargets of interest. The protocol automatically retrieves PDBstructuresandsetsupdockingrunsforthecocrystallizedligand,theChEMBLcompoundsand librarystructures.Themosttime-consuming step is, by design, the final one, i.e., the visual

inspectionbyacomputationalchemist,whocanfurthertriggerlibrarymodificationsorre-design.Thesystemiseasytouseandit ishighlyproductive.Theworkflow ismodularandcaneasilybe extended to alternate database sources, similarity metrics,hit prioritization algorithms, or docking protocols. Due to itsaccessibility, the ChEMBL databasewas chosen as a source ofbiological andchemical information.However, incorporationofalternatedatasourcesisobvious.

Asexpected,thefirstpartoftheprocess,whichisstrictlyligand-based, ishighly reliableand fast.Theuseofmultiple similaritymetrics is advantageous since various approaches representchemical similarity differently, and the consensus-basedassessmentservesasagoodbasisformoredetailedanalysisofhitsand their corresponding targetsand for inspiringcreativityin library design. The platform is open for the incorporationof alternate methods, e.g., shape-based screening orpharmacophorefingerprints. The current target ranking,whichisbasedonthenumberofhitsidentified,isnotoptimal,sinceitfavorslargecongenericseries.Thus,furthermodificationstotheprotocolsareongoingwork.

It stands to reasonthatcommonchallengesofstructure-baseddrugdesignareespeciallyrelevantforanautomatedprocedure,and special attention should be given to it. One of issuesconcerns the assignment of ligand protonation states, whichneverthelesscanbereliablyestimatedbymodernpKapredictorsoftware. Correct stereochemistry is more problematic. TherearecaseswherestereochemistryofaPDBligandisambiguous;stereochemistryofChEMBL compounds isnot alwaysexplicitlydefined,andlibrarycompoundsmayexistasracematesorwithunknown configuration depending on a synthesis route. Thebest compromise here is to enumerate relevant stereoisomersandtoletthebindingpocketdecide.Finally,wearefacedwithincorrect bond orders in a PDB ligand and unknown tautomerformsofChEMBLorlibrarycompounds.Tautomerismisanissuestilllackingasoundsolution.Itisdealtbyrule-basedgenerationanddockingofsetsoftautomers.

Another issue concerns the protein preparation step. As wehaveshowninthispaper,automaticpreparationwillsometimesdetectawrongbindingsiteornotremovesmallco-crystallizedsubstances. In the current XGlide implementation, all crystalwatersareremoved.Therefore,afractionofautomaticallycreatedresultshas tobediscardedandmanually re-processed. Even ifthere isstill roomfor improvement,weconsider thatcurrentlyXGlide is one of the best solutions for automatic preparation,proteinalignmentanddocking.

Thethirdstepofdockingandscoringalsohasitslimitationswhicharewelldescribedintheliterature.Aconsistentbindingmodeoflibrarycompoundsisanecessarybutnotsufficientconditionforacompoundbindingtoatarget,especiallysincedockingscoresare oftenmisleading. These issues togetherwith limitations ofprevioussteps,e.g.,completeremovalofwatersthatsometimesprovideimportantcontactstoatarget,arethemainpitfallsoftheautomaticprocedure.Thus,closevisualinspectionbyanexpert,whoisknowledgeableofatarget,willfinallyallowtojudgetherelevanceofresults.

Page 15: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

15© Under License of Creative Commons Attribution 3.0 License

Chemical informaticsISSN 2470-6973

Onecouldarguewhetheritisworthtouseanautomaticprocesswithitsnumerouslimitations.Inouropinion,theadvantagesbyfaroutweightherisks.Theprocessallowsanexperttosetupaqueryveryeasilyandtoputhistimeonanalysisandre-designof the library. Walking through a full process takes about tento thirtyminutes for a set-up, about 4-8 h for data extractionandsimilaritysearch,andabout2to3hfordockingpercrystal

structureifparallelized(dependingontheamountofcompoundstobedocked).

A final word to the expected outputWith all known algorithmic and data quality limitations a finallibrary and its assignment to a targetwill always be “only” aneducatedguessforalibrarywithsignificantlyenhancedchancetohitatarget.Nevertheless,wefeelthatitisworththeeffort.

References1 Schamberger J, GrimmM, Steinmeyer A, Hillisch A (2011) Bigger

Data,CollaborativeToolsandtheFutureofPredictiveDrugDiscovery.DrugDiscovToday16:636-641.

2 LipinskiCA,LombardoF,DominyBW,FeeneyPJ(2001)Experimentaland computational approaches to estimate solubility andpermeabilityindrugdiscoveryanddevelopmentsettings.AdvDrugDelivRev46:3-26.

3 HannMM(2011)Molecularobesity,potencyandotheraddictionsindrugdiscovery.MedChemComm2:349-355.

4 LobellM,HendrixM,HinzenB,KeldenichJ,MeierH,etal.(2006) InsilicoADMETtrafficlightsasatoolfortheprioritizationofHTShits.ChemMedChem1:1229-1236.

5 BohacekRS,McMartinC,GuidaWC(1996)Theartandpracticeofstructure-based drug design: A molecular modeling perspective.MedResRev16:3-50.

6 Lovering F, Bikker J, Humblet C (2009) Escape from Flatland:IncreasingSaturationasanApproachtoImprovingClinicalSuccess.JMedChem52:6752-6756.

7 WelschME,SnyderSA,StockwellBR(2010)Privilegedscaffoldsforlibrarydesignanddrugdiscovery.CurrOpinChemBiol14:347-361.

8 Khanna K (2012) Drug discovery in pharmaceutical industry:productivity challenges and trends. Drug Discov Today 17: 1088-1102.

9 AzzaouiK,HamonJ,FallerB,WhitebreadS,JacobyE,etal. (2007)Modeling promiscuity based on in vitro safety pharmacologyprofilingdata.ChemMedChem2:874-880.

10 Huggins DJ, Sherman W, Tidor B (2012) Rational Approaches toImprovingSelectivityinDrugDesign.JMedChem55:1424-1444.

11 Ashburn TT, Thor KB (2004) Drug repositioning: identifying anddevelopingnewusesforexistingdrugs.NatRevDrugDiscov3:673-683.

12 Liu Z, Fang H, Reagan K, Xu X, Mendrick DL (2013) In silico drugrepositioning:whatweneedtoknow.DrugDiscovToday18:110-115.

13 Ekins S,Williams AJ, KrasowskiMD, Freundlich JS (2011) In silicorepositioning of approved drugs for rare and neglected diseases.DrugDiscovToday16:298-310.

14 RothBL,ShefflerDJ,KroezeWK(2004)Magicshotgunsversusmagicbullets: selectively non-selective drugs for mood disorders andschizophrenia.NatRevDrugDiscov3:353-359.

15 Medina-Franco JL, Giulianotti MA, Welmaker GS, Houghten RA(2013)Shiftingfromthesingletothemultitargetparadigmindrugdiscovery.DrugDiscovToday18:495-501.

16 Bottegoni G, Favia AD, RecanatiniM, Cavalli A (2012) The role offragment-basedandcomputationalmethods inpolypharmacology.DrugDiscovToday17:23-34.

17 Jenwitheesuk E, Horst JA, Rivas KL, Van Voorhis WC, SamudralaR (2008) Novel paradigms for drug discovery: computationalmultitargetscreening.TrendsPharmacolSci29:62-71.

18 SchomburgKT,BietzS,BriemH,HenzlerAM,UrbaczekS,etal.(2014)Facingthechallengesofstructure-basedtargetpredictionbyinversevirtualscreening.JChemInfModel54:1676-1686.

19 Petrone PM, Simms B, Nigsch F, Lounkine E, Kutchukian P, et al.(2012) Rethinking molecular similarity: comparing compounds onthebasisofbiologicalactivity.ACSChemBiol7:1399-1409.

20 RinikerS,WangY,JenkinsJL,LandrumGA(2014) Usinginformationfromhistoricalhigh-throughputscreenstopredictactivecompounds.JChemInfModel54:1880-1891.

21 ChEMBL (2016) Available from: https://www.ebi.ac.uk/chembl/(Accessedon:January05,2018).

22 Protein Data Bank (2016) A Structural View of Biology. Availablefrom: http://www.rcsb.org/pdb/home/home.do (Accessed on:January05,2018).

23 Johnson M, Maggiora GM (1990) Concepts and Applications ofMolecularSimilarity.JohnWileyandSons,NewYork,USA.

24 PipelinePilot,AccelrysSoftwareInc.(2013) BIOVIAPipelinePilot.

25 GaultonG,BellisLJ,BentoAP,Chambers J,DaviesM,etal. (2012)ChEMBL: a large-scale bioactivity database for drug discovery.NucleicAcidsRes40:D1100-D1107.

26 BaberJC,ShirleyWA,GaoY,FeherM(2006)TheUseofConsensusScoring inLigand-BasedVirtualScreening.JChemInfModel46:277-288.

27 WhittleM,GilletVJ,WillettP,LoeselJ(2006)AnalysisofDataFusionMethodsinVirtualScreening: TheoreticalModel.JChemInfModel46:2193-2205.

28 RogersHahnM (2010) Extended-connectivity fingerprints. J ChemInfModel50:742-754.

29 CramerRD,JilekRJ,AndrewsKM(2002) Topomersimilaritysearchingofconventionalstructuredatabases.JMolGraphModel20:447-462.

30 RareyM, Dixon JS (1998) Feature trees: A newmolecular similaritymeasurebasedontreematching.JComputAidedMolDes12:471-490.

31 Fraczkiewicz R, LobellM,Göller AH, KrenzU, Schoenneis R, et al.(2015)Bestofbothworlds:combiningpharmadataandstateoftheartmodelingtechnologytoimproveinSilicopKaprediction.JChemInfModel55:389-397.

32 ADMETPredictor(2014)SimulationsPlus,Lancaster,CA,93534,USA.

33 FlipoM,DesrosesM,Lecat-GuilletN,VillemagneB,BlondiauxN,etal.(2012)Ethionamideboosters.2.Combiningbioisostericreplacementandstructure-baseddrugdesigntosolvepharmacokineticissuesinaseriesofpotent1,2,4-oxadiazoleEthRinhibitors.JMedChem55:68-83.

Page 16: Every Jack has His Jill: Finding a Target for Your ... · experimental target validation using chemical probes. The automated protocol constructed and executed using the workflowsoftware

2018Vol. 4 No. 1:1

16 This article is available in: http://cheminformatics.imedpub.com/

Chemical informaticsISSN 2470-6973

34 UniProt (2016) Available from: http://www.uniprot.org (Accessedon:January05,2018).

35 PDB Bind (2014) Available from: http://www.pdbbind-cn.org(Accessedon:January05,2018).

36 Halgren TA (2009) Identifying and characterizing binding sites andassessingdruggability.JChemInfModel49:377-389.

37 DoréS,OkrasaK,PatelJC,Serrano-VegaM,BennettK,etal.(2014)Structure of class C GPCR metabotropic glutamate receptor 5transmembranedomain.Nature511:557-562.

38 WangWH,KatritchV,HanGW,HuangXP,LiuW(2013)Structureofthe human smoothened receptor bound to an antitumour agent.Nature497:338-343.