Discovery and Molecular Modeling of Small Molecule Inhibitors of the Histone Acetyltransferase PCAF:...



The present work focused on one epigenetic target, the histone acetyltransferase PCAF. Only a few inhibitors and modulators are known for HATs in general and for PCAF in particular. All of them are either complex molecules that try to mimic the substrate-cosubstrate complex, natural products with an unknown binding mode, or synthetic compounds derived by modifying the natural inhibitors. Moreover, these proteins have only been co-crystallized with large, flexible compounds such as the cofactor Ac-CoA. Two large binding pockets exist in these proteins, one for the cofactor and one for the substrate. We used the knowledge of recently identified inhibitors of the related serotonin acetyltransferase AANAT to find the optimal settings for docking and virtual screening on PCAF. The docking study on AANAT successfully explained the binding mode of this class of inhibitors. The knowledge gained on this enzyme was subsequently used to virtually screen commercial compound libraries for PCAF inhibitors. We identified two small-molecule reversible inhibitors of PCAF active in the micromolar range. We also investigated the impact of physics-based rigorous scoring methods on the enrichment of virtual screening results. “PBSA-score after AMBER refinement” was found to be a reliable method for achieving better enrichment of active inhibitors among decoys. In addition, we continued previous work to discover irreversible inhibitors of HAT PCAF using the covalent docking method in GOLD. Finally, we addressed the usability and performance of fragment-based methods for discovering de novo ligands. Docking fragments from the ZINC library into PCAF and searching for commercially available compounds containing fragments from the top-ranking solutions resulted in several interesting hits. These hits represent novel candidates for future biological testing.
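"Enrichment of active inhibitors among decoys" is usually quantified with an enrichment factor (EF): the hit rate of known actives in the top fraction of a ranked list divided by the hit rate in the whole list. The sketch below assumes this common EF definition; the function name and the 100-compound ranked list are hypothetical illustrations, not the evaluation script or data used in this work.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_is_active: booleans ordered best score first (True = known active).
    fraction: portion of the list inspected, e.g. 0.10 for the top 10 %.
    EF = (actives in top / size of top) / (all actives / list size)
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(fraction * n_total))  # never inspect an empty top set
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical ranking: 100 compounds, 5 actives, 3 of them in the top 10.
ranked = [True, False, True, False, True] + [False] * 5 + [True, True] + [False] * 88
print(f"EF(10%) = {enrichment_factor(ranked, 0.10):.1f}")
```

An EF of 1 means the scoring function ranks no better than random; here 3 of 10 top-ranked compounds are active versus a 5 % background rate, giving an EF of about 6.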


  • i

    Discovery and Molecular Modeling of Small Molecule Inhibitors of the Histone Acetyltransferase PCAF

    Virtual Screening and Drug Design

    Diploma Thesis

    Medicinal and Pharmaceutical Chemistry – Computational Drug Design

    Institute of Pharmacy

    Naturwissenschaftliche Fakultät I

    Martin-Luther-Universität Halle-Wittenberg

    Pharmacist Suhaib Shekfeh

    From Homs/Hama - Syria

    Referees:

    1. Prof. Dr. Wolfgang Sippl (MLU Halle-Wittenberg)

    2. Prof. Dr. Manfred Jung (Albert-Ludwigs Universität Freiburg)

  • ii

    List of Abbreviations
    1 Introduction: Epigenetics and HAT
    1.1 Histone Acetyltransferases: Role in Epigenetics, Structure, and Classification
    1.2 Histone Lysine Acetyltransferase - Catalytic Mechanism
    1.3 HAT Modulators: Chemical Regulation of Acetyltransferases
    1.3.1 Non-peptidic Natural Product HAT Inhibitors
    1.3.2 Irreversible HAT Inhibitors (Aryl and alkyl N-substituted Isothiazolones)
    1.3.3 Other Synthetic HAT Inhibitors
    1.4 Structural Overview of Serotonin Acetyltransferases AANAT
    1.5 Structural Overview of PCAF HAT
    2 Aim of the Work
    3 Computational Methods - Docking and Virtual Screening
    3.1 The Search Problem
    3.2 The Scoring Problem
    3.3 Solvation Effects
    3.4 Solvation Effects and Scoring Functions
    3.5 Effects of Rescoring Docking Hits using MM-GBSA or MM-PBSA Methods
    3.6 Docking Programs and Rescoring Methods
    3.6.1 PBSA Scoring using ZAP Library and AMBER-score
    3.6.2 Cscore
    3.7 Similarity Search
    3.8 ZINC Compound Library
    3.9 Fragment-based Drug Design
    4 Implementation
    4.1 Molecular Modeling
    4.2 Dataset (Test Set) for Docking and Enrichment Studies
    4.3 Docking Optimization
    4.4 Isothiazolones - Covalent Docking
    4.5 Fragment Docking (Fragments Derived from the ZINC Database)
    4.6 PCAF in vitro Assay
    5 Results and Discussion
    5.1 Optimization of the GOLD Docking Procedure for AANAT
    5.1.1 Reproducing the Binding Mode of AANAT Ligands
    5.1.2 Scoring AANAT Inhibitors
    5.1.3 Evaluation of Further Scoring Methods
    5.2 Virtual Screening and Experimental Validation of Selected PCAF Hits
    5.3 Covalent Docking of Isothiazolones
    5.4 Fragment-based Drug Design
    6 Conclusions
    7 References

  • iii

    Declaration of Authorship

    I hereby confirm that I have authored this thesis independently and without using any resources other than those indicated. All passages taken literally or in general manner from publications or other sources are marked as such.

    Suhaib Shekfeh, Halle (Saale), Germany, 15 May 2009

  • iv

    Acknowledgement I would like to thank

    - My advisor and first referee of this work, Prof. Dr. habil. Wolfgang Sippl, for all his academic advice and for giving me the opportunity to work on such an interesting project.

    - My second advisor, Prof. Dr. Manfred Jung (University of Freiburg), for reading and correcting this work and for performing the in vitro PCAF inhibition assay in his laboratory.

    - My family, especially my mother and my grandmother in Syria, for all the support they have given and always give me.

    - All my colleagues in Prof. Sippl's group (Rene, Urszula, Ralf, German, Mark, Martin, Kanin) for all their help and for providing a friendly environment.

    - All my friends in Germany, especially in Halle (Saale).

  • v

    List of Abbreviations

    Ada = Adaptor protein

    Ac-CoA = Acetyl Coenzyme A

    AANAT = Arylalkylamine N-AcetylTransferase = Serotonin N-AcetylTransferase

    ASP = Astex Statistical Potential

    BAX = Bcl2 associated X protein

    BEAR = Binding Estimation After Refinement

    bp = base pairs (of nucleotides)

    CBP = CREB-binding protein

    CDK = Cyclin-Dependent Kinase

    CREB = cAMP Response Element Binding protein

    Evdw, Eelec, Esolv = van der Waals Energy, Electrostatic Energy, Solvation Energy

    GA = Genetic Algorithm

    GOLD = Genetic Optimisation for Ligand Docking

    GNAT = GCN5-related N-AcetylTransferase, tGCN5 = Tetrahymena GCN5

    GB = Generalized Born

    HAT = Histone Acetyltransferase

    HDAC = Histone deacetylase

    HTD = High Throughput Docking

    MD = Molecular Dynamics

    MM-PBSA = Molecular Mechanics-Poisson-Boltzmann/Solvent-accessible Surface Area

    MM = Molecular Mechanics

    MC = Monte Carlo Simulation

    MM-GBSA = Molecular Mechanics-Generalized Born/Solvent-accessible Surface Area

    MOE = Molecular Operating Environment

    MCSS = Multiple Copy Simultaneous Search

    MYST = MOZ, Ybf2/Sas3, Sas2 and Tip60

    NuA3 = Nucleosomal Acetyltransferase for H3


    NuA4 = Nucleosomal Acetyltransferase for H4

    PDB = Protein Data Bank, previously called Brookhaven PDB

    PB = Poisson-Boltzmann

    PCAF = p300/CBP-associated factor, hPCAF = human PCAF

    PMF = Potential of Mean Force

    ROF = Lipinski's Rule of Five

    ROC curve = Receiver Operating Characteristic Curve

    RMSD = Root Mean Square Deviation

    Rhodanine = 2-Thioxo-4-thiazolidinone or 2-Thioxo-1,3-thiazolidin-4-one

    SAR = Structure-Activity Relationship

    SAGA = Spt, Ada, GCN5 Acetyltransferase

    SANT = Swi3, Ada2, NcoR, TFIIIB

    SAP = Sin-associated Protein

    SAS = Solvent Accessible Surface

    Sas2 = Something About Silencing 2

    SBVS = Structure-Based Virtual Screening

    Sin3 = Switch Independent 3

    SMRT = Silencing Mediator of Retinoic Acid and Thyroid Hormone Receptor

    SPC = Simple Point Charge

    Spt = Suppressor of Ty

    SVL = Scientific Vector Language (Script Language of MOE)

    TIF2 = Transcriptional Intermediary Factor 2

    TrpNH2 = Tryptamine

    VS = Virtual Screening

    vdw = van der Waals potential/energy

  • Introduction 1

    1 Introduction: Epigenetics and HAT

    1.1 Histone Acetyltransferases: Role in Epigenetics, Structure, and Classification

    The genetic material in the nucleus of eukaryotic cells is present in a tightly packed form, chromatin, which functions as a dynamic structure and a basic contributor to the regulation of various nuclear processes, including transcription, DNA replication and repair, mitosis, and apoptosis [1]. Core histones are small basic proteins that form a well-defined structure known as the nucleosome. Four types of core histones are known: H2A, H2B, H3, and H4. The nucleosome core consists of two copies of each of these histones, forming an octamer around which 147 base pairs of DNA are wrapped in left-handed superhelical turns. The linker histone H1 binds the nucleosome at the entry and exit sites of the DNA, thus locking the DNA in place and allowing the formation of higher-order structures [2]. An important post-translational modification of histones is the acetylation of ε-amino groups on conserved lysine residues. Acetylation neutralizes the positive charge of the lysines and therefore affects the interactions of the histones with other proteins and/or with the DNA. Histone acetylation has long been associated with transcriptionally active chromatin and has also been implicated in histone deposition during DNA replication [3].

    Histone acetyltransferases (HATs) can be classified into several families based on sequence conservation (Table 1) [5]. The human genome encodes up to 25 proteins that show lysine acetyltransferase activity. At the primary-structure level there is little similarity between the different HATs, and even members of the same family usually display considerable sequence diversity. Furthermore, no single homologous domain is conserved in all HATs, although many enzymes contain recognizable acetyl-coenzyme A (Ac-CoA) binding motifs and bromodomains [6]. More similarities are observed at the tertiary-structure level (Figures 1 and 2). HATs display a conserved core domain containing an L-shaped cleft formed by the N- and C-terminal segments of the core domain. This cleft contains the catalytic site: Ac-CoA binds in the short segment and the macromolecular substrate binds in the long segment. Beyond the core domain, there is little structural similarity between the different HATs. In vitro assays indicate that HATs have different substrate specificities, although the molecular mechanisms underlying these specificities, as well as the true physiological specificities of HATs, remain poorly understood [4].

  • Introduction 2

    Important and extensively investigated families of HATs are (see also Table 1):

    - GNAT family (GCN5-related N-acetyltransferase): includes GCN5, PCAF (p300/CBP-associated factor), other acetyltransferases such as serotonin acetyltransferase (AANAT), aminoglycoside N-acetyltransferases (AAC-3 and AAC-6), spermidine/spermine N-acetyltransferase, the elongator subunit Elp3, and Hpa2. HAT1 can be classified either within the GNAT family or as a separate family.

    - MYST family (named after its founding members, which include MOZ, YBF2/SAS3, SAS2

    and TIP60) [5].

    - p300/CBP family [7-9].

    Figure 1. Comparison of the three-dimensional structures of GCN5-related N-acetyltransferases: GCN5, PCAF, and AANAT. (A) tGCN5: the ternary complex with CoA and an 11-residue peptide (in blue) is shown. The black line indicates CoA or Ac-CoA. (B) PCAF complexed with CoA. (C) AANAT: the complex with the bisubstrate analog is shown (indole ring colored blue). The four conserved motifs of the GNAT superfamily, C, D, A, and B, are shown in purple, green, yellow, and red, respectively (adapted from [5(b)]).

    Over 40 transcription factors and 30 other nuclear, cytoplasmic, bacterial, and viral proteins have

    been shown to be acetylated in vivo by HATs [8, 10]. For example, p300/CBP proteins are involved

    in diverse physiological processes, such as proliferation, differentiation and apoptosis [11]. GCN5p

    is the catalytic subunit of the two multi-protein complexes, ADA and SAGA, involved in

    remodeling the chromatin structure and acetylation of histone tails at specific lysines. Table 2

    presents a list of all known families of acetyltransferases.

  • Introduction 3

    Figure 2. Superposition of the putative active-site region of GCN5 (in yellow) and HAT1 (in red) with bound Ac-CoA

    (shown in capped sticks, adapted from [15]).

    Table 1: Main families of HATs, their substrates and their involvement in cancer mechanisms [5 (a)].

    Acetyltransferase | Family | Substrates | Involvement in cancer
    GCN5 | GNAT | H2B, H4, cMyc | Critical regulator of cell cycle and cMyc
    PCAF | GNAT | H3, H4, cMyc, p53, MyoD, E2F | Critical regulator of cell cycle, p53, E2F, and cMyc
    CBP | CBP/p300 | H2A, H2B, H3, H4, pRb, E2F, p53, c-Myb, MyoD, AR, FoxO | Translocation: MOZ-, MORF-, and MLL-p300/CBP fusions. Mutation: biallelic p300 mutations in epithelial cancer. Inactivation: haematological malignancy
    p300 | CBP/p300 | H2A, H2B, H3, H4, pRb, E2F, p53, c-Myb, MyoD, AR, FoxO | Translocation: MOZ-, MORF-, and MLL-p300/CBP fusions. Mutation: biallelic p300 mutations in epithelial cancer. Inactivation: haematological malignancy
    TIP60 | MYST | H2A, H3, H4, cMyc, AR | Association with androgen receptor in prostate cancer
    MOZ | MYST | H3, H4 | Fusion with p300/CBP and TIF2
    MORF | MYST | H3, H4 | Fusion with p300/CBP
    ACTR | SRC | H3, H4 | Upregulation in breast cancer

  • Introduction 4

    Table 2. Summary of acetyltransferase families; numbers in brackets are UniProt accession numbers [23].

    Gene family | Name (UniProt) | Synonyms | Gene product name and synonyms
    HAT1 | HAT1 (O14929) | -- | Histone acetyltransferase type B catalytic subunit (HAT1)
    MYST | HTATIP (Q92993) | TIP60 | 60 kDa HIV-1 Tat-interacting protein (Tip60); NuA4/TRRAP complex component
    MYST | MYST1 (Q9H7Z6) | MOF, hMOF | Homolog of Drosophila males absent on the first (hMOF); component of human male-specific lethal complex (MSL)
    MYST | MYST2 (O95251) | HBO1, HBOa | HAT binding to origin recognition complex (HBO1); component of inhibitor of growth complexes (ING4, ING5)
    MYST | MYST3 (Q92794) | MOZ, RUNXBP2, ZNF220 | Monocytic leukaemia zinc finger protein (MOZ); Runt-related transcription factor-binding protein (RunxBP2); Zinc finger protein 220 kDa (ZNF220); component of ING5 complex
    MYST | MYST4 (Q8WYB5) | MORF, MOZ2 | MOZ-related factor (MORF), MOZ2, Querkopf; component of ING5 complex
    GNAT | GCN5L2 (Q92830) | GCN5, HGCN5 | General control of nitrogen metabolism (GCN5)-like 2; homolog of yeast GCN5; STAF97
    GNAT | PCAF (Q92831) | -- | p300/CBP-associated factor (P/CAF)
    p300/CBP | EP300 (Q09427) | p300 | E1A-associated protein 300 kDa (p300)
    p300/CBP | CREBBP (Q92793) | CBP | CREB-binding protein (CBP)
    SRC/p160 | NCOA1 (Q15788) | SRC1, RIP160 | Steroid receptor coactivator (SRC1); Nuclear receptor coactivator (NCOA1); 160-kDa receptor interacting protein (RIP160)
    SRC/p160 | NCOA2 (Q15596) | TIF2 | Transcriptional intermediary factor (TIF2); Nuclear receptor coactivator 2 (NCOA2)
    SRC/p160 | NCOA3 (Q9Y6Q9) | AIB1, ACTR, p/CIP, RAC3, TRAM1 | Nuclear receptor coactivator (NCOA3); Amplified in breast cancer (AIB1); Thyroid hormone receptor activator molecule (TRAM1); Receptor-associated coactivator (RAC3); Steroid receptor coactivator protein (SRC3); p300/CBP-interacting protein (p/CIP)
    TFIIIC subunit 4 | GTF3C4 (Q9UK98) | -- | General transcription factor 3C polypeptide 4 (GTF3C4); Transcription factor IIIC-delta subunit (TF3Cd); TFIIIC 90-kDa subunit (TFIIIC 90)
    ATF | ATF2 (P15336) | CREB2, CREBP1 | Cyclic AMP-dependent transcription factor (CREB2); Activating transcription factor (ATF2); cAMP response element-binding protein (CREBP1); HB16
    CIITA | CIITA (P33076) | MHC2TA | MHC class II transactivator (CIITA)
    TAF1 | TAF1 (P21675) | BA2R, CCG1, TAF2A | Transcription initiation factor TFIID subunit 1; TBP-associated factor (TAF1); TBP-associated factor 250 kDa (TAFII250); Cell-cycle gene 1 (CCG1)
    CDY | CDY1 (Q9Y6F8) | -- | Testis-specific chromodomain protein Y1 (CDY1)
    CDY | CDYL1 (Q9Y232), CDYL2 (Q8N8U2) | CDYL1, CDYL2 | Chromodomain Y-like protein (CDYL1, CDYL2)
    TFIIB | GTF2B (Q00403) | TF2B, TFIIB | Transcription initiation factor (TFIIB); General transcription factor TFIIB (GTF2B)
    MCM3AP | MCM3AP (O60318) | GANP, KIAA0572, MAP80, SAC3 | Mini chromosome maintenance 3-associated protein (MCM3AP); 80-kDa MCM3-associated protein (MAP80); Germinal centre-associated nuclear protein (GANP)
    ESCO | ESCO1 (Q5FWF5) | EFO1, KIAA1911 | Establishment of cohesion 1 homolog 1 (ESCO1, ECO1); Establishment factor-like protein 1 (EFO1p, hEFO1); CTF7 homolog 1
    ESCO | ESCO2 (Q56NI9) | -- | Establishment of cohesion 1 homolog 2 (ESCO2); ECO1 homolog 2
    ARD1 | ARD1A (P41227) | hARD1, TE2, ARD2 | Arrest defective protein (ARD1); N-alpha acetyltransferase (retroposon-mediated gene duplication product)
    CLOCK | CLOCK (O15506) | KIAA0334 | Circadian locomoter output cycles protein kaput (CLOCK)
    NCOAT | MGEA5 (O60502) | NCOAT, HEXC | Meningioma-expressed antigen 5 (MGEA5); Nuclear cytoplasmic O-linked N-acetylglucosaminase and acetyltransferase (NCOAT)

    1.2 Histone Lysine Acetyltransferase - Catalytic Mechanism

    Reported studies have proposed two different catalytic mechanisms for HATs [12]. Members of the GNAT (GCN5-related N-acetyltransferase) family use an ordered sequential mechanism that involves direct acetyl transfer from Ac-CoA to the Nε of the substrate lysine residue (Figure 3); initial structural and kinetic data on the GNAT family revealed this ordered sequential mechanism [13]. On this basis, Trievel et al. proposed a ternary complex mechanism for catalysis by GCN5 [14]: Ac-CoA binds first, followed by the lysine substrate, to form a ternary complex; a glutamate residue (GLU173) is positioned to abstract a proton from the lysine amino group, and the resulting uncharged amino group performs a nucleophilic attack on the carbonyl carbon of the reactive thioester group of Ac-CoA. According to Trievel et al. [14], GLU173 of GCN5 is the likely candidate for this general base catalysis because it lies close enough to the histone lysine, and it was later found that mutation of GLU173 to GLN abolishes activity both in vivo and in vitro [15].

  • Introduction 7

    In general, an active-site glutamate (GLU173 in GCN5/ScKAT2) is required to activate (deprotonate) the ε-amine of lysine for the direct nucleophilic attack on the carbonyl carbon of Ac-CoA [16]. The resulting tetrahedral intermediate then collapses to the acetyl-lysine product and CoA (Figure 4).
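Why a general base is needed can be illustrated with the Henderson-Hasselbalch equation, which gives the fraction of an amine present as the neutral, nucleophilic free base. The sketch below assumes a side-chain pKa of about 10.5 (the textbook value for free lysine) and a pH of 7.5; both numbers are illustrative assumptions, and the effective pKa of the substrate lysine in the HAT active site may differ.

```python
def neutral_fraction(pka, ph):
    """Fraction of an amine present as the neutral free base R-NH2.

    Henderson-Hasselbalch for R-NH3+ <=> R-NH2 + H+:
        [R-NH2] / [R-NH3+] = 10 ** (pH - pKa)
    """
    ratio = 10 ** (ph - pka)
    return ratio / (1 + ratio)


# Assumed values for illustration only (not measured in this work).
LYS_PKA = 10.5  # free-lysine epsilon-amine, textbook value
PH = 7.5        # roughly physiological

f = neutral_fraction(LYS_PKA, PH)
print(f"neutral fraction of the lysine epsilon-amine at pH {PH}: {f:.2e}")
```

Under these assumptions only about 0.1 % of the lysine is uncharged, which is why removing the proton with an active-site base such as GLU173 makes the nucleophilic attack on Ac-CoA far more favorable.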

    Figure 3. Ordered sequential mechanism resulting in the formation of a ternary complex (adapted from [16]).

  • Introduction 8

    Figure 4. Comparison between the two proposed mechanisms: ternary complex formation and ping-pong mechanism

    (adapted from [17]).

    In the second mechanism, called the ping-pong (i.e. double-displacement) mechanism, a cysteine residue in the enzyme active site receives the acetyl moiety from Ac-CoA in a first step, and in a second step the acetyl moiety is transferred to the substrate lysine residue (Figure 4) [16]. All biochemically and structurally characterized HATs have been found to possess a conserved active-site glutamate, which appears to play a similar role in deprotonating the amino group of the target lysine before acetyl transfer. Currently, all characterized HATs are thought to follow an ordered sequential bi-bi kinetic mechanism, in which differences between families may affect substrate specificity but not the overall mechanism of catalysis [4].
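The two mechanisms can be distinguished kinetically. The sketch below assumes the textbook steady-state (Cleland) initial-rate equations for two-substrate reactions, with A = Ac-CoA binding first and B = the lysine substrate; it is not derived in this thesis, and all constants are hypothetical. In the ordered bi-bi equation the K_ia·K_b term makes the slope of a 1/v versus 1/[A] plot depend on [B] (intersecting lines), whereas the ping-pong equation lacks that term, so its lines are parallel.

```python
def v_ordered_bibi(a, b, vmax, ka, kb, kia):
    """Initial rate, ordered sequential bi-bi (A binds first, then B)."""
    return vmax * a * b / (kia * kb + kb * a + ka * b + a * b)


def v_ping_pong(a, b, vmax, ka, kb):
    """Initial rate, ping-pong (double-displacement) bi-bi: no K_ia*K_b term."""
    return vmax * a * b / (kb * a + ka * b + a * b)


def double_reciprocal_slope(rate, b, a1=1.0, a2=10.0):
    """Slope of 1/v versus 1/[A] at fixed [B], from two points (plot is linear)."""
    x1, x2 = 1.0 / a1, 1.0 / a2
    y1, y2 = 1.0 / rate(a1, b), 1.0 / rate(a2, b)
    return (y1 - y2) / (x1 - x2)


# Hypothetical kinetic constants (arbitrary units), for illustration only.
VMAX, KA, KB, KIA = 1.0, 2.0, 5.0, 3.0


def ordered(a, b):
    return v_ordered_bibi(a, b, VMAX, KA, KB, KIA)


def ping_pong(a, b):
    return v_ping_pong(a, b, VMAX, KA, KB)


for b in (1.0, 10.0):
    print(f"[B] = {b:>4}: slope ordered = {double_reciprocal_slope(ordered, b):.3f}, "
          f"slope ping-pong = {double_reciprocal_slope(ping_pong, b):.3f}")
```

For the ordered case the slope equals (K_a + K_ia·K_b/[B])/V_max and changes with [B]; for ping-pong it stays at K_a/V_max.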

    A recent study demonstrated that p300 HAT is itself polyacetylated and contains an activation loop that requires (auto)acetylation for full enzyme activation [18]. This resembles the situation in protein kinases, whose activity is also regulated through an autoinhibitory switch involving phosphorylation of an activation loop. Beyond their shared role in catalyzing reversible post-translational modifications, HATs and kinases are also similar in how they are recruited to their target complexes. For kinases, this usually involves SH2 and 14-3-3 domains, which recognize phosphopeptide motifs; HATs frequently contain bromodomains, which bind acetyl-lysine-containing sequence motifs in histones and other proteins [19].

    1.3 HAT Modulators: Chemical Regulation of Acetyltransferases

    Direct structural insight into small-molecule-mediated inhibition of HAT proteins came from a crystal structure of the Tetrahymena GCN5 (tGCN5) HAT domain bound to the modified H3-CoA-20 inhibitor [20]. This bisubstrate inhibitor was prepared with an isopropionyl bridge between CoA and the peptide to mimic the Ac-CoA-lysine intermediate [21]. To date, H3-CoA-20 is the most potent GCN5/PCAF HAT inhibitor identified, with an IC50 of 300 nM for tGCN5. On the other hand, the bisubstrate inhibitor Lys-CoA, in which an acetyl bridge links the amine group and CoA, is a potent p300 inhibitor (IC50 = 500 nM) but a weak PCAF inhibitor (IC50 = 200 µM) [17] (Figure 5). This suggested that the p300 enzyme family also uses a ternary complex mechanism [22]. In contrast to Lys-CoA, neither the H3-CoA-20 nor the H4-CoA-20 peptide-CoA conjugate, in which CoA is linked to lysine 14 and lysine 8 of the respective histone peptides, is a potent p300 inhibitor (IC50 values above 10 µM). H3-CoA-20 is, however, a potent PCAF inhibitor (IC50 = 360 nM) [17, 20]. These findings suggested that p300 might use a ternary complex mechanism that differs in some way from that of the GCN5/PCAF HAT proteins.
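The potency and selectivity values quoted above can be put on a common scale with pIC50 (−log10 of the IC50 in mol/L) and a fold-selectivity ratio. This short sketch uses only the IC50 values given in the text; the helper names are hypothetical. Such ratios are only approximate, because an IC50 depends on assay conditions such as substrate concentration, and for H3-CoA-20 the p300 value is only a lower bound (> 10 µM).

```python
import math


def pic50(ic50_molar):
    """pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_molar)


def selectivity_fold(ic50_off_target, ic50_target):
    """How many times more potent a compound is on its target (same units)."""
    return ic50_off_target / ic50_target


UM = 1e-6  # micromolar -> molar
# IC50 values in micromolar, as quoted in the text.
lys_coa = {"p300": 0.5, "PCAF": 200.0}
h3_coa_20 = {"PCAF": 0.36, "p300": 10.0}  # p300 value is a lower bound (> 10 uM)

print("Lys-CoA: pIC50 p300 = %.2f, PCAF = %.2f, p300 selectivity = %.0f-fold"
      % (pic50(lys_coa["p300"] * UM), pic50(lys_coa["PCAF"] * UM),
         selectivity_fold(lys_coa["PCAF"], lys_coa["p300"])))
print("H3-CoA-20: PCAF selectivity > %.0f-fold"
      % selectivity_fold(h3_coa_20["p300"], h3_coa_20["PCAF"]))
```

On these numbers Lys-CoA is about 400-fold selective for p300, while H3-CoA-20 is at least roughly 28-fold selective for PCAF.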

    Despite the great interest in HATs as therapeutic targets, only a few synthetic small-molecule HAT inhibitors (besides a few natural products) have been discovered to date. The most crucial disadvantages of the identified substrate-based inhibitors are their low cell permeability and metabolic instability, which reduce their suitability for in vivo investigations [23].

  • Introduction 10

    1.3.1 Non-peptidic Natural Product HAT Inhibitors

    Figure 5. Molecular structures of HAT inhibitors (adapted from [23]).

    Anacardic acid, a major component of cashew nutshell liquid, was identified in a natural product screen as a noncompetitive inhibitor of HATs (i.e. p300 and PCAF) [24]. It has poor membrane permeability and therefore shows little effect on cells [25]. It acts as a weak, non-specific inhibitor of p300/CBP and PCAF (IC50 = 8.5 and 5 µM, respectively). Interestingly, CTPB, the amide derivative of anacardic acid, enhances the HAT activity of p300 fourfold but not that of PCAF [26]. Later, Mantelingu et al. [27] described the identification of chemical entities essential for activating p300 HAT activity. Notably, using surface-enhanced Raman spectroscopy of the enzyme-inhibitor complexes, they showed that activation of HAT activity is achieved through an alteration of the p300 structure.

    Another natural product with HAT-inhibitory activity is Curcumin, a yellow pigment extracted from the root of the turmeric plant Curcuma longa L. [28]. Curcumin has long been known to possess interesting pharmacological properties: apart from its chemopreventive and antiproliferative activities, it has been found to have antioxidative, anti-inflammatory, anti-infective, and antiseptic properties, and it is widely used in Indian medicine and culinary traditions [27]. Curcumin has been reported to inhibit the HAT activity of p300/CBP but not that of PCAF [28]. The observed kinetics of p300 inhibition by Curcumin were originally interpreted to mean that the compound does not bind to the active site but acts as an allosteric inhibitor [28]. Subsequently, it was shown that Curcumin is in fact a covalent inhibitor of p300 but not of PCAF, presumably targeting some amino acid residues by virtue of its electrophilic unsaturated ketone function [29].

    Garcinol, a polyprenylated benzophenone natural product isolated from the edible fruit of Garcinia indica, was shown to be an active-site inhibitor of p300 and PCAF; its inhibition kinetics were uncompetitive with respect to Ac-CoA but competitive with respect to the histone substrate [30]. Garcinol inhibits p300 (IC50 = 7 µM) and PCAF (IC50 = 5 µM) both in vitro and in vivo [31, 32]. Recently, Mantelingu et al. synthesized and tested a set of Garcinol derivatives (e.g., LTK-13, LTK-14, LTK-19) (Figure 7) that are selective for p300 (IC50 = 5–7 µM) and inactive at PCAF [33]. However, these compounds tend to be poorly soluble and unstable because of facile oxidation of the isoprene moieties.
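The kinetic signature reported for Garcinol (competitive with respect to the histone substrate, uncompetitive with respect to Ac-CoA) can be read through the textbook single-substrate inhibition equations, applied to each substrate in turn while the other is held constant. This is a simplifying assumption rather than a full bisubstrate analysis, and the constants below are hypothetical. A competitive inhibitor raises the apparent Km and leaves Vmax unchanged; an uncompetitive inhibitor lowers both by the same factor, so Vmax/Km stays constant.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)


def apparent_constants(vmax, km, inhibitor, ki, mode):
    """Apparent (Vmax, Km) in the presence of a reversible inhibitor.

    alpha = 1 + [I]/Ki
    competitive:   Vmax unchanged, Km multiplied by alpha
    uncompetitive: Vmax and Km both divided by alpha (Vmax/Km unchanged)
    """
    alpha = 1.0 + inhibitor / ki
    if mode == "competitive":
        return vmax, km * alpha
    if mode == "uncompetitive":
        return vmax / alpha, km / alpha
    raise ValueError(f"unknown inhibition mode: {mode!r}")


# Hypothetical constants (arbitrary units), chosen only for illustration.
VMAX, KM, KI, I = 1.0, 2.0, 5.0, 5.0

for mode in ("competitive", "uncompetitive"):
    v_app, km_app = apparent_constants(VMAX, KM, I, KI, mode)
    print(f"{mode:>13}: Vmax_app = {v_app:.2f}, Km_app = {km_app:.2f}, "
          f"v at [S] = 2: {michaelis_menten(2.0, v_app, km_app):.3f}")
```

In a double-reciprocal plot, the competitive case gives lines intersecting on the 1/v axis (histone varied), whereas the uncompetitive case gives parallel lines (Ac-CoA varied).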

    1.3.2 Irreversible HAT Inhibitors (Aryl and alkyl N-substituted Isothiazolones)

    Aryl and alkyl N-substituted isothiazolones (e.g., CCT077791) have been shown to irreversibly inhibit H3 and H4 acetylation by PCAF and p300 [34]. Stimson et al. showed that a series of isothiazolones identified by high-throughput screening inhibits HAT catalytic activity. These compounds are also cell permeable and can reduce global acetylation, the acetylation of specific histones (H3 and H4), and the acetylation of nonhistone proteins such as alpha-tubulin. In this series of aryl and alkyl N-substituted isothiazolones, inhibition results from an irreversible reaction with thiol groups (Figure 6). HAT inhibition by isothiazolones is abolished in the presence of thiol-reducing agents such as dithiothreitol (DTT) or glutathione [34]. Furthermore, HAT activity was not restored when PCAF was incubated with the two isothiazolones CCT077791 and CCT077792 and then dialyzed for 24 hours. The SAR study of this series showed that activity depends on the nature and the electron-withdrawing/donating properties of the substituents [34], which strongly affect the kinetics of cleavage of the sulfur-nitrogen bond in the isothiazolone ring. The compounds also appear to have considerable off-target effects, which may be attributable to their high chemical reactivity toward free thiol groups.

  • Introduction 12

    Figure 6. Proposed mechanism of the covalent binding of isothiazolones to thiol groups.

    1.3.3 Other Synthetic HAT Inhibitors

    Figure 7. Some structures of synthetic HAT inhibitors.


    α-Methylene-γ-butyrolactones, such as MB-3, are small-molecule HAT inhibitors of purely synthetic origin (Figure 5). They were designed based on the known interactions between Ac-CoA and the acetyl-accepting Lys side chain of the macromolecular substrate [30]. Biel et al. developed MB-3, a small, cell-permeable inhibitor of human GCN5. The compound contains an α-methylene-γ-butyrolactone scaffold, a substructure known from natural products. MB-3 shows only weak inhibition of CBP (IC50 = 500 μM) and GCN5 (IC50 = 100 μM) [26]. Costi et al. reported that cinnamoyl compounds are also inhibitors of p300 [35 (a)].

    Further chemical modifications of Garcinol have recently been made to develop p300-selective inhibitors (Isogarcinol (IG) and LTK14) [35 (b)]. In the same work, an SAR study was performed to understand binding to p300 and PCAF [35 (b)].

    Trifluoromethylphenyl benzamides have been found to modulate p300 [27]. Cycloalkylidene-(4-phenylthiazol-2-yl)hydrazone derivatives have been synthesized and identified as capable of inhibiting growth in a GCN5-dependent manner [35 (d)]. One of these derivatives, CPTH2, showed inhibition of GCN5, and it has been confirmed that this compound targets the Gcn5p functional network through an interacting protein [35 (e)].

    Another way to interfere with HAT function is to block the recognition of acetylated partners by targeting the bromodomain [35 (f)]. Bromodomain inhibitors could be useful for developing anti-HIV therapeutics. A series of selective ligands for the PCAF bromodomain has been discovered recently [35 (g)].


    1.4 Structural Overview of Serotonin Acetyltransferases AANAT

    Figure 8. A view of the AANAT-inhibitor complex containing the four-stranded (β1-β4) β-sheet and showing the bisubstrate analog bound in the active site. Side chains of the tryptamine-binding residues are displayed. GNAT motifs C, D, A and B are color coded red, green, blue and magenta, respectively (adapted from [36]).

    Melatonin is produced in the pineal gland on a circadian cycle and is involved in regulating the biological clock in vertebrate organisms [37]. Circulating melatonin levels rise and fall daily under the control of an endogenous circadian clock. The biosynthesis of melatonin in the pineal gland involves the conversion of 5-hydroxytryptamine (serotonin) to 5-hydroxy-N-acetyltryptamine (N-acetylserotonin), catalyzed by serotonin N-acetyltransferase (also called arylalkylamine N-acetyltransferase, AANAT). This step is followed by O-methylation to 5-methoxy-N-acetyltryptamine (melatonin), catalyzed by 5-hydroxyindole O-methyltransferase (HIOMT). Changes in AANAT activity are the main factor controlling the rhythmic production of melatonin. In contrast, HIOMT is constitutively active and does not regulate the circadian rhythm of melatonin.

    AANAT belongs to the GCN5-related N-acetyltransferase (GNAT) family of proteins, which share a common conserved structural domain [38]. Consistent with the shared function of the family, this domain evolved to bind CoA through conserved backbone regions and to facilitate acetyl transfer to the substrate.

    CoA binds between the backbone amides of the P-loop in motifs A-D and a V-shaped cavity formed between two parallel β-strands (Figure 8). The exposed amide backbone within this cavity binds the alanylpantetheine backbone of CoA. Interestingly, the adenine moiety of the cofactor is solvent-exposed and does not contribute significantly to binding. In general, the CoA binding site is buried within the protein and could also accommodate small drug-like molecules [36, 39].

    Figure 9. Bisubstrate inhibitors that have been co-crystallized with AANAT (PDB codes 1KUY, 1KUV, and 1KUX [36]).


    Figure 10. Coenzyme A structure.

    The pantetheine-pyrophosphate moiety forms extensive hydrogen-bonding contacts to main-chain

    functional groups of residues LEU124, VAL126, and GLN132-SER137 of the conserved GNAT

    motif A.

    The two residues closest to the sulfur atom of CoA are TYR168 (3.1 Å from the cofactor's sulfur) and GLU161 (3.8 Å from the cofactor's sulfur). The adenine and 3′-phosphate-ribose groups of CoA are present in two alternative conformations, which are stabilized by two different sets of crystal contacts. The presence of two conformations is not surprising, because the adenine and 3′-phosphate moieties adopt various conformations in previously determined GNAT structures (reviewed in [40]). The tryptamine moiety is also found in two alternative conformations (referred to as cis and trans) in the hydrophobic serotonin binding pocket formed by residues PHE56, PRO64, MET159, VAL183 and LEU186 [36, 39].

    In AANAT, two histidine residues in β-strand 4 of motif A, HIS120 and HIS122, have been suggested [38] to act as the general base in catalysis because of their proximity to the NH2 group of the bisubstrate analog (HIS120 is 7.5 Å and HIS122 is 8.7 Å away from the substrate). However, site-directed mutagenesis showed that the Michaelis-Menten constant (Km), but not the maximum rate of the catalytic reaction (Vmax), was affected by the HIS120-to-GLN and HIS122-to-GLN mutations in ovine AANAT [39]. Another candidate for the catalytic base in AANAT is GLU161 in the loop following strand S5. Mutation of this residue to alanine does not affect enzymatic activity [41], which argues against this possibility. Thus, the identity of the catalytic base in AANAT remains unknown. It is possible that more than one active-site residue of AANAT can fulfil the catalytic role, which would make site-directed mutagenesis results difficult to interpret [41]. Furthermore, the pKa of the nucleophilic amino group of the substrate may be lowered in the hydrophobic active site of AANAT, in which case a catalytic base would be expected to contribute only modestly to rate acceleration. A similar proposal has been made for ribosomal catalysis of peptide bond formation [42].

    Catalytic enhancement of the chemical step may result from stabilization of a tetrahedral intermediate and/or activation of the departure of the leaving group (CoA-SH). Polarization of the thioester carbonyl group, as well as stabilization of a potential tetrahedral intermediate, could be achieved by hydrogen bonding of the thioester carbonyl group to the backbone of the hydrophobic residue located in β-strand 4 (LEU124 in AANAT) [43].


    Figure 11. Schematic representation of the interactions between AANAT and a bisubstrate inhibitor. The surrounding residues in the cofactor and substrate binding pockets of AANAT are shown (PDB code 1KUX). Blue arrows indicate backbone hydrogen bonds, while green arrows indicate side-chain hydrogen bonds. Blue areas indicate ligand exposure, while residues with a light blue shadow indicate protein exposure.


    Figure 12. Binding mode of the bisubstrate inhibitor 3 (see Figure 9) to AANAT serotonin acetyltransferase (PDB code 1KUX).


    1.5 Structural Overview of PCAF HAT

    Figure 13. Structure of the PCAF-CoA complex, representing the general secondary structure of mammalian GNAT family acetyltransferases and the location of the Ac-CoA binding site. The four domains of the protein are color-coded. Motifs A-D and motif B (based on structural conservation) are colored blue and green, respectively. The N- and C-terminal protein segments flanking the core are colored magenta and gold, respectively. CoA is colored red (adapted from [44]).

    In the PCAF crystal structure, CoA is bound in a conformation that forms an extensive set of protein interactions with motifs A-D and motif B (Figure 13), mediated predominantly by the pantetheine arm and the pyrophosphate group [44]. All but two groups of the 16-member pantetheine arm-pyrophosphate chain make contacts with the protein. Most of these contacts are mediated either by protein backbone hydrogen bonds or by van der Waals contacts with protein side chains [44].

    GNAT-conserved residues in PCAF motifs A and B interact extensively with CoA. Residues 580 and 582-587 in the β4-α3 loop region of motif A make direct and water-mediated hydrogen bonds with the pyrophosphate group [45], and THR587 makes a hydrogen bond to a pyrophosphate oxygen. The aliphatic side chain of GLN581 and a CYS-ALA-VAL sequence (residues 574-576) at the top of the β4-strand make van der Waals contacts with the aliphatic part of the pantetheine arm [44] (see Figure 14 for details).

    In addition, the backbones of CYS574 and VAL576 form hydrogen bonds with the pantetheine arm. Residues in the β5-α4 loop region of GNAT motif B make van der Waals contacts with the β-mercaptoethylamine segment of the pantetheine arm and thus play a major role in orienting the reactive sulfhydryl group for acetyl transfer [44] (Figure 13). Other protein residues involved in binding are ALA613, TYR616 and PHE617; TYR616 makes van der Waals contacts with the end of the pantetheine arm near the pyrophosphate group [44]. Residues GLN525 and LEU526, which are located in the substrate-binding cleft, also make van der Waals contacts with the pantetheine arm of coenzyme A. The proximity of these residues to the cofactor-substrate junction suggests that they play an important role in substrate specificity and/or catalysis [16, 46] (Figure 14).

    In the PCAF substrate-binding cleft, two residues are positioned close enough to act as a general base for catalysis via a ternary complex mechanism. These residues, GLU570 in the β4-strand and ASP610 in the loop between the β5-strand and the α4-helix, are both located in the core domain of PCAF and are strictly conserved within the GCN5/PCAF subfamily of histone acetyltransferases. Mutational analysis strongly favors the catalytic involvement of GLU570, since mutating the corresponding residue in yeast GCN5 (GLU173) to alanine or glutamine impairs GCN5 activity in both transcriptional activation in vivo and histone acetylation in vitro [47, 48]. In contrast, mutation of the yeast counterpart of PCAF ASP610 only slightly affects transcriptional activation in vivo and histone acetylation in vitro [48, 49-51]. According to Clements et al. [44], GLU570 is in an ideal environment to play a catalytic role. First, GLU570 is located next to an acidic patch that forms an attractive surface for the basic lysine substrate. Second, the carboxylate of GLU570 is surrounded by several hydrophobic residues (PHE563, PHE568, ILE571, VAL572, LEU606, ILE637 and TYR640) that probably raise the pKa of the glutamate side chain and thus facilitate proton abstraction from the lysine substrate. Third, the carboxylate of GLU570 is only ~11.5 Å away from the putative position of the reactive thioester of acetyl-coenzyme A [44] (Figure 15).

    It has been suggested that proton abstraction may proceed either directly through the carboxylate of GLU570 or through a water molecule. This hypothesis is supported by a water molecule that is tightly bound to a carboxylate oxygen of GLU570 and lies close to the coenzyme [44]. A further requirement for catalysis is a hydrogen-bond donor that stabilizes the tetrahedral intermediate. A potential donor is the backbone NH of CYS574, although additional donors may exist when the substrate is bound (i.e. backbone amide groups of the histone or transcription factor substrate) [44].

    Figure 14. Schematic representation of the interactions between Ac-CoA and the surrounding residues in the cofactor binding pocket of PCAF (PDB code 1CM0). Blue arrows indicate backbone hydrogen bonds, while green arrows indicate side-chain hydrogen bonds. Blue areas indicate ligand exposure, while residues with a light blue shadow indicate protein exposure.


    Figure 15. Binding mode of Ac-CoA at PCAF, showing residues that contribute to ligand binding (PDB code 1CM0).

  • Aim of the work 24

    2 Aim of the Work

    In contrast to inhibitors of many nucleotide-dependent proteins, a small-molecule HAT inhibitor does not need to mimic the adenine moiety, because the adenine ring is only loosely bound to the surface of HATs. Therefore, there is less risk of non-selective binding to the multitude of nucleotide-binding proteins (e.g. ATP-binding proteins). In addition, the conserved backbone interactions observed for CoA may be exploited for high-affinity binding, using a wide range of drug-like moieties such as carboxylate, amide, or sulfonamide groups. The V-shaped cavity in HATs is buried and thus provides a hydrophobic environment that is suitable for binding small drug-like molecules. The CoA binding site is conserved, and thus similar, among GNAT members, whereas considerable structural differences are found in the substrate binding site. These regions have evolved to bind a broad range of acetyl-group acceptors, including proteins and small-molecule substrates (histones, cofactors, serotonin, etc.). Thus, to gain selectivity over other homologous proteins that bind Ac-CoA, an inhibitor should ideally span both sites or interact with the substrate binding site.

    In the current work, the focus is on the docking analysis of known inhibitors of PCAF and of the related serotonin acetyltransferase AANAT, for which a series of potent inhibitors has been reported recently. As all currently known PCAF inhibitors either have complex structures or are natural products with unknown binding modes, the rational discovery of drug-like inhibitors still represents a challenge.

    As there is high homology and structural similarity between the cofactor binding pockets of PCAF and AANAT, the structures of recently identified AANAT inhibitors will be used as templates to design novel PCAF inhibitors. To reach this goal, docking and virtual screening settings will be tested to find optimal docking conditions for GNAT acetyltransferases. The knowledge gained on AANAT will then be used to dock compounds identified by similarity searching into the PCAF binding pocket.

    A second focus will be the development of recently identified isothiazolones as irreversible PCAF inhibitors. Different modelling techniques will be applied to derive ideas for further improving the activity of this series of compounds and to establish first structure-activity relationships.


    Besides applying different computer-based methods to identify and develop small-molecule PCAF inhibitors, special attention will be given to the evaluation of different docking and scoring methods on the available ligand data sets. A systematic evaluation is expected to improve the quality of docking and virtual screening, and the resulting data could help in the further optimization of PCAF inhibitor lead structures.

  • Computational Methods 26

    3 Computational Methods - Docking and Virtual Screening

    Docking is a method that predicts the preferred orientation of one molecule relative to a second one (usually a macromolecule) when the two form a stable complex. Knowledge of the preferred orientation may in turn be used to estimate the strength of association between the two molecules using mathematical functions called scoring functions. In this way, docking plays an important role in rational drug design.

    Molecular docking can be used for three main purposes:

    1) to predict the binding mode of a known active ligand.

    2) to identify new ligands using virtual screening.

    3) to predict the binding affinities of related compounds from a known series of actives.

    The docking process can be divided into two parts, the search algorithm and the scoring algorithm, which address the two classical problems of docking: the search problem and the scoring problem.

    3.1 The Search Problem

    The search algorithm should sample the degrees of freedom of the ligand/macromolecule

    system sufficiently to include the true binding modes, while the scoring algorithm should

    represent the thermodynamics of interaction to distinguish the true binding modes from all

    others explored.

    Treatment of ligand flexibility can be divided into three basic categories [52]:

    - Systematic methods (incremental construction, conformational search, databases)

    - Random or stochastic methods (Monte Carlo, genetic algorithms, tabu search)

    - Simulation methods (molecular dynamics, ab initio docking, energy minimization)

    The evaluation and ranking of predicted ligand conformations is always considered a crucial step of structure-based virtual screening. Even when binding conformations are correctly predicted, the calculations will not be successful if they cannot differentiate between true binders and inactives.


    3.2 The Scoring Problem

    The scoring problem represents the second challenge for docking and virtual screening methods. Virtual screening is used to identify new lead molecules. In every virtual screening run, each molecule must be docked into the protein binding site to obtain predicted binding poses. The best-scoring pose of each molecule is then selected, and the molecules are ranked by this score to give a top-ranking hit list.

    Scoring functions implemented in docking programs make various assumptions and simplifications in the evaluation of modelled complexes and do not fully account for a number of physical phenomena that determine molecular recognition, for example entropic effects.

    Essentially, three types or classes of scoring functions are currently applied:

    - Force-field-based scoring: (D-score [53], G-score [53], Gold [54], Autodock [55], Dock [56]).

    - Empirical scoring: (Ludi [57, 58], F-score [59], Chemscore [60], Score [61, 62], Fresno [63], X-score [66]).

    - Knowledge-based scoring: (PMF [67-69], DrugScore [67], SMoG [68]).

    Consensus scoring combines information from different scores to balance the errors of individual scores and to improve the probability of identifying true ligands. An example implementation of consensus scoring is X-CSCORE [69, 70], which combines GOLD-like, DOCK-like, Chemscore, PMF and FlexX scoring functions. However, the value of consensus scoring may be limited if terms in different scoring functions are significantly correlated, which could amplify calculation errors rather than balance them.
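    The idea of combining scores can be sketched with rank-sum consensus, one simple scheme; it is not the X-CSCORE algorithm. Because scoring functions use different sign conventions (energies where lower is better, fitness values where higher is better), each function is first converted to ranks. The function name `rank_consensus` and the example scores are hypothetical.

```python
from typing import Dict, List


def rank_consensus(scores: Dict[str, List[float]],
                   higher_is_better: Dict[str, bool]) -> List[int]:
    """Return molecule indices ordered best-first by summed rank.

    scores maps a scoring-function name to one score per molecule;
    higher_is_better states the sign convention of each function.
    """
    names = list(scores)
    n = len(scores[names[0]])
    rank_sum = [0] * n
    for name in names:
        values = scores[name]
        if len(values) != n:
            raise ValueError("all scoring functions must score the same molecules")
        # Rank 0 is the best molecule according to this scoring function.
        order = sorted(range(n), key=lambda i: values[i],
                       reverse=higher_is_better[name])
        for rank, i in enumerate(order):
            rank_sum[i] += rank
    # Lowest summed rank first; ties keep the original molecule order.
    return sorted(range(n), key=lambda i: rank_sum[i])


# Hypothetical example: a fitness score (higher is better) and an
# interaction energy in kcal/mol (lower is better) for three molecules.
consensus = rank_consensus(
    {"fitness": [50.0, 70.0, 60.0], "energy": [-8.0, -6.0, -9.0]},
    {"fitness": True, "energy": False},
)
print(consensus)  # [2, 1, 0]
```

    Molecule 2 is ranked first although it is the best molecule for only one function, because it is second-best for the other. This is the error-balancing idea. It also shows why correlated scoring functions are a problem: a shared error is counted twice.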

    In principle, fitness or scoring functions try to predict the free energy of binding of every molecule being screened. In practice, the ranking we look for is the one most compatible with the real binding energies. Docking results are therefore often judged by the enrichment of true hits among a larger number of tested molecules, which is determined by the number of real actives in the hit list. The more true positives (real actives) and the fewer false positives (decoys) a docking (virtual screening) run places in the top-scoring hit list, the better its enrichment indexes.
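    One common way to quantify this is the enrichment factor (EF). The text does not define it, so the sketch below uses the usual definition: the fraction of actives in the top of the ranked list divided by the fraction of actives in the whole database. An EF of 1 corresponds to random selection. The function name and the example data are hypothetical.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float, higher_is_better: bool = True) -> float:
    """Enrichment factor for the top `fraction` of a ranked list.

    EF = (actives in top list / size of top list) / (all actives / all molecules)
    """
    n = len(scores)
    if n == 0 or len(is_active) != n:
        raise ValueError("need one activity label per scored molecule")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("the data set contains no actives")
    n_top = max(1, round(fraction * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    actives_in_top = sum(1 for i in order[:n_top] if is_active[i])
    return (actives_in_top / n_top) / (total_actives / n)


# Hypothetical screen: 10 molecules, 2 actives (indices 0 and 3),
# docking energies where lower is better.
energies = [-9.5, -6.0, -5.5, -9.1, -7.0, -4.0, -6.5, -5.0, -8.0, -3.0]
labels = [True, False, False, True, False, False, False, False, False, False]
print(enrichment_factor(energies, labels, fraction=0.2, higher_is_better=False))  # 5.0
```

    Here both actives occupy the top 20 %, which gives the maximum possible EF of 5 for this data set. At 40 % the same ranking gives 2.5, which illustrates why an EF value is only meaningful together with the chosen cut-off.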

    In any virtual screening study, a few benchmarks and performance metrics should be considered. Firstly, the root mean square deviation (RMSD) between a generated docking pose and the experimental pose observed in the crystal structure should be examined. In docking, the absolute RMSD is usually used, i.e. the root mean square of the distances between corresponding atom pairs of the two conformers. An optimal docking run should approximately reproduce the experimental binding pose, with an RMSD of less than 2 Å. Secondly, the suggested docking poses should be inspected visually and judged rationally by considering the quality of the interactions between the chemical groups of the ligands and the key residues of the protein. In this step, molecular surfaces with property maps (electrostatic energy maps or van der Waals contacts) can be essential. Molecular interaction fields for different chemical probes, created e.g. by the GRID software, can also help to assess the best predicted binding mode.
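    The RMSD criterion can be sketched as follows. The sketch assumes heavy-atom coordinates in Å with a known one-to-one atom correspondence, and both poses expressed in the same protein frame, as in redocking into the crystal structure. The function name is hypothetical. Docking programs may additionally treat symmetry-equivalent atoms, which this sketch does not.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD in the common protein frame (no re-superposition).

    Atom i of `pose` must correspond to atom i of `reference`; symmetry-
    equivalent atoms (e.g. the two oxygens of a carboxylate) are not handled.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))


crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]  # shifted 1 Å in z
value = pose_rmsd(docked, crystal)
print(f"RMSD = {value:.2f} Å, success: {value < 2.0}")  # RMSD = 1.00 Å, success: True
```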

    Subsequently, enrichment indexes can be calculated. One is the sensitivity (Se, true positive rate), which is the ratio of the number of active molecules found by the virtual screening method to the number of all active compounds in the database. The second is the specificity (Sp, true negative rate), which is the ratio of the number of inactive compounds that were not selected by the virtual screening method to the total number of inactives in the database. One of the most widely used methods to describe enrichment is the receiver operating characteristic (ROC) curve, which plots the sensitivity (Se) as a function of (1 - Sp). As Sp is the ratio of discarded inactives to all inactives, 1 - Sp is the ratio of selected inactives, in other words the false positive rate (selected decoys). The ROC curve is plotted by using the different scores of the actives as thresholds. For every threshold, the numbers of decoys and actives within this cut-off are counted, so that the ROC curve maps the distribution of actives and decoys according to their scores. This method avoids the choice of an arbitrary threshold by considering all Se and Sp pairs for each score threshold, which is an important advantage over other enrichment indexes [71, 72].
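    A minimal sketch of the ROC construction follows. It assumes that a higher score means "more likely active" (energies where lower is better can simply be negated). As a common variant, every distinct score is used as a threshold rather than only the scores of the actives; the points obtained at the actives' scores are a subset of these. The area under the curve (AUC) is not discussed in the text but is a standard one-number summary: 1.0 means that all actives rank above all decoys, and 0.5 corresponds to random ranking. Function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], is_active: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (1 - Sp, Se) points; a higher score means 'more likely active'."""
    n_act = sum(1 for a in is_active if a)
    n_dec = len(is_active) - n_act
    if n_act == 0 or n_dec == 0:
        raise ValueError("need both actives and decoys")
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Select every molecule scoring at or above the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_dec, tp / n_act))  # (1 - Sp, Se)
    return points


def roc_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6]          # hypothetical fitness values
labels = [True, False, True, False]    # actives and decoys
pts = roc_points(scores, labels)
print(pts)           # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(pts))  # 0.75
```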

    The most difficult challenge for docking is the accurate prediction of binding affinities, unless the compounds come from a single series. Published comparisons have shown no strong correlation between the ability of a docking program to produce a correct pose and its success in a virtual screen. This can partly be attributed to the inherent danger of relying on a single metric such as RMSD, as poses can be fundamentally correct despite a large deviation in one part of the molecule. Another problem arises in cases where the poses are barely in the correct binding site, or have a completely wrong binding mode, and yet good enrichment is observed. Such enrichment may result from screening out compounds that are wrong for the target rather than from selecting those that are right. Clearly, enrichment indexes should be considered, but always together with visual inspection of the predicted binding modes and their agreement with X-ray structures or with enzyme kinetic studies of the inhibition type (competitive, non-competitive, or uncompetitive).

    It is always difficult to obtain accurate predictions of binding affinities for a diverse set of molecules. At its simplest level, this is a problem of subtracting two large, inaccurately calculated numbers to obtain a small one. The large numbers are the interaction energy between the ligand and the protein on the one hand, and the cost of bringing the two molecules out of solvent and into an intimate complex on the other. The result of this subtraction is the free energy of binding, which is the ultimate target of any drug design study [73]. The problem arises from the condensed phase in which biology occurs and from the many degrees of freedom of biomolecules [74]. In water, and with highly flexible proteins and ligands, accurate calculations are much more costly and error-prone. Additionally, as pointed out by Tirado-Rives and Jorgensen [75], the "window of activity", as they called it, is very small: the free energy difference between the best ligand likely to be detected in a virtual screening study (potency ~50 nM) and the experimental detection limit (potency ~100 μM) is estimated to be only about 4.5 kcal/mol. Among the most accurate methods available today are thermodynamic integration and free energy perturbation, which can sometimes calculate differences in affinity between related molecules with an accuracy of about 1 kcal/mol [76, 77]. However, even these methods only compare close analogues; they neither predict absolute binding affinities nor compare affinities among diverse compounds.
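    The 4.5 kcal/mol figure can be checked with a short calculation. The calculation assumes that potency can be treated as a dissociation constant Kd (an approximation for IC50 values) and a temperature of 298 K, so that ΔG = RT ln Kd. The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def free_energy_gap(k_strong: float, k_weak: float, temperature: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) between two dissociation constants.

    Uses dG = RT ln(Kd) for each ligand, so ddG = RT ln(K_weak / K_strong).
    Both constants must be given in the same unit.
    """
    if k_strong <= 0 or k_weak <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL * temperature * math.log(k_weak / k_strong)


window = free_energy_gap(50e-9, 100e-6)  # 50 nM vs 100 µM, in mol/L
print(f"{window:.1f} kcal/mol")  # 4.5 kcal/mol
```

    At 298 K, each factor of 10 in potency corresponds to RT ln 10 ≈ 1.4 kcal/mol. An error of 1 kcal/mol, the accuracy quoted for the best free energy methods, therefore still corresponds to a roughly five-fold error in affinity.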

    3.3 Solvation Effects

    Protein-ligand binding takes place in a salt-water environment, which strongly affects the energetics of binding. Water has a dielectric constant of about 80, whereas the dielectric constant of vacuum is 1. Because each charge interacts favorably with the surrounding high-dielectric environment, an additional one-body solvation energy arises for each atomic charge [78]. As a consequence, there can be a substantial energy penalty for moving the polar part of a ligand out of water and into the binding site.
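    The one-body solvation energy of a charge can be illustrated with the classical Born model of a charged sphere in a continuum dielectric. This textbook model is swapped in for illustration and is not a method from the cited work. The ion radius of 2 Å and the use of ε = 4 as a stand-in for a protein binding site are assumptions. The function name is hypothetical.

```python
COULOMB_KCAL = 332.06  # e^2/(4*pi*eps0) in kcal*Å/(mol*e^2)


def born_energy(charge: float, radius: float, epsilon: float) -> float:
    """Born solvation free energy (kcal/mol) of a charge `charge` (in e)
    in a sphere of radius `radius` (Å), transferred from vacuum (eps = 1)
    into a uniform dielectric `epsilon`."""
    if radius <= 0 or epsilon < 1:
        raise ValueError("radius must be positive and epsilon >= 1")
    return -COULOMB_KCAL * charge ** 2 / (2.0 * radius) * (1.0 - 1.0 / epsilon)


water = born_energy(1.0, 2.0, 80.0)  # about -82.0 kcal/mol
site = born_energy(1.0, 2.0, 4.0)    # about -62.3 kcal/mol
penalty = site - water               # > 0: cost of leaving water
print(f"desolvation penalty = {penalty:.1f} kcal/mol")  # 19.7 kcal/mol
```

    Even this crude estimate gives a penalty of roughly 20 kcal/mol for a full unit charge. That is far larger than the 4.5 kcal/mol window of activity discussed above, which is why neglecting desolvation can distort a ranking so strongly.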


    Moreover, water screens the charge-charge interactions of fully hydrated atoms by a factor of approximately 80. However, atoms in a protein-ligand interface are held apart from the solvent and therefore interact with an effective dielectric constant of less than 80. In general, atoms that are farther apart are more likely to interact through solvent, and this idea led to a simple computational model called the crude screening model.

    The crude screening model uses a distance-dependent dielectric: for atoms i and j, the dielectric is Dij = C·Rij, where C is a constant often set to 4 and Rij is the interatomic distance between i and j. This model captures one chief effect of the solvent efficiently and is used in a number of ligand-protein docking algorithms. However, it cannot account for all solvation effects [73].
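    A minimal sketch of the crude screening model follows. It assumes point partial charges in units of e, distances in Å and energies in kcal/mol, with 332.06 kcal·Å/(mol·e²) as the Coulomb constant in these units. The function names are hypothetical. With Dij = C·Rij the energy falls off as 1/Rij² instead of 1/Rij.

```python
import math
from typing import Sequence, Tuple

COULOMB_KCAL = 332.06  # kcal*Å/(mol*e^2)
Atom = Tuple[float, Tuple[float, float, float]]  # (partial charge in e, xyz in Å)


def pair_energy(qi: float, qj: float, r: float, c: float = 4.0) -> float:
    """Coulomb energy (kcal/mol) with a distance-dependent dielectric D = c * r."""
    if r <= 0:
        raise ValueError("interatomic distance must be positive")
    dielectric = c * r
    return COULOMB_KCAL * qi * qj / (dielectric * r)


def interaction_energy(ligand: Sequence[Atom], protein: Sequence[Atom],
                       c: float = 4.0) -> float:
    """Sum of ligand-protein pair energies under the crude screening model."""
    return sum(pair_energy(qi, qj, math.dist(xi, xj), c)
               for qi, xi in ligand for qj, xj in protein)


# Opposite unit charges 4 Å apart: D = 16, compared with 1 in vacuum
# (about -83 kcal/mol) or about 80 in bulk water.
print(round(pair_energy(1.0, -1.0, 4.0), 2))  # -5.19
```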

    In addition, the electrostatic interaction of two atoms depends not only on their mutual distance but also on the positions of all other protein and ligand atoms, because these positions determine where the high-dielectric solvent can penetrate. Another important effect of water is the hydrophobic effect, the tendency of water to drive non-polar solutes together [79]. This promotes the association of the non-polar surfaces of the ligand and the protein. The hydrophobic effect is often accounted for by an additional solvation energy term that is proportional to the molecular surface area, with a positive coefficient.

    Two computational models have been developed to describe the electrostatic solvation effects of water. The more precise model is based on the Poisson-Boltzmann (PB) equation, while a faster but less precise model is the Generalized Born (GB) approach. Combining the PB or GB electrostatics model with a surface area term (to account for the hydrophobic effect) yields the PBSA [80] and GBSA [81] solvation models, respectively.
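    The GB part of a GBSA estimate can be sketched with the widely used pairwise formula of Still and co-workers: ΔG_pol = -½·332.06·(1 - 1/ε_w)·Σi Σj qi qj / f_GB(rij), with f_GB = sqrt(rij² + Ri Rj exp(-rij²/(4 Ri Rj))). This is a sketch under stated assumptions. Effective Born radii Ri and the solvent-accessible surface area are taken as inputs. The surface coefficient γ = 0.005 kcal/(mol·Å²) is an assumed illustrative value. The function names are hypothetical. A PB calculation would instead solve the Poisson-Boltzmann equation numerically on a grid.

```python
import math
from typing import Sequence, Tuple

COULOMB_KCAL = 332.06  # kcal*Å/(mol*e^2)
Vec = Tuple[float, float, float]


def gb_polar(charges: Sequence[float], coords: Sequence[Vec],
             born_radii: Sequence[float], eps_solvent: float = 80.0) -> float:
    """Still-type Generalized Born polarization energy (kcal/mol).

    Born radii are taken as given; computing them is the hard part of GB
    and is not shown. The solute interior dielectric is assumed to be 1.
    """
    total = 0.0
    for qi, xi, ri in zip(charges, coords, born_radii):
        for qj, xj, rj in zip(charges, coords, born_radii):
            r2 = math.dist(xi, xj) ** 2
            f_gb = math.sqrt(r2 + ri * rj * math.exp(-r2 / (4.0 * ri * rj)))
            total += qi * qj / f_gb
    return -0.5 * COULOMB_KCAL * (1.0 - 1.0 / eps_solvent) * total


def nonpolar(sasa: float, gamma: float = 0.005) -> float:
    """Surface-area term: gamma (kcal/mol/Å^2, assumed value) times SASA (Å^2)."""
    return gamma * sasa


def gbsa(charges: Sequence[float], coords: Sequence[Vec], born_radii: Sequence[float],
         sasa: float, eps_solvent: float = 80.0, gamma: float = 0.005) -> float:
    """Polar GB term plus non-polar surface-area term (kcal/mol)."""
    return gb_polar(charges, coords, born_radii, eps_solvent) + nonpolar(sasa, gamma)


# For a single ion the double sum has only the self term (f_GB = Born radius),
# so GB reduces to the Born formula: about -82.0 kcal/mol for q = +1, R = 2 Å.
print(round(gb_polar([1.0], [(0.0, 0.0, 0.0)], [2.0]), 1))  # -82.0
```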

    These two models are called implicit solvent models because they do not treat any water molecules explicitly. The influence of solvent on binding can also be treated with molecular dynamics (MD) or Monte Carlo (MC) simulations that include thousands of explicit water molecules modeled with an empirical force field [82-84]. Dielectric screening, the solvation of polar groups, and the hydrophobic effect all emerge automatically within this approach, but it is substantially more computationally demanding than an implicit solvent model.

    When ligand solvation is not considered in molecular docking, there is no penalty for placing a charged ligand atom in a region where the receptor only weakly complements it. In this situation, the interaction energy of a highly charged molecule is overestimated, and it may appear to bind better than a true ligand, which, bearing less formal charge, is estimated to have a less favorable interaction energy with the receptor site [73].

    When a charged molecule moves from water to a binding site, it exchanges a high-dielectric environment for a low-dielectric one. When the cost of moving a charged species from a high- to a low-dielectric environment is taken into account, the bias toward highly charged molecules is eliminated [73].

    Similarly, when non-polar solvation is not considered, larger molecules typically receive better scores than they should. In their docking poses, these molecules often have fragments that are poorly complemented by the binding site. Taking the hydrophobic effect into account, by considering non-polar solvation (estimated from the loss of molecular surface), can give better estimates. In this case, molecules that make few favorable interactions with the enzyme are disfavored relative to molecules that are well complemented by the binding site. The non-polar solvation term acts as a counterbalance to the van der Waals term of the interaction energy, favoring complexes with a higher proportion of interacting surfaces [73].

    In summary, ignoring the electrostatic component of ligand solvation ranks compounds with high formal charges above the known neutral inhibitors of a given enzyme. Likewise, ignoring the non-polar component of ligand solvation biases the results towards larger compounds that do not complement the binding site as well as the known, smaller ligands do [73].

    3.4 Solvation Effects and Scoring Functions

    The GOLD program [85-87] (Genetic Optimization of Ligand Docking) uses a genetic algorithm (GA) to find an optimal ligand conformation for a given protein target and evaluates the poses with a fitness function (Goldscore, Chemscore or ASP score). The Goldscore fitness function is force-field-based and includes a directional hydrogen (H)-bonding term, a soft van der Waals (vdW) potential term, and an internal energy term. Notable features of this function are the additional H-bonding term, the indirect consideration of desolvation through the H-bonding term, and the evaluation of internal energies.
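    To illustrate how a genetic algorithm searches ligand conformations, the sketch below evolves a vector of torsion angles using tournament selection, one-point crossover, Gaussian mutation and elitism. It is a generic GA, not GOLD's implementation: GOLD's chromosome encoding, operators and Goldscore fitness are not reproduced. `toy_fitness` and its target torsions are a hypothetical stand-in for a docking score, with a maximum of 0 at the target. All parameter values are assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

Genome = List[float]  # torsion angles in degrees


def wrap(angle: float) -> float:
    """Map an angle to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def toy_fitness(genome: Genome, target: Genome) -> float:
    """Hypothetical fitness: 0 at the target torsions, negative elsewhere."""
    return -sum(1.0 - math.cos(math.radians(a - t)) for a, t in zip(genome, target))


def run_ga(fitness: Callable[[Genome], float], n_genes: int, pop_size: int = 40,
           generations: int = 60, mutation_rate: float = 0.3,
           mutation_sd: float = 20.0, seed: int = 0) -> Tuple[Genome, float, float]:
    """Return (best genome, its fitness, best fitness of the initial population)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)] for _ in range(pop_size)]
    initial_best = max(fitness(g) for g in pop)

    def tournament() -> Genome:
        a, b = rng.sample(pop, 2)  # binary tournament on the current population
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [list(max(pop, key=fitness))]  # elitism: keep the best genome
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [wrap(g + rng.gauss(0.0, mutation_sd)) if rng.random() < mutation_rate else g
                     for g in child]  # Gaussian mutation of some torsions
            children.append(child)
        pop = children

    best = max(pop, key=fitness)
    return best, fitness(best), initial_best


target = [60.0, -60.0, 180.0, 0.0]
best, best_fit, start_fit = run_ga(lambda g: toy_fitness(g, target), n_genes=len(target))
print(f"initial best {start_fit:.3f} -> final best {best_fit:.3f}")
```

    Elitism guarantees that the best fitness never decreases from one generation to the next. The mutation rate and standard deviation control the balance between exploring new conformations and refining good ones.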

    The LUDI scoring function distinguishes between the energy of ionic interactions and that of H-bonds, and it also contains a term accounting for the entropic effect, i.e. the contribution due to freezing out rotational degrees of freedom upon binding.

    FlexX [59] later modified the LUDI scoring function by replacing its hydrophobic interaction term with two terms, one for ligand-receptor aromatic contacts and another for the remaining hydrophobic contacts; in addition, the coefficients of the other terms were recalibrated on a set of 19 complexes [59]. Other examples of empirical scoring functions are Chemscore [60], Fresno [63], Score [61], and the scoring function of Hammerhead [88, 89]. These functions differ mainly in their weights and in their geometric criteria, which determine the penalty functions that account for deviations from ideal H-bond geometry.
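    The geometric penalty functions mentioned above can be made concrete with a small sketch of a LUDI-style hydrogen-bond term, in which each contact is scaled by one piecewise-linear penalty for distance deviation and one for angle deviation. The tolerances (0.2/0.6 Å, 30°/80°) and the weight of -4.7 kJ/mol per ideal H-bond are values commonly quoted for Böhm's original LUDI function; treat them as illustrative assumptions, not as the parameters of any particular program.

```python
"""Illustrative LUDI-style empirical H-bond term with geometric penalty functions.

Assumption: the tolerances (0.2/0.6 A, 30/80 degrees) and the -4.7 kJ/mol weight
are commonly quoted LUDI values, used here only for illustration.
"""


def ramp(deviation: float, full: float, zero: float) -> float:
    """1.0 up to `full`, linear decrease to 0.0 at `zero`, 0.0 beyond."""
    if deviation <= full:
        return 1.0
    if deviation >= zero:
        return 0.0
    return 1.0 - (deviation - full) / (zero - full)


def hbond_score(contacts, weight_kj: float = -4.7) -> float:
    """Sum of weighted H-bond contributions in kJ/mol.

    contacts: iterable of (dR, dAlpha) pairs, where dR (Angstrom) is the deviation
    from the ideal H...O/N distance and dAlpha (degrees) the deviation from the
    ideal N/O-H...O/N angle.
    """
    return sum(weight_kj * ramp(abs(dr), 0.2, 0.6) * ramp(abs(da), 30.0, 80.0)
               for dr, da in contacts)


# A near-ideal H-bond, a stretched and bent one, and one too long to count.
print(hbond_score([(0.1, 10.0), (0.4, 55.0), (0.7, 20.0)]))
```

    The multiplicative form means a contact only scores fully when both distance and angle are near ideal, which is the behaviour the weights and geometric criteria of the different empirical functions tune.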

    Finally, the scoring function of AutoDock [55, 90] combines force-field-like and empirical attributes. Its first three terms resemble those of a molecular-mechanics force field (vdW, H-bonds, and electrostatics), but here they are scaled by empirical weighting factors. The last two terms represent the entropic contribution and the protein-ligand desolvation penalty.
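    For orientation, the functional form reported for the AutoDock 3.0 free energy function is sketched below; the exact terms and weights may differ between AutoDock versions. The ΔG coefficients are the empirical weights, E(t) is an angle-dependent directionality factor for hydrogen bonds, ε(r) is a distance-dependent dielectric, N_tor is the number of rotatable bonds, S and V are atomic solvation parameters and fragmental volumes, and σ is the width of the Gaussian envelope.

```latex
\Delta G = \Delta G_{\mathrm{vdW}} \sum_{i,j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right)
 + \Delta G_{\mathrm{hbond}} \sum_{i,j} E(t)\left(\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right)
 + \Delta G_{\mathrm{elec}} \sum_{i,j} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}
 + \Delta G_{\mathrm{tor}}\, N_{\mathrm{tor}}
 + \Delta G_{\mathrm{sol}} \sum_{i,j} \left(S_i V_j + S_j V_i\right) e^{-r_{ij}^{2}/2\sigma^{2}}
```

    The first three sums are the force-field-like terms; the last two are the entropic and desolvation terms described above.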

    An intermediate but practical approach to solvation effects is to treat the solvent as a continuum dielectric medium [91, 92]. Shoichet et al. used continuum electrostatics to evaluate the ligand solvation term, assuming that each ligand is completely desolvated upon binding and that every ligand desolvates the protein to the same extent [73]. They started from the DOCK energy function and added separate electrostatic and non-polar corrections for ligand solvation, as calculated by the program HYDREN [93]. Despite the approximations involved, this simple implementation had a considerable effect on the ranking of the known actives and on the size and charge of the other ligands populating the top of the hit list [73].
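    The effect of such a correction on ranking can be illustrated with a minimal sketch, assuming complete ligand desolvation (so the penalty is minus the ligand's solvation free energy) and an identical protein desolvation cost for every ligand (so it drops out of the ranking). The class name and all energy values are hypothetical.

```python
"""Illustrative ligand-desolvation correction in the spirit of Shoichet et al.

Assumptions (hypothetical numbers, kcal/mol): each ligand is fully desolvated
on binding, and protein desolvation is the same for every ligand, so it is
omitted from the ranking.
"""
from dataclasses import dataclass


@dataclass
class DockedLigand:
    name: str
    e_dock: float          # DOCK-style interaction energy (vdW + electrostatics)
    dg_solv_elec: float    # ligand electrostatic solvation free energy (< 0)
    dg_solv_np: float      # ligand non-polar solvation free energy


def corrected_score(lig: DockedLigand) -> float:
    # Complete desolvation costs minus the solvation free energy.
    return lig.e_dock - (lig.dg_solv_elec + lig.dg_solv_np)


ligands = [
    DockedLigand("charged_decoy", e_dock=-45.0, dg_solv_elec=-60.0, dg_solv_np=1.0),
    DockedLigand("neutral_inhibitor", e_dock=-32.0, dg_solv_elec=-8.0, dg_solv_np=1.5),
]

for label, key in (("raw", lambda lig: lig.e_dock), ("corrected", corrected_score)):
    ranking = [lig.name for lig in sorted(ligands, key=key)]
    print(label, ranking)
```

    With the raw interaction energy the highly charged decoy ranks first; once its large electrostatic desolvation cost is charged, the neutral inhibitor moves to the top, which is the qualitative behaviour summarized at the start of this section.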


    The scoring functions discussed above aim to approximate the important contributions to the free energy of binding in a manner consistent with the demands of high-throughput docking (HTD). Most of the terms added to these functions to address solvation capture its qualitative effects in an easily implemented, atom-based fashion (e.g., down-weighting pairwise Coulombic interactions, penalizing buried polar groups, and rewarding buried hydrophobic contacts).

    More recently, some researchers have used more rigorous, physics-based approaches, such as continuum electrostatics, to capture solvent effects in an HTD scoring function. To speed up the calculations, however, these approaches still rely on approximate continuum electrostatics methods such as generalized Born (GB) and take algorithmic shortcuts. With the availability of faster methods for calculating Poisson-Boltzmann (PB) electrostatics, it has become possible to use a scoring function based on full PB solvation in HTD.
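    To make the GB approximation concrete, the sketch below evaluates a Still-type generalized Born solvation energy for point charges. It assumes the effective Born radii are already known (in practice they come from a pairwise model such as HCT or OBC) and omits salt screening; it illustrates the functional form only and is not the implementation of any program cited here.

```python
"""Still-type generalized Born (GB) electrostatic solvation energy (sketch).

Assumptions: effective Born radii are given inputs, no salt screening,
energies in kcal/mol, distances in Angstrom, charges in e.
"""
import math

COULOMB = 332.0636  # kcal*Angstrom/(mol*e^2)


def f_gb(r: float, ri: float, rj: float) -> float:
    """Smooth interpolation between r (far apart) and sqrt(Ri*Rj) (r = 0)."""
    return math.sqrt(r * r + ri * rj * math.exp(-r * r / (4.0 * ri * rj)))


def gb_energy(coords, charges, born_radii, eps_in=1.0, eps_out=78.5):
    """GB solvation free energy, including the i == j (Born self) terms."""
    pref = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return pref * total


# For a single ion the expression reduces to the Born equation.
print(gb_energy([(0.0, 0.0, 0.0)], [1.0], [2.0]))
```

    The closed form is what makes GB fast enough for HTD: no grid or iterative solver is needed, at the cost of approximating the PB reaction field.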

    More precise results can be obtained with the Poisson-Boltzmann/surface area (PBSA) implicit solvent model, for example by calculating electrostatic (Coulombic + solvation) energies with ZAP, OpenEye's library for PBSA implicit solvation calculations [94]. In this method, solutions to the Poisson-Boltzmann equation are obtained using an exponentially switched atomic Gaussian function to represent the dielectric boundary, so that the dielectric constant varies smoothly from ε = 2 in the molecular region to ε = 80 in the solvent [95-97]. Atomic charges are mapped onto a grid with 0.5 Å spacing. Electrostatic solvation energies ΔG_elec are then obtained by summing the product of every atom's charge and the potential at that atom over all atoms and subtracting out the self-energy and Coulombic terms.
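    One standard way to subtract the self-energy and Coulombic terms is to repeat the PB calculation with the solvent dielectric set equal to the interior value: grid self-energies and direct Coulomb interactions are identical in both runs and cancel in the difference, leaving the reaction-field (solvation) energy. The expression below sketches this two-run scheme, with the factor 1/2 avoiding double counting of pair interactions; whether ZAP proceeds exactly this way internally is not stated here.

```latex
\Delta G_{\mathrm{elec}} = \frac{1}{2}\sum_{i} q_i \left[\phi_i^{(\varepsilon_{\mathrm{in}}=2,\ \varepsilon_{\mathrm{out}}=80)} - \phi_i^{(\varepsilon_{\mathrm{in}}=\varepsilon_{\mathrm{out}}=2)}\right]
```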

    The apolar contribution to desolvation is calculated as ΔG_ap = γΔA, where ΔA is the change in the total solvent-exposed surface area of protein and ligand upon forming the complex (negative, because surface is lost on binding; also calculated with ZAP). The coefficient γ (= 47 cal/mol/Å²) was chosen such that ΔG_ap represents the difference (complex vs. protein + ligand) in the transfer energy from a low-dielectric environment with ε = 2 (such as an alkane solvent or the binding site of a protein) to water with ε = 80 [98, 99]. The binding energy can then be written as:

    $$\Delta G_{\mathrm{bind}}^{\mathrm{solv}} = \Delta E_{\mathrm{vdW}}^{\mathrm{gas}} + \Delta E_{\mathrm{elec}}^{\mathrm{gas}} + \Delta G_{\mathrm{elec}}^{\mathrm{solv}} + \Delta G_{\mathrm{ap}}$$


    This equation states that the binding energy in solvent equals the gas-phase van der Waals and electrostatic interaction energies plus the electrostatic solvation contribution plus the apolar (surface area) contribution of the solvent. Combining the electrostatic and surface-area-loss contributions can improve the correlation with the observed potency: if the area-loss term is ignored, comparisons of the binding affinities of dissimilar ligands will be biased towards overly charged and overly large molecules.
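    The bookkeeping of the binding-energy equation can be sketched as a simple sum. The sketch assumes ΔA is defined as A(complex) - A(protein) - A(ligand), which is negative when surface is buried, and uses γ = 47 cal/mol/Å² (0.047 kcal/mol/Å²) as quoted above; the component values in the example are hypothetical.

```python
"""Sum of the binding-energy components described above (kcal/mol).

Assumptions: delta_area = A(complex) - A(protein) - A(ligand) in square
Angstrom (negative when surface is buried); component values are hypothetical.
"""

GAMMA = 0.047  # kcal/mol per square Angstrom (47 cal/mol/A^2)


def apolar_term(delta_area: float, gamma: float = GAMMA) -> float:
    """Apolar (surface-area) contribution dG_ap = gamma * dA."""
    return gamma * delta_area


def binding_energy(e_vdw_gas, e_elec_gas, dg_elec_solv, delta_area):
    """dG_bind(solv) = dE_vdW(gas) + dE_elec(gas) + dG_elec(solv) + dG_ap."""
    return e_vdw_gas + e_elec_gas + dg_elec_solv + apolar_term(delta_area)


# Burying 600 A^2 of surface contributes about -28 kcal/mol here.
print(binding_energy(e_vdw_gas=-40.0, e_elec_gas=-20.0, dg_elec_solv=25.0, delta_area=-600.0))
```

    Note how the electrostatic solvation term (positive, a desolvation penalty) offsets part of the favourable gas-phase electrostatics; dropping either it or the area term shifts the balance in the directions described in the text.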

    3.5 Effects of Rescoring Docking Hits using MM-GBSA or MM-PBSA Methods

    One of the first applications of molecular mechanics-Poisson-Boltzmann surface area (MM-PBSA) scoring was the hierarchical approach of Wang et al., which combined an initial database screen with MM-PBSA rescoring to find HIV-1 reverse transcriptase inhibitors [100]. More recently, an initial docking screen followed by rescoring with a molecular mechanics-generalized Born surface area (MM-GBSA) method has been used to improve the enrichment of known ligands for several enzymes [101-105].

    MM-PBSA and MM-GBSA methods involve minimization and often dynamic sampling of the protein-ligand complexes, and they include ligand and receptor conformational energies and strain. They evaluate the electrostatic and solvation components of the binding energy with PB or GB methods, including the desolvation of both ligand and receptor. The MM-GBSA binding energy is determined as ΔE = E(complex) - E(receptor) - E(ligand), where each E is an energy estimated with the GBSA solvation model [102]. Because only these implicit-solvent energies are compared, solute configurational entropy effects are ignored completely.

    There are three main limitations of these methods:

    1) The force fields and solvation energies are not uniformly accurate.

    2) For reasons of computational efficiency, only a small part of configuration space near the starting docking pose can actually be explored.

    3) Configurational entropy effects are ignored.

    Despite these limitations, MM-GBSA rescoring represents a substantially more rigorous level of scoring than that used by most docking programs, and it is an attractive alternative to more complete but computationally expensive free energy methods such as free-energy perturbation and thermodynamic integration [106-108]. The


    principal improvement conferred by MM-GBSA rescoring over docking is the inclusion of receptor binding-site relaxation and induced-fit optimization of the docking solutions. Consequently, this induced fit can improve the ranks of larger ligands that would be missed by rigid-receptor docking.

    Structural relaxation with MM-GBSA performed well when the initial docking geometry resembled the crystallographic pose, but it can do little when ligand binding induces large conformational changes in the protein or when the docked binding mode is far from the crystallographic one. In most cases, relaxation led not only to improved rankings but also to improved geometries. For many ligands, the RMSD between the MM-GBSA predictions and the crystal structures decreased relative to the docking predictions, and, especially in hydrophilic or anionic cavities, many ligands refined by MM-GBSA formed better hydrogen bonds with the site. However, this rescoring method could not rescue the incorrect docking solutions of some false negatives (missed hits) [102].

    By allowing the receptor to respond to ligand binding, one also allows new and potentially unfavorable receptor conformations. The MM-GBSA energy function must distinguish these from the true low-energy conformations that may be sampled in solution. This is difficult, because the receptor conformational energies are large and the errors in these calculations are typically of the same order as the net interaction energy of the protein-ligand complex. Although some errors cancel when the internal energies before and after ligand binding are subtracted, one is still subtracting two large numbers with relatively large errors to obtain a small one, the net binding free energy. Consistent with this view, ligands achieved their maximal advantage over decoys on rescoring when only a 5 Å region around the binding site was allowed to relax [102].
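    The "two large numbers" argument can be checked with a short error-propagation sketch. The total energies and the 0.5 % relative error are hypothetical numbers chosen only for illustration, and the errors are assumed to be independent (the worst case; correlated errors would partly cancel, as noted above).

```python
"""Propagation of independent errors into dE = E(complex) - E(receptor) - E(ligand).

Hypothetical energies (kcal/mol) and a hypothetical 0.5 % relative error per term.
"""
import math


def binding_energy_with_error(e_complex, e_receptor, e_ligand, rel_error=0.005):
    d_e = e_complex - e_receptor - e_ligand
    # Independent errors add in quadrature.
    sigma = math.sqrt(sum((rel_error * e) ** 2 for e in (e_complex, e_receptor, e_ligand)))
    return d_e, sigma


d_e, sigma = binding_energy_with_error(-12050.0, -11980.0, -30.0)
print(f"dE = {d_e:.1f} +/- {sigma:.1f} kcal/mol")
```

    Even a small relative error on each total energy yields an uncertainty larger than the binding energy itself, which is why restricting the relaxed region (and hence the fluctuating part of the energy) can help discrimination.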

    Nevertheless, relaxing the entire system is the more physically correct way to calculate these energies [102]. Our own results indicate that calculations with only minimal binding-site relaxation are less able to distinguish real actives from false positives (data not shown).

    Additionally, changes in polarity (caused by certain substituents) can increase the desolvation cost in ways that the GBSA model may not capture. The challenges of balancing ligand electrostatic interaction energies and desolvation penalties were also apparent in the anionic


    cavity [102]. Overall, the results of MM-GBSA rescoring of docking hit lists on the model binding sites are mixed. On the one hand, rescoring can:

    1) rescue many docking false negatives,

    2) improve the geometric fidelity of most of the predicted structures, and

    3) increase the diversity of the hit lists.

    On the other hand, as discussed above, it admits new and potentially unfavorable receptor conformations and cannot rescue every incorrectly docked ligand.

    PBSA scoring as implemented in DOCK6 is currently considered one of the best rescoring methods. It has proven very effective at increasing enrichment factors despite the approximations it contains. Our reliance on fixed ligand conformations is another source of error in this work; allowing ligand conformational flexibility could improve the results [109]. In addition, PBSA scoring neither corrects for the rotational and translational degrees of freedom lost on binding nor considers gains in the vibrational entropy of the system on ligand binding.

    Moreover, it has not been investigated how the terms (vdW, electrostatics, and surface loss) should be weighted when they are added together. In some cases, the desolvation penalty for hydrogen-bonding groups may be inadequate [73]. The failure to adequately penalize neutral polar groups may also stem from the use of an inductive method for calculating partial atomic charges [110]. Quantum mechanically derived partial atomic charges might therefore improve matters [73, 111].

    Several recent studies have reported good results with MM-GBSA and MM-PBSA rescoring [112-114]. Even without more intensive, detailed energy evaluation schemes, fairly simple considerations can dramatically improve the ability to distinguish binders from non-binders. One possible improvement is to calculate desolvation penalties that reflect the degree of burial of each orientation of each ligand [73]. Correcting for solvation helps to recognize more true inhibitors and fewer decoys in virtual screening against receptors of known structure.
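    The burial-dependent penalty suggested above could look roughly like the following hypothetical sketch: instead of assuming complete desolvation, each ligand atom's share of the solvation free energy is scaled by the fraction of its solvent-accessible surface that the pose buries. Per-atom solvation terms and surface areas are assumed inputs (for example from a PB or GB decomposition); this illustrates the idea and is not a published protocol.

```python
"""Hypothetical burial-scaled ligand desolvation penalty (kcal/mol).

Assumed inputs per ligand atom: solvation free energy share, SASA in the free
ligand, and SASA in the docked pose (square Angstrom).
"""


def desolvation_penalty(atom_dg_solv, sasa_free, sasa_bound):
    penalty = 0.0
    for dg, free, bound in zip(atom_dg_solv, sasa_free, sasa_bound):
        if free <= 0.0:
            continue  # atom already inaccessible in the free ligand
        buried_fraction = min(max((free - bound) / free, 0.0), 1.0)
        penalty += -dg * buried_fraction  # losing solvation costs -dG
    return penalty


# A polar atom half buried and an apolar atom fully buried in the pose.
print(desolvation_penalty([-6.0, 0.4], [40.0, 30.0], [20.0, 0.0]))
```

    In the limit where every atom is fully buried the function reproduces the complete-desolvation assumption of the simpler scheme.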

    3.6 Docking Programs and Rescoring Methods

    GOLD 4.0 (Genetic Optimization for Ligand Docking) is an automated ligand docking

    program that uses a genetic algorithm for flexible ligand docking to a fixed protein structure.


    Three different fitness and scoring functions are available with GOLD: Goldscore,

    Chemscore, and ASP score.

    Another important feature of GOLD is its support for several types of docking constraints:

    1) Distance constraint, for use with individual ligands

    2) Substructure based distance constraint, for use with multiple ligands that have a common

    substructure or functional group.

    3) Hydrogen bond constraint, for specifying a hydrogen bond between a particular ligand

    atom and a particular atom in the protein.

    4) Protein hydrogen bond constraint, for specifying that a particular protein atom should be

    hydrogen-bonded to the ligand, but without specifying to which ligand atom.

    5) Region (hydrophobic) constraint, for biasing the docking towards solutions in which

    particular regions of the binding site are occupied by specific ligand atoms or types of ligand

    atom.

    6) Template similarity constraint, for biasing the conformation of docked ligands towards a

    given solution, or template.

    7) Scaffold constraint, to place a ligand fragment at an exact specified position in the binding

    site.

    The protein hydrogen bond constraint can be used efficiently to find a universal setting that enables virtual screening runs with high enrichment factors and energetically favorable binding modes.
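    A related check can also be applied after docking, to verify which poses actually satisfy a required protein H-bond. The sketch below is an independent, simplified post-filter, not GOLD's constraint implementation: it uses only a donor-acceptor distance cut-off (3.5 Å is an assumed value) and ignores angles; the coordinates and pose names are hypothetical.

```python
"""Post-docking filter: keep poses in which a chosen protein atom is within
H-bond distance of any ligand N/O atom (distance-only, angles ignored)."""
import math


def satisfies_protein_hbond(protein_atom, ligand_polar_atoms, cutoff=3.5):
    """True if any ligand N/O atom lies within `cutoff` Angstrom of protein_atom."""
    return any(math.dist(protein_atom, xyz) <= cutoff for xyz in ligand_polar_atoms)


def filter_poses(poses, protein_atom, cutoff=3.5):
    """poses: mapping pose_id -> list of ligand N/O coordinates."""
    return [pid for pid, polar in poses.items()
            if satisfies_protein_hbond(protein_atom, polar, cutoff)]


# Hypothetical coordinates of a key backbone nitrogen and two docked poses.
key_atom = (10.0, 5.0, 2.0)
poses = {"pose_1": [(12.9, 5.0, 2.0)],
         "pose_2": [(15.0, 5.0, 2.0), (10.0, 9.0, 2.0)]}
print(filter_poses(poses, key_atom))
```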

    3.6.1 PBSA Scoring using ZAP Library and AMBER-score

    ZAP is a Poisson-Boltzmann (PB) library provided by OpenEye. In this approach, the Poisson equation describes how electrostatic fields change in a medium of varying dielectric, such as an organic molecule in water; the Boltzmann term takes into account the effect of mobile charges, e.g. salt. PB is an effective way to simulate the effects of water in biological systems. It relies on a charge description of the molecule, the designation of low (molecular) and high (solvent) dielectric regions, and a description of the ion-accessible volume, and it produces a grid of electrostatic potentials. From this grid, transfer energies between different solvents, binding energies, pKa shifts, pI values, solvent forces, electrostatic descriptors, solvent dipole moments, surface potentials and dielectric focusing can be calculated. As


    electrostatics is one of the two principal components of molecular interaction (the other being shape complementarity), ZAP is OpenEye's attempt to calculate electrostatic energies as precisely as possible.
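    The length scale over which the Boltzmann (mobile-ion) term screens electrostatics is the Debye length. The sketch below computes it for a 1:1 salt, assuming water with a relative permittivity of 78.5 at 298.15 K and ionic strength given in mol/L; it is a textbook relation, not part of the ZAP interface.

```python
"""Debye screening length for a 1:1 electrolyte in water (sketch).

Assumptions: relative permittivity 78.5, temperature 298.15 K,
ionic strength in mol/L.
"""
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
KB = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
N_A = 6.02214076e23         # Avogadro constant, 1/mol


def debye_length_angstrom(ionic_strength_molar, eps_r=78.5, temperature=298.15):
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si) / (
        eps_r * EPS0 * KB * temperature)
    return 1e10 / math.sqrt(kappa_sq)  # metres -> Angstrom


for i in (0.01, 0.15, 1.0):
    print(f"I = {i:5.2f} M -> Debye length {debye_length_angstrom(i):5.1f} A")
```

    At physiological ionic strength (about 0.15 M) the screening length is under 1 nm, so salt mainly damps long-range interactions while leaving close-contact electrostatics in a binding site largely intact.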

    AMBER-score combines AMBER molecular mechanics with implicit solvation, molecular dynamics simulation, receptor flexibility, and conjugate gradient minimization. It implements molecular mechanics implicit solvent simulations with the traditional all-atom AMBER force field for protein atoms and the general AMBER force field (GAFF) for ligand atoms. The ligand-receptor interaction is represented by the sum of the electrostatic and van der Waals energy terms, and the solvation energy is calculated with a generalized Born (GB) model. The user can choose one of the following GB models: (i) the Hawkins, Cramer and Truhlar pairwise GB model with parameters described by Tsui and Case (gb=1) [115], (ii) the Onufriev, Bashford and Case model, GB (OBC) (gb=2) [116], and (iii) a modified GB (OBC) model (gb=5) [75]. The surface area term is derived with the fast LCPO algorithm [117].

    The AMBER-score is calculated as:

    E (Complex) - [E (Receptor) + E (Ligand)]

    where E (Complex), E (Receptor), and E (Ligand) are the internal energies of the complex, receptor, and ligand, respectively (all solvated), as approximated by the AMBER force field with GBSA solvation terms. The calculation of each of these three energies uses the same protocol:

    minimization with a conjugate gradient method is followed by MD simulation (Langevin

    molecular dynamics at constant temperature), another minimization, and a final energy

    evaluation. The user can specify the number of pre-MD-minimization cycles, the number of

    MD simulation steps, and the number of post-MD-minimization cycles in the dock input file.

    During the final energy evaluation, a surface area term is included. The receptor energy is

    determined once. The AMBER-score energy protocol is performed for every ligand and its

    corresponding complex.
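    The bookkeeping of this protocol can be sketched as follows. `relax_and_score` and `make_complex` are hypothetical callables standing in for the minimization/MD/minimization/final-GBSA-energy sequence and for assembling the complex from a docked pose; only the control flow (receptor energy computed once, ligand and complex energies per ligand) is illustrated, not DOCK6 or AMBER themselves.

```python
"""Control flow of an AMBER-score-like rescoring run (sketch).

relax_and_score(structure) -> float and make_complex(receptor, ligand) are
hypothetical stand-ins, not DOCK6/AMBER functions.
"""


def amber_like_scores(receptor, ligands, relax_and_score, make_complex):
    """Return {ligand_name: E(complex) - [E(receptor) + E(ligand)]}."""
    e_receptor = relax_and_score(receptor)  # receptor energy determined once
    scores = {}
    for name, ligand in ligands:
        e_ligand = relax_and_score(ligand)
        e_complex = relax_and_score(make_complex(receptor, ligand))
        scores[name] = e_complex - (e_receptor + e_ligand)
    return scores


# Toy stand-in: look up hypothetical precomputed energies instead of running AMBER.
toy_energies = {"R": -1000.0, "L1": -20.0, ("R", "L1"): -1035.0,
                "L2": -10.0, ("R", "L2"): -1012.0}
calls = []


def toy_relax_and_score(structure):
    calls.append(structure)
    return toy_energies[structure]


scores = amber_like_scores("R", [("lig1", "L1"), ("lig2", "L2")],
                           toy_relax_and_score, lambda rec, lig: (rec, lig))
print(scores)
```

    Caching the receptor energy is what keeps the per-ligand cost at two relaxations (ligand and complex) rather than three.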

    3.6.2 Cscore

    Scoring functions can be adapted from force field approaches, estimating the enthalpy of

    binding via the pair-energy of the complex. Other functions estimate the entropy of

    binding, incorporating terms for desolvation and loss of conformational flexibility.

    While such functions are more chemically appealing, they require significantly more


    statistical fitting than those based on force fields. FlexX is an example of this second

    approach. Statistically-fit functions are dependent on their training set. Each author has

    tried to make this as general as possible, but concerns remain as to the extensibility of

    these functions to new systems. Since each scoring function has been derived from a

    different set of crystal structures, it is reasonable to use multiple functions when

    evaluating a protein-ligand pair.

    According to consensus scoring principles, structures that are considered good fits by multiple scoring functions can be examined further, while those that are not can be dropped. The CScore approach can be used in Sybyl 7.3 and 8.1 for consensus scoring in virtual high-throughput screening [118]. CScore provides several functions:

    - G_score, based on the work of Willett's group.

    - D_score, based on the work of Kuntz et al.

    - PMF_score, based on the work of Muegge and Martin.

    - Chemscore, based on the work of Eldridge, Murray, Auton, Paolini, and Mee.

    The consensus can be generated from any combination of these or other previously calculated scores. The FlexX scoring function can also be added to CScore if a FlexX license is available.
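    A vote-based consensus in the spirit of CScore can be sketched as follows: each scoring function votes for a compound if it ranks that compound within its own top fraction, and compounds with more votes are examined further. The top-fraction cut-off, the assumption that lower scores are better for every function, and the score values are all illustrative assumptions, not CScore's actual settings.

```python
"""Vote-based consensus scoring (sketch).

Assumptions: lower score = better for every function (flip signs where
needed); a compound gets one vote per function that ranks it in its top
fraction.
"""
import math


def consensus_votes(scores, top_fraction=0.2):
    """scores: mapping function_name -> {compound: score}; returns votes per compound."""
    votes = {}
    for per_compound in scores.values():
        ranked = sorted(per_compound, key=per_compound.get)
        n_top = max(1, math.ceil(top_fraction * len(ranked)))
        for compound in per_compound:
            votes.setdefault(compound, 0)
        for compound in ranked[:n_top]:
            votes[compound] += 1
    return votes


example_scores = {
    "G_score": {"a": -50.0, "b": -40.0, "c": -30.0, "d": -20.0},
    "D_score": {"a": -10.0, "b": -30.0, "c": -25.0, "d": -5.0},
    "PMF_score": {"a": -60.0, "b": -20.0, "c": -55.0, "d": -70.0},
}
print(consensus_votes(example_scores, top_fraction=0.5))
```

    Compounds favoured by only one function (here c and d) collect fewer votes than those that several functions agree on, which is the filtering behaviour described above.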

    3.7 Similarity Search

    Chemical fingerprints can be used to search large compound databases for molecules structurally related to a given query. Several fingerprint systems are implemented in Chemical Computing Group's Molecular Operating Environment (MOE) [119]. Each fingerprint system supports a number of similarity metrics and uses a different representation. The most important fingerprint systems are:

    1) MACCS Structural Keys (feature list version). Each feature indicates the presence of

    one of the 166 public MDL MACCS structural keys computed from the molecular

    graph. The fingerprint is represented as a sparse list of keys present in the molecule.

    2) Bit MACCS: MACCS Structural Keys (bit packed version). Each feature indicates

    the presence of one of the 166 public MDL MACCS structural keys calculated from

    the molecular graph. The fingerprint is a dense bit vector of feature bits 6 words long.


    3) Protein Ligand Interactions Fingerprints: Each feature represents a protein-ligand

    interaction type, e.g. hydrogen bond or ionic interaction.

    4) piDAPH3: 3-point pharmacophore based fingerprint calculated from a 3D

    conformation. Each atom is given one of 8 atom types computed from 3 atomic

    properties: "in pi system", "is donor", "is acceptor". Anions and cations are not

    represented. Then, all triplets of atoms are coded as features using the three inter-

    atomic distances and three atom types of each triangle. The resulting fingerprint is

    represented as a sparse feature list.

    5) piDAPH4: 4-point pharmacophore based fingerprint calculated from a 3D

    conformation. Each atom is given one of 8 atom types computed from 3 atomic

    properties: "in pi system", "is donor", "is acceptor". Anions and cations are not

    represented. Then, all quadruplets of atoms are coded as features using the six inter-

    atomic distances, four atom types and chirality of each quadruplet. The resulting

    fingerprint is represented as a sparse feature list.

    6) GpiDAPH3: 3-point pharmacophore based fingerprint calculated from the 2D

    molecular graph. Each atom is given one of 8 atom types computed from 3 atomic

    properties: "in pi system", "is donor", "is acceptor". Anions and cations are not

    represented. Then, all triplets of atoms are coded as features using the three graph

    distances and three atom types of each triangle. The resulting fingerprint is represented

    as a sparse feature list.
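    The construction behind the graph-based 3-point features can be illustrated with a toy sketch. It is a simplified illustration, not MOE's implementation: atom types are the 3-bit code (in pi system, donor, acceptor), giving 8 types; graph distances are bond counts from a breadth-first search (the molecule is assumed to be connected); and each triplet is stored as a canonical tuple rather than a hashed feature code. The example molecule and its typing are hypothetical.

```python
"""Toy GpiDAPH3-style features: 3-point triplets on the 2D molecular graph."""
from collections import deque
from itertools import combinations


def atom_type(in_pi: bool, is_donor: bool, is_acceptor: bool) -> int:
    """Encode the three binary properties as one of 8 types (0-7)."""
    return (in_pi << 2) | (is_donor << 1) | int(is_acceptor)


def graph_distances(n_atoms, bonds):
    """All-pairs bond-count distances via breadth-first search."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = {}
    for start in range(n_atoms):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start, v] = d
    return dist


def triplet_features(types, bonds):
    dist = graph_distances(len(types), bonds)
    features = set()
    for i, j, k in combinations(range(len(types)), 3):
        # Pair each vertex type with the length of the opposite edge and sort,
        # so the feature does not depend on atom numbering.
        corners = sorted([(types[i], dist[j, k]), (types[j], dist[i, k]),
                          (types[k], dist[i, j])])
        features.add(tuple(corners))
    return features


# Acetamide-like toy graph: CH3 - C(=O) - N, typed by hand.
types = [atom_type(False, False, False), atom_type(True, False, False),
         atom_type(True, False, True), atom_type(True, True, False)]
features = triplet_features(types, [(0, 1), (1, 2), (1, 3)])
print(sorted(features))
```

    The 3D variants (piDAPH3, piDAPH4) follow the same pattern with binned inter-atomic distances from a conformation instead of bond counts.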

    A Tanimoto similarity search can then be performed in MOE. The Tanimoto similarity module calculates the similarity of each target molecule to one or more reference molecules using a molecular fingerprint system. The Tanimoto coefficient is defined as:

    Similarity = Nab / (Na + Nb + Nab)

    where Nab is the number of fingerprint bits present in both the reference and the target molecule, Na is the number of bits present only in the reference molecule, and Nb is the number of bits present only in the target molecule. The Tanimoto coefficient ranges from zero (no bits in common) to one (identical bit sets).
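    The definition above translates directly into code. The sketch shows two equivalent forms: one for sparse fingerprints stored as sets of features, and one for bit-packed fingerprints stored as Python integers (Na + Nb + Nab is simply the number of bits in the union). Returning 0.0 for two empty fingerprints is a convention chosen here.

```python
"""Tanimoto coefficient for sparse (set) and bit-packed (int) fingerprints."""


def tanimoto(reference: set, target: set) -> float:
    n_ab = len(reference & target)   # bits set in both molecules
    n_a = len(reference - target)    # bits set only in the reference
    n_b = len(target - reference)    # bits set only in the target
    denom = n_a + n_b + n_ab
    return n_ab / denom if denom else 0.0  # convention for two empty fingerprints


def tanimoto_bits(a: int, b: int) -> float:
    both = bin(a & b).count("1")
    union = bin(a | b).count("1")    # equals Na + Nb + Nab
    return both / union if union else 0.0


print(tanimoto({1, 5, 9, 42}, {1, 5, 17}))
print(tanimoto_bits(0b1011, 0b0011))
```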

    3.8 ZINC Compound Library

    ZINC is a free database of commercially available compounds for virtual screening, provided by the University of California, San Francisco. ZINC currently contains over 8 million purchasable compounds in ready-to-dock, 3D


    formats. ZINC 8 is currently available online for download (http://zinc.docking.org). It is built from the catalogs of ten major compound vendors and is updated periodically by removing unavailable compounds, updating vendor lists, and adding new vendors. Of these 8 million compounds, 5 million are Lipinski compliant [120], with the caveat that Molinspiration's LogP was used as a surrogate for cLogP. Of these, 1.1 million are lead-like molecules, defined as having a molecular weight between 150 and 350, a calculated LogP below four, no more than three hydrogen-bond donors, and no more than six hydrogen-bond acceptors. A total of 63,000 molecules are fragment-like,

    - with calculated LogP values between -2 and 3

    - less than three hydrogen-bond donors

    - less than six hydrogen-bond acceptors

    - less than three rotatable bonds

    - molecular weight less than 250
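    The lead-like and fragment-like definitions above can be expressed as simple filters. The sketch uses the thresholds exactly as listed (treating "between 150 and 350" as inclusive, which is an assumption) and takes descriptor values as pre-computed inputs; the `Descriptors` class and the example values are hypothetical.

```python
"""Lead-like and fragment-like filters with the thresholds listed above."""
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float       # molecular weight
    logp: float     # calculated LogP
    hbd: int        # hydrogen-bond donors
    hba: int        # hydrogen-bond acceptors
    rot_bonds: int  # rotatable bonds


def is_lead_like(d: Descriptors) -> bool:
    return 150 <= d.mw <= 350 and d.logp < 4 and d.hbd <= 3 and d.hba <= 6


def is_fragment_like(d: Descriptors) -> bool:
    return (-2 <= d.logp <= 3 and d.hbd < 3 and d.hba < 6
            and d.rot_bonds < 3 and d.mw < 250)


small = Descriptors(mw=180.2, logp=1.2, hbd=1, hba=3, rot_bonds=2)
larger = Descriptors(mw=320.0, logp=3.5, hbd=2, hba=5, rot_bonds=5)
for d in (small, larger):
    print(d.mw, is_lead_like(d), is_fragment_like(d))
```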

    3.9 Fragment-based Drug Design

    Knowing how well a given fragment binds to a protein target allows hits to be optimized by growing the fragments, or new leads to be found by combining and linking different fragments. The main benefit of using fragments rather than larger small molecules is the notable reduction in the size of the chemical space to be searched, because fragments contain fewer atoms.

    The fragment universe is much smaller than the chemical universe of small molecules: the number of compounds below 160 Da is estimated at about 14 million [121]. Screening a fragment library of 10,000 compounds therefore samples its (much smaller) chemical space far more thoroughly than a conventional high-throughput screen samples the space of larger, drug-sized molecules.

    An additional factor in favour of fragment-based screening is the hypothesis proposed by Hann and co-workers [122], which states that less complex molecules should show higher hit rates against protein targets. As a result, even though a typical fragment screen explores far less than 1% of the available low-molecular-mass universe, the probability of finding leads is substantially higher, which increases the value of the screen. This theoretical model was recently validated by the Novartis group [123], who observed hit rates for fragment screens that were 10 to 1,000 times higher than those of conventional high-throughput screens.
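    A back-of-the-envelope check of the coverage statement, using only the two numbers quoted in this section (about 14 million compounds below 160 Da and a 10,000-member fragment library):

```python
# Fraction of the ~14 million compounds below 160 Da that a
# 10,000-member fragment library samples (figures quoted in the text).
fragment_space = 14_000_000
library_size = 10_000
coverage = library_size / fragment_space
print(f"{coverage:.2%}")
```

    The result, roughly 0.07%, is consistent with the statement that a fragment screen explores far less than 1% of the low-molecular-mass universe.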

  • Implementation 42

    4 Implementation

    4.1 Molecular Modeling

    Seven X-ray crystal structures are reported in the Protein Data Bank for mammalian AANAT.

    The three protein-ligand structures with the highest crystallographic resolution (PDB codes

    1CJW, 1KUV and 1KUX) represent suitable targets for virtual screening purposes [35]. All

    three structures contain a potent bi-substrate inhibitor of AANAT. 1KUX (resolution 1.8 Å)

    has been selected for the current virtual screening study. The coordinates of the protein were

    extracted from the corresponding pdb file. The inhibitor was removed, an