CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE Virtual Screening at the post-genomic era Dr. Didier...

Post on 31-Dec-2015

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE

Virtual Screening at the post-genomic era

Dr. Didier ROGNAN

Bioinformatic Group

UMR CNRS 7081

Illkirch, France

didier.rognan@pharma.u-strasbg.fr

Virtual screening: Definition

Searching electronic databases (2D, 3D) for molecules fitting:

a pharmacophore

an active site

Walters et al., Drug Discovery Today 1998, 3, 160-178.

Schneider et al., Drug Discovery Today 2002, 7, 64-70.

Importance of virtual screening

Scientific reasons

1. Increasing number of interesting macromolecular targets (500 → 10,000)

2. Increasing number of protein 3-D structures (X-ray, NMR)

3. Better knowledge of protein-ligand interactions

4. Development of chem- and bio-informatic methods

5. Increasing computing facilities

Economic reasons

1. High cost of high-throughput screening (HTS): 0.2 € / molecule

2. Increase the ratio (# of active molecules (hits)) / (# of tested molecules)

Applications

1. Identifying the very first ligands of orphan targets

2. Identifying/optimizing new chemical scaffolds

Protein-based virtual screening

Target (3D !!) + Database (3-D)

1. Orientation (« docking ») → Target-Ligand Complex

2. Evaluation (« scoring ») → Hit list

Mol #     ΔGbind
11121     -44.51
222       -42.21
3563      -41.50
6578      -40.31
25639     -40.28
...       ...
100000    22.54
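The two-step workflow above (dock, then score, then rank) can be sketched in a few lines of Python. This is only an illustration: `dock_and_score` is a hypothetical stand-in for a real docking program, and the scores are simply the values shown in the hit list on the slide.

```python
from typing import Callable, Iterable, List, Tuple

def screen(ligands: Iterable[str],
           dock_and_score: Callable[[str], float],
           top_n: int = 5) -> List[Tuple[str, float]]:
    """Score every ligand and return the top_n most favourable ones.

    Assumption: a lower (more negative) score means stronger predicted binding.
    """
    scored = [(ligand, dock_and_score(ligand)) for ligand in ligands]
    scored.sort(key=lambda pair: pair[1])  # most negative score first
    return scored[:top_n]

# Stand-in for a docking run: scores copied from the hit list on the slide.
slide_scores = {"11121": -44.51, "222": -42.21, "3563": -41.50,
                "6578": -40.31, "25639": -40.28, "100000": 22.54}

hit_list = screen(slide_scores, slide_scores.__getitem__, top_n=3)
for mol, score in hit_list:
    print(mol, score)
```

In a real campaign the expensive part is the call hidden behind `dock_and_score`; the ranking itself is a plain sort.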

Docking

Goal

Quickly find (1-2 min/molecule):

- the orientation of the ligand in the active site

- the protein-bound conformation

Methods

Orientation

- Surface complementarity

- Complementarity of intermolecular interactions

Conformational freedom

- Incremental construction

- Conformational sampling (MC, GA, SA)

Abagyan et al. Curr. Opin. Struct. Biol. 2001, 5, 375-382

Docking: Orientation

Surface-based orientation (e.g. DOCK)

1. 3D structure

2. Molecular surface (active site)

3. Filling the surface by overlapping spheres

4. Matching sphere centers with atoms

Docking: Orientation

Interaction-based orientation (e.g. FlexX)

- Statistical rules for locating ligand atoms

- Overall placement of a base fragment by triangulation

http://cartan.gmd.de/flexx

Docking: Ligand flexibility

- by preselecting several conformers/molecules

- by incremental construction

Ligand → Fragment decomposition → Selecting the « best » base fragment → Reading preferred torsion values → Adding the 1st peripheral fragment → Adding the 2nd peripheral fragment → Termination

Docking: Ligand flexibility

- by a genetic algorithm (e.g. Gold)

http://www.ccdc.cam.ac.uk/prods/gold/

Chromosome = Ligand (orientation, conformation)

gene: x,y,z coords., tors. angles, orientation…

Initial population → Selection of parents → Genetic operators → Selection of children → New population → Convergence test (otherwise: new generation)

Parameters: population size, crossing over rate, mutation rate, # of evolutions

Selection of parents (survival rate proportional to score):

Parent    Score
A         2.5
B         5.0
C         1.5
D         1.0

Genetic operators:

crossing over: 100110010010010011, 100110011010011010

mutation: 100110010, 100101010
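The genetic-algorithm loop on this slide can be sketched as follows. This is a generic, minimal illustration and not the algorithm implemented in Gold: the chromosome is a plain bit string, `toy_fitness` (number of 1 bits) is a hypothetical stand-in for a docking score, and all parameter defaults are arbitrary. Fitness values are assumed to be non-negative so that they can be used as selection weights.

```python
import random

# Parent scores from the slide; survival rate is taken proportional to score.
PARENT_SCORES = {"A": 2.5, "B": 5.0, "C": 1.5, "D": 1.0}

def survival_rates(scores):
    """Fitness-proportional ("roulette wheel") selection probabilities."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

def pick_parent(population, weights, rng):
    """Draw one parent with probability proportional to its weight."""
    return rng.choices(population, weights=weights, k=1)[0]

def crossing_over(a, b, rng):
    """One-point crossing over of two equal-length bit strings."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < rate else bit
                   for bit in chromosome)

def toy_fitness(chromosome):
    """Hypothetical fitness: number of 1 bits (stand-in for a docking score)."""
    return chromosome.count("1")

def evolve(fitness, n_bits=18, size=20, n_evolutions=50,
           crossing_over_rate=0.9, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(size)]
    best = max(population, key=fitness)
    for _ in range(n_evolutions):
        # Small offset keeps the weights valid even if every fitness is zero.
        weights = [fitness(c) + 1e-9 for c in population]
        children = []
        while len(children) < size:
            p1 = pick_parent(population, weights, rng)
            p2 = pick_parent(population, weights, rng)
            if rng.random() < crossing_over_rate:
                p1, p2 = crossing_over(p1, p2, rng)
            children.append(mutate(p1, mutation_rate, rng))
            children.append(mutate(p2, mutation_rate, rng))
        population = children[:size]
        best = max(population + [best], key=fitness)  # remember the best seen
    return best

print(survival_rates(PARENT_SCORES))
print(evolve(toy_fitness))
```

In a docking program the bit string would encode position, orientation and torsion angles of the ligand, and the fitness would come from the scoring function; here a fixed number of generations replaces the convergence test.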

Docking Accuracy

Analysing 100 high-resolution PDB complexes (Paul, N. and Rognan, D., Proteins, in press)

[Plot: Accuracy of the best possible pose (n = 30). x-axis: rmsd, Å (0 to 14); y-axis: % of complexes (0 to 100); curves: Dock, FlexX, Gold, ConsDock]

Finding a reliable pose out of a set of 30-50 solutions is feasible!

Docking Accuracy

Analysing 100 high-resolution PDB complexes (Paul, N. and Rognan, D., Proteins, in press)

[Plot: Accuracy of the top-ranked pose. x-axis: rmsd, Å (0 to 14); y-axis: % of complexes (0 to 100); curves: Dock, FlexX, Gold, ConsDock]

Ranking the most reliable solution at the top of the list is still an issue!
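Both accuracy plots report, for each rmsd value, the percentage of complexes docked at least that well. A minimal sketch of the underlying arithmetic is given below. It assumes that pose and X-ray coordinates are already in the same frame and atom order (no superposition is done), and the 2.0 Å cutoff in `success_rate` is a commonly used convention chosen here for illustration, not a value taken from the slides.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation (Å) between two equally ordered atom lists."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for atom_p, atom_r in zip(pose, reference)
                  for a, b in zip(atom_p, atom_r))
    return math.sqrt(squared / len(pose))

def best_and_top(pose_rmsds):
    """Given rmsds of poses ordered by rank, return (best possible, top-ranked)."""
    return min(pose_rmsds), pose_rmsds[0]

def success_rate(rmsds, cutoff=2.0):
    """Percentage of complexes whose pose lies within `cutoff` Å of the X-ray pose."""
    return 100.0 * sum(r <= cutoff for r in rmsds) / len(rmsds)

# Hypothetical example: one ligand, four poses ordered by score.
pose_rmsds = [6.3, 1.1, 4.8, 0.9]
print(best_and_top(pose_rmsds))
```

For this made-up ligand the best possible pose is close to the X-ray structure while the top-ranked pose is not, which is exactly the gap between the two plots.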

Source of Docking Errors

Nature of the active site (flat vs. cavity)

Missed influence of water

Ligand flexibility

Inaccuracy of the scoring function

Unusual binding mode/interactions

Inadequate set of protein coordinates

Wrong atom typing

Impossible

Difficult

Easy

Scoring

Thermodynamic methods: FEP, TI (2)

Force-fields (10-100)

QSAR, 3D-QSAR (100-1,000)

Empirical scoring functions (>100,000)

[Plot: error (kJ/mol; ticks 2, 10) and accuracy versus # of molecules (ticks 2, 1000, 100,000)]

Scoring

First-principle methods: sum of physically meaningful terms

Regression-based free energy approximations: sum of regression-weighted terms

Potentials of mean force: distance-dependent, atom pair-weighted Helmholtz free energies

Gohlke et al., Curr. Opin. Struct. Biol. 2001, 11, 231-235

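Empirical Scoring function

Fresno: Rognan et al. (1999) J. Med. Chem., 42, 4650-4658.

$$
\Delta G_{binding} = \Delta G_{0}
 + \Delta G_{hb} \sum_{hb} g_1(\Delta r)\, g_2(\Delta\alpha)
 + \Delta G_{lipo} \sum_{lL} f(r_{lL})
 + \Delta G_{bp} \Big[ \sum_{lP} f(r_{lP}) + \sum_{pL} f(r_{pL}) \Big]
 + \Delta G_{rot}\, H_{rot}
 + \Delta G_{desolv}\, \Delta G_{reac}
$$

Terms, in order: constant, H-bond term, lipophilic term, buried-polar repulsive term, rotational term, desolvation term.

$$
g_1(\Delta r) =
\begin{cases}
1 & \text{if } \Delta r \le 0.25\ \text{Å} \\
1 - (\Delta r - 0.25)/0.4 & \text{if } 0.25\ \text{Å} < \Delta r \le 0.65\ \text{Å} \\
0 & \text{if } \Delta r > 0.65\ \text{Å}
\end{cases}
$$

$$
g_2(\Delta\alpha) =
\begin{cases}
1 & \text{if } \Delta\alpha \le 30^\circ \\
1 - (\Delta\alpha - 30)/50 & \text{if } 30^\circ < \Delta\alpha \le 80^\circ \\
0 & \text{if } \Delta\alpha > 80^\circ
\end{cases}
$$

$$
f(r) =
\begin{cases}
1 & \text{if } r \le R_1 \\
1 - (r - R_1)/3.0 & \text{if } R_1 < r \le R_2 \\
0 & \text{if } r > R_2
\end{cases}
$$

$$
H_{rot} = 1 + \left(1 - \frac{1}{N_{rot}}\right) \sum_{r} \frac{P_p(r) + P'_p(r)}{2}
$$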
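The three piecewise-linear functions of the scoring function share one shape: 1 below a lower bound, 0 above an upper bound, linear in between. The sketch below encodes that shape once and derives g1, g2 and f from it. It assumes R2 = R1 + 3.0 Å, which is what makes the middle branch of f reach zero at R2; the helper names `linear_ramp` and `hbond_sum` are introduced here for illustration only and the regression weights of the full equation are not included.

```python
def linear_ramp(x, lower, upper):
    """1 for x <= lower, 0 for x > upper, linear decrease in between."""
    if x <= lower:
        return 1.0
    if x > upper:
        return 0.0
    return 1.0 - (x - lower) / (upper - lower)

def g1(delta_r):
    """Distance penalty of an H-bond; delta_r is the deviation in Å."""
    return linear_ramp(delta_r, 0.25, 0.65)

def g2(delta_alpha):
    """Angular penalty of an H-bond; delta_alpha is the deviation in degrees."""
    return linear_ramp(delta_alpha, 30.0, 80.0)

def f(r, r1):
    """Contact function; assumes R2 = R1 + 3.0 Å (illustrative assumption)."""
    return linear_ramp(r, r1, r1 + 3.0)

def hbond_sum(hbonds):
    """Unweighted H-bond sum: g1 * g2 accumulated over (delta_r, delta_alpha) pairs."""
    return sum(g1(dr) * g2(da) for dr, da in hbonds)

# Hypothetical geometry: one ideal H-bond and one half-weighted by its angle.
print(hbond_sum([(0.10, 10.0), (0.20, 55.0)]))
```

An ideal hydrogen bond contributes 1 to the sum and a distorted one contributes a fraction, so the H-bond term degrades smoothly instead of switching on and off.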

Scoring Accuracy

Current accuracy: 5-10 kJ/mol (1-2 pK units)

Weak point of all docking programs

Entropic contributions are difficult to handle!

Workaround: use of consensus scoring functions
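One simple form of consensus scoring keeps only the molecules that several scoring functions all place near the top of their ranking. The sketch below illustrates that idea under stated assumptions: lower scores are better, the score tables and molecule names are hypothetical, and the 50 % cutoff is arbitrary. It is not the protocol of any particular program.

```python
def top_fraction(scores, fraction):
    """Names of the molecules in the best `fraction` of one scoring function.

    Assumption: a lower score means stronger predicted binding.
    """
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)
    return set(ranked[:n_keep])

def consensus_hits(score_tables, fraction=0.5):
    """Molecules ranked in the top `fraction` by every scoring function."""
    selections = [top_fraction(table, fraction) for table in score_tables]
    return set.intersection(*selections)

# Hypothetical scores from two different scoring functions.
function_a = {"m1": -40.0, "m2": -35.0, "m3": -10.0, "m4": -5.0}
function_b = {"m1": -8.0, "m2": -2.0, "m3": -9.0, "m4": -1.0}

print(consensus_hits([function_a, function_b]))
```

Because the functions only have to agree on ranks, their scores do not need to share units or scale.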

Library set-up

Full database (Isis/Base): 2-D structure, line notation, 2-D fingerprint

C[1](=C(C(=CS@1(=O)=O)SC[9]:C:C:C(:C:C:@9)Br)C[16]:C:C:C:C:C@16)N

Filtering: chemical reactivity, pharmacokinetics, drug-likeness

Filtered database

2D → 3D: hydrogens, ionisation

3-D Database
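The filtering step of the library set-up can be sketched as a predicate applied to precomputed descriptors. The slide does not say which rules were used; the example below uses the well-known rule-of-five thresholds and a boolean reactivity flag purely as an illustration, and the `Compound` class and all descriptor values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float        # g/mol
    logp: float              # octanol/water partition coefficient
    h_donors: int
    h_acceptors: int
    has_reactive_group: bool = False

def rule_of_five_violations(c):
    """Count how many rule-of-five thresholds a compound exceeds."""
    return sum([c.mol_weight > 500, c.logp > 5,
                c.h_donors > 5, c.h_acceptors > 10])

def filter_library(compounds, max_violations=1):
    """Keep unreactive compounds with at most `max_violations` violations."""
    return [c for c in compounds
            if not c.has_reactive_group
            and rule_of_five_violations(c) <= max_violations]

# Hypothetical library entries.
library = [
    Compound("cpd-1", 320.4, 2.1, 2, 5),
    Compound("cpd-2", 712.9, 6.3, 6, 12),                        # four violations
    Compound("cpd-3", 250.3, 1.0, 1, 3, has_reactive_group=True),
    Compound("cpd-4", 540.0, 3.2, 2, 7),                         # one violation
]

print([c.name for c in filter_library(library)])
```

Filtering before the 2D → 3D conversion keeps the expensive steps (conformer generation, docking) away from compounds that would be discarded anyway.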

Applications

High-resolution X-ray structures (enzymes)

Target          Ligands        Base         Hit rate    Reference
CD4-gp120       inhibitors     150,000      9.7 %       Li et al., PNAS (1997)
gp41            inhibitors     20,000       12.5 %      Debnath et al., J. Med. Chem. (1999)
FT              inhibitors     219,000      19.0 %      Perola et al., J. Med. Chem. (2000)
kinesin         inhibitors     20,000       12.5 %      Hopkins et al., Biochemistry (2000)
HIV1 Tar-Tat    inhibitors     153,000      25.0 %      Filikov et al., JCAMD (2000)
Bcl-2           inhibitors     207,000      20.0 %      Enyedi et al., J. Med. Chem. (2001)
HCA-II          inhibitors     90,000       61.0 %      Grüneberg et al., Angew. (2001)
RAR             agonists       250,000      6.6 %       Shapira et al., BMC Struct. Biol. (2001)
TPI             inhibitors     108,000      20.0 %      Joubert et al., Proteins (2001)
ER              antagonists    1,500,000    72.0 %      Shapira et al., IBM Sys. J. (2001)

FT: farnesyltransferase, HCA: human carbonic anhydrase, RAR: retinoic acid receptor, ER: estrogen receptor, TPI: triosephosphate isomerase, PEP: phosphoenolpyruvate

Conclusions

What is possible?

- Discriminating true hits from random ligands

- Enriching a reduced library by a factor of 20

- Retrieving about 50% of all true hits

- Prioritizing ligands for synthesis and experimental screening

- Using virtual screening for lead finding
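The enrichment factor and the fraction of true hits retrieved are simple ratios. The sketch below shows the arithmetic with a hypothetical library chosen so that it reproduces the two figures quoted above (enrichment by a factor of 20, about 50% of true hits retrieved); the library sizes themselves are made up.

```python
def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """Hit rate in the selected subset divided by hit rate in the full library."""
    return (hits_selected * n_total) / (n_selected * hits_total)

def recall(hits_selected, hits_total):
    """Fraction of all true hits that ended up in the selected subset."""
    return hits_selected / hits_total

# Hypothetical numbers: 100 true hits in 100,000 molecules (0.1 % hit rate);
# the reduced library of 2,500 molecules contains 50 of them (2 % hit rate).
ef = enrichment_factor(hits_selected=50, n_selected=2_500,
                       hits_total=100, n_total=100_000)
print(ef, recall(50, 100))  # 20.0 0.5
```

The two numbers answer different questions: enrichment measures how much experimental effort is saved, recall measures how many actives are lost by screening only the reduced library.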

What remains to improve?

- Predicting the exact orientation

- Predicting the absolute binding free energy

- Discriminating true hits from “similar inactives”

- Catching all hits

- Using virtual screening for lead optimization

- Throughput (100K mols/day → 1M/day?)

- Pre- and post-processing of vHTS

Virtual screening at the genomic scale

GPCRs of the human genome

Primary Sequence → (GPCR-Gen) → 3-D Model → (vHTS) → virtual Hits → (Validation) → True Hits → Optimisation

e-Libraries: “Bioinfo” (350,000), “RCPG” (30,000), “Endo” (2,000)

Selectivity, Affinity, ADME/Tox

Available analogues, Focussed Libraries

vs. Enzymes (PDB library), vs. GPCRs (“RCPG” library)

Virtual screening: Tomorrow

10^12 molecules (virtual library)

→ ADME/Tox → 10^9

→ 2-D similarity → 10^7

→ 3-D conformations → 10^7 (10^8 conformations)

→ 3-D similarity → 10^5 (10^6 conformations)

→ Docking → 10^4

→ Scoring → 10^3

→ expt. validation → 100 true hits
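The screening funnel on this slide can be read as a sequence of fold reductions. The short sketch below computes them; pairing each filter with the library size that follows it is an assumption made here when reading the flattened diagram.

```python
# (filter applied, number of molecules remaining); the first entry is the input.
FUNNEL = [
    ("virtual library", 10**12),
    ("ADME/Tox", 10**9),
    ("2-D similarity", 10**7),
    ("3-D conformations", 10**7),   # same molecules, now with 3-D conformers
    ("3-D similarity", 10**5),
    ("docking", 10**4),
    ("scoring", 10**3),
    ("expt. validation", 100),
]

def fold_reductions(stages):
    """Fold reduction achieved by each filter relative to the previous stage."""
    return [(name, before // after)
            for (_, before), (name, after) in zip(stages, stages[1:])]

def overall_reduction(stages):
    """Fold reduction from the first stage to the last."""
    return stages[0][1] // stages[-1][1]

for name, fold in fold_reductions(FUNNEL):
    print(f"{name}: {fold}x")
print("overall:", overall_reduction(FUNNEL))
```

Under this reading the cheap 2-D filters remove most of the library, so that docking and scoring only have to handle about 10^5 molecules.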