Criblage virtuel - unistra.frinfochim.u-strasbg.fr/FC/docs/HTS/FC_HTS_2010_print.pdf ·...

Alexandre Varnek, Faculté de Chimie, ULP, Strasbourg, France

• Virtual screening

High-throughput screening (HTS)

[Flowchart: Target → High-throughput screening (HTS) → Hits → Lead → Development candidate; supporting activities: genomics, data analysis, optimization]

Drug Discovery and ADME/Tox studies should be performed in parallel

idea → target → combichem/HTS → hit → lead → candidate → drug

ADME/Tox studies

Methodologies of a virtual screening

from A.R. Leach, V.J. Gillet “An Introduction to Chemoinformatics”, Kluwer Academic Publisher, 2003

~10^6 – 10^9 molecules

VIRTUAL SCREENING

INACTIVES

HITS

CHEMICAL DATABASE

Virtual screening approaches

Similarity search

Filters

(Q)SAR

Docking

Pharmacophore models

~10^1 – 10^3 molecules

High-throughput screening (HTS)

Keywords:

- Combinatorial chemistry

- High-throughput screening (HTS)

- Virtual screening

- Drug-likeness

- Training sets of up to 1,000,000 compounds

Virtual Screening

Molecules available for screening

(1) Real molecules

- 1 to 2 million in in-house archives of large pharma and agrochemical companies

- 3 to 4 million samples available commercially

(2) Hypothetical molecules

- Virtual combinatorial libraries (up to 10^60 molecules)

Methods of virtual High-Throughput Screening

• Filters

• Similarity search

• Classification and regression structure-property models

• Docking

Filters to estimate “drug-likeness”

Lipinski rules for intestinal absorption (“Rule of 5”)

• H-bond donors ≤ 5 (the sum of OH and NH groups);

• MWT ≤ 500;

• LogP ≤ 5;

• H-bond acceptors ≤ 10 (the sum of N and O atoms).

[Structure diagram: paclitaxel]

MW = 837; logP = 4.49; HD = 3; HA = 15

Paclitaxel (Taxol): violation of 2 rules
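The filter above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `lipinski_violations` is hypothetical, the descriptor values are assumed to be precomputed elsewhere, and a value strictly above a threshold is treated as a violation (an assumption about the boundary case).

```python
def lipinski_violations(mw, logp, h_donors, h_acceptors):
    """Return the names of the Rule-of-5 criteria that a compound violates.

    Assumption: a value strictly above the threshold counts as a violation.
    """
    violations = []
    if h_donors > 5:
        violations.append("H-bond donors")
    if mw > 500:
        violations.append("MWT")
    if logp > 5:
        violations.append("LogP")
    if h_acceptors > 10:
        violations.append("H-bond acceptors")
    return violations


# Descriptor values for paclitaxel as quoted on the slide.
paclitaxel = lipinski_violations(mw=837, logp=4.49, h_donors=3, h_acceptors=15)
print(paclitaxel)
```

With the slide's values, the molecular weight and the acceptor count are the two criteria flagged, in line with the "violation of 2 rules" stated for paclitaxel.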

http://www.cas.vanderbilt.edu/bioimages/image/t/tabr2-wp42574.htm

http://www.cas.vanderbilt.edu/bioimages/biohires/t/htabr2-br42587.JPG

“The Rule of Five Revisited: Applying Log D in Place of Log P in Drug-Likeness Filters”, S. K. Bhal, K. Kassam, I. G. Peirson, and G. M. Pearl, Molecular Pharmaceutics, v. 4, 556-560 (2007)

Using the pH-dependent log D instead of log P as the lipophilicity descriptor significantly increases the number of compounds correctly identified as drug-like by the drug-likeness filter: log D (pH 5.5) < 5

95% of all drugs are ionizable: 75% are bases and 20% are acids

logD vs logP
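To make the difference between the two descriptors concrete, the sketch below estimates log D from log P for a compound with a single ionizable group. It relies on the usual textbook relation, which assumes that only the neutral species partitions into the organic phase; this relation and the function name `log_d` are not taken from the slides, and the input values are purely illustrative.

```python
import math


def log_d(log_p, pka, ph, kind):
    """Estimate log D at a given pH for a monoprotic acid or base.

    Assumption: only the neutral form partitions into octanol, so
    log D = log P - log10(1 + [ionized]/[neutral]).
    """
    if kind == "acid":
        ionized_ratio = 10 ** (ph - pka)   # [A-]/[HA]
    elif kind == "base":
        ionized_ratio = 10 ** (pka - ph)   # [BH+]/[B]
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1 + ionized_ratio)


# Illustrative (hypothetical) compounds, both with log P = 3.0, at pH 5.5.
acid_log_d = log_d(log_p=3.0, pka=4.5, ph=5.5, kind="acid")
base_log_d = log_d(log_p=3.0, pka=9.5, ph=5.5, kind="base")
print(round(acid_log_d, 2), round(base_log_d, 2))
```

Under this assumption, a strongly basic compound can have a log D several units below its log P at pH 5.5, which is why a log D filter and a log P filter can classify the same ionizable compound differently.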

Synthetic Accessibility

Synthetic accessibility is taken to be proportional to a fragment’s occurrence in the PubChem database

Ertl and Schuffenhauer, Journal of Cheminformatics, 2009, 1:8

Altogether 605,864 different fragment types have been obtained by fragmenting the PubChem structures. Most of them (51%), however, are singletons (present only once in the whole set). Only a relatively small number of fragments, namely 3759 (0.62%), are frequent (i.e. present more than 1000 times in the database).
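The kind of fragment statistics quoted above can be sketched as follows. This is only an illustration of the counting step: the fragmentation itself is assumed to have been done elsewhere, the fragment strings and the function name `fragment_statistics` are hypothetical, and the toy threshold is lowered from 1000 to 2 so that the tiny example produces a "frequent" fragment.

```python
from collections import Counter


def fragment_statistics(molecule_fragments, frequent_threshold=1000):
    """Count fragment occurrences over a set of pre-fragmented molecules."""
    counts = Counter()
    for fragments in molecule_fragments:
        counts.update(fragments)          # every occurrence is counted
    return {
        "types": len(counts),
        "singletons": sum(1 for c in counts.values() if c == 1),
        "frequent": sum(1 for c in counts.values() if c > frequent_threshold),
    }


# Toy input: each inner list holds the (hypothetical) fragments of one molecule.
toy_set = [["C-C", "C=O"], ["C-C", "C-N"], ["C-C"]]
stats = fragment_statistics(toy_set, frequent_threshold=2)
print(stats)

# Check of the percentage quoted on the slide: 3759 frequent fragment types
# out of 605,864 fragment types.
frequent_percent = round(100 * 3759 / 605864, 2)
print(frequent_percent)
```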


Frequency distribution of fragments


http://www.jcheminf.com/content/1/1/8/figure/F3

The most common fragments present in the million PubChem molecules. The "A" represents any non-hydrogen atom, "dashed" double bond indicates an aromatic bond and the yellow circle marks the central atom of the fragment.




Distribution of (−SAscore) for natural products, bioactive molecules and molecules from catalogues.

Correlation of calculated (−SAscore) and average chemist estimation for 40 molecules (r² = 0.890)


Similarity Search: unsupervised and supervised approaches

2D (unsupervised) Similarity Search

0 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1

1 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1

Tanimoto coefficient:

$T = \frac{N_{A \& B}}{N_A + N_B - N_{A \& B}} = 0.80$

[Structure diagrams of the two compared molecules]

molecular fingerprints
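As a minimal sketch of the fingerprint comparison, the function below computes the Tanimoto coefficient as the number of bits set in both fingerprints divided by the number of bits set in either. The function name `tanimoto` and the zero-division convention are assumptions made for illustration. The two 19-bit strings are the ones printed on the slide; they are schematic, so the value obtained from them need not match the 0.80 quoted for the pictured molecules.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    n_a = sum(bits_a)                                   # bits set in A
    n_b = sum(bits_b)                                   # bits set in B
    n_ab = sum(a & b for a, b in zip(bits_a, bits_b))   # bits set in both
    denominator = n_a + n_b - n_ab
    # Convention (assumption): two empty fingerprints get similarity 0.
    return n_ab / denominator if denominator else 0.0


# Schematic bit strings as printed on the slide.
fp_a = [int(b) for b in "0 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1".split()]
fp_b = [int(b) for b in "1 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1".split()]
similarity = tanimoto(fp_a, fp_b)
print(similarity)
```

For these two strings, 9 bits are common, with 9 and 10 bits set respectively, so the sketch gives 9 / (9 + 10 − 9) = 0.9.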

Presenter notes: similarity search; comparison of structural keys.

Continuous and Discontinuous SAR

[Figure: structural similarity “fading away” from the reference compounds; similarity values shown: 0.82, 0.39, 0.84, 0.72, 0.67, 0.64, 0.53, 0.56, 0.52]

Structural Spectrum of Thrombin Inhibitors

discontinuous SARs: small changes in structure have dramatic effects on activity; “cliffs” in activity landscapes

continuous SARs: gradual changes in structure result in moderate changes in activity; “rolling hills” (G. Maggiora)

Structure-Activity Landscape Index: $\mathrm{SALI}_{ij} = \Delta A_{ij} / \Delta S_{ij}$

$\Delta A_{ij}$ ($\Delta S_{ij}$) is the difference between activities (similarities) of molecules i and j.

R. Guha et al., J. Chem. Inf. Model., 2008, 48, 646
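A small sketch of the index follows. It assumes that the similarity difference is taken as 1 − sim(i, j), which is one common reading of the definition but is not spelled out on the slide, and that activities are on a logarithmic scale such as pIC50. The function name `sali`, the example values, and the handling of identical fingerprints are illustrative choices.

```python
import math


def sali(activity_i, activity_j, similarity_ij):
    """Structure-Activity Landscape Index for a pair of molecules.

    Assumption: the similarity difference is 1 - sim(i, j).
    """
    delta_a = abs(activity_i - activity_j)
    delta_s = 1.0 - similarity_ij
    if delta_s == 0.0:
        # Identical fingerprints: the index diverges unless activities agree.
        return math.inf if delta_a > 0 else 0.0
    return delta_a / delta_s


# Hypothetical pair: pIC50 of 8.0 and 6.0, Tanimoto similarity 0.8.
moderate_pair = sali(8.0, 6.0, 0.8)
# Hypothetical pair with identical fingerprints but different activities.
cliff_pair = sali(8.2, 5.6, 1.0)
print(moderate_pair, cliff_pair)
```

Large values flag activity cliffs: pairs that are structurally very similar yet differ strongly in activity. A pair with a fingerprint similarity of exactly 1.00 and different activities is the limiting case.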

VEGFR-2 tyrosine kinase inhibitors

bad news for molecular similarity analysis...

MACCS Tc: 1.00

Analog

6 nM

2390 nM

small changes in structure have dramatic effects on activity

“cliffs” in activity landscapes

lead optimization, QSAR

discontinuous SARs

Example of a “Classical” Discontinuous SAR

Adenosine deaminase inhibitors

(MACCS Tanimoto similarity)

Any similarity method must recognize these compounds as being “similar” ...

Supervised Molecular Similarity Analysis

Dynamic Mapping of Consensus Positions

Prototypic “mapping algorithm” for simplified binary-transformed* descriptor spaces

Uses known active compounds to create activity-dependent consensus positions in chemical space

Operates in descriptor spaces of step-wise increasing dimensionality (“dimension extension”)

Selects preferred descriptors from large pools

* median-based, i.e. assign “1” to a descriptor if its value is greater than (or equal to) its screening database median; assign “0” if it is smaller

Godden et al. & Bajorath. J Chem Inf Comput Sci 44, 21 (2004)

DMC Algorithm

Calculate and binary transform descriptors

Compare descriptor bit strings of reference molecules and determine consensus bits

Select DB compounds matching consensus bits

Re-generate bit strings permitting bit variability

Select DB compounds matching extended bit strings

Repeat until a small selection set is obtained
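The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the published implementation: descriptors are binarized against the database median, a bit joins the consensus when the fraction of reference molecules with that bit set is at least 1 − v or at most v for the permitted variability v, and the variability schedule, the function names and the toy data are all hypothetical.

```python
from statistics import median


def binary_transform(rows, medians):
    """Assign 1 where a descriptor value is >= its database median, else 0."""
    return [[1 if v >= m else 0 for v, m in zip(row, medians)] for row in rows]


def consensus_bits(ref_bits, variability):
    """Map descriptor index -> consensus bit at the permitted variability."""
    n_refs = len(ref_bits)
    consensus = {}
    for j in range(len(ref_bits[0])):
        freq = sum(row[j] for row in ref_bits) / n_refs
        if freq >= 1.0 - variability:
            consensus[j] = 1
        elif freq <= variability:
            consensus[j] = 0
    return consensus


def dmc_select(db_values, ref_values, max_selected,
               steps=(0.0, 0.1, 0.2, 0.3, 0.4)):
    """Return indices of database compounds matching the consensus bits."""
    n_desc = len(db_values[0])
    medians = [median(row[j] for row in db_values) for j in range(n_desc)]
    db_bits = binary_transform(db_values, medians)
    ref_bits = binary_transform(ref_values, medians)
    selected = list(range(len(db_bits)))
    for variability in steps:            # each step may extend the bit string
        consensus = consensus_bits(ref_bits, variability)
        selected = [i for i in selected
                    if all(db_bits[i][j] == b for j, b in consensus.items())]
        if len(selected) <= max_selected:
            break                        # selection set is small enough
    return selected


# Toy data (hypothetical): 6 database compounds, 3 descriptors, 2 references.
db = [[1.0, 5.0, 0.1], [2.0, 6.0, 0.9], [3.0, 1.0, 0.8],
      [4.0, 2.0, 0.2], [5.0, 7.0, 0.7], [6.0, 8.0, 0.3]]
refs = [[5.5, 9.0, 0.6], [4.5, 7.5, 0.4]]
hits = dmc_select(db, refs, max_selected=2)
print(hits)
```

In the toy data the two reference molecules agree on the first two descriptor bits only, so the selection keeps the database compounds that share those two bits.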

Descriptor bit strings for reference molecules

Calculate consensus bit string:

= 1.0 or = 0.0: no variability

1. Dimension extension: ≥ 0.9 or ≤ 0.1: 10% variability

2. Dimension extension: ≥ 0.8 or ≤ 0.2: 20% variability

…

e.g. 0%, 10%, 20% permitted bit variability: longer bit strings, fewer matching DB compounds

(white: “0”; black: “1”; gray: variably set bits)

QSAR/QSPR models

Libraries profiling: indexing a database by simultaneous assessment of various activities

PASS software (Prediction of Activity Spectra for Substances)

Example:

For each fragment i: $w_i = \frac{\mathrm{act}_i}{\mathrm{act}_i + \mathrm{inact}_i}$

PASS

Calculations of « P(act) » and « P(inact) »

Molecule is considered as active if P(act) > P(inact) or/and P(act) > 0.7

Naïve Bayes estimator
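The sketch below illustrates the two ingredients named on these slides, under explicit assumptions. The per-fragment weight is read as w_i = act_i / (act_i + inact_i), with act_i and inact_i taken to be the numbers of active and inactive training molecules containing fragment i (an interpretation, since the slide does not define them). How PASS combines the weights into P(act) and P(inact) is not reproduced here; only the final decision rule is shown, with a flag for the "or/and" alternative. All names and counts are hypothetical.

```python
def fragment_weights(active_counts, inactive_counts):
    """Per-fragment weight w_i = act_i / (act_i + inact_i)."""
    weights = {}
    for fragment in set(active_counts) | set(inactive_counts):
        act = active_counts.get(fragment, 0)
        inact = inactive_counts.get(fragment, 0)
        if act + inact > 0:               # skip fragments never observed
            weights[fragment] = act / (act + inact)
    return weights


def is_active(p_act, p_inact, threshold=0.7, require_both=False):
    """Decision rule: P(act) > P(inact) or/and P(act) > threshold."""
    above_inactive = p_act > p_inact
    above_threshold = p_act > threshold
    if require_both:
        return above_inactive and above_threshold
    return above_inactive or above_threshold


# Hypothetical training counts for two fragments.
weights = fragment_weights({"C=O": 30, "NH2": 5}, {"C=O": 10, "NH2": 15})
print(weights["C=O"], weights["NH2"])
print(is_active(0.6, 0.3), is_active(0.6, 0.3, require_both=True))
```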

Quantitative Structure-Property Relationships (QSPR)

Y = f (Structure) = f (descriptors)

QSPR models give reliable predictions only for compounds that are similar to those used to build the models.

Similarity / pharmacophore search approaches therefore remain indispensable as complementary tools
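As a toy illustration of Y = f(descriptors) and of the restriction to similar compounds, the sketch below fits a one-descriptor linear model by least squares and refuses to predict outside the descriptor range of the training set. The range check is only one simple way to define an applicability domain and is an assumption here, as are the function names and the data.

```python
def fit_linear_qspr(descriptor_values, property_values):
    """Least-squares fit of property = slope * descriptor + intercept."""
    n = len(descriptor_values)
    mean_x = sum(descriptor_values) / n
    mean_y = sum(property_values) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptor_values, property_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Applicability domain (assumption): descriptor range of the training set.
    domain = (min(descriptor_values), max(descriptor_values))
    return slope, intercept, domain


def predict(model, descriptor_value):
    """Return a prediction, or None if the compound is outside the domain."""
    slope, intercept, (low, high) = model
    if not low <= descriptor_value <= high:
        return None
    return slope * descriptor_value + intercept


# Hypothetical training data following property = 2 * descriptor + 1.
model = fit_linear_qspr([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
inside = predict(model, 2.5)
outside = predict(model, 10.0)
print(inside, outside)
```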

Combinatorial Library Design

Virtual Screening... when target structure is unknown

[Flowchart elements: virtual library; diverse subset; screening library; parallel synthesis or synthesis of single compounds; screening (HTS); hits; design of focused library]

Generation of Virtual Combinatorial Libraries

[Figure: a Markush structure (PO core with substituents R1, R2, R3) and the structures enumerated from given R1, R2, R3 values]

Fragment Marking approach

The types of variation in Markush structures:

1. Substituent variation (R1)

2. Position variation (R2)

3. Frequency variation

4. Homology variation (R3) (only for patent search)

[Structure diagram: example Markush structure with OH, Cl, R1, R2 and R3(CH2)n groups]

R1 = Me, Et, Pr

R2 = NH2

R3 = alkyl or heterocycle

n = 1 – 3
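Enumerating a virtual library from a Markush structure amounts to taking the Cartesian product of the allowed values at each variable site. The sketch below does this for the substituent lists given on the slide. It is a simplification: the generic classes "alkyl" and "heterocycle" are kept as placeholders rather than expanded into concrete groups, the position variation of R2 is not enumerated, and no structures are actually built.

```python
from itertools import product

# Variable sites as listed on the slide.
r1_options = ["Me", "Et", "Pr"]           # substituent variation
r2_options = ["NH2"]                      # attachment positions not enumerated
r3_options = ["alkyl", "heterocycle"]     # generic classes kept as placeholders
n_options = [1, 2, 3]                     # frequency variation, (CH2)n

library = [
    {"R1": r1, "R2": r2, "R3": r3, "n": n}
    for r1, r2, r3, n in product(r1_options, r2_options, r3_options, n_options)
]
print(len(library))
print(library[0])
```

Even this small example gives 3 × 1 × 2 × 3 = 18 combinations, which shows how quickly the library grows once the generic R3 classes are expanded.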

Generation of Virtual Combinatorial Libraries

Reaction transform approach

from A.R. Leach, V.J. Gillet “An Introduction to Chemoinformatics”, Kluwer Academic Publisher, 2003

Issues and Concepts in Combinatorial Library Design

• Size of the library

• Coverage of properties (“chemical space”)

• Diversity, Similarity, Redundancy

• Descriptor validation

• Subset selection from virtual libraries

Hot topics in chemoinformatics

New approaches in structure-property modeling

- descriptors

- applicability domain

- machine-learning methods (inductive learning transfer, semi-supervised learning, ...)

QSAR of complex systems

- multi-component synergistic mixtures, new materials, metabolic pathways, ...

New techniques to mine chemical reactions

Predictions vs interpretation

Public availability of chemoinformatics tools


Nathan BROWN “Chemoinformatics—An Introduction for Computer Scientists”ACM Computing Surveys, Vol. 41, No. 2, Article 8, February 2009



What do end users expect from QSAR models?

• Reliable estimation (prediction) of the given property.

Problems:

• Ensemble modeling

• Non-linear machine-learning methods (SVM, NN, …)

• Descriptors correlations

Public accessibility of models: web-based platform for virtual screening


Some Screen Shots: Welcome Page…

ISIDA property prediction web server: http://infochim.u-strasbg.fr/webserv/VSEngine.html

ISIDA ScreenDB tools

- Only an Internet browser is required

- Different descriptors (ISIDA fragments, FPT, ChemAxon)

- Similarity search with different metrics (Tanimoto, Dice, …)

- Ensemble modeling approach (simultaneous application of several models)

- Models applicability domain (automatic detection of useless models)
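The last two items can be illustrated together. The sketch below averages the predictions of several models while skipping any model whose applicability domain does not cover the query compound. This is only one plausible reading of "automatic detection of useless models"; it does not describe how the ISIDA tools are implemented, and the models, domains and function name are hypothetical.

```python
def consensus_prediction(models, descriptors):
    """Average the predictions of the models whose domain covers the compound."""
    predictions = [
        model["predict"](descriptors)
        for model in models
        if model["in_domain"](descriptors)   # skip models outside their domain
    ]
    if not predictions:
        return None                          # no model is applicable
    return sum(predictions) / len(predictions)


# Two hypothetical single-descriptor models with different domains.
models = [
    {"predict": lambda d: 2.0 * d["x"], "in_domain": lambda d: 0 <= d["x"] <= 10},
    {"predict": lambda d: d["x"] + 3.0, "in_domain": lambda d: 0 <= d["x"] <= 5},
]
both_apply = consensus_prediction(models, {"x": 4.0})
one_applies = consensus_prediction(models, {"x": 8.0})
none_apply = consensus_prediction(models, {"x": 20.0})
print(both_apply, one_applies, none_apply)
```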

“The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties”

George S. Hammond, Norris Award Lecture, 1968
