Virtual screening (Criblage virtuel) - infochim.u-strasbg.fr/FC/docs/HTS/FC_HTS_2010_print.pdf
Alexandre Varnek, Faculté de Chimie, ULP, Strasbourg, France
• Virtual screening
[Figure: drug discovery pipeline. Labels: genomics, target, high-throughput screening (HTS), hits, data analysis, lead, optimization, development candidate]
Drug discovery and ADME/Tox studies should be performed in parallel:
idea → target → combichem/HTS → hit → lead → candidate → drug
(ADME/Tox studies run alongside these stages)
Methodologies of virtual screening
(from A.R. Leach, V.J. Gillet, “An Introduction to Chemoinformatics”, Kluwer Academic Publishers, 2003)
[Figure: virtual screening funnel. A chemical database of ~10^6 – 10^9 molecules goes through virtual screening, which separates inactives from hits and leaves ~10^1 – 10^3 molecules]
Virtual screening approaches:
• Similarity search
• Filters
• (Q)SAR
• Docking
• Pharmacophore models
High-throughput screening (HTS)
Keywords:
- Combinatorial chemistry
- High-throughput screening (HTS)
- Virtual screening
- Drug-likeness
- Training sets of up to 1,000,000 compounds
Virtual Screening
Molecules available for screening
(1) Real molecules
• 1 - 2 million in in-house archives of large pharma and agrochemical companies
• 3 - 4 million samples available commercially
(2) Hypothetical molecules
• Virtual combinatorial libraries (up to 10^60 molecules)
Methods of virtual High-Throughput Screening
• Filters
• Similarity search
• Classification and regression structure-property models
• Docking
Filters to estimate “drug-likeness”
Lipinski rules for intestinal absorption (“Rule of 5”):
• H-bond donors ≤ 5 (the sum of OH and NH groups);
• MWT ≤ 500;
• LogP ≤ 5;
• H-bond acceptors ≤ 10 (the sum of N and O atoms without H attached).
[Structure: paclitaxel (Taxol)]
MW = 837, logP = 4.49, HD = 3, HA = 15
Paclitaxel (Taxol): violation of 2 rules
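The filter above can be sketched in a few lines of Python. This is a minimal illustration, assuming the descriptors (MW, logP, donor and acceptor counts) have already been computed elsewhere; the `Descriptors` class and `lipinski_violations` function are hypothetical names, and treating a value strictly above a threshold as a violation is an assumption about boundary handling.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight
    log_p: float        # octanol/water partition coefficient
    h_donors: int       # H-bond donors (OH + NH groups)
    h_acceptors: int    # H-bond acceptors (N + O atoms)


def lipinski_violations(d: Descriptors) -> list:
    """Return the names of the Lipinski rules that the molecule violates."""
    violations = []
    if d.h_donors > 5:
        violations.append("H-bond donors")
    if d.mol_weight > 500:
        violations.append("molecular weight")
    if d.log_p > 5:
        violations.append("logP")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors")
    return violations


# Values quoted on the slide for paclitaxel
paclitaxel = Descriptors(mol_weight=837, log_p=4.49, h_donors=3, h_acceptors=15)
print(lipinski_violations(paclitaxel))
# ['molecular weight', 'H-bond acceptors']
```

With the slide's values, paclitaxel fails the molecular weight and acceptor rules, which matches the two violations stated above.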
The Rule of Five Revisited: Applying Log D in Place of Log P in Drug-Likeness Filters. S. K. Bhal, K. Kassam, I. G. Peirson, and G. M. Pearl, Molecular Pharmaceutics, 4, 556-560 (2007)
Using the pH-dependent log D as the lipophilicity descriptor in place of log P significantly increases the number of compounds correctly identified as drug-like. The drug-likeness filter becomes: log D (at pH 5.5) < 5
95% of all drugs are ionizable: 75% are bases and 20% are acids
logD vs logP
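To see why log D can differ strongly from log P for ionizable drugs, the sketch below uses the standard textbook relation for a monoprotic acid or base, assuming only the neutral species partitions into octanol. That assumption, the function name `log_d` and the example pKa values are illustrative and are not taken from the cited paper.

```python
import math


def log_d(log_p: float, pka: float, ph: float, kind: str) -> float:
    """Estimate log D at a given pH for a monoprotic acid or base.

    Assumes that only the neutral form partitions into the organic phase.
    """
    if kind == "acid":
        ionized_to_neutral = 10 ** (ph - pka)
    elif kind == "base":
        ionized_to_neutral = 10 ** (pka - ph)
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1 + ionized_to_neutral)


# Hypothetical base: logP = 6 fails "logP < 5", yet log D at pH 5.5 passes
print(round(log_d(6.0, pka=9.5, ph=5.5, kind="base"), 2))
```

In this hypothetical case a lipophilic base that a log P filter would reject is accepted by the log D (pH 5.5) < 5 filter, because it is almost fully ionized at that pH.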
Synthetic Accessibility
The synthetic accessibility of a fragment is taken to be proportional to its occurrence in the PubChem database
Ertl and Schuffenhauer, Journal of Cheminformatics 2009, 1:8
Altogether, 605,864 different fragment types were obtained by fragmenting the PubChem structures. Most of them (51%), however, are singletons (present only once in the whole set). Only a relatively small number of fragments, namely 3759 (0.62%), are frequent (i.e. present more than 1000 times in the database).
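The statistics quoted above (singleton share, frequent-fragment share) can be reproduced on any fragmented dataset with a simple occurrence count. The sketch below is an illustration on toy data, not the fragmentation procedure of Ertl and Schuffenhauer; the function name and the fragment labels are hypothetical.

```python
from collections import Counter


def fragment_statistics(molecule_fragments, frequent_threshold=1000):
    """Count fragment occurrences over a dataset and summarize the distribution."""
    counts = Counter()
    for fragments in molecule_fragments:
        counts.update(fragments)
    n_types = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    n_frequent = sum(1 for c in counts.values() if c > frequent_threshold)
    return {
        "fragment_types": n_types,
        "singleton_fraction": n_singletons / n_types if n_types else 0.0,
        "frequent_fraction": n_frequent / n_types if n_types else 0.0,
    }


# Toy dataset: each inner list holds the fragments found in one molecule
toy = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
print(fragment_statistics(toy, frequent_threshold=2))
# {'fragment_types': 4, 'singleton_fraction': 0.5, 'frequent_fraction': 0.25}
```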
[Figure: frequency distribution of fragments]
The most common fragments present in the million PubChem molecules. The "A" represents any non-hydrogen atom, "dashed" double bond indicates an aromatic bond and the yellow circle marks the central atom of the fragment.
[Figure: distribution of (-SAscore) for natural products, bioactive molecules and molecules from catalogues]
[Figure: correlation of calculated (-SAscore) and average chemist estimation for 40 molecules (r^2 = 0.890)]
Similarity search: unsupervised and supervised approaches
2D (unsupervised) similarity search
Molecular fingerprints:
0 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1
1 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1
Tanimoto coefficient: $T = \frac{N_{A\&B}}{N_A + N_B - N_{A\&B}} = 0.80$
[Structures: the two molecules being compared]
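The Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either one. A minimal sketch on the two 19-bit example strings shown above follows; the function name is hypothetical, and returning 0.0 for two empty fingerprints is an arbitrary convention chosen here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    n_a = sum(fp_a)                                  # bits set in A
    n_b = sum(fp_b)                                  # bits set in B
    n_ab = sum(a & b for a, b in zip(fp_a, fp_b))    # bits set in both
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0


fp_a = [int(bit) for bit in "0 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1".split()]
fp_b = [int(bit) for bit in "1 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1".split()]
print(tanimoto(fp_a, fp_b))  # 0.9
```

For these two short strings the result is 9/10 = 0.9. The 0.80 quoted on the slide presumably refers to the full fingerprints of the two depicted molecules, of which the strings shown are only an illustration.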
Continuous and discontinuous SAR
Structural spectrum of thrombin inhibitors
[Figure: reference compounds and analogs with structural similarity “fading away”; values shown in the figure: 0.82, 0.39, 0.84, 0.72, 0.67, 0.64, 0.53, 0.56, 0.52]
• Continuous SARs: gradual changes in structure result in moderate changes in activity (“rolling hills”, G. Maggiora)
• Discontinuous SARs: small changes in structure have dramatic effects on activity (“cliffs” in activity landscapes)
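Structure-Activity Landscape Index: $\mathrm{SALI}_{ij} = \Delta A_{ij} / \Delta S_{ij}$
$\Delta A_{ij}$ is the difference between the activities of molecules i and j, and $\Delta S_{ij}$ is their structural dissimilarity (1 minus their similarity)
R. Guha et al., J. Chem. Inf. Model., 2008, 48, 646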
VEGFR-2 tyrosine kinase inhibitors
[Figure: two analogs with MACCS Tc = 1.00 and potencies of 6 nM and 2390 nM]
Bad news for molecular similarity analysis...
Discontinuous SARs: small changes in structure have dramatic effects on activity (“cliffs” in activity landscapes); relevant for lead optimization and QSAR
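The SALI definition can be applied to the VEGFR-2 pair above (MACCS Tc = 1.00, 6 nM vs 2390 nM). The sketch below makes two assumptions that are not in the slides: potencies are converted to a negative log10 molar scale before taking the difference, and a pair with identical fingerprints but different activities is reported as an infinite cliff. Function names are hypothetical.

```python
import math


def p_activity_from_nm(potency_nm: float) -> float:
    """Convert a potency in nM to a negative log10 molar value."""
    return 9.0 - math.log10(potency_nm)


def sali(activity_i: float, activity_j: float, similarity: float) -> float:
    """SALI_ij = |A_i - A_j| / (1 - sim_ij)."""
    delta_a = abs(activity_i - activity_j)
    delta_s = 1.0 - similarity
    if delta_s <= 0.0:
        # Identical fingerprints: infinite cliff unless activities are equal
        return math.inf if delta_a > 0 else 0.0
    return delta_a / delta_s


a1 = p_activity_from_nm(6)       # about 8.22
a2 = p_activity_from_nm(2390)    # about 5.62
print(sali(a1, a2, similarity=1.00))  # inf
```

The infinite value illustrates the “bad news” on the slide: MACCS keys cannot distinguish the two analogs although their potencies differ by more than two log units.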
Example of a “classical” discontinuous SAR: adenosine deaminase inhibitors (MACCS Tanimoto similarity)
Any similarity method must recognize these compounds as being “similar”...
Supervised Molecular Similarity Analysis
Dynamic Mapping of Consensus Positions (DMC)
• Prototypic “mapping algorithm” for simplified binary-transformed* descriptor spaces
• Uses known active compounds to create activity-dependent consensus positions in chemical space
• Operates in descriptor spaces of step-wise increasing dimensionality (“dimension extension”)
• Selects preferred descriptors from large pools
* median-based, i.e. assign “1” to a descriptor if its value is greater than or equal to its screening database median; assign “0” if it is smaller
Godden et al. & Bajorath, J. Chem. Inf. Comput. Sci. 44, 21 (2004)
DMC Algorithm
1. Calculate and binary transform descriptors
2. Compare descriptor bit strings of reference molecules and determine consensus bits
3. Select DB compounds matching consensus bits
4. Re-generate bit strings permitting bit variability
5. Select DB compounds matching extended bit strings
6. Repeat until a small selection set is obtained
Descriptor bit strings for reference molecules (white: “0”, black: “1”, gray: variably set bits)
Calculate consensus bit string from the bit frequencies over the reference molecules:
• = 1.0 or = 0.0: no variability (step 0)
• ≥ 0.9 or ≤ 0.1: 10% variability (1st dimension extension)
• ≥ 0.8 or ≤ 0.2: 20% variability (2nd dimension extension)
• …
E.g. 0%, 10%, 20% permitted bit variability: longer bit strings, fewer matching DB compounds
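The DMC steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Godden et al.: descriptors are plain numeric tuples, the function names are hypothetical, and the stopping criterion `target_size` is an arbitrary parameter.

```python
from statistics import median


def median_binarize(database):
    """Binary transform: 1 if a descriptor value >= its database median, else 0."""
    medians = [median(column) for column in zip(*database)]
    return [tuple(int(v >= m) for v, m in zip(row, medians)) for row in database]


def consensus_bits(reference_bits, variability):
    """Map bit position -> consensus value for positions within the permitted variability."""
    n = len(reference_bits)
    consensus = {}
    for position, column in enumerate(zip(*reference_bits)):
        frequency = sum(column) / n          # fraction of references with bit set
        if frequency >= 1.0 - variability - 1e-9:
            consensus[position] = 1
        elif frequency <= variability + 1e-9:
            consensus[position] = 0
    return consensus


def dmc_select(database_bits, reference_bits,
               variabilities=(0.0, 0.1, 0.2), target_size=10):
    """Return indices of database compounds matching the growing consensus bit string."""
    selected = list(range(len(database_bits)))
    for variability in variabilities:        # each pass is one dimension extension
        consensus = consensus_bits(reference_bits, variability)
        selected = [i for i in selected
                    if all(database_bits[i][p] == b for p, b in consensus.items())]
        if len(selected) <= target_size:
            break
    return selected


# Toy example with already binarized descriptors
reference = [(1, 0, 1, 1), (1, 0, 0, 1), (1, 0, 1, 1)]
database = [(1, 0, 1, 1), (1, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)]
print(dmc_select(database, reference, variabilities=(0.0, 0.4), target_size=1))  # [0]
```

Each increase in permitted variability adds bit positions to the consensus string, so the set of matching database compounds can only shrink from one pass to the next.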
QSAR/QSPR models
Library profiling: indexing a database by simultaneous assessment of various activities
Example: PASS software (Prediction of Activity Spectra for Substances)
For each fragment i: $w_i = \frac{act_i}{act_i + inact_i}$
Calculation of P(act) and P(inact) (naïve Bayes estimator)
A molecule is considered active if P(act) > P(inact) and/or P(act) > 0.7
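A rough sketch of the two ingredients named above is given below: a per-fragment weight, read here as w_i = act_i / (act_i + inact_i) with act_i and inact_i taken to be the numbers of active and inactive training compounds containing fragment i, and the decision rule on P(act) and P(inact). Both readings are assumptions made for illustration. How PASS actually combines the fragment weights into P(act) and P(inact) is not described in the slides and is not reproduced here.

```python
from collections import Counter


def fragment_weights(training_set):
    """training_set: iterable of (set_of_fragments, is_active) pairs.

    Returns w_i = act_i / (act_i + inact_i) for every fragment seen.
    """
    act, inact = Counter(), Counter()
    for fragments, is_active in training_set:
        (act if is_active else inact).update(fragments)
    return {f: act[f] / (act[f] + inact[f]) for f in set(act) | set(inact)}


def is_active(p_act, p_inact, threshold=0.7, require_both=False):
    """Decision rule: P(act) > P(inact) and/or P(act) > threshold."""
    above_inact = p_act > p_inact
    above_threshold = p_act > threshold
    if require_both:
        return above_inact and above_threshold
    return above_inact or above_threshold


# Hypothetical training data
training = [({"a", "b"}, True), ({"a"}, False), ({"b", "c"}, True)]
weights = fragment_weights(training)
print(weights["a"], weights["b"])          # 0.5 1.0
print(is_active(0.6, 0.2))                 # True
print(is_active(0.6, 0.2, require_both=True))  # False
```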
Quantitative Structure-Property Relationships (QSPR)
Y = f (Structure) = f (descriptors)
QSPR models give reliable predictions only for compounds that are similar to those used to build the models.
Similarity / pharmacophore search approaches therefore remain indispensable as complementary tools
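One simple way to express "similar to the training compounds" in code is a descriptor-range (bounding box) check: a prediction is trusted only if every descriptor of the query lies within the range seen in the training set. This is one common choice among many and is shown only as a sketch; the slides do not say which applicability-domain definition is intended, and the function names are hypothetical.

```python
def descriptor_ranges(training_descriptors):
    """Return (min, max) for each descriptor column of the training set."""
    return [(min(column), max(column)) for column in zip(*training_descriptors)]


def in_domain(descriptors, ranges):
    """True if every descriptor value lies inside the training range."""
    return all(low <= value <= high
               for value, (low, high) in zip(descriptors, ranges))


# Hypothetical training set with two descriptors per compound
training_descriptors = [(250.0, 1.2), (310.0, 2.8), (420.0, 3.5)]
ranges = descriptor_ranges(training_descriptors)
print(in_domain((300.0, 2.0), ranges))   # True
print(in_domain((837.0, 4.49), ranges))  # False
```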
Combinatorial Library Design
Virtual screening... when target structure is unknown
[Figure: library design workflow. Labels: virtual library, diverse subset, screening library, parallel synthesis or synthesis of single compounds, screening (HTS), hits, design of focused library]
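Generation of Virtual Combinatorial Libraries
Fragment marking approach
[Figure: Markush structure with a PO core and substituents R1, R2, R3; for given choices of R1, R2 and R3 the individual library members are enumerated (eight PO-containing structures shown)]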
The types of variation in Markush structures:
1. Substituent variation (R1)
2. Position variation (R2)
3. Frequency variation
4. Homology variation (R3) (only for patent search)
[Structure: scaffold bearing OH, Cl, R1, R2 and R3(CH2)n]
R1 = Me, Et, Pr
R2 = NH2
R3 = alkyl or heterocycle
n = 1 – 3
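Enumerating a virtual library from a Markush definition amounts to taking the Cartesian product of the allowed values at each variation point. The sketch below uses the R1 and n values from the slide; the two concrete R3 groups are hypothetical stand-ins for "alkyl or heterocycle", and position variation of R2 is left out for brevity.

```python
from itertools import product

R1_GROUPS = ["Me", "Et", "Pr"]          # substituent variation (from the slide)
CHAIN_LENGTHS = [1, 2, 3]               # frequency variation, n = 1 - 3
R3_GROUPS = ["iPr", "2-pyridyl"]        # hypothetical examples of alkyl / heterocycle


def enumerate_library(r1_groups, chain_lengths, r3_groups):
    """Return one dict per library member (all combinations of the variation points)."""
    return [
        {"R1": r1, "n": n, "R3": r3, "R2": "NH2"}
        for r1, n, r3 in product(r1_groups, chain_lengths, r3_groups)
    ]


library = enumerate_library(R1_GROUPS, CHAIN_LENGTHS, R3_GROUPS)
print(len(library))   # 18
print(library[0])     # {'R1': 'Me', 'n': 1, 'R3': 'iPr', 'R2': 'NH2'}
```

The library size is the product of the list lengths (3 x 3 x 2 = 18 here), which is why virtual combinatorial libraries grow so quickly with the number of variation points.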
Generation of Virtual Combinatorial Libraries
Reaction transform approach
from A.R. Leach, V.J. Gillet “An Introduction to Chemoinformatics”, Kluwer Academic Publisher, 2003
Issues and Concepts in Combinatorial Library Design
• Size of the library
• Coverage of properties (“chemical space”)
• Diversity, Similarity, Redundancy
• Descriptor validation
• Subset selection from virtual libraries
Hot topics in chemoinformatics
• New approaches in structure-property modeling
  - descriptors
  - applicability domain
  - machine-learning methods (inductive learning transfer, semi-supervised learning, ...)
• QSAR of complex systems
  - multi-component synergistic mixtures, new materials, metabolic pathways, ...
• New techniques to mine chemical reactions
• Predictions vs interpretation
• Public availability of chemoinformatics tools
Nathan Brown, “Chemoinformatics: An Introduction for Computer Scientists”, ACM Computing Surveys, Vol. 41, No. 2, Article 8, February 2009
Predictions vs interpretation
What do end users expect from QSAR models?
• Reliable estimation (prediction) of the given property.
Problems:
• Ensemble modeling
• Non-linear machine-learning methods (SVM, NN, …)
• Descriptor correlations
Public accessibility of models: web-based platform for virtual screening
ISIDA property prediction web server: http://infochim.u-strasbg.fr/webserv/VSEngine.html
[Screenshot: welcome page]
ISIDA ScreenDB tools:
- only an Internet browser is required
- different descriptors (ISIDA fragments, FPT, ChemAxon)
- similarity search with different metrics (Tanimoto, Dice, …)
- ensemble modeling approach (simultaneous application of several models)
- models applicability domain (automatic detection of useless models)
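The combination of ensemble modeling and applicability-domain filtering listed above can be illustrated with a generic consensus scheme: each model votes only if the query compound falls inside its own domain, and the remaining predictions are averaged. This is a sketch of the general idea, not the ISIDA implementation; all names and the toy models are hypothetical.

```python
def ensemble_predict(models, descriptors):
    """Average the predictions of all models whose domain covers the query.

    models: list of (predict_fn, in_domain_fn) pairs.
    Returns None when no model is applicable.
    """
    predictions = [predict(descriptors)
                   for predict, in_domain in models
                   if in_domain(descriptors)]
    if not predictions:
        return None
    return sum(predictions) / len(predictions)


# Toy models: two applicable, one outside its applicability domain
toy_models = [
    (lambda d: 1.0, lambda d: True),
    (lambda d: 3.0, lambda d: True),
    (lambda d: 100.0, lambda d: False),   # "useless" for this query, so ignored
]
print(ensemble_predict(toy_models, descriptors=(300.0, 2.0)))  # 2.0
```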
“The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties”
George S. Hammond, Norris Award Lecture, 1968