Roaming through natural products space with JChem & KNIME · ChemAxon UGM 2016. Nestlé Institute...
Roaming through natural products
space with JChem & KNIME
Jasna Klicic Badoux
ChemAxon UGM 2016
Nestlé Institute of Health Sciences (NIHS)
Developing the scientific foundation of how nutrition can be used to
empower people to improve and maintain their health
• founded in 2011
• based near Lausanne, Switzerland
• over 150 scientists
Confidential
Exploring the link between nutrition and health
molecular level
Nakaya H., Nat. Immunol. 2011, 12, 786
cpd libs + HTS
Chemical space of natural products
Chemical space of natural products: size
[Figure: number of molecules in estimated drug-like chemical space, public databases of known cpds (ChemSpider, PubChem), a typical big pharma cpd collection, ChEMBL, and databases of natural products and metabolites (DNP, HMDB, LipidMaps)]
Natural products vs. drug-like molecules: properties
Grabowski K., Nat. Prod. Rep. 2008, 25, 892
17%
property                 DNP                     drug-like
property variability     +++ (size, polarity)    ---
HB donors & acceptors    +++                     ---
aromatic atoms           1 in 6                  1 in 2
heteroatoms              6 O & 1 N per mol       3 O & 3 N per mol
chiral centers           6 per mol               1 per mol
Lipinski rule-of-five    74% comply              100% (by definition)
Molecular frameworks
molecule → atomic scaffold:
• keep rings and ring linkers
• keep exocyclic double bonds
atomic scaffold → graph scaffold:
• mutate all atoms to C
• make all bonds single
G.W. Bemis & M.A. Murcko, J. Med. Chem. 1996, 39, 2887
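The two reduction steps can be sketched in plain Python on a toy molecular graph. This is a minimal illustration, not the JChem implementation: the function names, the `(atom_a, atom_b, bond_order)` bond tuples and the example molecule are assumptions made for the sketch.

```python
from collections import defaultdict


def ring_and_linker_atoms(bonds):
    """Return the atoms left after repeatedly pruning terminal atoms."""
    neighbors = defaultdict(set)
    for a, b, _order in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    core = set(neighbors)
    while True:
        # An atom with at most one neighbor inside the core is terminal.
        terminal = {a for a in core if len(neighbors[a] & core) <= 1}
        if not terminal:
            return core
        core -= terminal


def atomic_scaffold(atoms, bonds):
    """Keep rings, ring linkers and atoms double-bonded to them."""
    core = ring_and_linker_atoms(bonds)
    keep = set(core)
    for a, b, order in bonds:
        if order == 2:
            if a in core:
                keep.add(b)
            if b in core:
                keep.add(a)
    kept_atoms = {i: atoms[i] for i in keep}
    kept_bonds = [(a, b, o) for a, b, o in bonds if a in keep and b in keep]
    return kept_atoms, kept_bonds


def graph_scaffold(atoms, bonds):
    """Mutate all scaffold atoms to C and make all scaffold bonds single."""
    sc_atoms, sc_bonds = atomic_scaffold(atoms, bonds)
    return {i: "C" for i in sc_atoms}, [(a, b, 1) for a, b, _o in sc_bonds]


# Toy input (hypothetical): 4-methylcyclohexanone, hydrogens omitted.
atoms = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O", 7: "C"}
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 0, 1),
         (0, 6, 2),   # exocyclic C=O, kept in the atomic scaffold
         (3, 7, 1)]   # methyl side chain, pruned
scaffold_atoms, scaffold_bonds = atomic_scaffold(atoms, bonds)
```

Acyclic molecules end up with an empty scaffold in this sketch, which matches the separate "acyclic" category used in the scaffold statistics.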
Natural products vs. drug-like molecules: scaffolds
overall broader scaffold diversity for drug-like molecules than for natural products!
                                            drug-like molecules   natural products
molecules                                   121 593               128 600
atomic scaffolds                            51 338                31 050
mol/scaff                                   2.4                   3.8
acyclic                                     2%                    9%
1 mol/scaff                                 68%                   59%
scaffolds accounting for half the library   11%                   4%
Grabowski K., Nat. Prod. Rep. 2008, 25, 892
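The scaffold statistics compared here (mol/scaff, share of singleton scaffolds, share of scaffolds that accounts for half the library) could be computed along the following lines. The slides do not spell out the exact definitions behind the charts, so the definitions in this sketch are assumptions, and the scaffold keys are hypothetical placeholders.

```python
from collections import Counter


def scaffold_stats(scaffold_per_molecule):
    """Summarize scaffold diversity for a non-empty list of scaffold keys,
    one key per (cyclic) molecule."""
    counts = Counter(scaffold_per_molecule)
    n_mol = len(scaffold_per_molecule)
    n_scaff = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)

    # Count the most populated scaffolds needed to cover half the molecules.
    covered = 0
    needed = 0
    for _scaffold, c in counts.most_common():
        if covered * 2 >= n_mol:
            break
        covered += c
        needed += 1

    return {
        "mol_per_scaffold": n_mol / n_scaff,
        "singleton_fraction": singletons / n_scaff,
        "half_library_fraction": needed / n_scaff,
    }


# Hypothetical library: one scaffold dominates, three are singletons.
stats = scaffold_stats(["A"] * 5 + ["B"] * 2 + ["C", "D", "E"])
# stats == {"mol_per_scaffold": 2.0, "singleton_fraction": 0.6,
#           "half_library_fraction": 0.2}
```

A lower `half_library_fraction` means a few scaffolds dominate the library, which is how the sketch reads "lower scaffold diversity".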
NIHS sample library compared to DNP: properties
CRC Dictionary of Natural Products (DNP 24.2, Taylor & Francis Group): ~250'000 cpds
NIHS samples library: ~20'000 pure cpds
molecular and phys-chem properties very similar to DNP averages
Property           DNP          NIHS
Aromatic atoms     5.1 (7.2)    6.6 (6.8)
Chiral atoms       5.9 (6.6)    6.1 (6.9)
Heavy atoms        30.6 (17)    31.3 (16)
Oxygen atoms       6.5 (5.9)    7.6 (5.9)
Nitrogen atoms     0.5 (1.5)    0.5 (1.3)
HB acceptors       6 (5.5)      7.1 (5.6)
HB donors          3.1 (3.6)    3.6 (3.6)
Lipinski           0.8 (1.1)    0.9 (1.2)
MW                 430 (246)    440 (230)
Rigid bonds        26.8 (16)    27.9 (15)
Rotatable bonds    6.1 (6.7)    6 (5.3)
Rings              3.4 (2.3)    3.6 (2.1)
logP               2.9 (3.2)    2.1 (2.7)
TPSA               109 (96)     125 (94)
NIHS sample library compared to DNP: scaffolds
relatively more scaffold singletons & higher scaffold diversity for NIHS vs DNP
                                            NIHS      DNP
natural products                            18 082    128 600
atomic scaffolds                            8 791     31 050
mol/scaff                                   1.9       3.8
acyclic                                     8%        9%
1 mol/scaff                                 77%       59%
scaffolds accounting for half the library   17%       4%
Favorite JChem KNIME nodes
Favorite JChem KNIME nodes: Standardizer
• strip salts
• neutralize
• tautomerize
• aromatize
• ...
Favorite JChem KNIME nodes: Chemical Terms
• molecular and phys-chem properties
• Bemis-Murcko framework
Favorite JChem functions: conversion
in particular: name to structure
Library subset selection
How to look for a representative subset?
Objective: select 15% of representative and "best chance" compounds
from the in-house pure compound collection
NIHS collection → 15% subset?
Select molecules that:
a) are bioavailable (cellular assay)
b) have a better chance of a good binding energy
c) are representative of diverse natural product chemical classes
NIHS collection → bioavailability filter → "best chance" filters → diversity subsets → 15% subset
Lipinski's "rule of five"
empirical rule formulated by Christopher A. Lipinski in 1997
most orally administered drugs are relatively small and moderately lipophilic molecules
indication of compound's bioavailability
MW < 500
clogP ≤ 5
HBD ≤ 5
HBA ≤ 10
No more than one violation
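The rule as stated on this slide translates directly into a violation count. The sketch below is a minimal illustration: the function names are made up here, and the property values are assumed to be precomputed (for example with the Chemical Terms node).

```python
def lipinski_violations(mw, clogp, hbd, hba):
    """Count violations of the four criteria as listed on the slide."""
    return sum([
        mw >= 500,    # MW < 500
        clogp > 5,    # clogP <= 5
        hbd > 5,      # HBD <= 5
        hba > 10,     # HBA <= 10
    ])


def passes_lipinski(mw, clogp, hbd, hba, max_violations=1):
    """True if the compound has no more than `max_violations` violations."""
    return lipinski_violations(mw, clogp, hbd, hba) <= max_violations


# Illustrative property values only.
print(passes_lipinski(mw=430, clogp=2.9, hbd=3, hba=6))    # True
print(passes_lipinski(mw=650, clogp=6.2, hbd=3, hba=6))    # False
```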
Protein-ligand binding step-by-step
[Figure: protein + ligand → complex, ΔG]
ΔG_bind = ΔG_Complex* – (ΔG_Prot* + ΔG_Lig*)
* = hydrated
ΔG_bind = ΔH_bind – TΔS_bind
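A small numerical illustration of ΔG_bind = ΔH_bind – TΔS_bind may help to see how an entropy penalty eats into a favorable enthalpy. The numbers are invented for the example, and the units (kcal/mol, kcal/(mol·K)) and the temperature are assumptions.

```python
def delta_g(delta_h, delta_s, temperature=298.15):
    """Return ΔG = ΔH - T*ΔS (ΔH in kcal/mol, ΔS in kcal/(mol*K), T in K)."""
    return delta_h - temperature * delta_s


# Same binding enthalpy, different entropy loss on binding (made-up values).
rigid_ligand = delta_g(delta_h=-12.0, delta_s=-0.010)
flexible_ligand = delta_g(delta_h=-12.0, delta_s=-0.020)

# The larger entropy loss gives a less negative, i.e. weaker, ΔG_bind.
print(rigid_ligand < flexible_ligand)    # True
```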
• breaking ligand-water bonds (ΔH)
• ordered water released to bulk water (ΔS)
• constraining ligand and protein (ΔS)
• forming protein-ligand bonds (ΔH)
What leads to good binding energy:
a) unfavorable ligand hydration in free state
b) low loss of entropy upon adopting the binding conformation
c) favorable protein-ligand interactions
• highly planar molecules, few polar atoms (HBD/A, aromaticity): non-specific interaction
• very polar/charged molecules: low permeability
• long flexible chains: low binding affinity
"Best chance" filters - excluded molecules
1) number of aromatic rings < 5
2) no more than 90% of aromatic (heavy) atoms
3) at least 2 HB donors or acceptors
Additional filters – examples of excluded molecules
4) clogP at least -1
5) contains at least 1 ring or fewer than 8 rotatable bonds
TBD
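The five criteria above can be combined into a single predicate. This is a sketch under assumptions: the dictionary keys are hypothetical names for precomputed properties, "at least 2 HB donors or acceptors" is read as donors plus acceptors ≥ 2, and the aromatic share is taken relative to heavy atoms.

```python
def passes_best_chance(p):
    """Return True if a compound's properties satisfy all five filters."""
    aromatic_fraction = p["aromatic_atoms"] / max(p["heavy_atoms"], 1)
    return (
        p["aromatic_rings"] < 5                          # 1) fewer than 5 aromatic rings
        and aromatic_fraction <= 0.90                    # 2) at most 90% aromatic atoms
        and p["hbd"] + p["hba"] >= 2                     # 3) at least 2 HB donors/acceptors
        and p["clogp"] >= -1                             # 4) clogP at least -1
        and (p["rings"] >= 1 or p["rotatable_bonds"] < 8)  # 5) ring or < 8 rotatable bonds
    )


# Illustrative property sets (values invented for the example).
typical = dict(aromatic_rings=2, aromatic_atoms=12, heavy_atoms=30,
               hbd=3, hba=6, clogp=2.9, rings=3, rotatable_bonds=6)
long_chain = dict(aromatic_rings=0, aromatic_atoms=0, heavy_atoms=20,
                  hbd=1, hba=2, clogp=4.0, rings=0, rotatable_bonds=12)
print(passes_best_chance(typical), passes_best_chance(long_chain))    # True False
```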
Chemical class assignment - method
Chemical class analysis of Dictionary of Natural Products (DNP, version 24.2), 250k cpds
DNP entry → atomic scaffold + chemical class (e.g. flavonoid) → scaffold – chem. class table (45'000 entries)
13% of entries have multiple class assignments per scaffold
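The scaffold-to-class lookup described here can be pictured as a dictionary built from reference entries and then queried with the scaffold of each library compound. The sketch below assumes scaffolds are available as canonical string keys; the function names and the example keys and classes (other than "flavonoid") are hypothetical.

```python
from collections import defaultdict


def build_scaffold_class_table(reference_entries):
    """Map each scaffold key to the set of chemical classes seen for it."""
    table = defaultdict(set)
    for scaffold, chem_class in reference_entries:
        table[scaffold].add(chem_class)
    return dict(table)


def assign_classes(scaffold, table):
    """Return (status, classes) for one compound's scaffold."""
    classes = table.get(scaffold, set())
    if len(classes) == 1:
        return "single class", classes
    if len(classes) > 1:
        return "multiple classes", classes
    return "no class", classes


# Hypothetical reference entries: (scaffold key, chemical class).
reference = [
    ("scaffold_1", "flavonoid"),
    ("scaffold_2", "class_X"),
    ("scaffold_2", "class_Y"),
]
table = build_scaffold_class_table(reference)
print(assign_classes("scaffold_1", table))    # ('single class', {'flavonoid'})
```

Keeping a set per scaffold makes the ambiguous cases (several classes for one scaffold) explicit instead of silently picking one class.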
Chemical class assignment - results
DNP scaffold – chem. class table → NIHS collection
1) chemical class assignment of compounds in NIHS collection
2) chemical class distribution of NIHS collection, compared to DNP
[Figure: chemical class distribution, DNP vs. NIHS]
Compound selection method - implementation
1) bioavailability filter: Lipinski rule-of-five
2) additional filters to prioritize "best-chance" compounds
3) division into subsets and class assignment; final selection of 25% per subset/class
based on chemical diversity
NIHS collection (16'500 cpds) → Lipinski → 12'400 cpds → "best chance" filters → 10'700 cpds → diversity (3 subsets) → 2500 cpds
Diversity subsets:
a) class assignment
b) multiple class assign.
c) no class assignment
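The slides do not say which diversity algorithm was used for the final 25% selection. As one possible illustration, the sketch below uses a simple MaxMin picker on Tanimoto distances. Fingerprints are represented as plain Python sets of "on" bits, and all names and data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_pick(fingerprints, fraction=0.25):
    """Pick a diverse fraction of compounds: each new pick is the candidate
    farthest from its nearest already-picked compound."""
    ids = sorted(fingerprints)
    if not ids:
        return []
    n_pick = max(1, round(len(ids) * fraction))
    picked = [ids[0]]
    # Distance of every remaining candidate to its nearest picked compound.
    min_dist = {i: 1 - tanimoto(fingerprints[i], fingerprints[ids[0]])
                for i in ids[1:]}
    while len(picked) < n_pick and min_dist:
        nxt = max(min_dist, key=min_dist.get)
        picked.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            d = 1 - tanimoto(fingerprints[i], fingerprints[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


# Hypothetical subset of four compounds; select half of them.
subset = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8, 9}, "d": {1, 2, 3, 4}}
print(maxmin_pick(subset, fraction=0.5))    # ['a', 'c']
```

Running the picker separately on each subset/class keeps every chemical class represented in the final selection.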
Bioavailability of CNS bioactives - CNS-MPO
Wager T.T., ACS Chem. Neurosci. 2010, 1, 435
analyzed physchem/molecular properties of 119 marketed CNS drugs and 108 Pfizer CNS candidates (2010):
the majority of the CNS drugs have properties within a specific range
CNS-MPO score
desirability functions: monotonic decreasing or "bump"
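The two desirability shapes can be written as piecewise-linear functions and summed into a score between 0 and 6. The sketch below is not the JChem/KNIME workflow from the talk; the six properties and their thresholds are given as recalled from the cited Wager et al. paper and should be treated as assumptions to verify against it.

```python
def decreasing(x, full, zero):
    """Desirability 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def bump(x, lo_zero, lo_full, hi_full, hi_zero):
    """Desirability 1 inside [lo_full, hi_full], linear ramps on both sides."""
    if x <= lo_zero or x >= hi_zero:
        return 0.0
    if x < lo_full:
        return (x - lo_zero) / (lo_full - lo_zero)
    if x <= hi_full:
        return 1.0
    return (hi_zero - x) / (hi_zero - hi_full)


def cns_mpo(clogp, clogd, mw, tpsa, hbd, pka):
    """Sum of six desirabilities (thresholds assumed, see lead-in)."""
    return (
        decreasing(clogp, 3, 5)
        + decreasing(clogd, 2, 4)
        + decreasing(mw, 360, 500)
        + bump(tpsa, 20, 40, 90, 120)
        + decreasing(hbd, 0.5, 3.5)
        + decreasing(pka, 8, 10)
    )


# Illustrative compound (invented property values).
score = cns_mpo(clogp=2.0, clogd=1.0, mw=300, tpsa=60, hbd=1, pka=7.5)
```

With these thresholds a compound scores 6 only if every property sits in its fully desirable range.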
Implementation in JChem/KNIME
CNS-MPO score of Dotmatics collection
score bin   ≤ 1    2      3       4       5       6
cpds        177    540    4247    4185    3452    3853
favorable score for 44% of the library
Compound selection
NIHS collection (16'500 cpds) → CNS MPO, Lipinski and "best chance" filters → diversity (3 subsets) → 5000 cpds
intermediate counts: 7'730, 7'708 and 6'754 cpds
JChem & KNIME: the perfect marriage
Targeted metabolites and their analogs
Lipinski, additional filters
295
metabolites
from DNP
137
similar cpds in NIHS
library (Tan. 0.75)
725
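The analog search at a Tanimoto threshold of 0.75 can be sketched as a plain similarity screen. In this illustration fingerprints are sets of on-bits, the identifiers are hypothetical, and reading "Tan. 0.75" as similarity ≥ 0.75 is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def find_analogs(queries, library, threshold=0.75):
    """For each query, list library compounds with similarity >= threshold."""
    hits = {}
    for query_id, query_fp in queries.items():
        matches = sorted(
            cpd_id for cpd_id, fp in library.items()
            if tanimoto(query_fp, fp) >= threshold
        )
        if matches:
            hits[query_id] = matches
    return hits


# Hypothetical metabolite and library fingerprints.
metabolites = {"met_1": {1, 2, 3, 4}}
library = {"cpd_1": {1, 2, 3, 4}, "cpd_2": {1, 2, 3}, "cpd_3": {1, 9}}
print(find_analogs(metabolites, library))    # {'met_1': ['cpd_1', 'cpd_2']}
```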
Targeted metabolites and analogs: implementation
THANK YOU!
NIHS
Jaroslaw Szymczak
Terry Reilly
Laurent Dobler
Denis Barron
Sofia Moço
Yann Ratinaud
Loraine Merimod
Radovan Chytracek
Bruce O'Neel
ChemAxon
Akos Tarcsay
Anna Forró
Ivan Harmath
Árpád Figyelmesi
Tamas Pelcz
KNIME
Jon Fuller