Cheminformatics in Drug Discovery and Chemical Genomics Research
Weifan Zheng, Ph.D.
Associate Professor, Department of Pharmaceutical Sciences
BRITE Institute, NC Central University
Adjunct Associate Professor, Department of Medicinal Chemistry
University of North Carolina at Chapel Hill
UKY Seminar Weifan Zheng, Ph.D.
Topics to Be Covered
Biotech/Pharma Orphan Disease Chemical Genomics
Computational Needs
Compound Collection Docking Scoring Data Analytics
CECCR Cheminformatics Center
Drug Discovery & Development Pipeline
Phases and Costs of Drug Discovery
Drug Discovery Process and the Roles of CADD
• GR: Genetic Research; DR: Discovery Research; DD: Drug Discovery
• CADD: computer-assisted drug discovery
• ADMET: Absorption, distribution, metabolism, elimination, toxicity
[Diagram: pipeline stages GR, DR, DD, Preclin, IND, Clinical trials (I, II, III); milestone labels T, H, L, C, H2L, LO, T2H; CADD]
Human Genome Project Success
“Genome announcement 'technological triumph'
Milestone in genetics ushers in new era of discovery, responsibility”
CNN, June 26, 2000
Chemogenomics/Chemical Genomics
Chris Austin, F. Collins
• Chemogenomics – 69,000 Google hits (Oct. 16, 2006)
• Chemical genomics – 113,000 Google hits (Oct. 16, 2006)
• Chemical biology – 4,210,000 Google hits (Oct. 16, 2006)
• Chemical genetics – 104,000 Google hits (Oct. 16, 2006)
Chemical Genomics
Chemical genetics is a research method that uses small molecules to change the way proteins work—directly in real time rather than indirectly by manipulating their genes. It is used to identify which proteins regulate different biological processes, to understand in molecular detail how proteins perform their biological functions, and to identify small molecules that may be of medical value.
to create a national resource in chemical probe development. The center uses the latest industrial-scale technologies to collect data that are useful for defining the cross-section between chemical space and biological activity (and to do so on a genomic scale).
NIH Molecular Library Initiative (MLI)
• Chemical Synthesis Centers: CombiChem, parallel synthesis, DOS; 4 centers + DPI; 100K – 1M compounds
• MLSCN (9+1): 9 centers + 1 NIH intramural; 20 x 10 = 200 assays
• ECCR (6): Exploratory Centers
• PubChem (NLM)
• SAR matrix: compounds x 200 assays
• Biochemical assays
• Cell-based functional assays
• Phenotypic assays
• Databases
– PubChem (http://pubchem.ncbi.nlm.nih.gov/)
– ChemBank (http://chembank.broad.harvard.edu/)
– WOMBAT (http://sunsetmolecular.com/index.php)
– Jubilant (http://www.jubilantbiosys.com/)
– Gvk/Bio (http://www.gvkbio.com/)
Biological Assay Data
High Throughput Chemistry and Screening: Informatics
[Diagram boxes: Virtual Libraries; Diverse Lib Design; Targeted Lib Design; Combinatorial Synthesis; Real Libraries; HTS; SAR Data; KDD (QSAR, P.R.); Rules. Side labels: Logistics, Scientific. Application areas: Drug Discovery, Chemical Genomics]
Topics to Be Covered
Biotech/Pharma Orphan Disease Chemical Genomics
Computational Needs
Compound Collection Docking Scoring Data Analytics
CECCR Cheminformatics Center
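Challenges in Combinatorial Chemistry
• Scaffold with three variable positions: R1 (3000), R2 (3000), R3 (3000) building blocks
• Full library: 3,000^3 = 2.7 x 10^10 compounds
• 3,000^3 / 1,000 per week = ~0.5 million years!!!
• Library Design: rational selection of a subset of building blocks to obtain a maximum amount of information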
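The arithmetic behind the "~0.5 million years" estimate can be checked with a few lines. This is only a sketch of the slide's numbers; the variable names and the 52-weeks-per-year conversion are assumptions made for the illustration.

```python
# Numbers taken from the slide: 3 variable positions, 3,000 building blocks
# at each, and a throughput of 1,000 compounds per week.
blocks_per_position = 3000
positions = 3
throughput_per_week = 1000

library_size = blocks_per_position ** positions   # every R1/R2/R3 combination
weeks = library_size / throughput_per_week
years = weeks / 52                                 # assumption: 52 weeks per year

print(f"{library_size:.2e} compounds, roughly {years:,.0f} years")
```

Full enumeration is out of reach at this throughput, which is why a designed subset of building blocks is needed.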
Design for Activity: Similarity
• If we know a compound is active, and we want to design a set of compounds that may be active against the same target, we may select
– A set of compounds that are similar to the active compound
• The similarity principle: similar compounds should have similar biological activity
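One common way to put the similarity principle into practice is to rank candidates by a similarity coefficient to the known active. The sketch below uses the Tanimoto coefficient on sets of structural features; the slide does not name a particular coefficient, so this choice, the feature sets, and the compound names are all assumptions made for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared features / all features."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(a | b)

# Hypothetical feature sets, e.g. indices of the "on" bits of a fingerprint.
active = {1, 4, 7, 9, 12}
candidates = {
    "cpd_A": {1, 4, 7, 9, 13},
    "cpd_B": {2, 3, 5, 8},
    "cpd_C": {1, 4, 9, 12},
}

# Rank candidates from most to least similar to the known active.
ranked = sorted(candidates,
                key=lambda name: tanimoto(active, candidates[name]),
                reverse=True)
print(ranked)
```

A similarity-based design would then keep the top of this ranking.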
Molecular Identity and Molecular Similarity
          X1  X2  X3  • • •  X20
Str. 1     2   5   1  • • •   4
Str. 2     4   7   9  • • •   7
Str. 3     1   6   8  • • •   6
• • •
Str. 100   0   3   5  • • •   1
[Plot: structures 1, 2, 3 shown as points in descriptor space with axes X1 and X2]
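When each structure is a row of descriptor values, similarity can be measured as a distance between rows: identical rows have distance zero, and nearby rows are similar. The sketch below uses Euclidean distance on the four descriptor values visible on the slide (X1, X2, X3, X20); the choice of metric is an assumption, and the elided columns are not reproduced.

```python
import math

# Only the descriptor values visible on the slide (X1, X2, X3, X20).
descriptors = {
    "Str. 1": [2, 5, 1, 4],
    "Str. 2": [4, 7, 9, 7],
    "Str. 3": [1, 6, 8, 6],
}

def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d12 = euclidean(descriptors["Str. 1"], descriptors["Str. 2"])
d13 = euclidean(descriptors["Str. 1"], descriptors["Str. 3"])
d23 = euclidean(descriptors["Str. 2"], descriptors["Str. 3"])
print(d12, d13, d23)
```

On these values, Str. 2 and Str. 3 are the closest pair and Str. 1 is the outlier.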
Design for General Application: Diversity
- MaxiMin: maximize the minimum pairwise distance $D_{ij}$
- Minimize $\sum_{i<j} 1/D_{ij}^2$
Similarity and Diversity
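Both diversity criteria can be sketched in a few lines. The greedy MaxMin picker below is one common heuristic rather than necessarily the procedure used in the talk; the 2D points, function names, and starting index are hypothetical.

```python
import math

def maxmin_select(points, k, start=0):
    """Greedy MaxMin: repeatedly add the point farthest from the current picks."""
    selected = [start]
    while len(selected) < k:
        best, best_d = None, -1.0
        for i in range(len(points)):
            if i in selected:
                continue
            # Distance from candidate i to its nearest already-selected point.
            d = min(math.dist(points[i], points[j]) for j in selected)
            if d > best_d:
                best, best_d = i, d
        selected.append(best)
    return selected

def inverse_square_sum(points, idx):
    """Sum of 1/D_ij^2 over all pairs in the subset (smaller = more diverse)."""
    return sum(1.0 / math.dist(points[i], points[j]) ** 2
               for a, i in enumerate(idx) for j in idx[a + 1:])

# Hypothetical compounds as points in a 2D descriptor space.
points = [(0, 0), (0.1, 0), (5, 5), (5, 5.2), (10, 0), (0, 9)]
diverse = maxmin_select(points, k=3)
print(diverse, inverse_square_sum(points, diverse))
```

A clustered subset such as `[0, 1, 2]` scores far worse on the inverse-square sum because of the close pair it contains.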
Cluster Hits Obtained by SAGE and Random Sampling
[Figure: bar chart. Y-axis: Number of Cluster Hits (0 to 12). X-axis categories: 5s, 5r, 10s, 10r, 15s, 15r, 20s, 20r, 25s, 25r, 30s, 30r. Series: Number of Active Clusters (40, 80, 120, 160, 200)]
Drug Discovery & Development Failures
Venkatesh & Lipper, J. Pharm. Sci. 89, 145-154 (2000)
• poor PK: 39%
• efficacy: 29%
• Tox: 21%
• Market: 6%
Multi-Factorial Design
[Figure: score plot, axis range 0 to 1]
$$E(S) = \sum_i w_i E_i(S)$$
Total Score is the Weighted Sum of Individual Terms
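As a minimal sketch of the weighted sum, the snippet below combines individual penalty terms into one total score for a candidate library. The term names echo the properties mentioned in this talk, but the numeric values and weights are hypothetical.

```python
def total_score(terms, weights):
    """Total score = sum over terms of weight * individual score."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical individual penalty terms for one candidate library S.
terms = {"diversity": 0.30, "lipinski": 0.10, "p450": 0.50}
# Hypothetical weights expressing how much each term matters.
weights = {"diversity": 1.0, "lipinski": 2.0, "p450": 0.5}

score = total_score(terms, weights)
print(score)
```

Changing the weights shifts the balance between objectives without changing the individual terms.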
Penalty Scores
[Diagram: penalty scores vs. iteration, progressing from Initial Library to Better Library to Optimal Library; penalty terms: Lipinski Properties, P450 Activity, Diversity]
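The progression from an initial library to a better one can be illustrated with a simple iterative search that swaps one compound at a time and keeps the swap when the penalty does not increase. This is a generic hill-climbing sketch, not the optimizer used in the talk; the single molecular-weight penalty, the target value, and all names are assumptions.

```python
import random

def penalty(subset, mol_weight, target=400.0):
    """Hypothetical penalty: mean deviation of molecular weight from a target."""
    return sum(abs(mol_weight[i] - target) for i in subset) / len(subset)

def optimize_library(mol_weight, k, n_iter=300, seed=0):
    rng = random.Random(seed)
    pool = range(len(mol_weight))
    current = rng.sample(pool, k)              # initial (undesigned) library
    history = [penalty(current, mol_weight)]
    for _ in range(n_iter):
        trial = list(current)
        outside = [i for i in pool if i not in current]
        trial[rng.randrange(k)] = rng.choice(outside)   # swap one member
        if penalty(trial, mol_weight) <= history[-1]:   # keep non-worsening swaps
            current = trial
        history.append(penalty(current, mol_weight))
    return current, history

# Hypothetical molecular weights for a pool of 50 candidate compounds.
weights_rng = random.Random(1)
mol_weight = [weights_rng.uniform(200, 700) for _ in range(50)]
library, history = optimize_library(mol_weight, k=10)
print(history[0], history[-1])
```

A multi-objective version would replace `penalty` with a weighted sum of several terms.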
[Figure: R1 vs. R2 building-block selection plots. Panels: Initial Ten solutions (undesigned); The final ten solutions (well designed)]
Designed Library Has a Better MW-clogP Distribution
[Figure: MW-clogP distribution; surviving axis label: clogP]
SPE Algorithm (Agrafiotis)
• Iterative Random Sampling
• Pick a pair of points a, b; compare D(a,b) in the Original Space with D’(a,b) in the Embedding Space (2D)
• If D’ > D, move a, b closer
• If D’ < D, move a, b apart
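The pairwise update rule can be sketched as follows. This follows the general shape of stochastic proximity embedding (random pairs, a learning rate that decays over cycles), but the parameter values, function names, and test points are assumptions, and published refinements such as a neighborhood cutoff are omitted.

```python
import math
import random

def spe(points, dim=2, n_cycles=50, n_steps=200, lam0=1.0, lam1=0.01,
        seed=0, eps=1e-9):
    rng = random.Random(seed)
    n = len(points)
    emb = [[rng.random() for _ in range(dim)] for _ in range(n)]
    lam = lam0
    dlam = (lam0 - lam1) / max(n_cycles, 1)
    for _ in range(n_cycles):
        for _ in range(n_steps):
            a, b = rng.sample(range(n), 2)             # random pair
            d_orig = math.dist(points[a], points[b])   # D: original space
            d_emb = math.dist(emb[a], emb[b])          # D': embedding space
            # Negative when D' > D (pull together), positive when D' < D.
            step = lam * 0.5 * (d_orig - d_emb) / (d_emb + eps)
            for k in range(dim):
                delta = step * (emb[a][k] - emb[b][k])
                emb[a][k] += delta
                emb[b][k] -= delta
        lam -= dlam                                    # anneal learning rate
    return emb

def stress(points, emb):
    """Normalized squared mismatch between original and embedded distances."""
    num = den = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            num += (math.dist(emb[i], emb[j]) - d) ** 2
            den += d ** 2
    return num / den

# Hypothetical 3D points that happen to lie in a plane.
points = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0.5, 0.5, 0)]
initial = spe(points, n_cycles=0)   # random start, no refinement
final = spe(points)
print(stress(points, initial), stress(points, final))
```

Because each step touches only one pair, the cost per step is independent of the collection size.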
Chemical Space - Compound Collection Comparison
SPE Embedding of ChemSpace
Topics to Be Covered
Biotech/Pharma Orphan Disease Chemical Genomics
Computational Needs
Compound Collection Docking Scoring Data Analytics
CECCR Cheminformatics Center
Quantitative Structure-Activity Relationship (QSAR)
Structures Activity
str1 a1
str2 a2
str3 a3
str4 a4
str5 a5
str6 a6
str7 a7
str8 a8
str9 a9
str10 a10
[Figure: two scatter plots of predicted vs. actual activity; q2 = 0.8, R2 = 0.75]
Modeling methods: multiple linear regression (MLR); partial least squares (PLS); artificial neural nets; k-nearest neighbor (kNN)
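A cross-validated q2 is commonly computed by leaving out one compound at a time, refitting, and predicting the omitted compound. The sketch below does this for the simplest case, a one-descriptor linear regression; the formula q2 = 1 - PRESS/SS is the usual leave-one-out definition, and the data values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def loo_q2(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss

# Hypothetical descriptor values and activities for six structures.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
q2 = loo_q2(x, y)
print(q2)
```

The same loop works with any of the listed model types by swapping `fit_line` for another fitting routine.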
• Structurally similar compounds should have similar biological activities
• Biological similarities are often due to similarities of substructures (pharmacophore)
• Biological activities can be estimated from molecular similarities, which are calculated with pharmacophore-specific descriptors
Basic Assumptions of KNN-QSAR Method
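These assumptions translate into a simple prediction rule: estimate a compound's activity from the activities of its k most similar neighbors. The sketch below uses inverse-distance weighting in a fixed descriptor space; the weighting scheme, the descriptors, and the activities are assumptions, and the descriptor-selection step of a full kNN-QSAR procedure is not shown.

```python
import math

def knn_predict(query, train_x, train_y, k=3):
    """Predict activity as the inverse-distance-weighted mean of k neighbors."""
    neighbors = sorted((math.dist(query, x), y)
                       for x, y in zip(train_x, train_y))[:k]
    if neighbors[0][0] == 0.0:
        return neighbors[0][1]          # identical compound: reuse its activity
    weights = [1.0 / d for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Hypothetical 2D descriptors and activities for five training compounds.
train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
train_y = [5.0, 5.5, 5.2, 8.0, 8.3]

pred = knn_predict((0.2, 0.1), train_x, train_y, k=3)
print(pred)
```

The query sits in the low-activity cluster, so the prediction stays within that cluster's activity range.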
Comparison of CoMFA, GA-PLS, and KNN-QSAR
[Figure: bar chart of q2 (0 to 0.9) by dataset: AChE (60), 5HT1A (14), DHFR (23), D1 ANT (29). Series: CoMFA/q2-GRS, GA-PLS, kNN-QSAR]
[Figure: % Active Retrieved (0 to 100) vs. % Screened (0 to 100). Series: %Random, %Retrieved]
QSAR Based Virtual Screening for GPCR Ligand Design
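A retrieval curve of this kind can be computed by ranking the screened compounds by predicted score and counting how many known actives have been recovered at each depth. The scores and activity labels below are hypothetical; under random selection the percentage retrieved would simply track the percentage screened.

```python
def enrichment_curve(scores, is_active):
    """Return (% screened, % actives retrieved) after each ranked compound."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_actives = sum(is_active)
    curve, found = [], 0
    for rank, i in enumerate(order, start=1):
        found += is_active[i]
        curve.append((100 * rank / len(scores), 100 * found / total_actives))
    return curve

# Hypothetical predicted scores and known activity labels (1 = active).
scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3, 0.05, 0.6, 0.4, 0.15]
is_active = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

curve = enrichment_curve(scores, is_active)
for pct_screened, pct_retrieved in curve:
    print(pct_screened, pct_retrieved)
```

In this toy case all actives are found after screening the top 30% of the list.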
Topics to Be Covered
Biotech/Pharma Orphan Disease Chemical Genomics
Computational Needs
Compound Collection Docking Scoring Data Analytics
CECCR Cheminformatics Center
Docking and Scoring
• In the early 1980s, I. D. Kuntz developed the first computerized molecular docking program: DOCK
• Other docking programs: GOLD, FRED, GLIDE, FLEXX, AutoDock, ICM
[Diagram label: X-ray structure]
Our Approach to Derive DT-SCORE
1. Use Delaunay tessellation to derive geometrical chemical descriptors of the protein-ligand interface
2. Establish a correlation between the geometrical chemical descriptors and protein-ligand binding affinity using the Perceptron Learning algorithm
Flowchart to Derive DT-SCORE
[Flowchart: Receptor-ligand Complexes → Tessellation of receptor-ligand interface → Descriptor Generation → Model Generation & Prediction (Perceptron Learning algorithm, Binding constant) → DT-SCORE]
• Rigorous definition of nearest neighbors in 2D & 3D space - Delaunay tessellation
Nearest neighbors are unambiguously defined in sets of three (in 2D) and in sets of four (in 3D)
Delaunay Tessellation in 2D
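In 2D, a triangle belongs to the Delaunay tessellation when no other point lies inside its circumcircle, which is what makes the neighbor sets unambiguous. The brute-force sketch below checks that condition for every triple of points; it is only practical for small sets, and the four points are hypothetical.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay_2d(points):
    """Keep every triangle whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

# Hypothetical atom positions projected to 2D.
points = [(0, 0), (4, 0), (0, 3), (5, 4)]
triangles = delaunay_2d(points)
print(triangles)
```

The 3D case replaces triangles and circumcircles with tetrahedra and circumspheres; a library routine would normally be used for real structures.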
Delaunay Tessellation of the Receptor-Ligand Interface
A Detailed View of Active Site Tessellation
[Figure: tessellated active site with receptor (R) and ligand (L) atoms at the tetrahedron vertices. An atom is shared by several tetrahedra]
3 Types of Tetrahedra at the Receptor-Ligand Interface
• RRRL: Formed by 3 receptor atoms and 1 ligand atom
• RRLL: Formed by 2 receptor atoms and 2 ligand atoms
• RLLL: Formed by 1 receptor atom and 3 ligand atoms
Each of the above tetrahedron types is further discriminated by atom types on the vertices
Geometrical Descriptors According to Tetrahedron Types
Tetrahedron type:  RRRL            RRLL            RLLL
Atom types:        NCNO  ONOS  …   CNOO  NOCS  …   COSC  OSXN  …
Count:             5     3     …   8     2     …   4     0     …
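Counting tetrahedra by type can be sketched with a small classifier and a counter. Each vertex is tagged with its origin (receptor or ligand) and element; the ordering rule used to build the four-letter atom-type label (receptor atoms first, then alphabetical) is an assumption, since the slide does not state its convention, and the example tetrahedra are hypothetical.

```python
from collections import Counter

def tetra_class(vertices):
    """Return 'RRRL', 'RRLL' or 'RLLL'; None if not at the interface."""
    n_receptor = sum(1 for origin, _ in vertices if origin == "R")
    if n_receptor in (0, 4):
        return None                      # all-ligand or all-receptor
    return "R" * n_receptor + "L" * (4 - n_receptor)

def descriptor_key(vertices):
    """(class, atom-type label), using an assumed canonical vertex order."""
    cls = tetra_class(vertices)
    if cls is None:
        return None
    ordered = sorted(vertices, key=lambda v: (v[0] != "R", v[1]))
    return cls, "".join(element for _, element in ordered)

# Hypothetical tetrahedra: four (origin, element) vertices each.
tetrahedra = [
    [("R", "N"), ("L", "C"), ("L", "N"), ("L", "O")],
    [("L", "O"), ("R", "N"), ("L", "C"), ("L", "N")],
    [("R", "C"), ("R", "N"), ("L", "O"), ("L", "O")],
    [("R", "C"), ("R", "C"), ("R", "N"), ("R", "O")],   # not at the interface
]

counts = Counter(key for key in map(descriptor_key, tetrahedra)
                 if key is not None)
print(counts)
```

Each distinct key becomes one descriptor column, and its count is the descriptor value for that complex.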
“QSAR” Input Table
(R·L Interaction Pattern – Binding Affinity Relationship Table)

Receptor-Ligand  Binding    RLLL             RRLL             RRRL
Complexes        Affinity   NCNO  ONOS  …    CNOO  NOCS  …    COSC  OSXN  …
(R·L)1           y1         0     3     …    2     8     …    1     3     …
(R·L)2           y2         1     7     …    3     1     …    0     3     …
…                …          …     …     …    …     …     …    …     …     …
(R·L)m-1         ym-1       3     4     …    0     5     …    4     6     …
(R·L)m           ym         2     0     …    2     2     …    1     0     …
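Single-Layer Perceptron Network
[Diagram: Input Layer with neurons 1, 2, 3, ..., N receiving inputs x1, x2, x3, ..., xN; weights w1, w2, w3, ..., wN connect them to the Output Layer, a single neuron with activation fn(.) and output y]
xi = input of neuron i
wi = weight associated with the input xi
fn(.) = activation function of output neuron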
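A single-layer network of this form computes the activation of a weighted sum of its inputs. The sketch below uses an identity activation and trains the weights with the delta rule; the bias term, learning rate, iteration count, and toy data are assumptions and may differ from the learning procedure used for DT-SCORE.

```python
def predict(weights, bias, x, activation=lambda s: s):
    """y = f(sum_i w_i * x_i + bias); identity activation by default."""
    return activation(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(X, y, lr=0.05, n_iter=2000):
    """Delta rule: nudge each weight in proportion to error times input."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(n_iter):
        for xi, yi in zip(X, y):
            err = yi - predict(weights, bias, xi)
            weights = [w + lr * err * v for w, v in zip(weights, xi)]
            bias += lr * err
    return weights, bias

# Hypothetical data generated from y = 2*x1 + 1*x2 + 0.5.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [0.5, 2.5, 1.5, 3.5, 5.5]

weights, bias = train(X, y)
print(weights, bias)
```

For binding-affinity modeling, the inputs would be the tetrahedron-type counts and the target would be the measured affinity.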
Training Vs. Test Set Selection and Validation
• Entire dataset (264 complexes)
• Training set: ~80% (214 complexes); model development (q2)
• Test set: ~20% (50 complexes); prediction of the test set (R2)
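A split with these sizes can be sketched as a seeded random shuffle. The slide does not say how the complexes were assigned, so random assignment, the seed, and the integer identifiers are assumptions.

```python
import random

def split_dataset(ids, n_test, seed=0):
    """Shuffle a copy of the ids and hold out the first n_test as a test set."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical identifiers for the 264 receptor-ligand complexes.
complex_ids = list(range(264))
training_set, test_set = split_dataset(complex_ids, n_test=50)
print(len(training_set), len(test_set))
```

The training set is used for model development and the held-out complexes only for the final check.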
Model Stability
• Average value from multiple (ca. 80) models
[Figure: q2 (Training Set) and R2 (Test Set) vs. Number of Iterations (0 to 1000); y-axis range -2.5 to 1]
Actual vs. Predicted Binding Affinity for the Training Set
[Figure: scatter plot of Predicted pKd vs. Actual pKd (both 0 to 16). 214 complexes: q2 = 0.73]
Actual vs. Predicted Binding Affinity for the Test Set
[Figure: scatter plot of Predicted pKd (0 to 18) vs. Actual pKd (0 to 16). 50 complexes: R2 = 0.61]
Acknowledgements
• NCCU and UNC
– Jerry Ebalunode, Ph.D., BRITE
– Min Shen, Ph.D., Lexicon
– Alex Tropsha, Ph.D., Chair of MedChem, UNC-Chapel Hill
• Funding
– NIH P20HG003898
– NIH R21GM076059
• GSK
– Sunny Hung (GSK)
– George Seibel (JNJ)
– Ken Kopple (retired)
– Jeff Wiseman (Locus)
• Lilly
– Minmin Wang
– Greg Durst
– Jim Wikel (retired)