VS Explorer – Analyzing large scale docking experiments
ChemAxon 2005 User Group Meeting
Marc Zimmermann, Martin Hofmann
Page 2, Marc Zimmermann, 2005 ChemAxon UGM05
Selection of Potential Drugs
• 28 million compounds currently known
• Drug company biologists screen up to 1 million compounds against a target using ultra-high-throughput technology
• Chemists select 50-100 compounds for follow-up
• Chemists work on these compounds, developing new, more potent compounds
• Pharmacologists test compounds for pharmacokinetic and toxicological profiles
• 1-2 compounds are selected as potential drugs
High Volume Screening Analysis – the Methods
[Figure labels: Screening (HTS; vHTS by similarity or docking), Clustering, Assembling, Filtering, Modeling; compounds marked active / inactive]
Virtual Screening: the computational (in silico) analog of biological screening
o Score, rank, and/or filter a set of structures using one or more computational procedures
o Helps to decide:
   - which compounds to screen
   - which libraries to synthesize
   - which compounds to purchase from an external source
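The "score, rank, and/or filter" step can be illustrated with a minimal Java sketch. It is not code from VS Explorer: the class names `ScoredCompound` and `VirtualScreenSketch`, the cutoff, and the convention that lower (more negative) scores are better are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical record of one scored structure; lower scores are ASSUMED to be better. */
final class ScoredCompound {
    final String id;
    final double score;
    ScoredCompound(String id, double score) { this.id = id; this.score = score; }
}

final class VirtualScreenSketch {
    /** Filter: keep score <= cutoff. Rank: best (lowest) first. Select: at most topN. */
    static List<ScoredCompound> filterAndRank(List<ScoredCompound> input, double cutoff, int topN) {
        List<ScoredCompound> kept = new ArrayList<>();
        for (ScoredCompound c : input) {
            if (c.score <= cutoff) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingDouble(c -> c.score));
        return new ArrayList<>(kept.subList(0, Math.min(topN, kept.size())));
    }
}
```

The returned short list is what would feed the decision of which compounds to screen or purchase; in practice several scoring procedures would likely be combined rather than a single cutoff.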
High Volume Screening Analysis – the Tools at SCAI
[Figure labels: the workflow steps Screening, Clustering, Assembling, Filtering, Modeling, annotated with the tools HTSview, VS Explorer, DB Annotator, FTrees, FlexX, ProMiner, TopNet, and a GRID Layer]
Computational Aspects of Drug Discovery: Virtual Screening
• Enable scientists to quickly and easily find compounds binding to a particular target protein
  o growth in the number of targets
  o growth in 3D structure determination (PDB database)
  o growth in computing power
  o growth in prediction quality for protein-compound interactions
• Experimental screening is very expensive: not feasible for academia or small companies
• Aim: active molecules / tested molecules (the fraction of tested molecules that are active)
In silico drug discovery process (EGEE, Swissgrid, …)
Grids for neglected diseases and diseases of the developing world
The grid impact:
• Computing and storage resources for genomics research and in silico drug discovery
• Cross-organizational collaboration space to advance research work
• Federation of patient databases for clinical trials and epidemiology in developing countries
Support for local centres in plagued areas (genomics research, clinical trials and vector control)
[Figure labels: Clermont-Ferrand, SCAI Fraunhofer, Swiss Biogrid consortium, local research centres in plagued areas]
Structure-Based Virtual Screening
Protein-Ligand Docking
o Aims to predict the 3D structure formed when a molecule “docks” to a protein
   - Need a way to explore the space of possible protein-ligand geometries (poses)
   - Need to score or rank the poses
o Problem: many degrees of freedom (rotation, conformation, solvent effects)
[Figure labels: ligand database, target protein, molecular docking, ligand docked into protein’s active site]
Grid VS Results Browser
• Quick overview of very large log files
• Sorting and merging of files
• Storing and retrieval in databases
• Similarity searches and property predictions
• Interface to the R statistics box
• Prototype is under construction

concat('ZINC', lpad(p.sub_id_fk,8,'0')) | target | ligand       | conformations | score  | time (s)
ZINC00000057                            | 1cet   | ZINC00000057 | 172           |  -7.45 |  3.25
ZINC00000061                            | 1cet   | ZINC00000061 | 203           | -18.37 |  3.84
ZINC00000066                            | 1cet   | ZINC00000066 | 241           | -25.58 | 39.92
ZINC00000122                            | 1cet   | ZINC00000122 | 399           | -14.14 |  7.41
ZINC00000197                            | 1cet   | ZINC00000197 | 272           |  -8.60 |  2.44
ZINC00000290                            | 1cet   | ZINC00000290 | 259           | -15.00 | 20.40
ZINC00000349                            | 1cet   | ZINC00000349 | 82            | -10.81 | 22.20
ZINC00000453                            | 1cet   | ZINC00000453 | 256           | -14.61 |  3.76
ZINC00000484                            | 1cet   | ZINC00000484 | 447           | -18.33 | 35.53
ZINC00000607                            | 1cet   | ZINC00000607 | 418           | -15.77 |  7.43

"Smiles";"Data"
"c1(N2CCC(CC2)C(OCC)=O)sc3c(ccc(Cl)c3)n1";MAC-0000001;02;101.66;104.66
"C(=O)(Nc(cc1)ccc1Cl)N(CCCN2c(c(Cl)cc3C(F)(F)F)nc3)CC2";MAC-0000002;02;101.14;105.89
"n1(CC(CNCCNc2nccc(n2)C(F)(F)F)O)c3c(cc1)cccc3";MAC-0000003;02;101.64;97.32
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Cl)Cl)CC2)cn1";MAC-0000004;02;100.09;101.14
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Br)F)CC2)cn1";MAC-0000005;02;108.98;97.02
"C(F)(F)(F)c1ccnc(NCCNC(=O)c2ccco2)n1";MAC-0000006;02;110.19;106.15
"C(F)(F)(F)c1ccnc(NCCNC(c2ccccc2)=O)n1";MAC-0000007;02;107.42;98.46
"C(NCc1ccco1)(=S)Nc(cccn2)c2";MAC-0000008;02;103.86;97.98
"C(F)(F)(F)c1ccnc(NCCNC(=S)Nc(cccn2)c2)n1";MAC-0000009;02;107.77;98.6
"C(=O)(c1cccs1)N(CCCN2CC(O)COc(ccc3C(C)=O)cc3)CC2";MAC-0000010;02;107.41;104.92
"C(F)(F)(F)c1ccnc(NCC=C)n1";MAC-0000011;02;105.78;106.84
"N1(CCNc2ncccc2C(F)(F)F)C(=O)CC3(CCCC3)C1=O";MAC-0000012;02;105.26;103.38
"N1(CCCNc(c(Cl)cc2C(F)(F)F)nc2)C(=O)CC3(CCCC3)C1=O";MAC-0000013;02;102;106.84

M  END
> <Object Id>
MAC-0000100

> <Batch Ref>
03

> <Supplier Object Id>
6743501

> <ENZ_KINETIC_RES_ACT.RES_ACT>
Rapid prototyping using ChemAxon Libraries
[Figure labels: GUI (Swing), File I/O, DB connect, Table Module, Chem Module]
• 100% pure Java (JRE)
  o Swing
  o JTable
• Using ChemAxon (MarvinBeans) for the chemical stuff
• OJDBC for the database connection to Oracle
Molecule Rendering
From spreadsheets to molecular spreadsheets
o Overriding the JTable cell renderer with Marvin from ChemAxon
o Switch between SMILES text and structure display (on / off)
File Import / Export
• Implemented as a thread
• Comma-separated files (CSV)
  o CSV parser
  o Preview window
  o Tag missing values
• SDF molecular files
  o SDF property names as row keys
  o Import coordinates
  o Based on MolImporter from ChemAxon
[Figure label: Preview]
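As an illustration of the CSV import with tagged missing values, the following Java sketch splits one line of a separated file. It is only a sketch under assumptions: the class name `CsvLineSketch` and the marker `NA` are hypothetical (the slides do not say how VS Explorer tags missing values), and escaped quotes inside quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvLineSketch {
    /** Hypothetical marker for empty fields; the actual tag is not given in the slides. */
    static final String MISSING = "NA";

    /** Splits one line on the separator, honouring double-quoted fields, and tags empty fields. */
    static List<String> parseLine(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                inQuotes = !inQuotes;                 // toggle quoted state, drop the quote itself
            } else if (ch == separator && !inQuotes) {
                fields.add(tag(current.toString()));  // field boundary outside quotes
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(tag(current.toString()));          // last field has no trailing separator
        return fields;
    }

    private static String tag(String value) {
        String trimmed = value.trim();
        return trimmed.isEmpty() ? MISSING : trimmed;
    }
}
```

Calling `CsvLineSketch.parseLine(line, ';')` on a semicolon-separated line with a quoted SMILES string yields one list entry per column, with empty columns replaced by the marker.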
Smart Indexing for large Collections
• Large index storing file pointers or database keys
• Java TableModel only stores the full information for a limited number of elements (cache)
[Figure labels: Index, FilePointer]
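The index-plus-cache idea can be sketched in plain Java as follows. This is an illustrative sketch, not the VS Explorer implementation: the class name `LineIndexSketch` is hypothetical, one record per text line is assumed, and the cache eviction policy (least recently used) is an assumption, since the slides only say that a limited number of elements is kept.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LineIndexSketch {
    private final RandomAccessFile file;
    private final List<Long> offsets = new ArrayList<>();  // one file pointer per line
    private final Map<Integer, String> cache;              // bounded cache of fully loaded lines

    LineIndexSketch(String path, final int cacheSize) throws IOException {
        file = new RandomAccessFile(path, "r");
        // Access-ordered LinkedHashMap that evicts the least recently used entry (assumed policy).
        cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > cacheSize;
            }
        };
        long pos = 0;
        while (file.readLine() != null) {   // single pass: remember where each line starts
            offsets.add(pos);
            pos = file.getFilePointer();
        }
    }

    int rowCount() { return offsets.size(); }

    int cachedRows() { return cache.size(); }

    /** Returns the line for a row, reading from disk only on a cache miss. */
    String get(int row) throws IOException {
        String line = cache.get(row);
        if (line == null) {
            file.seek(offsets.get(row));
            line = file.readLine();
            cache.put(row, line);
        }
        return line;
    }

    void close() throws IOException { file.close(); }
}
```

Only the list of offsets grows with the file; the full text of at most `cacheSize` rows is held in memory at any time. `RandomAccessFile.readLine` is unbuffered and treats each byte as one character, so a production version would likely use buffered reads and explicit decoding.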
Interactive Focus on Data
• Large index storing file pointers or database keys
• Java TableModel only stores the full information for a limited number of elements
• Event handler for scrolling triggers a reload from external memory (e.g. a cursor for an RDB)
• Update of the TableModel
[Figure labels: Index, FilePointer]
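A table model that holds only a window of rows might look like the Java sketch below. It is an assumption-laden illustration: `RowSource` and `WindowedTableModel` are hypothetical names, and instead of an explicit scroll event handler (as on the slide) the reload is triggered lazily when `getValueAt` is asked for a row outside the window, which happens when a JTable paints newly visible rows.

```java
import java.util.List;
import javax.swing.table.AbstractTableModel;

/** Hypothetical back end: a flat-file index or a database cursor would implement this. */
interface RowSource {
    int rowCount();
    int columnCount();
    /** Fetches up to count rows starting at firstRow from external memory. */
    List<Object[]> load(int firstRow, int count);
}

final class WindowedTableModel extends AbstractTableModel {
    private final RowSource source;
    private final int windowSize;
    private int windowStart = 0;
    private List<Object[]> window;
    int reloads = 0;   // exposed only to make the behaviour observable

    WindowedTableModel(RowSource source, int windowSize) {
        this.source = source;
        this.windowSize = windowSize;
        this.window = source.load(0, windowSize);
    }

    @Override public int getRowCount() { return source.rowCount(); }

    @Override public int getColumnCount() { return source.columnCount(); }

    @Override public Object getValueAt(int row, int col) {
        if (row < windowStart || row >= windowStart + window.size()) {
            // The requested row is outside the cached window: reload a window centred on it.
            windowStart = Math.max(0, row - windowSize / 2);
            window = source.load(windowStart, windowSize);
            reloads++;
        }
        return window.get(row - windowStart)[col];
    }
}
```

The model reports the full row count, so the scroll bar reflects the whole data set, while memory use is bounded by the window size. Loading on the event dispatch thread would block the GUI for slow sources, so a real implementation would probably load in a background thread and fire a table update afterwards.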
Column Sorting
• Event handler starts a sorting thread
• Re-sorting of the index for flat files
• New database query: + ORDER BY columnLabel
• Coming next:
  o Implementation of efficient online sorting algorithms in order to reduce file access
  o Merging of two tables
[Figure labels: Index, sort(List), Object, FilePointer]
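Both sorting paths can be sketched in a few lines of Java. The names `IndexEntry` and `ColumnSortSketch` are hypothetical, the sort key is assumed to be numeric, and the threading mentioned on the slide is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical index entry: the sort key plus the location of the full record. */
final class IndexEntry {
    final double key;        // value of the sort column, e.g. a docking score
    final long filePointer;  // where the full record lives in the flat file
    IndexEntry(double key, long filePointer) { this.key = key; this.filePointer = filePointer; }
}

final class ColumnSortSketch {
    /** Flat-file variant: returns a sorted copy of the index; records on disk are never moved. */
    static List<IndexEntry> sortIndex(List<IndexEntry> index, boolean ascending) {
        List<IndexEntry> sorted = new ArrayList<>(index);
        Comparator<IndexEntry> byKey = Comparator.comparingDouble(e -> e.key);
        sorted.sort(ascending ? byKey : byKey.reversed());
        return sorted;
    }

    /**
     * Database variant: appends ORDER BY to the base query. Column names cannot be bound
     * as query parameters, so columnLabel must come from a trusted list of known columns.
     */
    static String orderedQuery(String baseQuery, String columnLabel, boolean ascending) {
        return baseQuery + " ORDER BY " + columnLabel + (ascending ? " ASC" : " DESC");
    }
}
```

Sorting the index requires the key of every record to be available in memory, which is the file access the "coming next" item on online sorting algorithms aims to reduce.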
DB Annotator: Semantics for databases
Semantic annotation of relational data
o Linking databases and ontologies
o Using the VS Explorer as a plugin
[Figure labels: ontology browser, VS Explorer]
DHFR Assay for E. coli:
• Folate -> DHF -> THF -> synthesis of thymidine
• Important for cell growth
• DHFR inhibitor: Trimethoprim
[Figure labels: DHF, Trimethoprim]
Zolli-Juran M, Cechetto JD, Hartlen R, Daigle DM, Brown ED. High throughput screening identifies novel inhibitors of Escherichia coli dihydrofolate reductase that are competitive with dihydrofolate. Bioorg Med Chem Lett. 2003 Aug 4; 13(15):2493-6.
http://hts.mcmaster.ca/HTSDataMiningCompetition.htm
Docking with FlexX [1]
• PDB structure 1RA2
• Cocrystallized DHFR and NADP
• FlexX places water particles
[1] Rarey M, Kramer B, Lengauer T, Klebe G. J Mol Biol 1996, 261(3):470-89.
Zimmermann M, Tresch A, Maass A, Hofmann M. Drilling into a HTS data set of E. coli. Poster, 15th Symposium on QSAR, 2004.
In silico Screening Workflow:
[Figure labels: HTS, 2D Similarity Analysis, Fragment Analysis, Classification, MD Simulation, QSAR, Training Set, Test Set, Docking, Candidates, Activity Region; compounds marked active / inactive]
1CET – Lactate Dehydrogenase of Plasmodium falciparum
Malaria Target:
o Chloroquine binds in the cofactor binding site of Plasmodium falciparum lactate dehydrogenase
o PDB structure: 1CET
o Ligand: chloroquine
o Test ligands: Ambinter data set from ZINC
1CET vs. 50 000 Compounds on 200 Nodes: Global Statistics
• Done: 100%
• Rescheduled: 46
• Running on nodes: 2296 h (about 96 days)
  o Autodock.pl: 2288 h
  o Total transfer: 8 h
• Submission script: 36 h
• Time gain: factor of 64 (instead of 200)
• Ideal: 11.5 h
• Grid time: 205.5 h
  o Scheduled: 179 h
  o Ready: 78 min
  o Waiting: 78 min
  o Submitted: 24 h
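The figures above appear to be related by simple ratios, which the following Java sketch reproduces. The interpretation is an assumption: the 36 h of the submission script is read as the observed wall-clock time, and the class and method names are hypothetical.

```java
final class GridStatsSketch {
    /** Wall-clock time if the CPU hours were spread perfectly over all nodes. */
    static double idealHours(double cpuHours, int nodes) {
        return cpuHours / nodes;
    }

    /** Effective time gain: accumulated CPU time divided by the observed wall-clock time. */
    static double timeGain(double cpuHours, double wallClockHours) {
        return cpuHours / wallClockHours;
    }

    /** Converts hours to days. */
    static double days(double hours) {
        return hours / 24.0;
    }
}
```

With the slide's numbers, `idealHours(2296, 200)` gives 11.48 h (the 11.5 h ideal), `timeGain(2296, 36)` gives about 63.8 (the factor of 64), and `days(2296)` gives about 95.7 days. Under this reading the run reached roughly a third of the ideal gain of 200.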
Planning Next Steps
• 2M compounds vs. 1 protein target
  o Input: 13 GB
  o Output: 2 TB (dlg), 0.5 TB (pdb)
  o 12 CPU years
  o Ideal: 3 days with 1350 CPUs
  o Reality: grid of clusters with users, queues, errors…
• Challenges for our application?
  o 100% of results obtained
  o Minimal processing time
  o Grid resource consumption (storage, CPU)
  o User interface for the application
  o …
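The planning estimates can be checked with a short Java sketch. The names are hypothetical, a year is taken as 365 days, and decimal units (1 GB = 10^9 bytes) are assumed, since the slides do not specify them.

```java
final class PlanningSketch {
    /** Ideal wall-clock days when the CPU years are spread perfectly over all CPUs. */
    static double idealDays(double cpuYears, int cpus) {
        return cpuYears * 365.0 / cpus;
    }

    /** Average data volume per compound in bytes. */
    static double bytesPerCompound(double totalBytes, long compounds) {
        return totalBytes / compounds;
    }
}
```

`idealDays(12, 1350)` gives about 3.2 days, consistent with the "3 days with 1350 CPUs" estimate. Under the decimal-unit assumption, `bytesPerCompound(13e9, 2_000_000)` is 6500 bytes of input and `bytesPerCompound(2e12, 2_000_000)` is 10^6 bytes of dlg output per compound.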