20120615_Granatum_COST_v2
Towards a modular Web-based Workflow environment for enabling large scale Virtual Screening in Cancer Chemoprevention Research
19 June 2012
COST Conference
Personalised Medicine: Better Healthcare for the Future
Christos Kannas
Computer Science Dept., University of Cyprus
Outline
• About the Project
• Overview of the Project
• Objectives
• State of the Art Review
• Implementation
  • Virtual Screening Process
  • Predictive Model Preparation
  • In-Silico Tools and Methods
• Early in silico experiments
• Concluding Remarks
About the Project
• The vision of the GRANATUM project is to:
  • bridge the information, knowledge and collaboration gap among biomedical researchers in Europe (at least),
  • ensure that the biomedical scientific community has homogenized, integrated access to the globally available information and data resources needed to perform complex cancer chemoprevention experiments and conduct studies on large-scale datasets.
• The GRANATUM project is partially funded by the European Commission under the Seventh Framework Programme in the area of Virtual Physiological Human (ICT-2009.5.3).
• http://www.granatum.org/
Overview of the Project
Objectives
• Design a scientific algorithmic workflow for the development of in silico chemoprevention models.
• Implement workflow(s) for the selection of promising chemopreventive agents.
• Connect the custom in-silico models for compound selection to other datasets and evidence included in the Linked Biomedical Data Space.
• Test the performance of custom in-silico models.
State of the Art
• Significant overlap between chemoprevention and the traditional drug discovery process (DDP).
  • Special case with additional constraints, e.g. no toxicity.
• In silico models and tools: heavily borrowing from DDP.
SOA Review
• Online resources: databases (e.g. ChEMBL), journals, reports, …
• Infrastructure tools: chemoinformatics toolkits (e.g. RDKit and CDK) for compound representation, property and descriptor calculation, substructure mining, …
• Advanced computational chemistry: biological property predictive models, compound 3D conformations, docking tools, …
• Machine learning: classification and regression methods, available open source libraries
• Scientific workflow systems: KNIME, Taverna, Galaxy, …
Virtual Screening Process Template
• Input
  • Linked Biomedical Space
  • Files
• Preprocessing
  • File format transformations
  • Standardization
  • Descriptor Calculation
  • Fragmentation
• Processing
  • Attribute filter
  • Similarity search
  • Substructure Search
  • Docking
  • Predictive Models
• Postprocessing
  • Cleaning
  • Formatting
• Output
  • Storage
  • Visualization
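The staged template above (input, preprocessing, processing, postprocessing, output) can be read as a chain of functions, each taking a list of compounds and returning a list of compounds. The sketch below only illustrates that idea and is not the GRANATUM implementation: the `Compound` record, the stage functions and the molecular-weight values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Compound:
    """Hypothetical record passed from one stage to the next."""
    name: str
    smiles: str
    properties: Dict[str, float] = field(default_factory=dict)


# A stage maps a list of compounds to a (possibly shorter) list of compounds.
Stage = Callable[[List[Compound]], List[Compound]]


def run_pipeline(compounds: Iterable[Compound], stages: List[Stage]) -> List[Compound]:
    """Apply the stages in order, feeding each stage the previous stage's output."""
    current = list(compounds)
    for stage in stages:
        current = stage(current)
    return current


def preprocess(compounds: List[Compound]) -> List[Compound]:
    """Placeholder standardization: trim whitespace and drop empty structures."""
    return [
        Compound(c.name, c.smiles.strip(), dict(c.properties))
        for c in compounds
        if c.smiles.strip()
    ]


def attribute_filter(compounds: List[Compound]) -> List[Compound]:
    """Placeholder processing step: keep compounds with mol_weight <= 500."""
    return [c for c in compounds if c.properties.get("mol_weight", float("inf")) <= 500.0]


def postprocess(compounds: List[Compound]) -> List[Compound]:
    """Placeholder formatting step: sort by name for a stable report."""
    return sorted(compounds, key=lambda c: c.name)


# Illustrative input only; the property values are not taken from the slides.
library = [
    Compound("b", " CCO ", {"mol_weight": 46.07}),
    Compound("a", "c1ccccc1", {"mol_weight": 78.11}),
    Compound("empty", "   ", {}),
    Compound("heavy", "C" * 60, {"mol_weight": 843.6}),
]
hits = run_pipeline(library, [preprocess, attribute_filter, postprocess])
print([c.name for c in hits])
```

Because every stage shares one signature, steps such as similarity search or docking could be added or reordered without touching `run_pipeline`, which is the kind of modularity a workflow system provides.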
Predictive Model Preparation Template
Predictive Model
• Biological data
• Chemical data
• Algorithm
  • Algorithm parameters
Chemopreventive Property Models
• Anti-oxidant
  • Direct Effect
  • Indirect Effect
  • Direct/Indirect Effect
• Anti-inflammatory
  • COX-2 but not COX-1 inhibitor
  • Reduction of TNF-a
  • Reduction of LOX
  • Induction of AP-1
  • Reduction of Interleukins
• Anti-proliferating
  • Cyclin D1 down-regulation
  • Her-2 down-regulation
  • Cyclin E down-regulation
  • EGFR down-regulation
• Apoptotic
  • Anti-apoptotic members of Bcl-2 family down-regulation
  • IAP family down-regulation
  • Caspase up-regulation/activation
• Anti-metastatic / Anti-angiogenic
  • COX-2 down-regulation
  • VEGF down-regulation
  • PDGF down-regulation
• Estrogenic Activity
  • ER-alpha binding affinity
  • ER-beta binding affinity
  • ER-alpha/beta binding affinity
  • No affinity
• Estrogen Antagonists
  • Selective Estrogen Receptor Modulators (SERMs)
  • Estrogen Receptor Modulators
In Silico Tools and Methods
• Generic Chemoinformatics Tool:
  • E-Health Lab and collaborators resources
  • RDKit
• Docking Experiment Tools:
  • AutoDock Vina
  • Chil2 GlamDock
• Data Mining & Statistics Tools:
  • In house tools
  • R
• Scientific Workflow System:
  • Galaxy
Early In Silico Experiments
• In silico tool & models validation
• Steps:
  • Prepare compound dataset
    • Mix of natural products and known inhibitors (4% actives)
  • Implementation/application of predictive models
    • Rule of Five
    • Toxicity model
  • Implementation/application of docking model
    • ER-alpha
  • Compound prioritization
  • Top selections visualization/evaluation
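The Rule of Five step can be sketched as a threshold check over descriptors that have already been calculated. The limits are the ones the slides list for oral drug-like filtering (HBA <= 10, HBD <= 5, molecular weight <= 500, logP <= 5). Everything else is an assumption made for illustration: the descriptor key names, the `max_violations` parameter and the three example compounds are hypothetical, and a real workflow would compute the descriptors with a toolkit such as RDKit.

```python
from typing import Dict, List, Tuple

# Limits as listed on the slides for oral drug-like filtering.
RULE_OF_FIVE_LIMITS: Dict[str, float] = {"hba": 10, "hbd": 5, "mol_weight": 500, "logp": 5}


def count_violations(descriptors: Dict[str, float]) -> int:
    """Count how many of the four limits a compound exceeds."""
    return sum(1 for key, limit in RULE_OF_FIVE_LIMITS.items() if descriptors[key] > limit)


def rule_of_five_filter(
    compounds: Dict[str, Dict[str, float]], max_violations: int = 0
) -> Tuple[List[str], List[str]]:
    """Split compound names into (passed, failed).

    max_violations is an assumption: 0 requires all four limits to hold.
    """
    passed: List[str] = []
    failed: List[str] = []
    for name, descriptors in compounds.items():
        if count_violations(descriptors) <= max_violations:
            passed.append(name)
        else:
            failed.append(name)
    return passed, failed


# Hypothetical descriptor values, for illustration only.
compounds = {
    "cpd_A": {"hba": 4, "hbd": 2, "mol_weight": 272.3, "logp": 2.5},
    "cpd_B": {"hba": 12, "hbd": 7, "mol_weight": 610.5, "logp": -1.3},
    "cpd_C": {"hba": 6, "hbd": 3, "mol_weight": 520.0, "logp": 4.1},
}
passed, failed = rule_of_five_filter(compounds)
print(passed, failed)
```

With the strict default, `cpd_C` fails on molecular weight alone; calling the filter with `max_violations=1` would let it through.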
Virtual Screening Process Example
• Natural products collection + known ER-alpha inhibitors
• Calculate physicochemical molecular descriptors
• Rule of Five filter
• Toxicity model
• Docking to ER-alpha
• Compound prioritization; Report on top selections
Cytotoxicity Predictive Model
• Cytotoxicity Dataset
  • Source: The Scripps Research Institute Molecular Screening Center
  • PubChem BioAssay: AID 464
  • Tested: 706
  • Active: 331
  • Inactive: 375
• Clean Molecules
  • Remove Salts
• Oral Drug-like Filtering
  • HBA <= 10
  • HBD <= 5
  • Molecular Weight <= 500
  • logP <= 5
• Morgan Fingerprints
  • Bit Vector 2048-bits
• Cytotoxicity Predictive Model
  • Cytotoxicity Bio-Chemical data
  • SVM:
    • Kernel: Linear
    • Stratified K-Fold:
      • 5-folds
      • 10-folds
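The stratified K-fold validation named for this model can be illustrated with the class counts from the dataset (331 active, 375 inactive, 706 tested). The sketch below is a plain-Python stand-in, not the code used in the project: the function name, the round-robin assignment and the fixed seed are assumptions, and a library routine such as scikit-learn's `StratifiedKFold` would normally do this job.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_k_fold(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Return k lists of sample indices with class proportions kept per fold."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share of it.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Class counts from the slide: 331 active (1) and 375 inactive (0).
labels = [1] * 331 + [0] * 375
for k in (5, 10):
    folds = stratified_k_fold(labels, k)
    sizes = [len(fold) for fold in folds]
    actives = [sum(labels[i] for i in fold) for fold in folds]
    print(f"k={k} fold sizes={sizes} actives per fold={actives}")
```

Each fold serves once as the held-out test set while the remaining folds train the classifier, so every compound is predicted exactly once per run.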
Virtual Screening Process Example
• Demo Dataset
  • Known ER-Alpha Inhibitors (42)
  • Indofine Dataset (2494)
  • Result: 2451 OK, Remove 85 (valence errors, empty molecule block)
• Clean Molecules
  • Remove Salts
  • 2451 molecules (42 Known, 2409 Indofine)
• Oral Drug-like Filtering
  • HBA <= 10, HBD <= 5, Molecular Weight <= 500, logP <= 5
  • Result: 2035 pass, 416 not pass
• Calculate Morgan Fingerprints
  • Bit Vector 2048-bits
• Cytotoxicity (Predictive Model)
  • SVM Classifier
  • Trained with Bio-Assay 464 dataset
  • Predict: 2451 molecules
• ER-Alpha Docking (GlamDock)
  • ER-Alpha Protein
  • 2451 molecules for docking experiments
• Ranked order of Cytotoxicity Prediction, Docking and Oral Drug-likeness Filtering results
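The molecule counts reported for this example are mutually consistent, and the final ranking can be sketched as a multi-key sort. The slides do not say how the three results are combined, so the ordering below is an assumption: drug-like compounds first, predicted non-cytotoxic compounds next, then by docking score, with lower scores taken to mean better predicted binding (also an assumption). The `ScreeningResult` record and the example rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Sanity check of the counts reported on the slide.
known, indofine, removed = 42, 2494, 85
usable = known + indofine - removed
assert usable == 2451            # molecules kept after removing invalid records
assert 42 + 2409 == usable       # known + Indofine molecules after cleaning
assert 2035 + 416 == usable      # pass + not pass in the drug-like filter


@dataclass
class ScreeningResult:
    """Hypothetical per-compound summary of the three screening steps."""
    compound_id: str
    passes_druglike_filter: bool
    predicted_cytotoxic: bool
    docking_score: float  # assumption: lower means better predicted binding


def prioritize(results: List[ScreeningResult]) -> List[ScreeningResult]:
    """Rank compounds; False sorts before True, so the preferred cases come first."""
    return sorted(
        results,
        key=lambda r: (not r.passes_druglike_filter, r.predicted_cytotoxic, r.docking_score),
    )


# Illustrative rows only; the scores are not taken from the experiments.
results = [
    ScreeningResult("c1", True, False, -7.2),
    ScreeningResult("c2", True, False, -9.1),
    ScreeningResult("c3", False, False, -11.0),
    ScreeningResult("c4", True, True, -10.3),
]
ranking = [r.compound_id for r in prioritize(results)]
print(ranking)
```

A weighted score or a consensus of per-criterion ranks would be equally plausible ways to combine the three results; the lexicographic sort is used here only because it is the simplest to read.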
Docking results: known ER inhibitors
[Figures: row-20-top-known, row-36-top-known, row-42-top-known]
Docking results: Indofine compounds
[Figures: row-729-top-unknown, row-1652-top-unknown, row-1988-top-unknown]
Concluding Remarks
• Support of chemoprevention-specific predictive models.
• Initial promising results on ER-alpha (based on Indofine dataset).
• Modular architecture and workflow management.
• Integrated with additional tools within the Granatum Project.
  • Linked Biomedical Data Space.
  • Social Collaborative Workspace.
• Product Release:
  • Advanced Prototype Version: October 2012
  • Final Version: April 2013