20120615_Granatum_COST_v2

Towards a modular Web-based Workflow environment for enabling large scale Virtual Screening in Cancer Chemoprevention Research
19 June 2012
COST Conference, Personalised Medicine: Better Healthcare for the Future
Christos Kannas, Computer Science Dept., University of Cyprus

description

Presentation about the Virtual Screening Scientific Workflow for Cancer Chemoprevention developed within the GRANATUM project.

Transcript of 20120615_Granatum_COST_v2

Page 1: 20120615_Granatum_COST_v2

Towards a modular Web-based Workflow environment for enabling large scale Virtual Screening in Cancer Chemoprevention Research

19 June 2012

COST Conference
Personalised Medicine: Better Healthcare for the Future

Christos Kannas
Computer Science Dept., University of Cyprus

Page 2: 20120615_Granatum_COST_v2


Outline

• About the Project
• Overview of the Project
• Objectives
• State of the Art Review
• Implementation
  • Virtual Screening Process
  • Predictive Model Preparation
  • In Silico Tools and Methods
• Early In Silico Experiments
• Concluding Remarks

Page 3: 20120615_Granatum_COST_v2

About the Project

• The vision of the GRANATUM project is to:
  • bridge the information, knowledge and collaboration gap among biomedical researchers in Europe (at least),
  • ensure that the biomedical scientific community has homogenized, integrated access to the globally available information and data resources needed to perform complex cancer chemoprevention experiments and to conduct studies on large-scale datasets.
• The GRANATUM project is partially funded by the European Commission under the Seventh Framework Programme in the area of Virtual Physiological Human (ICT-2009.5.3).
• http://www.granatum.org/

Page 4: 20120615_Granatum_COST_v2

Overview of the Project

Page 5: 20120615_Granatum_COST_v2

Objectives

• Design a scientific algorithmic workflow for the development of in silico chemoprevention models.

• Implement workflow(s) for the selection of promising chemopreventive agents.

• Connect the custom in silico models for compound selection to other datasets and evidence included in the Linked Biomedical Data Space.

• Test the performance of the custom in silico models.

Page 6: 20120615_Granatum_COST_v2

State of the Art

• Significant overlap between chemoprevention and the traditional drug discovery process (DDP).
  • Special case with additional constraints, e.g. no toxicity.
• In silico models and tools: heavily borrowing from DDP.


SOA Review

• Online resources: databases (e.g. ChEMBL), journals, reports, …
• Infrastructure tools: chemoinformatics toolkits (e.g. RDKit and CDK) for compound representation, property and descriptor calculation, substructure mining, …
• Advanced computational chemistry: biological property predictive models, compound 3D conformations, docking tools, …
• Machine learning: classification and regression methods, available open source libraries
• Scientific workflow systems: KNIME, Taverna, Galaxy, …

Page 7: 20120615_Granatum_COST_v2


Virtual Screening Process Template


Input
• Linked Biomedical Space
• Files

Preprocessing
• File format transformations
• Standardization
• Descriptor calculation
• Fragmentation

Processing
• Attribute filter
• Similarity search
• Substructure search
• Docking
• Predictive models

Postprocessing
• Cleaning
• Formatting

Output
• Storage
• Visualization
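The stages of this template can be read as a chain of functions, each taking a list of compound records and returning a new list. The sketch below illustrates only that chaining idea; it is not the GRANATUM implementation. The record layout, the stage bodies and the 500 Da cut-off are hypothetical placeholders chosen for the example.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Compound = Dict[str, Any]   # hypothetical record layout: one dict per compound
Stage = Callable[[List[Compound]], List[Compound]]


def preprocessing(compounds: List[Compound]) -> List[Compound]:
    # Stand-in for format conversion/standardization: drop records without a structure.
    return [c for c in compounds if c.get("smiles")]


def processing(compounds: List[Compound]) -> List[Compound]:
    # Stand-in for an attribute filter; the cut-off is only an example value.
    return [c for c in compounds if c["mol_weight"] <= 500]


def postprocessing(compounds: List[Compound]) -> List[Compound]:
    # Cleaning/formatting: keep a fixed set of fields and sort by name.
    cleaned = [{"name": c["name"], "mol_weight": c["mol_weight"]} for c in compounds]
    return sorted(cleaned, key=lambda c: c["name"])


def run_workflow(compounds: List[Compound], stages: List[Stage]) -> List[Compound]:
    # Feed the output of each stage into the next one, in order.
    return reduce(lambda data, stage: stage(data), stages, compounds)


# Made-up input records standing in for the "Input" stage.
library = [
    {"name": "cmpd_b", "smiles": "CCO", "mol_weight": 46.07},
    {"name": "cmpd_a", "smiles": "c1ccccc1", "mol_weight": 78.11},
    {"name": "cmpd_c", "smiles": "", "mol_weight": 120.0},        # no structure
    {"name": "cmpd_d", "smiles": "C" * 40, "mol_weight": 563.1},  # above the cut-off
]
result = run_workflow(library, [preprocessing, processing, postprocessing])
print([c["name"] for c in result])
```

Keeping every stage behind the same list-in, list-out signature is what makes the template modular: a docking or predictive-model step could be swapped in without touching the other stages.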

Page 8: 20120615_Granatum_COST_v2


Predictive Model Preparation Template

Predictive Model

Biological data

Chemical data

Algorithm
• Algorithm parameters

Page 9: 20120615_Granatum_COST_v2

Chemopreventive Property Models

Anti-oxidant
• Direct effect
• Indirect effect
• Direct/indirect effect

Anti-inflammatory
• COX-2 but not COX-1 inhibitor
• Reduction of TNF-α
• Reduction of LOX
• Induction of AP-1
• Reduction of interleukins

Anti-proliferating
• Cyclin D1 down-regulation
• Her-2 down-regulation
• Cyclin E down-regulation
• EGFR down-regulation

Apoptotic
• Anti-apoptotic members of Bcl-2 family down-regulation
• IAP family down-regulation
• Caspase up-regulation/activation

Anti-metastatic / Anti-angiogenic
• COX-2 down-regulation
• VEGF down-regulation
• PDGF down-regulation

Estrogenic Activity
• ER-alpha binding affinity
• ER-beta binding affinity
• ER-alpha/beta binding affinity
• No affinity

Estrogen Antagonists
• Selective Estrogen Receptor Modulators (SERMs)
• Estrogen Receptor Modulators

Page 10: 20120615_Granatum_COST_v2

In Silico Tools and Methods

• Generic Chemoinformatics Tools:
  • E-Health Lab and collaborators' resources
  • RDKit
• Docking Experiment Tools:
  • AutoDock Vina
  • Chil2 GlamDock
• Data Mining & Statistics Tools:
  • In-house tools
  • R
• Scientific Workflow System:
  • Galaxy

Page 11: 20120615_Granatum_COST_v2

Early In Silico Experiments

• In silico tool & models validation
• Steps:
  • Prepare compound dataset
    • Mix of natural products and known inhibitors (4% actives)
  • Implementation/application of predictive models
    • Rule of Five
    • Toxicity model
  • Implementation/application of docking model
    • ER-alpha
  • Compound prioritization
    • Top selections visualization/evaluation

Page 12: 20120615_Granatum_COST_v2

Virtual Screening Process Example

Natural products collection + known ER-alpha inhibitors

Calculate physicochemical molecular descriptors

Rule of Five filter

Toxicity model

Docking to ER-alpha

Compound prioritization; report on top selections

Page 13: 20120615_Granatum_COST_v2

Cytotoxicity Predictive Model

• Cytotoxicity Bio-Chemical data
• SVM:
  • Kernel: Linear
  • Stratified K-Fold:
    • 5-folds
    • 10-folds

Morgan Fingerprints

• Bit vector, 2048 bits

Oral Drug-like Filtering

• HBA (hydrogen-bond acceptors) <= 10
• HBD (hydrogen-bond donors) <= 5
• Molecular Weight <= 500
• logP <= 5
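The four thresholds can be applied as a simple pass/fail test once the descriptors are available. The sketch below assumes the descriptors have already been computed (for example with a toolkit such as RDKit, which the slides list) and are supplied as a plain dictionary; the key names and the example values are made up. It requires all four conditions to hold, which is an assumption about how the filter was applied; some formulations of the Rule of Five tolerate one violation.

```python
from typing import Dict

# Upper limits as listed on the slide; the key names are hypothetical.
LIMITS = {"hba": 10, "hbd": 5, "mol_weight": 500, "logp": 5}


def passes_oral_druglike_filter(descriptors: Dict[str, float]) -> bool:
    """Return True when every descriptor is at or below its limit."""
    return all(descriptors[name] <= limit for name, limit in LIMITS.items())


# Illustrative, made-up descriptor values (not taken from the presentation).
small = {"hba": 4, "hbd": 2, "mol_weight": 310.4, "logp": 2.7}
heavy = {"hba": 12, "hbd": 6, "mol_weight": 742.9, "logp": 6.1}

print(passes_oral_druglike_filter(small))
print(passes_oral_druglike_filter(heavy))
```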

Clean Molecules

• Remove Salts

Cytotoxicity Dataset

• Source: The Scripps Research Institute Molecular Screening Center
• PubChem Bio-Assay: AID 464
• Tested: 706
• Active: 331
• Inactive: 375
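With 331 active and 375 inactive compounds, stratified K-fold validation keeps roughly the same active/inactive ratio in every fold. The following pure-Python sketch shows one way to build such folds for the 5-fold and 10-fold cases named on the slide. The function name and the round-robin dealing are illustrative assumptions; the slides do not say which library or splitting routine was used, and a real pipeline would more likely rely on a machine-learning library's own implementation.

```python
import random
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve the class proportions."""
    by_class: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Class sizes from the slide: 331 active (1) and 375 inactive (0) compounds.
labels = [1] * 331 + [0] * 375
for k in (5, 10):
    folds = stratified_folds(labels, k)
    actives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
    print(k, [len(fold) for fold in folds], actives_per_fold)
```

Each fold then serves once as the held-out test set while the classifier is trained on the remaining folds.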

Page 14: 20120615_Granatum_COST_v2

Virtual Screening Process Example

Ranked order of Cytotoxicity Prediction, Docking and Oral Druglikeness Filtering results

ER-Alpha Docking (GlamDock)
• ER-Alpha Protein
• 2451 molecules for docking experiments

Cytotoxicity (Predictive Model)
• SVM Classifier
• Trained with Bio-Assay 464 dataset
• Predict: 2451 molecules

Calculate Morgan Fingerprints
• Bit vector, 2048 bits

Oral Druglike Filtering
• HBA <= 10
• HBD <= 5
• Molecular Weight <= 500
• logP <= 5
• Result: 2035 pass, 416 do not pass

Clean Molecules
• Remove Salts
• 2451 molecules (42 Known, 2409 Indofine)

Demo Dataset
• Known ER-Alpha Inhibitors (42)
• Indofine Dataset (2494)
• Result: 2451 OK, Remove 85 (valence errors, empty molecule block)
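The slide does not say how the three result types were combined into one ranked order. As one possible reading, the sketch below sorts compounds so that those passing the oral drug-like filter come first, then those predicted non-cytotoxic, then those with the best docking score. The record type, the field names, the lower-is-better score convention and the example values are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class ScreeningResult(NamedTuple):
    compound_id: str
    passes_filter: bool     # oral drug-like filter outcome
    predicted_toxic: bool   # cytotoxicity classifier output
    docking_score: float    # assumed convention: lower (more negative) is better


def prioritize(results: List[ScreeningResult]) -> List[ScreeningResult]:
    # Sort key: filter passers first, then predicted non-toxic, then best docking score.
    return sorted(
        results,
        key=lambda r: (not r.passes_filter, r.predicted_toxic, r.docking_score),
    )


# Made-up values for illustration only.
results = [
    ScreeningResult("cmpd_1", True, True, -9.8),
    ScreeningResult("cmpd_2", True, False, -8.1),
    ScreeningResult("cmpd_3", False, False, -10.5),
    ScreeningResult("cmpd_4", True, False, -9.2),
]
ranking = [r.compound_id for r in prioritize(results)]
print(ranking)
```

A lexicographic sort like this treats the filter and the toxicity prediction as hard priorities; a weighted score would be an alternative if the criteria should trade off against each other.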


row-20-top-known row-36-top-known row-42-top-known

row-729-top-unknown row-1652-top-unknown row-1988-top-unknown

Page 15: 20120615_Granatum_COST_v2

Docking results: known ER inhibitors

row-20-top-known


Page 16: 20120615_Granatum_COST_v2

Docking results: known ER inhibitors

row-36-top-known


Page 17: 20120615_Granatum_COST_v2

Docking results: known ER inhibitors

row-42-top-known


Page 18: 20120615_Granatum_COST_v2

Docking results: Indofine compounds

row-729-top-unknown


Page 19: 20120615_Granatum_COST_v2

Docking results: Indofine compounds

row-1652-top-unknown


Page 20: 20120615_Granatum_COST_v2

Docking results: Indofine compounds

row-1988-top-unknown


Page 21: 20120615_Granatum_COST_v2


Concluding Remarks

• Support of chemoprevention-specific predictive models.
• Initial promising results on ER-alpha (based on Indofine dataset).
• Modular architecture and workflow management.
• Integrated with additional tools within the Granatum Project.
  • Linked Biomedical Data Space.
  • Social Collaborative Workspace.
• Product Release:
  • Advanced Prototype Version: October 2012
  • Final Version: April 2013
