Dr. Michael Schroeder, Department of Computing, City University, London, UK, msch@soi.city.ac.uk, http://www.soi.city.ac.uk/~msch. Visiting Scientist, Medical Research Council, Cambridge, UK. BioGrid


Page 1:

Dr. Michael Schroeder
Department of Computing
City University, London, UK
msch@soi.city.ac.uk
http://www.soi.city.ac.uk/~msch

Visiting Scientist
Medical Research Council
Cambridge, UK

BioGrid

Page 2:

Drowning in information...

• Biology has changed dramatically from an information-light to an information-intensive area

• The much-publicised Human Genome Project is only the tip of the iceberg

• >500 tools online

• >8000 new abstracts per month


Page 3:

Eureka!

?????????????

BioGrid

• Provide access to multiple, heterogeneous and geographically distributed information sources.

• Perform active searches for relevant information in non-local domains (including retrieving, analysing, manipulating, and integrating information)

Page 4:

BioGrid Objectives

Objectives: an information and knowledge grid allowing knowledge discovery and access to multiple types of structured and unstructured data, including gene expression and protein interaction data

Business objectives:
• Grid for next-generation classification research infrastructure for large proteomics and genomics databases
• Efficient transactional enterprise collaboration
• Faster time to market for biotech innovation

Page 5:

Example

A scientist is interested in a gene, e.g. NOX4
– Search PubMed for articles
  • Too many hits
  • Gene also known under a different name
– Analyse gene expression data
  • Which genes behave similarly to NOX4?
  • What is the function of NOX4?
– Analyse protein interactions
  • Which interactions and processes does expression of NOX4 trigger?
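The "gene also known under a different name" problem is commonly tackled by expanding the search with known aliases. What follows is a minimal sketch of that idea, not part of BioGrid itself: the `SYNONYMS` table and the `expand_query` helper are hypothetical, the single alias shown is illustrative only, and a real system would take aliases from a curated nomenclature resource.

```python
# Hypothetical alias table for illustration; a real system would load
# aliases from a curated gene nomenclature resource.
SYNONYMS = {
    "NOX4": ["RENOX"],
}


def expand_query(gene, synonyms=SYNONYMS):
    """Build a boolean OR query covering a gene symbol and its known aliases."""
    names = [gene] + synonyms.get(gene.upper(), [])
    # Quote each name so multi-word aliases stay intact in the search string.
    return " OR ".join('"{}"'.format(name) for name in names)


print(expand_query("NOX4"))
```

A gene without an entry in the table is searched under its own name only, so the helper never narrows a query.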

Page 6:

Challenges

• Semantic complexity
  – Computer does not “understand” data
  – DBs and systems cannot inter-operate
• Computational complexity
  – generating a protein interaction map takes ca. 7 days
  – analysing large sets of gene expression data can take up to an hour
  – analysis of large text bodies is complex

Page 7:

BioGrid Vision

[Diagram labels: BioGrid; interaction data; metabolic pathway data; expression data; sequences; characterisation of target sequence; scientific literature]

Page 8:

Approach

• Semantic Web
  – global and local ontologies to capture meta-data and facilitate semantic inter-operability
• Grid technology
  – transparent access to distributed resources
• Agent technology
  – personal information agent collecting and presenting relevant information on behalf of its user

[Diagram labels: BioGrid Client (three instances); BioGrid Server; Literature Classification Server; The Grid; Space Explorer; PSIMAP]

Page 9:

Classification server

• Finding and processing relevant scientific literature

[Diagram labels: BioGrid; interaction data; metabolic pathway data; expression data; sequences; characterisation of target sequence; scientific literature]

Page 10:

Results of PubMed

• Lorenz P. Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44.
• Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.
• ...

[Callout labels: Author, Title, Year, Journal]

However, to a machine things look different!

Page 11:

Results of PubMed

....

Solution: tag data (XML)
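To make the tagging idea concrete, here is a minimal sketch that wraps the first PubMed hit in author/title/journal/year tags and reads it back with Python's standard XML parser. The enclosing `<record>` element and the `parse_record` helper are assumptions of this example; the slides do not say how records are grouped.

```python
import xml.etree.ElementTree as ET

# First PubMed hit from the results slide, tagged field by field.
# The enclosing <record> element is an assumption made for this sketch.
RECORD = """
<record>
  <author>Lorenz P</author>
  <title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title>
  <journal>Biol Chem</journal>
  <year>2001</year>
</record>
"""


def parse_record(xml_text):
    """Return the tagged fields of one record as a plain dict."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


fields = parse_record(RECORD)
print(fields["author"], fields["journal"], fields["year"])
```

Once tagged, a program can pick out the year or the journal without guessing where one field ends and the next begins. The tag names themselves still carry no meaning for the machine.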

Page 12:

Results of PubMed

• <author> </author> <title> . </title> <journal> </journal> <year> </year>
• <author> </author> <title> . </title> <journal> </journal> <year> </year>
• ...

However, to a machine things look different!

Page 13:

Results of PubMed

• ...

Solution: use ontologies (Semantic Web)

Page 14:

Semantic Web

• DAML+OIL is an XML-based language for specifying ontologies

• Annotations of data refer to a global ontology (where appropriate), which makes a shared understanding of the data possible

• Ongoing efforts in bioinformatics: e.g. the Gene Ontology
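The point about annotations referring to a global ontology can be illustrated with a small sketch. Two sources use different local tags for the same thing, and a mapping to shared concepts lets a program treat them alike. Everything here is hypothetical: the source names, the local tags, the concept names and the `to_global` helper are invented for illustration, and a plain dictionary stands in for an ontology written in DAML+OIL.

```python
# Hypothetical mapping: (source, local tag) -> shared global concept.
# Source names, tags and concept names are invented for this sketch.
GLOBAL_CONCEPT = {
    ("source_a", "author"): "Creator",
    ("source_a", "title"): "Title",
    ("source_b", "writer"): "Creator",
    ("source_b", "headline"): "Title",
}


def to_global(source, record):
    """Re-key a record from local tags to global concepts; drop unmapped tags."""
    out = {}
    for tag, value in record.items():
        concept = GLOBAL_CONCEPT.get((source, tag))
        if concept is not None:
            out[concept] = value
    return out


a = to_global("source_a", {"author": "Lorenz P"})
b = to_global("source_b", {"writer": "Fredericks WJ"})
print(a, b)
```

Both records end up keyed by the same concept, so a query for "Creator" reaches either source without knowing its local vocabulary.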

Page 15:

Classification Server

Scientific objectives:
• Effective concept recognition
• Pattern matching
• Intelligent data sourcing agents and tagging technology
• Automated categorisation in a biotechnology domain
• Metadata hierarchy
• Functional interoperability methodology design
• Domain knowledge mapping
• Implementing a logical domain ontology
• Integration of agent & classification logic & visualisation technology

Page 16:

Space Explorer

• … is a general-purpose visualisation tool facilitating interactive exploration of large data sets
• … deals with multi-variate and proximity data
• … provides
  – principal component analysis
  – multi-dimensional scaling (principal co-ordinate analysis, spring embedding)
  – clustering
• … provides
  – dendrograms
  – 2D and 3D (using VRML) scatter plots
  – graphs and colour maps
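As an illustration of how clustering turns proximity data into a dendrogram, the sketch below groups expression profiles by correlation distance using single-linkage agglomerative clustering. This is one textbook choice, not necessarily the method Space Explorer uses; the function names and the toy profiles are invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def single_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they happen.

    Each merge is (cluster_a, cluster_b, distance), which is the
    information a dendrogram is drawn from.
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Single linkage: cluster distance is that of the closest member pair.
        def linkage(pair):
            a, b = pair
            return min(
                pearson_distance(profiles[p], profiles[q]) for p in a for q in b
            )

        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Invented toy expression profiles (four measurements per gene).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

merges = single_linkage(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

Genes whose profiles rise and fall together merge first at a small distance, while an anti-correlated gene joins last, which is what the height of a dendrogram branch encodes.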

[Diagram labels: BioGrid; interaction data; metabolic pathway data; expression data; sequences; characterisation of target sequence; scientific literature]

Page 17:

Page 18:

Example: gene expression data

Page 19:

Example: Protein topology

Page 20:

Protein Interaction: PSIMAP

[Diagram labels: BioGrid; interaction data; metabolic pathway data; expression data; sequences; characterisation of target sequence; scientific literature]

• Based on 3D structure, PSIMAP determines interactions of proteins

• The structure of the map is of great importance for understanding biological processes

• Generation and analysis of the map are computationally expensive
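One common way to infer an interaction from 3D structure is a distance criterion: two domains are called interacting when enough of their atoms lie close together. The sketch below illustrates that general idea only. The 5 Å cutoff, the minimum of 5 contacts, the function names and the coordinates are all assumptions of this example, not PSIMAP's documented parameters or data.

```python
from math import dist  # available from Python 3.8


def count_contacts(domain_a, domain_b, cutoff=5.0):
    """Count atom pairs, one atom from each domain, within `cutoff` of each other."""
    # Every atom of one domain is compared with every atom of the other,
    # so the cost grows with the product of the two domain sizes.
    return sum(1 for p in domain_a for q in domain_b if dist(p, q) <= cutoff)


def interacts(domain_a, domain_b, cutoff=5.0, min_contacts=5):
    """Assumed criterion: at least `min_contacts` atom pairs within `cutoff`."""
    return count_contacts(domain_a, domain_b, cutoff) >= min_contacts


# Invented toy coordinates (x, y, z), treated as being in angstroms.
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
domain_b = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (20.0, 0.0, 0.0)]
domain_c = [(50.0, 50.0, 50.0)]

print(count_contacts(domain_a, domain_b), interacts(domain_a, domain_b))
print(count_contacts(domain_a, domain_c), interacts(domain_a, domain_c))
```

The all-pairs comparison hints at why generating a full interaction map is expensive: the same check has to be repeated for every pair of domains in every known structure.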

Page 21:

Partners

No. | Organisation (abbreviation) | Country | RTD role in the project
1 | University of Groningen (RUG) | NL | User, bioinformatics on drug discovery
2 | ZooRobotics (ZRO) | NL | Co-ordinator, supplier of GRID Classification Server, exploitation management
3 | City University London (CIT) | UK | Supplier of intelligent agents and Space Explorer
4 | University of Cyprus (UCY) | EL | Supplier of GRID knowledge engineering
5 | Medical Research Centre (MRC) | UK | Supplier of PSIMAP; user, bioinformatics on food and nutrition

Page 22:

Pert diagram

WP0: Management
WP1: Source domain analysis (data, standards, protocols)
WP2: Hierarchy creation, Metadata model development
WP3: Classification logic integration
WP4: Implementation agent technology
WP5: Implementation Visualisation technology
WP6: Measurement and evaluation of results
WP7: Dissemination & Exploitation

Phase labels: Integration Analysis Prototype Development; Measurement and Evaluation
Main deliverables: 1st prototype; 2nd prototype

Page 23:

Work packages

Work package title

WP0 Management

WP1 Source domain analysis

WP2 Hierarchy creation, Metadata model development

WP3 Classification logic integration

WP4 Agent implementation

WP5 Visualisation implementation

WP6 Measurement and evaluation

WP7 Dissemination and exploitation

Page 24:

Expression Space: Space Explorer

Pathway Space:

BioGrid

Interaction Space: PSIMAP

Literature Space: Classification Server

BioGrid Mission: Distributed computational biology platform for fast pharmaceutical research