AHM September 2004: Grid Services Supporting the Usage of Secure Federated, Distributed Biomedical Data. Dr Richard Sinnott, Technical Director, National e-Science Centre; Deputy Director Technical, Bioinformatics Research Centre, University of Glasgow. 3rd September 2004

Page 1:

AHM September 2004

Grid Services Supporting the Usage of Secure Federated, Distributed Biomedical Data

Dr Richard Sinnott

Technical Director National e-Science Centre

Deputy Director Technical Bioinformatics Research Centre University of Glasgow

3rd September 2004

Page 2:

Overview of BRIDGES

Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)

NeSC (Edinburgh and Glasgow) and IBM
  www.brc.dcs.gla.ac.uk/projects/bridges

Supporting project for CFG project
  Generating data on hypertension
  Rat, Mouse, Human genome databases

Variety of tools used
  BLAST, BLAT, Gene Prediction, visualisation, …

Variety of data sources and formats
  Microarray data, genome DBs, project partner research data, medical records, …

Aim is integrated infrastructure supporting
  Data federation
  Security

Page 3:

Grids & Life Sciences

Extensive Research Community
  >1000 per research university

Extensive Applications
  Many people care about them
  Health, Food, Environment, …

Interacts with many disciplines
  Physics, Chemistry, Maths/Statistics, Nano-engineering, …

Huge and expanding number of databases relevant to bioinformatics community
  Heterogeneity, Interdependence, Complexity, Change, Dirty…

Linking in co-ordinated, secure manner full of open issues to be addressed

Compute demands growing as more in-silico research undertaken

Page 4:

Database Growth

[Figure: PDB Content Growth]

• DBs growing exponentially!!!
• Bibliographic (MedLine, PubMed, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT, …)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)

Page 5:

Complexity of Biological Data

[Figure: levels of biological data; labels recovered from the diagram]
Nucleotide sequences
Nucleotide structures
Gene expressions
Protein structures
Protein functions
Protein-protein interaction (pathways)
Cell
Cell signalling
Tissues
Organs
Physiology
Organisms
Populations

+ links to plant/crops, environmental, health, … information sources

Page 6:

More genomes …

Arabidopsis thaliana
mouse
rat
Caenorhabditis elegans
Drosophila melanogaster
Mycobacterium leprae
Vibrio cholerae
Plasmodium falciparum
Mycobacterium tuberculosis
Neisseria meningitidis Z2491
Helicobacter pylori
Xylella fastidiosa
Borrelia burgdorferi
Rickettsia prowazekii
Bacillus subtilis
Archaeoglobus fulgidus
Campylobacter jejuni
Aquifex aeolicus
Thermotoga maritima
Chlamydia pneumoniae
Pseudomonas aeruginosa
Ureaplasma urealyticum
Buchnera sp. APS
Escherichia coli
Saccharomyces cerevisiae
Yersinia pestis
Salmonella enterica
Thermoplasma acidophilum

Page 7:

Bio e-Science Projects

Page 8:

Bridges Project

[Figure: architecture diagram; labels recovered from the diagram]
CFG Virtual Organisation
Sites: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands
Private data (shown six times)
Publicly Curated Data: Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …
DATA HUB
Information Integrator
OGSA-DAI
Synteny Grid Service
blast
+
VO Authorisation

Page 9:

Grid Security

OGSA security
  Single sign-on based on (X.509) digital certificates
  establish credentials
    – Certification authority based (RAL in UK)

Services (and clients) have APIs for fine grained security
  Based on GSS-API

Provides for authentication but need authorisation
  Various technologies for authorisation including PERMIS, CAS, …

Collaborating with PrivilEge and Role Management Infrastructure Standards Validation (PERMIS) team
  Led by Prof David Chadwick, University of Salford (www.permis.org)

Page 10:

Security Authorisation

PERMIS allows you to
  Define roles for who can do what on what
  Policy = { Role x Target x Action }
    – Can user X invoke service Y and access or change data Z?
      » Policies created with PERMIS PolicyEditor (output is XML based policy)
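
The Role x Target x Action idea can be illustrated with a small sketch. This is not PERMIS code and does not reproduce its XML policy format; the role, target, action and user names are hypothetical. It only shows that a request is permitted when one of the user's roles appears in a matching (role, target, action) triple, and is refused otherwise.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    role: str
    target: str
    action: str


# Hypothetical policy: names are illustrative, not taken from the BRIDGES deployment.
POLICY = frozenset({
    Rule("cfg_scientist", "blast_service", "invoke"),
    Rule("cfg_scientist", "shared_dataset", "read"),
    Rule("data_curator", "shared_dataset", "read"),
    Rule("data_curator", "shared_dataset", "write"),
})

# Hypothetical assignment of roles to users.
USER_ROLES = {
    "user_x": {"cfg_scientist"},
    "user_y": {"data_curator"},
}


def is_permitted(user: str, target: str, action: str) -> bool:
    """Return True only if one of the user's roles explicitly grants the action."""
    roles = USER_ROLES.get(user, set())
    return any(Rule(role, target, action) in POLICY for role in roles)


if __name__ == "__main__":
    # Can user X invoke the BLAST service? Can user X change the shared data?
    print(is_permitted("user_x", "blast_service", "invoke"))
    print(is_permitted("user_x", "shared_dataset", "write"))
```

Keeping the policy as a set of explicit triples means an unknown user, role or target falls through to a refusal without any special-case code.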

Page 11:

Security Authorisation

PERMIS Privilege Allocator then used to sign policies
  Associates roles with specific users
  Policies stored as attribute certificates in LDAP server

When is authorisation done?
  Two main choices
    Portal personalised for users based on their policies
      – If not allowed to invoke service then they do not get to see it
    Actions of users (with given role) are authorised every time the service is invoked
      – They can see the service but potentially not be allowed to invoke it
        » Performance issues… but more likely scenario for authorisation
  In both cases, if not explicitly agreed in policy then rejected and logged!
      – Both cases being explored

Plan to exploit the GGF SAML AuthZ specification
  Based on GT3.3 – currently have BLAST service in GT3.2 Final
      – Identified issues with standards…
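
The two enforcement choices on this slide, together with the "rejected and logged" default, can be sketched as follows. This is an illustration under stated assumptions only: the policy table, role names, service names and function names are hypothetical, and nothing here reflects the PERMIS, SAML AuthZ or Globus Toolkit APIs.

```python
import logging

logger = logging.getLogger("authz_sketch")

# Hypothetical policy: role -> services that role may invoke.
ROLE_SERVICES = {
    "cfg_scientist": {"blast", "synteny"},
    "guest": {"synteny"},
}
ALL_SERVICES = ["blast", "synteny", "admin_console"]


def allowed(role: str, service: str) -> bool:
    """Default deny: anything not explicitly listed in the policy is refused."""
    return service in ROLE_SERVICES.get(role, set())


def visible_services(role: str) -> list[str]:
    """Choice 1: personalise the portal so disallowed services are never shown."""
    return [s for s in ALL_SERVICES if allowed(role, s)]


def invoke(role: str, service: str) -> str:
    """Choice 2: check the policy on every invocation; reject and log on failure."""
    if not allowed(role, service):
        logger.warning("denied: role=%s service=%s", role, service)
        raise PermissionError(f"{role} may not invoke {service}")
    return f"{service} invoked for {role}"
```

The first choice evaluates the policy once when the portal page is built; the second evaluates it on every call, which is where the performance concern noted on the slide comes from.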

Page 12:

Where we are today!

Information Integrator DB repository established and populated
  … with public data sets (OMIM, HUGO, RGD, SWISS-PROT)
  … linked to relevant resources (ENSEMBL: rat, human, mouse; MGI)

GT3 based Grid services developed (BLAST) using own meta-scheduler
  General usage of ScotGrid and local Condor pool

Portal developed using IBM WebSphere

Genome visualisation browsers
  SyntenyVista – for viewing synteny between local/remote data sets
  MagnaVista – for exploring genetic information across multiple (remote) resources

Gaining experience with security technologies
  Setting up policies with Grid security authorisation software etc

Rolled-out Alpha version of system to CFG group July '04
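
The slides do not say how the project's meta-scheduler chooses between ScotGrid and the local Condor pool. As a purely illustrative sketch, assuming a simple "fewest queued jobs" rule and hypothetical resource records and job names, dispatching a batch of BLAST jobs might look like this:

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    queued_jobs: int = 0


def pick_resource(resources: list[Resource]) -> Resource:
    """Assumed rule: choose the resource with the fewest queued jobs (ties go to the first listed)."""
    return min(resources, key=lambda r: r.queued_jobs)


def dispatch(jobs: list[str], resources: list[Resource]) -> dict[str, str]:
    """Assign each job to a resource and update that resource's queue length."""
    placement = {}
    for job in jobs:
        target = pick_resource(resources)
        target.queued_jobs += 1
        placement[job] = target.name
    return placement


if __name__ == "__main__":
    # Queue lengths are made-up starting values for the example.
    pools = [Resource("ScotGrid", 2), Resource("Condor pool", 0)]
    print(dispatch(["query1", "query2", "query3"], pools))
```

A real meta-scheduler would also need to consider job size, data locality and resource availability; the rule above is only the simplest placement policy that shows the idea.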

Page 13:

Lessons learned

Public data resources openness
  Often cannot query directly
  Often not easy/possible to find schemas
  Joint Data Standards Study investigating this
    Started on 1st June and involves
      – Digital Archiving Consultancy
      – Bioinformatics Research Centre (Glasgow)
      – NeSC (Edinburgh and Glasgow)
    Look at technical, political, social, ethical etc issues involved in accessing and using public life science resources
      – Will liaise with NDCC
      – Interview relevant scientists, data curators/providers
    8 month project with final report in January
      – Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI

GT3 not without pain! (… understatement!!!!)
  Hopefully GT4 will be better?

Page 15:

www.nesc.ac.uk
