Integrated e-Infrastructure for Distributed, Data-driven, Data-intensive High Performance Computing: Biomedical Requirements. Peter V Coveney, Centre for Computational Science, University College London. Integrating the Strengths of the e-Research Community, NeSC, Thursday, 10th March 2011


Page 1

Integrated e-Infrastructure for Distributed, Data-driven, Data-intensive High Performance Computing: Biomedical Requirements

Peter V Coveney

Centre for Computational Science

University College London

Integrating the Strengths of the e-Research Community, NeSC, Thursday, 10th March 2011

Page 2

Contents

• Computational Biomedicine
– HIV/AIDS
– Cardiovascular medicine
– Cancer

• ICT, e-Health and the Virtual Physiological Human

• Infrastructure support

• Shortcomings in UK infrastructure

• Major policy hurdles

• UCL CLMS initiative

• Conclusions

Page 3

UCL Projects

• VPH Network of Excellence – EU (€8M); no HPC
• ContraCancrum – EU (€3.4M); no HPC
• VPH-Share – EU (€10.7M); no HPC
• P-Medicine – EU (€13.7M); no HPC
• INBIOMEDVision – EU (€2M)
• MAPPER – EU (€2M); no HPC

• A new approach to Science at the Life Sciences Interface – EPSRC (£4M) + HECToR

• Large Scale Lattice-Boltzmann Simulation of Liquid Crystals – EPSRC (£800K) + HECToR

Page 4

Patient-specific medicine

• ‘Personalised medicine’ - use the patient’s specific profile to better manage disease or a predisposition towards a disease
• Tailoring of medical treatments based on the characteristics of an individual patient

Patient-specific medical-simulation

• Use of genotypic and/or phenotypic simulation to customise treatments for each particular patient, where computational simulation can be used to predict the outcome of courses of treatment and/or surgery

Why use patient-specific approaches?

• Treatments can be assessed for their effectiveness with respect to the patient before being administered, saving the potential expense of ineffective treatments

See: P. V. Coveney et al (eds), Interface Focus, Theme Issue on VPH Vol. 1, No. 3 Online 25th April 2011

Page 5

Medical/clinical domain I: HIV/AIDS

HIV-1 Protease is a common target for HIV drug therapy

• Enzyme of HIV responsible for protein maturation
• Target for Anti-retroviral Inhibitors
• Example of Structure Assisted Drug Design
• 9 FDA-approved inhibitors of HIV-1 protease

So what’s the problem?

• Emergence of drug resistant mutations in protease
• Render drug ineffective
• Drug resistant mutants have emerged for all FDA-approved inhibitors

[Figure: HIV-1 protease structure. Labels: Monomer A (1 - 99); Monomer B (101 - 199); Flaps; Leucine - 90, 190; Glycine - 48, 148; Catalytic Aspartic Acids - 25, 125; Saquinavir; P2 Subsite; N-terminal; C-terminal]

Integrate simulation with conventional clinical decision support systems to refine results

VPHShare

Page 6

Medical/clinical domain II: Grid enabled neurosurgical imaging using simulation

The goal: to simulate large scale patient specific cerebral blood flow in clinically relevant time frames

Objectives:

• To study cerebral blood flow using patient-specific image-based models.
• To provide insights into the cerebral blood flow & anomalies.
• To develop tools and policies by means of which users can better exploit the ability to reserve and co-reserve HPC resources.
• To develop interfaces which permit users to easily deploy and monitor simulations across multiple computational resources.
• To visualize and steer the results of distributed simulations in real time

Yield patient-specific information which helps plan embolisation of arterio-venous malformations, aneurysms, etc.

M. D. Mazzeo and P. V. Coveney, Computer Physics Communications, 178, (12), 894-914, (2008). DOI: 10.1016/j.cpc.2008.02.013.

Page 7

Medical/clinical domain III: ContraCancrum

Clinically Oriented Translational Cancer Multilevel Modelling

Two dedicated clinical studies in ContraCancrum, one in glioma and one in lung cancer (200 cases/year)

[Diagram labels: Multi-level data; Multi-level Modelling; Multi-level Models of Cancer; Other clinical data needed; Schedule 1, Schedule 2, Schedule …, Schedule n; Prediction of the best treatment schedule / schema]

http://www.contracancrum.eu

Page 8

Virtual Physiological Human

• Funded under EU FP7; ~ €250M
• 20 projects: 1 NoE, 5 IPs, 11 STREPs, 3 CAs.

“a methodological and technological framework that, once established, will enable collaborative investigation of the human body as a single complex system ...”

Networking NoE

VPHShare

Page 9

VPH-Share Overview

VPH-Share will provide the organisational fabric (the infostructure), realised as a series of services, offered in an integrated framework, to expose and to manage data, information and tools, to enable the composition and operation of new VPH workflows and to facilitate collaborations between the members of the VPH community.

HIV, Heart, Aneurysms, Musculoskeletal

€11M, 2011-2015, EU FP7 – Promotes cloud technologies

VPHShare

Page 10

p-medicine

Predictive disease modeling in p-medicine will contribute to the optimization of cancer treatment by fully exploiting the individual data of the patient.

p-medicine is focusing on Wilms tumor, breast cancer and acute lymphoblastic leukemia

The p-medicine infrastructure supports both a generic seamless, multi-level data integration purpose and a VPH-specific, multi-level, cancer data repository to facilitate model validation and clinical translation through trials.

The infrastructure is scalable for any disease as long as predictive modeling is clinically significant in one or more levels (from molecular to tissue level) and the development of such models is feasible (i.e. there is enough understanding of the biological mechanisms involved to develop them).

Led by a clinical oncologist - Prof Norbert Graf!

[Diagram: Multi-level disease modeling. Panels: Disease Modelling at the molecular Level; Disease Modelling at the cellular Level (labels N, S, G1, G2, M, G0, A); Disease Modelling at the tissue/organ Level; Multi-scale therapy predictions/disease evolution results]

€13M, 2011-2013, EU FP7

Page 11

Large scale data & computing

Seamless access and integration of distributed, heterogeneous data in a data warehouse repeatedly over time (≈ 200 GB / patient and time point)

Models are built for use in clinical decision support

results are needed in a timely fashion

It is necessary to have the possibility of seamlessly “plugging in” resources for parallel and large scale computing “here and now”

petascale computing is needed to perform e.g.:

• activities like drug binding affinity determination
• Blood flow through tumours

Gratis via VPH-NoE supervised VPH Virtual Community allocations of time on DEISA and, in future PRACE via MAPPER, …?

Page 12

MAPPER: Objectives and Challenges

MAPPER will develop computational strategies, software and services for distributed multiscale simulations across disciplines, exploiting existing and evolving European e-Infrastructure.

Driven by seven exemplar multiscale applications, MAPPER will deploy a computational science infrastructure for distributed multiscale computing on and across European e-Infrastructures.

By taking advantage of existing software and services, MAPPER will deliver high quality components aiming at large-scale, heterogeneous, high performance multidisciplinary multiscale computing, while maintaining ease of use and transparency for end users.

MAPPER will advance state-of-the-art in high performance computing on e-Infrastructures by enabling distributed execution, across all European e-Infrastructures, of multiscale models.

http://www.mapper-project.eu

Page 13

VPH ToolKit

http://toolkit.vph-noe.eu

Page 14

VPH Virtual Community on DEISA

+ euHeart in second wave, and other non-VPH EU projects

VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011

• HECToR (Cray, UK)
• SARA (IBM Power 6, Netherlands)

DEISA-TeraGrid interoperability project has additional access to LRZ

Page 15

VPH requires HPC and Data Integration

• Computational experiments integrated seamlessly into current clinical practice

• Clinical decisions influenced by patient specific computations: turnaround time for data acquisition, simulation, post-processing, visualisation, final results and reporting.

• Fitting the computational time scale to the clinical time scale:
– Capture the clinical workflow
– Get results which will influence clinical decisions: 1 day? 1 week?
– This project - 15 to 30 minutes

• Development of procedures and software in consultation with clinicians

• Security/Access is a major concern

• Need to integrate Data, Compute via Workflows

• On-demand availability of storage, networking and computational resources
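The turnaround requirement on this slide can be read as a simple budget: the stage times from data acquisition to reporting must sum to less than the clinical window. A minimal sketch of that check follows. The stage names come from the slide, but every duration is a hypothetical placeholder, not a measurement from the project.

```python
# Hypothetical stage durations in minutes. These are placeholders chosen
# for illustration only; the talk does not give per-stage timings.
STAGES = {
    "data acquisition": 5.0,
    "simulation": 12.0,
    "post-processing": 4.0,
    "visualisation": 3.0,
    "final results and reporting": 2.0,
}


def turnaround_minutes(stages):
    """Total wall-clock time if the stages run one after another."""
    return sum(stages.values())


def fits_clinical_window(stages, window_minutes):
    """True if the whole pipeline finishes inside the clinical window."""
    return turnaround_minutes(stages) <= window_minutes


if __name__ == "__main__":
    print(f"total turnaround: {turnaround_minutes(STAGES):.0f} min")
    for window in (15, 30):  # the 15 to 30 minute target from the slide
        verdict = "fits" if fits_clinical_window(STAGES, window) else "too slow"
        print(f"{window}-minute window: {verdict}")
```

With these placeholder numbers the pipeline fits a 30 minute window but not a 15 minute one, which shows why any queueing delay in a batch system can break the clinical time scale.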

Page 16

Many of the projects we are involved in have non-standard requirements with respect to HPC service providers

• Ability to co-reserve resources (HARC)
• Launch emergency simulations (SPRUCE)
• Consistent interfaces for federated access (AHE)
• Access to back end nodes: steering, visualisation
• Lightpath network connections
• Data integration from multiple sources (IMENSE)
• Support for software (ReG steering toolkit etc)
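The co-reservation requirement amounts to finding a time slot that is free on every resource at once, machines and network links alike. A minimal sketch of that search is given below, assuming each provider publishes its free windows as (start, end) pairs. The function name, resource names and times are hypothetical, and this is not the HARC interface.

```python
from datetime import datetime, timedelta


def common_window(free_windows, duration):
    """Return the earliest (start, end) slot of length `duration` that lies
    inside a free window on every resource, or None if no such slot exists.

    free_windows maps a resource name to a list of (start, end) datetimes.
    """
    # The earliest feasible slot always begins at the start of some free
    # window, so those starts are the only candidates worth testing.
    candidates = sorted(
        start for windows in free_windows.values() for start, _ in windows
    )
    for start in candidates:
        end = start + duration
        if all(
            any(s <= start and end <= e for s, e in windows)
            for windows in free_windows.values()
        ):
            return start, end
    return None


# Hypothetical availability for two machines and one network link.
day = datetime(2011, 3, 10)
free = {
    "compute_a": [
        (day.replace(hour=9), day.replace(hour=12)),
        (day.replace(hour=14), day.replace(hour=18)),
    ],
    "compute_b": [(day.replace(hour=11), day.replace(hour=16))],
    "lightpath": [(day.replace(hour=8), day.replace(hour=17))],
}

slot = common_window(free, timedelta(hours=2))
print(slot)
```

In this example the only two-hour slot shared by all three resources runs from 14:00 to 16:00, and no four-hour slot exists. A real co-reservation service also has to book the slot atomically across providers, which this sketch does not attempt.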

Page 17

Individualized MEdiciNe Simulation Environment (IMENSE)

• Data repository – this is the key store for project data, containing all patient data and simulation data derived from the patient data.

• Integrated web portal – this provides the central interface from which users upload and access data sets, and analysis services. The interface provides users with the facility to search for patient data based on a number of criteria.

• Web Services – the web services platform implements required data processing functions.

• Workflow environment – the workflow environment provides a virtual experiment system, from which users can launch pre-defined workflows to automate moving data between the data environment and multiple data processing services.

Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011

Page 18

IMENSE Interface

IMENSE Environment

Page 19

Workflows

• GSEngine is a workflow orchestration engine developed by the ViroLab project
• Can be used to orchestrate applications launched by AHE
• It allows services to be orchestrated using both point and click and scripting interfaces
• Workflows stored in a repository and shared between users
• Many of the aims of ViroLab similar to VPH-I projects, so GSEngine will be useful here

Malawski et al, Future Generation Computer Systems, 26, (1), 138-146, 2010

Page 20

Inside IMENSE: Integrating the components

Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011

Page 21

UK Infrastructural Failures

UK computing e-Infrastructure is crumbling.

Not a holding partner in PRACE.

No Tier-0 site in the works.

Only one Tier-1 machine (with issues).

HECToR has had several major failures, researchers seem to have trouble using/trusting it, given its usage.

What’s happening next?

Tier-2 facilities are also being dismantled.

NGS core nodes being shut down!!

We cannot maintain a good level of e-Science research without the infrastructure to support it

Relative to other countries we’re in full scale retreat!

Page 22

Infrastructure in the UK is fragmented

[Diagram labels: Data; HPC; NGS; Networks; ?]

Page 23

TeraGrid eXtreme Digital (XD)

• Two sets of services:

– XES will provide a set of well-known (and standard) protocol specifications and profiles

– CPS will support both the diversity of different services and capabilities required by the community

• From the desktop to the largest machines!

• XD design is firmly tied to the user requirements of the science and engineering research community.

• Presents the individual user with a common user environment

• Caters to both researchers whose computations require very little data movement and those performing very data-intensive computations.

• Will offer a highly capable service interface to “community user accounts” such as science gateways

https://www.teragrid.org/web/about/xdtransition

Page 24

We face major policy hurdles

• For our projects to be successful, we need integrated compute, storage, networks and services.
– HPC’s antediluvian policies prevent this from happening
– They still have a batch job mentality!

• No coordinated allocations policies in the EU
– Need to apply for a project, then if successful apply for compute access
– Can’t do project if compute application rejected!

Page 25

Importance of connectivity

With limited national facilities, connectivity to other countries becomes crucial.

1-10 Gbit/s wide area networks are needed for large simulations and data movements.

However, network provisioning is currently extremely difficult and time-consuming.

Researchers end up having to request the links, rather than resource providers.
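The bandwidth figure can be checked against the data volume quoted earlier in the talk (≈ 200 GB per patient and time point). A minimal sketch of the arithmetic follows, assuming decimal gigabytes and a link that sustains its nominal rate. The `efficiency` parameter is a hypothetical knob for lower sustained throughput.

```python
def transfer_seconds(size_gb, link_gbit_per_s, efficiency=1.0):
    """Seconds needed to move `size_gb` gigabytes over a link rated at
    `link_gbit_per_s` gigabits per second.

    `efficiency` is the assumed fraction of the nominal rate actually
    sustained (1.0 means the full rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_gbit = size_gb * 8  # 1 byte = 8 bits
    return size_gbit / (link_gbit_per_s * efficiency)


if __name__ == "__main__":
    for rate in (1, 10):  # the 1-10 Gbit/s range named on the slide
        t = transfer_seconds(200, rate)
        print(f"{rate:>2} Gbit/s: {t:.0f} s ({t / 60:.1f} min)")
```

At 1 Gbit/s one 200 GB data set takes 1600 s, about 27 minutes, so the transfer alone uses up most of a 15 to 30 minute clinical window. At 10 Gbit/s it takes 160 s, which leaves time for the simulation itself.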

Page 26

Policy issues

E-science research has always required changes in resource provider policies to thrive.

Support for advance machine and network reservations, including urgent computing.

Improvements in accessibility and usability.

Support for Audited Credential Delegation.

Interoperability between machines & infrastructures.

DEISA’s failure to address this augurs poorly for the future

Page 27

Political issues

Streamlined procedures for UK or EU scientific projects.

All-in proposals which, when accepted, grant everything needed for a research project. This includes funding for research as well as HPC resource allocations.

More sensible service level agreements.

If a simulation uses multiple machines and one fails, a full allocation refund should be given.

MAPPER Policy Document – copies available

Page 28

Computational Life and Medical Science

Supported by the Provost's Strategic Fund

The CLMS Network is a 3 year initiative from September 2010

Management:

Dean’s Committee

Steering Committee

Director: P.V. Coveney

http://www.clms.ucl.ac.uk

Page 29

CLMS Goals

1. Expand UCL’s world-leading position in life and biomedical sciences

2. Steer the collaboration with academic institutions: within UCL, with UCLP and the NHS, UK-CMRI, Yale, and others

3. Exploit initiatives in integrative biomedical systems science from the UK Research Council, EU and others around the world

4. Grow collaborations with industry, create business and commercial opportunities, promote UCL IP licensing

5. Plan for the next stages of activity in computational life and medical sciences at UCL

CLMS brings together UCL researchers with clinicians from UCL partners to develop shared data + compute + data transfer + application support services

Integrated e-Infrastructure and Services

Page 30

Conclusions

• Biomedical projects all put pressure on resource providers to offer new services and new ways of working

• For interactive and urgent work the batch processing model does not work

• The very conservative model adopted by HPC providers prevents their resources from being used in innovative ways to do new science and engage new and different kinds of users

• If HPC is to be exploited in computational biomedicine it needs to be used in a way that fits in with the medical & clinical workflow

• VPH and similar initiatives will only increase pressure for non-standard services from resource providers