Integrated e-Infrastructure for Distributed, Data-driven,
Data-intensive High Performance Computing: Biomedical
Requirements
Peter V Coveney
Centre for Computational Science
University College London
Integrating the Strengths of the e-Research Community, NeSC, Thursday, 10th March 2011
Contents
• Computational Biomedicine
  – HIV/AIDS
  – Cardiovascular medicine
  – Cancer
• ICT, e-Health and the Virtual Physiological Human
• Infrastructure support
• Shortcomings in UK infrastructure
• Major policy hurdles
• UCL CLMS initiative
• Conclusions
UCL Projects
• VPH Network of Excellence – EU (€8M); no HPC
• ContraCancrum – EU (€3.4M); no HPC
• VPH-Share – EU (€10.7M); no HPC
• P-Medicine – EU (€13.7M); no HPC
• INBIOMEDVision – EU (€2M)
• MAPPER – EU (€2M); no HPC
• A new approach to Science at the Life Sciences Interface – EPSRC (£4M) + HECToR
• Large Scale Lattice-Boltzmann Simulation of Liquid Crystals – EPSRC (£800K) + HECToR
Patient-specific medicine
• ‘Personalised medicine’: use the patient’s specific profile to better manage disease or a predisposition towards a disease
• Tailoring of medical treatments based on the characteristics of an individual patient
Patient-specific medical simulation
• Use of genotypic and/or phenotypic simulation to customise treatments for each particular patient, where computational simulation can be used to predict the outcome of courses of treatment and/or surgery
Why use patient-specific approaches?
• Treatments can be assessed for their effectiveness with respect to the patient before being administered, saving the potential expense of ineffective treatments
See: P. V. Coveney et al (eds), Interface Focus, Theme Issue on VPH Vol. 1, No. 3 Online 25th April 2011
HIV-1 Protease is a common target for HIV drug therapy
• Enzyme of HIV responsible for protein maturation
• Target for Anti-retroviral Inhibitors
• Example of Structure Assisted Drug Design
• 9 FDA inhibitors of HIV-1 protease
So what’s the problem?
• Emergence of drug resistant mutations in protease
• Render drug ineffective
• Drug resistant mutants have emerged for all FDA inhibitors
Figure labels (HIV-1 protease): Monomer A, residues 1-99; Monomer B, residues 101-199; Flaps; Leucine 90, 190; Glycine 48, 148; Catalytic Aspartic Acids 25, 125; Saquinavir; P2 Subsite; N-terminal; C-terminal
Medical/clinical domain I: HIV/AIDS
Integrate simulation with conventional clinical decision support systems to refine results
VPHShare
The goal: to simulate large scale patient specific cerebral blood flow in clinically relevant time frames
Objectives:
• To study cerebral blood flow using patient-specific image-based models.
• To provide insights into the cerebral blood flow & anomalies.
• To develop tools and policies by means of which users can better exploit the ability to reserve and co-reserve HPC resources.
• To develop interfaces which permit users to easily deploy and monitor simulations across multiple computational resources.
• To visualize and steer the results of distributed simulations in real time
Yield patient-specific information which helps plan embolisation of arterio-venous malformations, aneurysms, etc.
Medical/clinical domain II: Grid enabled neurosurgical imaging using simulation
M. D. Mazzeo and P. V. Coveney, Computer Physics Communications, 178, (12), 894-914, (2008). DOI: 10.1016/j.cpc.2008.02.013.
Medical/clinical domain III: ContraCancrum
Multi-level data Multi-level Modelling
Two dedicated clinical studies in ContraCancrum, one in glioma and one in lung cancer (200 cases/year)
Schedule 1 Schedule 2 Schedule … Schedule n
Multi-level Models of Cancer
Other clinical data needed
Prediction of the best treatment schedule / schema
http://www.contracancrum.eu
Clinically Oriented Translational Cancer Multilevel Modelling
Virtual Physiological Human
• Funded under EU FP7; ~ €250M
• 20 projects: 1 NoE, 5 IPs, 11 STREPs, 3 CAs.
“a methodological and technological framework that, once established, will enable collaborative investigation of the human body as a single complex system ...”
Networking NoE
VPH-Share Overview
VPH-Share will provide the organisational fabric (the infostructure), realised as a series of services, offered in an integrated framework, to expose and to manage data,
information and tools, to enable the composition and operation of new VPH workflows and to facilitate collaborations between the members of the VPH community.
HIV Heart Aneurysms Musculoskeletal
€11M, 2011-2015, EU FP7 – Promotes cloud technologies
p-medicine
Predictive disease modeling in p-medicine will contribute to the optimization of cancer treatment by fully exploiting the individual data of the patient.
p-medicine is focusing on Wilms tumor, breast cancer and acute lymphoblastic leukemia.
The p-medicine infrastructure supports both a generic seamless, multi-level data integration purpose and a VPH-specific, multi-level, cancer data repository to facilitate model validation and clinical translation through trials.
The infrastructure is scalable for any disease as long as predictive modeling is clinically significant in one or more levels (from molecular to tissue level) and the development of such models is feasible (i.e. there is enough understanding of the biological mechanisms involved to develop them).
Led by a clinical oncologist - Prof Norbert Graf!
Disease Modelling at the molecular Level
Disease Modelling at the cellular Level
Figure labels (cell-cycle diagram): N, S, G1, G2, M, G0, A
Disease Modelling at the tissue/organ Level
Multi-scale therapy predictions/disease evolution results
Multi-level disease modeling
€13M, 2011-2013, EU FP7
Large scale data & computing
Seamless access and integration of distributed, heterogeneous data in a data warehouse repeatedly over time (≈ 200 GB / patient and time point)
Models are built for use in clinical decision support
Results are needed in a timely fashion
It is necessary to have the possibility of seamlessly “plugging in” resources for parallel and large scale computing “here and now”
Petascale computing is needed to perform e.g.:
• Activities like drug binding affinity determination
• Blood flow through tumours
Gratis via VPH-NoE supervised VPH Virtual Community allocations of time on DEISA and, in future PRACE via MAPPER, …?
MAPPER: Objectives and Challenges
MAPPER will develop computational strategies, software and services for distributed multiscale simulations across disciplines, exploiting existing and evolving European e-Infrastructure.
Driven by seven exemplar multiscale applications, MAPPER will deploy a computational science infrastructure for distributed multiscale computing on and across European e-Infrastructures.
By taking advantage of existing software and services, MAPPER will deliver high quality components aiming at large-scale, heterogeneous, high performance multidisciplinary multiscale computing, while maintaining ease of use and transparency for end users.
MAPPER will advance state-of-the-art in high performance computing on e-Infrastructures by enabling distributed execution, across all European e-Infrastructures, of multiscale models.
http://www.mapper-project.eu
VPH ToolKit
http://toolkit.vph-noe.eu
VPH Virtual Community on DEISA
+ euHeart in second wave, and other non-VPH EU projects
VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011
• HECToR (Cray, UK)
• SARA (IBM Power 6, Netherlands)
DEISA-TeraGrid interoperability project has additional access to LRZ
• Computational experiments integrated seamlessly into current clinical practice
• Clinical decisions influenced by patient specific computations: turnaround time for data acquisition, simulation, post-processing, visualisation, final results and reporting.
• Fitting the computational time scale to the clinical time scale:
  – Capture the clinical workflow
  – Get results which will influence clinical decisions: 1 day? 1 week?
  – This project - 15 to 30 minutes
• Development of procedures and software in consultation with clinicians
• Security/Access is a major concern
• Need to integrate Data, Compute via Workflows
• On-demand availability of storage, networking and computational resources
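One way to make "fitting the computational time scale to the clinical time scale" concrete is to sum the per-stage turnaround times and compare the total with the clinical window. The sketch below is only illustrative: the stage names follow the list above, but the function names and every duration are assumptions chosen for the example, not measurements from the project.

```python
from datetime import timedelta
from typing import Mapping


def total_turnaround(stages: Mapping[str, timedelta]) -> timedelta:
    """Add up the wall-clock time of every stage in the pipeline."""
    return sum(stages.values(), timedelta())


def fits_clinical_window(stages: Mapping[str, timedelta],
                         window: timedelta) -> bool:
    """True if the whole pipeline finishes within the clinical window."""
    return total_turnaround(stages) <= window


# Hypothetical stage durations (assumptions, not project data).
example_stages = {
    "data acquisition": timedelta(minutes=5),
    "simulation": timedelta(minutes=12),
    "post-processing": timedelta(minutes=4),
    "visualisation": timedelta(minutes=3),
    "reporting": timedelta(minutes=2),
}

if __name__ == "__main__":
    print("total:", total_turnaround(example_stages))
    print("fits 30 min window:",
          fits_clinical_window(example_stages, timedelta(minutes=30)))
```

With these assumed numbers the pipeline would fit the 30 minute end of the target range but not the 15 minute end, which is the kind of gap the consultation with clinicians would need to expose.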
VPH requires HPC and Data Integration
Many of the projects we are involved in have non-standard requirements with respect to HPC service providers
• Ability to co-reserve resources (HARC)
• Launch emergency simulations (SPRUCE)
• Consistent interfaces for federated access (AHE)
• Access to back end nodes: steering, visualisation
• Lightpath network connections
• Data integration from multiple sources (IMENSE)
• Support for software (ReG steering toolkit etc)
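Co-reservation amounts to finding a time window in which every required resource (machines, network links) is free at once. The following is a minimal sketch of that idea only; it is not HARC's API or algorithm, and the interval representation, function names and example availabilities are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# A free slot is a half-open interval [start, end) in hours from now.
Interval = Tuple[int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def earliest_common_slot(free: List[List[Interval]],
                         duration: int) -> Optional[Interval]:
    """Earliest window of `duration` hours in which all resources are free."""
    if not free:
        return None
    common = free[0]
    for slots in free[1:]:
        common = intersect(common, slots)
    for start, end in common:
        if end - start >= duration:
            return (start, start + duration)
    return None


# Hypothetical availability of two machines and one network link.
machine_a = [(0, 4), (6, 12)]
machine_b = [(2, 8), (10, 14)]
lightpath = [(3, 11)]

if __name__ == "__main__":
    print(earliest_common_slot([machine_a, machine_b, lightpath], 2))
```

A real co-allocation service also has to negotiate and commit the reservation atomically across providers, which this sketch leaves out entirely.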
Individualized MEdiciNe Simulation Environment (IMENSE)
• Data repository – this is the key store for project data, containing all patient data and simulation data derived from the patient data.
• Integrated web portal – this provides the central interface from which users upload and access data sets, and analysis services. The interface provides users with the facility to search for patient data based on a number of criteria.
• Web Services – the web services platform implements required data processing functions.
• Workflow environment – the workflow environment provides a virtual experiment system, from which users can launch pre-defined workflows to automate moving data between the data environment and multiple data processing services.
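The idea of a pre-defined workflow that moves data between the repository and a chain of processing services can be sketched as a list of steps applied in order. This is a toy illustration, not the IMENSE implementation: the in-memory repository, the step functions, the workflow name and the file name are all hypothetical.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]

# Hypothetical in-memory stand-in for the data repository.
REPOSITORY: Dict[str, dict] = {"patient-001": {"scan": "angio-001.dat"}}


def fetch_scan(ctx: dict) -> dict:
    """Pull the patient's scan reference out of the repository."""
    out = dict(ctx)
    out["scan"] = REPOSITORY[ctx["patient_id"]]["scan"]
    return out


def build_model(ctx: dict) -> dict:
    """Placeholder for a service that builds a model from the scan."""
    out = dict(ctx)
    out["model"] = f"model({ctx['scan']})"
    return out


def simulate(ctx: dict) -> dict:
    """Placeholder for a simulation service."""
    out = dict(ctx)
    out["result"] = f"flow({ctx['model']})"
    return out


def store_result(ctx: dict) -> dict:
    """Write the derived simulation data back next to the patient data."""
    REPOSITORY[ctx["patient_id"]]["result"] = ctx["result"]
    return ctx


# Pre-defined workflows that a user could launch by name.
WORKFLOWS: Dict[str, List[Step]] = {
    "blood_flow": [fetch_scan, build_model, simulate, store_result],
}


def run_workflow(name: str, patient_id: str) -> dict:
    """Run each step of the named workflow, passing the context along."""
    ctx: dict = {"patient_id": patient_id}
    for step in WORKFLOWS[name]:
        ctx = step(ctx)
    return ctx


if __name__ == "__main__":
    print(run_workflow("blood_flow", "patient-001")["result"])
```

In a deployed system each step would be a call to a remote web service rather than a local function, but the control flow would have the same shape.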
Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011
IMENSE Interface
IMENSE Environment
Workflows
• GSEngine is a workflow orchestration engine developed by the ViroLab project
• Can be used to orchestrate applications launched by AHE
• It allows services to be orchestrated using both point and click and scripting interfaces
• Workflows stored in a repository and shared between users
• Many of the aims of ViroLab similar to VPH-I projects, so GSEngine will be useful here
Malawski et al, Future Generation Computer Systems, 26, (1), 138-146, 2010
Inside IMENSE: Integrating the components
Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011
UK Infrastructural Failures
UK computing e-Infrastructure is crumbling. Not a holding partner in PRACE.
No Tier-0 site in the works. Only one Tier-1 machine (with issues).
HECToR has had several major failures; judging by its usage, researchers seem to have trouble using or trusting it.
What’s happening next?
Tier-2 facilities are also being dismantled. NGS core nodes are being shut down!
We cannot maintain a good level of e-Science research without the infrastructure to support it.
Relative to other countries we’re in full scale retreat!
Infrastructure in the UK is fragmented
Figure labels: Data, HPC, NGS, Networks, ?
TeraGrid eXtreme Digital (XD)
• Two sets of services:
– XES will provide a set of well-known (and standard) protocol specifications and profiles
– CPS will support both the diversity of different services and capabilities required by the community
• From the desktop to the largest machines!
• XD design is firmly tied to the user requirements of the science and engineering research community.
• Presents the individual user with a common user environment
• Caters to both researchers whose computations require very little data movement and those performing very data-intensive computations.
• Will offer a highly capable service interface to “community user accounts” such as science gateways
https://www.teragrid.org/web/about/xdtransition
We face major policy hurdles
• For our projects to be successful, we need integrated compute, storage, networks and services.
  – HPC’s antediluvian policies prevent this from happening
  – They still have a batch job mentality!
• No coordinated allocations policies in the EU
  – Need to apply for a project, then if successful apply for compute access
  – Can’t do project if compute application rejected!
Importance of connectivity
With limited national facilities, connectivity to other countries becomes crucial.
1-10 Gbit/s wide area networks are needed for large simulations and data movements.
However, network provisioning is currently extremely difficult and time-consuming.
Researchers end up having to request the links, rather than resource providers.
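To see why link speed matters, take the ≈ 200 GB per patient and time point quoted earlier and work out the ideal transfer time. The sketch below assumes decimal gigabytes (10^9 bytes), a dedicated link, and an optional `efficiency` factor for protocol overhead; the function name and that parameter are illustrative assumptions.

```python
def transfer_time_seconds(size_gb: float, link_gbit_per_s: float,
                          efficiency: float = 1.0) -> float:
    """Ideal time to move `size_gb` gigabytes over a link.

    `efficiency` is the assumed fraction of the nominal bandwidth
    actually achieved (1.0 means a perfect, uncontended link).
    """
    if link_gbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_gbit = size_gb * 8  # 8 bits per byte
    return size_gbit / (link_gbit_per_s * efficiency)


if __name__ == "__main__":
    for link in (1, 10):
        minutes = transfer_time_seconds(200, link) / 60
        print(f"{link} Gbit/s: {minutes:.1f} min")
```

Under these idealised assumptions a single 200 GB data set takes about 27 minutes at 1 Gbit/s and under 3 minutes at 10 Gbit/s, so at the lower speed the transfer alone would consume most of a 15 to 30 minute clinical window.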
Policy issues
E-science research has always required changes in resource provider policies to thrive.
Support for advance machine and network reservations, including urgent computing.
Improvements in accessibility and usability.
Support for Audited Credential Delegation.
Interoperability between machines & infrastructures.
DEISA’s failure to address this augurs poorly for the future
Political issues
Streamlined procedures for UK or EU scientific projects.
All-in proposals which, when accepted, grant everything needed for a research project. This includes funding for research as well as HPC resource allocations.
More sensible service level agreements.
If a simulation uses multiple machines and one fails, a full allocation refund should be given.
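The proposed service level agreement can be stated as a simple rule: if any machine in a multi-machine run fails, the core hours charged on every machine are refunded, since the distributed simulation as a whole is lost. This is a sketch of that rule only; the function name, the data layout and the machine names are hypothetical.

```python
from typing import Dict, Iterable


def refund_core_hours(charged: Dict[str, float],
                      failed: Iterable[str]) -> Dict[str, float]:
    """Core hours to refund per machine under the proposed rule.

    `charged` maps machine name to core hours billed for the run;
    `failed` lists the machines that failed during the run.
    """
    failed_set = set(failed)
    unknown = failed_set - set(charged)
    if unknown:
        raise ValueError(f"failed machines not in run: {sorted(unknown)}")
    if failed_set:
        # One failure invalidates the coupled run: refund everything.
        return dict(charged)
    return {machine: 0.0 for machine in charged}


# Hypothetical two-machine run.
example_run = {"machine_a": 1000.0, "machine_b": 500.0}

if __name__ == "__main__":
    print(refund_core_hours(example_run, ["machine_b"]))
```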
MAPPER Policy Document – copies available
Supported by the
Provost's Strategic Fund
Computational Life and Medical Science
The CLMS Network is a 3-year initiative from September 2010
Management:
Dean’s Committee
Steering Committee
Director: P.V. Coveney http://www.clms.ucl.ac.uk
1. Expand UCL’s world-leading position in life and biomedical sciences
2. Steer the collaboration with academic institutions: within UCL, with UCLP and the NHS, UK-CMRI, Yale, and others
3. Exploit initiatives in integrative biomedical systems science from the UK Research Council, EU and others around the world
4. Grow collaborations with industry, create business and commercial opportunities, promote UCL IP licensing
5. Plan for the next stages of activity in computational life and medical sciences at UCL
CLMS Goals
CLMS brings together UCL researchers with clinicians from UCL partners to develop shared data + compute + data transfer + application support services
Integrated e-Infrastructure and Services
Conclusions
• Biomedical projects all put pressure on resource providers to offer new services and new ways of working
• For interactive and urgent work the batch processing model does not work
• The very conservative model adopted by HPC providers prevents their resources from being used in innovative ways to do new science and engage new and different kinds of users
• If HPC is to be exploited in computational biomedicine it needs to be used in a way that fits in with the medical & clinical workflow
• VPH and similar initiatives will only increase pressure for non-standard services from resource providers