Posted on 15-Mar-2016
e-Science and Grid: The VL-e approach
L.O. (Bob) Hertzberger
Computer Architecture and Parallel Systems Group
Department of Computer Science
Universiteit van Amsterdam
bob@science.uva.nl
Background information: experimental sciences
• Experiments are becoming increasingly complex, driven by detector developments: resolution increases, and automation & robotization increase
• This results in an increase in the amount and complexity of data
• Something has to be done to harness this development: virtualization of experimental resources, i.e. e-Science
The application data crisis
• Scientific experiments are starting to generate lots of data:
  - medical imaging (fMRI): ~1 GByte per measurement (day)
  - bio-informatics queries: 500 GByte per database
  - satellite world imagery: ~5 TByte/year
  - current particle physics: 1 PByte per year
  - LHC physics (2007): 10-30 PByte per year
• Data is often widely distributed
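As a rough illustration of what these volumes mean for the infrastructure, the sketch below converts an annual volume into the average sustained data rate it implies. It assumes decimal units (1 PByte = 10^15 bytes), a 365.25-day year and data arriving evenly over the year; these are simplifying assumptions, and real acquisition is bursty, so peak rates are higher.

```python
# Convert an annual data volume into the average sustained rate it implies.
# Assumptions (not from the slides): decimal units, 365.25-day year,
# data arriving uniformly over the whole year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
PBYTE = 10**15
MBYTE = 10**6


def sustained_rate_mbyte_per_s(pbyte_per_year: float) -> float:
    """Average rate in MByte/s needed to move `pbyte_per_year` PByte in one year."""
    return pbyte_per_year * PBYTE / SECONDS_PER_YEAR / MBYTE


for label, volume in [
    ("Current particle physics", 1),
    ("LHC physics, low estimate", 10),
    ("LHC physics, high estimate", 30),
]:
    print(f"{label}: {volume} PByte/year ~ {sustained_rate_mbyte_per_s(volume):.0f} MByte/s")
```

Under these assumptions 1, 10 and 30 PByte per year correspond to roughly 32, 317 and 951 MByte/s of continuous transfer, before any replication to other sites.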
Paradigm shift in life science
• Past experiments were hypothesis driven: evaluate a hypothesis, complement existing knowledge
• Present experiments are data driven: discover knowledge from large amounts of data, apply statistical techniques
The what of e-Science
• e-Science is the application domain “Science” of Grid & Web:
  - more than only coping with the data explosion
  - a multidisciplinary activity combining the human expertise & knowledge of a particular domain scientist and an ICT scientist
• e-Science demands a different approach to experimentation because the computer is an integrated part of the experiment; the consequence is a radical change in design for experimentation
• e-Science should apply and integrate Web/Grid methods wherever possible
Grid and Web Services convergence
The definition of the Web Services Resource Framework (WSRF) makes an explicit distinction between the “service” and the stateful entities the service acts upon, i.e. the “resources”. This means that the Grid and Web communities can move forward on a common base.
[Figure: Grid (GT1, GT2, OGSI) and Web (HTTP, WSDL, WS-*, WSDL 2, WSDM) started far apart in applications & technology and have been converging on WSRF. Ref: Foster]
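The service/resource split can be illustrated with a minimal sketch that assumes nothing about a real WSRF toolkit. The hypothetical CounterService below keeps no per-client state in its own logic; every operation names the stateful Resource it acts upon through a key, loosely analogous to the resource identifier carried in a WSRF endpoint reference. All class and method names are illustrative.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Resource:
    """Stateful entity: its properties persist between service calls."""
    properties: dict[str, Any] = field(default_factory=dict)


class CounterService:
    """Hypothetical service: operations carry no per-client state themselves,
    they always act on the resource named by `key` (not a WSRF binding)."""

    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}
        self._ids = itertools.count(1)

    def create(self) -> str:
        """Create a new resource and return the key that identifies it."""
        key = f"counter-{next(self._ids)}"
        self._resources[key] = Resource({"count": 0})
        return key

    def increment(self, key: str) -> int:
        """Update the state of one specific resource."""
        resource = self._resources[key]
        resource.properties["count"] += 1
        return resource.properties["count"]

    def get_property(self, key: str, name: str) -> Any:
        """Read one property of one resource."""
        return self._resources[key].properties[name]

    def destroy(self, key: str) -> None:
        """End the lifetime of a resource; the service itself lives on."""
        del self._resources[key]
```

Two keys obtained from create() behave as independent stateful entities behind the same service, which is the distinction WSRF makes explicit.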
Grid service ‘offerings’
• Capability to run programs and scripts on remote sites on demand
• Ability to exchange and replicate large bulk-data sets
• Replica location services for files based on logical names
• Job monitoring using a distributed relational information system
• Resource brokering and transparent access to remote facilities
• Management of user groups, roles and access rights
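Replica location based on logical names amounts to a mapping from one logical file name to many physical copies. The sketch below is a hypothetical in-memory ReplicaCatalog; the class, its method names and the example URLs are illustrative assumptions, not the API of any grid catalogue. It registers replicas, looks them up, and prefers a copy on a given host, the kind of choice a resource broker makes when placing jobs near data.

```python
from collections import defaultdict
from typing import Optional
from urllib.parse import urlparse


class ReplicaCatalog:
    """Hypothetical in-memory map from logical file names (LFNs) to physical replicas.

    Real grid catalogues are persistent, distributed services; this only shows the idea.
    """

    def __init__(self) -> None:
        # LFN -> set of physical file names (URLs of actual copies at grid sites)
        self._replicas = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a physical copy `pfn` exists for logical name `lfn`."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        """Forget one physical copy; drop the LFN entirely when no copies remain."""
        copies = self._replicas.get(lfn)
        if copies is None:
            return
        copies.discard(pfn)
        if not copies:
            del self._replicas[lfn]

    def lookup(self, lfn: str) -> list[str]:
        """All known replicas of `lfn`, sorted for deterministic output."""
        return sorted(self._replicas.get(lfn, set()))

    def best_replica(self, lfn: str, preferred_host: str) -> Optional[str]:
        """Prefer a copy on `preferred_host`; otherwise any copy, or None if there is none."""
        replicas = self.lookup(lfn)
        for pfn in replicas:
            if urlparse(pfn).hostname == preferred_host:
                return pfn
        return replicas[0] if replicas else None
```

Users and jobs refer only to the logical name; which physical copy is used can change as data is replicated or removed.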
Relation to European Grid infrastructures
• Common European e-Infrastructure middleware (EGEE) for core grid services
• Based on successful EU DataGrid, CrossGrid, and LCG software suite
• Already deployed worldwide on an O(100)-site production facility
• Support through EGEE Regional Operations Centre (SARA and NIKHEF)
EGEE: Enabling Grids for E-science in Europe (EU FP6)
Levels of Grid abstraction
Computational Grid
Data Grid
Information Web/Grid
Semantic/Knowledge Web/Grid
e-Science objectives
• It should enhance the scientific process by:
  - stimulating collaboration by sharing data & information, improving the re-use of data & information
  - combining data and information from different modalities: sensor data & information fusion
  - realizing the combination of real-life & (model-based) simulation experiments
• It should result in computer-aided support for rapid prototyping of ideas, stimulating the creative process
• It should realize this by creating & applying new computing methodologies and an infrastructure that stimulates it
• We try to do this via the Virtual Lab for e-Science (VL-e) project
Virtual Lab for e-Science research Philosophy
• Multidisciplinary research & development of related ICT infrastructure
• Generic application support: application cases are drivers for computer & computational science and engineering research
[Figure: the VL-e project. Application domains (Food Informatics, Dutch Telescience, Medical Diagnosis & Imaging, Bio-Informatics, Data Intensive Science/HEP, Bio-Diversity) build on VL-e application oriented services, which rest on Grid/Web services that harness multi-domain distributed resources and manage communication & computing]
Virtual Lab for e-Science research Philosophy (continued)
• Generic application support: problem solving is partly generic and partly specific; components are re-used via generic solutions whenever possible
[Figure: application pull. Each application has an application specific part and a potentially generic part; the generic parts feed the Virtual Laboratory application oriented services, built on Grid/Web services that harness multi-domain distributed resources and manage communication & computing]
Generic e-Science aspects
• Virtual reality visualization & user interfaces
• Imaging
• Modeling & simulation: interactive problem solving
• Data & information management: data modeling, dynamic workflow management
• Content (knowledge) management: semantic aspects, metadata modeling, ontologies
• Wrapper technology
• Design for experimentation
Virtual Lab for e-Science research Philosophy (continued)
• Rationalization of the experimental process, among others the experimental pipeline, to make experiments reproducible & comparable
Issues for a reproducible scientific experiment
[Figure: the experimental pipeline, from raw data acquisition (sensors, amplifiers, imaging devices, …) through processing (conversion, filtering, analyses, simulation, …) to processed data, presentation (visualization, animation, interactive exploration, …) and interpretation. Each stage carries its own metadata: experiment parameters/settings, calibrations, protocols, software packages, algorithms, intermediate results, …]
• Rationalization of the experiment and processes via protocols
• Metadata: much of this is lost when an experiment is completed
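One way to keep this metadata from being lost is to record it together with the results. The sketch below is a hypothetical, minimal provenance record (ExperimentRecord, checksum and the field names are illustrative, not part of VL-e): it captures a checksum of the raw data, parameter settings, calibrations, the protocol, software versions and the ordered processing steps, and derives a fingerprint so that two runs can be checked for identical settings.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any


def checksum(data: bytes) -> str:
    """SHA-256 of the raw data, so the record pins down exactly which input was used."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ExperimentRecord:
    """Hypothetical provenance record for one run of an experimental pipeline."""
    raw_data_sha256: str
    parameters: dict[str, Any] = field(default_factory=dict)    # experiment settings
    calibrations: dict[str, Any] = field(default_factory=dict)
    protocol: str = ""
    software: dict[str, str] = field(default_factory=dict)      # package -> version
    steps: list[dict[str, Any]] = field(default_factory=list)   # processing steps, in order

    def add_step(self, name: str, **settings: Any) -> None:
        """Append a processing step (e.g. filtering) together with its settings."""
        self.steps.append({"name": name, "settings": settings})

    def fingerprint(self) -> str:
        """Stable hash of the whole record: identical records give identical fingerprints."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs whose fingerprints match used the same input, settings, software versions and processing steps; any difference, such as a changed calibration, yields a different fingerprint. All values must be JSON-serializable for fingerprint() to work.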
Scientific Workflow Management Systems in an e-Science environment
• Functionalities:
  - automating experiment routines
  - rapid prototyping of experimental computing systems
  - hiding integration details between resources
  - managing the experiment lifecycle
• Cross different layers of middleware for managing data, computing, information and knowledge
[Figure: the e-Science framework as layers. Domain specific applications on top; the SWMS (user support, engine, high level workflow services) in the middle; generic Grid middleware for data management, computing tasks, information and knowledge below it; the Grid infrastructure at the bottom]
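The engine at the heart of such a system can be illustrated with a small sketch: a workflow as a directed acyclic graph of tasks, executed in dependency order, with each task receiving its predecessors' results. This is a hypothetical single-machine illustration (run_workflow and the example task names are assumptions) using Python's standard graphlib module (Python 3.9+); a real SWMS would dispatch tasks to Grid resources, handle failures and record provenance.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_workflow(tasks: dict[str, Callable[..., Any]],
                 deps: dict[str, list[str]]) -> dict[str, Any]:
    """Run every task once, in an order that respects `deps`.

    Each task is called with its dependencies' results as keyword arguments,
    so a task named "filtered" that depends on "raw" must accept a `raw` parameter.
    """
    graph = {name: deps.get(name, []) for name in tasks}
    results: dict[str, Any] = {}
    # static_order() raises graphlib.CycleError if the workflow contains a cycle
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](**inputs)
    return results


# Hypothetical acquisition -> filtering -> analysis pipeline
tasks = {
    "raw": lambda: [1.0, 5.0, 2.0, 100.0, 3.0],                 # acquisition
    "filtered": lambda raw: [x for x in raw if x < 50.0],       # drop an outlier
    "mean": lambda filtered: sum(filtered) / len(filtered),     # analysis
}
deps = {"filtered": ["raw"], "mean": ["filtered"]}

results = run_workflow(tasks, deps)
print(results["mean"])  # 2.75
```

Separating the workflow description (tasks and dependencies) from the engine that executes it is what lets experiment routines be automated and re-run, and lets the engine later be swapped for one that schedules on remote resources.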
Virtual Lab for e-Science research Philosophy (continued)
• Two research experimentation environments:
  - proof of concept for application experimentation
  - rapid prototyping for computer & computational science experimentation
The VL-e infrastructure
[Figure: the Virtual Laboratory stack, from Surfnet and Grid middleware through Grid & network services and application potential generic services & Virtual Lab services up to application specific services, deployed in three environments]
• VL-e Proof of Concept Environment: Telescience, Medical Application, Bio ASP
• VL-e Experimental Environment: Virtual Lab rapid prototyping (interactive simulation), additional Grid services (OGSA services), network service (lambda networking)
• VL-e Certification Environment: test & certification of compatibility, of Grid middleware and of VL software
Infrastructure for Applications
• Applications are a driving force of the PoC
• Experience shows applications value stability
• Foster two-way interaction to make this happen
VL-e PoC environment
• Latest certified stable software environment of core grid and VL-e services
• Core infrastructure built around clusters and storage at SARA and NIKHEF (‘production’ quality): a good basis for Tier-1
• Controlled extension to other platforms and distributions
• On the user end: install needed servers: user interface systems, storage elements for data disclosure, grid-secured DB access
• Focus on stability and scalability
Hosted services for VL-e
• Key services and resources are offered centrally for all applications in VL-e
• Mass data and number crunching on the large resources at SARA
• Storage for data replication & distribution
• Persistent ‘strategic’ storage on tape
• Resource brokers, resource discovery, user group management
Why such a complex scheme?
• “Software is part of the infrastructure”
• Stability of core software is needed to develop the new scientific applications
• Enable distributed systems management (who runs what version when?)
“The grid is one big error amplifier”
“Computers make mistakes like humans, only much, much faster”
Building a scalable infrastructure
With good code, stable releases & support you can build large working systems, useful to science
Conclusions
• e-Science is a lot more than trying to cope with the data explosion alone
• Implementation of e-Science systems requires further rationalization and standardization of the experimentation process
• e-Science success demands the realization of an environment that allows application-driven experimentation & rapid dissemination of feedback on these new methods
• We try to do that via development of the Proof of Concept
• Good basis for HEP Tier-1