RI-222919
www.deisa.eu
Extreme Computing on the Distributed
European Infrastructure for
Supercomputing Applications
Wolfgang Gentzsch
The DEISA Project & Board of Directors of OGF
gentzsch at rzg.mpg.de
OGF25 Cloud Workshop
OGF25, March 2 - 6, 2009 Wolfgang Gentzsch, DEISA 2
HPC Centers
• HPC Centers have been service providers for the past 30 years
• Services are computing, storage, applications, data, and other IT services
• They serve (local) research, education, and industry (e.g. HLRS in Stuttgart serving Bosch, Daimler, Porsche)
• Very professional: to their end users, they appear almost as a set of Cloud services (AWS definition: easy, secure, flexible, on demand, pay per use, self-serve)
• But: no virtualization, semi-automatic, operating in step-function (mostly static) mode (increase of performance…
• That’s where they themselves can become a Cloud customer, adding dynamic scaling to their portfolio and adapting to changing business and user demands
Grids
1998: The Grid: Blueprint for a New Computing Infrastructure:
“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”
2002: The Anatomy of the Grid:
“. . . coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”
2002: Grid Checklist:
1) coordinates resources that are not subject to centralized control …
2) … using standard, open, general-purpose protocols and interfaces
3) … to deliver nontrivial qualities of service.
Quotes: Ian Foster, Carl Kesselman, Steve Tuecke
Clouds
• IT resources provisioned outside of corporate data center
• Resources accessed over the internet
• Variable cost of services
• SaaS, PaaS, IaaS, HaaS
• A virtual computing environment
• Build and deliver always-on, pay-per-use IT services
• Near infinite-scale computing, storage, database, related Web services, AND users
• Scaling resources and services up and down
• Abstraction of the hardware from the service
• No need for on-premises software and servers
The DEISA Ecosystem for HPC Applications
Distributed European Infrastructure for
Supercomputing Applications
DEISA Project & Partners
DEISA1: May 1st, 2004 – April 30th, 2008
DEISA2: May 1st, 2008 – April 30th, 2011
DEISA: Vision - Mission - Strategy
Vision: Establishing a persistent European HPC ecosystem integrating national Tier-1 (Tflop/s) centres and the new European Tier-0 (Pflop/s) centres
Mission: Enhance Europe’s capability in computing and science by integrating the most powerful supercomputers into a European HPC e-infrastructure
Built a European Supercomputing Service on top of existing national services, based on the deployment and operation of a persistent, production-quality, distributed supercomputing environment with continental scope
Strategy:
• Consolidate the existing DEISA1 HPC infrastructure and services
• Deliver a turnkey operational solution for the future persistent European HPC ecosystem
Categories of DEISA services
[Figure: three service categories, Technologies, Applications, and Operations, connected by arrows labeled "requests support", "offers product", "requests configuration", "offers service", "offers technology", and "requests development".]
DEISA Service Layers
[Figure: the DEISA sites and the services offered, grouped into layers.]
Layers: network and AAA layers; job management and monitoring layer; presentation layer; data management layer.
Services: unified AAA; network connectivity; data transfer tools; data staging tools; job rerouting; single monitor system; co-reservation and co-allocation; workflow management; multiple ways to access; common production environment; WAN shared file system.
DEISA UNICORE Infrastructure
[Figure: every site has a Gateway and an NJS with IDB and UUDB; a CINECA user and an LRZ user each submit a job.]
Gateways: CSC, ECMWF, FZJ, IDRIS, SARA, LRZ, HPCX, HLRS, BSC, CINECA, RZG.
NJS systems: CINECA (IBM P5), FZJ (IBM), RZG (IBM), ECMWF (IBM P5), CSC (Cray XT4/5), HPCX (Cray XT4), LRZ (SGI ALTIX), HLRS (NEC SX8), SARA (IBM), BSC (IBM PPC), IDRIS (IBM P6).
Operating system / batch system pairs shown: AIX / LL-MC, AIX / LL, Linux / PBS Pro, Super-UX / NQS II, Linux / Maui/Slurm, UNICOS/lc / PBS Pro, Linux / LL.
Also shown: GridFTP.
DEISA Global File System
Global transparent file system based on the Multi-Cluster General Parallel File System (MC-GPFS) of IBM
[Figure: systems shown: IBM P5, IBM P6 & BlueGene/P, IBM P6, Cray XT4/5, Cray XT4, SGI ALTIX, NEC SX8, IBM P5+ / P6, IBM PPC. Operating system / batch system pairs shown: AIX / LL-MC, AIX / LL, AIX, Linux / LL-MC, Linux / PBS Pro, Linux / LL, Linux / Maui/Slurm, Super-UX / NQS II, UNICOS/lc / PBS Pro. Also shown: GridFTP.]
DEISA Life Sciences Portal
NICE EnginFrame Cluster/Grid/Cloud Portal
Provides remote, interactive, transparent, and secure access to applications and data on your corporate Intranet or Internet, or in the Cloud.
[Figure: users and administrators on intranet clients (Win, LX, UX, Mac) connect over standard protocols through a Cloud portal / gateway to interactive applications, batch applications, licenses, virtualized data center clusters, and virtualized storage.]
Users and administrators can access and control computing resources via an intuitive and standard Web interface from virtually anywhere using a standard Web browser.
DEISA Extreme Computing Initiative (DECI)
• DECI launched in 2005: complex, demanding, innovative simulations requiring the exceptional capabilities of DEISA
• Multi-national proposals encouraged
• Proposals reviewed by national evaluation committees
• Projects chosen on the basis of innovation potential, scientific excellence, relevance criteria, and national priorities
• Most powerful HPC architectures for most challenging projects
• Most appropriate supercomputer architecture selected
Analyzing the Workload of an HPC/Grid Center
Is your scientific application ready for the Cloud?
A Closer Look at HPC Centers’ Load *
• Single, CPU-intensive, tightly coupled, highly scalable computational engineering & science parallel jobs
• Single, CPU-intensive, weakly scalable computational engineering & science parallel jobs
• Capacity computing, throughput, parameter jobs
• Managing massive data sets, possibly geographically distributed
• Analysis and visualization of data sets
* Similar to the analysis of T. Sterling and D. Stark, LSU, in a recent HPCwire article
Analysis 1: tasks supporting HPC
• Tasks supporting heavy compute and data-intensive work, such as…
• … data analysis and visualization, are suitable for the use of Cloud services
• Especially for SMEs, small groups, and individual researchers who lack the full set of specific software and hardware
• Cost of ownership (hardware, ISV licenses) may be high
• No need for local expertise (installing, tuning, maintaining software)
Analysis 2: management of data sets
• Data-oriented science: data generation, acquisition, organization, correlation, archiving, mining, presentation
• Tertiary storage is difficult and expensive
• Distributed data sets in particular are a target for Cloud services
• Data integrity is higher with Cloud service providers: removes the single point of failure (hurricanes, lightning strikes, floods)
• Challenge with mission-critical HPC: data security, national security, intellectual property protection, privacy
Analysis 3: throughput computing
• Arrays of jobs, parameter studies, throughput job streams
• Application loads of many sequential or slightly parallel application tasks
• Obviously very promising for Cloud computing
• Cloud services greatly enhance availability of resources and operational flexibility, improving efficiency and reducing the cost of equipment and maintenance personnel
• Better focusing on the resources unique to the needs of the HPC applications not served by Clouds
• Challenge: workloads that are security or IP sensitive
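A parameter study of this kind can be sketched as an array of independent tasks. The following is a minimal Python illustration, assuming a hypothetical `simulate` function that stands in for one sequential application run; the parameter names and the local thread pool are assumptions for illustration, not part of DEISA or any Cloud API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(viscosity, mesh_size):
    """Hypothetical stand-in for one sequential application run."""
    return viscosity * mesh_size


def run_parameter_study(viscosities, mesh_sizes, max_workers=4):
    """Run every parameter combination as an independent task."""
    cases = list(product(viscosities, mesh_sizes))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Tasks share no state, so they can run in any order on any worker.
        results = list(pool.map(lambda case: simulate(*case), cases))
    return dict(zip(cases, results))


if __name__ == "__main__":
    study = run_parameter_study([1, 2], [10, 20])
    for case, value in sorted(study.items()):
        print(case, value)
```

Because no task depends on another, the same pattern maps onto any pool of workers, which is what makes this workload class a natural candidate for elastic resources.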
Analysis 4: weakly-scalable single jobs
• Weakly scalable because of fixed problem size (discrete volumes, finite elements, mesh points), or low degree of parallelization
• Users often demand hands-on access to the specifics of the physical machine
• Virtualization often precludes architecture-specific performance tuning essential to HPC: user productivity versus optimal performance of long-running jobs on Beowulf-type clusters and MPPs
• I/O bandwidth often needs to be well balanced with application needs (not assured by the abstraction of today’s Clouds)
• Even worse: checkpoint and restart
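The effect of a fixed problem size or a low degree of parallelization can be illustrated with Amdahl's law. The slide does not name it; it is used here as an assumption about what "weakly scalable" means. If a fraction p of the run time can be parallelized, the speedup on n cores is 1 / ((1 - p) + p / n). A minimal Python sketch with illustrative values:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative value only: 90% of the run time is parallelizable.
    for cores in (1, 16, 256, 4096):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With 90% of the run time parallelizable, the speedup stays below 10 no matter how many cores are added, so extra resources bring little benefit for such jobs.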
Analysis 5: capability computing
• Big science, grand challenge applications, running hours, days or weeks on teraflop or petaflop systems, with potentially 10^6 cores and 1013 TB main memory
• Highly scalable, massively parallel, tightly coupled, optimally tuned applications
• Resilience through checkpoint / restart
• HPC systems: unique design, limited market: loss of economy of scale
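Resilience through checkpoint / restart can be sketched as follows. This is a minimal Python illustration under stated assumptions: the toy iteration, the state layout, the file path and the checkpoint interval are all hypothetical, and a real HPC code would checkpoint distributed state rather than a single JSON file.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "value": 0.0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def run(path, total_steps, checkpoint_every=10):
    """Advance a toy computation, resuming from the last checkpoint."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        state["value"] += 1.0  # hypothetical unit of work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling `run` again with the same path after an interruption continues from the last saved step instead of starting over, at the cost of repeating at most `checkpoint_every - 1` steps.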
An HPC Checklist
When is your HPC app ready for the Cloud?
• If there are no issues with licenses, IP, secrecy, sensitive data, privacy, legal or regulatory issues, . . .
• If your app is (almost) architecture independent, not optimized for a specific architecture (i.e. single process, loosely coupled low-level parallel, I/O-robust)
• If it’s just one app and zillions of parameters
• If latency and bandwidth are not an issue
• If time (wait, wall, run) doesn’t really matter
• If your job is low-priority, simple SLAs, can re-run, . . .
Ideally, your HPC Center’s meta-scheduler knows all this and schedules automatically ☺
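The checklist could be encoded as a simple predicate that a meta-scheduler might evaluate per job. This is a hypothetical Python sketch: the `JobProfile` fields and the rule that every item must hold are assumptions made for illustration, not a description of any actual DEISA scheduler.

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical job description; one flag per checklist item."""
    has_license_or_ip_issues: bool
    architecture_independent: bool
    is_parameter_study: bool
    latency_or_bandwidth_sensitive: bool
    time_critical: bool
    low_priority_and_rerunnable: bool


def cloud_blockers(job):
    """Return the checklist items that speak against running in the Cloud."""
    checks = [
        (job.has_license_or_ip_issues, "license, IP, privacy or regulatory issues"),
        (not job.architecture_independent, "tuned for a specific architecture"),
        (not job.is_parameter_study, "not a single app with many parameters"),
        (job.latency_or_bandwidth_sensitive, "latency or bandwidth matters"),
        (job.time_critical, "wait, wall or run time matters"),
        (not job.low_priority_and_rerunnable, "high priority or cannot re-run"),
    ]
    return [reason for failed, reason in checks if failed]


def ready_for_cloud(job):
    """Assumed rule: a job is Cloud-ready only if no checklist item blocks it."""
    return not cloud_blockers(job)
```

Returning the list of blockers rather than a bare yes/no lets a scheduler or a user see why a job was kept on the HPC system.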
Conclusion
The DEISA Initiative is successful because:
• Built on top of the proven, professional infrastructure of HPC centers with expertise in implementation, operation, and services respecting user needs.
• Moderately and evolutionarily enhancing existing HPC services, from local to global, according to user requirements: revolution by evolution.
• Supports users at the level of user-friendly access to resources AND at the application level, supporting users in porting their apps onto a turnkey architecture.
• The ecosystem of resources, middleware, and applications respects the administrative, cultural and political autonomy of partners/centres.
• Real chance that the DEISA ecosystem will continue to operate successfully and sustainably after EU funding, in the interest of the ‘global scientist’, (almost) as an HPC Cloud!