Posted on 05-Mar-2015
CLUSTER AND GRID COMPUTING
Jiří Chudoba, Institute of Physics
COMPUTING CAPACITIES
Ever-increasing demands
Moore’s law helps
Increasing complexity
Infrastructure
Management
Changing trends
Computing as a service
26.4.2010 Jiri.Chudoba@fzu.cz
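The "Moore's law helps" point can be made concrete with a rough doubling calculation. This is a sketch only; the ~2-year doubling period is the usual rule-of-thumb assumption, not a figure from the slides:

```python
def moores_law_factor(years, doubling_period=2.0):
    """Rough capacity growth factor after `years`, assuming
    transistor counts double every `doubling_period` years."""
    return 2 ** (years / doubling_period)

# Over a decade, the rule of thumb predicts roughly a 32x increase.
print(moores_law_factor(10))  # 32.0
```

This is why capacity demands that look unaffordable today can become routine within a hardware generation or two.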
HPC
Supercomputers vs clusters
DEISA (Distributed European Infrastructure for Supercomputing Applications)
EGI (European Grid Initiative)
Choice is application dependent
memory requirements
interconnect / inter-process communication
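One common way to reason about the supercomputer-vs-cluster choice is Amdahl's law: if inter-process communication and other serial work limit the parallelisable fraction p, the achievable speedup on n cores saturates quickly. A minimal sketch, with hypothetical workload fractions:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores for a workload whose
    parallelisable fraction is p (0 <= p <= 1)."""
    return 1.0 / ((1.0 - p) + p / n)

# A 95%-parallel code saturates near ~20x even on 3000 cores, so it
# gains little from a tightly coupled supercomputer interconnect,
# while a nearly embarrassingly parallel one scales almost linearly
# and runs happily on a commodity cluster.
print(round(amdahl_speedup(0.95, 3000), 1))
print(round(amdahl_speedup(0.9999, 3000), 1))
```

Tightly coupled codes (large shared memory, heavy communication) push towards supercomputers; independent-task workloads fit clusters and grids.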
COMPUTING ROOMS
Power
Cooling
Space
Network
Reliability
Safety
Price optimization leads to a mixture of different solutions
Water cooling added to the original air cooling
Excellent utilization
Institute of Physics hardware (partial view):
3000 cores (27 Tflops), 6 TB RAM, 300 TB disks, 10 Gb/s network,
150 kW continuous power consumption (without cooling)
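The farm figures above imply some useful back-of-the-envelope ratios. A sketch using only the numbers quoted on this slide:

```python
cores = 3000
tflops = 27.0          # total peak performance, from the slide
ram_tb = 6.0
power_kw = 150.0       # continuous consumption, without cooling

gflops_per_core = tflops * 1000 / cores   # peak compute per core
ram_gb_per_core = ram_tb * 1024 / cores   # memory per core
watts_per_core = power_kw * 1000 / cores  # power draw per core

# Roughly 9 Gflops, 2 GB RAM and 50 W per core.
print(gflops_per_core, ram_gb_per_core, watts_per_core)
```

Ratios like W/core and GB/core are what drive the power, cooling and space requirements listed in the previous slide.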
GRID AND CLOUD COMPUTING
Interconnection of several computing farms
Transparent for users, Virtual Organizations
Many projects: EGEE, OSG, NorduGrid,
BEinGRID (Business Experiments in Grid),
Volunteer computing (SETI@home), …
Clouds – Amazon EC2, GoGrid, VMware vCloud,
Pros: huge resources, reliability for projects
Cons: reliability for single jobs, complexity
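The "reliability for single jobs" drawback is typically worked around by automatic resubmission: a failed job is simply retried, trading latency for overall throughput. A minimal sketch of the idea — the job function and retry limit are hypothetical, not any particular middleware's API:

```python
import random

def submit_with_retries(job, max_attempts=3):
    """Run `job` until it succeeds or the attempt budget is spent.
    Grid middleware typically resubmits failed jobs in a similar way."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last attempt

def flaky_job():
    # Hypothetical job that fails ~30% of the time
    # (e.g. a worker node dies mid-run).
    if random.random() < 0.3:
        raise RuntimeError("worker node failure")
    return "output.root"

random.seed(1)
print(submit_with_retries(flaky_job))
```

For a project submitting thousands of independent jobs, a few percent of retried failures is invisible; for a user who needs one specific job to finish on time, it is the main complaint.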
GRIDS IN THE CZECH REPUBLIC
CESNET
National Grid Initiative, Metacentrum
EGEE partner, EGI member
IOP (FZU) – member of WLCG
WLCG – Worldwide LHC Computing Grid
WLCG
A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
The resources are distributed – for funding and sociological reasons
Our task is to make use of the resources available to us, no matter where they are located. We know it would be simpler to put all the resources in one or two large centres.
(W)LCG – PROJECT AND COLLABORATION
LCG was set up as a project in two phases:
Phase I (2002–05): development & planning; prototypes. At the end of this phase the computing Technical Design Reports were delivered (one for LCG and one per experiment).
Phase II (2006–2008): deployment & commissioning of the initial services; programme of data and service challenges.
During Phase II, the WLCG Collaboration was set up as the mechanism for the longer term, via an MoU whose signatories are CERN and the funding agencies. The MoU:
sets out conditions and requirements for Tier 0, Tier 1 and Tier 2 services, reliabilities etc. ("SLA")
specifies resource contributions with a 3-year outlook
WLCG Today
Tier 0 (CERN); 11 Tier 1s; 61 Tier 2 federations (121 Tier 2 sites)
Tier 1 centres (from the site map): Ca-TRIUMF, US-BNL, US-FNAL, De-FZK, NDGF, UK-RAL, Lyon/CCIN2P3, Barcelona/PIC, Taipei/ASGC, Amsterdam/NIKHEF-SARA, Bologna/CNAF
Today we have 49 MoU signatories, representing 34 countries:
Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep., Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.
Sergio Bertolucci, CERN, 14th April 2010
TODAY WLCG IS:
Running increasingly high workloads: jobs in excess of 650k/day; millions/day anticipated soon
CPU equivalent ~100k cores
Workloads are: real data processing, simulations, and analysis (more and more new users)
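The workload numbers above can be cross-checked with simple arithmetic. A rough estimate, assuming the ~100k cores are kept busy around the clock:

```python
jobs_per_day = 650_000
cores = 100_000

core_hours_per_day = cores * 24
avg_job_hours = core_hours_per_day / jobs_per_day

# ~3.7 core-hours per job on average, consistent with the mix of
# short analysis jobs and longer processing/simulation jobs.
print(round(avg_job_hours, 1))
```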
SERVICE QUALITY: DEFINED IN MOU
The MoU defines key performance and support metrics for Tier 1 and Tier 2 sites; reliabilities are an approximation for some of these.
Also metrics on response times, resources, etc.
The MoU has been an important tool in bringing services to an acceptable level.
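To see what a reliability metric means in practice, it helps to translate an availability percentage into allowed downtime. The 97% figure below is purely illustrative, not a number taken from the MoU:

```python
def allowed_downtime_hours(availability, days=30):
    """Hours of downtime permitted over a period of `days`
    at a given availability fraction (0 <= availability <= 1)."""
    return (1.0 - availability) * days * 24

# An illustrative 97% monthly target allows ~21.6 hours of downtime.
print(round(allowed_downtime_hours(0.97), 1))
```

Expressed this way, each extra "nine" of availability cuts the permitted outage budget by an order of magnitude, which is why such targets drive site operations cost.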
Personal conclusions
Grids
large international projects, massive data processing (with low parallelism), groups with occasional demands, high capacity utilization
Clouds
price attractive solutions for some problems
Local clusters
total control, but may be costly (operation, infrastructure)