
Transcript of J.chudoba-Cluster and Grid Computing

Page 1: J.chudoba-Cluster and Grid Computing

CLUSTER AND GRID COMPUTING

Jiří Chudoba, Institute of Physics

Page 2: J.chudoba-Cluster and Grid Computing

COMPUTING CAPACITIES

Ever-increasing demands

Moore’s law helps

Increasing complexity

Infrastructure

Management

Changing trends

Computing as a service

26.4.2010 [email protected]

Page 7: J.chudoba-Cluster and Grid Computing

HPC

Supercomputers vs clusters

DEISA (Distributed European Infrastructure for Supercomputing Applications)

EGI (European Grid Initiative)

Choice is application dependent

memory requirements

interconnect (communication) requirements
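Whether an application belongs on a tightly coupled supercomputer or a commodity cluster largely comes down to how much of its work parallelizes and how much inter-process communication it needs. A back-of-envelope illustration using Amdahl's law (a standard result, not taken from the slides):

```python
# Amdahl's law: speedup on n processors when a fraction p of the
# work parallelizes perfectly and the rest (1 - p) stays serial.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# An embarrassingly parallel workload (p ~ 0.999) scales well even on
# a loosely coupled cluster; a workload with a larger serial/coupled
# fraction (p ~ 0.9) saturates quickly and benefits more from a fast
# interconnect than from extra nodes.
for p in (0.9, 0.999):
    print(p, round(amdahl_speedup(p, 1000), 1))
```

On 1000 processors the p = 0.9 case tops out near a 10x speedup, while the p = 0.999 case reaches roughly 500x, which is why memory and interconnect requirements, not raw core counts, usually decide the platform.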

Page 8: J.chudoba-Cluster and Grid Computing

COMPUTING ROOMS

Power

Cooling

Space

Network

Reliability

Safety


Page 9: J.chudoba-Cluster and Grid Computing

Institute of Physics hardware (partial view)

3000 cores (27 Tflops), 6 TB RAM, 300 TB disks, 10 Gb network, 150 kW continuous power consumption (without cooling)

Price optimization leads to a mixture of different solutions

Water cooling added to the original air cooling

Excellent utilization
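The quoted figures permit a quick efficiency estimate (my arithmetic, not from the slides):

```python
# Figures quoted for the Institute of Physics farm (partial view).
cores = 3000
tflops = 27.0
power_kw = 150.0  # continuous draw, excluding cooling

watts_per_core = power_kw * 1000 / cores            # power budget per core
mflops_per_watt = tflops * 1e6 / (power_kw * 1000)  # compute per watt
print(watts_per_core, mflops_per_watt)
```

That works out to about 50 W per core and 180 Mflops/W, which shows why power and cooling dominate computing-room planning at this scale.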

Page 10: J.chudoba-Cluster and Grid Computing

GRID AND CLOUD COMPUTING

Interconnection of several computing farms

Transparent for users, Virtual Organizations

Many projects: EGEE, OSG, NorduGrid, BEinGRID (Business Experiments in Grid), volunteer computing (SETI@home), …

Clouds – Amazon EC2, GoGrid, VMware vCloud, …

Pros: huge resources, reliability for projects

Cons: reliability for single jobs, complexity
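On EGEE-era grids, a user described a job in JDL (Job Description Language) and submitted it through the gLite workload management system. A minimal JDL sketch; the concrete executable, files, and requirement values here are illustrative, not from the slides:

```
[
  Type = "Job";
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  # Illustrative matchmaking constraint against published site info
  Requirements = other.GlueCEPolicyMaxCPUTime > 60;
  VirtualOrganisation = "atlas";
]
```

Submission would then look like `glite-wms-job-submit -a job.jdl`; the WMS matches the Requirements expression against the information published by sites and dispatches the job to a suitable computing element, which is what makes the farm interconnection transparent to the user.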

Page 11: J.chudoba-Cluster and Grid Computing

GRIDS IN THE CZECH REPUBLIC

CESNET

National Grid Initiative, MetaCentrum

EGEE partner, EGI member

IOP (FZU) – member of WLCG

WLCG – Worldwide LHC Computing Grid


Page 12: J.chudoba-Cluster and Grid Computing

WLCG

A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments

Managed and operated by a worldwide collaboration between the experiments and the participating computer centres

The resources are distributed – for funding and sociological reasons

Our task is to make use of the resources available to us – no matter where they are located

We know it would be simpler to put all the resources in 1 or 2 large centres


Page 13: J.chudoba-Cluster and Grid Computing

(W)LCG – PROJECT AND COLLABORATION

LCG was set up as a project in 2 phases:

Phase I – 2002-05 – Development & planning; prototypes

At the end of this phase, the computing Technical Design Reports were delivered (1 for LCG and 1 per experiment)

Phase II – 2006-2008 – Deployment & commissioning of the initial services; programme of data and service challenges

During Phase II, the WLCG Collaboration was set up as the mechanism for the longer term:

Via an MoU – signatories are CERN and the funding agencies

Sets out conditions and requirements for Tier 0, Tier 1, Tier 2 services, reliabilities etc. ("SLA")

Specifies resource contributions – 3-year outlook


Page 14: J.chudoba-Cluster and Grid Computing

WLCG Today (slide by Sergio Bertolucci, CERN, 14th April 2010)

Tier 0; 11 Tier 1s; 61 Tier 2 federations (121 Tier 2 sites)

Tier 0 and Tier 1 centres shown on the map: CERN, Ca-TRIUMF, De-FZK, NDGF, Barcelona/PIC, Lyon/CCIN2P3, Bologna/CNAF, Amsterdam/NIKHEF-SARA, Taipei/ASGC, UK-RAL, US-BNL, US-FNAL

Today we have 49 MoU signatories, representing 34 countries: Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep., Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.

Page 15: J.chudoba-Cluster and Grid Computing

TODAY WLCG IS:

Running increasingly high workloads: jobs in excess of 650k / day

Anticipate millions / day soon

CPU equivalent: ~100k cores

Workloads are:

Real data processing

Simulations

Analysis – more and more (new) users

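A rough sanity check on those workload figures (my back-of-envelope estimate, assuming single-core jobs and full utilization of the quoted capacity):

```python
# Figures quoted for WLCG: jobs per day and CPU-equivalent core count.
jobs_per_day = 650_000
cores = 100_000

core_hours_per_day = cores * 24                     # available capacity
avg_job_hours = core_hours_per_day / jobs_per_day   # mean length per job
print(round(avg_job_hours, 1))
```

The result, roughly 3.7 core-hours per job, is consistent with the mix of short analysis jobs and longer production and simulation jobs described on the slide.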

Page 17: J.chudoba-Cluster and Grid Computing

SERVICE QUALITY: DEFINED IN MOU


MoU defines key performance and support metrics for Tier 1 and Tier 2 sites

Reliabilities are an approximation for some of these

Also metrics on response times, resources, etc.

The MoU has been an important tool in bringing services to an acceptable level

Page 18: J.chudoba-Cluster and Grid Computing

PERSONAL CONCLUSIONS

Grids

large international projects, massive data processing (with low parallelism), groups with occasional demands, high capacity utilization

Clouds

price-attractive solutions for some problems

Local clusters

total control, but may be costly (operation, infrastructure)
