
1

Report on CHEP 2007

Raja Nandakumar

2

Synopsis

Two classes of talks and posters
➟ Computer hardware
▓ Dominated by cooling / power consumption
▓ Mostly in the plenary sessions
➟ Software
▓ Grid job workload management systems
  Job submission by the experiments
  Site job handling, monitoring
  Grid operations (Monte Carlo production, glexec, interoperability, …)
  Data integrity checking
  …
▓ Storage systems
  Primarily concerning dCache and DPM
  Distributed storage systems

Parallel session: Grid middleware and tools

3

Computing hardware

Power requirements of LHC computing
➟ Important for running costs
▓ ~330 W to provision for 100 W of electronics
➟ Some sites running with air or water cooled racks

Power breakdown per 100 W of electronics:

  Electronics               100 W
  Server fans                13 W
  Voltage regulation         22 W
  Case power supply          48 W
  Room power distribution     4 W
  UPS                        18 W
  Room cooling              125 W
  Total                    ~330 W
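A quick sanity check of the ~330 W figure, as a minimal sketch in Python (the component values are simply those from the table above):

    # Sanity check of the power overhead quoted above: for every 100 W of
    # electronics the full chain draws roughly 330 W, i.e. a factor of ~3.3.
    breakdown_w = {
        "Electronics": 100,
        "Server fans": 13,
        "Voltage regulation": 22,
        "Case power supply": 48,
        "Room power distribution": 4,
        "UPS": 18,
        "Room cooling": 125,
    }

    total_w = sum(breakdown_w.values())               # 330 W
    overhead = total_w / breakdown_w["Electronics"]   # ~3.3 W provisioned per W of electronics
    print(f"Total draw: {total_w} W for 100 W of electronics (factor {overhead:.1f})")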

4

High performance and multi-core computing

Core frequencies ~2-4 GHz, will not change significantly

Power
➟ 1,000,000 cores at 25 W / core = 25 MW
▓ Just for the CPU
➟ Have to bring core power down by multiple orders of magnitude
▓ Reduces chip frequency, complexity, capability

Memory bandwidth
➟ As we add cores to a chip, it is increasingly difficult to provide sufficient memory bandwidth
➟ Application tuning to manage memory bandwidth becomes critical

Network and I/O bandwidth, data integrity, reliability
➟ A petascale computer will have petabytes of memory
➟ Current single file servers achieve 2-4 GB/s
▓ 70+ hours to checkpoint 1 petabyte (see the arithmetic sketch below)
➟ I/O management is a major challenge

Memory cost
➟ Can't expect to maintain current memory / core numbers at petascale
▓ 2 GB/core for ATLAS / CMS
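The petascale numbers above are simple orders of magnitude; here is the arithmetic spelled out as a small sketch (assuming 1 PB = 10^15 bytes and the 2-4 GB/s single-file-server figure quoted above):

    # Order-of-magnitude arithmetic behind the petascale numbers above.

    # CPU power: a million cores at 25 W each, CPU only.
    cores = 1_000_000
    watts_per_core = 25
    print(f"CPU power alone: {cores * watts_per_core / 1e6:.0f} MW")    # 25 MW

    # Checkpoint time: writing 1 PB of memory through a single file server.
    petabyte = 1e15                  # bytes (assuming decimal petabytes)
    for rate_gb_s in (2, 4):
        hours = petabyte / (rate_gb_s * 1e9) / 3600
        print(f"1 PB at {rate_gb_s} GB/s: ~{hours:.0f} hours")          # ~139 h and ~69 h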

5

Grid job submission

Most new developments were on pilot agent based grid systems
➟ Implement job scheduling based on the "pull" scheduling paradigm
➟ The only method for grid job submission in LHCb
▓ DIRAC (> 3 years experience)
▓ Ganga is the user analysis front end
➟ Also used in Alice (and Panda and Magic)
▓ AliEn since 2001
➟ Used for production, user analysis, data management in LHCb & Alice
➟ New developments for others
▓ Panda : Atlas, Charmm
  Central server based on Apache
▓ GlideIn : Atlas, CMS, CDF
  Based on Condor
▓ Used for production and analysis
➟ Very successful implementations
▓ Real-time view of the local environment
▓ Pilot agents can have some intelligence built into the system
  Useful for heterogeneous computing environments
▓ Recently Panda to be used for all Atlas production

One talk on distributed batch systems

6

Pilot agents

Pilot agents submitted on demand
➟ Reserve the resource for immediate use
▓ Allows checking of the environment before job scheduling
▓ Only bidirectional network traffic
▓ Unidirectional connectivity
➟ Terminates gracefully if no work is available
➟ Also called GlideIns

LCG jobs are essentially pilot jobs for the experiment
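As an illustration of the "pull" paradigm these systems share, here is a minimal pilot-agent loop; the matcher endpoint, payload format and checks are hypothetical, not the interface of DIRAC, AliEn, Panda or GlideIn:

    # Illustrative pull-style pilot: land on the worker node, check the local
    # environment, then repeatedly ask a central task queue ("matcher") for
    # work and terminate gracefully when none is available.
    # The matcher URL and job format below are hypothetical.
    import json
    import subprocess
    import time
    import urllib.request

    MATCHER_URL = "https://example.org/matcher"   # hypothetical central task queue

    def environment_ok():
        """Sanity-check the worker node (disk, software, ...) before pulling work."""
        return True   # placeholder for real checks

    def request_job():
        """Ask the matcher for a payload suited to this node; None means no work."""
        with urllib.request.urlopen(MATCHER_URL, timeout=30) as resp:
            job = json.load(resp)
        return job or None

    def run_pilot(max_idle_polls=3, poll_interval=60):
        if not environment_ok():
            return                      # never pull work onto a broken node
        idle = 0
        while idle < max_idle_polls:
            job = request_job()
            if job is None:
                idle += 1
                time.sleep(poll_interval)
                continue
            idle = 0
            subprocess.run(job["command"], shell=True, check=False)
        # no work left: exit and release the batch slot gracefully

    if __name__ == "__main__":
        run_pilot()

Late binding falls out of this structure: the payload is chosen only after the pilot has verified the slot it actually landed on.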

7

DIRAC WMS

8

Panda WMS

9

Alice (AliEn / MonaLisa)

History plot of running jobs

10

LHCb (Dirac)

Max running jobs snapshot

11

Glexec

A thin layer to change Unix domain credentials based on grid identity and attribute information

Different modes of operation
➟ With or without setuid
▓ Ability to change the user id of the final job

Enables the VO to
➟ Internally manage job scheduling and prioritisation
➟ Late binding of user jobs to pilots

In production at Fermilab
➟ Code ready and tested, awaiting full audit
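To make the "thin layer" idea concrete, here is a conceptual sketch of identity switching for a pilot's payload. This is not glexec's actual interface (glexec consults the site authorization stack, e.g. LCAS/LCMAPS); the mapping table and helper below are hypothetical.

    # Conceptual sketch only: map a grid identity onto a local Unix account and
    # (optionally) run the payload under that account's uid/gid.
    # The mapping table and helper are hypothetical, not glexec's real API.
    import os
    import pwd
    import subprocess

    GRID_MAP = {
        "/DC=org/DC=example/CN=Some User": "vo_user01",   # hypothetical DN -> account
    }

    def run_as_grid_user(grid_dn, command, use_setuid=True):
        account = GRID_MAP[grid_dn]                  # authorization + mapping step
        if use_setuid:
            pw = pwd.getpwnam(account)
            def drop_privileges():
                os.setgid(pw.pw_gid)                 # group first, then user
                os.setuid(pw.pw_uid)                 # needs privilege to switch identity
            subprocess.run(command, check=False, preexec_fn=drop_privileges)
        else:
            # non-setuid mode: keep the pilot's uid, only record the mapping
            subprocess.run(command, check=False)

    # e.g. run_as_grid_user("/DC=org/DC=example/CN=Some User", ["./payload.sh"])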

12

LSF universus

[Architecture diagram: a web portal / job scheduler submits to a central LSF scheduler with MultiCluster, which dispatches work to LSF, PBS, SGE and CCE clusters and desktops, each behind a local LSF scheduler]

13

LSF universus

Commercial extension of LSF
➟ Interface to multiple clusters
➟ Centralised scheduler, but sites retain local control
➟ LSF daemons installed on head nodes of remote clusters
➟ Kerberos for user, host and service authentication
➟ Scp for file transfer

Currently deployed in
➟ Sandia National Labs to link OpenPBS, PBS Pro and LSF clusters
➟ Singapore national grid to link PBS Pro, LSF and N1GE clusters
➟ Distributed European Infrastructure for Supercomputing Applications (DEISA)

14

Grid interoperability

Many different grids
➟ WLCG, Nordugrid, Teragrid, …
➟ Experiments span the various grids

Short term solutions have to be ad hoc
➟ Maintain parallel infrastructures by the user, site or both

For the medium term, set up adaptors and translators

In the long term, adopt common standards and interfaces
➟ Important in security, information, CE, SE
➟ Most grids use the X509 standard
➟ Multiple "common" standards …
➟ GIN (Grid Interoperability Now) group working on some of this

                            OSG          EGEE         ARC
  Security                  GSI/VOMS     GSI/VOMS     GSI/VOMS
  Storage Transfer Protocol GridFTP      GridFTP      GridFTP
  Storage Control Protocol  SRM          SRM          SRM
  Schema                    GLUE v1      GLUE v1.2    ARC
  Service Discovery         LDAP/GIIS    LDAP/BDII    LDAP/GIIS
  Job Submission            GRAM         GRAM         GridFTP

15

Distributed storage

GridPP organised into 4 regional Tier-2s in the UK

Currently a job follows data into a site
➟ Consider disk at one site as close to cpu at another site
▓ E.g. disk at Edinburgh vs cpu at Glasgow
➟ Pool resources for efficiency and ease of use
➟ Jobs need to access storage directly from the worker node

16

RTT between Glasgow and Edinburgh ~12 ms

Custom rfio client
➟ Normal : one call / read
➟ Readbuf : fills internal buffer to service request
➟ Readahead : reads till EOF
➟ Streaming : separate streams for control & data

Tests using a single DPM server

Atlas expects ~10 MiB/s / job

Better performance with dedicated light path

Ultimately a single DPM instance to span the Glasgow and Edinburgh sites
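The read strategies above differ mainly in how many wide-area round trips the client pays per application read. A toy model of that effect (the bandwidth, file size and buffer size are illustrative assumptions, not numbers from the talk):

    # Toy model: each remote call costs one round trip, so small per-read calls
    # over a ~12 ms link are dominated by latency unless a read-ahead buffer
    # serves many application reads per remote call. All numbers are assumed.
    RTT_S = 0.012                    # ~12 ms Glasgow-Edinburgh round trip
    BANDWIDTH_B_S = 100e6            # assumed 100 MB/s usable bandwidth
    FILE_SIZE_B = 1_000_000_000      # 1 GB file
    APP_READ_B = 64 * 1024           # application reads 64 KiB at a time
    BUFFER_B = 16 * 1024 * 1024      # 16 MiB read-ahead buffer

    def transfer_time(bytes_per_remote_call):
        remote_calls = FILE_SIZE_B / bytes_per_remote_call
        return remote_calls * RTT_S + FILE_SIZE_B / BANDWIDTH_B_S

    print(f"one remote call per read : {transfer_time(APP_READ_B):7.1f} s")
    print(f"16 MiB read-ahead buffer : {transfer_time(BUFFER_B):7.1f} s")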

17

Data Integrity

Large number of components performing data management in an experiment

Two approaches to checking data integrity
➟ Automatic agents continuously performing checks
➟ Checks in response to special events

Different catalogs in LHCb : Bookkeeping, LFC, SE

Issues seen (sketched in the example after this list) :
➟ zero size files
➟ missing replica information
➟ wrong SAPath
➟ wrong SE host
➟ wrong protocol
▓ sfn, rfio, bbftp…
➟ mistakes in file registration
▓ blank spaces in the surl path
▓ carriage returns
▓ presence of port number in the surl path
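As a minimal sketch of the "automatic agent" approach, the check below scans catalogue entries for the kinds of problem listed above (zero-size files, missing replicas, malformed SURLs). The dictionary-based catalogue and the exact rules are hypothetical, not the actual LHCb Bookkeeping/LFC schema:

    # Illustrative integrity-check agent for the issue classes listed above.
    # The in-memory "catalogue" is hypothetical; a real agent would query the
    # Bookkeeping, LFC and SE catalogues and compare them.
    import re

    def check_entry(entry):
        problems = []
        if entry.get("size", 0) == 0:
            problems.append("zero size file")
        if not entry.get("replicas"):
            problems.append("missing replica information")
        for surl in entry.get("replicas", []):
            if " " in surl or "\r" in surl:
                problems.append(f"blank space / carriage return in SURL: {surl!r}")
            if re.match(r"^[a-z]+://[^/]+:\d+/", surl):
                problems.append(f"port number in SURL path: {surl}")
            if not surl.startswith("srm://"):
                problems.append(f"wrong protocol (expected srm): {surl}")
        return problems

    # Tiny hypothetical catalogue dump.
    catalogue = {
        "/lhcb/data/file1": {"size": 0, "replicas": ["srm://se.example.org/lhcb/file1"]},
        "/lhcb/data/file2": {"size": 1234, "replicas": []},
        "/lhcb/data/file3": {"size": 5678, "replicas": ["rfio://se.example.org:5001/f3 "]},
    }

    for lfn, entry in catalogue.items():
        for problem in check_entry(entry):
            print(f"{lfn}: {problem}")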

18

Summary

Many experiments have embraced the grid

Many interesting challenges ahead
➟ Hardware
▓ Reduce the power consumed by cpus
▓ Applications need to manage with less RAM
➟ Software
▓ Grid interoperability
▓ Security with generic pilots / glexec
▓ Distributed grid network

And many opportunities
➟ To test solutions to the above issues
➟ Stress test the grid infrastructure
▓ Get ready for data taking
▓ Implement lessons in other fields
  Biomed …
➟ Note : 1 fully digitised film = 4 PB and needs 1.25 GB/s to play