GridICE: overview and current status

Enabling Grids for E-sciencE, www.eu-egee.org. GridICE: overview and current status. Guido Cuscela, INFN – Bari. Service Challenge Technical Meeting, September 15, 2006.

Page 1: GridICE: overview and current status

Enabling Grids for E-sciencE

www.eu-egee.org

GridICE: overview and current status

Guido Cuscela, INFN – Bari

Service Challenge Technical Meeting, September 15, 2006

Page 2: GridICE: overview and current status

Outline

• Old and new features (new release)

• What we are monitoring

• Job monitoring results (INFN-T1)

• Use cases (web interface)

• Issues

Page 3: GridICE: overview and current status

Why GridICE for monitoring

– Grid monitoring: Grid resources and services are subject to failures, so monitoring them is fundamental to using the Grid

– Local monitoring: GridICE can also be used to monitor your own farm (in connection with a local server)

Page 4: GridICE: overview and current status

Present deployment

• Installed servers are monitoring Grid resources in the scope of: EGEE, EGEE-SWE, RDIG, EGEE-SEE, Grid.it, GILDA, CMS, ATLAS, EUMedGrid, EUChinaGRID, BalticGrid, EELA, BeGrid

• The EGEE server (gridice2@cnaf) has been running since July 2005

• The Grid.it server (gridce4@cnaf) has been running since July 2005 without any major intervention and continues to perform very well

• Version v1.9.0-0 was released on Fri, 08 Sep 2006
  – The Grid.it server (gridce4@cnaf) has already been updated

Page 5: GridICE: overview and current status

How does it work?

• Generation: sensors enquiring entities and encoding the measurements according to a schema

• Distribution: transmission of the events from the source to any interested parties

• Processing: abstracting the huge number of received events in order to enable the consumer to draw conclusions about the operation of the monitored system (e.g., filtering according to some predefined criteria, or summarising a group of events)

• Presentation
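The four stages can be illustrated with a toy pipeline that runs inside a single process. This is a minimal sketch and not GridICE code. The `Event` schema, the `Bus` class and the metric names are hypothetical. A real deployment distributes events between hosts, which this sketch does not attempt.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event schema: one measurement of one metric on one host."""
    host: str
    metric: str
    value: float


def generate(raw):
    """Generation: encode raw (host, metric, value) readings according to the schema."""
    return [Event(host, metric, float(value)) for host, metric, value in raw]


class Bus:
    """Distribution: deliver every published event to all interested subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, events):
        for event in events:
            for callback in self._subscribers:
                callback(event)


def process(events, metric):
    """Processing: filter on one metric, then summarise the events as a per-host mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for event in events:
        if event.metric == metric:
            sums[event.host] += event.value
            counts[event.host] += 1
    return {host: sums[host] / counts[host] for host in sums}


def present(summary):
    """Presentation: render the summary as sorted, human-readable lines."""
    return [f"{host}: {value:.1f}" for host, value in sorted(summary.items())]


received = []
bus = Bus()
bus.subscribe(received.append)
bus.publish(generate([("wn01", "load", 2), ("wn01", "load", 4),
                      ("wn02", "load", 1), ("wn01", "mem", 70)]))
print(present(process(received, "load")))  # ['wn01: 3.0', 'wn02: 1.0']
```

Keeping the stages as separate functions mirrors the model: a consumer can subscribe to the raw events and apply its own processing criteria without any change to the sensors.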

Page 6: GridICE: overview and current status

Features

• powerful and complete web-based interface for data presentation

• each view of the web-based interface offers the same data in XML format

• support for customized graph generation

• notification service
  – customizable monitoring of nodes

• automatic discovery of new resources to be monitored through the Grid Information Service

• complete set of monitored metrics, from host-related to Grid-service-related characteristics
  – supports and extends the GLUE Schema

• support for the following batch systems: OpenPBS, Torque, LSF

• integrated with network-related infrastructure for monitoring the connectivity of a Grid
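Every view is also offered as XML, so scripts can consume the same data that the web pages show. The sketch below uses such an export to answer "is any service down?". The element and attribute names (`host`, `service`, `status`) and the host names are hypothetical placeholders, not the actual GridICE XML schema. Fetching the document over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of a host view; element/attribute names are
# illustrative only and do not reproduce the real GridICE schema.
SAMPLE = """<hosts>
  <host name="ce01.example.org" site="EXAMPLE-SITE">
    <service name="gridftp" status="up"/>
    <service name="gatekeeper" status="down"/>
  </host>
  <host name="se01.example.org" site="EXAMPLE-SITE">
    <service name="gridftp" status="up"/>
  </host>
</hosts>"""


def services_down(xml_text):
    """Return (host, service) pairs whose reported status is not 'up'."""
    root = ET.fromstring(xml_text)
    down = []
    for host in root.iter("host"):
        for service in host.iter("service"):
            if service.get("status") != "up":
                down.append((host.get("name"), service.get("name")))
    return down


print(services_down(SAMPLE))  # [('ce01.example.org', 'gatekeeper')]
```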

Page 7: GridICE: overview and current status

What we are monitoring

• Hardware monitoring:
  – fabric-level monitoring via LEMON sensors

• Services monitoring:
  – for every Grid node we check the related services (via standard GRIIS)
  – monitoring of every process/daemon that has to run on nodes

• Job monitoring:
  – new "lightweight" job monitoring sensors (running at INFN-T1 with no problems and with more than 3000 jobs running/queued)
  – execution time reduced by about a factor of ten compared with the previous version
  – about 99% of jobs retrieved correctly

• LRMS monitoring (since the GridICE 1.9.0 release):
  – LRMSinfo sensor as preliminary SLA support and basic site CPU usage efficiency
  – no sensors on WNs (all needed information is retrieved on the CE from the batch system)
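The slides do not say how LRMSinfo computes "basic site CPU usage efficiency". A common definition is total CPU time divided by total wall-clock time over the jobs a site ran. The sketch below assumes that definition and one CPU per job slot. It also uses a hypothetical `JobRecord` built from batch-system accounting on the CE. It is not LRMSinfo's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical per-job accounting record, as a batch system might report it."""
    cpu_seconds: float   # CPU time consumed by the job
    wall_seconds: float  # wall-clock time the job held its slot


def site_cpu_efficiency(jobs):
    """Aggregate efficiency = total CPU time / total wall-clock time.

    Assumes one CPU per job slot; returns 0.0 if no wall-clock time was recorded.
    """
    total_wall = sum(job.wall_seconds for job in jobs)
    if total_wall == 0:
        return 0.0
    return sum(job.cpu_seconds for job in jobs) / total_wall


jobs = [JobRecord(3000, 3600), JobRecord(600, 3600)]  # one busy job, one mostly idle
print(f"{site_cpu_efficiency(jobs):.0%}")  # 50%
```

Summing before dividing weights each job by its wall-clock time. Averaging per-job ratios instead would let many short jobs dominate the site figure.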

Page 8: GridICE: overview and current status

Fabric monitoring

Page 9: GridICE: overview and current status

Job monitoring

Comparison between BOSS and GridICE job data (CMS production; aggregate data from INFN-T1, INFN-Legnaro, INFN-Bari, INFN-Pisa)

Total number of jobs: 5939 (3175 at INFN-T1)

Number of jobs not seen by GridICE: 97 (55 at INFN-T1)

98.3% accuracy
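The slide does not define "accuracy". This sketch reads it as the share of BOSS-recorded jobs that GridICE also saw. On that reading the four-site aggregate comes to about 98.4% and the INFN-T1 subset to about 98.3%.

```python
def coverage_percent(total_jobs, missed_jobs):
    """Share of jobs (as counted by BOSS) that GridICE also recorded, in percent."""
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    return 100.0 * (total_jobs - missed_jobs) / total_jobs


aggregate = coverage_percent(5939, 97)  # INFN-T1, INFN-Legnaro, INFN-Bari, INFN-Pisa
infn_t1 = coverage_percent(3175, 55)    # INFN-T1 only
print(f"aggregate {aggregate:.2f}%, INFN-T1 {infn_t1:.2f}%")
# aggregate 98.37%, INFN-T1 98.27%
```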

Page 10: GridICE: overview and current status

New features in release v1.9.0

• Region/ROC support
  – filter the resources by region
  – modify site/region binding

• Synchronization with GOCDB
  – detailed info on site downtimes (foreseen, partial or global)

• LRMSInfo
  – a set of new charts giving a view of resource utilization

• More options to retrieve job information (search by global ID, local user …)

• New statistics plots with a new look & feel (e.g., Grid jobs vs. local jobs)

• Chart section reorganized
  – new menu to select single charts or a per-user-role view

• Clean-up of DB history
  – a new script is available to help delete historical data from the DB (should you need to delete data older than a specific date/time)
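The slides do not show the clean-up script itself. The function below is a sketch of the idea only: it deletes rows older than a cutoff. SQLite stands in for the real database. The table name `host_history` and its text `ts` column are hypothetical.

```python
import sqlite3
from datetime import datetime


def purge_history(conn, table, cutoff):
    """Delete rows of `table` whose `ts` is older than `cutoff`; return the count.

    Assumes `ts` stores timestamps as 'YYYY-MM-DD HH:MM:SS' text, so plain
    string comparison orders them chronologically.
    """
    if not table.isidentifier():  # table names cannot be bound as SQL parameters
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = conn.execute(f"DELETE FROM {table} WHERE ts < ?",
                          (cutoff.strftime("%Y-%m-%d %H:%M:%S"),))
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host_history (ts TEXT, host TEXT, load_avg REAL)")
conn.executemany("INSERT INTO host_history VALUES (?, ?, ?)", [
    ("2006-05-01 10:00:00", "wn01", 1.0),
    ("2006-09-10 10:00:00", "wn01", 2.0),
])
print(purge_history(conn, "host_history", datetime(2006, 6, 1)))  # 1
```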

Page 11: GridICE: overview and current status

Different viewpoints

We focus on the following categories of users:

– VO manager: the actual set of resources accessible to VO members ("How many jobs submitted by my users are running or queued?")

– Grid operator: all resources under the responsibility of a Grid Operator Center ("How many resources are available?")

– Site administrator: the site resources offered to a Grid ("Is there any service down?")
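Each of the three viewpoints scopes the same resource set in a different way. The minimal sketch below uses a hypothetical `Resource` record with made-up host, site and region names. GridICE's own data model is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record; fields are illustrative, not GridICE's schema."""
    name: str
    site: str
    region: str                   # region/ROC the site belongs to
    vos: frozenset = frozenset()  # VOs allowed to use the resource
    service_up: bool = True


def vo_view(resources, vo):
    """VO manager: resources accessible to members of `vo`."""
    return [r for r in resources if vo in r.vos]


def operator_view(resources, region):
    """Grid operator: all resources under one operations centre/region."""
    return [r for r in resources if r.region == region]


def site_view(resources, site):
    """Site administrator: resources a single site offers to the Grid."""
    return [r for r in resources if r.site == site]


RESOURCES = [
    Resource("ce01.site-a.example", "SITE-A", "REGION-1", frozenset({"cms", "alice"})),
    Resource("se01.site-a.example", "SITE-A", "REGION-1", frozenset({"cms"}), service_up=False),
    Resource("ce01.site-b.example", "SITE-B", "REGION-1", frozenset({"atlas"})),
]

print([r.name for r in vo_view(RESOURCES, "alice")])                   # ['ce01.site-a.example']
print(len(operator_view(RESOURCES, "REGION-1")))                      # 3
print(any(not r.service_up for r in site_view(RESOURCES, "SITE-A")))  # True
```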

Page 12: GridICE: overview and current status

Host View

Page 13: GridICE: overview and current status

Host View - Details

Page 14: GridICE: overview and current status

Job View

Page 15: GridICE: overview and current status

Local monitoring

Page 16: GridICE: overview and current status

GOC interfacing

Page 17: GridICE: overview and current status

LRMSinfo

Page 18: GridICE: overview and current status

Issues

• Query latency [end of the year]
  – We are working on database improvements (table partitioning, DB schema modification …)

• LeMON 2.10.x [end of the year]
  – We plan to migrate to the latest LeMON version as soon as possible

• gLite 3.0 [end of October]
  – Integration of the job monitoring sensors is finished (we are testing them with the Italian ROC release team)

• Storage probes [end of October]
  – Grid transfer monitoring (DPM, CASTOR, dCache)
  – Local transfer and access to files (RFIO, dcap; both authenticated and unauthenticated versions)
  – Not yet ready for production; some more development and testing are needed

• Advanced RB probe
  – Code is ready for gLite. We need some more time to integrate the info into the GridICE collecting infrastructure

• FTS monitoring
  – Used at CNAF
  – Will be integrated in GridICE

• Group and VOMS roles monitoring
  – Will be available in new releases

Page 19: GridICE: overview and current status

Conclusions

• We are able to provide wide-ranging, easy-to-use Grid monitoring:
  – fabric level
  – services
  – jobs
  – storage and FTS (shortly)

• We keep working to improve:
  – performance
  – reliability
  – design

• We are open to collecting new requirements and supporting your monitoring needs

Page 20: GridICE: overview and current status

References

GridICE publications:

[1] S. Andreozzi, N. De Bortoli, S. Fantinel, A. Ghiselli, G. L. Rubini, G. Tortone, M. C. Vistoli. GridICE: a monitoring service for Grid systems. Future Generation Computer Systems 21 (2005) 559–571.

[2] C. Aiftimiei, S. Andreozzi, G. Cuscela, N. De Bortoli, G. Donvito, S. Fantinel, E. Fattibene, G. Misurelli, A. Pierro, G. L. Rubini, G. Tortone. GridICE: Requirements, Architecture and Experience of a Monitoring Tool for Grid Systems. In Proceedings of the International Conference on Computing in High Energy and Nuclear Physics (CHEP2006), Mumbai, India, 13-17 February 2006.

[3] C. Aiftimiei, S. Andreozzi, G. Cuscela, N. De Bortoli, G. Donvito, S. Fantinel, E. Fattibene, G. Misurelli, A. Pierro, G. L. Rubini, G. Tortone. Flexible notification service for Grid monitoring events. In Proceedings of the International Conference on Computing in High Energy and Nuclear Physics (CHEP2006), Mumbai, India, 13-17 February 2006.

[4] S. Andreozzi, A. Ciuffoletti, A. Ghiselli, C. Vistoli. Monitoring the Connectivity of a Grid. In Proceedings of the 2nd International Workshop on Middleware for Grid Computing (MGC 2004), in conjunction with the 5th ACM/IFIP/USENIX International Middleware Conference, Toronto, Canada, October 2004.

GridICE dissemination: http://gridice.forge.cnaf.infn.it

Page 21: GridICE: overview and current status

Backup slides

Page 22: GridICE: overview and current status

VO View

Use Case 3

VO manager

Detecting all Grid resources for the “alice” VO

Page 23: GridICE: overview and current status

Job monitoring load

[Chart: system load with job monitoring (JM) off vs. JM on]

Page 24: GridICE: overview and current status

New charts selection