Federico Ruggieri – INFN on behalf of the Italian Grid GGF16 – Athens 14 February 2006


Page 1

Federico Ruggieri – INFN on behalf of the Italian Grid

GGF16 – Athens 14 February 2006

GRID Infrastructure in Italy

Page 2

Outline

• The Italian Grid Production Infrastructure

• Operations and Organisation

• Support, Monitoring & Accounting

• Long term sustainability of a Grid Infrastructure in Italy.

• Conclusions

Page 3

The Italian Grid Production Infrastructure

about 40 Resource Centers

The grid resources can be accessed through central or VO-specific services (e.g. Resource Brokers)

28 sites are also part of the EGEE/LCG Grid infrastructure (and are registered in the central database of the Grid Operation Center)

the other 12 sites can be accessed through the Italian grid services only

http://grid-it.cnaf.infn.it

Page 4

Organisation & Operations

[Organisation diagram. Labels: Regional Operation Centre (ROC); Central Management Team (CMT); Site Managers (four boxes); Common Tools]

Page 5

Grid.IT Production Grid: Operations Portal

• User documentation

• Site managers' documentation

• Software repository

• Monitoring

• Trouble tickets system

• Knowledge base

http://grid-it.cnaf.infn.it

Page 6

The Italian Regional Operation Centre (ROC)

• First level support: Italy
  – Geographically based, local front-line support to Virtual Organizations, Users and Resource Centres
  – The ROC team is organized in daily shifts: 2 people per shift, 2 shifts per day, from Monday to Friday, covering working hours (8.30-19.30). Experts on call
  – Shifters are both from CNAF and major Italian Centres.
  – Checklist to be covered during the shift. Log trouble tickets created, updated and closed, and problems on grid services and sites; monitor successful site certification
    • check the actions of the previous shift and the downtime page
    • check the status of production grid services and the GRIS status of production CE and SE
    • check the status of the production sites using the Site Functional Tests report
  – Periodic (every 15 days) phone conference (ROC teams and site managers)
  – The ROC reports to EGEE and leverages the activities of the Italian Production Grid Central Management Team (CMT)

• Second level support: EU
  – Operation of the EU e-Infrastructure
  – The Italian ROC guarantees the weekly shifts rotating among major EU Centres.
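The first-level shift pattern above can be worked through as simple arithmetic. The sketch below is only illustrative: the slide gives the working hours, the number of shifts and the staffing, but not how the day is divided, so the equal-length split and all names in the code are assumptions.

```python
from datetime import datetime

# Figures taken from the slide.
SHIFT_START = "08:30"
SHIFT_END = "19:30"
SHIFTS_PER_DAY = 2
PEOPLE_PER_SHIFT = 2
WORKING_DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def daily_shift_windows():
    """Split the working day into equal windows (assumption: equal-length shifts)."""
    start = datetime.strptime(SHIFT_START, "%H:%M")
    end = datetime.strptime(SHIFT_END, "%H:%M")
    length = (end - start) / SHIFTS_PER_DAY
    return [
        (
            (start + i * length).strftime("%H:%M"),
            (start + (i + 1) * length).strftime("%H:%M"),
        )
        for i in range(SHIFTS_PER_DAY)
    ]


def weekly_person_shifts():
    """Number of person-shifts the ROC has to staff in one week."""
    return len(WORKING_DAYS) * SHIFTS_PER_DAY * PEOPLE_PER_SHIFT


if __name__ == "__main__":
    print(daily_shift_windows())
    print(weekly_person_shifts())
```

Under the equal-split assumption each shift lasts five and a half hours, and the weekly rota needs 20 person-shifts.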

Page 7

The Central Management Team (CMT)

• Guarantees Release Distribution and Site Certification

• The CMT is responsible for certification: dynamically checking the functionality and configuration of a site's services before including it in the Italian production grid.

• In particular the following checks are systematically done:
  – Information System data consistency.
  – Local job submission (LRMS).
  – Grid submission with Globus (globus-job-run).
  – Grid submission with the EGEE Resource Broker.
  – Replica Manager functionalities.

• To certify a site the CMT uses dedicated grid services located at CNAF - Bologna.

• Only certified sites are dynamically included in the production grid to guarantee the robustness of operations.
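
The certification rule above (a site enters the production grid only if every check succeeds) can be sketched as a small checklist runner. This is a minimal illustration, not the CMT's actual tooling: the check names, the `certify_site` function and the probe callables are all hypothetical, and the real tests (for example the one using globus-job-run) are represented here only by stub functions.

```python
from typing import Callable, Dict, List, Tuple

# Check names mirror the five checks listed on the slide; the identifiers
# themselves are hypothetical.
CERTIFICATION_CHECKS = [
    "information_system_consistency",
    "local_job_submission",
    "globus_job_submission",
    "resource_broker_submission",
    "replica_manager",
]


def certify_site(
    site: str, probes: Dict[str, Callable[[str], bool]]
) -> Tuple[bool, List[str]]:
    """Run every check against a site; a missing probe counts as a failure."""
    failed = []
    for name in CERTIFICATION_CHECKS:
        probe = probes.get(name)
        passed = bool(probe(site)) if probe is not None else False
        if not passed:
            failed.append(name)
    # The site is certified only when no check failed.
    return (not failed, failed)


if __name__ == "__main__":
    # Stub probes standing in for the real tests (assumption).
    probes = {name: (lambda site: True) for name in CERTIFICATION_CHECKS}
    probes["replica_manager"] = lambda site: False
    print(certify_site("example-site", probes))
```

Treating a missing probe as a failure keeps the sketch conservative: a site is never certified by omission.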

Page 8

Grid Central Management Team

• Site registration procedure
  – Site managers of the candidate site contact the Italian ROC (Regional Operation Center) team, provide all the relevant information (contact points, SLA, etc.) and sign a policy document for acceptance.
  – The ROC team verifies the completeness of the information and then creates a new record for the site in the GOC database.
  – The site is flagged as 'candidate' in the DB.
  – The site administrators of the candidate site fill the GOC database with additional information and ask the ROC to validate the site.
  – The site status is set to 'uncertified'.

• Site Certification Procedure; the resource administrators of a site should:
  – Apply for DTEAM and INFNGRID VO membership in order to be able to submit test jobs to check the correctness of the local installation.
  – Ask the Central Management Team to perform acceptance tests before including the new site in the Information System. If the acceptance tests are successful the site information will be published in the Information System. Then the site status is set to 'certified' in the GOC database.
  – The site is included in the daily report and functional test of the production infrastructure
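
The registration and certification steps above amount to a small status lifecycle: 'candidate', then 'uncertified', then 'certified'. The sketch below models only that progression. The event names and the `advance` function are hypothetical and do not reflect the GOC database schema.

```python
from enum import Enum


class SiteStatus(Enum):
    # The three status values named on the slide.
    CANDIDATE = "candidate"
    UNCERTIFIED = "uncertified"
    CERTIFIED = "certified"


# Event names are hypothetical labels for the steps described on the slide.
_TRANSITIONS = {
    (SiteStatus.CANDIDATE, "validation_requested"): SiteStatus.UNCERTIFIED,
    (SiteStatus.UNCERTIFIED, "acceptance_tests_passed"): SiteStatus.CERTIFIED,
}


def advance(status: SiteStatus, event: str) -> SiteStatus:
    """Return the next status, or raise if the step is not allowed."""
    try:
        return _TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in status {status.value!r}")


if __name__ == "__main__":
    status = SiteStatus.CANDIDATE  # set when the ROC creates the record
    status = advance(status, "validation_requested")
    status = advance(status, "acceptance_tests_passed")
    print(status.value)
```

Rejecting any step not listed in the table means a site cannot jump from 'candidate' straight to 'certified' without passing through validation.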

Page 9

INFNGRID-2.7.0: deployed services

[Diagram of deployed services. Service labels: FTS, LFC, MyProxy, RB (DGAS), VOMS, GridICE, BDII, HLR. Release labels: INFNGRID 2.7, LCG 2.7]

Page 10

INFN GRID 2.7.0

• INFN-GRID 2.7.0 customizations to LCG 2.7.0:
  – Support for the following VOs:
    • egrid, babar, zeus, biomed, magic, esr (managed via LDAP VO server);
    • libi, pamela, infngrid, cdf, gridit, compchem, planck, bio, enea, theophys, ingv, inaf, virgo, argo (managed via VOMS server);
    • euchina, eumed (optional and managed via VOMS server).
  – DGAS (DataGrid Accounting System):
    • Patched WMS lcg2.1.73 on the Resource Broker to support DGAS
    • DGAS HLR (Home Location Register) server: it is responsible for keeping the accounting information for both users and grid resources.
  – Network Monitor Element, interfaced with GridICE for data presentation.
  – Support for MPI jobs via home synchronization with scp and host-based authentication.
  – Special setup for Worker Nodes on a private network:
    • A local DNS can be run on an Apt+Kickstart server; the Computing Element acts as gateway for the Worker Nodes.
    • New profiles for Worker Nodes without AFS
  – Customized tools to install and use the grid:
    • installation by a customized version of LCG yaim (ig-yaim);
    • support to interface ig-yaim with a Quattor installation;
    • UIPnP: a Plug-and-Play User Interface to access the grid as a user of any Linux system without RPMs.
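
The VO support list above is effectively a lookup from VO name to the kind of server that manages its membership. A minimal sketch of that lookup follows. The VO names and groupings come from the slide, while the dictionary layout and the `management_server` function are assumptions and not the ig-yaim configuration format.

```python
# VO groupings as listed on the slide; the data structure is an assumption.
LDAP_VOS = ["egrid", "babar", "zeus", "biomed", "magic", "esr"]
VOMS_VOS = [
    "libi", "pamela", "infngrid", "cdf", "gridit", "compchem", "planck",
    "bio", "enea", "theophys", "ingv", "inaf", "virgo", "argo",
]
OPTIONAL_VOMS_VOS = ["euchina", "eumed"]


def management_server(vo: str):
    """Return (server_kind, is_optional) for a supported VO."""
    if vo in LDAP_VOS:
        return ("ldap", False)
    if vo in VOMS_VOS:
        return ("voms", False)
    if vo in OPTIONAL_VOMS_VOS:
        return ("voms", True)
    raise KeyError(f"VO {vo!r} is not in the INFN-GRID 2.7.0 list")


if __name__ == "__main__":
    for name in ("biomed", "virgo", "eumed"):
        print(name, management_server(name))
```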

Page 11

User, Operation and VO support

• The user support system provides for the exchange of tickets between:

– ROC on Duty and site managers

– Site managers and the Central Management Team, and vice versa

– Site manager and certification team during installation/upgrade

– Global Grid User Support (GGUS) and the ROC.

• The Italian ROC user support system is interfaced to the Global Grid User Support helpdesk application using web-services technologies

Page 12

The support system

• The Italian ROC ticketing system is built upon a suite of web-based tools written in PHP: Xhelp

• The support system components are accessible from the main interface of the deployment portal (grid-it.cnaf.infn.it), which provides a certificate-based SSO point of registration/identification.

• The end-user can open a request, view and follow his/her own tickets and related replies;

• A supporter can view tickets assigned to his/her own groups, add responses and solutions, and change status/priority.

• A ticket manager can additionally work with the FAQ and with ticket assignment (to other supporters and groups).

• While operating on tickets, side content is always available to all classes of users (according to their access level):
  – Site Functional Tests
  – site downtimes calendaring system
  – file archive
  – net query tools
  – IRC applet, contextual questions and answers
  – reports from CMT daily shifts
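
The three user classes described above can be read as a role-to-permission table. The sketch below illustrates that reading. The action names and the `can` helper are hypothetical, and the layering of ticket manager on top of supporter is an interpretation of the slide's wording, not a description of how Xhelp stores access levels.

```python
# Actions per role, paraphrased from the slide; identifiers are hypothetical.
END_USER = frozenset({"open_request", "view_own_tickets"})
SUPPORTER = frozenset(
    {"view_group_tickets", "add_response", "change_status", "change_priority"}
)
# Assumption: a ticket manager has the supporter actions plus the extras.
TICKET_MANAGER = SUPPORTER | {"manage_faq", "assign_ticket"}

PERMISSIONS = {
    "end_user": END_USER,
    "supporter": SUPPORTER,
    "ticket_manager": TICKET_MANAGER,
}


def can(role: str, action: str) -> bool:
    """True if the role is allowed to perform the action; unknown roles get nothing."""
    return action in PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(can("supporter", "assign_ticket"))
    print(can("ticket_manager", "assign_ticket"))
```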

Page 13
Page 14

Grid Monitoring

• The status of the Italian grid infrastructure is monitored using GridICE, a monitoring tool developed by INFN.
  – It is one of the monitoring tools used by EGEE
  – It is used to monitor:
    • the status of the submitting queues
    • process/daemon status in the services (RB, BDII)
    • VO view: list of the CEs and SEs available for a given VO and their status
    • job monitoring

Page 15

Monitoring

Page 16

Accounting

The DataGrid Accounting System (DGAS) has been developed within the EDG and EGEE projects. It implements resource usage metering and economic accounting in a fully distributed grid environment.

It is part of the InfnGrid middleware release and has been deployed on the Italian Grid Infrastructure.

Grid computing resources and grid users are registered in appropriate servers, known as HLRs (Home Location Registers), which keep track of every submitted job. An arbitrary number of HLR servers can be used.
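
The HLR idea above (users and resources are registered in a server that keeps a record of each submitted job) can be illustrated with a toy in-memory register. This is a sketch under stated assumptions and not the DGAS implementation or protocol: the class, its methods and the `cpu_seconds` metric are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HomeLocationRegister:
    """Toy stand-in for one HLR server (hypothetical API)."""

    name: str
    users: set = field(default_factory=set)
    resources: set = field(default_factory=set)
    jobs: list = field(default_factory=list)

    def register_user(self, user: str) -> None:
        self.users.add(user)

    def register_resource(self, resource: str) -> None:
        self.resources.add(resource)

    def record_job(self, job_id: str, user: str, resource: str, cpu_seconds: float) -> None:
        # Assumption: an HLR only tracks jobs that involve one of its
        # registered users or one of its registered resources.
        if user not in self.users and resource not in self.resources:
            raise KeyError(f"neither {user!r} nor {resource!r} is registered here")
        self.jobs.append((job_id, user, resource, cpu_seconds))

    def usage_by_user(self) -> dict:
        """Total metered usage for each registered user."""
        totals = defaultdict(float)
        for _job_id, user, _resource, cpu_seconds in self.jobs:
            if user in self.users:
                totals[user] += cpu_seconds
        return dict(totals)


if __name__ == "__main__":
    hlr = HomeLocationRegister("hlr-example")
    hlr.register_user("alice")
    hlr.register_resource("ce-a")
    hlr.record_job("job-1", "alice", "ce-a", 120.0)
    print(hlr.usage_by_user())
```

Because the slide allows an arbitrary number of HLR servers, the sketch keeps all state per instance, so several registers can coexist without sharing records.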

Page 17

DGAS HLR flow

Page 18
Page 19

Jobs per site (15–31 January)

Total jobs = 179,310

Page 20

Italian MW activities

• MW components supported and released by INFN include:
  – WMS: Workload Management Service (with EDG, LCG, EGEE) for distributed scheduling and resource management in a Grid environment
  – Data Management Services
    • COSTANZA: Virtual DB replication and Replica Consistency Service (with Grid.it)
    • StoRM: Storage Resource Management Service for storage allocation and file pinning with SRM interface over Unix file systems (with ICTP)
  – Portals and Grid User Interface: UI (with Datamat), Genius Portal (with Nice)
    • With PDA and cellular phone interface
  – VOMS: VO oriented Authentication/Authorization Service (with LCG, EGEE, Grid.it)
  – GridICE: General Grid Monitoring Service (with LCG)
  – DGAS: Economy based Grid Accounting Service (with EGEE)
  – G-PBOX: VO oriented Policy enforcing framework (with Grid.it)
  – VO oriented User Support systems (with Grid.it) integrated with GGUS

• Important outcome of Grid.it from CNR and Uni-Pi
  – Parallel Programming Environments (Assist)

• All are made available with the general Open Source License of EGEE, supported and evolving towards SOA

Page 21

Long term sustainability

• The Italian Grid Infrastructure, originally based on the INFN-GRID project, was later extended to other scientific and academic institutions in the Grid.It project, funded by the Italian Ministry of Education, University and Research (MIUR).

• New organisations are needed to:

– Support and operate the Italian Grid Infrastructure (IGI).

– Manage and support long term availability of middleware releases to favour industrial take-up (c-OMEGA).

Page 22

IGI – Italian Grid Infrastructure

• Originally conceived to provide national coordination of the different pieces of the national e-Infrastructure present in EGEE II. Recognized at EU level as a Joint Research Unit, supported by MIUR. Focus on setting up and operating a common e-Infrastructure for Italian science, including the main public resource providers: INFN, CNR, SPACI, ENEA, ICTP, INAF, INGV, Computing Centres, Regional Initiatives, etc.

• Provide a consistent/coordinated Italian strategy as a step towards the European Grid Organization (EGO) and an interface to:

– EU Grid infrastructure projects, eIRG and ESFRI

– International activities

• Support activities in a vast range of scientific disciplines: Physics, Astrophysics, Biology, Health, Chemistry, Geophysics, Economy, Finance, with possible extensions to other sectors such as Civil Protection, e-Learning, and dissemination in universities and secondary schools.

• Agreement reached on 23 January 2006.

Page 23

C-OMEGA

• The main objective of c-OMEGA is to support the innovation and the commercial exploitation process of grids in Italy.

• Other objectives are:

– Become the national reference organization, also for activities at EU and international level, aiming to develop, support, disseminate and exploit a platform of Open Source components derived from current Grid project components and increasingly compliant with international standards

– Favour synergy between Research and Academia and the industrial world, in particular SMEs, public services (Health, Administration, ...), etc.

– Support, through training and dissemination activities and pilot projects, the early commercial adoption of grids to increase Italian and EU competitiveness.

• Partners involved: Public Research Institutions, Universities, Computing Consortia, Large end-user companies, International IT industries, National IT companies, SMEs, etc.

• Good chances to provide a foundation for Italian and EU middleware support and its industrial exploitation

• c-OMEGA, together with UK OMII, is at the foundation of the current EU OMII proposal.

Page 24

Conclusions

• The first generation of Grid services in the LCG/EGEE and DEISA production Grids is currently in use in Italy and Europe.

• They are still evolving towards more functionality, robustness and security.

• Some needed services are still under test, under development or missing, and we may still discover that other important functionalities are required by specific user communities.

• Together with National Initiatives, the EU needs to address the long term sustainability of GRID Infrastructures.