Technical Status

EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Steven Newhouse Technical Director CERN EGEE-III First Review, 24-25 June, 2009 Technical Status


Page 1: Technical Status

EGEE-III INFSO-RI-222667

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

Steven Newhouse, Technical Director, CERN

EGEE-III First Review, 24-25 June, 2009

Technical Status

Page 2: Technical Status

Project Overview
• 17,000 users
• 139,000 LCPUs (cores)
• 25 PB disk
• 39 PB tape
• 12 million jobs/month (+45% in a year)
• 268 sites (+5% in a year)
• 48 countries (+10% in a year)
• 162 VOs (+29% in a year)
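As a rough cross-check of the growth figures above, the previous-year values can be estimated by inverting the quoted percentages. This is only a sketch: it assumes each percentage is growth relative to the value one year earlier, and because the slide's percentages are rounded the results are approximate. The dictionary keys and the function name are illustrative, not taken from the presentation.

```python
# Figures quoted on the slide: (current value, year-on-year growth in percent).
current = {
    "jobs_per_month": (12_000_000, 45),
    "sites": (268, 5),
    "countries": (48, 10),
    "vos": (162, 29),
}


def previous_year(value, growth_percent):
    """Invert value = baseline * (1 + growth/100) to estimate the baseline."""
    return value / (1 + growth_percent / 100)


# Estimated values one year earlier (approximate, since inputs are rounded).
baselines = {name: previous_year(v, g) for name, (v, g) in current.items()}

for name, base in baselines.items():
    print(f"{name}: ~{base:,.0f} a year earlier")
```

Under these assumptions the infrastructure would have had roughly 255 sites and about 8.3 million jobs per month a year before the review.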

Technical Status - Steven Newhouse - EGEE-III First Review 24-25 June 2009 2

Page 3: Technical Status

So what does EGEE actually do?
• Builds and supports user communities on the grid
• Integrates and provides a worldwide infrastructure
• Collaboration and Technical Leadership worldwide


[Diagram of activities: Training; Application Porting; User Support; Software Development; Integration, Test & Certification; Deployment; Operations; Collaborating Projects; Standards; Policy]

Page 4: Technical Status

Supporting New Communities
• Winter and Summer Grid Schools for the community
  – gLite, UNICORE, Globus, GridSAM, Condor, OGSA-DAI, ...
• Regionally Driven Training Events
  – 101 training events at 56 locations in 29 countries
  – 1424 unique participants attending 4431 training days
  – High satisfaction: 5.1/6.0
• Application Porting Support
  – 15 applications ported, 10 currently underway
• Recommended External Software for EGEE CommuniTies (RESPECT)
  – Public criteria and assessment process for entry into RESPECT
    • Software that builds on gLite and is supported by the community
  – 11 programs covering: Simplified access, Workload management, New Resources, Infrastructure Services


Page 5: Technical Status

Supporting Science
• Archeology
• Astronomy
• Astrophysics
• Civil Protection
• Comp. Chemistry
• Earth Sciences
• Finance
• Fusion
• Geophysics
• High Energy Physics
• Life Sciences
• Multimedia
• Material Sciences

Changes in Resource Utilisation
• Number of jobs x2 over the period
• Proportion of HEP usage ~77%

[Chart: Registered Applications by discipline (axis 0 to 60): High Energy Physics; Fusion; Earth Science; Computational Chemistry; Life Sciences; Computer Science & Mathematics; Astronomy, Astrophysics & Astro Particle Physics; Other]

End-user activity
• 13,000 end-users in 112 VOs
• +44% users in a year
• 23 core VOs
• A core VO has >10% of usage within its science cluster
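The core-VO rule stated above (more than 10% of usage within its science cluster) can be sketched in a few lines. The record layout, the VO names and the CPU-hour figures below are hypothetical and only illustrate the threshold; the slide does not say which usage metric EGEE applied.

```python
from collections import defaultdict

# Threshold from the slide: a core VO has >10% of usage within its cluster.
CORE_THRESHOLD = 0.10


def core_vos(usage):
    """Return the sorted names of core VOs.

    usage: list of (vo_name, science_cluster, usage_amount) tuples.
    """
    # Total usage per science cluster.
    totals = defaultdict(float)
    for _, cluster, amount in usage:
        totals[cluster] += amount

    # A VO is core when its share of its own cluster strictly exceeds 10%.
    return sorted(
        vo
        for vo, cluster, amount in usage
        if totals[cluster] > 0 and amount / totals[cluster] > CORE_THRESHOLD
    )


# Hypothetical records, for illustration only.
sample = [
    ("vo-a", "Life Sciences", 700.0),
    ("vo-b", "Life Sciences", 250.0),
    ("vo-c", "Life Sciences", 50.0),
    ("vo-d", "Fusion", 100.0),
]
print(core_vos(sample))
```

Note that the comparison is strict, so a VO with exactly 10% of its cluster's usage is not counted as core in this sketch.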

[Chart: share of usage (%) by discipline (axis 0 to 7), March 2008 to February 2009 compared with March 2007 to February 2008: Other Areas; Fusion; Earth Science; Astronomy & Astrophysics; Multidisciplinary; Life Sciences; Computational Chemistry]

Page 6: Technical Status

Connecting Users to Resources

[Diagram layers: Applications; Middleware; Physical Resources (Computers, Disks, Tape)]

Page 7: Technical Status

gLite Middleware

[Diagram legend: EGEE Maintained Components; External Components]
• User Access: User Interface
• General Services: LHC File Catalogue, Hydra, Workload Management Service, File Transfer Service, Logging & Bookkeeping Service, AMGA
• Security Services: Virtual Organisation Membership Service, Authz. Service, SCAS, Proxy Server, LCAS & LCMAPS
• Information Services: BDII, MON
• Storage Element: Disk Pool Manager, dCache
• Compute Element: CREAM, LCG-CE, gLExec, BLAH, Worker Node
• Physical Resources

Page 8: Technical Status

New gLite Releases
• Incremental delivery of functionality
  – gLite 3.1: 22 updates across all node types
  – gLite 3.2: Releases for the Worker Node and User Interface
  – Ability to roll back when an issue is found with a release
• Focus on maintenance to improve reliability & stability
  – Improvement of multi-platform support
  – Incremental introduction of IPv6 support
• Introduction of CREAM to replace the LCG-CE
  – Provides ‘next generation’ CE with increased capability
• Implementation of an Authorization Service (Argus)
  – Consistent framework for site, region, VO & grid authorization
  – Initial rollout planned during EGEE-III for site level functionality


Page 9: Technical Status

Infrastructure Operations
• Improved Reliability and Availability
  – Introduction of local site monitoring
  – Now a larger infrastructure with fewer staff
  – Figures reflect software & hardware issues
    • Weighted by site size within a region
    • Summed across all regions
    • Fire at ASGC took whole data centre out!
• Deployment of seed resources
  – Bootstrapping new user communities
  – Distributed across 4 sites
    • 257 cores and 27 TB of disk space


[Chart: Weighted Regionalised Average Availability & Reliability, May-08 to May-09, axis 0% to 100%; series: Average Availability, Average Reliability, Linear (Average Reliability)]
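The weighting described for these figures (by site size within a region, then combined across regions) can be sketched as follows. Several points are assumptions rather than statements from the slides: site size is taken to be the core count, the regional values are combined as a plain mean, and availability and reliability use the commonly quoted definitions (uptime over total time, and uptime over total time minus scheduled downtime). All numbers are hypothetical.

```python
def availability(up_hours, total_hours):
    """Fraction of the whole period in which the site was up (assumed definition)."""
    return up_hours / total_hours


def reliability(up_hours, total_hours, scheduled_down_hours):
    """As availability, but scheduled downtime is excluded (assumed definition)."""
    return up_hours / (total_hours - scheduled_down_hours)


def regional_average(sites):
    """Average weighted by site size. sites: list of (cores, value) pairs."""
    total_cores = sum(cores for cores, _ in sites)
    return sum(cores * value for cores, value in sites) / total_cores


def infrastructure_average(regions):
    """Mean of the per-region weighted averages (assumed combination rule).

    regions: dict mapping region name to a list of (cores, value) pairs.
    """
    per_region = [regional_average(sites) for sites in regions.values()]
    return sum(per_region) / len(per_region)


# Hypothetical availability figures for two regions.
regions = {
    "region-a": [(1000, 0.90), (3000, 0.98)],
    "region-b": [(500, 0.80)],
}
print(f"{infrastructure_average(regions):.2%}")
```

With this scheme a large site dominates its own region's figure, but a small region counts as much as a large one in the overall average, which is one plausible reading of "regionalised".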

Page 10: Technical Status

Network Support
• End to end support for networking issues
• Integrating network monitoring tools into support portal
• Progress continues on porting to IPv6 through testbed
• Design and implementation of the LHC Optical Private Network operational model


[Diagram: ENOC ensuring E2E connectivity for Grid sites on the whole path: Grid site 1 → RC 1 (operated by NOC of RC1) → NREN A (operated by NOC of NREN A) → GÉANT2 (operated by DANTE) → NREN B (operated by NOC of NREN B) → RC 2 (operated by NOC of RC2) → Grid site 2]

Page 11: Technical Status

Interoperation & Interoperability
• End-user driven relationships
  – Federation of Open Science Grid & Nordic Data Grid Facility
  – Workload Management System: Submit to ARC in NDGF
    • Actively used by the CMS experiment
• Production Grid Infrastructures
  – Build on experience of ARC, UNICORE and gLite
  – Work within the Open Grid Forum for next generation specification for job submission
• Nationally
  – Interaction with collaborating e-Infrastructures
  – Interaction with national software deployments


Page 12: Technical Status

Community Engagement
• EGEE’08, Istanbul
  – 529 participants from 47 countries
• EGEE 4th User Forum, Catania
  – Joint event with OGF 25 & OGF-Europe
  – 18 demos
  – 37 posters
  – 101 oral presentations


Page 13: Technical Status

Collaboration
• Collaborating Projects
  – 17 active projects with 6 completed during the first period
  – 15 letters of support have been signed
  – Memorandum of Understanding to formalise collaboration
    • Infrastructure: EDGeS, BalticGrid-II, SEE-GRID-SCI, Kazakh-British Technical University (Kazakhstan Grid)
    • General: OGF-Europe & GENESI-DR
    • Drafts: EELA-2 & RESERVOIR
• Bridging between e-Infrastructures
  – Application level use of EGEE & DEISA resources
  – Demonstrated with the EUFORIA project using Kepler (workflow)
  – 9 applications ported by Fusion cluster to EGEE


Page 14: Technical Status

Policy
• European e-Infrastructure Forum (EEF)
  – Purpose: “… discussion of principles and practices to create synergies for distributed Infrastructures”
  – Membership: EGEE, DEISA, GEANT, PRACE, EGI, Terena
  – Meeting quarterly for 2-3 hours
• Infrastructure Policy Group (IPG)
  – Purpose: “meeting of the major worldwide e-infrastructure projects”
  – Membership: EGEE, DEISA, TeraGrid, OSG, NAREGI
  – Meeting at OGF for 2-3 hours
  – Recent topics: Alignment of security, accounting & resource allocation policies
  – Further details: http://www.ogf.org/IPG


Page 15: Technical Status

Standards: Open Grid Forum
• Organizational roles
  – Strategic Management: Board
  – Area Directors: Applications, Data & Security
• Key Technical Leadership
  – GLUE Working Group
    • GLUE 2.0 specification complete
  – Production Grid Infrastructure Working Group
    • Evolution of BES 1.0 and JSDL 1.0 specifications
  – Grid Storage Resource Management Working Group
    • Revisions of the SRM specification to track production usage
• Strong relationship with OGF-Europe
  – EGEE UF4, CloudScape Workshop, Business Outreach


Page 16: Technical Status

Grids and Clouds
• Analysis of the Cloud in the context of EGEE Grids
  – “An EGEE Comparative study: Grids and Clouds – evolution or revolution?”
• Long term can envisage several scenarios:
  – Provision of VO specific virtualised Worker Nodes
  – Virtualise Worker Nodes for scale out to the cloud
  – Completely virtual EGEE site
• RESERVOIR collaboration to explore issues
  – Draft MoU


[Diagram of cloud scenarios, with elements: National Infrastructure Services; Site Services; Worker Nodes; Virtual Machine Infrastructure; Public/Private Cloud Provider]

Page 17: Technical Status

Technical Coordination
• Technical Management Board meets regularly
  – Representation from all the stakeholders within the project
  – Working groups
    • MPI: Investigates deployment issues relating to MPI uptake
    • CREAM: Development of certification and deployment plans
• Security Coordination Group
  – Integrates various security functions
  – OSCT: Security service challenge
  – JSPG: Policy for VOs, Portals, ...
  – MWSG: Meetings with:
    • UNICORE
    • OSG


Page 18: Technical Status

European Grid Initiative
• EGEE has been engaging with EGI Design Study
  – DNA1.4 collected responses from EGEE and related projects
  – All Activity meeting in January 2009 highlighted several issues
• Most of the engagement to date has been managerial
  – Project office and activity/task leaders
  – Experts participating in EGI_DS Task Forces
  – Migrating to the EGI model is the main objective for Year II
    • Specific presentation on Thursday morning
  – Technical understanding of EGI model continues
    • What does it mean for middleware integration & deployment?
    • How does the operational model need to change with ~40 NGIs?
    • There are many open questions... some of them critical!


Page 19: Technical Status

Risks Avoided
• Partner(s) fail to complete their task
• Mis-alignment of strategy and implementation with collaborating infrastructure projects
• Dissemination of incorrect information
• Failure to attract suitable trainers
• Resource congestion due to LHC startup
• Inadequate support for third party components
• Grid operations remains a labour intensive task
• Malicious attacks on the grid infrastructure or tools
• Unannounced network availability
• Slow standardisation and industry uptake
• Delays in the development roadmap
* From EGEE-III DoW, Section 3.2.3, Table 11-13, Pages 216+


Page 20: Technical Status

Risks Encountered
• Failure to provide required functionality to application community
  – MPI support continues to be an issue and is being followed up through the TMB
• Low business uptake of gLite
  – Standalone adoption by business is slow, but plenty of engagement by companies in the support of research projects
• Failure to implement EGI transition while maintaining production service
  – EGI structures represent mostly an evolution from EGEE
  – EGI risks and timeline addressed in Year II plans presentation


Page 21: Technical Status

Summary
• User community and usage continue to grow
  – Diversity of supported application communities increases
  – Training and technical support for new and existing users
• Incremental middleware releases through gLite
  – Primary focus in EGEE-III is on support & maintenance
  – Stabilisation provides a platform for other groups
• Delivery of leading world class e-infrastructure
  – Incremental growth of the physical infrastructure
  – Availability and reliability continue to improve
• Leadership & Collaboration in Europe and Worldwide
  – Technical within the OGF and collaborating projects
  – Policy interactions through EEF, IPG, and other bodies
• Transition to EGI provides many challenges for Year II
