PPS : Release and Interactions with ITR team

EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks PPS : Release and Interactions with ITR team Author: Antonio Retico (SA1) Location: CERN (29-Jun-07)

Page 1: PPS : Release and Interactions                 with ITR team

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

PPS : Release and Interactions with ITR team

Author: Antonio Retico (SA1)

Location: CERN (29-Jun-07)

Page 2: PPS : Release and Interactions                 with ITR team

PPS in a nutshell

• Pre-Production Service for EGEE
• Mandate: “To give early access to new services in order for WLCG/EGEE users to evaluate new features and changes in the release”
• PPS grid counts ~30 sites
• Operations supported by the EGEE ROCs
• Coordination done at CERN (resp. Nick Thackray)
• www.cern.ch/pps

Page 3: PPS : Release and Interactions                 with ITR team

PPS “ + ”

• The quality of gLite benefits from PPS:
1. Testing of deployment procedures and software in real operational conditions
2. Debugging of new functionality done by the applications/VOs
3. Feedback for early bug fixes to the release before moving to production

• PPS is the “production” grid for the Diligent VO
– 6 sites in PPS exclusively support Diligent
– https://twiki.cern.ch/twiki/bin/view/DILIGENT/DiligentInfrastructurePps
– The DILIGENT project aims to support a new research operational mode by enabling the creation of on-demand digital libraries.

Page 4: PPS : Release and Interactions                 with ITR team

PPS “ – ”

• Share of work spent “around” PPS (last three months):

[Pie chart: operations 43%, special activities 37%, release testing 15%, normal usage 5%]

• Standard Usage (5%): VOs use regularly released SW
• Special Activities (37%): VOs test non-certified SW
– SRMv2 integration
– Fix for VoViews
– …
• Release Testing (15%): a few selected sites do pre-deployment testing
• Operations (43%): ~30 sites keep a service running
• High operation costs compared to poor (standard) usage by VOs

=> Revision of the mandate under study
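The imbalance behind "high operation costs compared to poor (standard) usage" can be made concrete with a little arithmetic. The sketch below only restates the four percentages quoted on the slide; the dictionary and variable names are illustrative and not part of the original material.

```python
# Shares of effort "around" PPS over the last three months, in percent,
# exactly as quoted on the slide.
effort_share = {
    "operations": 43,
    "special activities": 37,
    "release testing": 15,
    "standard usage": 5,
}

# The four categories should account for all of the effort.
total = sum(effort_share.values())

# Effort spent keeping the service running per unit of standard VO usage.
ops_per_usage = effort_share["operations"] / effort_share["standard usage"]

print(total)          # 100
print(ops_per_usage)  # 8.6
```

Read this way, roughly 8.6 units of operations effort are spent for each unit of standard usage, which is the ratio motivating the revision of the mandate.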

Page 5: PPS : Release and Interactions                 with ITR team

ITR and PPS

• ITR and PPS Coordination work together on:

1. Release steering in EMT
– priorities in release
– actions on severity of bugs
– fast-tracking of urgent fixes

2. Middleware releases
– shared tools and procedures (see next slide)
– https://twiki.cern.ch/twiki/bin/view/LCG/PPSReleaseProcedures

Page 6: PPS : Release and Interactions                 with ITR team

ITR and PPS: Release Process

[Flow diagram with two swimlane charts. Only the text labels survive in this transcript; arrows and exact ordering are not recoverable. Labels are listed in the order they were extracted.]

Certification > PPS
Lanes: SA3, PPS Coord, pre-testing Team, repository Mgr, PPS Sites
Steps: Kick-off; Write Release Notes, Update repo@CERN; Consistency check of repo@CERN; pre-deployment testing (installation, configuration, SAM); Prepare release bulletin; Deploy? (Y / N); Publish Release Bulletin; Synchronize repo@CNAF; Handle roll-back; Upgrade; Broadcast; Open Bugs and update test reports
Items exchanged: e-mail, release notes, apt status report, test report

PPS > Production
Lanes: PROD Sites, PPS Coord, SA3
Steps: Create candidate list of patches; EMT meeting: finalize list of patches; meeting: Define list of “known issues”; Write Release Notes, Update Repository; Verify status of associated bugs; Prepare Broadcast; Upgrade; BROADCAST
Items exchanged: tentative list, final list, release notes, list of issues

Timeline labels: Fri, Mon, Tue, Thu, Mon, Tue … 3.5 weeks later ... Wed
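The "Deploy?" decision in the Certification > PPS chart can be sketched as a small gate on the pre-deployment test report. This is only an illustration under stated assumptions: the three checks mirror the "installation, configuration, SAM" label, the mapping of the Y branch to "Publish Release Bulletin" and "Synchronize repo@CNAF" and of the N branch to "Handle roll-back" is inferred from the label order, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreDeploymentReport:
    """Hypothetical summary of one pre-deployment test run."""
    installation_ok: bool
    configuration_ok: bool
    sam_ok: bool


def next_actions(report: PreDeploymentReport) -> List[str]:
    """Return the follow-up steps for the 'Deploy?' decision.

    Assumption: deployment goes ahead only if all three checks pass;
    any failure leads to the roll-back branch.
    """
    if report.installation_ok and report.configuration_ok and report.sam_ok:
        return ["Publish Release Bulletin", "Synchronize repo@CNAF"]
    return ["Handle roll-back"]


good = next_actions(PreDeploymentReport(True, True, True))
bad = next_actions(PreDeploymentReport(True, False, True))
```

Here `good` holds the two publication steps and `bad` holds only the roll-back step. The real procedure is described at the PPSReleaseProcedures twiki page linked on the previous slide and may weigh the test report differently.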

Page 7: PPS : Release and Interactions                 with ITR team

Release Schedule: Theory

Page 8: PPS : Release and Interactions                 with ITR team

Release Schedule: Practice

• Estimation of work done in PPS (last three months):

• 3 releases per month on average

[Chart: estimated work done in PPS, axis from 0 to 90. Labels: users, vo testing, operations, pre-dep; Standard Releases, SRMv2, VoViews, slc4-WN]

Page 9: PPS : Release and Interactions                 with ITR team

Two words of explanation

• Theory and practice differ because of:

1. Fast-tracking (decided by EMT)
   1. Security patches
   2. Important fixes
   3. [optional] Fixes for new bugs introduced by “Important fixes”

2. Special activities
– requested by VOs
– based on installation of uncertified middleware
– supported by a restricted number of sites
– managed separately from the release process
– e.g. integration of SRMv2 pilot testbed in PPS

Page 10: PPS : Release and Interactions                 with ITR team

ITR and PPS: Emoticons

• Friendly and constructive interactions
• Solid, agreed and unambiguous procedures
• High-quality release documentation
• Significant improvements expected from new YAIM
• PPS Site admins happy and reactive on releases
• Frequent requests for fast-tracking patches
– overlapping releases and broken upgrade sequences
• Deployment task forces (e.g. WMS 3.1)
– good for start-up but likely to forget operational aspects
– (short) stage in PPS always recommended
• quick fixes => new bugs
• still a lot of services “skip” PPS

Page 11: PPS : Release and Interactions                 with ITR team

quick fixes => new bugs

• PPS may be a buffer to production, but it is not a testbed: PPS is a grid and a service

• Recovering the service after a bad upgrade costs time (money) at more than 30 sites (not counting GD time)

• We do our best to protect PPS against accidents
– APT mirroring, pre-deployment test

• But help from ITR is always appreciated
– No fast-tracking unless really needed
– Avoid grouping quick fixes and new features in a single patch
– Mention known and possible issues with a patch in advance:
• “high-risk” patches possibly to be tried in “isolated compartments” in PPS
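Two of the requests to ITR above lend themselves to a simple automated reminder. The following is a minimal sketch, assuming each patch carries a list of change kinds and a fast-track flag; the `Patch` class, the kind labels "quick-fix" and "new-feature", and the wording of the flags are all hypothetical and not taken from any PPS or ITR tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patch:
    """Hypothetical description of a patch submitted for release."""
    name: str
    change_kinds: List[str]   # e.g. "quick-fix", "new-feature"
    fast_track: bool = False


def review_flags(patch: Patch) -> List[str]:
    """Return reminders for patches that go against the slide's requests."""
    flags = []
    kinds = set(patch.change_kinds)
    # "Avoid grouping quick fixes and new features in a single patch"
    if {"quick-fix", "new-feature"} <= kinds:
        flags.append("mixes quick fixes and new features")
    # "No fast-tracking unless really needed"
    if patch.fast_track:
        flags.append("fast-tracking requested: confirm it is really needed")
    return flags


mixed = review_flags(Patch("patch-A", ["quick-fix", "new-feature"]))
urgent = review_flags(Patch("patch-B", ["quick-fix"], fast_track=True))
clean = review_flags(Patch("patch-C", ["new-feature"]))
```

`mixed` and `urgent` each contain one reminder, while `clean` is empty. A check like this only prompts a human decision; whether a fast-track is justified remains a call for the EMT.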

Page 12: PPS : Release and Interactions                 with ITR team

Services “skipping” PPS

• One valid argument: “no point to stay in PPS stage so long if nobody tries out the software there”

• True. PPS nevertheless offers some added value:
– pre-deployment testing (release notes, instructions)
– “ops” SAM testing

• Skipping basic testing proved to be dangerous (issues in production caused by trivial bugs)

• Work needs to be done here at process level
• PPS openings (free-thinking for further development):

– middleware is service-oriented but updates to PPS are still handled as a single object

– The correlation between PPS and PROD release numbers is practical, but it is an arbitrary choice

– The same fixed stage in PPS for all the services is not strictly a requirement

– The really important thing is to keep the sequence in the updates for the same service
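The last point, keeping the sequence of updates for the same service while letting different services move independently, can be sketched as a per-service ordering check. This is an illustration only: the function name, the integer update numbers, and the service names in the example ("WMS", "LFC") are assumptions chosen for the sketch, not a description of how PPS tracks updates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def out_of_sequence(deployed: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Map each service to the update numbers that were applied out of order.

    `deployed` lists (service, update_number) pairs in the order they were
    applied. Updates of different services may interleave freely; only the
    order within one service is checked.
    """
    last_seen: Dict[str, int] = {}
    violations: Dict[str, List[int]] = defaultdict(list)
    for service, update in deployed:
        if service in last_seen and update <= last_seen[service]:
            # An older (or repeated) update arrived after a newer one.
            violations[service].append(update)
        else:
            last_seen[service] = update
    return dict(violations)


history = [("WMS", 1), ("LFC", 1), ("WMS", 2), ("LFC", 3), ("LFC", 2)]
print(out_of_sequence(history))  # {'LFC': [2]}
```

In the example the two services advance at different speeds without any complaint; only the LFC update 2 applied after update 3 is reported, which matches the idea that a fixed common stage in PPS is not strictly required as long as each service's own sequence is preserved.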

Page 13: PPS : Release and Interactions                 with ITR team

Questions?