PPS : Release and Interactions with ITR team
EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
PPS : Release and Interactions with ITR team
Author: Antonio Retico (SA1)
Location: CERN (29-Jun-07)
PPS in a nutshell
• Pre-Production Service for EGEE
• Mandate: “To give early access to new services in order for WLCG/EGEE users to evaluate new features and changes in the release”
• PPS grid counts ~30 sites
• Operations supported by the EGEE ROCs
• Coordination done at CERN (resp. Nick Thackray)
• www.cern.ch/pps
PPS “ + ”
• The quality of gLite benefits from PPS:
1. Testing of deployment procedures and software in real operational conditions
2. Debugging of new functionality done by the applications/VOs
3. Feedback for early bug fixes to the release before moving to production
• PPS is the “production” grid for the Diligent VO
– 6 sites in PPS exclusively support Diligent
– https://twiki.cern.ch/twiki/bin/view/DILIGENT/DiligentInfrastructurePps
– The DILIGENT project aims to support a new research operational mode by enabling the creation of on-demand digital libraries.
PPS “ – ”
• Share of work spent “around” PPS (last three months):
[Pie chart: operations 43%, special activities 37%, release testing 15%, normal usage 5%]
• Standard Usage (5%): VOs use regularly released SW
• Special Activities (37%): VOs test non-certified SW
– SRMv2 integration
– Fix for VoViews
– …
• Release Testing (15%): a few selected sites do pre-deployment testing
• Operations (43%): ~30 sites keep a service running
• High operation costs compared to poor (standard) usage by VOs
=> Revision of the mandate under study
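As a quick check on the figures above, the four shares add up to 100%, and operations absorbs several times the effort that goes into standard usage. The sketch below only restates the slide's numbers; the dictionary keys are labels chosen for illustration.

```python
# Shares of work spent "around" PPS over the last three months (from the slide).
shares = {
    "standard usage": 5,
    "special activities": 37,
    "release testing": 15,
    "operations": 43,
}

# The four categories should account for all the work.
total = sum(shares.values())

# How much operations effort is spent per unit of standard usage.
ops_to_usage = shares["operations"] / shares["standard usage"]

print(f"total: {total}%")
print(f"operations / standard usage: {ops_to_usage:.1f}")
```

The ratio is what motivates the last bullet: keeping ~30 sites running costs far more than the standard usage it currently serves.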
ITR and PPS
• ITR and PPS Coordination work together on:
1. Release steering in EMT
– priorities in release
– actions on severity of bugs
– fast-tracking of urgent fixes
2. Middleware releases
– shared tools and procedures (see next slide)
– https://twiki.cern.ch/twiki/bin/view/LCG/PPSReleaseProcedures
ITR and PPS: Release Process
[Flow diagram with two phases; the labels below are listed in the order they appear on the slide]

Certification > PPS
• Actors: PPS Coord, pre-testing Team, repository Mgr, PPS Sites, SA3
• Steps:
– Kick-off
– Write Release Notes; update repo@CERN
– Consistency check of repo@CERN
– Pre-deployment testing (installation, configuration, SAM)
– Prepare release bulletin
– Deploy?
– Y: Publish Release Bulletin; synchronize repo@CNAF
– N: Handle roll-back
– Upgrade
– Broadcast
– Open bugs and update test reports
• Documents exchanged: release notes, apt status report, test report

PPS > Production
• Actors: PROD Sites, PPS Coord, SA3
• Steps:
– Create candidate list of patches
– EMT meeting: finalize list of patches
– Meeting: define list of “known issues”
– Write Release Notes; update Repository
– Verify status of associated bugs
– Prepare Broadcast
– Upgrade
– BROADCAST
• Documents exchanged: tentative list, final list, list of issues, release notes

Timeline labels: Fri, Mon, Tue, Thu, Mon, Tue … 3.5 weeks later ... Wed
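The “Deploy?” decision in the Certification > PPS phase can be sketched as a small function. This is only an illustration: the class and function names are hypothetical, the rule that all three pre-deployment checks must pass is an assumption, and the step order is one reading of the flattened diagram rather than the documented procedure.

```python
from dataclasses import dataclass


@dataclass
class PreDeploymentReport:
    """Hypothetical summary of pre-deployment testing."""
    installation_ok: bool
    configuration_ok: bool
    sam_ok: bool

    def passed(self) -> bool:
        # Assumption: a release is deployable only if every check passed.
        return self.installation_ok and self.configuration_ok and self.sam_ok


def steps_after_deploy_decision(report: PreDeploymentReport) -> list:
    """Return the steps that follow the 'Deploy?' box, in order."""
    if report.passed():
        # "Y" branch, followed by the steps listed after it on the slide.
        return [
            "publish release bulletin",
            "synchronize repo@CNAF",
            "upgrade",
            "broadcast",
            "open bugs and update test reports",
        ]
    # "N" branch.
    return ["handle roll-back"]


good = steps_after_deploy_decision(PreDeploymentReport(True, True, True))
bad = steps_after_deploy_decision(PreDeploymentReport(True, False, True))
print(good)
print(bad)
```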
Release Schedule: Theory
Release Schedule: Practice
• Estimation of work done in PPS (last three months):
• 3 releases per month on average
[Bar chart, scale 0 to 90: users, vo testing, operations and pre-dep work for Standard Releases, SRMv2, VoViews and slc4-WN]
Two words of explanation
• Theory and practice differ because of:
1. Fast-tracking (decided by EMT)
   1. Security patches
   2. Important fixes
   3. [optional] Fixes for new bugs introduced by “Important fixes”
2. Special activities
– requested by VOs
– based on installation of uncertified middleware
– supported by a restricted number of sites
– managed separately from the release process
– e.g. integration of SRMv2 pilot testbed in PPS
ITR and PPS: Emoticons
• Friendly and constructive interactions
• Solid, agreed and unambiguous procedures
• High-quality release documentation
• Significant improvements expected from new YAIM
• PPS site admins happy and reactive on releases
• Frequent requests for fast-tracking patches
– overlapping releases and broken upgrade sequences
• Deployment task forces (e.g. WMS 3.1)
– good for start-up but likely to forget operational aspects
– (short) stage in PPS always recommended
• Quick fixes => new bugs
• Still a lot of services “skip” PPS
quick fixes => new bugs
• PPS may be a buffer to production, but it is not a testbed: PPS is a grid and a service
• Recovering the service after a bad upgrade costs time (money) to more than 30 sites (not counting GD time)
• We do our best to protect PPS against accidents
– APT mirroring, pre-deployment testing
• But help from ITR is always appreciated
– No fast-tracking unless really needed
– Avoid grouping quick fixes and new features in a single patch
– Mention known and possible issues with a patch in advance:
• “high-risk” patches eventually to be tried in “isolated compartments” in PPS
Services “skipping” PPS
• One valid argument: “no point to stay in PPS stage so long if nobody tries out the software there”
• True. PPS nevertheless offers some added value:
– pre-deployment testing (release notes, instructions)
– “ops” SAM testing
• Skipping basic testing proved to be dangerous (issues in production caused by trivial bugs)
• Work needs to be done here at process level
• PPS openings (free-thinking for further development):
– middleware is service-oriented but updates to PPS are still handled as a single object
– The correlation between PPS and PROD release numbers is practical, but it is an arbitrary choice
– The same fixed stage in PPS for all the services is not strictly a requirement
– The really important thing is to keep the sequence in the updates for the same service
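The last point can be made concrete with a small check: updates for different services may interleave freely, as long as each service receives its own updates in release order. The sketch below assumes that “keeping the sequence” also means no update is skipped; the function name and the update identifiers are hypothetical.

```python
from collections import defaultdict


def sequence_violations(released, applied):
    """Return the services whose updates were not applied in release order.

    released: dict mapping service -> list of update ids, in release order
    applied:  list of (service, update_id) pairs, in deployment order
    """
    # Position of every update within its own service's release sequence.
    position = {
        service: {uid: i for i, uid in enumerate(ids)}
        for service, ids in released.items()
    }
    last_seen = defaultdict(lambda: -1)
    violations = set()
    for service, uid in applied:
        idx = position[service][uid]
        # Each update must directly follow the previous one for that service.
        if idx != last_seen[service] + 1:
            violations.add(service)
        last_seen[service] = max(last_seen[service], idx)
    return sorted(violations)


# Hypothetical update ids for two services named in the slides.
released = {"WMS": ["wms-1", "wms-2", "wms-3"], "SRMv2": ["srm-1", "srm-2"]}

interleaved = [("WMS", "wms-1"), ("SRMv2", "srm-1"), ("WMS", "wms-2"),
               ("SRMv2", "srm-2"), ("WMS", "wms-3")]
swapped = [("WMS", "wms-2"), ("WMS", "wms-1"), ("SRMv2", "srm-1")]

ok_result = sequence_violations(released, interleaved)
bad_result = sequence_violations(released, swapped)
print(ok_result)
print(bad_result)
```

Interleaving across services passes the check, which matches the idea that a single fixed PPS stage for all services is not strictly required.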
Questions?