11 March 2008 GridPP20: GridPP3 Project Management
Slide 3
What’s the project map for?
• To show us how well GridPP is delivering against requirements
• To report to the Oversight Committee on:
  – Areas where GridPP is doing well (or OK)
  – Areas that need attention
• To report on staff posts
• GridPP does not directly control all metrics, but can apply pressure in areas where we see problems
• Will need to be complete for the next OC (May?)
Slide 4
From production…
[Figure: GridPP2 project map, status date 30/Jun/07 + next 90 days. GridPP2 Goal: To develop and deploy a large scale production quality grid in the UK for the use of the Particle Physics community. Areas: LCG, M/S/N, LHC Apps, Non-LHC Apps, Management and External, plus Production Grid milestones (0.1 to 0.68) and metrics (0.100 to 0.147). Legend: Monitor OK, Monitor not OK, Milestone complete, Milestone overdue, Milestone due soon, Milestone not due soon, Item not active.]
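The project map legend separates metrics that are on or off target from milestones that are complete, overdue, due soon or not due soon, judged against the status date plus a 90-day window. The Python sketch below shows one way such a classification could work. It is an illustration only, not the actual project map logic. The function names, the `Status` enum and the rule that "due soon" means "due within 90 days after the status date" are all assumptions.

```python
from datetime import date, timedelta
from enum import Enum

# Assumption: "due soon" means due within 90 days after the status date.
LOOKAHEAD = timedelta(days=90)


class Status(Enum):
    NOT_ACTIVE = "Item not active"
    COMPLETE = "Milestone complete"
    OVERDUE = "Milestone overdue"
    DUE_SOON = "Milestone due soon"
    NOT_DUE_SOON = "Milestone not due soon"


def milestone_status(due: date, completed: bool, status_date: date,
                     active: bool = True) -> Status:
    """Classify a milestone against the report's status date (hypothetical rules)."""
    if not active:
        return Status.NOT_ACTIVE
    if completed:
        return Status.COMPLETE
    if due < status_date:
        return Status.OVERDUE
    if due <= status_date + LOOKAHEAD:
        return Status.DUE_SOON
    return Status.NOT_DUE_SOON


def metric_status(value: float, target: float, higher_is_better: bool = True) -> str:
    """Map a metric onto "Monitor OK" / "Monitor not OK" given a hypothetical target."""
    ok = value >= target if higher_is_better else value <= target
    return "Monitor OK" if ok else "Monitor not OK"


if __name__ == "__main__":
    status_date = date(2007, 6, 30)  # status date shown on the GridPP2 map
    print(milestone_status(date(2007, 8, 1), False, status_date).value)  # prints "Milestone due soon"
```

Completion and inactivity are checked before dates. As a result, a finished milestone never shows as overdue, even when its due date has passed.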
Slide 5
…to exploitation
[Figure: GridPP3 project map. GridPP3 Goal: To provide UK computing for the Large Hadron Collider. Areas cover the experiments (ATLAS, LHCb, CMS, other experiments), grid services, Tier-1, Tier-2 (London, NorthGrid, ScotGrid, SouthGrid), Management and External (LCG, EGEE, National Grid Infrastructure), plus a GridPP2+ section.]
Slide 6
Main features
• Led by experiments – key to delivering for LHC
• Tier-1 and Tier-2 areas
  – Aggregated per Tier-2
• Mainly metrics, with some deliverables
• Based around services delivered – especially meeting MoU commitments
• Includes section for GridPP2+
Slide 7
Milestones and metrics
[Figure: the GridPP3 project map broken down into numbered milestones and metrics for each area (e.g. 1.2.1 to 1.2.12 for LHCb), with the same goal, areas and GridPP2+ section as the previous slide.]
Slide 8
Example Experiment metrics
Number LHCb
1.2.1 UK share of LHCb production computing needs
1.2.2 MC production (generation) efficiency
1.2.3 T1 MC production (reconstruction, stripping) efficiency
1.2.4 T1 MC/Event user analysis - UK share/efficiency
1.2.5 T2 data transfer - T2->RAL
1.2.6 T2 data transfer - T2->others (failover?)
1.2.7 T1 data transfer - Incoming
1.2.8 T1 data transfer - Outgoing
1.2.9 T1 data storage: Tape
1.2.10 T1 data storage: Disk
1.2.11 LHCb SAM tests uptime T1
1.2.12 LHCb SAM tests uptime T2
Number ATLAS
1.101 Tier 1 - Available job slots for reconstruction
1.102 Tier 2 - Available job slots for group analysis
1.103 Tier 1 - Available job slots for MC production
1.104 Tier 1 - Job success rates in batch system
1.105 Tier 1 - Available storage in usable service classes
1.106 Tier 1 - Data reading rates from storage system to batch farm
1.107 Tier 1 - Rates of data movement from tape to disk for reprocessing
1.108 Tier 1 - Data availability in storage system
1.109 Tier 1 - Data loss per quarter (when not recoverable)
1.110 Tier 1 - Data acceptance from CERN, Tier 1s, Tier 2s
1.111 Tier 1 - MoU service levels
1.112 Tier 2 - Data acceptance from Tier 1
1.113 Tier 2 - Available simulation slots
1.114 Tier 2 - Available analysis slots
Slide 9
Overall operations metrics
Number Title
2.1.1 Fraction of UK sites in Production
2.1.2 Number of supported VOs
2.1.3 Fraction of kSI2k used
2.1.4 GridPP kSI2k available
2.1.5 GridPP disk storage available
2.1.6 Job failure rates
2.1.7 UK contribution to LHC experiments
2.1.8 UK contribution to non-LHC experiments
2.1.9 Deployment team meetings
2.1.10 UK wide deployment support active
2.1.11 GridPP deployment web-pages up-to-date
2.1.12 Training needs addressed
2.1.13 GridPP helpdesk functioning adequately
2.1.14 Number of sites on VO blacklists
Slide 10
Tier-1 metrics – examples
Number Resource delivery
3.2.1 Tier-1 kSI2k available to EGEE/LCG
3.2.2 Tier-1 delivering to LCG MoU
3.2.3 Fraction of available T1 kSI2k used in quarter
3.2.4 Fraction of available T1 kSI2k used in quarter
3.2.5 UB schedule implemented and upheld
3.2.6 Time on VO blacklists
3.2.7 Respond to tickets within required time
3.2.8 Job efficiencies
Number Hardware procurement
3.1.1 Disk tender started
3.1.2 Disk delivered
3.1.3 Disk available and in production as per plan
3.1.4 Tape tender started
3.1.5 Tape delivered
3.1.6 Tape available and in production as per plan
3.1.7 CPU tender started
3.1.8 CPU delivered
3.1.9 CPU available and in production as per plan
3.1.10 New machine room migration plan available
3.1.11 New machine room - migration complete
3.1.12 New machine room available to accept hardware
3.1.13 Network upgraded
• Services
• Storage
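Metric 3.2.3 (fraction of available T1 kSI2k used in the quarter) is a utilisation ratio: CPU actually consumed divided by CPU that could have been consumed. The Python sketch below makes two assumptions. First, usage is accounted in kSI2k-hours. Second, the installed capacity stays constant for the whole quarter. The function name and the example figures are hypothetical.

```python
HOURS_PER_DAY = 24


def cpu_utilisation(used_ksi2k_hours: float, capacity_ksi2k: float,
                    days_in_quarter: int) -> float:
    """Fraction of available Tier-1 CPU used over a quarter.

    Assumes capacity_ksi2k was available for every hour of the quarter.
    """
    available_ksi2k_hours = capacity_ksi2k * days_in_quarter * HOURS_PER_DAY
    if available_ksi2k_hours <= 0:
        raise ValueError("available capacity must be positive")
    return used_ksi2k_hours / available_ksi2k_hours


if __name__ == "__main__":
    # Hypothetical figures: 1000 kSI2k over a 91-day quarter, 1,638,000 kSI2k-hours used.
    print(cpu_utilisation(1_638_000, 1000, 91))  # prints 0.75
```

If capacity changes partway through the quarter, the denominator would instead be the sum of capacity × hours for each period.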
Slide 11
Tier-2 metrics
Number Title
4.x.1 % of promised (by that time) disk available
4.x.2 % of promised (by that time) CPU available
4.x.3 Average SAM (SLL page) availability performance over the last quarter
4.x.4 Average SAM (SLL page) reliability performance over the last quarter
4.x.5 Average SLL ATLAS test performance?
4.x.6 Average SLL disk test performance?
4.x.7 Amount of CPU delivered
4.x.8 Number of TB of disk used
4.x.9 Number of technical meetings held
4.x.10 Number of management meetings held
4.x.11 Tier-2 delivering to LCG MoU
4.x.12 Quarterly operational performance review
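Metrics 4.x.1 and 4.x.2 compare delivered capacity with what was promised by that date. Metrics 4.x.3 and 4.x.4 distinguish availability from reliability. The Python sketch below uses the WLCG-style convention: availability counts all downtime, while reliability excludes scheduled downtime. This convention is an assumption and may differ from the exact SAM/SLL page calculation. The function names and example numbers are hypothetical.

```python
def pct_of_promised(available: float, promised_by_now: float) -> float:
    """Metrics 4.x.1/4.x.2: capacity available as a percentage of what was promised by this date."""
    if promised_by_now <= 0:
        raise ValueError("promised capacity must be positive")
    return 100.0 * available / promised_by_now


def availability_and_reliability(up_hours: float, total_hours: float,
                                 scheduled_down_hours: float) -> tuple[float, float]:
    """Availability counts all downtime; reliability excludes scheduled downtime.

    Assumption: WLCG-style definitions, not necessarily the SAM/SLL page formula.
    """
    if not 0 <= scheduled_down_hours < total_hours:
        raise ValueError("scheduled downtime must be in [0, total_hours)")
    availability = up_hours / total_hours
    reliability = up_hours / (total_hours - scheduled_down_hours)
    return availability, reliability


if __name__ == "__main__":
    print(pct_of_promised(90, 120))  # prints 75.0
    # Hypothetical 91-day quarter (2184 h) with 2000 h up and 84 h of scheduled downtime.
    a, r = availability_and_reliability(2000, 2184, 84)
    print(f"availability={a:.3f} reliability={r:.3f}")
```

Under this convention, reliability is always at least as high as availability. A site is therefore not penalised in 4.x.4 for announced maintenance.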
Slide 12
Risk register
Columns: ID, Name, then Li, Im, Risk for each area (Pro. Grid, GridPP, LCG, MSN, Apps)
R1 Recruitment/retention difficulties: 2 2 4 | 2 2 4 | 2 2 4 | 2 2 4
R2 Sudden loss of key staff: 1 3 3 | 1 3 3 | 1 3 3 | 1 4 4
R3 Minimal contingency: 4 2 8
R4 GridPP deliverables late: 1 3 3 | 2 3 6 | 2 2 4
R5 Sub-components not delivered to project: 1 2 2 | 2 3 6 | 3 3 9 | 2 3 6
R6 Non take-up of project results: 2 1 2 | 1 4 4 | 2 2 4 | 1 4 4
R7 Change in project scope: 1 1 1 | 2 2 4
R8 Bad publicity: 1 3 3 | 1 3 3 | 1 3 3 | 2 3 6
R9 External OS dependence: 3 1 3
R10 External middleware dependence: 4 2 8 | 1 4 4 | 3 2 6 | 2 2 4
R11 Lack of monitoring of staff: 1 2 2 | 2 2 4 | 2 2 4 | 1 3 3
R12 Withdrawal of an experiment: 2 3 6 | 1 4 4
R13 Lack of cooperation between Tier centres: 2 2 4 | 1 3 3
R14 Scalability problems: 1 2 2 | 2 2 4
R15 Software maintainability problems: 2 2 4 | 2 3 6 | 4 3 12 | 1 4 4
R16 Technology shifts: 1 2 2 | 2 3 6 | 2 3 6
R17 Repetition of research: 3 2 6
R18 Lack of funding to meet LCG PH-1 goals: 4 1 4
R20 Conflicting software requirements: 3 2 6 | 2 3 6
R22 Hardware resources inadequate: 2 3 6 | 2 3 6 | 2 3 6
R25 Hardware procurement problems: 2 2 4 | 2 3 6
R26 LAN bottlenecks: 1 3 3
R27 Tier-2 organisation fails: 2 2 4
R28 Experiment requirements not met: 2 3 6
R29 SYSMAN effort inadequate: 2 3 6
R30 Firewalls interfere with Grid: 2 3 6
R31 Inability to establish trust relationships
R32 Security inadequate to operate Grid: 2 3 6
R33 Interoperability: 2 3 6
R35 Failure of international cooperation: 2 1 2
R36 e-Science and GridPP divergence: 2 3 6
R37 Institutes do not embrace Grid: 2 2 4
R38 Grid does not work as required: 4 2 8 | 4 2 8
R39 Delay of the LHC: 2 2 4
R40 Lack of future funding: 2 3 6 | 3 3 9 | 4 3 12
R41 Network backbone failure: 0 4 1
R42 Network backbone bottleneck: 2 2 4
R43 Network backbone upgrade delay: 1 4 4
R44 Inadequate user support: 2 3 6
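Each risk is scored per area on Li and Im, read here as likelihood and impact. The recorded Risk values match Li × Im in almost every row (for example, R15's 4 3 12). The Python sketch below assumes that convention. It checks the arithmetic and lists risks whose worst score reaches an escalation threshold. The threshold of 9 and the function names are hypothetical. Run against the rows transcribed here, the check flags R41, whose recorded values (0 4 1) do not multiply out.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    likelihood: int  # Li
    impact: int      # Im
    risk: int        # Risk as recorded in the register


# A few rows transcribed from the risk register (one Rating per Li/Im/Risk triple).
REGISTER: dict[str, list[Rating]] = {
    "R1": [Rating(2, 2, 4)] * 4,
    "R5": [Rating(1, 2, 2), Rating(2, 3, 6), Rating(3, 3, 9), Rating(2, 3, 6)],
    "R15": [Rating(2, 2, 4), Rating(2, 3, 6), Rating(4, 3, 12), Rating(1, 4, 4)],
    "R40": [Rating(2, 3, 6), Rating(3, 3, 9), Rating(4, 3, 12)],
    "R41": [Rating(0, 4, 1)],
}


def inconsistent(register: dict[str, list[Rating]]) -> list[str]:
    """IDs where any recorded Risk differs from Li * Im."""
    return [rid for rid, ratings in register.items()
            if any(r.risk != r.likelihood * r.impact for r in ratings)]


def escalate(register: dict[str, list[Rating]], threshold: int = 9) -> list[str]:
    """IDs whose worst recorded Risk meets a (hypothetical) threshold, highest first."""
    worst = [(max(r.risk for r in ratings), rid)
             for rid, ratings in register.items() if ratings]
    return [rid for score, rid in sorted(worst, key=lambda t: (-t[0], t[1]))
            if score >= threshold]


if __name__ == "__main__":
    print(escalate(REGISTER))      # prints ['R15', 'R40', 'R5']
    print(inconsistent(REGISTER))  # prints ['R41']
```

The sketch ranks each risk by its worst score across areas. This way, a risk that is severe in only one area, such as R15 with a score of 12, is still escalated.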
Slide 13
Reporting
[Figure: reporting structure. Roles shown: OC, CB, PMB, Project Manager, User Board Chair, Technical Director, Production Manager, Tier-1 Manager, Tier-1 staff, Tier-2 coordinators, Tier-2 hardware support posts, the experiments (ATLAS, LHCb, CMS, other expts), and posts for storage, data, Info.Mon., WLMS, security, network, portal, user support and experiment support.]
Slide 14
Quarterly reports
• Produced by manager in each area
• Reporting on progress in the quarter, including:
  – Effort figures
  – Resources delivered
  – Service levels
  – Metrics and milestones
  – Issues arising
• Expected 1 month after the end of each quarter
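The Python sketch below computes when a quarterly report would fall due. It assumes calendar quarters (ending in March, June, September and December) and reads "1 month after the end of each quarter" as the last day of the following month. Both are assumptions, and the function names are hypothetical.

```python
import calendar
from datetime import date


def quarter_end(d: date) -> date:
    """Last day of the calendar quarter containing d (assumes calendar quarters)."""
    last_month = ((d.month - 1) // 3 + 1) * 3
    return date(d.year, last_month, calendar.monthrange(d.year, last_month)[1])


def report_due(d: date) -> date:
    """Assumed deadline: the last day of the month after the quarter ends."""
    end = quarter_end(d)
    year, month = (end.year + 1, 1) if end.month == 12 else (end.year, end.month + 1)
    return date(year, month, calendar.monthrange(year, month)[1])


if __name__ == "__main__":
    print(report_due(date(2008, 3, 11)))  # prints 2008-04-30
```

`calendar.monthrange` handles month lengths and leap years. Because of this, the deadline always lands on a real end-of-month date, including across the December to January year boundary.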