The Grid Observatory: goals and challenges
EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
The Grid Observatory: goals and challenges
C. Germain-Renaud (CNRS/LRI & LAL)
EGEE’07 Conference
Budapest, Hungary
1-5 October 2007
Application Track - Grid Observatory 2
Overview
• NA4 cluster in EGEE-III proposal
• Integrate the collection of data on the behaviour of the EGEE grid and users with the development of models and of an ontology for the domain knowledge
Some immediate questions
• Resource allocation
– Performance of the gLite scheduling hierarchy
– Published waiting time
– Reactive grids
– Everybody's grid
• Dimensioning
– Patterns and trends in requests and usage
– Anticipate peaks
• On-line fault management
– Detection
– Diagnosis
– Prevention
The big picture
• Considering current technologies, we expect that the total number of device administrators will exceed 220 million by 2010 (Gartner, June 2001)
• No more Moore’s Law free lunch: much more complex software & applications
• The Virtual Organization concept creates common goods
Autonomic Computing
"Computing systems that manage themselves in accordance with high-level objectives from humans." Kephart & Chess, The Vision of Autonomic Computing, IEEE Computer, 2003
– Self-*: configuration, optimization, healing, protection
– Of open, non-steady-state dynamic systems
– Academia and industry both involved
Autonomic Grids
• Statistical analysis
• Data mining
• Machine learning
[Figure labels: monitor, analyze, plan, execute, knowledge; annotation: DATA REQUIRED]
Data Collection and Publication
• Acquisition, consolidation, long-term conservation of traces of EGEE activities
– Permanent storage of reliable, exhaustive, filtered information
– Exhaustive: added value in snapshots of the inputs and grid state, e.g. workload and available services during a relevant time range
– Filtered: from operational to structured
[Figure: L&B schema, annotated "No join!"]
– No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status
– Centralized
• CIC tools (GOCDB, SAM, SFT,…)
• core gLite (L&B, BDII,…)
• sites (Maui/PBS logs)
• gLite integrators (R-GMA, Job Provenance)
• experiment integrators (DashBoard)
• external software (MonaLisa)
• The major challenge is exhaustiveness
– Some data are outside the scope: external traffic on shared resources
– Inside the scope, we need snapshots of the grid state and inputs
– Privacy-related legal constraints
– Scientific usage will help
– Interaction with EGI
– Long-term: privacy-preserving data mining
Data Collection and Publication
• Publication service: navigation and querying
– Integration of independent sources
– Indexing along the needs of the user communities
– Scheduling: ongoing work with CoreGrid
– Jobs: ongoing work with KD-Ubiq
• Ontology
– The Glue Information Model: an ontology of the resources
– Concepts for the grid dynamics, e.g. job lifecycle or user relations
– Expert concepts as prior knowledge of non-trivial correlations: workflows, failure modes,…
[Figure labels: Resource, Job]
Models
• Intrinsic characterizations of «grid traffic»: (distributions of) e.g. job arrival rate, running time, application data locality
– Likely to be similar to IP traffic: many short, and a significant number of long, at all scales
– Long-range dependencies
• Characterizations of middleware-dependent metrics, e.g. queuing delays, overhead, SE load
• Inference of models for middleware components and applications, users and usage profiles, user interactions
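The "many short, and a significant number of long" claim is a statement about heavy tails, and one common way to quantify it is the Hill estimator of the tail index. The sketch below is illustrative only: the function names, the synthetic Pareto running times and the choice of k are assumptions, not something taken from the slides or from EGEE traces.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha, using the k largest samples."""
    xs = sorted(samples, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ref = xs[k]  # (k+1)-th largest value, used as the tail threshold
    if ref <= 0:
        raise ValueError("the tail threshold must be positive")
    return k / sum(math.log(x / ref) for x in xs[:k])


def synthetic_runtimes(n, alpha, seed):
    """Hypothetical stand-in for job running times: Pareto(alpha), minimum 1."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]


if __name__ == "__main__":
    runtimes = synthetic_runtimes(20000, alpha=1.5, seed=0)
    print("estimated tail index:", hill_tail_index(runtimes, k=1000))
```

An estimate below 2 would indicate a tail heavy enough for the variance to be infinite, which is the regime usually reported for IP traffic. On real traces the estimate depends on k, so it is normally inspected over a range of k values.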
Autonomic dependability
• On-line failure detection and anticipation
• Passive vs active probing: a lot of information is available from user work
• Black-box
– On-line statistics from «similar» actions (executions, data access, middleware modules)
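One possible reading of "on-line statistics from similar actions" is a running mean and variance per class of action, against which each new observation is scored. The sketch below uses Welford's update; the class names, the warm-up of 10 observations and the 3-sigma cut-off are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's on-line mean and variance for one class of similar actions."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x):
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0


stats = defaultdict(RunningStats)  # one accumulator per action class


def observe(action_class, duration, cutoff=3.0, warmup=10):
    """Score a passive observation against its class, then learn from it."""
    s = stats[action_class]
    unusual = s.n >= warmup and abs(s.zscore(duration)) > cutoff
    s.update(duration)
    return unusual
```

Nothing is probed actively here: every value comes from ordinary user work, e.g. `observe("file_copy", 12.3)` after each completed transfer (the class label is hypothetical).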
Evaluation
• Assessing performance at the grid scale is a challenge
– Need a snapshot of the inputs and grid state, e.g. workload and available services during a relevant time range
– Classical optimization does not scale
– Advanced optimization: anytime algorithms
Abrupt changepoint detection
• Page-Hinkley statistics
• Time-sequential version of Wald’s statistics, also known as CUSUM
• «Intelligent threshold» test which minimizes the expected time before a change detection for a fixed false positive rate
• Routine in quality control, clinical trials
[Figure annotations: VO software bug; Blackhole]
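A minimal sketch of the Page-Hinkley test for an upward shift in the mean, applied to a stream of job outcomes. The encoding (1 = failed job, 0 = successful job), the parameter values `delta` and `threshold`, and the function name are assumptions for illustration, not the settings used in the study.

```python
def first_alarm(stream, delta=0.05, threshold=5.0):
    """Return the index of the first Page-Hinkley alarm, or None.

    delta     : drift tolerated without raising an alarm (assumed value)
    threshold : alarm level lambda; higher means fewer false positives
                but slower detection (assumed value)
    """
    mean = 0.0     # running mean of the observations seen so far
    cum = 0.0      # m_t = sum of (x_i - mean_i - delta)
    cum_min = 0.0  # M_t = minimum of m_i seen so far
    for t, x in enumerate(stream):
        mean += (x - mean) / (t + 1)
        cum += x - mean - delta
        cum_min = min(cum_min, cum)
        if cum - cum_min > threshold:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical trace: 5% of jobs fail, then a blackhole fails every job.
    normal = [1 if t % 20 == 19 else 0 for t in range(300)]
    print(first_alarm(normal))              # no change in the stream
    print(first_alarm(normal + [1] * 100))  # change at index 300
```

The threshold plays the role of the «intelligent threshold» above: it trades detection delay against false-positive rate, and in this toy trace the alarm fires a few jobs after the change.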
Autonomic dependability
• Supervised and unsupervised learning
Mining the L&B logs
Constructive induction
Double clustering
Autonomic dependability
• Active probing
– Adaptive on-line test selection for best coverage of possibly faulty components
– Experiment planning
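Test selection for best coverage can be approximated by a greedy set-cover heuristic: repeatedly pick the probe that exercises the most still-unchecked suspect components. This is a sketch of that generic heuristic, not the method used in the project; the probe names and component names are hypothetical.

```python
def select_probes(probes, suspects, budget):
    """Greedily pick up to `budget` probes covering the suspect components.

    probes   : dict mapping probe name -> set of components it exercises
    suspects : components currently considered possibly faulty
    Returns (chosen probe names in order, suspects left uncovered).
    """
    uncovered = set(suspects)
    chosen = []
    while uncovered and len(chosen) < budget:
        candidates = [p for p in probes if p not in chosen]
        # Ties are broken by dictionary order (first candidate wins).
        best = max(candidates, key=lambda p: len(probes[p] & uncovered), default=None)
        if best is None or not probes[best] & uncovered:
            break  # no remaining probe adds coverage
        chosen.append(best)
        uncovered -= probes[best]
    return chosen, uncovered


if __name__ == "__main__":
    probes = {
        "submit_job": {"WMS", "CE-A", "LB"},
        "copy_file": {"SE-1", "BDII"},
        "query_bdii": {"BDII"},
        "run_on_ce_b": {"WMS", "CE-B"},
    }
    suspects = {"WMS", "CE-A", "CE-B", "SE-1", "BDII"}
    print(select_probes(probes, suspects, budget=3))
```

To make the selection adaptive, the suspect set would be updated from each probe result (and from passive observations) before the next call.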
Goals & Challenges
• Contributions to a quantitative approach to grid middleware and architecture, in the RISC sense
• Operational impacts on EGEE: evaluation, autonomic dependability
• Basic research in autonomic computing
• Collaboration between EGEE and national research initiatives and other EU projects: DEMAIN, PASCAL, KD-Ubiq, CoreGrid, and hopefully more
• Adequate tradeoff between productivity and sustainability