
Dynamic adaptability
Phenix workshop on self-healing and fault tolerant systems

December 7-8, 2006 – IRISA, Rennes

Françoise André, IRISA – Prof. University of Rennes 1

Jérémy Buisson, IRISA – INSA of Rennes

Dynamic adaptability 2

Outline

Dynamic adaptability
Dynaco: generic framework for adaptability
Afpac: tool for the adaptation of SPMD codes
Evaluations
Conclusion and future works

Dynamic adaptability 4

Adaptability

A functionality of applications
  Ability of an application to modify itself (reconfigure) at runtime (dynamically) according to its execution environment
Some synonyms for “adaptability”
  Autonomous computing, autonomic computing
    • More or less adaptability
    • Sometimes structured as provided functionalities, such as self-healing, self-optimization, …
  Adaptivity, autonomicity
A related area
  Application steering
    • More or less adaptability triggered by users

Dynamic adaptability 5

Need for adaptability

When resources vary in the execution environment
  Some resources may appear
  Some resources may disappear
  Possible causes
    • Faults; administrative tasks; resource sharing among users
When an application has several configurations that use resources differently
  Different possible algorithms
  Some parameters that can be tuned
Adaptability ensures that the application continuously executes the “best” configuration
  According to the actual execution environment

Dynamic adaptability 6

Overall goal

Benefit from appearing resources
  Terminate sooner
Support disappearing resources
  Avoid expected crashes

Dynamic adaptability 7

Adaptability in the PARIS team

Studied for some years
  Initially
    • Mobile computing
    • Distributed computing
  Most recent work
    • Parallel computing
Framework approach
  Ad-hoc implementations should be avoided
  The structure should highlight reusable tools
Current prototypes
  Dynaco: generic framework for adaptability
  Afpac: tool for adapting SPMD codes

Dynamic adaptability 8

Other works on adaptability

Many ad-hoc implementations
  Specific to one kind of adaptation
    • E.g. adapting the number of processes to the number of processors/machines, redistributing tasks [Paul et al., 1998]
  Specific to one application
    • E.g. video streaming [Plasma]
Some (more or less generic) frameworks
  [EPSN]
Some compiler approaches
  [ASSIST]
Some semantic models
  [Zhang and Cheng, 2005]

Dynamic adaptability 10

Dynaco: a generic adaptability framework

Decomposition of adaptability into 4 steps
  Observe the execution environment as it evolves
  Decide that the component should adapt
  Plan how to achieve the adaptation
  Schedule and execute planned actions
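
The four steps can be read as a small pipeline. The sketch below is only an illustration of that reading, not Dynaco's actual interface: the names `Event`, `decide`, `plan`, `execute` and `adapt`, as well as the action names, are hypothetical, and the "environment" is reduced to a processor count.

```python
from dataclasses import dataclass
from typing import List, Optional

# Every name in this block is hypothetical; none of it is Dynaco's API.

@dataclass
class Event:
    """A change reported by the monitoring infrastructure (observe step)."""
    available_processors: int

def decide(event: Event, current_processes: int) -> Optional[int]:
    """Return the target number of processes, or None if nothing should change."""
    if event.available_processors != current_processes:
        return event.available_processors
    return None

def plan(current: int, target: int) -> List[str]:
    """Translate the chosen strategy into an ordered list of named actions."""
    if target > current:
        return ["spawn_processes", "redistribute_data"]
    return ["redistribute_data", "terminate_processes"]

def execute(actions: List[str], log: List[str]) -> None:
    """Run the planned actions in order (here they are only recorded)."""
    for action in actions:
        log.append(action)

def adapt(event: Event, current_processes: int, log: List[str]) -> int:
    """One pass through observe -> decide -> plan -> execute."""
    target = decide(event, current_processes)
    if target is None:
        return current_processes
    execute(plan(current_processes, target), log)
    return target
```

Calling `adapt(Event(8), 4, log)` records a grow plan in `log` and returns 8; calling it again with an unchanged environment leaves `log` untouched.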

Dynamic adaptability 11

Adaptability step 1: observe

Collect information about the execution environment
  Connect to the monitoring infrastructure of the environment
  Detect relevant changes
Trigger adaptation when the adaptable component may no longer be well adapted
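
As a minimal sketch of the observe step, the class below filters raw monitoring samples and fires a callback only when the value actually changes. `ProcessorMonitor`, `report` and the callback signature are assumptions made for this example; a real monitor would be bound to whatever monitoring infrastructure the environment provides.

```python
from typing import Callable

class ProcessorMonitor:
    """Hypothetical observer: forwards only changes in the processor count."""

    def __init__(self, initial: int, on_change: Callable[[int, int], None]) -> None:
        self._last = initial
        self._on_change = on_change

    def report(self, available: int) -> None:
        """Called with each new sample; repeated values are filtered out."""
        if available != self._last:
            previous, self._last = self._last, available
            # A relevant change: hand it over to the decide step.
            self._on_change(previous, available)
```

Feeding the samples 4, 4, 6, 6, 2 to a monitor initialised with 4 triggers the callback twice, once for the change from 4 to 6 and once for the change from 6 to 2.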

Dynamic adaptability 12

Adaptability step 2: decide

Find the best strategy
  With regard to a developer- or user-provided criterion
    • E.g. performance model
  Depending on information collected at the observe phase
Possible implementations
  Any optimization algorithm
    • Depending on the properties of the criterion that should be optimized
  Expert systems and decision diagrams
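
To make "find the best strategy with regard to a criterion" concrete, here is a sketch that picks a process count by exhaustively minimising a toy performance model. The model itself (`work / p + overhead * p`) and its default constants are invented for the example; any optimization algorithm suited to the real criterion could replace the exhaustive search.

```python
from typing import Callable, Iterable

def predicted_time(processes: int, work: float = 100.0, overhead: float = 1.0) -> float:
    """Toy performance model: divisible work plus a per-process overhead."""
    return work / processes + overhead * processes

def best_strategy(candidates: Iterable[int],
                  model: Callable[[int], float] = predicted_time) -> int:
    """Exhaustive search: return the candidate with the smallest predicted time."""
    return min(candidates, key=model)
```

With this model, 4 available processors are all used, whereas with 16 available processors the predicted optimum is 10, because the overhead term dominates beyond that point.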

Dynamic adaptability 13

Adaptability step 3: plan

Find how the decided strategy can be achieved
  Starting from the currently executing configuration
  Assembling predefined actions with some control flow
Possible algorithms
  Planning algorithms
    • May be costly if too much expressivity is required
  Collection of predefined plans
    • Difficult to construct a sufficient collection
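
The "collection of predefined plans" option can be sketched as a lookup table keyed by the current configuration and the target strategy. The configuration names, action names and the `find_plan` helper are hypothetical; the explicit failure case illustrates why a sufficient collection is hard to build.

```python
from typing import Dict, List, Tuple

# Hypothetical plan library keyed by (current configuration, target strategy).
PLANS: Dict[Tuple[str, str], List[str]] = {
    ("few_processes", "many_processes"): ["spawn_processes", "redistribute_data"],
    ("many_processes", "few_processes"): ["redistribute_data", "terminate_processes"],
}

def find_plan(current: str, target: str) -> List[str]:
    """Return the predefined plan leading from current to target."""
    if current == target:
        return []  # already in the desired configuration
    try:
        return list(PLANS[(current, target)])
    except KeyError:
        # The collection has a gap: no plan was written for this transition.
        raise LookupError(f"no predefined plan from {current!r} to {target!r}") from None
```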

Dynamic adaptability 14

Adaptability step 4: execute

Execute generated plans
  Schedule according to the dependencies highlighted in plans
  Synchronize with the applicative execution flows
Possible implementations
  Hooks in the applicative code
    • Called “adaptation points”
    • Rendezvous at the next hook in applicative code
    • Rollback to the previous hook in applicative code
  Applicative code suspension
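
Scheduling actions according to their dependencies amounts to a topological ordering of the plan. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the action names and the dependency graph are made up for illustration, and synchronization with the application's execution flows is not modelled.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def schedule(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order actions so that each one runs after the actions it depends on."""
    return list(TopologicalSorter(dependencies).static_order())

# Hypothetical plan: each action maps to the set of actions it depends on.
example_plan: Dict[str, Set[str]] = {
    "spawn_processes": set(),
    "redistribute_data": {"spawn_processes"},
    "update_communicator": {"spawn_processes"},
    "resume": {"redistribute_data", "update_communicator"},
}
order = schedule(example_plan)
```

In `order`, `spawn_processes` comes first and `resume` comes last; the two middle actions are independent of each other, so their relative order is not significant.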

Dynamic adaptability 15

Dynaco: a generic adaptability framework

In order to instantiate the framework
  Choose implementations for the generic engines
  Implement policy, guide and actions

Dynamic adaptability 16

Dynaco: a generic adaptability framework

Integrate the framework instance within the adaptable component
  Bind “actions” and “execute” to the content of the component
  Bind the framework to the monitoring infrastructure

Dynamic adaptability 17

Integration in the development cycle

Dynamic adaptability 19

Adaptation for parallel components

Parallel components
  Components that encapsulate parallel codes
Case of parallel components
  In the execute phase
    • Synchronize adaptation actions with the execution of the applicative code
    • Hook the applicative execution threads
  Adaptation points are global states

Dynamic adaptability 20

Afpac: adaptation for SPMD components

Rendezvous at the upcoming global state hook

Locally to each process, adaptation points are indicated by developers
  Call to an Afpac function
Globally, adaptation points are built as the identity relation over local adaptation points
  SPMD code assumption
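
A local adaptation point can be pictured as a call placed by the developer in the main loop. The sketch below is a single-process illustration only: `Adapter`, `request` and `adaptation_point` are hypothetical names, not Afpac's real functions, and the rendezvous between processes is not modelled.

```python
from typing import Callable, List, Optional

class Adapter:
    """Hypothetical stand-in for the adaptation runtime (not Afpac's real API)."""

    def __init__(self) -> None:
        self._pending: Optional[Callable[[], None]] = None

    def request(self, action: Callable[[], None]) -> None:
        """Register an action to run at the next adaptation point."""
        self._pending = action

    def adaptation_point(self) -> None:
        """Hook placed by the developer; runs the pending action, if any."""
        if self._pending is not None:
            action, self._pending = self._pending, None
            action()

def compute(adapter: Adapter, iterations: int, request_at: int) -> List[str]:
    """Toy main loop with one adaptation point at the top of each iteration."""
    trace: List[str] = []
    for i in range(iterations):
        adapter.adaptation_point()
        trace.append(f"iteration {i}")
        if i == request_at:
            # Simulates an adaptation request arriving during iteration i.
            adapter.request(lambda: trace.append("adapt"))
    return trace
```

A request that arrives during iteration 1 is served at the hook that opens iteration 2, so the application is never interrupted in the middle of an iteration.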

Dynamic adaptability 21

Afpac

Distributed algorithm to find the upcoming adaptation point
  Iterative
    • Each process locally predicts upcoming local adaptation points
      – If prediction is impossible, wait for the applicative execution thread to progress
        » E.g. in case of conditional instructions
    • Each process gathers other processes’ predictions
    • As long as at least one process does not agree, rerun the algorithm
      – Each process computes a least upper bound according to other processes’ predictions
  Concurrent to the applicative execution thread
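
The iteration can be simulated sequentially to see how agreement is reached. This is a simplified sketch under strong assumptions, not Afpac's algorithm: adaptation points are modelled as non-negative integers in a total order (so the least upper bound is simply the maximum), each process's upcoming points are known in advance as a sorted list, and the gathering step is an ordinary list comprehension rather than message passing. The names `next_point` and `agree` are invented.

```python
from bisect import bisect_left
from typing import List, Sequence

def next_point(upcoming: Sequence[int], lower_bound: int) -> int:
    """First local adaptation point at or after lower_bound (upcoming is sorted)."""
    i = bisect_left(upcoming, lower_bound)
    if i == len(upcoming):
        raise ValueError("no adaptation point left at or after the bound")
    return upcoming[i]

def agree(processes: List[Sequence[int]]) -> int:
    """Iterate until every process predicts the same upcoming adaptation point."""
    candidate = 0
    while True:
        # Each process predicts its next local point given the current bound.
        predictions = [next_point(p, candidate) for p in processes]
        # In a total order, the least upper bound is the maximum.
        bound = max(predictions)
        if all(pred == bound for pred in predictions):
            return bound
        candidate = bound  # at least one process disagrees: rerun
```

For three processes whose upcoming points are `[3, 4, 5, 6]`, `[5, 6, 7]` and `[4, 6, 8]`, the first round yields the bound 5, the second 6, and the third round confirms that all processes can stop at 6.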

Dynamic adaptability 22

Afpac

Requirements for the applicative code
  Tracking the progress of the execution in each process
    • Upon local adaptation points
    • Upon control structures containing adaptation points
  Predicting upcoming adaptation points
    • Control flow model of the applicative code
      – With the same granularity as above
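
One way to picture progress tracking is a stack of iteration counters, updated on entry to and exit from each loop that contains an adaptation point. The sketch below is an assumption about how such tracking could look, not a description of Afpac's internals; `ProgressTracker` and `run_nested` are hypothetical, and only nested counted loops are handled.

```python
from typing import List, Tuple

class ProgressTracker:
    """Hypothetical tracker: one iteration counter per enclosing loop."""

    def __init__(self) -> None:
        self._stack: List[int] = []

    def enter_loop(self) -> None:
        self._stack.append(0)

    def next_iteration(self) -> None:
        self._stack[-1] += 1

    def leave_loop(self) -> None:
        self._stack.pop()

    def position(self) -> Tuple[int, ...]:
        """Current position; tuples compare lexicographically."""
        return tuple(self._stack)

def run_nested(tracker: ProgressTracker, outer: int, inner: int) -> List[Tuple[int, ...]]:
    """Two nested loops with one adaptation point in the inner body."""
    seen: List[Tuple[int, ...]] = []
    tracker.enter_loop()
    for _ in range(outer):
        tracker.enter_loop()
        for _ in range(inner):
            seen.append(tracker.position())  # local adaptation point
            tracker.next_iteration()
        tracker.leave_loop()
        tracker.next_iteration()
    tracker.leave_loop()
    return seen
```

Because the recorded positions grow lexicographically as execution proceeds, two processes running the same SPMD code can compare positions to tell which of them is further ahead.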

Dynamic adaptability 23

Taco: AOP tool easing the use of Afpac

Specific aspect weaver
  Handling of control structures
    • Source code transformation for inserting calls upon control structures
  Extraction of the control flow model
Task still belonging to developers
  Indicating local adaptation points

Dynamic adaptability 25

Examples of using Dynaco

FT (from the NAS Parallel Benchmarks suite): numerical kernel
  Adapting the number of processes to the number of available processors
    • i.e. implementing malleability
Gadget 2: N-body simulator
  Adapting the data distribution to load imbalance
    • i.e. revisiting load balancing
Dad: home-made genetic algorithm
  Adapting the implementation to the underlying architecture
    • Including its communication facilities
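
For the malleability case, changing the number of processes implies recomputing the data distribution and moving the items whose owner changes. The sketch below assumes a one-dimensional block distribution, which is a simplification chosen for the example and not the layout used by FT; `block_bounds` and `transfers` are hypothetical helpers.

```python
from typing import List, Tuple

def block_bounds(n_items: int, n_procs: int, rank: int) -> Tuple[int, int]:
    """Half-open index range owned by rank in a balanced block distribution."""
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return start, start + size

def transfers(n_items: int, old_procs: int, new_procs: int) -> List[Tuple[int, int, int, int]]:
    """List (source rank, destination rank, start, stop) for items that must move."""
    moves = []
    for src in range(old_procs):
        s0, s1 = block_bounds(n_items, old_procs, src)
        for dst in range(new_procs):
            d0, d1 = block_bounds(n_items, new_procs, dst)
            lo, hi = max(s0, d0), min(s1, d1)
            # Items staying on the same rank need no communication.
            if lo < hi and src != dst:
                moves.append((src, dst, lo, hi))
    return moves
```

Growing from 2 to 4 processes with 10 items requires three transfers: items 3 to 4 go from rank 0 to rank 1, items 6 to 7 from rank 1 to rank 2, and items 8 to 9 from rank 1 to rank 3.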

Dynamic adaptability 26

Progress of one adaptation

Dynamic adaptability 29

Summary

Dynaco: a generic framework for adaptability
  Independent of the application
    • E.g.: numerical algorithms, transactional systems
  Independent of formalisms and technologies
    • E.g.: 3 interchangeable formalisms for the policy in the current implementation
      – Objective function, optimized by a genetic algorithm
      – Collection of condition-action rules, interpreted by the Jess expert system
      – Plain Java code, executed by a JVM
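
The condition-action formalism can be illustrated with a first-match rule interpreter. The sketch below is written in Python purely for illustration; it does not use Jess, and the rule set, the environment keys `available` and `used`, and the strategy names are all invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Environment = Dict[str, int]
Rule = Tuple[Callable[[Environment], bool], str]

# Hypothetical policy: each rule pairs a condition with a strategy name.
RULES: List[Rule] = [
    (lambda env: env["available"] > env["used"], "grow"),
    (lambda env: env["available"] < env["used"], "shrink"),
]

def apply_rules(env: Environment, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the strategy of the first rule whose condition holds, else None."""
    for condition, strategy in rules:
        if condition(env):
            return strategy
    return None
```

Because the decide step only needs a function from observations to a strategy, a rule set like this one, an optimizer over an objective function, or hand-written code can be swapped for one another behind the same call.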

Dynamic adaptability 30

Ongoing and future work

Trying to reduce applicative code suspension while selecting a global adaptation point
  Designing a speculative algorithm
  Guessing what other processes will do, rather than waiting for those processes to do it
    • Compensating small desynchronizations
  Using rollback in case of wrong prediction
Designing a dialogue between grid resource managers and adaptable applications
  Investigating how resource managers and adaptable applications can mutually benefit from each other
    • Better resource management
    • Avoid considering rescheduling as faults
      – Avoid using checkpoint/restart

Dynamic adaptability 31

Long term goals

Connections with fault tolerance
  Making Dynaco resilient
    • Dynaco is centralized
      – Even though it can drive the adaptation of parallel applications
    • The process executing the Dynaco framework must never fail
      – Furthermore, it should be able to execute actions
  Implementing fault tolerance with Dynaco
    • One adaptation action may be “restart from checkpoint”
    • Adaptability would allow restarting with a different behavior/implementation
  Using fault tolerance features for adaptability
    • Several adaptability implementations use checkpoint/restart
    • It can be useful to implement speculative adaptation point selection

Dynamic adaptability 32

Long term goals

Adaptability in the context of systems
  Not restricted to well-suited resource management
  Resource management
  Data management
    • Replication
    • Consistency
Adaptation of the system
  According to the underlying platform
  According to hosted applications