Enabling Self-management of Component-based High-performance Scientific Applications


Hua (Maria) Liu and Manish Parashar

The Applied Software Systems Laboratory

Department of Electrical and Computer Engineering

Rutgers University


Challenges

• Emerging scientific applications are

– Distributed, heterogeneous, long-running, dynamic

• Changing user requirements

• Changing problem domains

• Changing context environments

• Emerging execution environments are also

– Distributed, heterogeneous, dynamic

• Changing workload and communication capabilities


Solution

• Applications should be aware of changes in application/system state and execution context, and respond to them.

– i.e., applications should be self-managing or autonomic

• However, this requires a programming system that can support the development and execution of such autonomic self-managing applications.

– Extend computational elements (objects, components, and services) to support autonomic behaviors

– Define dynamic composition (interactions) of autonomic elements that responds to changing user requirements and execution context

– Provide a runtime infrastructure to achieve self-management


Outline

• Challenges and solution

• Conceptual model of Accord

• Prototype implementation based on CCA Ccaffeine framework

• Illustrative applications


Overview of Accord Programming System

• Accord supports

– Dynamic specification of adaptation behaviors in rules

– Runtime enforcement of adaptation behaviors by invoking sensors and actuators

– Runtime conflict detection and resolution

• Key contributions

– Accord provides programming abstractions to define the control port

– Accord enables applications to be context-aware and self-managing

– Accord enables element behavior adaptation and interaction adaptation at runtime


Autonomic Element

[Figure: autonomic element. Labels: computational element; functional port, control port, operational port; element manager with rules, internal state, and contextual state; event generation, actuator invocation, other interface invocation.]


The Accord Runtime Infrastructure

[Figure: Accord runtime infrastructure. Labels: application workflow, application strategies, application requirements, composition manager, composition rules, component rules.]


CCA and Ccaffeine Framework

[Figure: processes P0 to P3, each holding components (blue, green, red) inside the framework (gray).]

• Different components in the same process “talk to each other” via ports and the framework

• The same component in different processes talk to each other through their favorite communications layer (e.g., MPI, PVM, GA)

• Each process loaded with the same set of components wired the same way

Note: this slide is taken from CCA tutorial – www.cca-forum.org

• The characteristics of scientific applications

– These applications are component-based.

– The execution of these applications typically consists of a series of computational phases.


Accord-CCA: Extend Ccaffeine to Enable Self-Management Behaviors

[Figure: Accord-CCA architecture. Labels: controllable components C1 to C4, component manager, composition manager, driver, Ccaffeine framework + TAU.]


Manager Components

• Component managers provide component-level adaptations via

– Adapting the runtime behaviors of individual components based on component rules

– Dynamically replacing components based on composition rules

• Composition managers provide application-level adaptations via

– Coordinating component managers’ behaviors

[Figure: manager components. Labels: TAU, RulePort, events, C2, C3.]


Rule

Rule {
    on events;          (component or system events)
    when conditions;    (component or system sensors)
    do actions;         (component or system actuators)
}
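The on/when/do structure can be sketched in a few lines of Python. This is only an illustration, not Accord's rule language or API: the `Rule` class, the `fire` helper, the event name `phase_end`, and the sensor and actuator names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    on: str                                   # event that triggers the rule
    when: Callable[[Dict[str, float]], bool]  # condition over sensor readings
    do: Callable[[Dict[str, object]], None]   # action that sets actuator values


def fire(rules: List[Rule], event: str,
         sensors: Dict[str, float], actuators: Dict[str, object]) -> List[Rule]:
    """Run every rule registered for `event` whose condition holds."""
    fired = []
    for rule in rules:
        if rule.on == event and rule.when(sensors):
            rule.do(actuators)
            fired.append(rule)
    return fired


# Hypothetical rule: switch algorithms when the temperature sensor is high.
rules = [Rule(on="phase_end",
              when=lambda s: s["temperature"] > 2000,
              do=lambda a: a.update(algorithm="B"))]
actuators = {"algorithm": "A"}
fire(rules, "phase_end", {"temperature": 2100.0}, actuators)
# actuators is now {"algorithm": "B"}
```

Keeping the condition and the action as separate callables mirrors the slide's split between sensors (read-only) and actuators (write-only).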


The Rule Enforcement Engine

[Figure: rule enforcement pipeline. Stages: batch condition inquiry, condition evaluation in parallel, conflict detection and resolution, reconciliation, batch action invocation. Other labels: context, internal state of elements, pre-condition, post-condition.]

Sensor-actuator conflict:

• Detection: Execution of some rules will change the pre-condition

• Resolution: Disable these rules

Actuator-actuator conflict:

• Detection: The post-condition contains multiple values for the same actuator

• Resolution: Relax rule condition until no actuators are invoked with different values by incrementally deleting sensors in a user-specified sequence
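The two detection steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's actual algorithm: rules are plain dictionaries, and the `affects` map (which sensors an actuator would change) is a hypothetical input that the slides do not describe.

```python
from collections import defaultdict


def sensor_actuator_conflicts(rules, affects):
    """Names of rules whose actions would change a sensor that some
    triggered rule reads, i.e. rules that would alter the pre-condition."""
    read = set().union(*(r["reads"] for r in rules))
    return [r["name"] for r in rules
            if any(affects.get(act, set()) & read for act in r["sets"])]


def actuator_actuator_conflicts(rules):
    """Actuators that the triggered rules would invoke with different values."""
    values = defaultdict(set)
    for r in rules:
        for act, val in r["sets"].items():
            values[act].add(val)
    return {act: vals for act, vals in values.items() if len(vals) > 1}


# Hypothetical triggered rules.
triggered = [
    {"name": "r1", "reads": {"load"}, "sets": {"algorithm": "x"}},
    {"name": "r2", "reads": {"bandwidth"}, "sets": {"algorithm": "y"}},
    {"name": "r3", "reads": {"temperature"}, "sets": {"partition": 4}},
]
affects = {"partition": {"load"}}  # assumption: repartitioning changes load

disabled = sensor_actuator_conflicts(triggered, affects)
clashes = actuator_actuator_conflicts(triggered)
```

Here `r3` would be disabled because it changes a sensor that `r1` reads, and `algorithm` is flagged because `r1` and `r2` request different values. The relaxation step (deleting sensors in a user-specified sequence) is not modeled.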


Reconciliation

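[Figure: nodes x, y, and z each run components C1 and C2. Nodes x and y use Algorithm 1 and propose a replacement with C3; node z uses Algorithm 2 and proposes a replacement with C4.]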

Case 1:

If the replacement on node z has a high priority and the other two have a low priority, propagate the replacement with C4.

If there are multiple high-priority replacements, report an error.

Case 2:

If all the replacements have a low priority, the replacement with the highest performance gain is propagated.
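The two cases translate directly into a small decision function. The sketch below assumes each node's proposal is a tuple `(node, replacement, priority, gain)`; that tuple layout and the `reconcile` name are illustrative and do not come from Accord.

```python
def reconcile(proposals):
    """Pick the replacement to propagate to all nodes.

    proposals: list of (node, replacement, priority, gain) tuples, where
    priority is "high" or "low" and gain is the expected performance gain.
    """
    if not proposals:
        return None
    high = [p for p in proposals if p[2] == "high"]
    if len(high) > 1:
        # Case 1, error branch: more than one high-priority replacement.
        raise ValueError("multiple high-priority replacements")
    if len(high) == 1:
        # Case 1: the single high-priority replacement wins.
        return high[0][1]
    # Case 2: all low priority, so the highest performance gain wins.
    return max(proposals, key=lambda p: p[3])[1]


# Node z proposes C4 with high priority; x and y propose C3 with low priority.
choice = reconcile([("x", "C3", "low", 0.10),
                    ("y", "C3", "low", 0.12),
                    ("z", "C4", "high", 0.05)])
```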


The Self-managing CH4 Ignition Simulation: Self-optimizing Via Component Adaptation

[Figure: Component Manager and Rule Generator attached to the application components (labels: Initializer, Executor, CvodeThermo, ChemistryRef); the component exports the sensor “temperature” and the actuator “algorithm”.]

[Figure: number of invocations to G (axis 0 to 1,400,000) versus temperature (axis 1000 to 2400) for rule-based and non-rule-based execution. Data labels: 3.69%, 10.23%, 21.33%, 9.38%, 5.36%, 3.60%, 27.42%, 9.59%.]

A set of algorithms is provided to simulate a set of reaction processes. Some algorithms may not work at some temperatures, and the algorithms that do work show different performance levels (execution time) at the same temperature. Algorithms therefore have to be selected dynamically to avoid application crashes and/or to optimize application execution.
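The selection logic described above can be sketched as a rule over the “temperature” sensor that sets the “algorithm” actuator. The algorithm names, valid temperature ranges, and cost functions below are invented for illustration; the slides do not give the real values.

```python
# Hypothetical algorithm table: valid temperature range and an estimated
# execution time as a function of temperature.
ALGORITHMS = {
    "A": {"valid": (1000, 1800), "cost": lambda t: 2.0},
    "B": {"valid": (1000, 2400), "cost": lambda t: 1.0 if t < 1400 else 3.0},
}


def select_algorithm(temperature, algorithms=ALGORITHMS):
    """Return the fastest algorithm that is valid at `temperature`."""
    candidates = [
        (spec["cost"](temperature), name)
        for name, spec in algorithms.items()
        if spec["valid"][0] <= temperature <= spec["valid"][1]
    ]
    if not candidates:
        raise ValueError(f"no algorithm works at temperature {temperature}")
    return min(candidates)[1]


# Re-evaluated whenever the temperature sensor changes.
schedule = [select_algorithm(t) for t in (1200, 1600, 2000)]
```

Filtering by validity first addresses the crash case; choosing the minimum cost among the survivors addresses the optimization case.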


The Self-managing Shock Simulation: Self-optimizing Via Component Replacement

[Figure: Component Manager, performance toolkit (TAU), and the GodunovFlux and EFMFlux components.]

Component Manager rule: IF cache miss of GodunovFlux > value THEN REPLACE GodunovFlux EFMFlux

1. register cache miss event

2. collect cache miss of GodunovFlux

3. evaluate the rule

4. replace GodunovFlux with EFMFlux

EFMFlux will be used from the next computation
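A minimal sketch of this replacement flow, assuming the cache-miss count is simply handed to the manager as a number. No real TAU or Ccaffeine calls are used; `ComponentManager`, `on_cache_misses`, and the two flux functions are hypothetical stand-ins.

```python
def godunov_flux(state):
    return ("GodunovFlux", state)  # stand-in for the real flux computation


def efm_flux(state):
    return ("EFMFlux", state)      # stand-in for the replacement component


class ComponentManager:
    """Holds the active flux component and swaps it between computations."""

    def __init__(self, active, replacement, threshold):
        self.active = active
        self.replacement = replacement
        self.threshold = threshold
        self.pending = None

    def on_cache_misses(self, misses):
        # Steps 2-3: a collected measurement arrives and the rule is evaluated.
        if misses > self.threshold:
            self.pending = self.replacement

    def compute(self, state):
        # Step 4: the swap takes effect at the start of the next computation.
        if self.pending is not None:
            self.active, self.pending = self.pending, None
        return self.active(state)


manager = ComponentManager(godunov_flux, efm_flux, threshold=1000)
first = manager.compute(0)
manager.on_cache_misses(5000)
second = manager.compute(1)
```

Deferring the swap to `compute` reflects the note that EFMFlux is used from the next computation rather than interrupting the current one.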


The Self-managing Shock Simulation: Self-optimizing Via Component Adaptation

[Figure: Component Manager, performance toolkit (TAU), and the AMRMesh component with algorithms x and y.]

Component Manager rule: IF bandwidth < threshold THEN algorithm x

1. export actuator “algorithm”

2. register communication bandwidth

3. collect current bandwidth

4. evaluate the rule

5. invoke algorithm with x

Algorithm x will be used from the next computation
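Unlike replacement, adaptation keeps the same component and only changes its behavior through an exported actuator. The sketch below assumes a hypothetical `AMRMesh` class with a `set_algorithm` actuator and a plain numeric bandwidth reading; none of these names come from Accord or TAU.

```python
class AMRMesh:
    """Stand-in component that exports one actuator, "algorithm"."""

    def __init__(self, algorithm="y"):
        self.algorithm = algorithm
        self._next = None

    def set_algorithm(self, name):
        # Actuator: the change is recorded and applied at the next computation.
        self._next = name

    def compute(self):
        if self._next is not None:
            self.algorithm, self._next = self._next, None
        return self.algorithm  # a real component would run the algorithm here


def evaluate_rule(mesh, bandwidth, threshold):
    """IF bandwidth < threshold THEN algorithm x."""
    if bandwidth < threshold:
        mesh.set_algorithm("x")


mesh = AMRMesh()
before = mesh.compute()
evaluate_rule(mesh, bandwidth=20.0, threshold=50.0)
after = mesh.compute()
```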


The Self-managing Shock Simulation: Self-healing Via Component Replacement

[Figure: Component Manager with the GodunovFlux and EFMFlux components.]

Component Manager rule: IF GodunovFlux error THEN REPLACE GodunovFlux EFMFlux

1. register execution error as a sensor

2. evaluate the rule

3. replace GodunovFlux with EFMFlux
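The self-healing case can be sketched by treating an exception as the error sensor. Whether the failed computation is retried on the replacement is not stated in the slides; the sketch assumes it is. `FluxSlot` and the two flux functions are hypothetical.

```python
def godunov_flux(state):
    raise RuntimeError("GodunovFlux failed")  # simulated execution error


def efm_flux(state):
    return ("EFMFlux", state)


class FluxSlot:
    """Holds the active flux component and heals by replacing it on error."""

    def __init__(self, active, replacement):
        self.active = active
        self.replacement = replacement
        self.error_seen = False

    def compute(self, state):
        try:
            return self.active(state)
        except Exception:
            self.error_seen = True  # step 1: the error acts as a sensor
            if self.replacement is None:
                raise               # nothing left to replace with
            # Steps 2-3: the rule fires and the component is replaced.
            self.active, self.replacement = self.replacement, None
            return self.active(state)  # assumption: retry on the replacement


slot = FluxSlot(godunov_flux, efm_flux)
result = slot.compute(7)
```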


Conclusion

• The distribution, heterogeneity, and dynamism of emerging environments and applications impose new requirements on programming systems

– To support development and execution of autonomic self-managing applications

• The Accord programming system extends the CCA Ccaffeine framework to meet these requirements

– Extends CCA components with component managers to form autonomic components

– Provides a runtime infrastructure to enforce adaptation behaviors and detect/resolve runtime conflicts


Additional Slides


Centralized vs Decentralized Reconciliation

• Centralized approach: one instance collects proposals from other instances and propagates the reconciliation result

– Convergence rate = O(n)

– Low scalability

– Not robust

• Decentralized approach: each instance only communicates with its neighbors to achieve local consensus

– Convergence rate = O(lg n)

– High scalability

– Robust

• Problems to be solved

– Local rules used by individual component instances

– How to define neighbors
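The difference in convergence rate can be illustrated with a toy model in which each instance proposes a number and the largest proposal wins. The slides leave the neighbor definition open, so the decentralized version below assumes one possible choice: instances arranged as a hypercube, exchanging with the neighbor whose index differs in one bit per round. Both functions are illustrative, not Accord's reconciliation protocol.

```python
def centralized(proposals):
    """One coordinator receives the other n - 1 proposals one at a time."""
    best, steps = proposals[0], 0
    for p in proposals[1:]:
        best = max(best, p)
        steps += 1
    return best, steps


def decentralized(proposals):
    """Each instance talks to one neighbor per round (hypercube exchange).

    Assumes the number of instances is a power of two.
    """
    n = len(proposals)
    if n == 0 or n & (n - 1):
        raise ValueError("number of instances must be a power of two")
    values, rounds, bit = list(proposals), 0, 1
    while bit < n:
        # Instance i exchanges with instance i XOR bit and keeps the larger value.
        values = [max(values[i], values[i ^ bit]) for i in range(n)]
        bit <<= 1
        rounds += 1
    return values, rounds


gains = [3, 9, 4, 1, 7, 2, 8, 5]
winner, steps = centralized(gains)
values, rounds = decentralized(gains)
```

For these eight instances the coordinator needs seven sequential steps, while the neighbor exchange reaches agreement on every instance in three rounds.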