-
Adaptive Cluster Computing using
JavaSpaces
The design, implementation and evaluation of a framework that uses JavaSpaces
2006/08/09 林子鐸
-
References
J. Batheja and M. Parashar, "Adaptive Cluster Computing using JavaSpaces", Proceedings of the 2001 IEEE International Conference on Cluster Computing
Sun Microsystems, JavaSpaces Specification, www.javasoft.com/products/javaspaces/specs (1998)
E. Freeman, S. Hupfer, K. Arnold, JavaSpaces Principles, Patterns, and Practice (June 1999)
-
Outline
Introduction
A Framework for Adaptive Parallel Computing on Clusters
Experimental Evaluation of the Framework
Conclusions
-
Motivation: Traditional HPC is expensive
High Performance Computing (HPC)
Why expensive? It traditionally relies on:
Massively parallel processors
Supercomputers
High-end workstation clusters
-
So, any alternatives?
Using idle resources in a networked system can be a more cost-effective alternative
Opportunistic computing
-
What is Opportunistic Computing?
To provide large amounts of processing capacity by harnessing the idle and available resources on the network in an "opportunistic" manner
Two approaches:
Job-level approach
Adaptive approach
-
Opportunistic Computing : Job level approach
Entire application jobs are allocated to available idle resources for computation
A passive approach
-
Opportunistic Computing : Adaptive approach
Available processors are treated as part of a dynamic resource pool, and they aggressively compete for application tasks
An active approach
This approach targets applications that can be decomposed into independent tasks
Cluster-based or web-based
-
The Challenges to Achieving Opportunistic Computing
Heterogeneity
Intrusiveness
System configuration and management overhead
Adaptability to system and network dynamics
Security and privacy
-
A suitable solution: JavaSpaces
What is it?
A shared, network-accessible repository for Java objects
-
The Principle of JavaSpaces
Master-worker parallel computing using JavaSpaces
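The master-worker pattern over a space boils down to three operations: the master writes task entries into the shared space, each worker takes a task and writes a result back, and the master takes the results and aggregates them. The real net.jini.space.JavaSpace interface works with typed Entry templates, leases, and transactions; as a runnable sketch that needs no Jini runtime, the example below uses two blocking queues as a stand-in for the space (class and method names are illustrative, not from the framework's source):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal master-worker sketch in the style of JavaSpaces.
// Two blocking queues stand in for the shared space: "tasks"
// holds entries the master wrote, "results" holds entries the
// workers wrote back.
public class SpaceDemo {

    // Master: decompose the job into n tasks, put them in the
    // "space", then collect and aggregate exactly n results.
    static int runJob(int n, int workers) throws InterruptedException {
        BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();

        // Workers: repeatedly take a task and write back its square.
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        int t = tasks.take();   // like JavaSpace.take(template, ...)
                        results.put(t * t);     // like JavaSpace.write(result, ...)
                    }
                } catch (InterruptedException e) { /* worker shut down */ }
            });
            worker.setDaemon(true);
            worker.start();
        }

        for (int i = 1; i <= n; i++) tasks.put(i);   // write task entries

        int sum = 0;
        for (int i = 0; i < n; i++) sum += results.take();  // take result entries
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sum of squares 1..8 computed by 3 concurrent workers.
        System.out.println("sum of squares = " + runJob(8, 3));  // prints 204
    }
}
```

Because every take blocks until a matching entry exists, the workers need no knowledge of the master or of each other, which is exactly the decoupling the space-based model provides.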
-
A Framework for Adaptive Parallel Computing on Clusters
Three features:
Portability across heterogeneous platforms
Minimal configuration overheads and runtime class loading at the participating nodes
Automated system state monitoring
Targets applications that are divisible into subtasks that can be solved independently
-
The Framework: Architecture Overview
-
The Framework: The Master Module
The Master Module:
Hosts the JavaSpaces service
Decomposes the application into independent tasks
Places the tasks into the space
Takes back the task results
[Diagram: the master puts jobs into the JavaSpace and gets results back]
-
The Framework: The Worker Module
The Worker Module:
A thin module
Can be configured and loaded at runtime
Gets the tasks from the space
Puts the task results back into the space
Controlled by the network management module
[Diagram: the worker gets jobs from the JavaSpace and puts results back]
-
The Framework: The Worker Module
State transition diagram: the worker is Running while host CPU load is 0~25%, Paused at 25~50%, and Stopped at 50~100%; the transitions between these states are Start, Pause, Resume, and Stop
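The state transitions above can be sketched as a small policy function. The 25% and 50% thresholds come from the diagram; the enum and method names are illustrative, not taken from the framework's actual source:

```java
// Sketch of the worker state policy implied by the transition diagram:
// run while the host's CPU load is low, pause under moderate load,
// stop when the machine is busy (so the owner is not disturbed).
public class WorkerState {
    enum State { RUNNING, PAUSED, STOPPED }

    // Map the current state and observed CPU load (0.0-1.0) to the
    // next state, following the Start/Pause/Resume/Stop transitions.
    static State next(State current, double cpuLoad) {
        if (cpuLoad >= 0.50) return State.STOPPED;                     // Stop
        if (cpuLoad >= 0.25)
            return current == State.RUNNING ? State.PAUSED : current;  // Pause
        return State.RUNNING;                                          // Start / Resume
    }

    public static void main(String[] args) {
        System.out.println(next(State.RUNNING, 0.40));  // PAUSED
        System.out.println(next(State.PAUSED, 0.10));   // RUNNING
        System.out.println(next(State.RUNNING, 0.80));  // STOPPED
    }
}
```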
-
The Framework: The Network Management Module
Two functions:
Monitor the state of workers
Provide a decision-making mechanism to facilitate the utilization of idle resources
Inference Engine: a rule-based protocol
Monitoring agent: uses SNMP; two components, the manager component and the worker-agent component
-
The Framework: The Implementations
Remote Node Configuration:
Uses reflection to load classes dynamically
Required worker classes are downloaded from the master server
Dynamic Worker Management for Adaptive Cluster Computing
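Loading a worker class by name via reflection can be sketched as below. In the framework the bytecode would come from the master server (typically via a URLClassLoader pointed at an HTTP codebase); to keep this example runnable standalone, it loads a JDK class instead, and the method name is illustrative:

```java
import java.util.List;

// Sketch of runtime class loading via reflection, as used for remote
// node configuration: a worker only needs the name of a class, not a
// compile-time dependency on it.
public class DynamicLoad {

    // Resolve a class by name and instantiate it reflectively through
    // its no-argument constructor.
    static Object loadAndCreate(String className) throws Exception {
        Class<?> cls = Class.forName(className);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a downloaded worker class; any class name works.
        Object obj = loadAndCreate("java.util.ArrayList");
        @SuppressWarnings("unchecked")
        List<String> tasks = (List<String>) obj;
        tasks.add("subtask-0");
        System.out.println(obj.getClass().getSimpleName() + " holds " + tasks.size() + " task");
    }
}
```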
-
Dynamic Worker Management for Adaptive Cluster Computing
[Diagram: interaction between the Worker Module and the Network Management Module]
-
The Evaluation: Parallel Ray Tracing
An image generation technique, parallelized by divide-and-conquer
A 600x600 image is divided into 24 independent 25x600 rectangular slices
Experiments (5 PCs):
Scalability Analysis
Adaptation Protocol Analysis
Analysis of Dynamic Worker Behavior Patterns under Varying Load Conditions
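The slice decomposition used in the ray-tracing evaluation is straightforward to sketch: a 600x600 image split row-wise into 24 independent 25x600 slices, each of which becomes one task entry in the space (the helper below is illustrative, not the framework's code):

```java
// Sketch of the row-wise image decomposition from the evaluation:
// each slice is represented as {startRow, rows, cols}.
public class SliceDemo {

    static int[][] decompose(int height, int width, int nSlices) {
        int rowsPerSlice = height / nSlices;          // 600 / 24 = 25
        int[][] slices = new int[nSlices][];
        for (int i = 0; i < nSlices; i++)
            slices[i] = new int[]{ i * rowsPerSlice, rowsPerSlice, width };
        return slices;
    }

    public static void main(String[] args) {
        int[][] s = decompose(600, 600, 24);
        System.out.println(s.length + " slices, each " + s[0][1] + "x" + s[0][2]);
        // prints: 24 slices, each 25x600
    }
}
```

Because each slice can be ray-traced without reference to any other, the decomposition matches the framework's requirement of independently solvable subtasks.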
-
The Evaluation: Scalability Analysis
Measures the overall scalability of the framework
Criteria:
Max worker time: maximum computation time among all workers
Task planning time: time required for dividing and putting the tasks
Task aggregation time: time required for collecting and aggregating the results
Parallel time: total execution time from start to finish
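As a rough model of how these criteria relate (an assumption for illustration; the paper measures each quantity directly), the parallel time of a master-worker run is approximately the task planning time, plus the maximum worker time, plus the task aggregation time:

```java
import java.util.Arrays;

// Illustrative model relating the scalability criteria: planning and
// aggregation are serial master-side phases, while the worker phase
// finishes only when the slowest worker does.
public class Criteria {

    static double parallelTime(double planning, double[] workerTimes, double aggregation) {
        double maxWorker = Arrays.stream(workerTimes).max().orElse(0.0);
        return planning + maxWorker + aggregation;
    }

    public static void main(String[] args) {
        // Hypothetical timings in seconds for three workers.
        double t = parallelTime(1.2, new double[]{10.0, 12.5, 11.0}, 0.8);
        System.out.println("parallel time ~ " + t + " s");
    }
}
```

This model also makes the scalability limit visible: adding workers shrinks only the max-worker term, while planning and aggregation remain as serial overhead on the master.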
-
The Evaluation: Scalability Analysis Result
-
The Evaluation: Adaptation Protocol Analysis
Analyzes the overheads involved in signaling worker nodes and adapting to their current CPU load
Criteria: behavior under two load simulators
Simulator 1: 30%~50% CPU load
Simulator 2: 100% CPU load
-
The Evaluation: Adaptation Protocol Analysis Result
CPU usage history on the worker machine; analysis of the signaling times
[Chart labels: START, Simulator 1, Simulator 2]
-
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions
Criteria:
Maximum Worker Time
Maximum Master Overhead
Task Planning and Aggregation Time
Total Parallel Time
-
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions Result
-
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions Result
-
Conclusions
Summary:
Good scalability for loosely coupled applications
Idle workstations can be effectively used
Monitoring and reacting to system state enables us to minimize intrusiveness to machines within the cluster