-
Adaptive Cluster Computing using
JavaSpaces
The design, implementation and evaluation of a framework that uses JavaSpaces
2006/08/09 林子鐸
-
References
J. Batheja and M. Parashar, "Adaptive Cluster Computing using JavaSpaces", Proceedings of the 2001 IEEE International Conference on Cluster Computing
Sun Microsystems, JavaSpaces Specification, www.javasoft.com/products/javaspaces/specs (1998)
E. Freeman, S. Hupfer, K. Arnold, JavaSpaces Principles, Patterns, and Practice (June 1999)
-
Outline
Introduction
A Framework for Adaptive Parallel Computing on Clusters
Experimental Evaluation of the Framework
Conclusions
-
Motivation: Traditional HPC is expensive
High Performance Computing (HPC)
Why expensive? It traditionally relies on:
Massively parallel processors
Supercomputers
High-end workstation clusters
-
So, any alternatives?
Using idle resources in a networked system can be a more cost-effective alternative
Opportunistic computing
-
What is Opportunistic Computing?
To provide large amounts of processing capacity by harnessing the idle and available resources on the network in an "opportunistic" manner
Two approaches:
Job-level approach
Adaptive approach
-
Opportunistic Computing : Job level approach
Entire application jobs are allocated to available idle resources for computation
A passive approach
-
Opportunistic Computing : Adaptive approach
Available processors are treated as part of a dynamic resource pool, and they aggressively compete for application tasks
An active approach
This approach targets applications that can be decomposed into independent tasks
Cluster-based or web-based
-
The Challenges to Achieving Opportunistic Computing
Heterogeneity
Intrusiveness
System configuration and management overhead
Adaptability to system and network dynamics
Security and privacy
-
A suitable solution: JavaSpaces
What is it?
A shared, network-accessible repository for Java objects
-
The Principle of JavaSpaces
Master-worker parallel computing using JavaSpaces
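The master-worker pattern over a space boils down to three operations: the master writes task entries into the shared space, each worker takes a task and writes a result back, and the master takes the results and aggregates them. The real net.jini.space.JavaSpace interface works with typed Entry templates, leases, and transactions; as a runnable sketch that needs no Jini runtime, the example below uses two blocking queues as a stand-in for the space (class and method names are illustrative, not from the framework's source):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal master-worker sketch in the style of JavaSpaces.
// Two blocking queues stand in for the shared space: "tasks"
// holds entries the master wrote, "results" holds entries the
// workers wrote back.
public class SpaceDemo {

    // Master: decompose the job into n tasks, put them in the
    // "space", then collect and aggregate exactly n results.
    static int runJob(int n, int workers) throws InterruptedException {
        BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();

        // Workers: repeatedly take a task and write back its square.
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        int t = tasks.take();   // like JavaSpace.take(template, ...)
                        results.put(t * t);     // like JavaSpace.write(result, ...)
                    }
                } catch (InterruptedException e) { /* worker shut down */ }
            });
            worker.setDaemon(true);
            worker.start();
        }

        for (int i = 1; i <= n; i++) tasks.put(i);   // write task entries

        int sum = 0;
        for (int i = 0; i < n; i++) sum += results.take();  // take result entries
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sum of squares 1..8 computed by 3 concurrent workers.
        System.out.println("sum of squares = " + runJob(8, 3));  // prints 204
    }
}
```

Because every take blocks until a matching entry exists, the workers need no knowledge of the master or of each other, which is exactly the decoupling the space-based model provides.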
-
A Framework for Adaptive Parallel Computing on Clusters
Three features:
Portability across heterogeneous platforms
Minimal configuration overheads and runtime class loading at the participating nodes
Automated system state monitoring
Targets applications that are divisible into subtasks that can be solved independently
-
The Framework: Architecture Overview
-
The Framework: The Master Module
The Master Module:
Hosts the JavaSpaces service
Decomposes the application into independent tasks
Places the tasks into the space
Takes back the task results
[Diagram: the master puts jobs into the JavaSpace and gets results back]
-
The Framework: The Worker Module
The Worker Module:
A thin module
Can be configured and loaded at runtime
Gets the tasks from the space
Puts the task results back into the space
Controlled by the network management module
[Diagram: the worker gets jobs from the JavaSpace and puts results back]
-
The Framework: The Worker Module
State transition diagram: the worker is Running while host CPU load is 0~25%, Paused at 25~50%, and Stopped at 50~100%; the transitions between these states are Start, Pause, Resume, and Stop
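The state transitions above can be sketched as a small policy function. The 25% and 50% thresholds come from the diagram; the enum and method names are illustrative, not taken from the framework's actual source:

```java
// Sketch of the worker state policy implied by the transition diagram:
// run while the host's CPU load is low, pause under moderate load,
// stop when the machine is busy (so the owner is not disturbed).
public class WorkerState {
    enum State { RUNNING, PAUSED, STOPPED }

    // Map the current state and observed CPU load (0.0-1.0) to the
    // next state, following the Start/Pause/Resume/Stop transitions.
    static State next(State current, double cpuLoad) {
        if (cpuLoad >= 0.50) return State.STOPPED;                     // Stop
        if (cpuLoad >= 0.25)
            return current == State.RUNNING ? State.PAUSED : current;  // Pause
        return State.RUNNING;                                          // Start / Resume
    }

    public static void main(String[] args) {
        System.out.println(next(State.RUNNING, 0.40));  // PAUSED
        System.out.println(next(State.PAUSED, 0.10));   // RUNNING
        System.out.println(next(State.RUNNING, 0.80));  // STOPPED
    }
}
```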
-
The Framework: The Network Management Module
Two functions:
Monitor the state of workers
Provide a decision-making mechanism to facilitate the utilization of idle resources
Inference Engine: a rule-based protocol
Monitoring agent: uses SNMP; two components, the manager component and the worker-agent component
-
The Framework: The Implementations
Remote Node Configuration:
Uses reflection to load classes dynamically
Required worker classes are downloaded from the master server
Dynamic Worker Management for Adaptive Cluster Computing
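Loading a worker class by name via reflection can be sketched as below. In the framework the bytecode would come from the master server (typically via a URLClassLoader pointed at an HTTP codebase); to keep this example runnable standalone, it loads a JDK class instead, and the method name is illustrative:

```java
import java.util.List;

// Sketch of runtime class loading via reflection, as used for remote
// node configuration: a worker only needs the name of a class, not a
// compile-time dependency on it.
public class DynamicLoad {

    // Resolve a class by name and instantiate it reflectively through
    // its no-argument constructor.
    static Object loadAndCreate(String className) throws Exception {
        Class<?> cls = Class.forName(className);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a downloaded worker class; any class name works.
        Object obj = loadAndCreate("java.util.ArrayList");
        @SuppressWarnings("unchecked")
        List<String> tasks = (List<String>) obj;
        tasks.add("subtask-0");
        System.out.println(obj.getClass().getSimpleName() + " holds " + tasks.size() + " task");
    }
}
```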
-
Dynamic Worker Management for Adaptive Cluster Computing
[Diagram: interaction between the Worker Module and the Network Management Module]
-
The Evaluation: Parallel Ray Tracing
An image generation technique, parallelized by divide-and-conquer
A 600x600 image is divided into 24 independent 25x600 rectangular slices
Experiments (5 PCs):
Scalability Analysis
Adaptation Protocol Analysis
Analysis of Dynamic Worker Behavior Patterns under Varying Load Conditions
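The slice decomposition used in the ray-tracing evaluation is straightforward to sketch: a 600x600 image split row-wise into 24 independent 25x600 slices, each of which becomes one task entry in the space (the helper below is illustrative, not the framework's code):

```java
// Sketch of the row-wise image decomposition from the evaluation:
// each slice is represented as {startRow, rows, cols}.
public class SliceDemo {

    static int[][] decompose(int height, int width, int nSlices) {
        int rowsPerSlice = height / nSlices;          // 600 / 24 = 25
        int[][] slices = new int[nSlices][];
        for (int i = 0; i < nSlices; i++)
            slices[i] = new int[]{ i * rowsPerSlice, rowsPerSlice, width };
        return slices;
    }

    public static void main(String[] args) {
        int[][] s = decompose(600, 600, 24);
        System.out.println(s.length + " slices, each " + s[0][1] + "x" + s[0][2]);
        // prints: 24 slices, each 25x600
    }
}
```

Because each slice can be ray-traced without reference to any other, the decomposition matches the framework's requirement of independently solvable subtasks.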
-
The Evaluation: Scalability Analysis
Measures the overall scalability of the framework
Criteria:
Max worker time: maximum computation time among all workers
Task planning time: time required for dividing and putting the tasks
Task aggregation time: time required for collecting and aggregating the results
Parallel time: total execution time from start to finish
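As a rough model of how these criteria relate (an assumption for illustration; the paper measures each quantity directly), the parallel time of a master-worker run is approximately the task planning time, plus the maximum worker time, plus the task aggregation time:

```java
import java.util.Arrays;

// Illustrative model relating the scalability criteria: planning and
// aggregation are serial master-side phases, while the worker phase
// finishes only when the slowest worker does.
public class Criteria {

    static double parallelTime(double planning, double[] workerTimes, double aggregation) {
        double maxWorker = Arrays.stream(workerTimes).max().orElse(0.0);
        return planning + maxWorker + aggregation;
    }

    public static void main(String[] args) {
        // Hypothetical timings in seconds for three workers.
        double t = parallelTime(1.2, new double[]{10.0, 12.5, 11.0}, 0.8);
        System.out.println("parallel time ~ " + t + " s");
    }
}
```

This model also makes the scalability limit visible: adding workers shrinks only the max-worker term, while planning and aggregation remain as serial overhead on the master.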
-
The Evaluation: Scalability Analysis Result
-
The Evaluation: Adaptation Protocol Analysis
Analyzes the overheads involved in signaling worker nodes and adapting to their current CPU load
Criteria: behavior under two load simulators
Simulator 1: 30%~50% CPU load
Simulator 2: 100% CPU load
-
The Evaluation: Adaptation Protocol Analysis Result
CPU usage history on the worker machine; analysis of the signaling times
[Chart labels: START, Simulator 1, Simulator 2]
-
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions
Criteria:
Maximum Worker Time
Maximum Master Overhead
Task Planning and Aggregation Time
Total Parallel Time
-
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions Result
-
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions Result
-
Conclusions
Summary:
Good scalability for loosely coupled applications
Idle workstations can be effectively used
Monitoring and reacting to system state enables us to minimize intrusiveness to machines within the cluster