G-JavaMPI: A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports

Lin Chen, Cho-Li Wang, Francis C. M. Lau and Ricky K. K. Ma

Department of Computer Science and Information Systems

The University of Hong Kong

{lchen2+clwang+fcmlau+kk1ma}@csis.hku.hk

GCC2002 Presentation. Lin Chen, CSIS, HKU (Dec. 26, 2002)

Outline

Motivation

Overall system architecture

Detailed Issues

Related works

Conclusion & Future Work

Motivation

Grid computing: large-scale resource sharing, high performance

Globus Project: basic services required for building and using a Grid (authentication, security, resource allocation, remote data access, information services, etc.)

However:
  Long-running applications need continuous computation
  Better resource utilization needs scheduling and load balancing

Java process migration: architecture-independent bytecode makes migration easier

Motivation

Let the programmer write a grid application easily
  No need to distinguish inter-site from intra-site communication (this is unavoidable when using the Globus communication libraries directly)

SPMD: one program can be executed at multiple places or sites

MPI paradigm
  A group of distributed processes that can perform peer-to-peer or collective communication
  Communication source and destination addresses are independent of the real physical network addresses (adaptable)
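The SPMD and rank-addressing ideas above can be illustrated with a minimal sketch. It is not the G-JavaMPI API: `RankComm`, `program` and `run` are hypothetical names, and the "communicator" is an in-memory stand-in in which each rank is a thread. The point is only that every rank runs the same program and addresses its peers by rank, never by host.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical in-memory communicator: peers are named by rank, never by host address. */
final class RankComm {
    private final List<BlockingQueue<Integer>> inboxes = new ArrayList<>();

    RankComm(int size) {
        for (int i = 0; i < size; i++) {
            inboxes.add(new LinkedBlockingQueue<Integer>());
        }
    }

    int size() {
        return inboxes.size();
    }

    /** Peer-to-peer send addressed by destination rank. */
    void send(int destRank, int value) {
        inboxes.get(destRank).add(value);
    }

    /** Blocking receive on the caller's own inbox. */
    int recv(int myRank) {
        try {
            return inboxes.get(myRank).take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** The single program every rank executes (SPMD); the rank selects the role. */
    static void program(RankComm comm, int rank, int[] result) {
        if (rank != 0) {
            comm.send(0, rank * rank);          // workers send a partial value to rank 0
        } else {
            int total = 0;
            for (int i = 1; i < comm.size(); i++) {
                total += comm.recv(0);          // rank 0 gathers from all other ranks
            }
            result[0] = total;
        }
    }

    /** Runs one thread per rank and returns the value gathered by rank 0. */
    static int run(int size) {
        RankComm comm = new RankComm(size);
        int[] result = new int[1];
        List<Thread> ranks = new ArrayList<>();
        for (int r = 0; r < size; r++) {
            final int rank = r;
            Thread t = new Thread(() -> program(comm, rank, result));
            ranks.add(t);
            t.start();
        }
        for (Thread t : ranks) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return result[0];
    }
}
```

Because `program` only ever mentions ranks, the mapping from rank to physical location can change underneath it, which is the property that makes migration transparent to the application.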

System Overview

[Figure: three sites connected over a WAN, each with a Gatekeeper, an LS and JVMs; routes are labeled (1), (2), (3) and (1*), (2*), (3*), with (*) marking redirection]

Migrating: a new process is restarted through a Globus remote job request with delegated user credentials and Java-MPI job credentials

Java-MPI communication

Some legacy messages are redirected during migration

M: a migration module resides in each JVM

System overview

Legend:
  Globus Toolkit libraries
  Java-MPI communication daemons
  LS: local schedulers
  Java-MPI processes (before migration and after migration)
  M: migration modules

(1) – (2) – (3): MPI communication route before migration
(1*) – (2*) – (3*): MPI communication route after migration
(*): Java-MPI communication daemons redirect the legacy messages that should go to the migrated process
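The redirection of legacy messages can be sketched as follows. This is a simplified model, not the actual daemon implementation: `Daemon`, its `location` table and the three-site scenario are assumptions made for illustration. Each daemon keeps its own view of where a rank lives; a site with a stale view sends to the old site, whose daemon forwards the message to the new one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of per-site communication daemons that forward legacy messages. */
final class RedirectDemo {
    static final class Daemon {
        final String site;
        final Map<String, Daemon> peers;                        // all daemons, keyed by site name
        final Map<Integer, String> location = new HashMap<>();  // this daemon's view: rank -> site
        final List<String> delivered = new ArrayList<>();       // messages handed to local processes
        int forwarded = 0;

        Daemon(String site, Map<String, Daemon> peers) {
            this.site = site;
            this.peers = peers;
            peers.put(site, this);
        }

        /** Deliver locally if the rank lives here, otherwise forward to where it moved. */
        void receive(int rank, String message) {
            String where = location.get(rank);
            if (site.equals(where)) {
                delivered.add(rank + ":" + message);
            } else if (where != null) {
                forwarded++;
                peers.get(where).receive(rank, message);
            } else {
                throw new IllegalStateException("unknown rank " + rank);
            }
        }

        /** Send using this daemon's own (possibly stale) view of the rank's location. */
        void send(int rank, String message) {
            peers.get(location.get(rank)).receive(rank, message);
        }
    }

    /** Rank 1 migrates from site A to site B while site C still holds the old location. */
    static Daemon[] scenario() {
        Map<String, Daemon> peers = new HashMap<>();
        Daemon a = new Daemon("A", peers);
        Daemon b = new Daemon("B", peers);
        Daemon c = new Daemon("C", peers);
        for (Daemon d : peers.values()) {
            d.location.put(1, "A");   // initially every site agrees: rank 1 is at A
        }
        a.location.put(1, "B");       // migration: source and destination learn the move
        b.location.put(1, "B");
        c.send(1, "legacy");          // C is stale, so the message first lands at A
        c.location.put(1, "B");       // C's table is updated later
        c.send(1, "fresh");           // now goes straight to B
        return new Daemon[] {a, b, c};
    }
}
```

In this model the sender never notices the move: the stale message costs one extra hop through the old site, and later messages take the direct route.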

Layered design

[Figure: layered architecture diagram; labels follow]

Java-MPI Applications
Java-MPI API & Java API (Java-MPI API Layer)
Restorable Communication Services (Restorable MPI Comm Layer)
DLB Policy, Info. Update (Load Balancing Module)
Execution State Probe & Migration Plug-in (Migration Layer)
Authentication, Control Block, Message Queues, Migration Instructions
MPICH-G2, Globus Services
JVM, JVMDI
OS
Hardware

Java-MPI binding

Restorable communication layer
  Daemon: a running MPICH-G2 process that provides MPI communication services
  Communicates with the Java-MPI process through IPC
  Post-migration message redirection

[Figure labels: Restorable Communication, Process space]

Java Process Migration

State capturing: a probe attached to each JVM saves the process context through JVMDI (JVM Debug Interface)
  All runtime data: PC register, stack frames, objects, method area (local variables), etc.
  Event notification: method_entry, frame_pop, etc.

Use object serialization to package all reachable objects in the heap

New JDK 1.4.0 & 1.4.1 released in Aug. 2002 support "full-speed debugging"

[Figure: the probe obtains (1) execution state data and (2) event notification from the JVM through JVMDI]
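The object-packaging step can be illustrated with standard Java serialization. This is a minimal sketch, not the probe's code: `HeapSnapshot` and `Node` are hypothetical names. Writing one root object to an `ObjectOutputStream` packages every object reachable from it, and shared or cyclic references are preserved when the image is read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical helper: packages an object graph into bytes and rebuilds it. */
final class HeapSnapshot {
    /** Example heap object; 'next' may point back to form a cycle. */
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    /** Serializes the root and everything reachable from it. */
    static byte[] capture(Serializable root) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rebuilds a fresh copy of the graph from a captured image. */
    static Object restore(byte[] image) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Restoring on the destination requires the same classes to be loadable there, which is where dynamic class loading comes in.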

Java process migration

State restoration:
  Exception handler inserted in bytecode (pre-processing before execution) to restore local variables and "jump" to the original execution point
  Re-allocate objects when restarting the JVM
  Dynamic class loading
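A source-level analogue of the inserted restore handler is sketched below. It is an illustration under stated assumptions, not the bytecode the middleware generates: `ResumableSum`, `Frame` and `RestoreSignal` are hypothetical, and since Java source has no goto, the sketch only resumes a loop rather than jumping to an arbitrary instruction. On restart, a signal raised at method entry transfers control to a handler that reloads the saved local variables, after which execution continues from the saved point.

```java
/** Hypothetical source-level analogue of a restore handler inserted into a method. */
final class ResumableSum {
    /** Saved frame: the local variables captured at the migration point. */
    static final class Frame {
        final int i;
        final long acc;

        Frame(int i, long acc) {
            this.i = i;
            this.acc = acc;
        }
    }

    /** Raised at method entry on restart to transfer control into the restore handler. */
    static final class RestoreSignal extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    /** Runs the loop for at most stopAt iterations and returns the locals, simulating a capture. */
    static Frame runUntil(int n, int stopAt) {
        int i = 0;
        long acc = 0;
        for (; i < n && i < stopAt; i++) {
            acc += i;
        }
        return new Frame(i, acc);
    }

    /** Sums 0..n-1. With a saved frame, the handler reloads the locals and the loop resumes. */
    static long sum(int n, Frame saved) {
        int i = 0;
        long acc = 0;
        try {
            if (saved != null) {
                throw new RestoreSignal();   // restart path: enter the handler
            }
        } catch (RestoreSignal signal) {
            i = saved.i;                     // inserted handler: restore local variables
            acc = saved.acc;
        }
        for (; i < n; i++) {                 // continues from the restored iteration
            acc += i;
        }
        return acc;
    }
}
```

A run that is captured after four iterations and resumed with `sum(10, frame)` yields the same result as an uninterrupted `sum(10, null)`.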

Information update

[Figure: timeline across the migration source site, the migration destination site, and other sites]

Migration begins

Notify other sites (including the destination site)

The process arrives at the safe migration point (consumes all legacy messages)

Update the local site with the process's new location

Begin process state capture
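The ordering of these steps can be written down as a small event-log model. This is a sketch only: `MigrationSteps` and its parameters are hypothetical, and the step order is assumed to follow the order in which the steps are listed on the slide. The property it shows is that all legacy messages are consumed before the location is updated and the state is captured.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical event-log model of the information-update steps before state capture. */
final class MigrationSteps {
    static List<String> migrate(int rank, String source, String destination,
                                List<String> otherSites, Deque<String> legacyMessages) {
        List<String> log = new ArrayList<>();
        log.add("begin rank " + rank + " " + source + "->" + destination);

        // Notify every other site, including the destination.
        log.add("notify " + destination);
        for (String site : otherSites) {
            log.add("notify " + site);
        }

        // Safe migration point: drain all messages still addressed to the old location.
        while (!legacyMessages.isEmpty()) {
            log.add("consume " + legacyMessages.poll());
        }

        // Record the new location at the source site, then start capturing state.
        log.add("update " + source);
        log.add("capture");
        return log;
    }
}
```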

Process Restart

[Figure: timeline from the original process to the newly started process]

Create a new user certificate proxy (proxy_init_cred)

The proxy is delegated to the remote site

Get the resource allocation

The new process can be started (similar to a normal Globus job submission)

JVM initialization; at the same time, the probe is started

The process is suspended at the beginning, and the probe reads the context from the dump file

The execution context is restored

The process resumes and continues from the last execution point

Experiment Results

Hardware
  32-node cluster "ostrich", configured as two grid points of 16 nodes each
  733 MHz Pentium III processor
  392 MB of memory
  Connected by a 24-port Fast Ethernet switch

Software
  Linux 2.2.14
  Globus 2.0
  Sun JDK 1.4.0_02 (supporting JVMDI with full-speed debugging mode)
  MPICH 1.2.4 (MPICH-G2)

Experiment results: Bandwidth

[Figure: bandwidth (Kbyte/s, axis 0 to 5000) versus message size (8 to 2048 bytes) for intra-site and inter-site communication]

Bandwidth comparison between inter-site and intra-site communication with the installation of the MPI communication layer.

Experiment results: Latency

[Figure: latency (s, axis 0 to 0.6) versus message size (1 to 2048 bytes) for inter-site and intra-site communication]

Latency comparison for small messages between intra-site and inter-site communication with the installation of the MPI communication layer.

Experiment results: Time for capturing and restoring objects

[Figure: time (microseconds, axis 0 to 3000) versus object size (1 byte to 10M bytes) for capturing objects and restoring objects]

Time spent in capturing and restoring objects

Experiment results: Time for capturing and restoring Java frames

[Figure: time (seconds, axis 0 to 6) versus number of frames (1 to 300) for capturing frames and restoring frames]

Time spent in capturing and restoring frames

Related Works

Java bindings for MPI: "mpiJava", "JavaMPI", "MPIJ", etc.

Java process or thread migration:
  Add additional backup code in programs [Aglets [IBM96]]
  Insert backup statements in the source or bytecode; a backup object is used to store state [Wasp project [Funfrocken98]]
  Extend the JVM, make state accessible from Java programs, support type recognition of the Java stack [Sara Bouchenak 2000]
  Use JVMDI to capture state, insert bytecode instructions in the program body to help restoring [Torsten2001]
  JESSICA (supports thread migration in the JVM)

Conclusion

A new middleware for the Grid with Java-MPI communication and transparent process migration features
  Write MPI-style programs in the Java language
  Java process migration mechanism

Supports the development of any dynamic load balancing policy or fault tolerance mechanism

Future Plan

Develop some scientific and engineering applications on top of this middleware

Support the transfer of other I/O (including file stage-in/out)

Load balancing algorithm for the grid environment (both CPU and network load)

The End

Thanks!