Does the implementation give solutions for the requirements? Flexibility GridRPC enables dynamic...


Page 1

Does the implementation give solutions for the requirements?

Flexibility
GridRPC enables dynamic join/leave of QM servers.
GridRPC enables dynamic expansion of a QM server.

Robustness
GridRPC detects errors, and the application can implement its own recovery code.

Efficiency
GridRPC can easily handle multiple clusters.
Local MPI provides high performance on a cluster through fine-grained parallelism.
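The robustness point (the RPC layer reports the error, the application decides how to recover) can be illustrated with a minimal sketch. This is not the GridRPC or Ninf-G API: `RemoteCallError`, `call_with_failover`, and the `invoke` callback are hypothetical names chosen for illustration, and "try the next server" is only one possible recovery policy.

```python
from typing import Callable, List, Sequence, Tuple


class RemoteCallError(Exception):
    """Hypothetical stand-in for an error reported by the RPC layer."""


def call_with_failover(
    task: object,
    servers: Sequence[str],
    invoke: Callable[[str, object], object],
) -> Tuple[str, object]:
    """Try each server in order and return (server, result) for the first success.

    `invoke` is an assumed callback that performs one remote call and raises
    RemoteCallError when the call fails.
    """
    failures: List[Tuple[str, str]] = []
    for server in servers:
        try:
            return server, invoke(server, task)
        except RemoteCallError as exc:
            # The error is detected here; recovery is the application's choice.
            failures.append((server, str(exc)))
    raise RuntimeError(f"all servers failed: {failures}")
```

The recovery logic lives entirely in application code, so a different policy (retrying the same server, or waiting for a reservation) could be swapped in without touching the call layer.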

Page 2

Strategy for long run

The QM simulation will migrate to another cluster, either intentionally or unintentionally.

Intentional migration
The run exceeds the maximum runtime for the cluster.
The reservation period has expired.

Unintentional migration
Any error/fault is detected.
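The migration triggers above can be sketched as a small decision function. The `RunState` fields and the `migration_reason` name are assumptions made for illustration; the slides do not say how the real application tracks these conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunState:
    """Hypothetical snapshot of a QM simulation running on one cluster."""
    elapsed_hours: float        # time spent on the current cluster
    max_runtime_hours: float    # runtime limit imposed by the cluster
    reservation_expired: bool   # True once the reserved period is over
    fault_detected: bool        # True if any error/fault was reported


def migration_reason(state: RunState) -> Optional[str]:
    """Return why the simulation should migrate, or None to stay put."""
    if state.fault_detected:
        return "unintentional: fault detected"
    if state.elapsed_hours > state.max_runtime_hours:
        return "intentional: maximum runtime exceeded"
    if state.reservation_expired:
        return "intentional: reservation expired"
    return None
```

Checking the fault first is a design choice in this sketch: a fault forces migration regardless of how much reserved time remains.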

The next cluster will be selected either by reservation or by a simple selection algorithm.

The selection algorithm considers the number of available CPUs, the number of requested CPUs, and records of past utilization.

The simulation reads a host information file at every time step.
A cluster can join or leave the experiment on the fly.
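A minimal sketch of how such a selection step might look follows. The file layout (one `name available_cpus past_utilization` record per line), the `Host` type, and the ranking rule are all assumptions: the slides name the three inputs but not how they are weighed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    available_cpus: int
    past_utilization: float  # assumed: fraction of time busy in past records


def parse_hosts(text: str) -> List[Host]:
    """Parse the assumed host information format, skipping blanks and comments."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, cpus, util = line.split()
        hosts.append(Host(name, int(cpus), float(util)))
    return hosts


def select_cluster(hosts: List[Host], requested_cpus: int) -> Optional[Host]:
    """Pick a cluster with enough free CPUs, preferring low past utilization.

    Ties are broken in favor of the cluster with more available CPUs.
    Returns None when no cluster can satisfy the request.
    """
    candidates = [h for h in hosts if h.available_cpus >= requested_cpus]
    if not candidates:
        return None
    return min(candidates, key=lambda h: (h.past_utilization, -h.available_cpus))
```

Re-reading and re-parsing the file at every time step is what would let a cluster join or leave on the fly: a host added to the file becomes a candidate at the next step, and a removed host simply stops appearing.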

Page 3

National Institute of Advanced Industrial Science and Technology

Experiments
- target simulation -
- testbed -
- results and lessons learned -

Page 4

Grid-enabled SIMOX Simulation on Japan-US Grid Testbed at SC2005

A technique to fabricate a microstructure consisting of a Si surface on a thin SiO2 insulator.
Allows the creation of higher-speed devices with lower power consumption.

[Figure: Formation of the silicon-on-insulator (SOI) structure. Labels: oxygen implanted into the silicon substrate; annealing; SiO2 insulator; creating IC chips on the insulator; leak current.]

Page 5

SIMOX simulation on the Grid

Simulate SIMOX by implanting five oxygen atoms with initial velocities much smaller than the usual values.

The incident positions of the oxygen atoms relative to the surface crystalline structure of Si differ.

Five QM regions are initially defined.
The size and number of QM regions change during the simulation.

0.11 million atoms in total.

The results of the experiments will demonstrate the sensitivity of the process to the incident position of the oxygen atom when its implantation velocity is small.

Page 6

Testbed for the experiment

Phase 1 Phase 2 Phase 3 Phase 4

AIST Super Clusters: P32 (2144 CPUs), M64 (528 CPUs), F32 (536 CPUs)

TeraGrid Clusters: PSC clusters (3000 CPUs), NCSA clusters (1774 CPUs)

USC Clusters: USC (7280 CPUs)

Japan Clusters: U-Tokyo (386 CPUs), TITECH (512 CPUs)

QM1      P32  P32  P32   P32   P32   USC  USC  USC  ISTBS   ISTBS
QM2      P32  P32  NCSA  NCSA  NCSA  USC  USC  USC  Presto  Presto
QM3      M64  M64  M64   M64   M64   M64  M64  M64  M64     M64
QM4      P32  P32  TCS   TCS   TCS   USC  USC  USC  P32     P32
QM5      P32  P32  TCS   TCS   TCS   USC  USC  USC  P32     P32
Reserve  F32  F32  P32   P32   P32   P32  P32  P32  F32     F32

Page 7

Result of the experiment

[Figure spanning Phase 1, Phase 2, Phase 3, and Phase 4]

Experiment time: 18.97 days
Simulation steps: 270 (~54 fs)
Longest calculation time: 4.76 days
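As a quick worked check of what these figures imply, the arithmetic below derives the simulated time per step and the average wall-clock time per step. Both are derived values, not numbers stated on the slide; the 54 fs total is approximate, and the wall-clock average covers the whole experiment, including any waiting or migration time.

```python
experiment_days = 18.97   # total experiment time from the slide
steps = 270               # simulation steps from the slide
simulated_fs = 54.0       # approximate simulated time from the slide

fs_per_step = simulated_fs / steps
hours_per_step = experiment_days * 24 / steps

print(f"{fs_per_step:.2f} fs of simulated time per step")
print(f"{hours_per_step:.2f} wall-clock hours per step on average")
```

This works out to roughly 0.2 fs of simulated time per step, at an average of about 1.69 wall-clock hours per step.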

Page 8

Results of the experiment (cont’d)

The behavior of the oxygen atoms strongly depends on the incident position.

[Figure: v/v0 versus time step (1 to 251) for QM 1, QM 2, QM 3, QM 4, and QM 5]

QM regions are expanded/divided every 5 time steps (expansion: 47 times, division: 8 times).

[Figure: number of CPUs and number of QM atoms versus time step (up to step 270)]

Succeeded in a long run through intentional/unintentional resource migration.

[Figure legend: intentional migration; migration triggered by faults]

Page 9

Summary of the experimental results

We verified that our strategy for long runs is a practical approach.
Continue the simulation by migrating from one cluster to another based on reservation.

We verified that programming with GridRPC and MPI can implement a real Grid-enabled application.
Dynamic resource allocation / migration
Recover from faults
Manage hundreds of CPUs on distributed sites

Page 10

Status and Future Plans

Ninf-G Version 5 will be coming!
What are the differences from Ninf-G4?

Lower prerequisites for installation
Ninf-G4 needs the Globus library since it uses Globus IO for client/server communication.
Ninf-G5 can be installed without Globus, i.e., Ninf-G5 can be installed according to the underlying software environment.

Three major components (remote process invocation, information retrieval, and client/server communication) can be pluggable, e.g., without Globus, without TCP.
Works efficiently from a single supercomputer to the Grid.
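The idea of three pluggable components can be sketched as three small interfaces that a client composes. Every name here (`Invoker`, `InfoSource`, `Channel`, `Client`, and the local stand-in classes) is hypothetical and chosen only to illustrate the design; this is not Ninf-G5's actual interface.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class Invoker(Protocol):
    """Remote process invocation (assumed interface)."""
    def start(self, host: str, executable: str) -> str: ...


class InfoSource(Protocol):
    """Information retrieval (assumed interface)."""
    def lookup(self, function_name: str) -> str: ...


class Channel(Protocol):
    """Client/server communication (assumed interface)."""
    def send(self, job_id: str, payload: bytes) -> bytes: ...


@dataclass
class Client:
    invoker: Invoker
    info: InfoSource
    channel: Channel

    def call(self, host: str, function_name: str, payload: bytes) -> bytes:
        executable = self.info.lookup(function_name)   # where the function lives
        job_id = self.invoker.start(host, executable)  # launch the remote process
        return self.channel.send(job_id, payload)      # exchange data with it


# In-process stand-ins, showing that no particular middleware is required.
class LocalInvoker:
    def start(self, host: str, executable: str) -> str:
        return f"{host}:{executable}"


class TableInfo:
    def __init__(self, table: Dict[str, str]) -> None:
        self.table = table

    def lookup(self, function_name: str) -> str:
        return self.table[function_name]


class EchoChannel:
    def send(self, job_id: str, payload: bytes) -> bytes:
        return job_id.encode() + b"|" + payload
```

Because `Client` depends only on the three interfaces, any one of them can be replaced (for example, a different transport in place of `EchoChannel`) without changing the other two.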

Other new features will be supported:
Connectionless (client <-> server)
Client-side checkpointing

Ninf-G 5.0.0alpha will be available this March.

Page 11

For more info, related links

Ninf project ML: [email protected]

Ninf-G Users’ ML (subscribed members only): [email protected]

Ninf project home page: http://ninf.apgrid.org

Open Grid Forum: http://www.ogf.org/

GGF GridRPC WG: http://forge.gridforum.org/projects/gridrpc-wg/