
© 2002-2003 Hein Meling and Alberto Montresor

The Jgroup/ARM Dependable Computing Toolkit

Hein Meling
Stavanger University College – Norway
Department of Electrical and Computer Engineering

Alberto Montresor
University of Bologna – Italy
Department of Computer Science


Context

(Distributed) systems that require

• Reliable and high-availability operation

• Fault tolerance

• (Load balancing)

Based on "cheap" hardware and software

• Commercial off-the-shelf (COTS) rather than custom hardware

• Heterogeneous software (OS) architectures

Middleware architectures for distributed computing

• Middleware: between the application and OS


Types of Failures

Processor failures

• Crash failures

• Value failures (very expensive)

Network failures

Operating System hangs

Memory leaks

Software design errors (beyond state-of-the-art)


Overview

Jgroup

• A toolkit aimed at supporting the development of reliable and highly-available applications.

Autonomous Replication Management (ARM)

• A framework for server replica deployment and recovery without user intervention.

History

• Formal specification (1996-97)

• Algorithm description and Jgroup implementation

• Integration with existing technologies (Java RMI / Jini)

• The ARM framework (2000-03)

• Development of Jgroup-based applications


Summary

1. Introduction

2. Object Group Communication

3. The ARM framework

4. Integration with Java RMI / Jini

5. Conclusions


The Problem

Some environments supporting distributed computing:

• CORBA (OMG)

• DCOM / .NET (Microsoft)

• Java RMI / Jini / EJB (Sun)

Characteristics:

• Object-oriented

• Based on client-server remote method invocations

• Promote modularity, reusability, interoperability, portability


Java Remote Method Invocations

Java RMI protocol:

• enables objects residing in different JVMs to communicate through remote method invocations

[Figure: a Client in JVM1 calls method() through a Stub; the call crosses the Network to the server-side RMI runtime and the Server in JVM2, which returns x]


The Problem

Distributed computing environments did not provide adequate support for developing reliable and highly available applications

Lack of reliable “one-to-many” interaction primitives

• From the client’s point of view: non-transparent access to replicated servers

• From the server’s point of view: no support for maintaining consistency


The Solution: The Object Group Paradigm

Object group:

• A dynamic collection of server objects that cooperate in order to deliver some service and maintain shared state

Group method invocations:

• The act of invoking a method on an object group

• The method is executed by a certain number of servers in the object group, depending on the invocation semantics

[Figure: a client invoking an object group made up of three servers]


The Solution: The Object Group Paradigm

From the client’s point of view:

• Groups must be transparent - like standard remote objects

• Clients need not be aware that they are interacting with an object group instead of a single server

From the server’s point of view:

• Server implementation: as transparent as possible

• Servers forming a group must cooperate
  • to maintain shared state, and
  • to appear as a single object


Group Communication

Group communication has been shown to be a powerful paradigm for supporting the development of dependable applications in distributed systems

• Management of dynamic groups (join/leave operations)

• Failure monitoring (crashes / partitionings)

• “One-to-many” communication

• Ordering of events (FIFO, Causal, Atomic)

• State synchronization tools

[Figure labels: Group Membership Service, Reliable Multicast Service, State Transfer Service]


Other Object Group Systems

CORBA

• Electra [Cornell, Zurich]

• Object Group Service (OGS) [EPFL, Lausanne]

• Eternal [UC Santa Barbara, Eternal Systems]

• Newtop [Newcastle, UK]

Java RMI

• Filterfresh [Bell Labs]

• JavaGroups [Cornell]

• Aroma [UC Santa Barbara]

DCOM

• Quintet [Cornell]


Jgroup: “Yet Another Object Group Service”?

Support for partition-awareness:

• Modern wide-area communication networks are often characterized as highly partitionable

• Jgroup supports the development of reliable and highly available applications in partitionable systems

Moreover:

• Extends modern technologies like Java RMI and Jini

• Is completely written in Java (portability)

• Supports a complex merging service

• Extensible: deployment, recovery and upgrade facilities


Autonomous Replication Management

Support for transparent replica deployment

• Placing server replicas on machines in the network

• Selecting machines so that each application can tolerate both network and machine failures

Support for replica recovery

• Jgroup detects and reports failures

• ARM replaces any crashed server replica with a new instance


Group Membership

Group membership service tracks both voluntary and involuntary changes in the group’s membership

Variations are reported to group members through the installation of views

Installed views

• Consist of a collection of members

• Correspond to the group’s current membership as perceived by the members included in the view
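A minimal sketch of view installation, seen from a single process. The names View, MembershipListener and SimpleMembership are hypothetical and are not Jgroup's actual API; the sketch only illustrates that every join, leave or detected crash yields a new view that is reported to the listeners.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Immutable snapshot of the membership as perceived by its members (hypothetical type). */
final class View {
    final int id;
    final List<String> members;

    View(int id, Iterable<String> members) {
        this.id = id;
        List<String> copy = new ArrayList<>();
        for (String m : members) copy.add(m);
        this.members = Collections.unmodifiableList(copy);
    }
}

/** Callback through which membership changes are reported (hypothetical interface). */
interface MembershipListener {
    void viewChange(View view);
}

/** Toy membership tracker: every change installs a new view at all listeners. */
final class SimpleMembership {
    private final TreeSet<String> members = new TreeSet<>();
    private final List<MembershipListener> listeners = new ArrayList<>();
    private int nextViewId = 1;

    void addListener(MembershipListener l) { listeners.add(l); }

    /** Voluntary change: a server joins the group. */
    void join(String server) { if (members.add(server)) install(); }

    /** Voluntary change: a server leaves the group. */
    void leave(String server) { if (members.remove(server)) install(); }

    /** Involuntary change: a crash was detected; handled like a removal in this sketch. */
    void crash(String server) { leave(server); }

    private void install() {
        View v = new View(nextViewId++, members);
        for (MembershipListener l : listeners) l.viewChange(v);
    }
}
```

Joining S1, S2 and S3 and then crashing S3 installs four views in this model, the last one containing S1 and S2 only.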


Group Membership: A Simple Scenario

[Figure: timeline in which S1, S2 and S3 join the group in turn; S3 then crashes and a new view is installed]


Partition-awareness

What kind of behavior can we expect from fault-tolerant applications in the presence of network partitioning?

The primary-partition approach:

[Figure: three partitions; one answers "How can I help you?" while the other two answer "No service available!"]


Support for partition-awareness

Jgroup supports dependability in partitionable systems

• Development of applications aware of the existence of partitions (on the server-side)

• Partition-aware applications take advantage of their semantics in order to be more available

• Computations continue in all partitions of the system

[Figure: three partitions, each answering "How can I help you?"]


Group Membership: A Partitioning Scenario

[Figure: timeline in which S1, S2 and S3 join the group; S1 and S2 are then partitioned from S3, and later communication with S3 is restored]


Example: Task Execution Service

[Figure: two panels, each with four servers and two clients submitting tasks. "Primary Partition": tasks are handled inside the primary partition and the other client gets a warning. "Partition-aware": both clients have their tasks handled.]


Comparison

Primary-partition approach

+ Easy to maintain a single, coherent shared state (strong consistency)

- Servers in non-primary partitions unable to serve requests (low availability)

Partition-aware approach

+ Servers in multiple partitions may be able to serve requests (high availability)

- Partitions evolve independently, possibly leading to inconsistent states (loose consistency)
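The trade-off can be sketched as a rule deciding whether the servers of a view may serve requests. This is only an illustration: the names are hypothetical, and the sketch assumes that the primary partition is the one holding a strict majority of the full group, a common definition that the slides themselves do not fix.

```java
import java.util.Set;

/** How a replicated service reacts to partitioning (illustrative names). */
enum PartitionPolicy { PRIMARY_PARTITION, PARTITION_AWARE }

final class AvailabilityRule {
    private AvailabilityRule() {}

    /**
     * Decides whether servers in the given view may serve requests.
     * Assumption: "primary" means a strict majority of the full group.
     */
    static boolean mayServe(PartitionPolicy policy, Set<String> view, int groupSize) {
        if (view.isEmpty()) return false;
        switch (policy) {
            case PRIMARY_PARTITION:
                return 2 * view.size() > groupSize; // at most one partition can qualify
            case PARTITION_AWARE:
                return true; // every partition keeps serving; states may diverge
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```

With a group of three split into {S1, S2} and {S3}, only the first partition serves requests under the primary-partition rule, while both do under the partition-aware rule.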


Comparison (Cont.)

Primary-partition approach

+ Development of fault-tolerant applications is simpler (active replication of existing non-fault-tolerant servers)

- Developers cannot exploit application semantics in order to provide a more available service

Partition-aware approach

+ Applications adapt their behavior and remain available in many partitions (perhaps by reducing their quality of service)

- Development of fault-tolerant applications is more complex (case-by-case design is needed)


The State Merging Problem

During partitioning, the state of servers belonging to distinct partitions may become inconsistent

When the partitioning ends, an application-specific state merging protocol may be needed

Servers participating in the protocol try to define a new shared state that reconciles (when possible) the divergences

[Figure: two partitions of two servers each, holding different sets of tasks]



The State Merging Problem

State merging protocols are based on the exchange of information among servers that have been partitioned

Jgroup provides a state merging service (SMS) that simplifies the development of state merging protocols

NOTE

Determining

• what information needs to be exchanged

• how to use it to construct a new consistent shared state

is an application-dependent problem


General Schema for State Merging Protocols

• In each of the merging partitions, a coordinator is selected

• SMS interrogates each coordinator to obtain information about its current state

• State information from a coordinator is passed to servers that used to be partitioned from it

• Each server merges the information received from the coordinators with its own state

[Figure: servers S1 to S4; getState() is invoked on the coordinators and putState() on the servers that receive their state]
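The schema can be sketched in a few lines. Only the method names getState() and putState() come from the slide; TaskServer, StateMerger and the choice of merging by set union are assumptions made for this example. The sketch also assumes that each partition is non-empty and that its first server acts as coordinator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Server whose state is a set of known task ids; merging is a set union (example choice). */
final class TaskServer {
    final String name;
    private final Set<String> tasks = new TreeSet<>();

    TaskServer(String name) { this.name = name; }

    void addTask(String id) { tasks.add(id); }

    /** Called on a partition's coordinator to obtain its current state. */
    Set<String> getState() { return new TreeSet<>(tasks); }

    /** Called on servers that used to be partitioned from the coordinator. */
    void putState(Set<String> remote) { tasks.addAll(remote); }

    Set<String> tasks() { return new TreeSet<>(tasks); }
}

final class StateMerger {
    private StateMerger() {}

    /** Merges previously separated partitions; each inner list is one partition. */
    static void merge(List<List<TaskServer>> partitions) {
        // Pick a coordinator per partition (here: its first server) and read its state.
        List<Set<String>> states = new ArrayList<>();
        for (List<TaskServer> p : partitions) states.add(p.get(0).getState());
        // Pass each coordinator's state to the servers of all other partitions.
        for (int from = 0; from < partitions.size(); from++) {
            for (int to = 0; to < partitions.size(); to++) {
                if (from == to) continue;
                for (TaskServer s : partitions.get(to)) s.putState(states.get(from));
            }
        }
    }
}
```

Reading only the coordinator's state is enough here because servers inside one partition are assumed to hold the same state already; what to exchange and how to reconcile it remains application-specific.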



Full Object-Orientation

[Figure: a client and three servers; labels: Stub, Remote method invocations, Message multicasting]

Existing object group systems fail to provide a completely object-oriented environment for software developers


View Synchrony

View synchrony (1)

If a correct server S executes an invocation during a view, then

• all servers within the view will also execute the invocation,

• or S will install a new view

[Figure: two executions over servers S1 to S4, one that view synchrony admits and one that it does not]


View Synchrony

View Synchrony (2)

All servers that survive from one view to the same next view execute the same set of invocations in the original view

[Figure: two executions over servers S1 to S4, one that view synchrony admits and one that it does not]
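Property (2) can be phrased as a check over a recorded execution. The following is a sketch under assumptions: ViewSynchronyCheck and its two input maps are hypothetical, invented here to make the property concrete, and are not part of Jgroup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class ViewSynchronyCheck {
    private ViewSynchronyCheck() {}

    /**
     * @param executed for each server of the original view, the invocations it executed in that view
     * @param nextView for each surviving server, the id of the view it installed next
     * @return true if all servers that moved to the same next view executed the same set
     */
    static boolean holds(Map<String, Set<String>> executed, Map<String, Integer> nextView) {
        Map<Integer, Set<String>> expected = new HashMap<>();
        for (Map.Entry<String, Integer> e : nextView.entrySet()) {
            // A server with no recorded invocations is treated as having executed none.
            Set<String> mine = executed.getOrDefault(e.getKey(), Collections.<String>emptySet());
            Set<String> first = expected.putIfAbsent(e.getValue(), mine);
            if (first != null && !first.equals(mine)) return false;
        }
        return true;
    }
}
```

Servers that end up in different next views, for example after a partition, are allowed to have executed different sets; only servers sharing the same next view are compared.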


Internal Group Method Invocations

Synchronous invocations

• The method invocation terminates by returning a vector of return values, one from each server at which the method was executed

Asynchronous invocations:

• The method invocation terminates immediately; replies (if any) are returned to a callback object

• Can be used to simulate message multicasting through void methods (one-way)
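The two styles can be sketched in plain Java with an in-process stand-in for the group. LocalGroup is hypothetical and not Jgroup's API; the asynchronous variant uses CompletableFuture from the JDK, and the callback must be thread-safe because replies may arrive on different threads.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

/** In-process stand-in for an object group whose members expose getValue(). */
final class LocalGroup {
    private final List<IntSupplier> servers;

    LocalGroup(List<IntSupplier> servers) { this.servers = servers; }

    /** Synchronous style: blocks and returns one value per server. */
    int[] getValue() {
        int[] results = new int[servers.size()];
        for (int i = 0; i < results.length; i++) results[i] = servers.get(i).getAsInt();
        return results;
    }

    /** Asynchronous style: returns without waiting; each reply is handed to the callback. */
    CompletableFuture<Void> getValue(IntConsumer callback) {
        CompletableFuture<?>[] pending = new CompletableFuture<?>[servers.size()];
        for (int i = 0; i < pending.length; i++) {
            IntSupplier server = servers.get(i);
            pending[i] = CompletableFuture.runAsync(() -> callback.accept(server.getAsInt()));
        }
        return CompletableFuture.allOf(pending); // lets a caller wait for all replies if it wishes
    }
}
```

A caller that does not care about replies can simply ignore the returned future, which corresponds to the one-way multicast use mentioned above.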


Internal Invocations: example

Synchronous invocation

[Figure: the invocation reaches servers S1, S2 and S3]

Caller:

    int[] values = group.getValue();

Each server:

    int getValue() {
        return value;
    }


Internal Invocations: example

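Asynchronous invocation

[Figure: the invocation reaches servers S1, S2 and S3]

Caller:

    ValuesCallback cb = new ValuesCallback();
    group.getValue(cb);
    …
    int[] values = cb.getResults();

Callback:

    public class ValuesCallback implements Callback {
        public void result(Object value) { … }
        public int[] getResults() { … }
    }

Each server:

    int getValue() {
        return value;
    }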


External Group Method Invocations

Anycast invocations:

• Are executed by at least one server in the object group (unless the client is partitioned from the group)

• Efficiency (same cost as standard RMI interactions)

• Useful for “read” methods on replicated databases

Multicast invocations:

• Are executed by all servers in a view, following the view synchrony semantics

• More costly (involve several servers)

• Useful for “write” methods on replicated databases
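A minimal sketch of the two external semantics, using an in-process replicated name registry. RegistryReplica, RegistryGroup and the reachable flag are hypothetical and not part of Jgroup; the loops below make no attempt at view synchrony and only show which replicas execute each kind of invocation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One replica of a toy name registry; 'reachable' simulates partitions or crashes. */
final class RegistryReplica {
    private final Map<String, String> bindings = new HashMap<>();
    boolean reachable = true;

    void bind(String name, String value) { bindings.put(name, value); }

    String lookup(String name) { return bindings.get(name); }
}

/** Client-side group reference that chooses the invocation semantics per method. */
final class RegistryGroup {
    private final List<RegistryReplica> replicas;

    RegistryGroup(List<RegistryReplica> replicas) { this.replicas = replicas; }

    /** Multicast ("write"): executed by every reachable replica; returns how many executed it. */
    int bind(String name, String value) {
        int executed = 0;
        for (RegistryReplica r : replicas) {
            if (r.reachable) { r.bind(name, value); executed++; }
        }
        if (executed == 0) throw new IllegalStateException("client is partitioned from the group");
        return executed;
    }

    /** Anycast ("read"): executed by a single replica, the first reachable one. */
    String lookup(String name) {
        for (RegistryReplica r : replicas) {
            if (r.reachable) return r.lookup(name);
        }
        throw new IllegalStateException("client is partitioned from the group");
    }
}
```

The anycast path touches one replica, so its cost matches an ordinary remote call, while the multicast path grows with the number of replicas.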


External invocations: example

[Figure: clients C1 and C2 invoking the group of servers S1, S2 and S3]

Multicast invocation:

    registry.bind("name", obj);

Anycast invocation:

    registry.lookup("name");


Replication Management – The Problem

Object Group Systems support replication transparency:

• Membership management

• Reliable multicast

But they do not support full failure transparency:

• Distributing replicas requires application support or manual intervention

• Recovering from replica failures requires application support or manual intervention

Complicated tasks

• Application-level implementations are prone to errors

• These tasks should not be left to the application developer


Solution: Autonomous Replication Management

Support for creating object groups

• By placing individual members on distinct machines

• Each application may specify a replication policy
  • For example, redundancy level = 3

Support for failure recovery

• Jgroup detects and reports failures to ARM

• ARM reacts by creating a replacement member for each failed member, perhaps on a different machine

• Each application may specify a recovery policy
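The placement and recovery behaviour can be sketched with a toy manager. ToyReplicationManager is hypothetical and much simpler than ARM: hosts are plain strings, the replication policy is a single redundancy level, and "creating a replica" only records the chosen host.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy replication manager: keeps 'redundancy' replicas on distinct hosts. */
final class ToyReplicationManager {
    private final List<String> hosts;                         // machines able to run replicas
    private final int redundancy;                             // replication policy, e.g. 3
    private final Set<String> placed = new LinkedHashSet<>(); // hosts currently running a replica
    private final Set<String> suspected = new HashSet<>();    // hosts whose replica failed

    ToyReplicationManager(List<String> hosts, int redundancy) {
        this.hosts = new ArrayList<>(hosts);
        this.redundancy = redundancy;
    }

    /** Initial deployment: place replicas on distinct hosts. */
    Set<String> createGroup() {
        fill();
        return new LinkedHashSet<>(placed);
    }

    /** Recovery: called with the hosts still present in the newly installed view. */
    Set<String> notifyViewChange(Set<String> survivors) {
        for (String host : new ArrayList<>(placed)) {
            if (!survivors.contains(host)) {
                placed.remove(host);
                suspected.add(host);
            }
        }
        fill(); // replace each failed member, on a different machine
        return new LinkedHashSet<>(placed);
    }

    private void fill() {
        for (String host : hosts) {
            if (placed.size() >= redundancy) break;
            if (!suspected.contains(host)) placed.add(host); // a real system would start a replica here
        }
    }
}
```

With hosts h1 to h4 and redundancy 3, the group starts on h1, h2 and h3; if h2 drops out of the view, the replacement lands on h4.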


ARM: Replica Distribution

[Figure: ExecDaemons running on the machines of two sites, ux.his.no and item.ntnu.no, connected through a Router; labels: ReplicationManager (three instances), ManagementClient, createGroup(), createReplica()]


ARM: Replica Distribution

[Figure: the same deployment of ExecDaemons at ux.his.no and item.ntnu.no; labels: ReplicationManager, ManagementClient, createGroup(), createReplica(), and three NettBankServer replicas]


ARM: Recovery from Crash Failure

[Figure: the same deployment with three NettBankServer replicas; labels: Group Leader, View agreement protocol, notifyViewChange(), ReplicationManager, ManagementClient]


ARM: Recovery from Crash Failure

[Figure: the same deployment after a crash; labels: Group Leader, notifyViewChange(), ReplicationManager, createReplica(), and a replacement NettBankServer]


Introduction to Jini

Jini is an API built on top of the Java 2 platform:

• enables spontaneous networks of devices/software services to assemble into federations of objects

• addresses the distribution problems in these federations through a set of simple interfaces and protocols


Jini Architecture

The components of the Jini architecture may be divided into three categories:

• Infrastructure, i.e. the components that enable building a federated Jini system

• Model that “supports and encourages the production of reliable distributed services”

• Services that can be made part of a federated Jini system and which offer functionality to any other member of the federation
  • JavaSpaces


Jini Infrastructure

The infrastructure is composed of:

• Java RMI protocol: enables objects residing in different JVMs to communicate through remote method invocations

[Figure: a Client in JVM1 calls method() through a Stub; the call crosses the Network to the server-side RMI runtime and the Server in JVM2, which returns x]


Jini Infrastructure

The infrastructure is composed of:

• Lookup Service: defines how services may become part of a Jini system and clients retrieve services by their types and attributes.

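[Figure: Client, Lookup Service and Server; labels: Discovery, Join (the server registers its stub), Lookup (the client obtains the stub), Invocation (the client calls the server through the stub)]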


The Jini Programming Model

The programming model is based on three distinct paradigms for distributed computing:

• Leases extend the Java programming model by adding time to the notion of holding a reference to a resource

• Transactions allow a set of operations on one or more remote participants to be grouped in such a way that either all succeed or all fail

• Events enable objects to register interest in changes of the abstract state of remote objects
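The lease idea can be sketched as a time-limited claim that lapses unless it is renewed. ToyLease is a hypothetical model written for this example and is not the Jini lease API; time is passed in explicitly so that the behaviour is easy to follow.

```java
/** Time-limited claim on a resource; it lapses unless the holder renews it (toy model). */
final class ToyLease {
    private long expiresAtMillis;

    ToyLease(long nowMillis, long durationMillis) {
        this.expiresAtMillis = nowMillis + durationMillis;
    }

    /** Extends the lease; refused once the lease has already expired. */
    boolean renew(long nowMillis, long durationMillis) {
        if (isExpired(nowMillis)) return false;
        expiresAtMillis = nowMillis + durationMillis;
        return true;
    }

    /** An expired lease tells the grantor that the holder may have failed. */
    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}
```

A holder that crashes simply stops renewing, so the grantor can reclaim the resource after the expiration time without any explicit failure notification.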


Jini and Fault Tolerance

Jini fault tolerance is based on leases and transactions

• leases enable the detection of service failures

• transactions provide consistency by guaranteeing “all-or-nothing” semantics

Unfortunately, no support for high-availability is present in Jini

• No support for replication

• If the transaction manager fails, clients and participants must wait for the manager to recover before serving further requests


Enhancing Jini with Fault-Tolerance

Extending Jini with the Object Group Paradigm:

• Infrastructure
  • Extending Java RMI for Group Method Invocations
  • Extending the Lookup Service for dealing with Group Proxies

• Programming Model
  1. Object Group Paradigm as alternative programming model
  2. Integration between transactional and object group model

• Services
  • Replicated JavaSpaces


Extending Java RMI

The RMI group at JavaSoft designed Java RMI to be extensible

• The RemoteRef interface enables programmers to write their own references to remote objects on the client-side

Unfortunately, RemoteRefs are not sufficient

• There is no possibility to modify the behavior of RMI on the server side

[Figure: Client, Stub with its RemoteRef, server-side RMI runtime, Server]


The Jgroup Approach (Current Version)

[Figure: the Client calls a ClientProxy (statically or dynamically generated, implements the remote interface), which uses a fixed RMI stub for the server proxy; each ServerProxy sits behind the server-side RMI runtime and passes calls to its Server through method dispatchers; labels on the links: RMI, Multicast]


Designing a New Java RMI API

We have cooperated with Sun Microsystems to design a new RMI API:

• Fully customizable, on both the client-side and the server-side

• Based on Dynamic Proxy Classes (JDK 1.3); no need for static stub generators

• Two different versions:
  • One-to-one (remote method invocations)
    • Voted down in JSR-078
    • Being included in the "Davis" release of Jini
  • One-to-many (group method invocations)
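Dynamic proxy classes make it possible to build a client-side group proxy at run time, without generated stubs. The sketch below uses the JDK's java.lang.reflect.Proxy; Counter, CounterImpl and GroupProxies are hypothetical names, the replicas are local objects rather than remote servers, and the replica list is assumed to be non-empty. It is not the API that was proposed to Sun.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

/** Example service interface (hypothetical). */
interface Counter {
    int increment();
}

/** Example replica (hypothetical). */
final class CounterImpl implements Counter {
    private int value;

    public int increment() { return ++value; }
}

final class GroupProxies {
    private GroupProxies() {}

    /** Builds, at run time, an object implementing 'iface' that forwards each call to all replicas. */
    static <T> T create(Class<T> iface, List<? extends T> replicas) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = null;
            for (T replica : replicas) {
                try {
                    result = method.invoke(replica, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the replica's own exception
                }
            }
            return result; // reply of the last replica
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

The client only ever sees the Counter interface, so it cannot tell whether it holds a single server or a group.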


Jgroup with 1-to-1 Customizable RMI

[Figure: Client and ClientProxy (statically or dynamically generated, implements the remote interface), two ServerProxy/Server pairs with method dispatchers, RMI stubs and server-side RMI runtimes; labels on the links: RMI, Multicast]


Jgroup with 1-to-Many Customizable RMI

[Figure: Client and ClientProxy connected to three ServerProxy/Server pairs; labels: Customizable objects, Multicast RMI]


Extending the Lookup Service

Jini enables the registration of customized proxies for services

• this feature can be used to register group proxies using any implementation of the lookup service

Group proxies, however, differ from standard proxies as their contents may be dynamic

• server registration → server reference added to group proxy

• server removal or lease expiration → server reference removed from group proxy

We have developed an alternative implementation of the lookup specification capable of dealing with group proxies
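The dynamic contents of a group proxy can be sketched as a lookup-service entry that tracks one lease per server. GroupProxyEntry is hypothetical and unrelated to the actual Jgroup lookup service implementation; server references are plain strings and time is passed in explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy lookup-service entry for one replicated service. */
final class GroupProxyEntry {
    // server reference -> lease expiration time in milliseconds
    private final Map<String, Long> leases = new HashMap<>();

    /** Server registration or lease renewal: the reference is (re)added to the group proxy. */
    void register(String serverRef, long nowMillis, long leaseMillis) {
        leases.put(serverRef, nowMillis + leaseMillis);
    }

    /** Explicit server removal. */
    void remove(String serverRef) {
        leases.remove(serverRef);
    }

    /** Drops references whose lease has expired and returns the proxy's current contents, sorted. */
    List<String> currentProxy(long nowMillis) {
        leases.values().removeIf(expiry -> expiry <= nowMillis);
        List<String> refs = new ArrayList<>(leases.keySet());
        Collections.sort(refs);
        return refs;
    }
}
```

A client that looks the service up at different times may therefore receive group proxies with different members.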


The Jgroup Lookup Service

[Figure: Client, Lookup Service and three Servers; each server joins the lookup service with its stub; the client performs a Lookup, obtains a stub, and invokes the servers through it]


Extending the Jini Programming Model

Jgroup + Jini programming model for fault-tolerance

• Leases + transactions

• Object group communication

Problem:

• transactions and group communication considered as separate aspects of fault-tolerance

• their composition does not result in any meaningful combination of their respective strengths

We need the possibility of using replication in transactions:

• Transaction managers

• Participants

• Clients


Applications (Research)

Jgroup/ARM is being used for

• A distributed auction system
  • Partitionable auctions
  • [Panzieri, Amoroso et al., University of Bologna, 2002]

• An online-upgrade service for active replication
  • [Solarski, GMD Fokus]

• A replication management framework
  • Application-specific replication and recovery strategies
  • [Meling, HiS]

• Dependable naming service
  • Support for extensible group proxies (JERI)
  • [Meling et al., HiS]


Applications (Education)

Jgroup is being used at the

• Stavanger University College in the “Advanced Programming” course

• University of Bologna in the “Distributed System” course

• Norwegian University of Science and Technology in the “Dependable Systems” course

Source for several projects and theses:

• Low-level communication protocols (Bologna)

• Replication services (Bologna)

• Wide-area distributed services (Padova)

• Management and deployment issues (HiS)


Thank You!

http://jgroup.sourceforge.net/