Filterfresh Fault-tolerant Java Servers Through Active Replication Arash Baratloo .


Page 1

Filterfresh

Fault-tolerant Java Servers Through Active Replication

Arash Baratloo
www.cs.nyu.edu/phd_students/baratloo

Page 2

• Investigation of failure models in distributed Java applications

• Provide transparent fault-masking (to users and to programmers)

• Support highly available services in the presence of failures

• Remove single points of failure

Filterfresh

Page 3

Remote Method Invocation (RMI): 100% Java, hot, new, easy to use

and

Reliable Object Services (ROS) interest in providing:
– support for active-active replication
– support for Java objects

Motivating Factors

Page 4

Roadmap

Motivation

– RMI Registry & crash failures
– RMI Server Architecture & crash failures
– A Unified Solution: process group approach
– Fault-tolerant Registry
– Fault-tolerant RMI
– Conclusion

Page 5

RMI in a Nutshell

• Servers register with the local registry

• Clients look up a server at a well-known registry

• Given a remote reference, the client performs a remote method invocation

[Figure: registry, client and server; the server binds to "abc" at the registry, and the client looks up "abc" there.]
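For readers who have not used RMI, the three steps above look roughly like this with the standard `java.rmi` API. It is a minimal single-JVM sketch, not code from Filterfresh: the interface `Server`, the name `"abc"` and port 20990 are illustrative assumptions, and in a real deployment the registry, server and client would run in separate processes.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public final class RmiNutshell {
    // Remote interface: every remote method must declare RemoteException.
    public interface Server extends Remote {
        String foo() throws RemoteException;
    }

    // Extending UnicastRemoteObject exports the object when it is constructed.
    static final class ServerImpl extends UnicastRemoteObject implements Server {
        ServerImpl() throws RemoteException { super(0); } // 0 = any free port
        @Override public String foo() { return "foo called"; }
    }

    static final int PORT = 20990; // assumption: an unused local port

    static String demo() {
        try {
            return run();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static String run() throws Exception {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = LocateRegistry.createRegistry(PORT);
        ServerImpl impl = new ServerImpl();
        try {
            // 1. The server registers with the (local) registry under "abc".
            LocateRegistry.getRegistry("127.0.0.1", PORT).rebind("abc", impl);
            // 2. The client looks up "abc" at the well-known registry and gets a stub.
            Server s = (Server) LocateRegistry.getRegistry("127.0.0.1", PORT).lookup("abc");
            // 3. The client invokes the method remotely through the stub.
            return s.foo();
        } finally {
            // Unexport both objects so the RMI runtime lets the JVM exit.
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "foo called"
    }
}
```

Note that the client must know where the registry lives; this "well-known registry" requirement is the limitation discussed next.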

Page 6

Limitations of RMI Registry

• The “well-known registry” requirement is too restrictive for failure recovery

• Single point of failure

• Cannot support replicated servers, and thus cannot support highly available servers

Page 7

FT Registry requires...

• Distribute and replicate registry servers

• A replication strategy to maintain a consistent state

• Failure detection and removal of failed registry servers

• Failed objects must be restarted automatically

• Dynamic addition of registry servers

Page 8

[Figure: client application → server stub → remote reference layer (RRL) → transport layer → RRL → server skeleton → server application]

RMI Architecture

• The RRL assumes a stream-oriented transport

• The transport layer is implemented on TCP/IP

Page 9

Architecture (cont…)

[Figure: client → stub → RRL → transport → RRL → skel → server]

interface Server extends Remote { public void foo() throws RemoteException; }

class Client { ... Server s = lookup...; s.foo(); }

Page 10

Architecture (cont…)

[Figure: client → stub → RRL → transport → RRL → skel → server]

interface Server extends Remote { public void foo() throws RemoteException; }

class ServerImpl extends ... { public void foo() { ... } }

Page 11

Architecture (cont…)

[Figure: client → stub → RRL → transport → RRL → skel → server]

Page 12

[Figure: client → stub → RRL → transport → RRL → skel → server]

class ServerImpl extends UnicastRemoteObject implements Server { public void foo() { ... } }

Architecture (cont…)

A transparent fault-tolerant system must therefore be implemented at the RRL or below
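Java does not expose the RRL as a public extension point, so the following is only an analogy: a `java.lang.reflect.Proxy` shows why a layer placed beneath the interface the client programs against can add behaviour without any change to client code. The names `Server`, `client`, `interpose` and the call counter are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

public final class Interposition {
    public interface Server { String foo(); }

    // Application code: written only against Server, unaware of any layer below.
    static String client(Server s) { return s.foo(); }

    // Wraps a Server in an extra layer that sees every call before forwarding it.
    // Here the layer only counts calls; an FT layer would instead detect
    // failures and redirect the call.
    static Server interpose(Server target, AtomicInteger calls) {
        InvocationHandler layer = (proxy, method, args) -> {
            calls.incrementAndGet();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            }
        };
        return (Server) Proxy.newProxyInstance(
                Server.class.getClassLoader(), new Class<?>[] {Server.class}, layer);
    }

    static String demo() {
        AtomicInteger calls = new AtomicInteger();
        Server plain = () -> "ok";
        Server wrapped = interpose(plain, calls);
        // The same client code runs unchanged against both references.
        return client(plain) + "," + client(wrapped) + "," + calls.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "ok,ok,1"
    }
}
```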

Page 13

FT Servers Require...

• Distribute and replicate servers

• A replication strategy to maintain a consistent state

• Failure detection and removal of failed servers

• Dynamic addition of servers

• Object references must remain valid after the associated object has failed

Page 14

A Unified Solution...

Process Group Approach, where all non-faulty objects
– form a group
– share a consistent view of the group
– interact through reliable group primitives (all or nothing)
– observe a total order on group primitives
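One common way to meet the total-order requirement is a sequencer that stamps every group message with a global sequence number; Amoeba's group communication is sequencer-based. The sketch below is not the Filterfresh protocol: it shows only the ordering step (members hold back early messages and deliver strictly by sequence number) and omits retransmission, acknowledgements and view changes. The message contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class TotalOrder {
    // Sequencer: stamps each group message with the next global sequence number.
    static final class Sequencer {
        private long next = 0;
        synchronized long stamp() { return next++; }
    }

    // Member: delivers strictly in sequence order, holding back early arrivals.
    static final class Member {
        private long expected = 0;
        private final Map<Long, String> holdBack = new HashMap<>();
        final List<String> delivered = new ArrayList<>();

        void receive(long seq, String msg) {
            holdBack.put(seq, msg);
            while (holdBack.containsKey(expected)) {
                delivered.add(holdBack.remove(expected));
                expected++;
            }
        }
    }

    static List<List<String>> demo() {
        Sequencer sequencer = new Sequencer();
        long a = sequencer.stamp(), b = sequencer.stamp(), c = sequencer.stamp();
        Member m1 = new Member(), m2 = new Member();
        // The network delivers the same three messages in different orders...
        m1.receive(b, "bind y");   m1.receive(a, "bind x"); m1.receive(c, "unbind x");
        m2.receive(c, "unbind x"); m2.receive(a, "bind x"); m2.receive(b, "bind y");
        // ...but both members deliver them in the same (sequence) order.
        return List.of(m1.delivered, m2.delivered);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[bind x, bind y, unbind x], [bind x, bind y, unbind x]]
    }
}
```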

Page 15

Fortunately

Process Group Membership is
– a well-understood problem with known protocols
– well tested (ISIS, Transis, Amoeba, etc.)
– the basis for virtual synchrony

Equivalent Problems* (implement one, get all)
– Group Membership
– Reliable Failure Detectors
– Reliable and ordered multicast

* Chandra and Toueg. Unreliable Failure Detectors for Reliable Distributed Systems. JACM, March 1996.

Page 16

Unfortunately

Process Group Membership is
– as hard as distributed consensus
– impossible in purely asynchronous systems with crash failures*

Our solution
– the standard “timeout” assumption
– a variation of the protocol used in the Amoeba OS**

* Chandra, Toueg, Hadzilacos and Charron-Bost. Impossibility of Group Membership in Asynchronous Systems.

** Oey, Langendoen and Bal. Comparing Kernel-level and User-level Communication protocols on Amoeba. ICDCS 95.
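The “timeout” assumption amounts to treating prolonged silence as a crash. A minimal heartbeat detector under that assumption might look like the sketch below; the 500 ms timeout, the member names and the explicit `now` parameter (used instead of a real clock so the example is deterministic) are all assumptions, not details of Filterfresh.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public final class TimeoutDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();

    TimeoutDetector(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Record a heartbeat (or any message) from a member at time 'now'.
    void heard(String member, long now) { lastHeard.put(member, now); }

    // Members silent for longer than the timeout are declared crashed. Under the
    // timeout assumption this verdict is trusted; in a purely asynchronous system
    // a slow but live member could be removed by mistake.
    Set<String> suspects(long now) {
        Set<String> out = new TreeSet<>();
        lastHeard.forEach((m, t) -> { if (now - t > timeoutMillis) out.add(m); });
        return out;
    }

    static Set<String> demo() {
        TimeoutDetector fd = new TimeoutDetector(500);
        fd.heard("A", 0); fd.heard("B", 0); fd.heard("C", 0);
        fd.heard("A", 400); fd.heard("B", 450); // C stays silent
        return fd.suspects(700);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [C]
    }
}
```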

Page 17

What We Provide...

A Group Manager Class
– 100% Java
– built on top of UDP/IP

Implements
– group creation
– join operation (with state transfer)
– leave operation
– failure detection and recovery
– reliable multicast

All events are atomic and totally ordered
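To make the list above concrete, here is a hypothetical in-process `GroupManager` with join (including state transfer), leave and multicast. It collapses the distributed protocol into one synchronized object, so atomicity and total order hold trivially; the real class has to achieve the same guarantees with UDP messages and timeouts. All names are illustrative and are not the Filterfresh API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class GroupSketch {
    // One member holding a copy of the replicated state and the views it has seen.
    static final class Replica {
        final String id;
        final Map<String, String> state = new LinkedHashMap<>();
        final List<String> views = new ArrayList<>();
        Replica(String id) { this.id = id; }
    }

    // Hypothetical group manager. Every operation runs under one lock, so all
    // events reach all current members in a single total order.
    static final class GroupManager {
        private final List<Replica> members = new ArrayList<>();

        synchronized void join(Replica newcomer) {
            // State transfer: copy the state of any existing member (all are identical).
            if (!members.isEmpty()) newcomer.state.putAll(members.get(0).state);
            members.add(newcomer);
            installView();
        }

        // Used both for a voluntary leave and when a member is detected as failed.
        synchronized void leave(Replica member) {
            members.remove(member);
            installView();
        }

        // In this single-process sketch the update simply reaches every current
        // member; a distributed version must ensure all-or-nothing delivery.
        synchronized void multicast(String key, String value) {
            for (Replica r : members) r.state.put(key, value);
        }

        // Every membership change installs a new view at all current members.
        private void installView() {
            List<String> ids = new ArrayList<>();
            for (Replica r : members) ids.add(r.id);
            String view = String.join(",", ids);
            for (Replica r : members) r.views.add(view);
        }
    }

    static List<Replica> demo() {
        GroupManager gm = new GroupManager();
        Replica a = new Replica("a"), b = new Replica("b"), c = new Replica("c");
        gm.join(a);
        gm.join(b);
        gm.multicast("x", "1");
        gm.join(c);            // c receives x=1 through state transfer
        gm.leave(b);           // e.g. b was detected as failed
        gm.multicast("y", "2");
        return List.of(a, b, c);
    }

    public static void main(String[] args) {
        for (Replica r : demo()) System.out.println(r.id + " " + r.state + " " + r.views);
    }
}
```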

Page 18

Multicast Performance

• Pentium Pro 200, Linux RedHat 4.0, Fast Ethernet hub

[Figure: time (msec, axis 0 to 70) vs. message size (1, 512 and 1024 bytes) for local RMI, remote RMI, multicast-1, multicast-2, multicast-4 and multicast-8.]

Page 19

FT Registry Architecture

• A registry on each host/domain

• Group managers ensure reliable, ordered events

• Support for dynamic joins with state transfer

[Figure: two hosts, each running an ft registry, an rmi registry and a group mgr; a server's bind at one ft registry is multicast to the other.]

Page 20

FT Registry Architecture (cont…)

• lookup becomes a local operation

• detect and remove failed objects

• consistent global state

[Figure: two hosts, each running an ft registry, an rmi registry and a group mgr; a client's lookup is served by its local ft registry.]
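The FT registry design on the last two slides can be sketched as replicas that route every update (`bind`, removal of a failed server) through the group while answering `lookup` from their local copy. The `Group` class here is a hypothetical stand-in for the group manager that applies updates to all replicas in one order, passing updates as lambdas rather than serialized messages; endpoint strings such as `hostA:2001` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public final class FtRegistrySketch {
    // Stand-in for the group manager: applies each update to every replica, in one order.
    static final class Group {
        final List<FtRegistry> replicas = new ArrayList<>();
        synchronized void multicast(Consumer<FtRegistry> update) {
            for (FtRegistry r : replicas) update.accept(r);
        }
    }

    // One FT registry replica, e.g. one per host or domain.
    static final class FtRegistry {
        private final Map<String, String> table = new HashMap<>(); // name -> server endpoint
        private final Group group;

        FtRegistry(Group group) {
            this.group = group;
            group.replicas.add(this); // a real join would also transfer the current table
        }

        // bind is multicast so that all replicas apply it in the same order.
        void bind(String name, String endpoint) {
            group.multicast(r -> r.table.put(name, endpoint));
        }

        // lookup is purely local: no network round trip.
        String lookup(String name) { return table.get(name); }

        // When a server is found to have failed, its bindings are removed everywhere.
        void removeFailed(String endpoint) {
            group.multicast(r -> r.table.values().removeIf(endpoint::equals));
        }
    }

    static List<String> demo() {
        Group group = new Group();
        FtRegistry r1 = new FtRegistry(group), r2 = new FtRegistry(group);
        r1.bind("abc", "hostA:2001");
        r2.bind("xyz", "hostB:2002");
        String before = r2.lookup("abc"); // bound via r1, answered locally by r2
        String other = r1.lookup("xyz");
        r1.removeFailed("hostA:2001");
        return Arrays.asList(before, other, r2.lookup("abc"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [hostA:2001, hostB:2002, null]
    }
}
```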

Page 21

FT Registry Performance

• Pentium Pro 200, Linux RedHat 4.0, Fast Ethernet, Ethernet hub

[Figure: time for bind and lookup (axis 0 to 80) for the RMI Registry (local and remote) and for FT Registry-1, FT Registry-2, FT Registry-4 and FT Registry-8.]

Page 22

RMI & FT Registry

• Support multiple servers registered under the same name

[Figure: a client (stub, RRL, transport) and three servers (skel, RRL, transport), all registered with the FT Registry.]

• Can now support recovery from server failure

Page 23

What if...

In the event of server failure...

[Figure: the client, the FT Registry and the three servers from the previous slide, with one server failing.]

Ouch!

Page 24

Failure Recovery

• The old connection is patched with a connection to a non-faulty server

• Illusion of a valid object reference

• Transparent!

[Figure: after a "reverse" lookup at the FT Registry, the client stub's connection is redirected to a surviving server.]

• A “reverse” lookup returns a name given a wire connection
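The recovery path on the last two slides can be sketched as follows: the registry keeps several servers per name plus an endpoint-to-name index for the “reverse” lookup, and the client-side connection, on an I/O failure, asks which name the dead endpoint served and reconnects to another server bound under that name. The `Transport` interface, the endpoints and the choice to let the client remove the failed entry are simplifying assumptions, not the Filterfresh implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class FailoverSketch {
    // Registry allowing several servers under one name, plus a reverse index.
    static final class MultiRegistry {
        private final Map<String, List<String>> byName = new HashMap<>();
        private final Map<String, String> nameOf = new HashMap<>(); // endpoint -> name

        void bind(String name, String endpoint) {
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(endpoint);
            nameOf.put(endpoint, name);
        }

        // "Reverse" lookup: the name a given connection endpoint was bound under.
        String reverseLookup(String endpoint) { return nameOf.get(endpoint); }

        List<String> lookupAll(String name) { return byName.getOrDefault(name, List.of()); }

        void remove(String endpoint) {
            String name = nameOf.remove(endpoint);
            if (name != null) byName.get(name).remove(endpoint);
        }
    }

    // Hypothetical wire-level send; throws IOException when the peer has crashed.
    interface Transport { String send(String endpoint, String request) throws IOException; }

    // Client-side connection that patches itself onto a surviving server.
    static final class PatchedConnection {
        private final MultiRegistry registry;
        private final Transport transport;
        private String endpoint;

        PatchedConnection(MultiRegistry registry, Transport transport, String endpoint) {
            this.registry = registry;
            this.transport = transport;
            this.endpoint = endpoint;
        }

        String call(String request) throws IOException {
            try {
                return transport.send(endpoint, request);
            } catch (IOException failure) {
                String name = registry.reverseLookup(endpoint); // which service was this?
                registry.remove(endpoint);                      // drop the failed server
                List<String> survivors = registry.lookupAll(name);
                if (survivors.isEmpty()) throw failure;         // nothing left to fail over to
                endpoint = survivors.get(0);                    // patch the old connection
                return transport.send(endpoint, request);
            }
        }
    }

    static String demo() {
        MultiRegistry registry = new MultiRegistry();
        registry.bind("abc", "hostA:2001");
        registry.bind("abc", "hostB:2001");
        Transport net = (ep, req) -> {
            if (ep.startsWith("hostA")) throw new IOException("connection reset"); // hostA crashed
            return ep + " handled " + req;
        };
        PatchedConnection conn = new PatchedConnection(registry, net, "hostA:2001");
        try {
            return conn.call("foo");
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hostB:2001 handled foo"
    }
}
```

Re-sending a request after a failure is only safe if the call is idempotent or the failed server never executed it.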

Page 25

Failure Recovery Performance

• Working, but measurements have not yet been made

Page 26

FT Server Architecture

• The client has the illusion of a single server

• In reality, the servers are actively replicated

• Highly available?

[Figure: a client (stub, RRL, transport) connected to four replicated servers, each with skel, RRL and group mgr.]

Page 27

Highly Available Servers

• Group managers ensure reliable ordering of events across all servers

• Guarantees that servers have a consistent state

• Failure detection and removal of failed servers

• Dynamic addition of servers with state transfer

• Illusion of a valid server reference even after the associated object has failed
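Active replication as described on the last two slides can be illustrated with a deterministic service: every live replica applies every request in the same order, so their states stay identical, and the client is handed the first reply. The `ReplicatedCounter` front end is a hypothetical stand-in for the group managers plus client stub, and a crash is simulated with a flag.

```java
import java.util.ArrayList;
import java.util.List;

public final class ActiveReplication {
    // A deterministic server: same requests in the same order => same state.
    static final class Counter {
        private long value;
        boolean crashed;
        long apply(long delta) { value += delta; return value; }
    }

    // Forwards each request to every live replica in one global order (the
    // synchronized method serializes requests) and returns the first reply,
    // so the client sees a single server.
    static final class ReplicatedCounter {
        private final List<Counter> replicas = new ArrayList<>();

        ReplicatedCounter(int n) { for (int i = 0; i < n; i++) replicas.add(new Counter()); }

        synchronized long add(long delta) {
            Long first = null;
            for (Counter c : replicas) {
                if (c.crashed) continue;          // failed replicas are skipped
                long reply = c.apply(delta);
                if (first == null) first = reply;
            }
            if (first == null) throw new IllegalStateException("all replicas failed");
            return first;
        }

        void crash(int i) { replicas.get(i).crashed = true; }
    }

    static List<Long> demo() {
        ReplicatedCounter service = new ReplicatedCounter(3);
        List<Long> replies = new ArrayList<>();
        replies.add(service.add(5)); // all three replicas now hold 5
        service.crash(0);            // one replica fails...
        replies.add(service.add(2)); // ...and the client still gets 7
        return replies;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [5, 7]
    }
}
```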

Page 28

Conclusions