Scalable Group Communication In Heterogeneous Cluster Filip Hanik Apache Software Foundation June 30...

Scalable Group Communication In Heterogeneous Cluster

Filip Hanik
Apache Software Foundation
June 30th, 2006

2

Who am I

[email protected]
• Tomcat Committer / ASF member
• Responsible for session replication and clustering
• Been involved with ASF since 2001

3

What we will cover

• Introduction to group communication
• Challenges in group/cluster communication
• Today’s Solutions
• Detailed Tribes overview
• Tribes – design/configuration/usage
• Problems and their solutions
• Q & A

4

What is Group Communication

• 1-to-n communication between software/hardware nodes

• Designed to reduce packets compared to 1-to-1 (point to point) communication

• Also referred to as broadcasting and/or multicasting

  • broadcast != multicast
  • broadcast – all nodes receive
  • multicast – interested (subscribed) nodes receive

• Popular academic research topic!! Lots of information available
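The broadcast/multicast distinction above can be sketched in a few lines of Java. This is an in-memory illustration only: the class `GroupSketch` and its methods are hypothetical, and no real network I/O is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model: nodes are just names, delivery is a returned list.
class GroupSketch {
    private final Set<String> allNodes = new LinkedHashSet<>();
    private final Set<String> subscribers = new LinkedHashSet<>();

    void addNode(String node) { allNodes.add(node); }

    // A node must be part of the network before it can subscribe to the group.
    void subscribe(String node) {
        if (!allNodes.contains(node)) {
            throw new IllegalArgumentException("unknown node: " + node);
        }
        subscribers.add(node);
    }

    // Broadcast: every node receives, interested or not.
    List<String> broadcastReceivers() { return new ArrayList<>(allNodes); }

    // Multicast: only nodes that subscribed receive.
    List<String> multicastReceivers() { return new ArrayList<>(subscribers); }
}
```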

5

Challenges in Group Communication

• Multicast is most commonly used
• Group consistency and leadership
• Delivery guarantee
• Group delivery guarantee
• Ordering and total ordering
• Flow control
• Multiple networks

6

Today’s Solutions

• Dozens if not hundreds of academic products
  • Not maintained, not supported, proprietary
• Many open source projects
  • Appia, Spread, Erlang, JGroups… list goes on
• Most are multicast based to solve the 1-to-n packet reduction problem

7

What is a uniform group model?

• Nodes are identical
• All nodes process, send and receive messages in the same way
• All nodes have the same applications
• Total ordering is based on the complete group
• Note: not the official definition of uniformity in a group setting

8

When isn’t uniformity enough?

• When processes on each node are dynamic - activate, passivate, short and long lived
  • Example, Tomcat webapps
  • Example, heterogeneous hardware environments
  • Application management vs. application data replication
• Messages with different priorities
  • Example, session attribute being replicated vs. a 25 MB WAR file being transferred
• Need different guarantee levels
• When most messages are 1-to-m, m < n

9

Challenges in heterogeneous clusters

• Same challenges as in homogeneous environments

• Node attributes change at runtime
• Nodes carry different responsibilities
• Total order messages that are sent 1-to-m where m < n

10

What is Tribes?

• Tribes is a messaging framework with group communication capabilities
• 100% Java, Apache Licensed (2.0)
• Born out of the cluster/session replication code from Tomcat 5.0-5.5, early 2006
• Currently alpha, will become the communication framework for Tomcat’s next cluster implementation
• Ideas from 2001

11

Why Tribes?

• Many frameworks are not flexible enough
• Not enough features
• Messages were guaranteed, without delivery feedback
• Static configurations for message delivery
• Based on 1-to-m delivery, where m < n
• License, license, license…

12

Why Tribes?

• Research gap - platforms are proprietary and often suggest protocols that are not standard

• Opportunities for httpd & Tomcat and other ASF software integration for more advanced and intelligent clusters

• Separation of communication layer
• Did I say Apache License?

13

Why not Tribes

• TCP is connection based
• When you always want to send 1-to-n
• Unique scenario where a highly customized solution might be the best fit
• It's not the one-size-fits-all solution, if such exists

14

Goals

• Simplify peer-to-peer and peer-to-group communication for distributed applications
• Flexible enough to support a wide range of applications under one runtime configuration
• Provide instant feedback on message delivery
• Concurrent message delivery, even between two nodes
• Parallel delivery to multiple nodes
• Clean, intuitive and easy to use, even for complex tasks
• All this with low overhead

15

Feature Overview

• Pluggable Modules
• Guaranteed Messaging
• Different Guarantee Levels
• Per message delivery semantics(!)
• Pluggable Interceptors (runtime)
• Delivery feedback – even for async
• Concurrent and parallel delivery
• Fixed node hierarchy

16

Feature: Pluggable Modules

• All major components can be swapped out, simple interfaces defined
• Needed when customization is required for lower level IO operations
• Example
  • Multicast not available
  • Proprietary network protocols
  • SSL
• Goal: default implementation to be enough for 80% of applications that require messaging

17

Feature: Guaranteed Msg Delivery

• Assume 1-to-m delivery (m < n)
• Default implementation is TCP based
  • java.io & java.nio
• In most cases, TCP (Java) will outperform UDP once flow control and ack/nack for guaranteed delivery are implemented
  • java.io support for platforms with poor NIO implementations
  • java.nio preferred

18

Feature: Guarantee Levels

• By default supports 3 levels
• NO_ACK – message was sent
  • Relies on TCP to deliver without node feedback
• ACK – message was received
  • Remote node replies with an ACK
• SYNC_ACK – message was processed
  • Remote node replies with ACK/FAIL_ACK when message has been processed
  • Allows for message process feedback
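The three levels listed above can be modelled as a small enum. This is a minimal sketch: `GuaranteeLevel` and its helper methods are hypothetical names for illustration, not the Tribes API.

```java
// Hypothetical enum mirroring the three levels named on the slide.
enum GuaranteeLevel {
    NO_ACK,    // sender only knows the message was handed to TCP
    ACK,       // remote node confirms receipt
    SYNC_ACK;  // remote node confirms the message was processed (or failed)

    // True when the sender waits for some reply from the remote node.
    boolean expectsReply() { return this != NO_ACK; }

    // True when the reply tells the sender whether processing succeeded.
    boolean reportsProcessingResult() { return this == SYNC_ACK; }
}
```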

19

Feature: Per message delivery semantics

• Most unique feature, what makes Tribes really stand out
• Allows for each message to be delivered differently
  • Per message guarantee level
  • Sync vs. async
  • Not ordered, ordered, totally ordered
• 27 flags - 2ⁿ (n=27) combinations
  • Based on interceptors configured
• Each message with its own unique delivery guarantee
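Per-message semantics of this kind are commonly carried as a bit mask on each message. The sketch below assumes that representation; the class `SendOptions`, its constant names and their values are hypothetical and chosen only for illustration.

```java
// Hypothetical option bits; the values are for illustration only.
final class SendOptions {
    static final int USE_ACK = 0x1;
    static final int SYNC_ACK = 0x2;
    static final int ASYNC = 0x4;
    static final int ORDERED = 0x8;

    private SendOptions() {}

    // An interceptor-style check: does this message carry the given flag?
    static boolean isSet(int options, int flag) {
        return (options & flag) == flag;
    }

    // n independent flags give 2^n possible combinations.
    static long combinations(int flagCount) {
        return 1L << flagCount;
    }
}
```

A sender would combine flags with bitwise OR, for example `SendOptions.USE_ACK | SendOptions.ORDERED`, and each interceptor would test only the bit it cares about.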

20

Feature: Pluggable Interceptors

• React on message attributes (flags)
• If not modifying message bytes, can be inserted at runtime
• Intercept any events through defined methods
• ChannelInterceptorBase available to minimize redundant code for non-intercepted methods
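The idea of a pass-through base class can be sketched as follows. The types below (`Interceptor`, `InterceptorBase`, `AuditInterceptor`, `RecordingSink`) are hypothetical stand-ins written for this example; they only illustrate the pattern that ChannelInterceptorBase is described as providing and do not reproduce its actual methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an interceptor contract: one method per event.
interface Interceptor {
    void setNext(Interceptor next);
    void sendMessage(String msg, int options);
}

// Base class that forwards every event, so subclasses override only what they need.
class InterceptorBase implements Interceptor {
    private Interceptor next;

    public void setNext(Interceptor next) { this.next = next; }

    public void sendMessage(String msg, int options) {
        if (next != null) next.sendMessage(msg, options);
    }
}

// Reacts only to messages carrying its flag; everything else passes straight through.
class AuditInterceptor extends InterceptorBase {
    static final int AUDIT_FLAG = 0x10; // hypothetical flag value
    final List<String> audited = new ArrayList<>();

    @Override
    public void sendMessage(String msg, int options) {
        if ((options & AUDIT_FLAG) == AUDIT_FLAG) audited.add(msg);
        super.sendMessage(msg, options);
    }
}

// End of the chain: records what would be handed to the transport.
class RecordingSink extends InterceptorBase {
    final List<String> sent = new ArrayList<>();

    @Override
    public void sendMessage(String msg, int options) { sent.add(msg); }
}
```

Because each interceptor only holds a reference to the next one, inserting a new interceptor at runtime amounts to re-linking two references.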

21

Feature: Delivery Feedback

• Tribes aims to deliver feedback for each message and each delivery semantic

  • NO_ACK, ACK, SYNC_ACK
• Synchronous and asynchronous delivery
  • Asynchronous gets feedback through callback

• Example, recoverable transactions can now be implemented since we always know if the remote node received the message
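Feedback for both synchronous and asynchronous sends can be sketched with a `CompletableFuture`. This is an illustration under stated assumptions: `FeedbackSender` is a hypothetical class, and the `remote` function stands in for the remote node, returning true when it acknowledges a message.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sender; "remote" simulates the remote node's acknowledgement.
class FeedbackSender {
    private final Function<String, Boolean> remote;

    FeedbackSender(Function<String, Boolean> remote) { this.remote = remote; }

    // Synchronous delivery: the caller blocks and gets the result directly.
    boolean sendSync(String msg) { return remote.apply(msg); }

    // Asynchronous delivery: the caller continues; the future completes with the
    // acknowledgement, or with false if the remote side throws.
    CompletableFuture<Boolean> sendAsync(String msg) {
        return CompletableFuture.supplyAsync(() -> remote.apply(msg))
                .exceptionally(err -> false);
    }
}
```

A caller that wants a callback rather than a blocking wait can attach one with `sendAsync(msg).thenAccept(acked -> ...)`, for example to retry or roll back when `acked` is false.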

22

Feature: Concurrent & Parallel Delivery

• Concurrent
  • More than one message sent or received at any point in time
  • No “message blocking”, i.e. a 10 MB message with SYNC_ACK will not stop a 10 KB NO_ACK
• Parallel
  • Able to send a message to multiple destinations in parallel using one thread (NIO)
• Prioritized
  • Future feature

23

Feature: Fixed Node Hierarchy

• Absolute Order Algorithm
  • Always be able to determine leadership
  • No message exchanges (chat free)
  • Non coordinated
• Also provides “Coordination” algorithm
  • Chatty, but efficient
  • Auto merge groups
  • Enhance node discovery where multicast might glitch
  • Can connect different subnets when used together with the StaticMembershipInterceptor
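A chat-free leadership rule can be sketched as a deterministic sort that every node applies to its own member list. The ordering rule used here (host, then port) is an assumption made for the example, and `MemberInfo` and `AbsoluteOrderSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical member record; real membership data would carry more fields.
final class MemberInfo {
    final String host;
    final int port;

    MemberInfo(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

final class AbsoluteOrderSketch {
    // Assumed ordering rule: host first, then port. Any total order works as
    // long as every node applies the same one.
    static final Comparator<MemberInfo> ORDER =
            Comparator.comparing((MemberInfo m) -> m.host).thenComparingInt(m -> m.port);

    private AbsoluteOrderSketch() {}

    // Each node runs this locally on its own view; no messages are exchanged.
    static MemberInfo leader(List<MemberInfo> members) {
        if (members.isEmpty()) throw new IllegalArgumentException("empty group");
        List<MemberInfo> sorted = new ArrayList<>(members);
        sorted.sort(ORDER);
        return sorted.get(0);
    }
}
```

Nodes that see the same set of members pick the same leader regardless of the order in which they discovered each other.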

24

Feature: Absolute Failure Detection

• Simple interceptor TcpFailureDetector
• Instant feedback on member down
  • No need to wait for timeout
  • No risk of node pings getting stuck on a busy network
• Verifies timeouts against “false positives”
• 3 levels
  • Connect
  • Send
  • Read
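The first of the three levels, a connect check, can be sketched with a plain socket. `ConnectCheck` is a hypothetical helper written for this example; it covers only the connect level and is not the TcpFailureDetector implementation.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: verifies a suspected failure by attempting a TCP connect.
final class ConnectCheck {
    private ConnectCheck() {}

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }
}
```

A membership timeout could be double-checked this way before a member is declared down, which is one way to guard against false positives.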

25

Feature: RPC messaging

• Ability to collect responses to a message
  • NO_REPLY, FIRST_REPLY, MAJORITY_REPLY & ALL_REPLY
• Absence reply(!) – rather than timeout
• Callback left over delivery
• Support for multiple RPC channels on top of one Tribes channel
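The four reply modes can be read as a rule for when response collection may stop. The sketch below assumes that "majority" means strictly more than half of the destinations; `ReplyMode` and `isComplete` are hypothetical names, not the RpcChannel API.

```java
// Hypothetical enum for the reply modes named on the slide.
enum ReplyMode {
    NO_REPLY, FIRST_REPLY, MAJORITY_REPLY, ALL_REPLY;

    // Decides whether reply collection can stop, given how many of the
    // destination members have answered so far.
    boolean isComplete(int replies, int destinations) {
        switch (this) {
            case NO_REPLY:
                return true;
            case FIRST_REPLY:
                return replies >= 1;
            case MAJORITY_REPLY:
                return replies > destinations / 2; // strictly more than half
            case ALL_REPLY:
                return replies >= destinations;
            default:
                throw new AssertionError(this);
        }
    }
}
```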

26

Feature – JNDI Channel

• Ability to bind a channel into a JNDI tree

• Share the channel between objects
• Ideal for J2EE messaging
• Coming soon:
  • Ability to download client stub
  • Out of process invocation

• Not yet implemented…

27

Architecture - Overview

[Diagram: layered architecture. Labels recovered from the figure: Application (×4), RpcChannel (×2), Tipi (×2), Channel, Interceptor (×2), Coordinator, Membership, Sender, Receiver, RX, TX.]

28

Architecture - Channel

• 1 instance per Tribes runtime setup
• Is the first interceptor
• Holds a list of one or more ChannelListeners & MembershipListeners
• Serializes and deserializes messages
• Supports ByteMessage for transfer of pure byte[] data
• RpcChannel instanceof ChannelListener
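The serialize/deserialize step can be pictured as a standard Java serialization round trip. This is a sketch using only `java.io`; `MessageCodec` is a hypothetical helper and says nothing about how Tribes frames messages on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper showing a plain Java serialization round trip.
final class MessageCodec {
    private MessageCodec() {}

    // Turns a Serializable message into the bytes that would be sent.
    static byte[] serialize(Serializable msg) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(msg);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the message object on the receiving side.
    static Serializable deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Serializable) in.readObject();
        }
    }
}
```

An application that already holds raw bytes would skip this step entirely, which is the case the ByteMessage bullet refers to.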

29

Architecture - Interceptors

• Linked list invocation
• Strongly typed – one method per event
• No events need to travel through the stack to coordinate interceptors
• Examples
  • Failure detection
  • Static membership
  • Total order or per member order
  • Throughput measurements and statistics
  • Leadership election
  • Message data encryption
  • Message dispatch – asynchronous messaging
  • All or none delivery guarantee

30

Architecture - Interceptors

• Trigger on ChannelData.getOptions()
• Pass through a ChannelData object
• Using XByteBuffer – optimized byte[] handling
• Membership & Message interceptions
• Threadless

31

Architecture - Coordinator

• Last interceptor
• Coordinates IO components
  • Sender
  • Receiver
  • Membership
• Receiver uses thread pool
• Sender piggybacks on application thread

32

Code Structure

• org.apache.catalina.tribes
  • Application and Component interfaces
  • group – default implementation
  • transport – RX/TX components
  • membership – membership service
  • group.interceptors – supplied interceptors
  • io – protocol utilities and optimizations
  • tipis – utilities on top of Tribes core

33

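Quick Start

import java.io.Serializable;
import org.apache.catalina.tribes.Channel;
import org.apache.catalina.tribes.ChannelListener;
import org.apache.catalina.tribes.Member;
import org.apache.catalina.tribes.MembershipListener;
import org.apache.catalina.tribes.group.GroupChannel;

// MyMessageListener, MyMemberListener and MyMessage are application classes
Channel myChannel = new GroupChannel();

ChannelListener msgListener = new MyMessageListener();
MembershipListener mbrListener = new MyMemberListener();

myChannel.addMembershipListener(mbrListener);
myChannel.addChannelListener(msgListener);

myChannel.start(Channel.DEFAULT); // start the channel

Serializable myMsg = new MyMessage();

Member[] group = myChannel.getMembers();

// send to every current member with the default delivery options
myChannel.send(group, myMsg, Channel.SEND_OPTIONS_DEFAULT);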

34

Data Replication

• ReplicatedMap – one to all replication
• LazyReplicatedMap – primary/backup replication
• Cookie based replication map
  • Ideal for HTTP session replication
  • Backup location stored in cookies
• Versioned delta replication
• Example: org.apache.catalina.ha
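Versioned delta replication can be sketched as a backup copy that accepts a delta only when it is the next expected version. The class below is hypothetical and assumes a simple "version must increase by exactly one" rule, which may differ from what org.apache.catalina.ha actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical backup copy of a session: attributes plus the last applied version.
final class VersionedReplica {
    private final Map<String, String> attributes = new HashMap<>();
    private long version = 0;

    // Applies a delta only if it is the next expected version. Returns false when
    // a delta was missed, signalling that a full state transfer is needed instead.
    boolean applyDelta(long deltaVersion, Map<String, String> changed) {
        if (deltaVersion != version + 1) return false;
        attributes.putAll(changed);
        version = deltaVersion;
        return true;
    }

    long version() { return version; }

    String get(String key) { return attributes.get(key); }
}
```

Sending only the changed attributes keeps replication traffic small, while the version check keeps a backup from silently applying deltas out of order.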

35

Tribes Demos

• Demo
• Code Example
• Discussion around common problems and how Tribes could solve them

36

Future Work

• Security - SSL support and node authentication
• Many processes – one channel
• Language independent
• WAN membership discovery
• TCP based multicaster for large clusters
  • 2*n packet reduction for the sender, not total
• Intelligent membership broadcasting
• httpd as a load balancer

37

Q & A

[email protected]
• http://people.apache.org/~fhanik/tribes
• Tomcat SVN repository
• Interested to use?
• Interested to help?
