Jon Maloy, Ericsson Steven Blake, Modularnet Maarten Koning, WindRiver Jamal Hadi Salim,Znyx

TIPC as TML. draft-maloy-tipc-01.txt. Jon Maloy, Ericsson; Steven Blake, Modularnet; Maarten Koning, WindRiver; Jamal Hadi Salim, Znyx; Hormuzd Khosravi, Intel. IETF-61, Washington DC, Nov 2004.


Page 1: TIPC as TML

draft-maloy-tipc-01.txt

Jon Maloy, Ericsson
Steven Blake, Modularnet
Maarten Koning, WindRiver
Jamal Hadi Salim, Znyx
Hormuzd Khosravi, Intel

IETF-61, Washington DC, Nov 2004

Page 2: TIPC

NOKIA RESEARCH CENTER / BOSTON

A transport protocol for cluster environments

Connectionless and connection-oriented; reliable or unreliable

Reliable or unreliable multicast

Usage not limited to the ForCES context

A framework for detecting, supervising and maintaining cluster topology

Available as a portable open source code package under a BSD licence

12000 lines of C code, 112 kbyte Linux kernel module

Runs on 4 OSes so far, with more to come

Proven concept, used and deployed in several Ericsson products

Page 3: ForCES Protocol Framework

[Diagram: CE and FE protocol stacks exchanging ForCES Protocol Messages. CE side: CE PL (ForCES Protocol), CE TML, Transport (IP, TCP, RapidIO, Ethernet…). FE side: FE PL (ForCES Protocol), FE TML, Transport (IP, TCP, RapidIO, Ethernet…).]

Page 4: TIPC as L2 TML

[Diagram: CE and FE protocol stacks exchanging ForCES Protocol Messages. CE side: CE PL (ForCES Protocol), TIPC TML, L2 Transport (RapidIO, Ethernet…). FE side: FE PL (ForCES Protocol), TIPC TML, L2 Transport (RapidIO, Ethernet…).]

Page 5: Interface Adaptation

[Diagram: CE and FE protocol stacks exchanging ForCES Protocol Messages, each with an Interface Adaptation block. CE side: CE PL (ForCES Protocol), TIPC TML, L2 Transport (RapidIO, Ethernet…). FE side: FE PL (ForCES Protocol), TIPC TML, L2 Transport (RapidIO, Ethernet…).]

Page 6: Fulfilling Requirements (1)

Reliability: Reliable transport in all modes. Can be made unreliable per socket/direction.

Security: Only secure within closed networks. No explicit authentication/encryption support yet, but planned. Not IP-based, so no router will forward TIPC messages.

Congestion Control: At three levels: connection/transport, signalling link and carrier level. Will give feedback to the PL layer if a connection is broken or a message is rejected.

Multicast/Broadcast: Supported.

Page 7: Fulfilling Requirements (2)

Timeliness: Immediate delivery (no Nagle algorithm). Inter-node delivery time in the order of 100 microseconds.

HA Considerations: L2 link failure detection and failover handled transparently for the user. Connection abortion with error code if no redundant carrier is available. Peer node failure detection after 0.5-1.5 seconds.

Encapsulation: 24 byte extra header, 40 extra for connectionless.

Priorities: Supports 4 message importance priorities, determining congestion levels and abort/rejection levels. Are 8 levels really needed?

Page 8: Connection Directly on TIPC

[Diagram: FE with FE Object, LFB 1 and LFB 2; CE with CE Object, FB X and FB Y; connections run over TIPC.]

Page 9: Connections via FE/CE Object

[Diagram: FE with FE Object, LFB 1 and LFB 2; CE with CE Object, FB X and FB Y; connections run over TIPC.]

Page 10: Connection Usage

[Diagram: FE with FE Object, LFB 1 and LFB 2; CE with CE Object, FB X and FB Y; connections run over TIPC.]

Control Connection: High priority. Reliable in both directions.

Traffic Data Connection: Low priority. Reliable CE->FE. Unreliable FE->CE.

Page 11: Functional Addressing: Unicast

Function Address: Persistent, reusable 64 bit port identifier assigned by user. Consists of type number and instance number.

Function Address Sequence: Sequence of function addresses with same type.

[Diagram: a client process and two server processes.]

Server Process, Partition A: bind(type = foo, lower=0, upper=99)

Server Process, Partition B: bind(type = foo, lower=100, upper=199)

Client Process: sendto(type = foo, instance = 33)

Message sent: foo,33
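The unicast lookup described on this slide can be illustrated with a small model. This is only a sketch, not TIPC code: the `Binding` class, the `lookup_unicast` function and the `owner` labels are hypothetical names introduced here, and a linear scan stands in for whatever table structure TIPC actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """One bind(type, lower, upper) call; `owner` is a hypothetical label."""
    type: str
    lower: int
    upper: int
    owner: str


def lookup_unicast(table, type_, instance):
    """Return the owner whose partition [lower, upper] contains the instance."""
    for b in table:
        if b.type == type_ and b.lower <= instance <= b.upper:
            return b.owner
    return None  # no partition covers this function address


# The two server partitions from the slide.
table = [
    Binding("foo", 0, 99, "partition A"),
    Binding("foo", 100, 199, "partition B"),
]

print(lookup_unicast(table, "foo", 33))   # partition A
print(lookup_unicast(table, "foo", 150))  # partition B
```

In this model, sendto(type = foo, instance = 33) resolves to partition A because 33 falls inside the range 0 to 99.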

Page 12: Address Mapping - Unicast

[Diagram: FE with FE Object and LFB 1 (Meter, instance 44); CE with CE Object and FB X (RSVP, instance 77); both sit on TIPC.]

Meter, TML API: tml_bind(meter,44)

Meter, TIPC API: bind(meter,44,44)

RSVP, TML API: tml_bind(RSVP,77)

RSVP, TIPC API: bind(RSVP,77,77)
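The mapping on this slide, from a TML-level bind to a TIPC bind whose lower and upper bounds are equal, can be sketched as follows. The Python function below is a hypothetical stand-in for tml_bind; the tuple layout and the owner strings are assumptions made for illustration only.

```python
def tml_bind(name_table, lfb_type, instance, owner):
    """Hypothetical TML-level bind that publishes a single function address.

    Mirrors the slide's mapping tml_bind(T, I) -> bind(T, I, I): the range
    handed to the transport has lower == upper == instance.
    """
    entry = (lfb_type, instance, instance, owner)
    name_table.append(entry)
    return entry


name_table = []
tml_bind(name_table, "meter", 44, "LFB 1 on FE")
tml_bind(name_table, "RSVP", 77, "FB X on CE")

for type_, lower, upper, owner in name_table:
    print(f"bind({type_},{lower},{upper})  # {owner}")
```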

Page 13: Connection Setup

[Diagram: FE 17 with FE Object and LFB 1 (Meter, instance 44); CE 8 with CE Object and FB X (RSVP, instance 77); both sit on TIPC.]

TML API: tml_bind(RSVP,77)

TIPC API: bind(RSVP,77,77)

TML API: tml_connect(RSVP,77, CEID=8)

TIPC API: connect(RSVP,77,node=8)

If instance numbers are coordinated over the whole cluster, there is no need for LFBs to know the CEID.
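The remark about coordinated instance numbers can be made concrete with a toy lookup. This is a sketch under assumptions: the Python `tml_connect` below is a hypothetical model that only resolves a node number, and the table of (type, instance, node) tuples is invented for illustration.

```python
def tml_connect(name_table, type_, instance, ceid=None):
    """Return the node hosting (type_, instance).

    If `ceid` is given, only that node is considered. Without it, the
    lookup succeeds only when the instance number is unique cluster-wide.
    """
    matches = [node for (t, i, node) in name_table
               if t == type_ and i == instance
               and (ceid is None or node == ceid)]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} matches for ({type_},{instance})")
    return matches[0]


# Instance numbers are unique over the cluster, as on the slide.
name_table = [("RSVP", 77, 8), ("meter", 44, 17)]

print(tml_connect(name_table, "RSVP", 77, ceid=8))  # 8
print(tml_connect(name_table, "RSVP", 77))          # 8
```

Both calls resolve to node 8. If a second node published (RSVP,77) as well, the call without `ceid` would become ambiguous in this model, which is why the coordination matters.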

Page 14: Functional Addressing: Multicast

Based on Function Address Sequences.

Any partition overlapping with the range used in the destination address will receive a copy of the message.

Client defines “multicast group” per call.

[Diagram: a client process and two server processes.]

Server Process, Partition A: bind(type = foo, lower=0, upper=99)

Server Process, Partition B: bind(type = foo, lower=100, upper=199)

Client Process: sendto(type = foo, lower = 33, upper = 133)

Message sent to both partitions: foo,33,133
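The overlap rule for multicast can be written as a standard interval-overlap test. The sketch below is illustrative only: `multicast_targets` and the tuple layout are hypothetical, and partition C is an extra binding added here to show a non-matching case.

```python
def multicast_targets(bindings, type_, lower, upper):
    """Return owners whose bound range overlaps the destination range."""
    return [owner for (t, lo, hi, owner) in bindings
            if t == type_ and lo <= upper and lower <= hi]


bindings = [
    ("foo", 0, 99, "partition A"),
    ("foo", 100, 199, "partition B"),
    ("foo", 200, 299, "partition C"),  # hypothetical, not on the slide
]

# sendto(type = foo, lower = 33, upper = 133) from the slide
print(multicast_targets(bindings, "foo", 33, 133))
```

With the destination range 33 to 133, partitions A and B each receive a copy, while the hypothetical partition C does not, since 200 to 299 lies outside the range.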

Page 15: Address Mapping - Multicast

[Diagram: FE with FE Object and two Meter LFBs (instances 13 and 44); CE with CE Object and FB X (RSVP, instance 77); both sit on TIPC.]

Sender, TML API: tml_mcast(meter_mc,group=X)

Sender, TIPC API: sendto(meter_mc,X,X)

Each Meter, TML API: tml_join(meter_mc,X)

Each Meter, TIPC API: bind(meter_mc,X,X)

Page 16: Questions???

Page 17: Why TIPC in ForCES?

Congestion control at three levels: connection level, signalling link level and media level. Based on 4 importance priorities.

Simple to configure: each node needs to know its own identity, that is all. Automatic neighbour detection using multicast/broadcast.

Lightweight, reactive connections: immediate connection abortion at node/process failure or overload.

Topology subscription service: functional and physical topology.

Page 18: Functional View

[Diagram: node-internal functional blocks, listed here from the bearers up to the user APIs.]

Bearers: Infiniband, Mirrored Memory, Ethernet, SCTP, UDP

Bearer Adapter API

Sequence/Retransmission Control

Packet Bundling

Congestion Control

Fragmentation/De-fragmentation

Reliable Multicast

Neighbour Detection

Link Establish/Supervision/Failover

Address Table Distribution

Connection Supervision

Route/Link Selection

Address Subscription

Address Resolution

User Adapter API

Socket API Adapter, Port API Adapter, Other API Adapters

Page 19: Network Topology

[Diagram: Zone <1> containing Cluster <1.1> and Cluster <1.2>; Zone <2> containing Cluster <2.1>; example Node <1.2.3>; Slave Node <2.1.3333>; Internet/Intranet.]


Page 22: Location Transparency

Location of server not known by client.

Lookup of physical destination performed on-the-fly.

Efficient, no secondary messaging involved.

[Diagram: Node <1.1.1> with a client process and two server processes.]

Server Process, Partition A: bind(type = foo, lower=0, upper=99)

Server Process, Partition B: bind(type = foo, lower=100, upper=199)

Client Process: sendto(type = foo, lower = 33, upper = 133)

Message sent: foo,33,133

Page 23: Location Transparency

Location of server not known by client.

Lookup of physical destination performed on-the-fly.

Efficient, no secondary messaging involved.

[Diagram: Node <1.1.1> and Node <1.1.2>, with a client process and two server processes.]

Server Process, Partition A: bind(type = foo, lower=0, upper=99)

Server Process, Partition B: bind(type = foo, lower=100, upper=199)

Client Process: sendto(type = foo, lower = 33, upper = 133)

Message sent: foo,33,133

Page 24: Location Transparency

Location of server not known by client.

Lookup of physical destination performed on-the-fly.

Efficient, no secondary messaging involved.

[Diagram: Node <1.1.1>, Node <1.1.2> and Node <1.1.3>, with a client process and two server processes.]

Server Process, Partition A: bind(type = foo, lower=0, upper=99)

Server Process, Partition B, shown at Node <1.1.2>: bind(type = foo, lower=100, upper=199)

Client Process: sendto(type = foo, lower = 33, upper = 133)

Message sent: foo,33,133

Page 25: Address Binding

Many sockets may bind to same partition. Closest-First or Round-Robin algorithm chosen by client.

[Diagram: a client process and two server processes bound to the same partition.]

Server Process, Partition A: bind(type = foo, lower=0, upper=99)

Server Process, Partition A’: bind(type = foo, lower=0, upper=99)

Client Process: sendto(type = foo, lower = 33, upper = 133)

Message sent: foo,33,133
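When several sockets are bound to the same partition, some rule must pick one of them. The slide names two rules but does not define them, so the sketch below rests on assumptions: "closest-first" is taken to mean "prefer a socket on the caller's own node", the node names are invented, and both function names are hypothetical.

```python
import itertools


def closest_first(candidates, local_node):
    """Prefer a socket on the caller's own node, else the first candidate."""
    for node, sock in candidates:
        if node == local_node:
            return sock
    return candidates[0][1]


def round_robin(candidates):
    """Return an endless iterator that alternates between the sockets."""
    return itertools.cycle(sock for _, sock in candidates)


# Two sockets bound to the same partition (foo, 0..99), on assumed nodes.
candidates = [("1.1.1", "A"), ("1.1.2", "A'")]

print(closest_first(candidates, "1.1.2"))
rr = round_robin(candidates)
print([next(rr) for _ in range(4)])
```

In this model a client on node 1.1.2 reaches A’ under closest-first, while round-robin alternates between A and A’ on successive sends.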

Page 26: Address Binding

Many sockets may bind to same partition. Closest-First or Round-Robin algorithm chosen by client.

Same socket may bind to many partitions.

[Diagram: a client process and two server processes, one of which is bound to two partitions.]

Server Process, Partition B: bind(type = foo, lower=100, upper=199)

Server Process, Partition A+B’: bind(type = foo, lower=0, upper=99) and bind(type = foo, lower=100, upper=199)

Client Process: sendto(type = foo, lower = 33, upper = 133)

Message sent: foo,33,133

Page 27: Address Binding

Many sockets may bind to same partition. Closest-First or Round-Robin algorithm chosen by client.

Same socket may bind to many partitions.

Same socket may bind to different functions.

[Diagram: a client process and two server processes, one of which is bound to two function types.]

Server Process, Partition B: bind(type = foo, lower=100, upper=199)

Server Process, Partition A: bind(type = foo, lower=0, upper=99) and bind(type = bar, lower=0, upper=999)

Client Process: sendto(type = foo, lower = 33, upper = 133)

Message sent: foo,33,133

Page 28: Functional Topology Subscription

Function Address/Address Partition bind/unbind events.

[Diagram: a client process subscribing to events from two server processes.]

Server Process, Partition A: bind(type = foo, lower=0, upper=99)

Server Process, Partition B: bind(type = foo, lower=100, upper=199)

Client Process: subscribe(type = foo, lower = 0, upper = 500)

Events delivered to the client: foo,0,99 and foo,100,199
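The subscription mechanism can be modelled as callbacks fired on bind and unbind. This is a sketch with invented names (`NameTable`, the event tuples); it is not the TIPC API, and for simplicity it only reports changes that occur after the subscription was made.

```python
class NameTable:
    """Toy name table that notifies subscribers of bind/unbind events."""

    def __init__(self):
        self.bindings = []
        self.subscriptions = []

    def subscribe(self, type_, lower, upper, callback):
        self.subscriptions.append((type_, lower, upper, callback))

    def _notify(self, event, type_, lower, upper):
        for (t, lo, hi, callback) in self.subscriptions:
            # Fire only when the bound range overlaps the subscribed range.
            if t == type_ and lo <= upper and lower <= hi:
                callback(event, type_, lower, upper)

    def bind(self, type_, lower, upper):
        self.bindings.append((type_, lower, upper))
        self._notify("bind", type_, lower, upper)

    def unbind(self, type_, lower, upper):
        self.bindings.remove((type_, lower, upper))
        self._notify("unbind", type_, lower, upper)


events = []
table = NameTable()
table.subscribe("foo", 0, 500, lambda *e: events.append(e))
table.bind("foo", 0, 99)       # Partition A
table.bind("foo", 100, 199)    # Partition B
table.bind("bar", 0, 999)      # different type, no event
table.unbind("foo", 0, 99)
print(events)
```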

Page 29: Network Topology Subscription

Node/Cluster/Zone availability events.

Same mechanism as for function events.

[Diagram: Node <1.1.1>, Node <1.1.2> and Node <1.1.3>, with TIPC shown on the remote nodes and a client process subscribing to node events.]

TIPC: bind(type = node, lower=0x1001002, upper=0x1001002)

TIPC: bind(type = node, lower=0x1001003, upper=0x1001003)

Client Process: subscribe(type = node, lower = 0x1001000, upper = 0x1001009)

Events delivered to the client: node,0x1001002 and node,0x1001003
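The hexadecimal instance numbers in the node subscription are consistent with packing a <zone.cluster.node> address into 32 bits as 8, 12 and 12 bits. That layout is an assumption inferred from the values on the slide, and the helper names below are hypothetical.

```python
def tipc_addr(zone, cluster, node):
    """Pack <zone.cluster.node> into 32 bits, assuming an 8/12/12 split."""
    return (zone << 24) | (cluster << 12) | node


def split_addr(addr):
    """Unpack a 32-bit address into (zone, cluster, node)."""
    return (addr >> 24) & 0xFF, (addr >> 12) & 0xFFF, addr & 0xFFF


print(hex(tipc_addr(1, 1, 3)))   # 0x1001003
print(split_addr(0x1001002))     # (1, 1, 2)
```

Under this assumption, the subscribed range 0x1001000 to 0x1001009 covers node numbers 0 through 9 in cluster <1.1>.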

Page 30: ForCES Applied on TIPC

[Diagram: Network Equipment with Control Element and Forwarding Element blocks connected by ForCES Protocol/TIPC. Control side: OSPF, RIP; COPS, CLI, SNMP; Other Applications. Forwarding side: LFB <IPv4F,5>, LFB <CNT,17>, LFB <IPv4F,1>, LFB <CNT,32>.]

Page 31: ForCES Applied on TIPC

[Diagram: Network Equipment with several Control Elements and several Forwarding Elements connected by ForCES Protocol/TIPC, with Internet attached on two sides. Control side: OSPF, RIP; COPS, CLI, SNMP; Other Applications. Forwarding side: LFB <IPv4F,5>, LFB <CNT,17>, LFB <IPv4F,1>, LFB <CNT,32>.]

Page 32: CONNECTIONS

Establishment based on functional addressing: selectable lookup algorithm, partitioning, redundancy etc.

No protocol messages exchanged during setup/shutdown: only payload carrying messages.

Traditional TCP-style connection setup/shutdown as alternative.

End-to-end flow control.

SOCK_SEQPACKET

SOCK_STREAM

SOCK_RDM for connectionless and multicast

SOCK_DGRAM can easily be added if needed. Same with “Unreliable SOCK_SEQPACKET”.

Page 33: CONNECTIONS

No protocol messages exchanged during setup/shutdown: only payload carrying messages.

[Diagram: Client Process and Server Process, Partition B.]

Client Process: sendto(type = foo, instance = 117)

Message sent: foo,117

Page 34: CONNECTIONS

No protocol messages exchanged during setup/shutdown: only payload carrying messages.

[Diagram: Client Process and Server Process, Partition B. Calls shown: connect(client), send()]

Page 35: CONNECTIONS

No protocol messages exchanged during setup/shutdown: only payload carrying messages.

[Diagram: Client Process and Server Process, Partition B. Call shown: connect(server)]

Page 36: CONNECTIONS

Immediate “abortion” event in case of peer process crash.

[Diagram: Client Process and Server Process, Partition B, with an abort event.]

Page 37: CONNECTIONS

Immediate “abortion” event in case of peer node crash.

[Diagram: Client Process and Server Process, Partition B, on Node <1.1.5> and Node <1.1.3>, with an abort event.]

Page 38: CONNECTIONS

Immediate “abortion” event in case of communication failure.

[Diagram: Client Process and Server Process, Partition B, on Node <1.1.5> and Node <1.1.3>, with an abort event.]

Page 39: CONNECTIONS

Immediate “abortion” event in case of node overload.

[Diagram: Client Process and Server Process, Partition B, on Node <1.1.5> and Node <1.1.3>, with an abort event.]

Page 40: Network Redundancy

Retransmission protocol and congestion control at signalling link level.

Normally two links per node pair, for full load sharing and redundancy.

[Diagram: Client Process and Server Process, Partition B, on Node <1.1.5> and Node <1.1.3>.]

Page 41: Network Redundancy

Retransmission protocol and congestion control at signalling link level.

Normally two links per node pair, for full load sharing and redundancy.

Smooth failover in case of single link failure, with no consequences for user level connections.

[Diagram: Client Process and Server Process, Partition B, on Node <1.1.5> and Node <1.1.3>.]

Page 42: Remaining Work

Implementation: Reliable Multicast not fully implemented yet (exp. end of Q1). Re-stabilization after most recent changes. Re-implementation of multi-cluster neighbour detection and link setup.

Protocol: Fully manual inter cluster link setup. Guaranteeing Name Table consistency between clusters. Slave node Name Table reduction ?????

Page 43:

http://tipc.sourceforge.net

Page 44: QUESTIONS ??