Reduced Communication Protocol for Clusters Clunix Inc. Donghyun Kim 2000.9.



Introduction

Communication subsystem performance is determined by the following:

• Transmission speed of the physical network
• I/O handling capability
• Overheads of the communication protocol

Communication using traditional protocols is the bottleneck of parallel systems

• Myrinet with TCP/IP is not fast
• Small-granularity or communication-intensive applications show poor performance


Introduction – cont’d

A large proportion of applications do not need complicated communication functions

• This is supported by both practice and theoretical analysis


Overhead analysis of traditional protocols

Overheads of traditional protocols
• Time of context switching
• Time of data copying
  – Between user space and system space, and between adjacent protocol layers
• Time of data partitioning, reconstruction, and analysis
• Time of transmitting packet headers
• Time of routing, connection maintenance, traffic control, error detection, recovery, and buffer management


Overhead analysis of traditional protocols – cont’d

Modeling end-to-end latency L and bandwidth W
• Assumptions: homogeneous nodes, low network traffic

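$$L = T(0) \text{ or } T(1), \qquad W = \frac{n_{\max}}{T(n_{\max})} \tag{1}$$

$$T(n) = T_0(n) + 2\left(\tau + \sum_{i=1}^{m} T_i(n)\right) \tag{2}$$

$T(n)$: transmission time for $n$ bytes
$n_{\max}$: maximum packet length of the communication subsystem
$m$: number of protocol layers
$T_i(n)$: processing time of the $i$-th protocol layer ($T_0(n)$: physical network transmission time)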


Overhead analysis of traditional protocols – cont’d

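$$n_i = n_{i+1} + \left(\left\lfloor \frac{n_{i+1}}{\pi_i} \right\rfloor + 1\right)\rho_i, \qquad 1 \le i \le m \tag{3}$$

$$T_i(n) = \tau + \frac{n_{i+1}}{\omega} + \left\lfloor \frac{n_{i+1}}{\pi_i} \right\rfloor \varepsilon_i(\pi_i + \rho_i) + \varepsilon_i(n_{i+1} \bmod \pi_i) \tag{4}$$

$$T_0(n) = \frac{n}{\omega_0} \tag{5}$$

$\tau$: context switching time
$\omega$: memory bandwidth
$\omega_0$: physical network transmission bandwidth
$\pi_i$: maximum packet length of the $i$-th layer
$\rho_i$: packet header length of the $i$-th layer
$n_i$: data length of the $i$-th layer
$\varepsilon_i$: calling expense of the $i$-th layer (routing, traffic control, error detection, buffer management, connection maintenance)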
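The model can be evaluated numerically to see how each overhead term shapes L and W. Below is a minimal Python sketch, not the author's tool. It follows the structure on these slides: total time is the network time plus, on both the sending and the receiving side, one context switch and the processing of every layer; each layer pays a context switch, a memory copy of its input, and a per-packet calling expense, and adds one header per packet. Our assumptions: layers are listed bottom-up, the application hands $n$ bytes to the top layer ($n_{m+1} = n$), each layer sends $\lfloor n_{i+1}/\pi_i \rfloor + 1$ packets, and the calling expense is a constant per packet. The names `Layer`, `transfer_time`, `latency` and `bandwidth` are hypothetical. Units are bytes, seconds and bytes per second.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Parameters of one protocol layer i (hypothetical names)."""
    max_packet: int   # pi_i: maximum data bytes per packet at this layer
    header: int       # rho_i: header bytes added to each packet
    call_cost: float  # epsilon_i: ASSUMED constant calling expense per packet (s)


def transfer_time(n: int, layers: List[Layer], tau: float,
                  omega: float, omega0: float) -> float:
    """T(n) = T_0(n) + 2 * (tau + sum of T_i(n) over layers 1..m).

    layers[0] is layer 1 (just above the network), layers[-1] is layer m.
    ASSUMPTION: the application hands n bytes to layer m (n_{m+1} = n).
    """
    n_above = n       # n_{i+1}: bytes handed down to the current layer i
    layer_sum = 0.0
    for layer in reversed(layers):                  # layer m down to layer 1
        packets = n_above // layer.max_packet + 1   # floor(n_{i+1} / pi_i) + 1
        # T_i: one context switch, one memory copy of the input, per-packet expense
        layer_sum += tau + n_above / omega + packets * layer.call_cost
        n_above += packets * layer.header           # n_i = n_{i+1} + packets * rho_i
    t0 = n / omega0                                 # T_0(n) = n / omega_0
    return t0 + 2 * (tau + layer_sum)


def latency(layers: List[Layer], tau: float, omega: float, omega0: float) -> float:
    """L = T(0): time to deliver an empty message."""
    return transfer_time(0, layers, tau, omega, omega0)


def bandwidth(n_max: int, layers: List[Layer], tau: float,
              omega: float, omega0: float) -> float:
    """W = n_max / T(n_max), in bytes per second."""
    return n_max / transfer_time(n_max, layers, tau, omega, omega0)


if __name__ == "__main__":
    # Made-up illustrative parameters, not measurements: a 10 Mbit/s network
    # (1.25e6 B/s), 100 MB/s memory copies, and a two-layer stack.
    stack = [Layer(max_packet=1480, header=20, call_cost=50e-6),
             Layer(max_packet=1460, header=20, call_cost=50e-6)]
    print(f"L = {latency(stack, 20e-6, 100e6, 1.25e6) * 1e6:.1f} us")
    print(f"W = {bandwidth(1460, stack, 20e-6, 100e6, 1.25e6) * 8 / 1e6:.2f} Mbit/s")
```

Setting `tau` and every `call_cost` to zero and `omega` to infinity recovers the bare network, where W equals the physical bandwidth; each overhead then shows up as a measurable drop in W or rise in L.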


Overhead analysis of traditional protocols – cont’d

Analytical & testing results

Testing conclusions
• Very large overhead in the protocol layers above IP
• Memory-to-memory copying cannot be neglected

If the transmission bandwidth is close to the memory bandwidth, the data-copying cost ($n_{i+1}/\omega$) becomes an even bigger problem
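To make the point concrete, compare the memory-copy term with the wire time. Assume, for illustration, that a message of $n$ bytes is copied $k$ times end to end, that each copy costs $n/\omega$, and that transmission costs $n/\omega_0$:

$$\frac{\text{copy time}}{\text{transmission time}} = \frac{k\,n/\omega}{n/\omega_0} = k\,\frac{\omega_0}{\omega}$$

When $\omega \approx \omega_0$ the ratio is about $k$: with just two copies (say one on each side), copying already takes about twice as long as putting the data on the wire.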

| Protocol layer | Analytical L (µs) | Analytical W (Mbps) | Testing L (µs) | Testing W (Mbps) |
|---|---|---|---|---|
| TCP | 1350 | 8.5 | 1450 | 8.6 |
| UDP | 1110 | 9.5 | 1150 | 9.5 |
| DLPI | 450 | 10.0 | 650 | 10.0 |
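A quick worked check on the testing columns shows how much latency the layers above the data link interface add. This small Python sketch, assuming L is measured in microseconds, subtracts the DLPI latency from the TCP and UDP latencies; `TESTED` and `overhead_above` are hypothetical names, and the numbers are copied from the table.

```python
# Testing columns of the table: protocol -> (latency L in microseconds, bandwidth W in Mbit/s).
# ASSUMPTION: L is in microseconds.
TESTED = {
    "TCP": (1450, 8.6),
    "UDP": (1150, 9.5),
    "DLPI": (650, 10.0),
}


def overhead_above(protocol: str, base: str = "DLPI"):
    """Return (extra latency in us, share of total latency) of `protocol` over `base`."""
    latency, _ = TESTED[protocol]
    base_latency, _ = TESTED[base]
    extra = latency - base_latency
    return extra, extra / latency


if __name__ == "__main__":
    for name in ("TCP", "UDP"):
        extra, share = overhead_above(name)
        print(f"{name}: +{extra} us over DLPI ({share:.0%} of its latency)")
```

TCP adds 800 µs over DLPI (about 55% of its measured latency) and UDP adds 500 µs (about 43%), which is the "very large overhead above IP" stated in the testing conclusions.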


Design Strategies for RCP

• Support reliable, synchronous, and asynchronous communication

• Implement reliable broadcast and multicast directly on top of the physical layer

• Place the protocol below the IP layer
  – Above the physical or data link layer

• Avoid data copying as far as possible
• Where possible, avoid buffer management by using hardware buffering
• Run the protocol entirely in user space
  – In the form of libraries
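One way a user-space library can avoid copying is to partition a message into packets by reference rather than by copy. In this minimal Python sketch, the hypothetical helper `partition` returns `memoryview` slices, which share the caller's buffer, so no payload bytes are duplicated. Headers could then live in small separate buffers and be combined with the payload by a gather-style send (for example POSIX `writev`/`sendmsg`), which is not shown. This illustrates the idea rather than RCP's actual packaging algorithm.

```python
from typing import List


def partition(message: bytearray, max_payload: int) -> List[memoryview]:
    """Split `message` into packet-sized payloads without copying any bytes.

    Each returned memoryview is a window onto the caller's buffer, so it can
    be passed to a gather-style send alongside a separate header buffer.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    view = memoryview(message)
    return [view[off:off + max_payload] for off in range(0, len(view), max_payload)]


if __name__ == "__main__":
    msg = bytearray(b"abcdefghij")
    parts = partition(msg, 4)
    print([bytes(p) for p in parts])   # three payloads: 4 + 4 + 2 bytes
    msg[0] = ord("z")                  # change the original buffer...
    print(bytes(parts[0]))             # ...and the first payload sees it: no copy was made
```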


Implementation of RCP

OSI-DLPI version
• Standard, physical-device-independent data link layer interface
  – The same program can be written for different machines and network devices

Myrinet version

Provides a user interface similar to TCP sockets
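The split between a device-independent link interface and a socket-like front end can be sketched as follows. This is a minimal Python illustration under our own assumptions: `LinkDevice`, `LoopbackDevice` and `RcpSocket` are hypothetical names, the in-memory loopback stands in for a real DLPI or Myrinet backend, and the actual RCP interface may look quite different.

```python
from abc import ABC, abstractmethod
from collections import deque


class LinkDevice(ABC):
    """Hypothetical device-independent data link interface (the role DLPI plays)."""

    @abstractmethod
    def send_frame(self, frame: bytes) -> None: ...

    @abstractmethod
    def recv_frame(self) -> bytes: ...


class LoopbackDevice(LinkDevice):
    """In-memory stand-in for a NIC; a DLPI or Myrinet backend would replace it."""

    def __init__(self) -> None:
        self._frames = deque()

    def send_frame(self, frame: bytes) -> None:
        self._frames.append(bytes(frame))

    def recv_frame(self) -> bytes:
        return self._frames.popleft()   # raises IndexError if nothing has arrived


class RcpSocket:
    """Hypothetical socket-like front end: it never depends on the device type."""

    def __init__(self, device: LinkDevice) -> None:
        self._device = device

    def send(self, data: bytes) -> int:
        self._device.send_frame(data)
        return len(data)                # like socket.send, report bytes accepted

    def recv(self) -> bytes:
        return self._device.recv_frame()


if __name__ == "__main__":
    sock = RcpSocket(LoopbackDevice())
    sock.send(b"hello")
    print(sock.recv())
```

Because applications only see `send`/`recv`, moving from one network device to another means swapping the `LinkDevice` implementation, not rewriting the program.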


Implementation of RCP – cont’d

RCP supports unicast, broadcast, and multicast

RCP addressing
• A unique source/destination is identified by hostname + port number
• Static address configuration

Supports heterogeneous machines

No connection maintenance or error detection
• Assumes that the underlying network is reliable
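With static configuration and no connection state, sending reduces to mapping a delivery mode to a list of endpoints from a table fixed in advance. This minimal Python sketch assumes hypothetical host names, ports and group names (`STATIC_ENDPOINTS`, `GROUPS`, `resolve`); it only illustrates hostname + port addressing and is not RCP's actual configuration format.

```python
from typing import Dict, List, Optional, Tuple, Union

Address = Tuple[str, int]   # (hostname, port): unique source/destination of an endpoint

# Hypothetical static configuration, fixed before the application starts.
STATIC_ENDPOINTS: List[Address] = [("node1", 7000), ("node2", 7000), ("node3", 7001)]
GROUPS: Dict[str, List[Address]] = {"workers": [("node2", 7000), ("node3", 7001)]}


def resolve(mode: str, target: Union[Address, str, None] = None,
            sender: Optional[Address] = None) -> List[Address]:
    """Return the endpoints a message should reach for the given delivery mode."""
    if mode == "unicast":
        if target not in STATIC_ENDPOINTS:
            raise KeyError(f"unknown endpoint {target!r}")
        return [target]
    if mode == "multicast":
        return list(GROUPS[target])           # KeyError for an unknown group
    if mode == "broadcast":
        # every configured endpoint except the sender itself
        return [ep for ep in STATIC_ENDPOINTS if ep != sender]
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    print(resolve("broadcast", sender=("node1", 7000)))
```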


Implementation of RCP – cont’d

Sequencing control and traffic control
• Sliding-window algorithm + selective retransmission
• Window size is adjusted according to retransmission frequency
  – Fast-Adapt and Slow-Recover algorithm
• Very efficient traffic control

Data partitioning and packaging algorithm
• Almost no data copying; works in user space
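The two traffic-control ideas can be sketched in a few lines of Python. The window rule here is our own reading of "Fast-Adapt and Slow-Recover": halve the window as soon as a round needed retransmissions and grow it by one packet after each clean round. This is essentially additive-increase/multiplicative-decrease and may differ from RCP's real rule. `AdaptiveWindow` and `selective_retransmit` are hypothetical names.

```python
from typing import Iterable, List


class AdaptiveWindow:
    """Window-size controller: shrink fast on retransmissions, grow back slowly.

    ASSUMED rule: any retransmission in a round halves the window (fast adapt);
    a round without retransmissions adds one packet (slow recover).
    """

    def __init__(self, initial: int = 16, minimum: int = 1, maximum: int = 64) -> None:
        self.size = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_round(self, retransmitted: int) -> int:
        if retransmitted > 0:
            self.size = max(self.minimum, self.size // 2)   # back off quickly
        else:
            self.size = min(self.maximum, self.size + 1)    # recover slowly
        return self.size


def selective_retransmit(base: int, size: int, acked: Iterable[int]) -> List[int]:
    """Sequence numbers in the window [base, base + size) that still need resending.

    Only the missing packets are resent, unlike go-back-N, which resends
    everything from the first missing packet onward.
    """
    received = set(acked)
    return [seq for seq in range(base, base + size) if seq not in received]


if __name__ == "__main__":
    window = AdaptiveWindow()
    print(window.on_round(retransmitted=2), window.on_round(retransmitted=0))
    print(selective_retransmit(10, 5, [10, 11, 13]))
```

Halving on loss and growing by one keeps the window small while retransmissions are frequent and lets it creep back once the network is quiet.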


RCP Testing results

[Figures: RCP bandwidth (W) and latency (L)]


Conclusions and future issues

RCP design considerations
• How to reduce the overheads
  – Over-complicated protocol processing
  – Context switching
  – Overhead of data copying

• How to use the transmission control functions supported by hardware
  – To reduce protocol processing

Future work

• Guaranteeing the quality of communication