
Transport Level Protocol Performance Evaluation for Bulk Data Transfers
Matei Ripeanu

The University of Chicago
http://www.cs.uchicago.edu/~matei/

Abstract: Before developing new protocols targeted at bulk data transfers, the achievable performance and limitations of the broadly used TCP protocol should be carefully investigated. Our first goal is to explore TCP's bulk-transfer throughput as a function of network path properties, number of concurrent flows, loss rates, competing traffic, etc. We use analytical models, simulations, and real-world experiments. The second objective is to repeat this evaluation for some of TCP's replacement candidates (e.g., NETBLT). This should allow an informed decision on whether to invest effort in developing and/or deploying new protocols specialized in bulk transfers.

Application requirements (GriPhyN):
• Efficient management of 10s to 100s of petabytes (PB) of data; many PBs of new raw data per year.
• Granularity: file sizes of 10 MB to 1 GB.
• Large pipes: OC3 and up, high latencies.
• Efficient bulk data transfers.
• Graceful sharing with other applications.
• Projects: CMS, ATLAS, LIGO, SDSS.

Stable-state throughput as % of bottleneck link rate (RTT=80 ms, MSS=1460 bytes)
[Figure: stable-state throughput (% of bottleneck link rate, 0-100%) vs. loss indication rate (log scale, 1e-2 down to 1e-8), for a T3 link (43.2 Mbps), an OC3 link (155 Mbps), and an OC12 link (622 Mbps).]

(Rough) analytical stable-state throughput estimates (based on [Math96]):

$$
\text{Throughput} \;\approx\;
\begin{cases}
\dfrac{MSS}{RTT}\cdot\dfrac{C}{\sqrt{p}}, & \text{for } p \ge \dfrac{8}{3\,W_{max}^{2}}\\[8pt]
\dfrac{MSS}{RTT}\cdot\dfrac{W_{max}}{1+\tfrac{1}{8}\,p\,W_{max}^{2}}, & \text{for } p < \dfrac{8}{3\,W_{max}^{2}}
\end{cases}
$$

where p is the loss indication rate, W_max the maximum window size in segments, and C ≈ √(3/2).
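To make the estimate concrete, here is a minimal sketch (not part of the original poster) that evaluates the formula above and expresses the result as a percentage of the bottleneck link rate for the three link speeds in the figure. Treating W_max as the pipe size (bandwidth x RTT / MSS) is an assumption of this sketch, not something the poster states.

```python
# Minimal sketch: evaluate the [Math96]-style stable-state throughput estimate
# above for the figure's scenario (RTT = 80 ms, MSS = 1460 bytes).
# Assumption: W_max is taken to be the pipe size (bandwidth * RTT / MSS).
import math

MSS_BITS = 1460 * 8
RTT = 0.080                      # 80 ms
C = math.sqrt(3.0 / 2.0)         # constant of the periodic-loss model

LINKS = {"T3 (43.2 Mbps)": 43.2e6,
         "OC3 (155 Mbps)": 155e6,
         "OC12 (622 Mbps)": 622e6}

def throughput(p, link_rate):
    """Estimated stable-state throughput in bits/s for loss indication rate p."""
    w_max = link_rate * RTT / MSS_BITS           # pipe size in segments (assumed cap)
    if p >= 8.0 / (3.0 * w_max ** 2):            # loss-limited regime
        est = (MSS_BITS / RTT) * C / math.sqrt(p)
    else:                                        # window-limited regime
        est = (MSS_BITS / RTT) * w_max / (1.0 + p * w_max ** 2 / 8.0)
    return min(est, link_rate)                   # cannot exceed the bottleneck

for name, rate in LINKS.items():
    for p in (1e-2, 1e-4, 1e-6, 1e-8):
        print(f"{name:16s} p={p:.0e}: ~{100 * throughput(p, rate) / rate:5.1f}% of link rate")
```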

Main inefficiencies TCP is blamed for:
• Overhead. However, less than 15% of the time is spent in TCP protocol processing proper.
• Flow control. Claim: a rate-based protocol would be faster. However, there is no proof that this is better than (self-)ACK-clocking.
• Congestion control:
  • Underlying problem: the underlying layers do not give explicit congestion feedback, so TCP assumes any packet loss is a congestion signal.
  • Not scalable.

Questions: Is TCP appropriate/usable? What about rate-based protocols?

Want to optimize:
• Link utilization
• Per-file transfer delay
while maintaining “fair” sharing.

TCP Refresher: [Figure: congestion window vs. time, showing Slow Start (exponential growth), Congestion Avoidance (linear growth), Fast retransmit, packet loss discovered through the fast recovery mechanism, and packet loss discovered through timeout.]
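As a companion to the refresher, a minimal sketch (illustrative only, not from the poster) of the idealized congestion-window evolution: exponential growth in slow start, linear growth in congestion avoidance, and a halved window when a loss is detected via fast retransmit/fast recovery.

```python
# Minimal sketch: idealized TCP congestion-window evolution, in segments per RTT.
# A loss detected by fast retransmit/fast recovery halves the window; a timeout
# (not modeled here) would instead reset the window to one segment.
def evolve_cwnd(loss_rtts, rtts=50, ssthresh=32.0):
    cwnd, trace = 1.0, []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_rtts:                 # loss signalled this RTT
            ssthresh = max(cwnd / 2.0, 2.0)
            cwnd = ssthresh                # fast recovery: resume at halved window
        elif cwnd < ssthresh:
            cwnd *= 2.0                    # slow start: exponential growth
        else:
            cwnd += 1.0                    # congestion avoidance: linear growth
    return trace

print(evolve_cwnd(loss_rtts={12, 30}))
```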

Simulations (using NS []); the simulation topology and result plots are shown below.

Significant throughput improvements can be achieved just by tuning the end systems and the network path: set up proper window sizes, disable delayed ACKs, use SACK and ECN, use jumbo frames, etc.
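As one concrete illustration of the window-size tuning above, a minimal sketch (not from the poster) that sizes TCP socket buffers to the bandwidth-delay product; the 622 Mbps / 80 ms values are example numbers only, and the OS may clamp the request to its configured maximums.

```python
# Minimal sketch: request TCP socket buffers sized to the bandwidth-delay product
# so a single connection can keep a long fat pipe full.
import socket

LINK_RATE_BPS = 622e6     # example: OC12 bottleneck
RTT_SEC = 0.080           # example: 80 ms round-trip time

bdp_bytes = int(LINK_RATE_BPS * RTT_SEC / 8)    # bandwidth-delay product in bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The kernel may silently cap these (e.g. net.core.wmem_max / rmem_max on Linux).
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)

print("BDP:", bdp_bytes, "bytes;",
      "granted send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
sock.close()
```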

For high link loss rates, striping is a legitimate and effective solution.
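To illustrate what striping means operationally, here is a self-contained sketch (loopback only, not the poster's tool and not GridFTP) that splits a buffer into N byte-range stripes and moves each stripe over its own TCP connection before reassembling.

```python
# Minimal sketch: stripe a transfer across N parallel TCP connections, each
# carrying one byte range, then reassemble. Runs entirely over loopback.
import socket
import struct
import threading
from concurrent.futures import ThreadPoolExecutor

DATA = bytes(range(256)) * 4096           # ~1 MB of test data
HOST, N_STRIPES = "127.0.0.1", 5

def serve(listener):
    # Each incoming connection sends an (offset, length) header; reply with that range.
    while True:
        conn, _ = listener.accept()
        with conn:
            hdr = b""
            while len(hdr) < 16:
                hdr += conn.recv(16 - len(hdr))
            offset, length = struct.unpack("!QQ", hdr)
            conn.sendall(DATA[offset:offset + length])

def fetch(port, offset, length):
    with socket.create_connection((HOST, port)) as s:
        s.sendall(struct.pack("!QQ", offset, length))
        chunks = []
        while sum(map(len, chunks)) < length:
            chunks.append(s.recv(65536))
        return offset, b"".join(chunks)

listener = socket.create_server((HOST, 0))
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener,), daemon=True).start()

stripe = (len(DATA) + N_STRIPES - 1) // N_STRIPES
ranges = [(i * stripe, min(stripe, len(DATA) - i * stripe)) for i in range(N_STRIPES)]
with ThreadPoolExecutor(N_STRIPES) as pool:
    parts = dict(pool.map(lambda r: fetch(port, *r), ranges))
reassembled = b"".join(parts[off] for off, _ in ranges)
print("reassembled correctly:", reassembled == DATA)
```

A real striped-transfer tool adds error handling, restart support, and per-flow buffer tuning, but the data-movement pattern is the same.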

[Figure: 100 MB transfer time (sec, log scale) vs. link loss rate (log scale, 1e-2 down to 1e-8). Series: MSS=1460, DelAck, huge WS; MSS=1460, DelAck, WS ok; MSS=1460, WS ok; MSS=9000; MSS=9000, FACK; MSS=9000, FACK, 5 flows; Ideal.]

[Figure: 1 GB transfer time (sec, log scale) vs. link loss rate (log scale, 1e-2 down to 1e-8). Series: MSS=1460, DelAck; MSS=1460, WS ok; MSS=9000; FACK; 5 flows; 25 flows; Ideal.]

[Plot configurations: OC3 link, 80 ms RTT, MSS=1460 initially; OC12 link, 100 ms RTT, MSS=1460 initially. Simulation topology: 1 Gbps, 1 ms RTT access links; bottleneck of OC3 (35 ms) or OC12 (45 ms).]

[Figure: packets dropped (left scale, 0-6000) and transfer-time standard deviation (right scale, 0-12) vs. number of parallel flows used (0-1000).]

[Figure: transfer time (sec, 0-50) vs. number of parallel flows (stripes) used (0-1000); legend entries from 10 flows to 1000 flows.]

[Figure: dropped packets (left scale, 0-4500) and transfer-time standard deviation (right scale, 0-12) vs. number of parallel flows used (0-1000).]

TCP striping issues: widespread usage exposes scaling problems in the TCP congestion control mechanism:
• Unfair allocation: a small number of flows grabs almost all the available bandwidth.
• Reduced efficiency: a large number of packets are dropped.
• Rule of thumb: keep fewer flows in the system than the 'pipe size' expressed in packets (see the sketch after the figure caption below).
Not 'TCP unfriendly' as long as link loss rates are high. Even high link loss rates do not break unfairness.

[Figure caption: 0.5 GB striped transfer, OC3 link (155 Mbps), RTT=80 ms, MSS=9000, using up to 1000 flows. Panels: loss rate = 0.1% and loss rate = 0.]
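As a worked example of the rule of thumb, a minimal sketch (using the scenario parameters from the caption above) of the pipe-size computation:

```python
# Minimal sketch: the 'pipe size' in packets for the striped-transfer scenario
# above, i.e. the rule-of-thumb upper bound on the number of parallel flows.
LINK_RATE_BPS = 155e6   # OC3
RTT_SEC = 0.080         # 80 ms
MSS_BYTES = 9000        # jumbo frames

pipe_bytes = LINK_RATE_BPS * RTT_SEC / 8        # bandwidth-delay product
pipe_packets = pipe_bytes / MSS_BYTES           # roughly 172 packets in flight
print(f"pipe size ~ {pipe_packets:.0f} packets; "
      f"keep fewer than ~{pipe_packets:.0f} parallel flows")
```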

Conclusions: TCP can work well with careful end-host and network tuning. For fair sharing with other users, mechanisms are needed that provide congestion feedback and distinguish genuine link losses from congestion indications. In addition, admission mechanisms based on the number of parallel flows might be beneficial.

GridFTP and iperf Performance (between LBNL and ANL via ES-Net)
[Figure: bandwidth (Mbps, 0-350) vs. number of TCP streams (0-35), for GridFTP and iperf; OC12, ANL to LBNL (56 ms), Linux boxes.]

Striping: widely used (browsers, FTP, etc.); good practical results; not 'TCP friendly'!
• RFC 2140 / Ensemble TCP: share information and congestion management among parallel flows.

(Courtesy of MCS/ANL.)

Future work:
• What are optimal buffer sizes for bulk transfers?
• Can we use ECN and large buffers to reliably detect congestion without using dropped packets as a congestion indicator?

• Assuming the link loss rate pattern is known, can it be used to reliably detect congestion and improve throughput?