Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers
Tsung-i (Mark) Huang
Jaspal Subhlok
University of Houston
GAN’05 / May 10, 2005
TMH - GAN'05, 05/10/2005 2
Outline
• Background
• Problem Description
• Methodology
• Experiments and Results
• Conclusion and Future Works
“Are we there yet?”
When do you need throughput prediction?
• File download: "xx minutes left" (MS IE vs. Mozilla)
• Mirror site selection: Knoppix from Florida State Univ. (fsu.edu) or TU Ilmenau, Germany (tu-ilmenau.de)
• Resource selection in a grid environment
• Cache selection for web content delivery services
Which site will give the best throughput?
Current approaches and tools:
• Geographical distance
• Ping (ICMP)
• Download 512 KBytes (fixed size): NWS / iperf
• Download for 10 seconds (fixed duration): iperf
The last two approaches are the most accurate, but:
• How much data to download, or for how long?
• Is “Bandwidth * Delay” the answer? Does one size fit all?
• “All or nothing”: no result is available until the end of transmission
Problem Description
• Predicted future throughput can be used in mirror/replica site selection
• Predict throughput of a TCP bulk transfer
  • Single TCP stream
  • Input: time series of (arrival time, bytes received)
  • Output: predicted future throughput
  • Make a prediction of future throughput after 10 ~ 100 RTTs
  • Utilize knowledge of TCP flow patterns
  • Assume TCP flow patterns will repeat later in the same TCP stream
TCP Flow Patterns
• Textbook Examples:
• In Reality:
[Figure panels: (a) Rate Control, (b) Congestion Control, (c) Rate Control with delay, (d) Mixed Congestion Control]
Approach to Throughput Prediction
• Analyze the time series (TS1) of (arrival time, bytes received) to get a meaningful throughput time series. Possible solutions:
  • Instant throughput: throughput since the previous TCP segment
  • Fixed-interval throughput: average throughput over a fixed time period
  • Per-RTT throughput: partition using the fixed SYN-ACK RTT
• Idea: TCP sends a window full of data segments every RTT
• Partition the time series (TS1) with the fixed SYN-ACK RTT to get the per-RTT throughput (TS2)
• Analyze the per-RTT throughput time series (TS2) to predict future throughput
• Compare different prediction methods across all traces
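The TS1 to TS2 partitioning step can be sketched as follows. This is a minimal illustration, not the authors' Perl implementation: the function name `per_rtt_throughput`, the choice to start the first bin at the first arrival, and the bytes/sec unit are all assumptions made for the example.

```python
from typing import List, Tuple


def per_rtt_throughput(ts1: List[Tuple[float, int]], rtt: float) -> List[float]:
    """Partition (arrival_time_sec, bytes_received) samples into RTT-wide bins.

    Returns one throughput value (bytes/sec) per bin, i.e. a TS2 series.
    Assumption: bin 0 starts at the first arrival time in the trace.
    """
    if not ts1 or rtt <= 0:
        return []
    samples = sorted(ts1)  # make sure samples are in arrival order
    start = samples[0][0]
    n_bins = int((samples[-1][0] - start) / rtt) + 1
    bytes_per_bin = [0] * n_bins
    for arrival, nbytes in samples:
        bytes_per_bin[int((arrival - start) / rtt)] += nbytes
    # Bins with no arrivals stay at 0, which preserves gaps in the flow.
    return [b / rtt for b in bytes_per_bin]


ts2 = per_rtt_throughput([(0.0, 1000), (0.25, 1000), (0.5, 500), (1.25, 1500)], rtt=0.5)
```

Keeping empty bins as zero rather than dropping them means idle periods remain visible in the per-RTT series.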
TCP Segment Partitioning (1)
• Instant throughput shows a wide range of fluctuation (log-scaled axis; plot annotations: "Over 1 GBytes/sec", "About 220 Bytes/sec").
• Fixed-interval throughput shows less fluctuation (fixed interval of 100 ms; plot annotations: "121 KB/sec", "40 KB/sec").
• Per-RTT throughput (SYN-ACK RTT = 176 ms).
TCP Segment Partitioning (2)
• RTT estimation
  • Use the fixed SYN-ACK RTT
  • Simple and effective
• Partition TCP segments into a per-RTT throughput time series
[Figure labels: SYN, ACK, RTT]
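One way to obtain a fixed SYN-ACK RTT from a receiver-side capture is sketched below. It assumes the downloading host is the TCP client, so the RTT is the gap between its outgoing SYN and the incoming SYN-ACK. The packet tuple layout and the name `syn_ack_rtt` are hypothetical and chosen only for this example.

```python
from typing import Iterable, Optional, Tuple


def syn_ack_rtt(packets: Iterable[Tuple[float, str, str]]) -> Optional[float]:
    """Estimate the handshake RTT from (timestamp_sec, direction, flags) records.

    direction is "out" or "in" relative to the capturing host; flags is "S"
    for SYN and "SA" for SYN-ACK (a hypothetical encoding for this sketch).
    Assumption: the first outgoing SYN is used, so a retransmitted SYN would
    inflate the estimate.
    """
    syn_time = None
    for ts, direction, flags in packets:
        if direction == "out" and flags == "S" and syn_time is None:
            syn_time = ts
        elif direction == "in" and flags == "SA" and syn_time is not None:
            return ts - syn_time
    return None  # handshake not found in the capture


rtt = syn_ack_rtt([(10.0, "out", "S"), (10.25, "in", "SA"), (10.25, "out", "A")])
```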
Throughput Prediction (1)
• TCP patterns
  • Rate Control limited (RC)
  • Congestion Control limited (CC)
• Identify basic elements
  • Flat regions
  • Exponential Climb regions
  • Linear Climb regions
  • Drop points
[Figure labels: Drop points, Flat, Linear Climb, Exponential Climb]
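The slides do not say how the basic elements are detected. The sketch below is one crude threshold heuristic for labelling each step of a per-RTT throughput series. It relies on the general TCP behaviour that slow start roughly doubles the window per RTT while congestion avoidance grows it by about one segment per RTT. The thresholds `flat_tol`, `drop_frac` and `exp_ratio` are illustrative assumptions, not values from the talk.

```python
from typing import List


def classify_steps(ts2: List[float], flat_tol: float = 0.1,
                   drop_frac: float = 0.3, exp_ratio: float = 1.8) -> List[str]:
    """Label each transition ts2[i] -> ts2[i+1] with a basic element name."""
    labels = []
    for prev, cur in zip(ts2, ts2[1:]):
        if prev > 0:
            ratio = cur / prev
        else:
            ratio = float("inf") if cur > 0 else 1.0
        if ratio <= 1 - drop_frac:
            labels.append("drop")          # sharp fall: candidate drop point
        elif abs(ratio - 1) <= flat_tol:
            labels.append("flat")          # roughly constant throughput
        elif ratio >= exp_ratio:
            labels.append("exp_climb")     # near-doubling, as in slow start
        elif ratio > 1 + flat_tol:
            labels.append("linear_climb")  # slower growth
        else:
            labels.append("other")         # mild decrease, left unclassified
    return labels


labels = classify_steps([100, 200, 400, 410, 405, 450, 500, 200])
```

Consecutive identical labels can then be merged into regions, for example runs of "flat" into a flat region.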
Throughput Prediction (2)
• Data points up to the end of the first slow start are ignored for prediction, since the initial slow start does not repeat
• RC-based prediction: use flat regions
• CC-based prediction: use complete CC cycles
• Window-based prediction: used if no clear pattern is observed
[Figure label: Peak of slow start]
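A minimal sketch of how skipping the initial slow start, RC-based prediction and the window-based fallback might fit together is given below. The slow-start detector (first local peak), the flat-region tolerance `tol`, the minimum region length `min_flat` and the fallback `window` are all assumptions for illustration. CC-based prediction is left out of this sketch.

```python
from statistics import mean
from typing import List, Optional, Tuple


def end_of_slow_start(ts2: List[float]) -> int:
    """Index of the first local peak, taken here as the peak of the initial slow start."""
    for i in range(len(ts2) - 1):
        if ts2[i + 1] < ts2[i]:
            return i
    return len(ts2) - 1


def longest_flat_region(values: List[float], tol: float = 0.1) -> Tuple[int, int]:
    """Return [start, end) of the longest run staying within tol of the run's first value."""
    best, start = (0, 0), 0
    for i in range(1, len(values) + 1):
        if i == len(values) or abs(values[i] - values[start]) > tol * values[start]:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = i
    return best


def predict(ts2: List[float], min_flat: int = 5, window: int = 10,
            tol: float = 0.1) -> Optional[float]:
    usable = ts2[end_of_slow_start(ts2) + 1:]  # ignore the initial slow start
    if not usable:
        return None
    s, e = longest_flat_region(usable, tol)
    if e - s >= min_flat:
        return mean(usable[s:e])   # RC-based: average of the flat region
    return mean(usable[-window:])  # window-based fallback: recent average


rc_pred = predict([10, 20, 40, 80, 60, 50, 51, 49, 50, 52, 50])
win_pred = predict([10, 20, 40, 30, 60, 20, 70, 10], window=2)
```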
Experiments (1) - Setup
• Download data files from 290 web sites (Debian/Gentoo mirrors)
  • Use tcpdump to capture the receiver’s traffic
  • Record SYN-ACK RTTs
  • Include retransmitted packets (0.09%)
  • Average file size is 30 MBytes
• 461 traces collected at Univ. of Houston
• Traces are analyzed using Perl scripts
Experiments (2) – Prediction Methods
• Prediction methods compared
  • Moving Average (MA): average throughput of the previous 10 RTTs
  • Exponential Weighted Moving Average (EWMA)
  • Aggregate throughput: average of past throughput (same as cumulative average), used as the predicted throughput
  • TCP Pattern prediction
• Average error in predicted future throughput:
  $\text{error} = \dfrac{\text{predicted throughput} - \text{measured throughput}}{\text{measured throughput}} \times 100\%$
  • Cut off at 100% if over, in case the measured future throughput is very small
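The three baseline predictors and the error metric can be sketched as below. The MA window of 10 RTTs and the 100% cut-off come from the slide. The EWMA weight `alpha` is not given in the talk and is an assumed value, as is the handling of a zero measured throughput.

```python
from typing import List


def moving_average(ts2: List[float], window: int = 10) -> float:
    """Average throughput of the previous `window` RTTs."""
    recent = ts2[-window:]
    return sum(recent) / len(recent)


def ewma(ts2: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average; alpha is an assumed weight."""
    est = ts2[0]
    for x in ts2[1:]:
        est = alpha * x + (1 - alpha) * est
    return est


def aggregate(ts2: List[float]) -> float:
    """Cumulative average of all past throughput."""
    return sum(ts2) / len(ts2)


def percent_error(predicted: float, measured: float, cap: float = 100.0) -> float:
    """Signed relative error in percent, cut off at +cap."""
    if measured == 0:
        # Assumption: treat any positive prediction against zero as the cap.
        return cap if predicted > 0 else 0.0
    return min(cap, (predicted - measured) / measured * 100.0)


history = [100, 200, 300, 400]
```

For this `history`, `moving_average(history, window=2)` averages the last two RTTs, while `aggregate(history)` averages all four.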
Illustration of Prediction (1)
Make a prediction for the next 200 RTTs:
• Prediction at the 25th RTT
  • TCP Throughput Prediction: average throughput of RTTs 9~25 (RC-based prediction)
  • Aggregate Throughput Prediction: average throughput of RTTs 0~25
• Prediction at the 40th RTT
  • TCP Throughput Prediction: uses window-based prediction after the 27th RTT (a significant drop)
[Figure: x-axis: window size (in RTTs); legend: per RTT throughput, Aggregate, TCP Pattern; annotations: Peak of slow start, Drop at 27th RTT, 25th RTT, 40th RTT]
Illustration of Prediction (2)
Make a prediction for the next 200 RTTs:
• Average error against the measured future throughput of the next 200 RTTs (for example, at the 20th RTT, the average throughput of RTTs 21~220 is used)
• The closer to 0, the better the prediction.
[Figure: x-axis: window size (in RTTs); legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]
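The evaluation described here, predicting at a given RTT and comparing with the average of the following RTTs, might look like the sketch below. RTTs are numbered from 1, so a prediction at RTT `at` sees `ts2[:at]` and is scored against `ts2[at:at + horizon]`. The function names and the zero-throughput handling are assumptions for this example.

```python
from typing import Callable, List, Optional


def cumulative_average(history: List[float]) -> float:
    """Aggregate predictor, used here only as an example predictor."""
    return sum(history) / len(history)


def error_at(ts2: List[float], at: int,
             predictor: Callable[[List[float]], float],
             horizon: int = 200, cap: float = 100.0) -> Optional[float]:
    """Signed percent error of a prediction made after RTT number `at`."""
    history, future = ts2[:at], ts2[at:at + horizon]
    if not history or not future:
        return None
    measured = sum(future) / len(future)
    predicted = predictor(history)
    if measured == 0:
        # Assumption: a positive prediction against zero counts as the cap.
        return cap if predicted > 0 else 0.0
    return min(cap, (predicted - measured) / measured * 100.0)


trace = [100.0] * 5 + [200.0] * 10
early = error_at(trace, at=5, predictor=cumulative_average, horizon=10)
later = error_at(trace, at=10, predictor=cumulative_average, horizon=5)
```

In this toy trace the cumulative average underestimates after the rate doubles, and the error shrinks as more of the higher-rate RTTs enter the history.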
Illustration of Prediction (3)
Make a prediction for the next 200 RTTs:
• Throughput prediction using Congestion-Control based patterns.
• Prediction made at the 65th RTT using 3 complete CC cycles
• The closer to 0, the better the prediction.
[Figure: legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern; annotation: One complete CC cycle]
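A CC-based prediction over complete cycles could be sketched as follows, assuming a cycle runs from one drop point up to, but not including, the next. The drop threshold `drop_frac` and the function names are hypothetical. The slide only states that three complete CC cycles were used.

```python
from typing import List, Optional


def drop_points(ts2: List[float], drop_frac: float = 0.3) -> List[int]:
    """Indices where throughput falls by at least drop_frac relative to the previous RTT."""
    return [i for i in range(1, len(ts2))
            if ts2[i - 1] > 0 and ts2[i] <= (1 - drop_frac) * ts2[i - 1]]


def cc_prediction(ts2: List[float], min_cycles: int = 3,
                  drop_frac: float = 0.3) -> Optional[float]:
    """Average throughput over the most recent `min_cycles` complete cycles."""
    drops = drop_points(ts2, drop_frac)
    if len(drops) < min_cycles + 1:
        return None  # not enough complete cycles observed yet
    first, last = drops[-(min_cycles + 1)], drops[-1]
    span = ts2[first:last]  # whole cycles only, the trailing partial cycle is excluded
    return sum(span) / len(span)


sawtooth = [80, 50, 60, 70, 80, 50, 60, 70, 80, 50, 60, 70, 80, 50, 60]
cc_pred = cc_prediction(sawtooth)
```

Averaging only whole cycles avoids the bias that a partial sawtooth would introduce, since the start of a cycle is always its low-throughput part.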
Results (1) – predict next 200 RTTs at different times
• Aggregate is not accurate for small window sizes (< 30 RTTs)
• MA / EWMA are generally not as accurate
[Figure: legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern; annotation: 30th RTT]
Results (2) – predict at the 15th RTT for different times in the future
• When only limited data is available:
  • MA performs best; TCP Pattern is close
  • Aggregate is not accurate
[Figure: legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]
Results (3) – predict at the 25th RTT for different times in the future
• When more data is available:
  • TCP Pattern performs best; MA is close
  • Aggregate performs better
[Figure: legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]
Results (4) – predict at the 50th RTT for different times in the future
• When even more data is available:
  • MA now performs worse, due to the dynamics of TCP flows
  • TCP Pattern is best and Aggregate is close
[Figure: legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]
Summary of Results
• Aggregate is accurate with sufficient data, but not with only a few RTTs of data
• MA performs very well with a few RTTs of data
• EWMA is not a good predictor
• TCP Pattern generally performs better than or as well as the other methods
Summary of Results (table view)
Methods | Small # of RTTs of data | Large # of RTTs of data
Aggregate | Worse (3) | Better (2)
Moving Average | Best (1) | Worse (3)
EWMA | Worst (4) | Worst (4)
TCP Pattern | Better (2) | Best (1)
Conclusion and Future Works
• TCP-pattern-based throughput prediction is as good as or better than other methods.
• Good predictions within 25 RTTs (or ~ 5 sec).
• Patterns observed: 65% Rate Control, few Congestion Control
• Methods using Aggregate (e.g. NWS) cannot be expected to work well for small test files
• What’s next?
  • Identify more patterns
  • Add a degree of confidence for each prediction
  • Multiple TCP streams
Characteristics of collected traces (1)
Terms | Values | Comments
Number of traces | 461 |
Downloaded file size | 26-34 MB | Avg: 30 MB
Unique web sites | 290 | Debian/Gentoo
Avg # segments per trace | 24,062 | (min/max/median) = (17,025/69,866/24,412)
Retransmitted segments | 0.09% | 97 out of 461 traces
Avg # retransmitted segments per trace | 103.6 | (min/max/median) = (0/2,672/4)
Avg SYN-ACK RTT | 0.1696 sec | (min/max/median) = (0.02/2.91/0.155)
Avg # RTTs per trace | 2,589 | (min/max/median) = (143/110,673/662)
Characteristics of collected traces (2)
Type | # traces | % | Comments
Rate Control | 301 | 65.29% | 35 traces (7.59%) have big gaps (> 10 RTTs)
Congestion Avoidance | 30 | 6.51% |
Mixed or Congestion Control | 130 | 28.20% | 51 traces (11.06%) are very low in volume (up to 8~12 pkts/RTT, vs ~44 pkts/RTT)
Total | 461 | 100.00% |
• Classification: a trace is assigned a type when over 50% of the trace presents that type of pattern.
Results (0.5) – predict next 100 RTTs at different times
[Figure: legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]
Results (1.5) – predict next 400 RTTs at different times
[Figure: legend: per RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]