Infrastructure and Protocols for Dedicated Bandwidth Channels

OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY. Infrastructure and Protocols for Dedicated Bandwidth Channels. Nagi Rao, Computer Science and Mathematics Division, Oak Ridge National Laboratory, [email protected]. March 14, 2005. 1st Annual Workshop of Cyber Security and Information Infrastructure Research Group (CSIIR) and Information Operations Center (IOC), Oak Ridge, TN. Research sponsored by Department of Energy, National Science Foundation, Defense Advanced Research Projects Agency.


Transcript of Infrastructure and Protocols for Dedicated Bandwidth Channels

Page 1: Infrastructure and Protocols for Dedicated Bandwidth Channels

OAK RIDGE NATIONAL LABORATORY
U.S. DEPARTMENT OF ENERGY

Infrastructure and Protocols for Dedicated Bandwidth Channels

Nagi Rao
Computer Science and Mathematics Division
Oak Ridge National Laboratory
[email protected]

March 14, 2005
1st Annual Workshop of Cyber Security and Information Infrastructure Research Group (CSIIR) and Information Operations Center (IOC)
Oak Ridge, TN

Research sponsored by: Department of Energy, National Science Foundation, Defense Advanced Research Projects Agency

Page 2: Infrastructure and Protocols for Dedicated Bandwidth Channels

Collaborators
• Steven Carter, Oak Ridge National Laboratory
• Leon O. Chua, University of California at Berkeley
• Jianbo Gao, University of Florida
• Qishi Wu, Oak Ridge National Laboratory
• William Wing, Oak Ridge National Laboratory

Sponsors
• Department of Energy: High-Performance Networking Program
• National Science Foundation: Advanced Network Infrastructure Program
• Defense Advanced Research Projects Agency: Network Modeling and Simulation Program
• Oak Ridge National Laboratory: Laboratory Directed R&D Program

Page 3: Infrastructure and Protocols for Dedicated Bandwidth Channels

Outline of Presentation
• Network Infrastructure Projects
  - DOE UltraScience Net
  - NSF CHEETAH
• Dynamics and Control of Transport Protocols
  - TCP AIMD Dynamics: Analytical Results, Experimental Results
  - New Class of Protocols: Throughput Stabilization for Control, Transport Protocol
• Probabilistic Quickest Path Problem
  - Quickest path algorithm
  - Probabilistic algorithm


Page 5: Infrastructure and Protocols for Dedicated Bandwidth Channels

Motivation for Networking Projects: Terascale Supernova Initiative (TSI), a DOE large-scale science application

Science Objective: understand supernova evolution
• DOE SciDAC project: ORNL and 8 universities
• Teams of field experts across the country collaborate on computations
  - Experts in hydrodynamics, fusion energy, high energy physics

Massive computational code
• A terabyte is currently generated in a day
• Archived at nearby HPSS
• Visualized locally on clusters, archival data only

Desired network capabilities
• Archive and supply massive amounts of data to supercomputers and visualization engines
• Monitor, visualize, collaborate and steer computations

[Figure labels: visualization channel, visualization control channel, steering channel.]

Page 6: Infrastructure and Protocols for Dedicated Bandwidth Channels

DOE UltraScience Net

The Need
• DOE large-scale science applications on supercomputers and experimental facilities require high-performance networking
  - Petabyte data sets, collaborative visualization and computational steering
• Application areas span the disciplinary spectrum: high energy physics, climate, astrophysics, fusion energy, genomics, and others

Promising Solution
• High-bandwidth, agile network capable of providing on-demand dedicated channels, from multiple 10s of Gbps down to 150 Mbps
• Protocols are simpler for high-throughput and control channels

Challenges: several technologies need to be (fully) developed
• User-/application-driven agile control plane
  - Dynamic scheduling and provisioning
  - Security: encryption, authentication, authorization
• Protocols, middleware, and applications optimized for dedicated channels

Contacts: Bill Wing ([email protected]), Nagi Rao ([email protected])

Page 7: Infrastructure and Protocols for Dedicated Bandwidth Channels

DOE UltraScience Net

Connects ORNL, Chicago, Seattle and Sunnyvale
• Dynamically provisioned dedicated dual 10 Gbps SONET links
• Proximity to several DOE locations: SNS, NLCF, FNAL, ANL, NERSC
• Peering with ESnet, NSF CHEETAH and other networks

Data Plane User Connections
• Direct connections to:
  - core switches: SONET channels
  - MSPP: Ethernet channels
• Utilize UltraScience Net hosts

Funded by the U.S. DOE High-Performance Networking Program at Oak Ridge National Laboratory: $4.5M for 3 years

Page 8: Infrastructure and Protocols for Dedicated Bandwidth Channels

Control Plane
• Phase I
  - Centralized VPN connectivity
  - TL1-based communication with core switches and MSPPs
  - User access via centralized web-based scheduler
• Phase II
  - GMPLS direct enhancements and wrappers for TL1
  - User access via GMPLS and web to bandwidth scheduler
  - Inter-domain GMPLS-based interface

Web-based User Interface and API
• Allows users to log on to a website
• Request dedicated circuits
• Based on CGI scripts

Bandwidth Scheduler
• Computes a path with the target bandwidth
  - Is bandwidth available now? Extension of Dijkstra’s algorithm
  - Provide all available slots: extension of closed semiring structure to sequences of reals
• Both are polynomial-time algorithms
• GMPLS does not have this capability
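The "is bandwidth available now?" query can be illustrated with a small sketch. It assumes, purely for illustration, that each link carries a delay and a currently available bandwidth, and it skips links that cannot carry the requested channel while running an otherwise ordinary Dijkstra search. The graph layout, node names and numbers are hypothetical and are not taken from the UltraScience Net scheduler; the "all available slots" computation is not covered here.

```python
import heapq

def path_with_bandwidth(graph, src, dst, target_bw):
    """Dijkstra restricted to links with available bandwidth >= target_bw.

    graph: {node: [(neighbor, delay, available_bw), ...]}  (hypothetical layout)
    Returns (total_delay, path) or None if no feasible path exists.
    """
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        delay, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return delay, path[::-1]
        if delay > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, link_delay, bw in graph.get(node, []):
            if bw < target_bw:
                continue  # link cannot carry the requested channel
            cand = delay + link_delay
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand, nbr))
    return None

# Hypothetical topology: (neighbor, delay in ms, available bandwidth in Gbps)
links = {
    "A": [("B", 10.0, 10.0), ("C", 4.0, 1.0)],
    "B": [("E", 25.0, 10.0), ("D", 28.0, 2.5)],
    "C": [("D", 30.0, 10.0)],
    "E": [("D", 12.0, 10.0)],
}

print(path_with_bandwidth(links, "A", "D", 1.0))
print(path_with_bandwidth(links, "A", "D", 5.0))
```

Raising the requested bandwidth removes links from consideration, so the returned path can become longer or disappear entirely.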

Page 9: Infrastructure and Protocols for Dedicated Bandwidth Channels

NSF CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture

Objective: develop the infrastructure and networking technologies to support a broad class of eScience projects, and specifically the Terascale Supernova Initiative.

Main Technical Components
• Optical network testbed
• Transport protocols
• Middleware and applications

Collaborative Project: $3.5M for 3 years
• U. Virginia, ORNL, NC State, CUNY
• Sponsor: National Science Foundation

Contacts: Malathi Veeraraghavan ([email protected]), Nagi Rao ([email protected])

Page 10: Infrastructure and Protocols for Dedicated Bandwidth Channels

CHEETAH Project Concept

Network
• Create a network that offers on-demand end-to-end dedicated bandwidth channels to applications
• Operate a PARALLEL network to existing high-speed IP networks (NOT AN ALTERNATIVE!)

Transport protocols
• Designed to take advantage of dedicated and dual end-to-end paths: IP path and dedicated channel

eScience Application Requirements
• High-throughput file/data transfers
• Interactive remote visualization
• Remote computational steering
• Multipoint collaborative computation

Page 11: Infrastructure and Protocols for Dedicated Bandwidth Channels

CHEETAH: Initial Configuration

[Figure: initial CHEETAH configuration. MSPPs at ORNL, NCSU/MCNC/NLR and Atlanta (NLR/SOX), drawn with control cards, OC192 cards and GbE/10GbE cards; GbE/10GbE Ethernet switches leading to hosts, including at NCSU; a connection to DC (Dragon). Annotation: "Implements GMPLS protocols".]

Page 12: Infrastructure and Protocols for Dedicated Bandwidth Channels

Peering: UltraScience Net - CHEETAH

Peering
• Coast-to-coast dedicated channels
• Access to ORNL supercomputers and storage

Applications: TSI on a larger scale

[Figure: map of DOE Science UltraNet + NSF CHEETAH. Labeled sites: CERN, Chicago, Sunnyvale, Seattle, Atlanta, ANL, FNAL, ORNL, CalTech, SLAC, LBL, NERSC, PNNL, BNL, JLab, UVa, NCSU, CUNY. Link labels: 10 Gbps. Legend: University, DOE National Lab, Future Connections, UltraNet, CHEETAH.]

Page 13: Infrastructure and Protocols for Dedicated Bandwidth Channels

Outline of Presentation
• Network Infrastructure Projects
  - DOE UltraScience Net
  - NSF CHEETAH
• Dynamics and Control of Transport Protocols
  - TCP AIMD Dynamics: Analytical Results, Experimental Results
  - New Class of Protocols: Throughput Stabilization for Control, Transport Protocol
• Probabilistic Quickest Path Problem
  - Quickest path algorithm
  - Probabilistic algorithm

Page 14: Infrastructure and Protocols for Dedicated Bandwidth Channels

Transport Dynamics are Important

Data Transport
• High bandwidth for large data transfers over dedicated channels
• Maintain a suitable sending rate to achieve effective throughput

Control of end devices
• Remote control of visualizations, computations and instruments
• Jittery dynamics will destabilize the control loops
• Interactive simulations cannot be executed effectively

Page 15: Infrastructure and Protocols for Dedicated Bandwidth Channels

Study of Transport Dynamics

Understanding of transport dynamics
• Analytically showed that TCP-AIMD contains chaotic regimes
  - concept of w-update map
• Internet traces are shown to be both chaotic and stochastic
  - underlying process is anomalous diffusion

Development and tuning of protocols
• Protocols for stable flows of fixed rate: ONTCOU
  - Based on classical Robbins-Monro method
• Transport protocols with statistical stability: RUNAT
  - Combination of AIAD and Kiefer-Wolfowitz method

Page 16: Infrastructure and Protocols for Dedicated Bandwidth Channels

Complicated TCP AIMD Dynamics - History

Simulation Results: TCP-AIMD exhibits “complicated” trajectories
• TCP streams competing with each other (Veres and Boda 2000)
• TCP competing with UDP (Rao and Chua 2002)

Analytical Results (Rao and Chua 2002): TCP-AIMD has chaotic regimes
• Developed state space analysis and Poincare maps

Internet Measurements (2004): TCP-AIMD traces are a complicated mixture of stochastic and chaotic components

Working Definition of Chaotic Trajectories:
• Nearby starting points will result in trajectories that move far apart at a rate determined by a positive (> 0) Lyapunov exponent
• Trajectories are non-periodic for some starting points
• The attractor is geometrically complicated

Page 17: Infrastructure and Protocols for Dedicated Bandwidth Channels

Simplified View: Dynamics of TCP

Transmission Control Protocol Outline
• Uses a window mechanism to send W bytes/RTT
• Dynamically adjusts W to network and receiver state
  - Keeps increasing if there are no losses
  - Keeps shrinking if losses are detected
• Slow start phase: W increases exponentially until a threshold is reached or a loss occurs
• Congestion control: additively increase W with delivered packets, multiplicatively decrease with loss

[Figure: W versus time for the slow start and congestion control phases, with window increments labeled a (slow start) and 1/W (congestion control); an early loss slows throughput.]
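The window rules above can be made concrete with a toy single-flow model. The sketch below is an idealization under stated assumptions: time advances in whole round trips, a loss is declared whenever the window exceeds a fixed `capacity` (a hypothetical stand-in for the path's bandwidth-delay product plus buffering), and the names `aimd_trace`, `capacity` and `ssthresh` are illustrative. It reproduces only the textbook saw-tooth, not the complicated dynamics discussed later in the talk.

```python
def aimd_trace(rounds, capacity, w0=1.0, ssthresh=32.0):
    """Return the window size per RTT for a toy single-flow AIMD model."""
    w, trace = w0, []
    for _ in range(rounds):
        trace.append(w)
        if w > capacity:
            w = max(1.0, w / 2.0)   # multiplicative decrease on loss
            ssthresh = w
        elif w < ssthresh:
            w = w * 2.0             # slow start: exponential growth
        else:
            w = w + 1.0             # additive increase: +1 per RTT
    return trace

trace = aimd_trace(rounds=200, capacity=100.0)
print(trace[:8])        # slow start, then additive increase
print(max(trace))       # largest window reached before a loss
```

Adding 1/W per acknowledged packet, as on the slide, amounts to roughly one extra packet per round trip, which is what the `w + 1.0` step models.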

Page 18: Infrastructure and Protocols for Dedicated Bandwidth Channels

Chaotic Dynamics of TCP

Competing TCP streams: window dynamics are chaotic
• Hard to predict: resemblance to random noise
• Hard to conclude from experiments: nearby orbits move far away later
• Hard to characterize: chaotic attractor

[Figure: Poincare map of two window sizes; two-streams case and four-streams case.]

Veres and Boda (2000) did not rigorously establish chaos in a formal sense
• The attractor could have been generated by a periodic orbit with a large period
• We repeated the simulation and found only quasi-periodic trajectories

Page 19: Infrastructure and Protocols for Dedicated Bandwidth Channels

Noisy Nature of TCP (simulation)

Simple random traffic generates complicated attractors
• TCP reacts to network traffic randomness
  - Jittery end-to-end delays
• Chaos is not needed to generate complicated attractors

[Figure: Poincare map of message delay vs. window size. Setup: TCP source, router with uniform random drops, destination.]

Page 20: Infrastructure and Protocols for Dedicated Bandwidth Channels

TCP Competing with UDP (ns-2 simulation)

As the CBR rate is varied, TCP competing with UDP/CBR at the router generates a variety of dynamics.

[Figure: ns-2 topology with TCP/Reno and UDP/CBR sources, a router and a sink; link labels 2Mb, 10ms, DT (twice) and 1.7Mb, 10ms, DT. Plots for UDP/CBR = 1 Mbs: W(t) versus time, and a Poincare phase plot of window size W(t) vs. packet end-to-end delay D(t).]

Page 21: Infrastructure and Protocols for Dedicated Bandwidth Channels

TCP Competing with UDP

[Figure: panels labeled UDP/CBR: 0 Mbs, 0.5 Mbs, 1.7 Mbs, 1.0 Mbs, 1.7 Mbs and 1.75 Mbs.]

Page 22: Infrastructure and Protocols for Dedicated Bandwidth Channels

Summary of Our Analytical Results

State-Space of TCP:
• $w(t)$: congestion window
• $e(t)$: packet delay including re-transmits
• $a(t)$: acknowledgements since last MD
• $r(t)$: losses inferred since last AI

TCP-AIMD dynamics have two qualitatively different regimes
• Regime one, highlighted in the usual TCP literature: $w(t)$ increases with $a(t)$ while $r(t) = 0$
• Regime two, highlighted by $e(t)$ and $r(t)$: $w(t)$ decreases with $r(t)$
  - Its effect and duration are enhanced by network delay and high buffer occupancy
• Trajectories move back and forth between these two regimes

We define a Poincare map that updates $w(t)$: the w-update map M
• M is 1-dimensional if Regime Two is short-lived
• M is 2-dimensional and complicated if Regime Two is significant
• M is qualitatively similar to the tent map, which generates chaotic trajectories

Page 23: Infrastructure and Protocols for Dedicated Bandwidth Channels

Dynamics of Transitions Between Regimes

Regime 1:
$$\frac{dw}{dt} = \frac{1}{w}\,\frac{da}{dt}, \qquad \frac{de}{dt} = u\,\frac{da}{dt}$$

Regime 2:
$$\frac{dw}{dt} = -\frac{w}{2}\,\frac{dr}{dt}, \qquad \frac{de}{dt} = n\,\frac{dr}{dt}$$

Both regimes are unstable (eigenvalue analysis).

[Figure: w-update map for long TCP transfers; plots of $w(t)$ and $e(t)$ versus t, and of $M(w)$ versus w.]

Page 24: Infrastructure and Protocols for Dedicated Bandwidth Channels

M: w-update map

Given a value $w$, M gives its next updated value $M(w)$ after some time period (not fixed).

Regime 1: $M(w) = w + 1/w$

Regime 2: $M(w) = w / 2^{i_n}$
• $i_n$ depends on the number of dropped packets
  - buffer occupancy at that time
  - delay between source and bottleneck buffer

Result: M is parametrized, and each piece resembles a twisted version of the classical tent map

Rao, Gao and Chua, chapter in Complex Dynamics in Communications Networks, 2004
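To see why a tent-like map produces chaotic trajectories, the sketch below iterates the classical tent map (not the w-update map M itself, whose pieces are only said to resemble it) from two starting points that differ by 1e-9. The slope parameter 1.99 and the starting value 0.3 are arbitrary illustrative choices; 1.99 is used instead of 2 only to avoid a floating-point artifact in which binary orbits collapse to zero.

```python
import math

def tent(x, mu=1.99):
    """Classical tent map on [0, 1]; the slope has magnitude mu everywhere."""
    return mu * x if x < 0.5 else mu * (1.0 - x)

def orbit(x0, steps, mu=1.99):
    """Return [x0, f(x0), f(f(x0)), ...] with `steps` applications of the map."""
    xs = [x0]
    for _ in range(steps):
        xs.append(tent(xs[-1], mu))
    return xs

a = orbit(0.3, 20)
b = orbit(0.3 + 1e-9, 20)
separation = [abs(p - q) for p, q in zip(a, b)]

# Average exponential growth rate of the separation over 20 steps;
# for this map it should be close to ln(mu), a positive Lyapunov exponent.
growth_rate = math.log(separation[20] / separation[0]) / 20
print(separation[0], separation[20], growth_rate)
```

The separation grows by roughly a factor of mu per step while both orbits stay on the same branch, which is the "nearby starting points move far apart" property in the working definition of chaos given earlier.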

Page 25: Infrastructure and Protocols for Dedicated Bandwidth Channels

Internet Measurements (joint work with Jianbo Gao)

Question 1: How relevant are previous simulation and analytical results on chaotic trajectories?
Answer: Relevant from an analysis perspective, to a certain extent.

Question 2: Do actual Internet TCP measurements exhibit chaotic behavior?
Answer: Yes. They are more complicated than chaotic (deterministic).

Page 26: Infrastructure and Protocols for Dedicated Bandwidth Channels

Internet Measurements

Internet (net100) traces show that TCP-AIMD dynamics are a complicated mixture of chaotic and stochastic regimes:
• Chaotic: TCP-AIMD dynamics
• Stochastic: TCP response to network traffic

Basic Point: TCP traces collected on all Internet connections showed complicated dynamics
• The classical “saw-tooth” profile is not seen even once
• This is not a criticism of TCP; it was not intended for smooth dynamics

Page 27: Infrastructure and Protocols for Dedicated Bandwidth Channels

Cwnd time series for ORNL-LSU connection

Connection: OC192 to Atlanta-Sox; Internet2 to Houston; LAnet to LSU

Time series: cwnd = x(t), collected at approximately 1 ms resolution using net100 instruments

Page 28: Infrastructure and Protocols for Dedicated Bandwidth Channels

Time-Dependent Exponent Plots

Informally, a measure of how separated close-by states become in time: exponential separation is characteristic of the chaotic regime.

Form state vectors of size m from the time series x(t), with samples denoted by x(1), x(2), …:
$$V_i = [\,x(i),\ x(i+1),\ \ldots,\ x(i+m-1)\,]$$

For two state vectors satisfying $r \le \|V_i - V_j\| \le r + \Delta r$, we define the time-dependent exponent as
$$\Lambda(k) = \left\langle \ln \frac{\|V_{i+k} - V_{j+k}\|}{\|V_i - V_j\|} \right\rangle$$
where the average is taken over all such pairs.

[Figure: time-dependent exponent plots. Lorenz (chaotic): common envelope. Uniform random: spread out.]
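A direct, unoptimized way to compute this quantity is sketched below. It follows the definition above: build state vectors of size m, select pairs whose initial distance falls in the shell [r, r + dr], and average the log of the distance ratio k steps later. The logistic map is used only as a convenient chaotic test signal; it, the function name and the shell values are assumptions for illustration and have nothing to do with the net100 traces.

```python
import math

def time_dependent_exponent(x, m, k, r, dr):
    """Average ln(|V_{i+k}-V_{j+k}| / |V_i-V_j|) over pairs with
    r <= |V_i-V_j| <= r+dr. Requires r > 0. Returns NaN if no pair qualifies."""
    vecs = [x[i:i + m] for i in range(len(x) - m + 1)]
    n = len(vecs) - k                      # vectors that still have a k-step successor

    def dist(u, v):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d0 = dist(vecs[i], vecs[j])
            if r <= d0 <= r + dr:
                dk = dist(vecs[i + k], vecs[j + k])
                if dk > 0.0:
                    total += math.log(dk / d0)
                    count += 1
    return total / count if count else float("nan")

def logistic_series(x0, length):
    """Chaotic test signal: x -> 4x(1-x)."""
    xs = [x0]
    while len(xs) < length:
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

series = logistic_series(0.3, 400)
lam1 = time_dependent_exponent(series, m=2, k=1, r=1e-3, dr=9e-3)
lam3 = time_dependent_exponent(series, m=2, k=3, r=1e-3, dr=9e-3)
print(lam1, lam3)
```

Repeating the computation for several shells and plotting the result against k is what produces the "common envelope" versus "spread out" pictures: for a chaotic signal the curves from different shells grow along a shared line, while for noise they separate according to the shell size.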

Page 29: Infrastructure and Protocols for Dedicated Bandwidth Channels

Internet cwnd measurements: Both Stochastic and Chaotic Parts are Dominant

TCP traces have, at certain scales:
• Common envelope: chaotic
• Spread out: stochastic

Observations:
• From analysis, chaotic dynamics are from AIMD
• Stochastic component is in response to network traffic: losses and RTT variations

Gao and Rao, IEEE Comm Letters, 2005, in press

Page 30: Infrastructure and Protocols for Dedicated Bandwidth Channels

Design of Transport Protocols with Smooth Dynamics

Observation 1: Avoid AIMD-like behavior to avoid chaotic dynamics

Challenge: Randomness is inherent in Internet connections and will not go away even if the protocol is non-chaotic.

Our Solution: Explicitly account for randomness in the protocol design using stochastic approximation

Page 31: Infrastructure and Protocols for Dedicated Bandwidth Channels

Throughput Stabilization

Niche Application Requirement: provide stable throughput at a target rate, typically much below peak bandwidth
• High-priority channels
• Commands for computational steering and visualization
• Control loops for remote instrumentation

TCP AIMD is not suited for stable throughput
• Complicated dynamics
• Underflows with sustained traffic

Important Consideration
• Stochasticity of Internet connections must be explicitly accounted for

Rao, Wu and Iyengar, IEEE Comm Letters, 2004

Page 32: Infrastructure and Protocols for Dedicated Bandwidth Channels

Stochastic Approximation: UDP window-based method

Objective: adjust the source rate to achieve (almost) fixed goodput at the destination application

Difficulty: data packets and acks are subject to random processes

Approach: rely on statistical properties of data paths

[Figure: transport control loop. Source node S sends data packets to destination node D at transmission rate $r(t)$; acknowledgements return to S. Destination goodput: $g_D(t)$. Source goodput: $g_S(t)$.]

Page 33: Infrastructure and Protocols for Dedicated Bandwidth Channels

UDP-Based Framework

• Send $W(t)$ datagrams and wait for period $T_{IWD}(t)$
• Source sending rate: $r(t) = W(t) \cdot MDS / T_{IWD}(t)$
• Destination goodput: $g_D(t)$
• Loss rate: $l(t)$

Goodput regression: $G(r) = E[\,g_D(t) \mid r(t) = r\,]$

Loss regression: $L(r) = E[\,l(t) \mid r(t) = r\,]$

[Figure: RUNAT sender and RUNAT receiver. Labels: sender buffer, cwin $W(t)$, inter-window delay $T_{IWD}(t)$, UDP datagrams, $r(t)$, receiver buffer, acknowledgements, $g(t)$, $l(t)$.]

Page 34: Infrastructure and Protocols for Dedicated Bandwidth Channels

Channel Throughput Profile

Plot of receiving rate as a function of sending rate. Its precise interpretation depends on:
• Sending and receiving mechanisms
• Definition of rates

For protocol optimizations, it is important to use the protocol's own sending mechanism to generate the profile.

Window-based sending process for UDP datagrams:
• Send $W_c(t)$ datagrams in one step (window size)
• Wait for time $T_s(t)$, called idle-time or wait-time
• Sending rate at time resolution $T_s(t)$:
$$r_s(t) = \frac{W_c(t)}{T_s(t) + T_c(t)}$$

This is an ad hoc mechanism facilitated by the 1GigE NIC.

Page 35: Infrastructure and Protocols for Dedicated Bandwidth Channels

Throughput Profile: throughput and loss rates vs. sending rate (window size, cycle time)

Objective: adjust the source rate to yield the desired throughput at the destination

[Figure: throughput profiles for a typical day and for Christmas day, with the stabilization zone and the peak zone marked.]

Page 36: Infrastructure and Protocols for Dedicated Bandwidth Channels

Adaptation of Source Rate

Sending process: send $W_{c,n}$ datagrams and wait for duration $T_{s,n}$

Adjust the window size:
$$W_{c,n+1} = W_{c,n} - \frac{a\,T_s}{n^{\alpha}}\,(\hat{g}_n - g^*)$$

Adjust the cycle time:
$$T_{s,n+1} = \frac{1.0}{\dfrac{1.0}{T_{s,n}} - \dfrac{a}{W_c\,n^{\alpha}}\,(\hat{g}_n - g^*)}$$

Here $g^*$ is the target throughput and $\hat{g}_n$ is its noisy estimate.

Both are special cases of the classical Robbins-Monro method:
$$r_{n+1} = r_n - \rho_n\,(\hat{g}(r_n) - g^*), \qquad \rho_n > 0,\ \ \rho_n \to 0,\ \ \sum_n \rho_n = \infty,\ \ \sum_n \rho_n^2 < \infty$$
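A small simulation can show the Robbins-Monro idea at work. The sketch below assumes a made-up channel whose goodput is 90% of the sending rate plus Gaussian noise, and applies a rate update with step size a / n^alpha. The channel model, the function names and the clamp at zero are hypothetical illustrations; the values a = 0.8 and alpha = 0.8 echo the parameter values reported with the experiments but are otherwise an assumption about the step-size form.

```python
import random

def stabilize_rate(target, goodput, r0, a=0.8, alpha=0.8, steps=2000):
    """Robbins-Monro style update: move the rate against the goodput error
    with a step size a / n**alpha that shrinks over time."""
    r = r0
    for n in range(1, steps + 1):
        g = goodput(r)                                   # noisy goodput measurement
        r = max(0.0, r - (a / n ** alpha) * (g - target))
    return r

rng = random.Random(1)

def noisy_channel(r):
    """Hypothetical channel: 10% loss plus measurement noise (Mbps)."""
    return 0.9 * r + rng.gauss(0.0, 0.2)

final_rate = stabilize_rate(target=2.0, goodput=noisy_channel, r0=1.0)
print(final_rate)   # should settle near 2.0 / 0.9
```

Because the step size decays, late noisy measurements move the rate very little, which is what gives the smooth behavior that AIMD lacks; the price is slower reaction if the channel itself changes.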

Page 37: Infrastructure and Protocols for Dedicated Bandwidth Channels

Performance Guarantees

Summary: stabilization is achieved with high probability using a very simple estimation of the source rate.

Basic result: for the general update
$$r_{n+1} = r_n - \frac{a}{n^{\alpha}}\,(\hat{g}_n - g^*), \qquad a > 0,\ \ 1/2 < \alpha < 1$$

[Equations: order bounds $O(\cdot)$ in n on $E[(r_n - r^*)^2]$, stated in two cases separated at the value 3/2, and on $E(\hat{g}_n) - g^*$.]

Page 38: Infrastructure and Protocols for Dedicated Bandwidth Channels

Internet Measurements

ORNL-LSU connection (before recent upgrade)
• Hosts with 10 M NIC
• 2000-mile network distance
  - ORNL-NYC: ESnet
  - NYC-DC-Hou: Abilene
  - Hou-LSU: Local n/s

ORNL-GaTech connection
• Hosts with GigE NICs
• ORNL-Juniper router: 1Gig link
• Juniper-ATL Sox: OC192 (1Gig link)
• Sox-GaTech: 1Gig link

Page 39: Infrastructure and Protocols for Dedicated Bandwidth Channels

ORNL-LSU Connection

[Figure: network path between ORNL and LSU; labels: ESnet, Local.]

Page 40: Infrastructure and Protocols for Dedicated Bandwidth Channels

Goodput Stabilization: ORNL-LSU Experimental Results

• Case 1: Target goodput = 1.0 Mbps, rate control through congestion window, a = 0.8
  [Figure: datagram acknowledging time (s) vs. source rate (Mbps) and goodput (Mbps).]
• Case 2: Target goodput = 2.0 Mbps, rate control through congestion window, a = 0.8
  [Figure: datagram acknowledging time (s) vs. source rate (Mbps) and goodput (Mbps).]

Page 41: Infrastructure and Protocols for Dedicated Bandwidth Channels

Throughput Stabilization: ORNL-GaTech

• Target goodput = 20.0 Mbps, a = 0.8, α = 0.8, adjust congestion window size
• Target goodput level = 2.0 Mbps, a = 0.8, α = 0.8, adjust sleep time

Page 42: Infrastructure and Protocols for Dedicated Bandwidth Channels

RUNAT: Reliable UDP-based Network Adaptive Transport

Transport protocol
• Maximize connection utilization: track peak goodput
• Uses Kiefer-Wolfowitz stochastic approximation to handle ACKs and losses

Features
• Tailored to random loss rate and RTT
• Segmented rate control
  - 3 control zones: bottleneck link is underutilized, saturated, or overloaded
• Explicit accounting for random components
  - Uses stochastic approximation methods based on goodput estimates
• TCP-friendliness
  - Rate-increasing and rate-decreasing coefficients are dynamically adjusted
• Adaptable to diverse network environments
  - Measurement and control periods are not constant, but link-specific (use RTT)

Wu and Rao, INFOCOM 2005

Page 43: Infrastructure and Protocols for Dedicated Bandwidth Channels

Three Zones of Goodput Profile

Three control zones
• Zone I: Adaptive Increase
  - Bottleneck link is underutilized
  - Low packet loss due to occasional congestion or transmission errors
  - Fixed with increasing source rate
• Zone II (transitional): dynamic KWSA method
  - Bottleneck link is saturated
  - Peak goodput falls within this zone
  - SA determines whether to increase or decrease source rate
• Zone III: Adaptive Decrease
  - Bottleneck link is overloaded
  - Large packet loss due to network congestion
  - Back off to recover from congestion collapse

Stabilize the sending rate at $r^*$, where $G(r^*) = \max_r G(r)$

[Figure: goodput regression $G(r)$ versus sending rate r, with Zone I (~zero loss), Zone II (low loss) and Zone III (high loss); the peak is at $r^*$.]

Page 44: Infrastructure and Protocols for Dedicated Bandwidth Channels

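Segmented Rate Control Algorithm

Basic Idea: control the sending rate based on the loss rate estimate $\hat{l}(t)$ to achieve peak goodput

[Equations: the update of the inter-window delay $T_{IWD}$ is selected by the loss rate estimate.
• When the estimate is below $l_{low}$: $T_{IWD}$ is updated from its previous value using the coefficient $c_I$ and the gap between the estimate and $l_{low}$.
• When the estimate lies in $[l_{low}, l_{high}]$: the rate estimate $\hat{r}$ takes a Kiefer-Wolfowitz step based on the ratio of successive goodput differences to successive rate differences, and $T_{IWD}$ is then set from $W$, $MDS$ and the new rate.
• When the estimate is above $l_{high}$: $T_{IWD}$ is updated from its previous value using the coefficient $c_D$ and the gap between the estimate and $l_{high}$.]

[Figure: $\hat{g}(t)$ and $\hat{l}(t)$ versus $\hat{r}(t)$, with thresholds $l_{low}$ and $l_{high}$, the rates $\hat{r}_{low}(t)$ and $\hat{r}_{high}(t)$, a 45° reference line, the peak $\hat{g}_{max}(t)$ at $\hat{r}^*(t)$, and Phases I, II and III. Zone labels as shown: Zone I Adaptive Decrease, Zone II KWSA, Zone III Adaptive Increase.]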
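The three-way selection can be sketched as a single control step. Everything concrete in the sketch below is an assumption made for illustration: the proportional forms used in Zones I and III, the finite-difference slope used in Zone II, and all parameter names and values. It is meant to show the shape of a loss-driven segmented controller, not the actual RUNAT update equations.

```python
def segmented_step(t_iwd, loss, l_low, l_high, c_inc, c_dec,
                   rate, prev_rate, goodput, prev_goodput,
                   gain, window_bytes):
    """One control step; returns the next inter-window delay in seconds.

    Rates and goodputs are in bytes/s; window_bytes is the data sent per window.
    """
    if loss < l_low:
        # Zone I: link underutilized, shorten the delay to raise the rate.
        return t_iwd * (1.0 - c_inc * (l_low - loss))
    if loss > l_high:
        # Zone III: link overloaded, lengthen the delay to back off.
        return t_iwd * (1.0 + c_dec * (loss - l_high))
    # Zone II: Kiefer-Wolfowitz style step along the estimated goodput slope.
    if rate != prev_rate:
        slope = (goodput - prev_goodput) / (rate - prev_rate)
    else:
        slope = 0.0
    next_rate = max(rate + gain * slope, 1e-9)
    return window_bytes / next_rate

common = dict(l_low=0.01, l_high=0.05, c_inc=5.0, c_dec=5.0,
              rate=1.0e6, prev_rate=0.9e6, gain=1.0e4, window_bytes=1.0e4)

zone1 = segmented_step(0.01, loss=0.0, goodput=0.95e6, prev_goodput=0.88e6, **common)
zone2_up = segmented_step(0.01, loss=0.03, goodput=0.95e6, prev_goodput=0.88e6, **common)
zone2_down = segmented_step(0.01, loss=0.03, goodput=0.85e6, prev_goodput=0.88e6, **common)
zone3 = segmented_step(0.01, loss=0.25, goodput=0.95e6, prev_goodput=0.88e6, **common)
print(zone1, zone2_up, zone2_down, zone3)
```

In Zone II the sign of the measured slope decides the direction: if goodput rose with the last rate increase the delay shrinks, and if goodput fell the delay grows, which is how the controller hunts for the peak of the goodput profile.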

Page 45: Infrastructure and Protocols for Dedicated Bandwidth Channels

Convergence Properties of RUNAT

Informal Statement:
• If in zone I or III, it will exit to zone II
• If in zone II, it will converge to maximum throughput

Condition A1: loss statistics vary slowly, $E[\,\hat{l}(t) \mid r(t) = r\,] = L(r)$
Condition A2: loss regression is differentiable and its derivative is monotonically increasing with respect to r in Phase II.

Result: RUNAT in zone I or III enters zone II in a finite number of steps almost surely; in zone II, RUNAT will almost surely converge to the peak goodput: $r(t) \to r^*$, where $G(r^*) = \max_r G(r)$

Page 46: Infrastructure and Protocols for Dedicated Bandwidth Channels

Experimental Results on link between ozy4 (ORNL) and robot (LSU): illustration of microscopic RUNAT behaviors during transfer of 20MB data

[Figure: rate trace with regions labeled Slow Start, Zone I (loss rate: 0%), Zone II (loss rate: 3.33%) and Zone III (loss rate: 37.33%).]

• When far away from the saturation (peak) point, $c_I$ is adjusted to large values to move quickly towards the peak point.
• When approaching the saturation (peak) point, $c_I$ is adjusted to small values to converge slowly to, and remain at, the peak point.
• The decrement of the source rate upon packet loss is determined by congestion levels (local loss rate measurements) and $c_D$: higher congestion levels result in larger rate drops.
• The increment of the source rate is determined by congestion levels (local loss rate measurements) and $c_I$.

Page 47: Infrastructure and Protocols for Dedicated Bandwidth Channels

Experimental Results on link between ozy4 (ORNL) and robot (LSU): RUNAT transport performance during transfer of 2GB data with concurrent TCP transfer of 50MB data

Case 1: run RUNAT and TCP concurrently
• RUNAT throughput: 10.49 Mbps
• TCP throughput: 0.376 Mbps

Case 2: run a single TCP only
• Single TCP throughput: 0.377 Mbps

Note: The low throughputs were due to the high traffic volume at the time of the experiments. On a normal day with regular traffic volume, TCP is able to achieve 3~6 Mbps and RUNAT may reach 15~30 Mbps at lower loss rates without significantly affecting concurrent TCP on this link.

Page 48: Infrastructure and Protocols for Dedicated Bandwidth Channels

Experimental Results on link from ozy4 (ORNL) to orbitty (NC State)

Transport method   Data sent (MBytes)   Throughput (Mbps)   Is TCP friendly?
iperf (TCP)        100                  8.7                 YES
iperf (UDP)        1000                 95.6                NO (no congestion control, no reliability)
FTP (TCP)          100                  18.6                YES
SABUL              1000                 80.0                Seems NOT (concurrent TCP: 18.6 Mbps → 10.1 Mbps)
RUNAT              1500                 80.0                Statistically YES (concurrent TCP: 18.6 Mbps → 18.2 Mbps)

Page 49: Infrastructure and Protocols for Dedicated Bandwidth Channels

ORNL-Atlanta-ORNL 1Gbps Channel

[Figure: Dell dual Xeon 3.2GHz host and dual Opteron 2.2GHz host, GigE links, Juniper M160 router at ORNL (GigE blade, SONET blade), ORNL-ATL OC192 link, Juniper M160 router at Atlanta (SONET blade), IP loop.]

Host to Router
• Dedicated 1GigE NIC

ORNL Router
• Filter-based forwarding to override both at input and middle queues and disable other traffic to GigE interfaces
• IP packets on both GigE interfaces are forwarded to the outgoing SONET port

Atlanta-SOX router
• Default IP loopback

Only 1Gbps on the OC192 link is used for production traffic, leaving 9Gbps spare capacity

Page 50: Infrastructure and Protocols for Dedicated Bandwidth Channels

1Gbps Dedicated IP Channel

• Non-Uniform Physical Channel:
  • GigE - SONET - GigE
  • ~500 network miles
• End-to-End IP Path
  • Both GigE links are dedicated to the channel
  • Other host traffic is handled through second NIC
• Routers, OC192 and hosts are lightly loaded
• IP-based applications and protocols are readily executed

[Figure: Dell dual Xeon 3.2GHz host and dual Opteron 2.2GHz host, GigE links, Juniper M160 router at ORNL (GigE blade, SONET blade), ORNL-ATL OC192 link, Juniper M160 router at Atlanta (SONET blade), IP loopback.]

Page 51: Infrastructure and Protocols for Dedicated Bandwidth Channels

Dedicated Hosts

• Hosts:
  • Linux 2.4 kernel (Red Hat, SuSE)
  • Two NICs:
    • optical connection to Juniper M160 router
    • copper connection to Ethernet switch/router
• Disks: RAID 0 dual disks (140GB SCSI)
  • XFS file system
• Peak disk data rate is ~1.2Gbps (IOzone measurements)
• Disk is not a bottleneck for 1Gbps data rates

Page 52: Infrastructure and Protocols for Dedicated Bandwidth Channels

UDP goodput and loss profile

• Point in horizontal plane: $(W_c(t), T_s(t))$
• Goodput plateau: ~990Mbps
• Non-zero and random loss rate
• High goodput is received at non-trivial loss

Page 53: Infrastructure and Protocols for Dedicated Bandwidth Channels

1GigE NICs Act as Rate Controllers

[Figure: host containing application buffer, kernel buffer and GigE NIC, connected to the Juniper M160. Labels: "Data rates could exceed 1Gbps" and "Rate Limited 1Gbps" (twice).]

Our window-based method:
• Flow rate from application to NIC is ON/OFF and exceeds 1Gbps at times
• Flow is regulated to 1Gbps: NIC rate matches the link rate

This method does not work well if the NIC rate is higher than the link rate or router port rate:
• The NIC may send at a higher rate, causing losses at the router port

Page 54: Infrastructure and Protocols for Dedicated Bandwidth Channels

Best Performance of Existing Protocols

Disk-to-Disk Transfers (unet2 to unet1)

Protocol   Goodput
tsunami    919 Mbps
UDT        890 Mbps
FOBS       708 Mbps

Memory-to-Memory Transfers
• UDT: 958 Mbps
• Both iperf and throughput profiles indicated 990 Mbps levels

Potentially such rates are achievable if disk access and protocol parameters are tuned


Hurricane Protocol

• Composed based on principles and experiences with UDT and SABUL
  • It was not easy for us to figure out all the tweaks for pushing peak performance
• UDP window-based flow control
  • Nothing fundamentally new, but needed for fine tuning
• 990 Mbps on a dedicated 1 Gbps connection, disk-to-disk
• No attempt at congestion control


Hurricane Control Structure

[Diagram labels: Sender, Receiver; $W_c(t)$, $T_s(t)$; disk, send datagrams, reload lost datagrams; datagrams; receiver buffer, reordering datagrams, disk; group k NACKs; TCP]

• Different subtasks are handled by threads, which are woken up on demand
• Thread invocations are reduced by clustered NACKs instead of individual ACKs
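
The idea of clustering NACKs can be sketched as follows. This is an illustration only: the function names and the list-of-sequence-numbers representation are hypothetical, and the actual Hurricane message format and threading are not shown in the slides. The receiver collects the sequence numbers it has not seen and reports them in groups of k, so the sender is woken once per group rather than once per datagram.

```python
def missing_sequence_numbers(received, highest_expected):
    """Return the sorted sequence numbers in 0..highest_expected not yet received."""
    return [seq for seq in range(highest_expected + 1) if seq not in received]


def group_nacks(missing, k):
    """Split the missing sequence numbers into NACK messages of at most k entries."""
    if k < 1:
        raise ValueError("group size k must be at least 1")
    return [missing[i:i + k] for i in range(0, len(missing), k)]


if __name__ == "__main__":
    received = {0, 1, 3, 4, 7}  # datagrams that arrived, possibly out of order
    missing = missing_sequence_numbers(received, highest_expected=9)
    for nack in group_nacks(missing, k=2):
        print("NACK for datagrams", nack)
```

A larger k means fewer control messages and thread wake-ups but a longer delay before a lost datagram is re-sent.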


Hurricane


Ad Hoc Optimizations

• Manual tuning of parameters
  • Wait-time parameter $T_s(t)$:
    • Initial value chosen from throughput profile
    • Empirically, goodput is "unimodal" in $T_s(t)$: pairwise measurements for binary search
  • Group size k for NACKs:
    • Empirically, goodput is unimodal in k, and k is tuned
• Disk-specific details
  • Reads done in batch: no input buffer
  • NACKs are handled using fseek and attached to the next batch
• This tuning is not likely to be transferable to other configurations and different host loads
• More work needed: automatic tuning and systematic analysis
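
A binary search driven by pairwise measurements, as mentioned for the wait-time parameter, might look like the sketch below. It assumes goodput is strictly unimodal over a sorted list of candidate values and that measurements are repeatable. The `measure` callback and the synthetic goodput curve are hypothetical stand-ins for real transfer measurements.

```python
def tune_unimodal(candidates, measure):
    """Return the candidate with the highest goodput, assuming unimodality.

    candidates: parameter values sorted in increasing order.
    measure:    callable returning the goodput observed for one value.
    Each step compares a pair of neighboring candidates and discards the
    half of the range that cannot contain the peak.
    """
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if measure(candidates[mid]) < measure(candidates[mid + 1]):
            lo = mid + 1  # goodput still rising: the peak lies to the right
        else:
            hi = mid      # goodput falling or flat: the peak is at mid or to the left
    return candidates[lo]


if __name__ == "__main__":
    # Synthetic, noise-free goodput curve (Mbps) peaking at a wait time of 35.
    def synthetic_goodput(wait_time):
        return 990 - 0.1 * (wait_time - 35) ** 2

    wait_times = list(range(0, 101, 5))
    print("selected wait time:", tune_unimodal(wait_times, synthetic_goodput))
```

The search needs about 2·log2(n) measurements for n candidates. With noisy measurements each comparison would need averaging over repeated runs, which is one reason such tuning may not transfer between hosts.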


Outline of Presentation

• Network Infrastructure Projects
  • DOE UltraScienceNet
  • NSF CHEETAH
• Dynamics and Control of Transport Protocols
  • TCP AIMD Dynamics
    • Analytical Results
    • Experimental Results
  • New Class of Protocols
    • Throughput Stabilization for Control Transport Protocol
• Probabilistic Quickest Path Problem
  • Quickest path algorithm
  • Probabilistic algorithm


Shortest Path Problem

Classical Problem: Given a graph $G = (V, E)$ along with a "distance" function $d: E \to \mathbb{R}_{\ge 0}$ on edges

For path $P = (v_0, v_1, \ldots, v_p)$ we define the path distance:

$$d(P) = \sum_{i=1}^{p} d(e_i) \quad \text{for } e_i = (v_{i-1}, v_i)$$

Compute a path with smallest path distance from source node to destination node

Solved using Dijkstra's Algorithm with complexity $O(n \log n + e)$


Quickest Path Problem

Problem: Given a graph $G = (V, E)$ along with
1. “delay” function $d: E \to \mathbb{R}_{\ge 0}$ on edges
2. “bandwidth” function $b: E \to \mathbb{R}_{\ge 0}$ on edges

For path $P = (v_0, v_1, \ldots, v_p)$ and message size $\sigma$ we define the total delay:

$$T(P, \sigma) = \sum_{i=1}^{p} d(e_i) + \frac{\sigma}{\min_{i=1}^{p} b(e_i)} \quad \text{for } e_i = (v_{i-1}, v_i)$$

Compute a path with smallest total delay from source node to destination node

Solved using Chen and Chin’s Algorithm with complexity $O(ne \log n + e^2)$

Important Observation: Subpath of a quickest path is not necessarily quickest

[Example figure: source s and destination d; edge labels 5,20; 5,20; 15,5; 15,20; annotated totals T(60)=32, T(60)=29, T(60)=52, T(60)=57]
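
The total delay of a path can be computed directly from its edges. The sketch below writes the message size as `sigma` and represents each edge as a `(delay, bandwidth)` pair; both choices are assumptions made for illustration. Reading the edge labels in the example figure as (bandwidth, delay) is also an assumption, though under that reading the four T(60) values shown on the slide (32, 29, 52 and 57) are reproduced.

```python
def total_delay(path_edges, sigma):
    """Total delay of a path: sum of edge delays plus sigma over the bottleneck bandwidth.

    path_edges: list of (delay, bandwidth) pairs, one per edge of the path.
    sigma:      message size, in units consistent with bandwidth and delay.
    """
    path_delay = sum(delay for delay, _ in path_edges)
    bottleneck = min(bandwidth for _, bandwidth in path_edges)
    return path_delay + sigma / bottleneck


if __name__ == "__main__":
    slow = (20, 5)      # delay 20, bandwidth 5   (slide label "5,20")
    fast_a = (5, 15)    # delay 5,  bandwidth 15  (slide label "15,5")
    fast_b = (20, 15)   # delay 20, bandwidth 15  (slide label "15,20")
    print(total_delay([slow], 60))
    print(total_delay([fast_a, fast_b], 60))
    print(total_delay([slow, slow], 60))
    print(total_delay([slow, fast_a, fast_b], 60))
```

The bottleneck term depends on the whole path, so extending a path can change the contribution of the edges already on it. This is why a subpath of a quickest path need not itself be quickest.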


Quickest Path Algorithm – Chen and Chin

Let $b_1, b_2, \ldots, b_m$ denote the distinct bandwidths

Let $G(b)$ be the subnetwork in which edges with bandwidth smaller than $b$ are removed

$P_i$: path with least delay in $G(b_i)$

Quickest path is given by

$$\min_{i=1}^{m} \left[ d(P_i) + \frac{\sigma}{b(P_i)} \right]$$

where

$$b(P) = \min_{i=1}^{p} b(e_i), \qquad d(P) = \sum_{i=1}^{p} d(e_i)$$

Typically implemented using m invocations of Dijkstra’s algorithm

m could be quite large


Simple Probabilistic Quickest Path Algorithm

Randomly choose a fraction of the $b_i$'s and compute $P_i$ only on the corresponding $G(b_i)$

For larger networks, fewer than 10% of the shortest-delay computations were needed

Question: Is there a fundamental reason for this?
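
The enumeration over distinct bandwidths and its randomized variant can be sketched together. This is a minimal illustration, not the authors' implementation: the adjacency-list format, the function names, and the `fraction` and `rng` parameters are all assumptions. With `fraction=1.0` every distinct bandwidth is tried, which is the exact scheme; a smaller fraction runs Dijkstra on a random subset only, so the result is an upper bound on the optimal total delay.

```python
import heapq
import random


def least_delay(adj, source, target, min_bw):
    """Dijkstra restricted to edges with bandwidth >= min_bw.

    adj maps a node to a list of (neighbor, delay, bandwidth) triples.
    Returns (delay, bottleneck bandwidth) of a least-delay path, or None
    if the target is unreachable in the restricted network.
    """
    best = {source: 0.0}
    heap = [(0.0, source, float("inf"))]
    while heap:
        dist, node, bottleneck = heapq.heappop(heap)
        if node == target:
            return dist, bottleneck
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, delay, bandwidth in adj.get(node, []):
            if bandwidth < min_bw:
                continue  # edge is removed in the restricted network
            new_dist = dist + delay
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor, min(bottleneck, bandwidth)))
    return None


def quickest_path_delay(adj, source, target, sigma, fraction=1.0, rng=None):
    """Smallest total delay found over the chosen bandwidth thresholds."""
    bandwidths = sorted({bw for edges in adj.values() for _, _, bw in edges})
    if fraction < 1.0:
        rng = rng or random.Random()
        count = max(1, round(fraction * len(bandwidths)))
        bandwidths = rng.sample(bandwidths, count)  # random subset of thresholds
    best = float("inf")
    for threshold in bandwidths:
        result = least_delay(adj, source, target, threshold)
        if result is not None:
            delay, bottleneck = result
            best = min(best, delay + sigma / bottleneck)
    return best


if __name__ == "__main__":
    # Hypothetical three-node network: a slow direct edge and a faster two-hop route.
    network = {
        "s": [("d", 20, 5), ("a", 5, 15)],
        "a": [("d", 20, 15)],
    }
    print("exact:", quickest_path_delay(network, "s", "d", sigma=60))
    print("sampled:", quickest_path_delay(network, "s", "d", sigma=60,
                                          fraction=0.5, rng=random.Random(0)))
```

The sampled variant trades optimality for fewer shortest-path computations. How much is lost depends on how many thresholds are sampled.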


Analysis

Critical Observation:

For $b_1 < b_2 < \cdots < b_m$, the delay function $D(b_i) = d(P_i)$ is non-decreasing

• Its Vapnik and Chervonenkis dimension is 1
• This makes it efficient to approximate by random sampling

Rao 2004, Theoretical Computer Science

[Equation: bound on the probability that $T(\hat{P})$ deviates from $T(P^*)$, in terms of $p$, $m$ and $D$, where $T(P^*)$ is the optimal delay and $T(\hat{P})$ is the approximation based on $p$ shortest path computations]

[Figure: $D(b_i)$ plotted against $b_1, b_2, \ldots, b_m$, with a linear approximation using $p$ points]


Conclusions

• TCP-AIMD Dynamics:
  • Analytically established chaotic dynamics
  • Analyzed Internet traces: combination of chaotic and stochastic dynamics
• New Classes of Protocols
  • ONTCOU: achieve stable target flow level
  • RUNAT: statistical approach to congestion control
  • Based on Stochastic Approximation: convergence proof under general conditions
  • Experimental results are promising both on Internet and dedicated connections


Thank You