QCN with Delay-based Congestion Detection for Limited Queue Fluctuation in Data Center Networks
Y. Tanisawa, M. Yamamoto
Kansai University, Japan
Outline of Presentation
• QCN
• QCN with large number of flows
  • Performance evaluation
• Our proposed QCN/DC
  • Overview
  • Performance evaluation
• Conclusions
QCN (Quantized Congestion Notification)
[Figure: sending device → switch → receiving device; the switch returns feedback frames to the sending device]
Switch dynamics
• The switch calculates a feedback value when a data frame arrives
• Congestion is detected by the feedback value
• The switch returns a feedback frame to the source with a certain probability
Source dynamics
• On receiving a feedback frame, the source decreases its sending rate
• When the source receives no feedback, it increases its sending rate
Source dynamics (2)
• Byte Counter: incremented by 1 whenever the RP (reaction point, i.e., the source) has sent BC_LIMIT (150 KB) of frames
• Time Counter: incremented by 1 whenever the timer spends 15 ms
• Whenever either counter is incremented by 1, the sending rate is increased
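The counter mechanism above can be sketched in Python as follows. This is a minimal illustration, not QCN reference code: the class and method names are hypothetical, only BC_LIMIT = 150 KB and the 15 ms timer come from the slide, and resetting both counters when a feedback frame arrives is an assumption based on the usual QCN description.

```python
# Hypothetical sketch of the QCN source (RP) rate-increase counters.
BC_LIMIT_BYTES = 150_000   # Byte Counter cycle: BC_LIMIT = 150 KB (from the slide)
TIMER_PERIOD_US = 15_000   # Time Counter cycle: 15 ms (from the slide)


class RateIncreaseCounters:
    """Counts completed byte/timer cycles since the last feedback frame."""

    def __init__(self) -> None:
        self.byte_counter = 0
        self.time_counter = 0
        self._bytes_in_cycle = 0
        self._us_in_cycle = 0

    def on_sent(self, nbytes: int) -> int:
        """Add sent bytes; return how many rate-increase events were triggered."""
        self._bytes_in_cycle += nbytes
        events = 0
        while self._bytes_in_cycle >= BC_LIMIT_BYTES:
            self._bytes_in_cycle -= BC_LIMIT_BYTES
            self.byte_counter += 1
            events += 1
        return events

    def on_elapsed(self, elapsed_us: int) -> int:
        """Add elapsed time in microseconds; return triggered events."""
        self._us_in_cycle += elapsed_us
        events = 0
        while self._us_in_cycle >= TIMER_PERIOD_US:
            self._us_in_cycle -= TIMER_PERIOD_US
            self.time_counter += 1
            events += 1
        return events

    def on_feedback(self) -> None:
        """Assumption: a received feedback frame restarts both counters."""
        self.byte_counter = 0
        self.time_counter = 0
        self._bytes_in_cycle = 0
        self._us_in_cycle = 0


counters = RateIncreaseCounters()
print(counters.on_sent(300_000))    # two 150 KB cycles -> 2 increase events
print(counters.on_elapsed(45_000))  # three 15 ms cycles -> 3 increase events
```

Integer microseconds are used for the timer so that cycle boundaries are exact rather than subject to floating-point rounding.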
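Switch Dynamics
Calculating feedback
$F_b = (Q_{eq} - Q) - w\,(Q - Q_{old})$
• $Q_{eq} - Q$: how much smaller the queue length Q is than Qeq
• $Q - Q_{old}$: the increase of the queue from its previous length Qold
• w: weight of the queue-increase term
[Figure: switch queue with current length Q, previous length Qold, and target Qeq]
• When the CP (congestion point, i.e., the switch) receives a data frame and its Fb is negative, it sends feedback so that the queue is maintained close to the target queue.
Reducing the control overhead
• The CP sends the feedback value only with a certain probability, from 1% up to 10% depending on Fb.
[Figure: feedback probability (1% to 10%) vs. Fb]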
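A minimal sketch of the switch-side computation follows. It assumes w = 2 (a value commonly quoted for QCN, not stated on this slide), a hypothetical normalization constant fb_max, and a linear rise of the feedback probability between 1% and 10%; the slide only shows the 1% and 10% end points.

```python
import random

W = 2.0  # assumed weight of the queue-increase term


def feedback_value(q: float, q_old: float, q_eq: float, w: float = W) -> float:
    """Fb = (Qeq - Q) - w * (Q - Qold); a negative Fb indicates congestion."""
    return (q_eq - q) - w * (q - q_old)


def feedback_probability(fb: float, fb_max: float) -> float:
    """1% at Fb = 0, rising to 10% at |Fb| >= fb_max.
    The linear shape and fb_max are assumptions of this sketch."""
    ratio = min(abs(fb) / fb_max, 1.0)
    return 0.01 + 0.09 * ratio


def should_send_feedback(fb: float, fb_max: float, rng: random.Random) -> bool:
    """The CP sends feedback only for negative Fb, and then only probabilistically."""
    if fb >= 0:
        return False
    return rng.random() < feedback_probability(fb, fb_max)


# Queue at 30 pkts, 25 pkts previously, target Qeq = 22 pkts:
fb = feedback_value(q=30, q_old=25, q_eq=22)   # (22 - 30) - 2 * (30 - 25) = -18
print(fb, feedback_probability(fb, fb_max=64))
```

Both terms push Fb negative when the queue is above Qeq and still growing, which is exactly the case in which the CP is allowed to send feedback.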
Source Dynamics
[Figure: rate vs. time showing TR (Target Rate) and CR (Current Rate) through the Rate Decrease, Fast Recovery, Active Increase and Hyper-Active Increase phases; arrows mark received feedback messages]
・ Rate Decrease: Whenever a feedback frame is received, the source reduces its sending rate.
・ Fast Recovery: If no feedback is received, CR is increased rapidly.
・ Active Increase: The rate increase is slower, because CR is close to the previous rate at which congestion occurred.
・ Hyper-Active Increase: Without congestion detection for a long time, CR and TR are increased rapidly.
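The four source phases can be sketched as a small state machine. The gain GD = 1/128 and the steps R_AI = 5 Mbps and R_HAI = 50 Mbps are assumed values (commonly cited QCN defaults, not given in these slides), and the HAI step is simplified to a constant; only the threshold of 5 counter cycles is taken from the slides. Following the slides, Active Increase starts once one counter reaches 5 and Hyper-Active Increase once both do.

```python
GD = 1 / 128     # assumed rate-decrease gain
R_AI = 5e6       # assumed Active Increase step [bit/s]
R_HAI = 50e6     # assumed (simplified, constant) Hyper-Active Increase step [bit/s]
FR_CYCLES = 5    # counter value at which Fast Recovery ends (slides: "reach 5")


class QcnRateControl:
    def __init__(self, line_rate: float) -> None:
        self.line_rate = line_rate
        self.cr = line_rate   # Current Rate
        self.tr = line_rate   # Target Rate
        self.byte_counter = 0
        self.time_counter = 0

    def on_feedback(self, fb: float) -> None:
        """Rate Decrease: remember CR as TR, then cut CR in proportion to |Fb|."""
        self.tr = self.cr
        cut = min(GD * abs(fb), 0.5)  # assumed cap of a 50% cut per feedback
        self.cr *= 1 - cut
        self.byte_counter = 0
        self.time_counter = 0

    def phase(self) -> str:
        done_byte = self.byte_counter >= FR_CYCLES
        done_time = self.time_counter >= FR_CYCLES
        if done_byte and done_time:
            return "HAI"
        if done_byte or done_time:
            return "AI"
        return "FR"

    def on_counter_event(self, counter: str) -> str:
        """Called when the Byte Counter ("byte") or Time Counter ("time") ticks."""
        if counter == "byte":
            self.byte_counter += 1
        else:
            self.time_counter += 1
        current = self.phase()
        if current == "AI":
            self.tr = min(self.tr + R_AI, self.line_rate)
        elif current == "HAI":
            self.tr = min(self.tr + R_HAI, self.line_rate)
        # In every increase phase, CR moves halfway toward TR.
        self.cr = (self.cr + self.tr) / 2
        return current


rc = QcnRateControl(line_rate=10e9)
rc.on_feedback(fb=-32)                      # CR: 10 Gbps -> 7.5 Gbps, TR = 10 Gbps
print(rc.on_counter_event("byte"), rc.cr)   # Fast Recovery: CR moves halfway to TR
```

Halving the gap between CR and TR is what makes Fast Recovery converge quickly back toward the rate held before congestion, while TR only grows in the Active and Hyper-Active phases.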
QCN Performance with Large # of Flows
We preliminarily evaluate QCN performance in a situation where many flows share the same bottleneck link.
Simulation parameters
• Simulation tool: NS2
• Simulation time: 1 [s]
• Queue length: 100 [pkts]
• Qeq (target queue): 22 [pkts]
• Bandwidth: 10 [Gbps]
• # of flows: 10, 50, 70
• RTT: 100 [us]
[Figure: Senders 1…N connect to Switch 1, Switch 1 connects to Switch 2 over the 10 [Gbps] bottleneck link, and Switch 2 connects to Receivers 1…N; link delays shown: 10 [us], 30 [us], 10 [us]]
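As a quick check on the setup, the link delays in the figure add up to the stated RTT when counted in both directions, and together with the link rate they give the bandwidth-delay product (BDP) of the path. The 1500-byte frame size is an assumption; the slides do not state it.

```python
LINK_RATE_BPS = 10_000_000_000     # 10 [Gbps] bottleneck (from the slide)
ONE_WAY_DELAYS_US = [10, 30, 10]   # link delays shown in the topology figure
FRAME_BYTES = 1500                 # assumed frame size; not given on the slide


def rtt_us(one_way_delays_us: list[int]) -> int:
    """Round trip = the one-way path delay counted twice (no queueing)."""
    return 2 * sum(one_way_delays_us)


def bdp_bytes(rate_bps: int, rtt_microseconds: int) -> int:
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    return rate_bps * rtt_microseconds // 1_000_000 // 8


rtt = rtt_us(ONE_WAY_DELAYS_US)       # matches the stated 100 [us] RTT
bdp = bdp_bytes(LINK_RATE_BPS, rtt)   # 125,000 bytes
print(rtt, bdp, bdp / FRAME_BYTES)    # roughly 83 frames of 1500 bytes
```

Under this frame-size assumption, the 100-packet buffer is only slightly larger than one BDP, and the target queue Qeq = 22 [pkts] is about a quarter of it.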
Queue Length Characteristics
[Figure: queue length [pkts] vs. time [sec], 0.3 to 0.5 s, with the target queue marked; (a) 10 flows, (b) 50 flows, (c) 70 flows]
When the number of flows through a bottleneck link grows, queue length behavior becomes unstable.
Simulation Result in Case of 70 Flows
[Figure: time [sec], 0.3 to 0.4 s, vs. queue length [pkts] (congestion and non-congestion periods marked), CR [Gbps], and # of counter (Byte Counter, Time Counter), with feedback reception and Hyper-Active Increase marked]
Because feedback is transmitted probabilistically, some sources receive no feedback even in a congested time period. Their counters keep growing until Hyper-Active Increase starts, and the resulting increase of transmission rate causes queue fluctuation.
Aim of the paper
• QCN suffers queue length fluctuation in the case of a large number of flows.
• Another congestion detection mechanism is required. Loss-based congestion detection is not applicable, because the original QCN is loss-less.
• Our proposal: QCN + Delay-based Congestion Detection, i.e., QCN with Delay-based Congestion detection (QCN/DC)
Overview of QCN/DC
• Congestion is detected at each sender.
  • CRTT: round trip time of each transmitted frame (measured at the sender)
  • TRTT: threshold for RTT
• CRTT < TRTT: QCN/DC works as the original QCN; the rate is controlled by fb (feedback-based).
• CRTT ≧ TRTT: control is switched to the delay-based one, $SR \leftarrow \beta \cdot SR$
  • SR: delay-based transmission rate
  • β: decrease factor
① CR is continuously calculated from received feedback, even while the transmission rate is set by delay-based control.
② When CRTT becomes less than TRTT, the transmission rate is switched from SR to CR.
[Figure: transmission rate (CR and SR) vs. time; feedback-based phase while CRTT < TRTT, delay-based phase while CRTT ≧ TRTT, then feedback-based again once CRTT < TRTT; FB received markers]
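The switching logic of QCN/DC can be sketched per sender as below. Two details are assumptions not stated on the slide: SR is initialized from the rate in use when CRTT first reaches TRTT, and β is applied once per RTT sample. The class and method names are hypothetical.

```python
from typing import Optional


class QcnDcRateSelector:
    """Hypothetical sketch of QCN/DC rate selection at one sender."""

    def __init__(self, trtt_us: float, beta: float, line_rate: float) -> None:
        self.trtt_us = trtt_us
        self.beta = beta
        self.cr = line_rate                 # QCN Current Rate, always kept up to date
        self.sr: Optional[float] = None     # delay-based rate; None when feedback-based

    def on_qcn_rate(self, cr: float) -> None:
        """(1) CR keeps being calculated from feedback in both phases."""
        self.cr = cr

    def on_rtt_sample(self, crtt_us: float) -> None:
        if crtt_us >= self.trtt_us:
            if self.sr is None:
                # Assumption: SR starts from the rate in use at the switch.
                self.sr = self.cr
            # Assumption: one multiplicative decrease per RTT sample.
            self.sr *= self.beta
        else:
            # (2) CRTT < TRTT: the transmission rate switches back from SR to CR.
            self.sr = None

    def sending_rate(self) -> float:
        return self.cr if self.sr is None else self.sr


s = QcnDcRateSelector(trtt_us=130, beta=0.99, line_rate=10e9)
s.on_rtt_sample(140)      # CRTT >= TRTT: delay-based, 10 Gbps * 0.99
s.on_qcn_rate(5e9)        # feedback lowers CR, but SR is still what is sent
print(s.sending_rate())
s.on_rtt_sample(110)      # CRTT < TRTT: back to feedback-based control
print(s.sending_rate())   # now CR
```

Keeping CR updated while it is not in use is what allows the sender to resume QCN control immediately at step ②.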
Queue Length Characteristics
Parameters of delay-based control: TRTT: 130 [us], β: 0.99
[Figure: queue length [pkts] vs. time [sec], 0.3 to 0.5 s, with the target queue marked; top: QCN, bottom: QCN + Delay-based; (a) 10 flows, (b) 50 flows, (c) 70 flows]
Some large spikes in queue length are newly observed.
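One way to read TRTT = 130 [us] is as a queueing-delay budget on top of the 100 [us] base RTT. The sketch below converts the 30 [us] excess into bottleneck queue occupancy, assuming 1500-byte frames and ignoring serialization delay and reverse-path queueing. The slides do not say how TRTT was chosen, so this is only an estimate, and the function name is hypothetical.

```python
def queueing_budget_frames(trtt_us: int, base_rtt_us: int,
                           rate_bps: int, frame_bytes: int) -> float:
    """Frames that can queue at the bottleneck before CRTT reaches TRTT.

    Ignores serialization delay and any queueing on the reverse path.
    """
    excess_us = trtt_us - base_rtt_us
    queued_bytes = rate_bps * excess_us // 1_000_000 // 8
    return queued_bytes / frame_bytes


# TRTT = 130 us, base RTT = 100 us, 10 Gbps link, assumed 1500-byte frames
budget = queueing_budget_frames(130, 100, 10_000_000_000, 1500)
print(budget)
```

Under these assumptions the budget is 25 frames, the same order as Qeq = 22 [pkts], so delay-based control would engage roughly when the queue overshoots the QCN target.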
Cause of Spike
[Figure: (b) 50 flows, 0.3 to 0.4 s; queue length [pkts] with the target queue; # of counter (Byte Counter, Time Counter) with FB received and Hyper-Active Increase marked; transmission rate (CR and SR) vs. time with the CRTT < TRTT and CRTT ≧ TRTT phases and the switching timing marked]
• During the delay-based control phase, no feedback frame is received at the observed sender, and its Byte Counter and Time Counter continuously increase.
• After both counters reach 5, HAI starts and CR grows rapidly.
• When delay-based control is switched back to QCN control, the transmission rate is converted to this inflated CR.
Hyper-Active Increase (HAI) control
• Some spikes are observed because of Hyper-Active Increase (HAI) during the delay-based control phase.
• The network is still in congestion during the delay-based control phase, so a large increase of transmission rate is not reasonable.
• To prevent a rapid increase of CR caused by the HAI phase: when CRTT ≧ TRTT (in the delay-based phase), the Byte Counter reaches 5, and the Time Counter reaches 5, both counters are reset (Byte Counter = 0, Time Counter = 0).
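The HAI control rule amounts to resetting both counters under the stated condition. In this sketch the function name is hypothetical, and calling it after each counter increment (before the rate increase is computed) is an assumption about where the check sits in the sender.

```python
FR_CYCLES = 5  # counter value at which HAI would start (from the slides)


def apply_hai_control(byte_counter: int, time_counter: int,
                      crtt_us: float, trtt_us: float) -> tuple[int, int]:
    """Return (byte_counter, time_counter) after QCN/DC's HAI control.

    In the delay-based phase (CRTT >= TRTT), once both counters reach 5,
    both are reset to 0 so that CR cannot enter Hyper-Active Increase.
    """
    in_delay_phase = crtt_us >= trtt_us
    if in_delay_phase and byte_counter >= FR_CYCLES and time_counter >= FR_CYCLES:
        return 0, 0
    return byte_counter, time_counter


print(apply_hai_control(5, 5, crtt_us=140, trtt_us=130))  # delay-based: reset
print(apply_hai_control(5, 5, crtt_us=120, trtt_us=130))  # feedback-based: untouched
```

Outside the delay-based phase the counters are left alone, so ordinary QCN behavior, including HAI, is unchanged there.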
Queue Length Characteristics
Parameters of delay-based control: TRTT: 130 [us], β: 0.99
[Figure: queue length [pkts] vs. time [sec], 0.3 to 0.5 s, with the target queue marked; rows: QCN, QCN + Delay-based without HAI control, QCN + Delay-based with HAI control; (a) 10 flows, (b) 50 flows, (c) 70 flows]
Dynamic Situation
We evaluate queue length behavior in the case of a new flow arrival and the withdrawal of a flow.
Simulation parameters
• Simulation tool: NS2
• Simulation time: 1 [s]
• Queue length: 100 [pkts]
• Qeq: 22 [pkts]
• Bandwidth: 10 [Gbps]
• # of flows: 10, 50, 70
• RTT: 100 [us]
• TRTT: 130 [us]
• β: 0.99
[Figure: same Senders 1…N / Switch 1 / Switch 2 / Receivers 1…N topology; number of flows vs. time [s] over 0 to 2 s, changing between N-4 and N at 0.2 s intervals]
Dynamic Situation
[Figure: queue length [pkts] vs. time [sec], 0 to 2 s; top: QCN, bottom: QCN + Delay-based with HAI control; (a) 10 flows, (b) 50 flows, (c) 70 flows]
Queue behavior of QCN/DC shows undershoot.
Link utilization (10 flows)
• QCN: 0.995610456
• QCN/DC: 0.744283688
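A Cause of Undershoot
[Figure: 0.200 to 0.203 s after a new flow arrives; queue length [pkts], Fb value, and rate [Gbps] (CR and SR) with FB received and the delay-based phase marked; the rate drops abruptly when control is switched to feedback-based (bumpy switching), compared with smooth switching]
When a new flow arrives …
• In QCN, queue length temporarily grows, and with feedback reception the transmission rate is rapidly adjusted.
• In the delay-based phase of QCN/DC, feedback is ignored and the transmission rate is gently decreased with multiplicative decrease (MD, β = 0.99), while CR is regulated too much by feedback.
• When control is switched back to feedback-based, the rate falls abruptly from SR to CR (bumpy switching).
• Thus, when congestion control (CC) operates in the delay-based phase, undershoot might happen.
Smooth Switching (SS)
$SR = \min(\beta \cdot SR, CR)$
• β·SR > CR: CR has been regulated strongly by feedback, so SR follows the feedback-based CR.
• β·SR < CR: CR is not adequately regulated by feedback-based control (the original QCN cannot work well), so SR follows the delay-based decrease.
With SS, the rate never exceeds CR during the delay-based phase, so it does not fall abruptly to CR when control is switched back (smooth switching).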
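The difference between bumpy and smooth switching can be illustrated with a hypothetical CR trace in which feedback keeps lowering CR during the delay-based phase. The trace values are invented for illustration, and applying one SR update per CR sample is an assumption; the function name is also hypothetical.

```python
def delay_phase_rates(sr0: float, cr_trace: list[float], beta: float,
                      smooth: bool) -> list[float]:
    """SR over successive delay-based updates, with or without Smooth Switching.

    smooth=False: SR <- beta * SR          (CR is ignored until the switch back)
    smooth=True:  SR <- min(beta * SR, CR)
    """
    sr = sr0
    rates = []
    for cr in cr_trace:
        sr = min(beta * sr, cr) if smooth else beta * sr
        rates.append(sr)
    return rates


# Hypothetical trace: feedback keeps cutting CR while the sender is delay-based.
cr_trace = [9e9, 6e9, 4e9]
bumpy_rates = delay_phase_rates(10e9, cr_trace, 0.99, smooth=False)
smooth_rates = delay_phase_rates(10e9, cr_trace, 0.99, smooth=True)
# At the switch back to feedback-based control the rate becomes CR = 4 Gbps:
print(bumpy_rates[-1] - cr_trace[-1])   # large drop without SS (bumpy switching)
print(smooth_rates[-1] - cr_trace[-1])  # no drop with SS
```

Because the min() keeps SR at or below CR, the switch back to CR can only raise the rate or leave it unchanged, which removes the sudden drop that leads to undershoot.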
Dynamic Situation
[Figure: queue length [pkts] vs. time [sec], 0.3 to 0.5 s; rows: QCN, QCN + Delay-based without SS, QCN + Delay-based with SS; (a) 10 flows, (b) 50 flows, (c) 70 flows]
Link utilization (10 flows)
• QCN + Delay-based without SS: 0.744283688
• QCN + Delay-based with SS: 0.996122352
Conclusions
• In QCN, we show that queue length fluctuates with a large number of flows in a congested link.
• We reveal that the reason for the queue fluctuation is HAI in some flows that receive no feedback even in a congested time period.
• We propose QCN/DC, in which delay-based congestion detection is additionally used.
• QCN/DC realizes stable and small queue occupancy with high utilization of the bottleneck link.
Future works
• Detailed investigation of adaptive adjustment of TRTT is our future work.