Deep-Q: Traffic-driven QoS Inference using Deep Generative...

Deep-Q: Traffic-driven QoS Inference using Deep Generative Network

Shihan Xiao, Dongdong He, Zhibo Gong

Network Technology Lab,

Huawei Technologies Co., Ltd., Beijing, China

1

• What is a QoS Model?

Background

Traffic

Network

Delay, jitter, packet loss…QoS Model

• Why is it important?

Background

Online QoS Monitoring

Path A

Path B

Path C

Delay Monitoring

SLA guarantee & anomaly detection

Monitor Require high cost on real-time active QoS measurements!

A QoS model helps reduce most of the cost!


Path A

Path B

Path C

Delay Monitoring


Offline Traffic Analysis

Traffic trace Network

Path A

Path B

Path C

Delay Inference

Inference

+Monitor


Background

A QoS model can do QoS inference without QoS measurements


Background


Path A

Path B

Path C

Delay Monitoring


Offline Traffic Analysis

Traffic trace Network

Path A

Path B

Path C

Delay Inference

Inference

+

“What if” Analysis

Path A

Path C

Predict

Delay Prediction

How QoS will changeif a flow switches from Path A to C?

Monitor

Traditional Methods

6

Traffic

Network

Delay, jitter, packet lossNetwork Simulator

NS2, NS3, OMNeT++…

Slow and Inaccurate

• 1. Network simulator

• 2. Mathematical modeling

Traditional Methods

7

Traffic

Network

Delay, jitter, packet lossQueuing Theory

Simplified assumptions

Large human-analysis cost & Inaccurate

• 2. Mathematical modeling

Traditional Methods

8

Traffic

Network

Delay, jitter, packet lossQueuing Theory

Simplified assumptions

A fast, accurate & low-cost QoS model is helpful!

Large human-analysis cost & Inaccurate

Key Observations

• Observation 1: Traffic load per link is much easier to collect & well-supported by existing tools (e.g., SNMP) than QoS values per path

9

Key Observations

• Observation 1: Traffic load per link is much easier to collect & well-supported by existing tools (e.g., SNMP) than QoS values per path

• Observation 2: Traffic load is the key factor of QoS changes

10

Traffic: collected link load matrixes Delay, jitter, packet loss

QoS Model

Node index

No

de

ind

ex

Key Observations

• Observation 3: Different traffic loads lead to different QoS distributions

11

Testbed measurement40 traffic loads (per 20 min)

Measured delay samples

Key Observations

• Target Problem: Given a set of traffic load matrixes during time T, what are the distributions of QoS values (delay, jitter, loss...) of each network path during T?

12

Different traffic loads lead to different QoS distributions

Solution of Deep-Q

• Why deep learning helps?

13

Data-driven VS. Human-engineered model

Low human-analysis cost Fast inference

Network SimulatorPackets QoS values

Running time of Hours!

QoS values

Running time of Milliseconds!

…

… …

Traffic load matrix

Delay model

Loss model

…

AutoTraining

Delay/Jitter/Loss…

Key Technology: Deep Generative Network

• State-of-the-art DGNs in deep learning– GAN(Generative Adversarial Network) & VAE(Variational Autoencoder)

14

Input: “this small bird has a pink breast and crown, and black primaries and secondaries”

infer

Source: ICML2016, “Generative Adversarial Text to Image Synthesis”

infer

Input: number 2

(Conditional) GAN Example (Conditional) VAE Example

Source: NIPS2014, “Semi-supervised Learning with Deep Generative Models”

Input: traffic load matrixes

infer

Delay (us)

Pro

bab

ility

Deep-Q

Image domain Network domain

So what is the difference?

Key Technology: Deep Generative Network

• Differences

15

Discrete Label Image samples Traffic statistics QoS values

Input

Discrete & Low/high Dimensional

Discrete & High Dimensional

Output Input Output

Continuous & High Dimensional

Continuous & Low Dimensional

Image domain(GAN & VAE)

Network domain(Deep-Q)

Target: the generated image samples satisfy “real” image distribution and match the label class

Target: the generated QoS values satisfy real QoS distribution and match the traffic statistics

Deep-Q requires a high accuracy on the output distribution, but GAN & VAE do not apply!

Application: text label to images Application: traffic load matrixes to QoS values

Deep-Q Solution

• 1. Handle the continuous high-dimensional input– Extract traffic features from a sequence of high-dimensional traffic load matrixes

16

LSTMCell

LSTMCell … …

LSTMCell

𝑀𝑡2 𝑀𝑡

3 𝑀𝑡𝑛

Traffic features

Micro-load matrixes during time t

Hidden State Hidden StateLSTMCell

Hidden State

𝑀𝑡1 … …

LSTM (Long Short Term Memory) module: a state-of-the-art deep learning method to learn features from a data sequence

…

… …

…

… …

…

… … …

… …

• 2. Handle the continuous low-dimensional output– Challenge: high accuracy is required for QoS distribution inference

– Solution: a new metric “Cinfer loss” to accurately quantify the QoS distribution error

Deep-Q Solution

17

CDF curve of X CDF curve of Y

X: Inferred QoS distributionY: Target QoS distribution

Height Difference

Cu

mu

lati

ve P

rob

abili

ty

Delay (ms)

CDF (Cumulative Distribution Function)

Deep-Q Solution

• Deep-Q: A stable & accurate inference engine

– Built upon VAE (Stable) and augmented with Cinfer Loss (Accurate)

18

L2 Loss of VAE KL Loss of GAN Cinfer Loss of Deep-Q

VAE: Stable but Inaccurate GAN: More accurate but unstable Deep-Q: Stable & Accurate

A simple example of learning ability:Target distributionInferred distribution

Deep-Q Solution

• Cinfer-Loss computation for training– The exact computation is NP-hard

– The approximation must be fully differentiable to compute gradients for training

• Step 1: Discretization

19

Cu

mu

lati

ve P

rob

abili

ty

Delay (ms)

Discretization

Cu

mu

lati

ve P

rob

abili

tyDelay (ms)

From integral to a discrete sum of bins

Deep-Q Solution



• Step 2: Bin Height Computation– required to be differentiable

• An intuitive method:– Calculate the located bin index of each sample & Count the sample number per bin

20

Cu

mu

lati

ve P

rob

abili

ty

Delay (ms)

Ceil function is non-differentiable & difficult to approximate!

Deep-Q Solution



• Step 2: Bin Height Computation– required to be differentiable

• A differentiable method with some math tricks (borrowed from deep learning)

21

Cu

mu

lati

ve P

rob

abili

ty

Delay (ms)

Step 2): Approximate 𝑆𝑖𝑔𝑛 function with 𝑡𝑎𝑛ℎ

Approximation error< 10−5 in experiments

Step 1): Use 𝑆𝑖𝑔𝑛 function

Deep-Q Solution

• Put it all together

22

VAEEncoder

VAEDecoder

Z

Sampling from N(0,1)

X

Traffic load matrixalong time

X’

Automatic feature engineering & QoS modeling：end-to-end training using Cinfer Loss

Network QoS(delay,jitter,

loss…)

Collect traffic data

Underlay network

Space-time Traffic Features

Network QoS(delay,jitter,

loss…)

Inference phase of Deep-Q

Training phase of Deep-Q

…

… …

…

… …

…

……

• Testbed Topology

• Traffic traces: WIDE backbone network [1]

– Training set: 24 hours of traffic traces on April 12, 2017

– Test set: 24 hours of traffic traces on April 13, 2017

• Neural network: TensorFlow implementation with 2 hidden layers

Experiment Setup

23

CPE CPE

r0

r1

r2

r3

AS-1

Internet

AS-2

r4

NEU200 Probe NEU200 Probe

NEU200 Probe NEU200 Probe

Experiment topology of data center network Experiment topology of overlay IP network

[1] Traffic traces are public available at http://mawi.wide.ad.jp/mawi/

Experiment Results

• Delay Inference in Datacenter Topology

24

Traffic

Real delay

Distribution error of inference

Mean error of inference

90-percentile error of inference

99-percentile error of inference

Queuing theory Deep learning

1. Deep learning methods achieve on average 3x higher accuracy over Queuing theory2. Deep-Q achieves the lowest errors and most stable performance over all cases

Experiment Results

• Packet Loss Inference in Overlay IP Topology

25

1. Deep learning methods achieve on average 3x higher accuracy over Queuing theory2. Deep-Q achieves the lowest errors and most stable performance over all cases

Queuing theory Deep learning

Deep-Q inference speed < 10ms for network scale < 200 nodes

Conclusion

• Deep-Q: an accurate, fast and low-cost QoS inference engine– Automation: LSTM module for auto traffic feature extraction

– High stability: an extended VAE inference structure with the encoder and decoder

– High accuracy: a new metric “Cinfer loss” to accurately quantify the QoS distribution error

• Future vision: – Learn device-level QoS models (routers/switches) → scalable network-level QoS models

– Learn high-level application QoE from traffic traces

26

Deep-Q: Traffic-driven QoS Inference using Deep Generative...

Documents

Transcript of Deep-Q: Traffic-driven QoS Inference using Deep Generative...