Posted on 27-Jun-2020

CoCo: Compact and Optimized Consolidation of Modularized Service Function Chains in NFV

Zili Meng, Jun Bi, Haiping Wang, Chen Sun, Hongxin Hu

NFV & Modularization


[Figure: VPN, Firewall, Monitor, and Load Balancer move from dedicated hardware to VMs on commodity hardware devices (NFV) and form a service chain; decomposing the NFs into elements (Read, Classifier, Alert, Output, Drop) yields a Modularized SFC (MSFC).]

Benefits of NFV: low cost, flexibility, scalability, …


However…

• Two drawbacks:
  – High latency
  – Poor resource efficiency


• OpenBox [Sigcomm’16]: element reuse

• NFVnice [Sigcomm’17]: NF consolidation (containers in one VM/core)

Which elements to consolidate?

Key Observations

[Figure: two alternative placements of elements E1–E7 onto VM1, VM2, and VM3]

Placement affects MSFC performance because it determines how many transfers between elements cross VM boundaries (inter-VM transfers).
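The observation can be made concrete with a minimal sketch. It assumes a single linear chain of hypothetical elements E1–E7 (the figure's exact topology is not recoverable) and simply counts adjacent element pairs that land on different VMs; the function name `inter_vm_transfers` is illustrative.

```python
def inter_vm_transfers(chain, placement):
    """Count hops between consecutive elements that cross a VM boundary.

    chain: list of element names in processing order
    placement: dict mapping element name -> VM name
    """
    return sum(
        1
        for a, b in zip(chain, chain[1:])
        if placement[a] != placement[b]
    )


# Hypothetical chain and two alternative placements onto three VMs.
chain = ["E1", "E2", "E3", "E4", "E5", "E6", "E7"]
scattered = {"E1": "VM1", "E2": "VM2", "E3": "VM1", "E4": "VM3",
             "E5": "VM2", "E6": "VM3", "E7": "VM1"}
compact = {"E1": "VM1", "E2": "VM1", "E3": "VM2", "E4": "VM2",
           "E5": "VM2", "E6": "VM3", "E7": "VM3"}

print(inter_vm_transfers(chain, scattered))  # 6
print(inter_vm_transfers(chain, compact))    # 2
```

Both placements use the same three VMs, yet the compact one crosses VM boundaries three times less often.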

CoCo:

• identifies inter-VM transfers between elements

• optimizes the placement of elements onto VMs

• optimizes the dynamic scaling mechanism

Challenges

• Optimized Placement
  – How to model the inter-VM transfer?
  – How to find optimal solutions efficiently?

• Optimized Dynamic Scaling
  – How to reduce inter-VM transfers during scaling out?


CoCo components: Optimized Placer (placement) and Individual Scaler (dynamic scaling)

Optimized Placer

• Packet transfer cost:
  – Four-step transfer delay: $t$
  – Service chain throughput: $\Theta$
  – Delayed Bytes: $\Theta \cdot t$

• Resource analysis:
  – Observation: the CPU utilization of an element is linear in its processing speed


[Figure: inter-VM packet transfer from an element in VM #1 through its vNIC (②), the vSwitch, and the vNIC of VM #n (③) to an element in VM #n; each VM has its own memory, elements, and scheduler]
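A rough sketch of the two cost models above, under illustrative assumptions: Delayed Bytes are summed as $\Theta \cdot t$ over the edges whose endpoints sit on different VMs, and each VM's CPU load is estimated with a zero-intercept linear model per element. All element names, throughputs, delays, and slopes below are hypothetical, not values from the paper.

```python
from collections import defaultdict


def delayed_bytes(edges, placement, transfer_delay_s):
    """Total inter-VM Delayed Bytes: throughput * delay summed over cut edges.

    edges: dict (src, dst) -> throughput in bytes/s
    placement: dict element -> VM
    transfer_delay_s: assumed four-step inter-VM transfer delay t, in seconds
    """
    return sum(
        theta * transfer_delay_s
        for (src, dst), theta in edges.items()
        if placement[src] != placement[dst]  # only inter-VM edges pay the delay
    )


def vm_cpu_load(rates, slopes, placement):
    """Per-VM CPU utilization (%) under a linear model: util = slope * rate."""
    load = defaultdict(float)
    for elem, rate in rates.items():
        load[placement[elem]] += slopes[elem] * rate
    return dict(load)


# Hypothetical four-element chain split across two VMs.
placement = {"Classifier": "VM1", "Analyzer": "VM1", "Logger": "VM2", "Alert": "VM2"}
edges = {("Classifier", "Analyzer"): 1_000_000,  # bytes/s
         ("Analyzer", "Logger"): 400_000,
         ("Logger", "Alert"): 100_000}
rates = {"Classifier": 500, "Analyzer": 500, "Logger": 200, "Alert": 50}  # kpps
slopes = {"Classifier": 0.05, "Analyzer": 0.12, "Logger": 0.1, "Alert": 0.2}  # % per kpps

print(delayed_bytes(edges, placement, transfer_delay_s=50e-6))  # only Analyzer->Logger is cut
print(vm_cpu_load(rates, slopes, placement))
```

Because both models are sums of per-edge and per-element terms, they can be evaluated cheaply for any candidate placement.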


[Figure: example MSFC placement, with the elements Header Classifier, Stateful Payload Analyzer, Logger, and Alert placed on VM1 and VM2]

Optimized Placer – 0‐1 Quadratic Programming

• Intuition: consolidate adjacent elements together.
  – If two adjacent elements are placed on the same VM, no inter-VM packet transfer occurs between them.

[Figure: inter-VM vs. intra-VM packet transfer]

Optimized Placer – 0‐1 Quadratic Programming

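• $x_{i,k} \in \{0,1\}$: indicates whether element $i$ is placed onto instance $k$ (each element is placed on exactly one instance)

• Challenge: how to express that two elements are placed together?

Elements $i$ and $j$ on different instances:

$$
\begin{array}{c|cccccc}
k & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
x_{i,k} & 0 & 1 & 0 & 0 & 0 & 0 \\
x_{j,k} & 0 & 0 & 1 & 0 & 0 & 0 \\
x_{i,k} \cdot x_{j,k} & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
$$

Elements $i$ and $j$ on the same instance:

$$
\begin{array}{c|cccccc}
k & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
x_{i,k} & 0 & 0 & 1 & 0 & 0 & 0 \\
x_{j,k} & 0 & 0 & 1 & 0 & 0 & 0 \\
x_{i,k} \cdot x_{j,k} & 0 & 0 & 1 & 0 & 0 & 0
\end{array}
$$

Indicator (quadratic): $\sum_{k} x_{i,k} \cdot x_{j,k}$, which equals 1 if elements $i$ and $j$ share an instance and 0 otherwise.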
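The quadratic indicator can be checked numerically. This sketch encodes each element's placement as a one-hot 0-1 vector over six instances, mirroring the two examples on this slide, and computes $\sum_k x_{i,k} \cdot x_{j,k}$; the helper names `one_hot` and `same_instance` are illustrative.

```python
def one_hot(instance, n_instances):
    """0-1 placement vector x_{i,.}: 1 at the chosen instance (1-indexed)."""
    return [1 if k == instance else 0 for k in range(1, n_instances + 1)]


def same_instance(x_i, x_j):
    """Quadratic indicator sum_k x_{i,k} * x_{j,k}.

    With one-hot vectors this is 1 iff both elements share an instance, else 0.
    """
    return sum(a * b for a, b in zip(x_i, x_j))


# Example 1: element i on instance 2, element j on instance 3.
print(same_instance(one_hot(2, 6), one_hot(3, 6)))  # 0
# Example 2: both elements on instance 3.
print(same_instance(one_hot(3, 6), one_hot(3, 6)))  # 1
```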

Optimized Placer – 0‐1 Quadratic Programming

• Objective: minimize the total inter-VM Delayed Bytes.

• Constraint: the placement must not overload any instance.

• For other mathematical details, please refer to our paper.
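The evaluation solves this 0-1 QP with MATLAB. As a stand-in for toy instances only, the sketch below enumerates every assignment, scores it with an assumed objective $\sum_{(i,j)} \Theta_{ij}\, t\,(1 - \sum_k x_{i,k} x_{j,k})$, and enforces an assumed per-instance capacity constraint $\sum_i c_i\, x_{i,k} \le C_k$. Both forms, the function name `optimal_placement`, and all numbers are illustrative assumptions rather than the paper's exact formulation.

```python
import itertools


def optimal_placement(elements, edges, cpu, capacity, delay):
    """Exhaustively search placements minimizing inter-VM Delayed Bytes.

    elements: list of element names
    edges: dict (src, dst) -> throughput Theta_ij
    cpu: dict element -> CPU demand c_i (same unit as capacity)
    capacity: dict VM -> CPU capacity C_k
    delay: assumed inter-VM transfer delay t (any consistent unit)
    Returns (best_cost, best_placement), or (None, None) if nothing fits.
    """
    vms = list(capacity)
    best_cost, best = None, None
    for choice in itertools.product(vms, repeat=len(elements)):
        placement = dict(zip(elements, choice))
        # Constraint: no instance may be overloaded.
        load = {vm: 0.0 for vm in vms}
        for elem, vm in placement.items():
            load[vm] += cpu[elem]
        if any(load[vm] > capacity[vm] for vm in vms):
            continue
        # Objective: Delayed Bytes over edges whose endpoints are on different VMs.
        cost = sum(theta * delay for (a, b), theta in edges.items()
                   if placement[a] != placement[b])
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, placement
    return best_cost, best


elements = ["A", "B", "C", "D"]
edges = {("A", "B"): 10.0, ("B", "C"): 1.0, ("C", "D"): 10.0}
cpu = {"A": 40, "B": 40, "C": 40, "D": 40}
capacity = {"VM1": 100, "VM2": 100}  # each VM fits at most two elements

best_cost, best = optimal_placement(elements, edges, cpu, capacity, delay=1.0)
print(best_cost, best)  # 1.0 {'A': 'VM1', 'B': 'VM1', 'C': 'VM2', 'D': 'VM2'}
```

Exhaustive search grows as $|VMs|^{|elements|}$, so it only illustrates the objective and constraint; realistic sizes need a proper QP solver.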


Optimized Individual Scaling

[Figure: left, MSFC before scaling (Header Classifier, Stateful Payload Analyzer, Logger, and Alert on VM1 and VM2); right, scaling with the traditional method, which adds a Stateful Payload Analyzer instance on VM3 and causes additional packet transfer plus state synchronization (~100 ms according to OpenNF [Sigcomm’14])]

Optimized Individual Scaling

• Key novelty: migrate the other elements consolidated with the overloaded element to release resources for it.
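A minimal greedy sketch of this idea. It assumes each element has a known CPU demand and each VM a fixed capacity, and it moves co-located elements (largest first) to the other VM with the most spare capacity until the overloaded element fits; moves that would overload a target VM are skipped. The function `plan_migrations` and the greedy order are hypothetical, and unlike CoCo this sketch ignores the inter-VM transfers a move may add.

```python
def plan_migrations(hot_elem, new_demand, cpu, placement, capacity):
    """Free CPU for an overloaded element by moving its co-located neighbors.

    hot_elem: the overloaded element, which stays on its current VM
    new_demand: its CPU demand after the traffic increase
    cpu: dict element -> current CPU demand
    placement: dict element -> VM (not modified)
    capacity: dict VM -> CPU capacity
    Returns a list of (element, target_vm) moves, or None if no plan fits.
    """
    home = placement[hot_elem]
    load = {vm: 0.0 for vm in capacity}
    for elem, vm in placement.items():
        load[vm] += cpu[elem]
    # Account for the overloaded element's increased demand on its home VM.
    load[home] += new_demand - cpu[hot_elem]

    moves = []
    neighbors = sorted((e for e, vm in placement.items()
                        if vm == home and e != hot_elem),
                       key=lambda e: cpu[e], reverse=True)
    for elem in neighbors:
        if load[home] <= capacity[home]:
            break  # overload already alleviated
        # Only targets that stay within capacity, so no new hotspot appears.
        targets = [vm for vm in capacity
                   if vm != home and load[vm] + cpu[elem] <= capacity[vm]]
        if not targets:
            continue
        target = max(targets, key=lambda vm: capacity[vm] - load[vm])
        load[home] -= cpu[elem]
        load[target] += cpu[elem]
        moves.append((elem, target))
    return moves if load[home] <= capacity[home] else None


placement = {"Classifier": "VM1", "Analyzer": "VM1", "Logger": "VM2", "Alert": "VM2"}
cpu = {"Classifier": 30, "Analyzer": 50, "Logger": 20, "Alert": 10}  # % of one core
capacity = {"VM1": 100, "VM2": 100}

# Analyzer's demand grows from 50 to 80: VM1 would need 110 > 100.
print(plan_migrations("Analyzer", 80, cpu, placement, capacity))  # [('Classifier', 'VM2')]
```

No new VM is launched and the Analyzer's state never moves; only the lighter neighbor is relocated.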


Optimized Individual Scaling

[Figure: three cases: MSFC before scaling; scaling with the traditional method (new Stateful Payload Analyzer instance on VM3, state synchronization, additional packet transfer); and CoCo (co-located elements migrated so the Stateful Payload Analyzer can use the released resources)]

Optimized Individual Scaling

• Consistency guarantee mechanism:
  – The overload should be alleviated.
  – Migration should not lead to new hotspots.

• Advantages of the CoCo Individual Scaler:
  – No new hardware resources consumed
  – Additional packet transfer avoided
  – State synchronization avoided

• Application scenario of the CoCo Individual Scaler:
  – Load imbalance between VMs (OFM [IWQoS’18])
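The two consistency conditions can be phrased as a check on the post-migration placement. This is a hedged sketch assuming a simple additive CPU-demand model per VM; `check_consistency` is a hypothetical helper, not part of CoCo's published interface.

```python
def check_consistency(placement_after, demand, capacity, hot_vm):
    """Check both consistency conditions for a migration plan.

    placement_after: dict element -> VM after migration
    demand: dict element -> CPU demand after the traffic change
    capacity: dict VM -> CPU capacity
    hot_vm: the VM that was overloaded before migration
    Returns (overload_alleviated, no_new_hotspot).
    """
    load = {vm: 0.0 for vm in capacity}
    for elem, vm in placement_after.items():
        load[vm] += demand[elem]
    overload_alleviated = load[hot_vm] <= capacity[hot_vm]
    no_new_hotspot = all(load[vm] <= capacity[vm]
                         for vm in capacity if vm != hot_vm)
    return overload_alleviated, no_new_hotspot


after = {"Classifier": "VM2", "Analyzer": "VM1", "Logger": "VM2", "Alert": "VM2"}
demand = {"Classifier": 30, "Analyzer": 80, "Logger": 20, "Alert": 10}

print(check_consistency(after, demand, {"VM1": 100, "VM2": 100}, hot_vm="VM1"))  # (True, True)
print(check_consistency(after, demand, {"VM1": 100, "VM2": 50}, hot_vm="VM1"))   # (True, False)
```

The second call shows a plan that fixes the original hotspot but creates a new one on VM2, which the mechanism must reject.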


Implementation and Evaluation

• Evaluation setup:
  – Docker for consolidation, DPDK version 16.11
  – OpenNF [Sigcomm’14] and TFM [ICNP’16] for migration mechanisms
  – MATLAB for solving the 0-1 Quadratic Programming problem
  – Intel(R) Xeon(R) E5-2690 v2 CPUs, 256 GB RAM, 2×10G NICs

• Evaluation goals:
  – Validate the linearity assumption
  – Demonstrate the effectiveness of CoCo placement
  – Demonstrate the performance of CoCo scaling


1. Throughput‐CPU Utilization

• Measured on a single core

• Sender
  – $R^2 = 0.9997$

• Classifier (100 rules on IP header)
  – $R^2 = 0.9999997$
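The linearity check can be reproduced with an ordinary least-squares fit and the coefficient of determination $R^2$. The sample points below are synthetic placeholders, not the measurements behind the numbers on this slide, and `linear_fit_r2` is an illustrative helper.

```python
def linear_fit_r2(xs, ys):
    """Fit y = a*x + b by least squares and return (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic data: throughput (Mpps) vs. CPU utilization (%) on one core.
throughput = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
cpu_util = [15.9, 31.2, 47.1, 62.0, 78.3, 93.4]
a, b, r2 = linear_fit_r2(throughput, cpu_util)
print(f"slope={a:.2f} %/Mpps, intercept={b:.2f} %, R^2={r2:.5f}")
```

An $R^2$ close to 1 justifies treating an element's CPU demand as proportional to its processing speed when checking VM capacity.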


2. Simulations on Placement

• Baselines:
  – Random: select available VMs randomly
  – Greedy: place elements in sequence, chain by chain

• Traffic: flows randomly picked from CAIDA traffic

• Two topologies:

[Figure: the two evaluated topologies, built from several chains (Chain 1, Chain 2, Chain 3) over elements E1–E9]
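The two baselines could be sketched as below. The slide only names them, so the capacity model, the first-fit rule for Greedy, the handling of elements shared between chains, and all function names are assumptions; a `None` result stands for a placement failure.

```python
import random


def fits(elem, vm, load, cpu, capacity):
    """True if adding elem to vm keeps the VM within its CPU capacity."""
    return load[vm] + cpu[elem] <= capacity[vm]


def random_placement(chains, cpu, capacity, rng):
    """Random baseline: each element goes to a random VM that can still host it."""
    placement, load = {}, {vm: 0.0 for vm in capacity}
    for chain in chains:
        for elem in chain:
            if elem in placement:  # element shared by several chains: place once
                continue
            options = [vm for vm in capacity if fits(elem, vm, load, cpu, capacity)]
            if not options:
                return None  # placement failure
            vm = rng.choice(options)
            placement[elem] = vm
            load[vm] += cpu[elem]
    return placement


def greedy_placement(chains, cpu, capacity):
    """Greedy baseline: walk chain by chain, first-fit each element in VM order."""
    placement, load = {}, {vm: 0.0 for vm in capacity}
    for chain in chains:
        for elem in chain:
            if elem in placement:
                continue
            vm = next((v for v in capacity if fits(elem, v, load, cpu, capacity)), None)
            if vm is None:
                return None  # placement failure
            placement[elem] = vm
            load[vm] += cpu[elem]
    return placement


chains = [["E1", "E2", "E3"], ["E1", "E4", "E5"]]
cpu = {"E1": 40, "E2": 30, "E3": 30, "E4": 50, "E5": 20}
capacity = {"VM1": 100, "VM2": 100}

print(greedy_placement(chains, cpu, capacity))
# {'E1': 'VM1', 'E2': 'VM1', 'E3': 'VM1', 'E4': 'VM2', 'E5': 'VM2'}
print(random_placement(chains, cpu, capacity, random.Random(0)))
```

Neither baseline looks at edge throughput, which is why they can cut heavy edges that CoCo's objective would keep inside one VM.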

2. Simulations on Placement

• Performance: [Figure: sum of Delayed Bytes (MB) for CoCo, Greedy, and Random on Topo1 and Topo2; annotated reduction: 59%]

• Resource utilization: [Figure: placement failure rate for CoCo, Greedy, and Random on Topo1 and Topo2; annotation: 18%]


3. Evaluation on Dynamic Scaling

• Based on OpenNF [Sigcomm’14]

• Metric: per-packet latency

[Figure: element placement (Header Classifier, Stateful Payload Analyzer, Logger, Alert) before and after CoCo scaling across VM1, VM2, and VM3]

[Figure: per-packet latency (ms) vs. packet number (thousands) for CoCo and the traditional method; as traffic increases, CoCo reduces latency by 45%]

Conclusion

• CoCo: Compact and Optimized Consolidation of MSFCs in NFV
  – Optimized Placer
  – Individual Scaler

• Significant performance improvement:
  – Up to 59% Delayed Bytes reduction in initial placement
  – 45% latency reduction during dynamic scaling

• Future work:
  – Multi-core placement
  – Intra-core cache analysis


Thank you!

netarchlab.tsinghua.edu.cn
mengzl15@mails.tsinghua.edu.cn