Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2....

36
Scalable Rule Management for Data Centers Masoud Moshref, Minlan Yu, Abhishek Sharma, Ramesh Govindan 4/3/2013

Transcript of Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2....

Page 1: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Scalable Rule Management for

Data Centers

Masoud Moshref, Minlan Yu,

Abhishek Sharma, Ramesh Govindan

4/3/2013

Page 2: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Motivation Motivation Design EvaluationIntroduction

Introduction: Definitions

2

Datacenters use rules to implement management policiesDatacenters use rules to implement management policies

• Access control• Rate limiting• Traffic measurement • Traffic engineering

Page 3: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Motivation Motivation Design EvaluationIntroduction

Introduction: Definitions

3

Datacenters use rules to implement management policiesDatacenters use rules to implement management policiesDatacenters use rules to implement management policies

An action on a set of ranges on flow fieldsAn action on a set of ranges on flow fields

Examples:

• Deny

• Accept

• Enqueue

Flow fields examples:

• Src IP / Dst IP

• Protocol

• Src Port / Dst Port

Page 4: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Motivation Motivation Design EvaluationIntroduction

Introduction: Definitions

4

Datacenters use rules to implement management policies

R2

R1

Src IP R1: Accept

� SrcIP: 12.0.0.0/7

� DstIP: 10.0.0.0/8

An action on a set of ranges on flow fields

Dst

IP

R2: Deny

� SrcIP: 12.0.0.0/8

� DstIP: 8.0.0.0/6

Page 5: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Motivation Motivation Design EvaluationIntroduction

Current practice

5

Rules are saved on predefined fixed machines

On hypervisors On switches

Page 6: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Motivation Motivation Design EvaluationIntroduction

Machines have limited resources

6

Top-of-Rack switch

Network Interface Card

Software switches on servers

Page 7: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Motivation Motivation Design EvaluationIntroduction

Future datacenters will have many fine-grained rules

7

Regulating VM pair communication

• Access control (CloudPolice)

• Bandwidth allocation (Seawall)

Per flow decision

• Flow measurement for

traffic engineering (MicroTE, Hedera)

VLAN per server

• Traffic management (NetLord, Spain)

1B – 20B rules

10M – 100M rules

1M rules

Page 8: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction ArchitectureMotivation Design Evaluation

Rule location trade-off (resource vs. bandwidth usage)

8

Storing rules at hypervisor incurs CPU overhead

R0

Page 9: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction ArchitectureMotivation Design Evaluation

Rule location trade-off (resource vs. bandwidth usage)

9

Storing rules at hypervisor incurs CPU overheadMove the rule to ToR switch and forward traffic

R0

Page 10: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction ArchitectureMotivation Design Evaluation

Rule location trade-off: Offload to servers

10

R1

Page 11: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction ArchitectureMotivation Design Evaluation

Challenges: Concrete example

11

Src IP

Dst

IP

Agg1 Agg2

ToR1 ToR2

S1 S2 S3 S4 S5 S6

ToR3

R0

R1 R5

R6

R2

R3

R4

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

VM0

VM1

VM2

VM3

VM4

VM5

VM6

VM7

Page 12: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction ArchitectureMotivation Design Evaluation

Challenges: Overlapping rules

12

R0

R1 R5

R6

R2

R3

R4

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

Agg1 Agg2

ToR1 ToR2

S1 S2 S3 S4 S5 S6

ToR3

VM2

VM6

R0

R1

R2

R3

R4

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

Source Placement: Saving rules on the source machine means

minimum overhead

Src IP

Dst

IP

R0

R1

R2

R3

R4

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

Page 13: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction ArchitectureMotivation Design Evaluation

Agg1 Agg2

ToR1 ToR2

S1 S2 S3 S4 S5 S6

ToR3

Challenges: Overlapping rules

13

R0

R1

R2

R3

R4

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

VM2

VM6

R4

R0

R1

R2

R3

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

If Source Placement is not feasible

Src IP

Dst

IP

Page 14: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction ArchitectureMotivation Design Evaluation

Heterogeneous devices

Challenges

14

Respect resource constraints

Minimize traffic overhead

Traffic changesRule changesVM Migration

Preserve the semantics of overlapping rules

Handle Dynamics

Page 15: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

Contribution: vCRIB, a Virtual Cloud Rule Information Base

15

vCRIB

Proactive rule placement abstraction layer

Optimize traffic given resource constraints & changes

Rules

R1 R2

R3R4 R3

Page 16: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

vCRIB design

16

Source Partitioning with Replication

Topology &Routing

Rules

Partitions

Overlapping

Rules

Minimum Traffic Feasible Placement

Page 17: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

R0

R 1

R5

R7

R8

R2

R3 R6R

4

Partitioning with cutting

17

P1

Smaller partitions have more flexibility

Cutting causes rule inflation

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

R0

R 1

R8

R3

R4 R

0

R5

R 1

R7

R2

R3R6

Page 18: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

Partitioning with replication

18

R0

R3

R1

A7

R6R

0

R 1

R8

R3

R4 R0

R1

R5R

2

R3

R0

R1R

5R

2

R3

R7

R6

Introduce the concept of

similarity to mitigate inflation

��� ��, �� � �� ∩ ��� R0, R1, R3 � 3

P1 (5 rules) P3 (5 rules)P2(5 rules)

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

00 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

P1 ∪ P2(7 rules)

Page 19: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

Per-source partitions

19

R0

R1 R5

R6

R2

R3

R4

0 1 2 3 4 5 6 7

1

2

3

4

5

6

7

0

• Limited resource for forwarding

• No need for replication to

approximate source-placement

• Closer partitions are more similar

Src IP

Dst

IP

Page 20: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

vCRIB design: Placement

20

Source Partitioning with Replication

Topology &Routing

Rules

Partitions

PlacementT11

T21 T22

T23

T32

T33

Traffic Overhead

Resource

Constraints

Minimum Traffic Feasible Placement

Page 21: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

vCRIB design: Placement

21

Source Partitioning with Replication

Topology &Routing

Rules

Partitions

Minimum Traffic Feasible Placement

Feasible Placement

Traffic Overhead

Resource

Constraints

Traffic Overhead

Resource-Aware Placement

Traffic-Aware Refinement

Traffic-Aware Refinement

Page 22: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

FFDS (First Fit Decreasing Similarity)

22

1. Put a random partition on an empty device

2. Add the most similar partitions to the initial partition

until the device is full

Find the lower bound for optimal solution for rules

Prove the algorithm is a 2-approximation of the lower

bound

Page 23: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

vCRIB design: Heterogeneous resources

23

Source Partitioning with Replication

Topology &Routing

Rules

Partitions

Minimum Traffic Feasible Placement

Resource-Aware Placement

Feasible Placement

Resource

Heterogeneity

Resource Usage

FunctionTraffic-Aware Refinement

Page 24: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

vCRIB design: Traffic-Aware Refinement

24

Source Partitioning with Replication

Resource-Aware Placement

Partitions

Feasible Placement

Minimum Traffic Feasible Placement

Resource Usage

FunctionTraffic Overhead

Topology &Routing

Rules

Traffic-Aware Refinement

Page 25: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

Traffic-aware refinement

� Overhead greedy approach

1. Pick maximum overhead partition

2. Put it where minimizes the overhead and maintains feasibility

25

Agg1 Agg2

ToR1 ToR2

S1 S2 S3 S4 S5 S6

ToR3

VM2 VM4

P2

P4

Agg1 Agg2

ToR1 ToR2

S1 S2 S3 S4 S5 S6

ToR3

VM2 VM4

P2P4

Page 26: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

Traffic-aware refinement

� Overhead greedy approach

1. Pick maximum overhead partition

2. Put it where minimizes the overhead and maintains feasibility

�Problem: Local minima

� Our approach: Benefit greedy

26

Agg1 Agg2

ToR1 ToR2

S1 S2 S3 S4 S5 S6

ToR3

VM2

VM4

P2

P4

Agg1 Agg2

ToR1 ToR2

S1 S2 S3 S4 S5 S6

ToR3

VM2

VM4

P2

P4

Page 27: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

vCRIB design: Dynamics

27

Source Partitioning with Replication

Resource-Aware Placement

Rule/VM Dynamics

Partitions

Feasible Placement

Minimum Traffic Feasible Placement

Resource Usage

Function

Major Traffic Changes

Dynamics

Topology &Routing

Rules

Traffic-Aware Refinement

Page 28: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Introduction DesignMotivation Evaluation

vCRIB design

28

Source Partitioning with Replication

Resource-Aware Placement

Rule/VM Dynamics

Partitions

Feasible Placement

Minimum Traffic Feasible Placement

Resource Usage

FunctionTraffic Overhead

Major Traffic Changes

Dynamics

Topology &Routing

Rules

Resource

Constraints

Resource

Heterogeneity

Overlapping

Rules

Traffic-Aware Refinement

Page 29: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

EvaluationIntroduction Motivation Design

Evaluation

� Comparing vCRIB vs. Source-Placement

� Parameter sensitivity analysis

� Rules in partitions

� Traffic locality

� VMs per server

� Different memory sizes

� Where is the traffic overhead added?

� Traffic-aware refinement for online scenarios

� Heterogeneous resource constraints

� Switch-only scenarios

29

Page 30: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

EvaluationIntroduction Motivation Design

Simulation setup

� 1k servers with 20 VMs per server in a Fat-tree network

� 200k rules generated by ClassBench and random action

� IPs are assigned in two ways:

� Random

� Range

� Flows

� Size follows long-tail distribution

� Local traffic matrix (0.5 same rack, 0.3 same pod, 0.2 interpod)

30

0 1 2 3 4 5 6 70 1 2 3 4 5 6 7

Page 31: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

EvaluationIntroduction Motivation Design

Comparing vCRIB vs. Source-Placement

31

4k_0 4k_4k 4k_6k0

0.05

0.1

0.15

0.2

0.25

0.3

Server memory_Switch memory

Tra

ffic

over

he

ad ra

tio

RangeRandom

Maximum Load is 5K Capacity is 4K

Range is better as similar partitions are from the same source

Random: Average load is 4.2K

Adding more resources helps vCRIB reduce traffic overhead

vCRIB finds low traffic feasible solution

Page 32: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

EvaluationIntroduction Motivation Design

Parameter sensitivity analysis: Rules in partitions

32

Total space

Defined by maximum load on a server

Source placement

Page 33: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

EvaluationIntroduction Motivation Design

0 1000 20000

1000

2000

Average Partition Size

Ave

rag

e S

imila

rity

Parameter sensitivity analysis: Rules in partitions

33

Total space

vCRIB

Source placement

Page 34: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

EvaluationIntroduction Motivation Design

0 1000 20000

1000

2000

Average Partition Size

Ave

rag

e S

imila

rity

Total space

vCRIB

Parameter sensitivity analysis: Rules in partitions

34

Lower traffic overhead for smaller partitions and

more similar ones

vCRIB<10% Traffic

Source placement

A

A

Page 35: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

EvaluationIntroduction Motivation Design

Future work

Conclusion

Conclusion and future work

35

vCRIB allows operators and users to specify rules, and

manages their placement in a way that respects

resource constraints and minimizes traffic overhead.

• Support reactive placement by adding the

controller in the loop

• Break a partition for large number of rules per VM

• Test for other rulesets

Page 36: Scalable Rule Management for Data Centers · ToR1 ToR2 S1 S2 S3 S4 S5 S6 ToR3 VM2 VM4 P4 P2. Introduction Motivation Design Evaluation Traffic-aware refinement Overhead greedy approach

Scalable Rule Management for

Data Centers

Masoud Moshref, Minlan Yu,

Abhishek Sharma, Ramesh Govindan

4/3/2013