Dataplane Programming

Transcript of "Data plane use case, Background"

Outline

Example use case

Introduction to data plane programming

P4 language


Example Use Case: Paxos in the Network


The Promise of Software Defined Networking

Increased “network programmability” allows ordinary programs to manage the network

Applications can leverage SDNs to improve performance through data plane configuration (e.g., route selection and QoS)

Can application logic be moved into the network?

This work focuses on the widely-deployed Paxos protocol


Why Paxos?

Paxos is a fundamental building block for distributed applications

e.g., Chubby, OpenReplica, and Ceph

There exists extensive work on optimizing Paxos (e.g., Fast Paxos)

Paxos operations can be efficiently implemented in hardware


Outline of This Talk

Motivation

Paxos Background

Consensus in the Network

Paxos in SDN Switches (and required OpenFlow extensions)

Alternative consensus protocol (without OpenFlow changes)

Evaluation

Conclusions

Paxos Background

Paxos Protocol

Goal: Have a group of participants agree on one result (i.e., consensus)

Roles (from the diagram): proposers issue requests; the coordinator advocates requests; acceptors choose a value and provide memory; learners provide replication

The protocol proceeds in rounds; each round has two phases

Performance is measured in message hops

Classic Paxos requires 3 hops (proposers → coordinator → acceptors → learners)

Fast Paxos Protocol

Key idea: optimize for the case when proposals don't collide

Optimistically uses fast rounds that bypass the coordinator

Only 2 message hops (proposers → acceptors → learners)

Requires 1 more acceptor

If there is a collision, revert to Classic Paxos

Observations

Performance metric (hops) does not account for network topology

1 “classic” message hop has to travel through multiple switches

Coordinators and acceptors are typically bottlenecks

Must aggregate or multiplex messages

“Fault tolerance” does not include the network devices

Moving Paxos into the Network

Paxos on Network Devices


Key idea: Move coordinator and acceptor (phase 2) logic into switches

OpenFlow API not sufficient

Required Extensions

1. Generate round and sequence numbers

2. Persistent storage

3. Stateful comparisons

4. Storage cleanup
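Of these, extension 1 is the easiest to picture. As a forward pointer to the P4 material later in this lecture, here is a minimal sketch of sequence-number generation in P4-14, assuming a custom Paxos header carried in the packet; the header layout, field names, and single-register design are illustrative assumptions, not from the talk:

    // Hypothetical Paxos header; field names and widths are assumptions.
    header_type paxos_t {
        fields {
            msgtype : 8;    // message kind, e.g., a phase-2a proposal
            inst    : 32;   // consensus instance (sequence number)
            rnd     : 16;   // round number
        }
    }
    header paxos_t paxos;

    // A single register cell holds the next sequence number.
    register inst_register {
        width : 32;
        instance_count : 1;
    }

    // Coordinator logic: stamp each proposal with a fresh sequence number.
    // Assumes the target executes primitives sequentially, as software
    // targets do.
    action stamp_sequence() {
        register_read(paxos.inst, inst_register, 0);
        add_to_field(paxos.inst, 1);
        register_write(inst_register, 0, paxos.inst);
    }

A table whose default action is stamp_sequence, applied to incoming proposals, would realize the coordinator's sequencing role entirely in the switch.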

Hardware Feasibility

Requirement                   Device capability
Round & sequence generator    Stateful flow processing (Netronome NFP-6xxx, NetFPGA, Arista 7124FX)
Stateful comparisons          Stateful flow processing (same devices)
Persistent storage            50 GB SSD logging (Arista 7124FX)
Storage cleanup               50 GB SSD logging (Arista 7124FX)

Consensus without OpenFlow Extensions


Can We Avoid Extensions?


Make optimistic assumptions about message order

Like Fast Paxos, have “fast” and classic rounds

No Sequence Numbers Needed

[Diagram: the coordinator is replaced by a serializer in front of the acceptors]

Rely on optimistic message ordering; use the switch to increase the probability that it holds

No On-Device State Needed


Use additional servers as storage

No On-Device Logic Needed

[Diagram: the acceptors are replaced by minions behind the serializer]

No coordinator

Acceptors always "accept"

Need an additional "minion"

Performance Assumption

Messages from the serializer to the minions arrive in the same order. If not, execute a slow round.

Correctness Assumption

Messages from the minions to the servers arrive in the same order. If not, the protocol breaks.

NetPaxos Summary

Latency: Fewer “true” network hops, including switches

Throughput: Avoids potential bottlenecks, reduced logic

Fault tolerance: the serializer can easily be made redundant, and other devices can fail, as with Classic Paxos

Evaluation

Experiments

Focus on two questions:

Do our ordering assumptions hold?

What is the potential benefit of NetPaxos?

Testbed:

Three Pica8 Pronto 3290 switches, 1Gbps links

Send messages of 1 MTU size, with sequence numbers


Assumptions Mostly Hold

[Figure: percentage of packets resulting in disagreement or indecision vs. message rate (10,000 to 50,000 messages/second); series: Indecisive and Disagree]

Performance assumption held up to 70% link capacity

Correctness assumption was never violated

Traffic should not be bursty

High Potential for Performance

Disclaimer: best-case scenario

9x increase in throughput

90% reduction in latency

[Figure: latency (ms) vs. message rate (10,000 to 50,000 messages/second) for Basic Paxos and NetPaxos]

Outlook

Formalizing protocol with Spin model checker

Implementing a NetFPGA-based prototype

Investigating root causes for packet reordering


Conclusions

SDNs enable tight integration with the network, which can improve distributed application performance

Proposed two approaches to moving Paxos logic into the network

Paxos is a fundamental protocol. Performance improvements would have a great impact on data center applications

Introduction to Data Plane Programming

SDN is Not Enough

SDN allows you to program the control plane

Many large data centers program host network stacks (hypervisors), which is roughly edge-based SDN

Not yet able to program the data plane


Data Plane Opportunities

Simplify and improve network management

Extensions for debugging and diagnostics

Dynamic resource allocation

Enable critical new features

Improved robustness

Port-knocking (see the sketch after this list)

Load balancing, enhanced congestion control
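Port-knocking is a good example of a feature that needs per-flow state in the data plane itself. Here is a minimal sketch in P4-14, shown ahead of the P4 introduction below; the two-knock sequence, register layout, and all names are illustrative assumptions:

    // Knock state for a single client slot (a real design would hash the
    // source address into many register cells). Assumes tcp was parsed.
    header_type knock_meta_t {
        fields { state : 8; }
    }
    metadata knock_meta_t knock_meta;

    register knock_state {
        width : 8;
        instance_count : 1;
    }

    // Read the current state into metadata so a table can match on it.
    action read_state() {
        register_read(knock_meta.state, knock_state, 0);
    }

    // A correct knock advances the state machine.
    action advance(next_state) {
        register_write(knock_state, 0, next_state);
    }

    action allow() {
        modify_field(standard_metadata.egress_spec, 1);  // forward out port 1
    }

    action deny() {
        drop();
    }

    table load_state { actions { read_state; } }

    table knock {
        reads {
            knock_meta.state : exact;
            tcp.dstPort      : exact;
        }
        actions { advance; allow; deny; }
    }

    control ingress {
        apply(load_state);
        apply(knock);
    }

The control plane would set read_state as load_state's default action and install, say, (state 0, port 1234) → advance(1), (state 1, port 5678) → advance(2), and (state 2, port 22) → allow(), with deny() as the default, so port 22 opens only after the two knocks.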


Suppressing Innovation

OpenFlow provides an (intentionally) limited interface

No state

No computation

Restricted to a fixed set of headers

New features may require customized hardware support

Match tables usually have fixed width, depth, and execution order


Demand for New Features

SDNs and white boxes set the stage

Large private networks want new features

Rate of new feature arrivals exceeds rate of hardware evolution


Getting New Features

No DIY solution. Must work with vendors at the “feature” level

Hard to get consensus on the feature

Long time to realize the feature

Need to buy the new hardware

What you get is not usually what you want


Extensions to OpenFlow

OpenState project, G. Bianchi et al.

http://openstate-sdn.org

Mealy machine abstraction

ONF Working Groups

EXT-WG focused on extensions

FAWG focused on forwarding abstractions


Vision

What about the next version of OpenFlow, or a custom protocol?

We all know how to program CPUs

Supporting tools and infrastructure

Allows fast iteration and differentiation

Lets you quickly realize your own ideas

Challenge: How can we replicate this in the network?

Networking is Late to the Game

Domain      Target hardware     Language
Computers   CPU                 C, Java, OCaml, JavaScript, etc.
Graphics    GPU                 CUDA, OpenCL
Cellular    Base-station DSP    C, MATLAB, dataflow languages
Networks    ?                   ?

Two Questions

What do we need at the hardware level?

Once we have that, how do we program it?


Hardware Trend

PISA (Protocol Independent Switch Architecture)

Fundamental departure from switch ASICs

Programmable parsing

Protocol independence (i.e., generic match-action units)

Parallelism across multiple match-action units, and within each stage

Power, size, or cost penalty is negligible


Why Now?

I/O, memory, and bus dominate chip size

Logic is getting proportionally smaller

Programmability means larger logic. The rest stays the same.

Very little power, area, or performance penalty for programmability.


PISA Chip

[Figure sequence: (1) logical dataplane view vs. switch pipeline: parser, match tables (L2, IPv4, IPv6, ACL) with action macros, fixed-action stages, and queues; (2) mapping the logical dataplane design onto the PISA chip's pipeline stages; (3) re-configurability: a new MyEncap table and action are added to the pipeline alongside the existing tables; (4) PISA internals: match memory and action ALUs per stage, with parallelism across pipelined stages and within each stage]

Key Players

Hardware manufacturers

Proto-PISA chips already available: FlexPipe (Intel) and others (Cisco and Cavium)

Full-fledged PISA chips on the horizon (Barefoot)

High-level language

P4 (p4.org)

Compiler and development tools

“Hour-glass design” with IR, profilers, debuggers


Abstract Forwarding Model

[Figure: abstract forwarding model; parsed header fields (Eth, VLAN, IPv4, IPv6, TCP) flow through ingress match-action stages, queues, and egress match-action stages]

P4 Language Components

Parser Program: state machine; field extraction

Match Tables + Actions: table lookup and update; field manipulation; control flow

Assembly ("deparser") Program: field assembly

Not in the language: memory (pointers), loops, recursion, floating point

P4 Examples

Header Fields

Parsing

Tables

Actions

Control Flow


Header Fields and Parsing

    header_type ethernet_t {
        fields {
            dstAddr   : 48;
            srcAddr   : 48;
            etherType : 16;
        }
    }

    parser parse_ethernet {
        extract(ethernet);
        return select(latest.etherType) {
            0x8100 : parse_vlan;
            0x800  : parse_ipv4;
            0x86DD : parse_ipv6;
        }
    }

[Figure: parse graph: Eth → VLAN / IPv4 / IPv6 → TCP, with a "New" header node]
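The slide leaves the VLAN branch of the parse graph undefined. A possible continuation in the same P4-14 style; the vlan_tag_t layout follows the standard 802.1Q tag, and the header instance declarations (which the slide omits) are included here as assumptions:

    header_type vlan_tag_t {
        fields {
            pcp       : 3;    // priority code point
            cfi       : 1;
            vid       : 12;   // VLAN identifier
            etherType : 16;
        }
    }

    // Header instances the slide's code implicitly relies on.
    header ethernet_t ethernet;
    header vlan_tag_t vlan_tag;

    parser parse_vlan {
        extract(vlan_tag);
        return select(latest.etherType) {
            0x800  : parse_ipv4;
            0x86DD : parse_ipv6;
        }
    }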

Table (Match)

    table ipv4_lpm {
        reads {
            ipv4.dstAddr : lpm;
        }
        actions {
            set_next_hop;
            drop;
        }
    }

Example entries (the lookup key is ipv4.dstAddr):

ipv4.dstAddr    action
0.*             drop
10.0.0.*        set_next_hop
224.*           drop
192.168.*       drop
10.0.1.*        set_next_hop

For example, a packet destined to 10.0.0.5 matches the 10.0.0.* entry under longest-prefix match and triggers set_next_hop.

Actions

    action set_next_hop(nhop_ipv4_addr, port) {
        modify_field(metadata.nhop_ipv4_addr, nhop_ipv4_addr);
        modify_field(standard_metadata.egress_port, port);
        add_to_field(ipv4.ttl, -1);
    }

Each matching entry in ipv4_lpm (shown above) supplies the action parameters:

nhop_ipv4_addr    port
10.0.0.10         1
10.0.1.10         2
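One detail the slides gloss over: in P4-14, a table's actions list names compound actions, so the drop entry in ipv4_lpm would typically be a one-line wrapper around the drop primitive. A sketch, using the conventional but illustrative name _drop:

    action _drop() {
        drop();
    }

The table would then list _drop in its actions block in place of the bare primitive.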

Control Flow

    control ingress {
        apply(port);
        if (valid(vlan_tag[0])) {
            apply(port_vlan);
        }
        apply(bridge_domain);
        if (valid(mpls_bos)) {
            apply(mpls_label);
        }
        retrieve_tunnel_vni();
        if (valid(vxlan) or valid(genv) or valid(nvgre)) {
            apply(dest_vtep);
            apply(src_vtep);
        }
    }
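The tables applied here (port, port_vlan, bridge_domain, and so on) are defined elsewhere in the program. A sketch of what the first one might look like (the metadata field and action are hypothetical, not from the slides):

    header_type ingress_metadata_t {
        fields { bd : 16; }    // bridge domain, illustrative
    }
    metadata ingress_metadata_t ingress_metadata;

    action set_port_config(bd) {
        modify_field(ingress_metadata.bd, bd);
    }

    table port {
        reads {
            standard_metadata.ingress_port : exact;
        }
        actions { set_port_config; }
    }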

Compilation

[Figure: the compiler maps the parser program, match tables + actions, and control flow onto the PISA chip]

Protocol API

[Figure: a running switch; the P4 program (parser, match tables + actions, control flow) is compiled and installed through the switch program API, and the control plane adds and deletes forwarding entries (e.g., MPLS-TE) through the switch runtime API, which populates tables via the Linux switch driver on the PISA chip]