Data Plane Use Case: Background
(Transcript)
The Promise of Software Defined Networking
Increased “network programmability” allows ordinary programs to manage the network
Applications can leverage SDNs to improve performance through data plane configuration (e.g., route selection and QoS)
Can application logic be moved into the network?
This work focuses on the widely-deployed Paxos protocol
Why Paxos?
Paxos is a fundamental building block for distributed applications
e.g., Chubby, OpenReplica, and Ceph
There exists extensive work on optimizing Paxos (e.g., Fast Paxos)
Paxos operations can be efficiently implemented in hardware
Outline of This Talk
Motivation
Paxos Background
Consensus in the Network
Paxos in SDN Switches (and required OpenFlow extensions)
Alternative consensus protocol (without OpenFlow changes)
Evaluation
Conclusions
Paxos Protocol
Goal: Have a group of participants agree on one result (i.e., consensus)
Protocol proceeds in rounds; each round has two phases
Performance is measured in message hops
Classic Paxos requires 3 hops
Proposers: issue requests
Coordinator: advocates requests
Acceptors: choose a value and provide memory
Learners: provide replication
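The roles above can be sketched in a few lines. This is a hypothetical illustration of the phase-2 logic only (acceptors voting, a learner detecting a majority); the class and function names are not from the talk, and phase 1 (promises) is reduced to a stored round number.

```python
class Acceptor:
    def __init__(self):
        self.promised_round = -1   # highest round promised in phase 1
        self.accepted = None       # last accepted (round, value)

    def on_accept(self, rnd, value):
        # Phase 2b: accept unless a higher round has been promised.
        if rnd >= self.promised_round:
            self.promised_round = rnd
            self.accepted = (rnd, value)
            return (rnd, value)
        return None                # reject: stale round

def learner_decide(votes, n_acceptors):
    # A value is chosen once a majority of acceptors report the same
    # (round, value) pair.
    majority = n_acceptors // 2 + 1
    for vote in set(v for v in votes if v is not None):
        if votes.count(vote) >= majority:
            return vote[1]
    return None

acceptors = [Acceptor() for _ in range(3)]
votes = [a.on_accept(1, "x") for a in acceptors]
print(learner_decide(votes, 3))  # prints: x
```
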
Fast Paxos Protocol
Key idea: optimize for the case when proposals don’t collide
Optimistically uses fast rounds that bypass the coordinator
Only 2 message hops
Requires 1 more acceptor
If there is a collision, revert to Classic Paxos
[Figure: proposers, acceptors, and learners; fast-round messages skip the coordinator]
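The "1 more acceptor" remark can be made concrete. Under one standard parameterization (following Lamport's Fast Paxos, an assumption not spelled out on the slide), tolerating f crash failures takes 2f+1 acceptors for classic rounds but 3f+1 for fast rounds, which for f = 1 is exactly one extra acceptor:

```python
def classic_acceptors(f):
    # Classic rounds: majority quorums of size f + 1 out of 2f + 1 acceptors.
    return 2 * f + 1

def fast_acceptors(f):
    # Fast rounds: quorums of size 2f + 1 out of 3f + 1 acceptors, so any
    # two fast quorums and one classic quorum share an acceptor.
    return 3 * f + 1

for f in (1, 2):
    print(f, classic_acceptors(f), fast_acceptors(f))
```

For f = 1 this gives 3 vs. 4 acceptors, matching the slide.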
Observations
The performance metric (hops) does not account for network topology
A single “classic” message hop may traverse multiple switches
Coordinators and acceptors are typically bottlenecks
They must aggregate or multiplex messages
“Fault tolerance” does not include the network devices
Paxos on Network Devices
Key idea: Move coordinator and acceptor (phase 2) logic into switches
The OpenFlow API is not sufficient
[Figure: proposers, coordinator, acceptors, and learners, with the coordinator and acceptor roles placed on switches]
Required Extensions
1. Generate round and sequence numbers
2. Persistent storage
3. Stateful comparisons
4. Storage cleanup
Hardware Feasibility

Requirement                  Device Capability
Round & Sequence Generator   Netronome NFP-6xxx, NetFPGA, Arista 7124FX (stateful flow processing)
Stateful Comparisons         Netronome NFP-6xxx, NetFPGA, Arista 7124FX (stateful flow processing)
Persistent Storage           Arista 7124FX (50 GB SSD logging)
Storage Cleanup              Arista 7124FX (50 GB SSD logging)
Can We Avoid Extensions?
Make optimistic assumptions about message order
Like Fast Paxos, have “fast” and classic rounds
No Sequence Numbers Needed
The coordinator is replaced by a serializer
Rely on optimistic message ordering
Use the switch to increase the probability that ordering holds
No On-Device State Needed
Use additional servers as storage
No On-Device Logic Needed
No coordinator
Acceptors always “accept”
Need an additional “minion”
Performance Assumption
Messages from the serializer to the minions arrive in the same order
If not, execute a slow round
Correctness Assumption
Messages from the minions to the servers arrive in the same order
If not, the protocol breaks
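The performance assumption implies a simple learner-side check, which can be sketched as follows. This is a hypothetical illustration (the `learn` function and its quorum handling are not from the talk): if enough minions report the same value for a position, it is delivered; otherwise the protocol falls back to a slow round.

```python
def learn(minion_msgs, quorum):
    # minion_msgs: the value each minion reported for one sequence position.
    counts = {}
    for v in minion_msgs:
        counts[v] = counts.get(v, 0) + 1
    value, votes = max(counts.items(), key=lambda kv: kv[1])
    if votes >= quorum:
        return ("deliver", value)       # ordering assumption held
    return ("slow-round", None)         # disagreement: fall back

print(learn(["a", "a", "a"], quorum=2))  # prints: ('deliver', 'a')
print(learn(["a", "b", "c"], quorum=2))  # prints: ('slow-round', None)
```
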
NetPaxos Summary
Latency: fewer “true” network hops, counting individual switches
Throughput: avoids potential bottlenecks through reduced logic
Fault tolerance: the serializer can easily be made redundant; other devices can fail, as with Classic Paxos
Experiments
Focus on two questions:
Do our ordering assumptions hold?
What is the potential benefit of NetPaxos?
Testbed:
Three Pica8 Pronto 3290 switches, 1 Gbps links
Messages are one MTU in size and carry sequence numbers
Assumptions Mostly Hold
[Figure: percentage of packets resulting in disagreement or indecision vs. messages/second (10,000 to 50,000); series: Indecisive, Disagree]
The performance assumption held up to 70% of link capacity
The correctness assumption was never violated
Traffic should not be bursty
High Potential for Performance
Disclaimer: best-case scenario
9x increase in throughput
90% reduction in latency
[Figure: latency (ms, 0.5 to 3.5) vs. messages/second (10,000 to 50,000); series: Basic Paxos, NetPaxos]
Outlook
Formalizing protocol with Spin model checker
Implementing a NetFPGA-based prototype
Investigating root causes for packet reordering
Conclusions
SDNs enable tight integration with the network, which can improve distributed application performance
Proposed two approaches to moving Paxos logic into the network
Paxos is a fundamental protocol. Performance improvements would have a great impact on data center applications
SDN is Not Enough
SDN allows you to program the control plane
Many large data centers program host network stacks (hypervisors), which amounts to roughly edge-based SDN
Not yet able to program the data plane
Data Plane Opportunities
Simplify and improve network management
Extensions for debugging and diagnostics
Dynamic resource allocation
Enable critical new features
Improved robustness
Port-knocking
Load balancing, enhanced congestion control
Suppressing Innovation
OpenFlow provides an (intentionally) limited interface
No state
No computation
Restricted to a fixed set of headers
May need to customize hardware support
Match tables usually have fixed width, depth, and execution order
Demand for New Features
SDNs and white boxes set the stage
Large private networks want new features
Rate of new feature arrivals exceeds rate of hardware evolution
Getting New Features
No DIY solution. Must work with vendors at the “feature” level
Hard to get consensus on the feature
Long time to realize the feature
Need to buy the new hardware
What you get is not usually what you want
Extensions to OpenFlow
OpenState project, G. Bianchi et al.
http://openstate-sdn.org
Mealy machine abstraction
ONF Working Groups
EXT-WG focused on extensions
FAWG focused on forwarding abstractions
Vision
What about the next version of OpenFlow? Or custom protocol?
We all know how to program CPUs
Supporting tools and infrastructure
Allows fast iteration and differentiation
Lets you quickly realize your own ideas
Challenge: How can we replicate this in the network?
Networking is Late to the Game
Domain     Target Hardware    Language
Computers  CPU                C, Java, OCaml, JavaScript, etc.
Graphics   GPU                CUDA, OpenCL
Cellular   Base-station DSP   C, MATLAB, dataflow languages
Networks   ?                  ?
Hardware Trend
PISA (Protocol Independent Switch Architecture)
Fundamental departure from switch ASICs
Programmable parsing
Protocol independence (i.e., generic match-action units)
Parallelism across multiple match-action units, and within each stage
Power, size, or cost penalty is negligible
Why Now?
I/O, memory, and bus dominate chip size
Logic is getting proportionally smaller
Programmability means larger logic. The rest stays the same.
Very little power, area, or performance penalty for programmability.
PISA Chip
[Figure: logical dataplane view (L2, IPv4, IPv6, and ACL tables) alongside the switch pipeline of parser, match tables, action macros, and queues]
PISA Chip
[Figure: mapping a logical dataplane design to the PISA chip; the L2, IPv4, IPv6, and ACL tables and their action macros are assigned to the pipeline’s match tables and action macros]
PISA Chip
[Figure: re-configurability; a new MyEncap table and action are added to the pipeline alongside the L2, IPv4, IPv6, and ACL tables]
PISA Chip
[Figure: PISA (Protocol Independent Switch Architecture) exploits parallelism across pipelined stages and within each stage; stages combine match memory with action ALUs]
Key Players
Hardware manufacturers
Proto-PISA chips already available: FlexPipe (Intel) and others (Cisco and Cavium)
Full-fledged PISA chips on the horizon (Barefoot)
High-level language
P4 (p4.org)
Compiler and development tools
“Hour-glass design” with IR, profilers, debuggers
Abstract Forwarding Model
[Figure: a parser feeds header fields into ingress match-action stages, then queues, then egress match-action stages; example headers: Eth, VLAN, IPv4, IPv6, TCP]
P4 Language Components
Parser program: state machine, field extraction
Match tables + actions and control flow: table lookup and update, field manipulation, control flow
Assembly (“deparser”) program: field assembly
Not in the language: memory (pointers), loops, recursion, floating point
Header Fields and Parsing

header_type ethernet_t {
    fields {
        dstAddr : 48;
        srcAddr : 48;
        etherType : 16;
    }
}

parser parse_ethernet {
    extract(ethernet);
    return select(latest.etherType) {
        0x8100 : parse_vlan;
        0x800  : parse_ipv4;
        0x86DD : parse_ipv6;
    }
}
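The parser's select statement is just a dispatch on the extracted field. A rough Python analogue (the `"drop"` fallback is an assumption; the P4 snippet above specifies no default branch):

```python
# Next parse state keyed by etherType, mirroring `select(latest.etherType)`.
NEXT_STATE = {
    0x8100: "parse_vlan",
    0x0800: "parse_ipv4",
    0x86DD: "parse_ipv6",
}

def parse_ethernet(ether_type):
    # Fallback behavior is assumed; the P4 snippet has no default case.
    return NEXT_STATE.get(ether_type, "drop")

print(parse_ethernet(0x0800))  # prints: parse_ipv4
```
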
Table (Match)

table ipv4_lpm {
    reads {
        ipv4.dstAddr : lpm;
    }
    actions {
        set_next_hop;
        drop;
    }
}

Example entries (lookup key: ipv4.dstAddr):

ipv4.dstAddr   action
0.*            drop
10.0.0.*       set_next_hop
224.*          drop
192.168.*      drop
10.0.1.*       set_next_hop
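The `lpm` match kind selects the most specific matching prefix. A minimal sketch of that behavior, using the slide's example entries; string-prefix matching stands in for real bitwise longest-prefix match, so this is an approximation, not switch logic:

```python
# Table entries from the slide, as (dotted-string prefix, action) pairs.
TABLE = [
    ("0.",       "drop"),
    ("10.0.0.",  "set_next_hop"),
    ("224.",     "drop"),
    ("192.168.", "drop"),
    ("10.0.1.",  "set_next_hop"),
]

def lpm_lookup(dst_addr):
    # Of all matching prefixes, return the action of the longest one.
    best = None
    for prefix, action in TABLE:
        if dst_addr.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, action)
    return best[1] if best else "miss"

print(lpm_lookup("10.0.0.5"))  # prints: set_next_hop
```
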
Actions

action set_next_hop(nhop_ipv4_addr, port) {
    modify_field(metadata.nhop_ipv4_addr, nhop_ipv4_addr);
    modify_field(standard_metadata.egress_port, port);
    add_to_field(ipv4.ttl, -1);
}

Example action parameters:

nhop_ipv4_addr   port
10.0.0.10        1
10.0.1.10        2
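The action body maps to three field updates. A hypothetical Python analogue (the dict-based packet representation is an illustration, not the chip's data model):

```python
def set_next_hop(pkt, nhop_ipv4_addr, port):
    # Mirrors the three primitives in the P4 action above:
    pkt["metadata.nhop_ipv4_addr"] = nhop_ipv4_addr        # modify_field
    pkt["standard_metadata.egress_port"] = port            # modify_field
    pkt["ipv4.ttl"] -= 1                                   # add_to_field(-1)
    return pkt

pkt = {"ipv4.ttl": 64}
print(set_next_hop(pkt, "10.0.0.10", 1))
```
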
Control Flow

control ingress {
    apply(port);
    if (valid(vlan_tag[0])) {
        apply(port_vlan);
    }
    apply(bridge_domain);
    if (valid(mpls_bos)) {
        apply(mpls_label);
    }
    retrieve_tunnel_vni();
    if (valid(vxlan) or valid(genv) or valid(nvgre)) {
        apply(dest_vtep);
        apply(src_vtep);
    }
}
Protocol API
[Figure: a running switch exposes two APIs. The switch program API takes the P4 program (parser program, control flow, match tables + actions) through a compiler to the Linux switch driver and PISA chip. The switch runtime API handles table population, e.g. adding and deleting MPLS-TE forwarding entries.]