PVPP: A Programmable Vector Packet Processor
Sean Choi, Xiang Long, Muhammad Shahbaz,
Skip Booth, Andy Keep, John Marshall, Changhoon Kim
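Fixed-Function Switch Chip: Fixed Set of Protocols
[Figure: protocol stack: Ethernet, IPv4, IPv6, TCP, UDP, BGP, HTTP, TLS]
Programmable Switching Chip: Custom Protocols
[Figure: protocol stack with a custom protocol: Ethernet, IPv4, IPv6, TCP, CUSTOM_P, BGP, HTTP, TLS]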
Software Switch
[Figure: two VMs attached to a software switch with 3 virtual ports and 1 physical port]
[Chart: approximate number of physical ports vs. virtual ports, 2010 to 2015 [1]; series: Physical Ports, Virtual Ports; y-axis 0 to 60, units not shown]
[1] Martin Casado, VMWorld 2013
Software Switch: Custom Protocols
[Figure: protocol stack with a custom protocol: Ethernet, IPv4, IPv6, TCP, CUSTOM_P, BGP, HTTP]
• PISCES [1]
• BMv2 [2]
[1] PISCES. ACM SIGCOMM 2016.
[2] https://github.com/p4lang/behavioral-model
Throughput on Eth + IPv4 + ACL benchmark application [1]
[Chart: throughput (Gbps) at 64-byte packets]
• PISCES v0.1: 7.59
• PISCES v1.0: 13.32
• Native OVS: 13.43
Performance overhead of <2%
[1] PISCES. ACM SIGCOMM 2016.
So… why ANOTHER P4 software switch?
[Figure: programmable switch pipeline: Parser -> Match+Action Tables -> Queues/Scheduling, carrying packet metadata]
Initially, the switching chip is not programmed and does not know any protocols.
[Figure: protocol authoring: L2_L3.p4 is compiled and used to configure the pipeline (Eth, VLAN, IPv4, IPv6, TCP, New); the Switch OS sits on top of the run-time API and driver]
[Figure: the same flow, with OF1-3.p4 shown alongside L2_L3.p4]
[Figure: a software switch (Parser -> Match-Action Pipeline) running on the kernel or DPDK]
[Figure: a Domain-Specific Language (DSL) program describing the parser and match-action pipeline is compiled onto the software switch]
[Figure: DSL 1 is compiled onto one software switch and DSL 2 onto a second one (Software Switch 2)]
• PISCES: P4 to OvS
• BMv2: P4 to a C++ custom switch
What's wrong with this design?
• Not designed for CPU-based architectures
• Limited in expressiveness
• Limited APIs to access low-level constructs
=> Lots of room for improvement!
Vector Packet Processing (VPP) Platform
• Open source version of Cisco's Vector Packet Processing technology
• Modular packet processing node graph abstraction
• Each node processes a vector of packets to reduce CPU I-cache thrashing
• Extensible and dynamically reconfigurable via plugins
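The vector idea can be sketched in plain C. Instead of pushing one packet through every node in turn, each node runs over the whole vector before the next node starts, so a node's instructions are fetched once per vector rather than once per packet. This is a minimal illustration and not VPP's actual API; `packet_t`, `node_fn`, `node_decrement_ttl`, `node_count`, and `run_graph` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet and node types, for illustration only (not VPP's API). */
typedef struct {
    uint8_t  ttl;
    uint32_t hops;   /* counts how many nodes touched this packet */
} packet_t;

typedef void (*node_fn)(packet_t *pkts, size_t n);

/* Each node loops over the whole vector before returning. */
static void node_decrement_ttl(packet_t *pkts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].ttl > 0) pkts[i].ttl--;
        pkts[i].hops++;
    }
}

static void node_count(packet_t *pkts, size_t n) {
    for (size_t i = 0; i < n; i++) pkts[i].hops++;
}

/* Run the nodes in order, handing each one the full vector. */
static void run_graph(const node_fn *nodes, size_t n_nodes,
                      packet_t *pkts, size_t n_pkts) {
    for (size_t k = 0; k < n_nodes; k++) nodes[k](pkts, n_pkts);
}
```

The sketch uses a fixed linear chain of nodes; a real node graph would also let each node choose the next node per packet.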
Vector Packet Processing (VPP) Platform
• Proven performance [1]
• Multiple MPPS from a single x86_64 core
• >100 Gbps full-duplex on a single physical host
• Outperforms Open vSwitch in various scenarios
1 core: 9 MPPS IPv4 in+out forwarding
2 cores: 13.4 MPPS IPv4 in+out forwarding
4 cores: 20.0 MPPS IPv4 in+out forwarding
[1] https://wiki.fd.io/view/VPP/What_is_VPP%3F
[Figure: VPP node graph. A packet vector enters at dpdk-input and flows through ip4-input / ip6-input / llc-input, ip6-lookup, ip6-rewrite-transmit, …, to dpdk-output (vanilla VPP nodes).]
[Figure: a custom plugin adds its own nodes (Custom-input, Node 1, Node 2, …, Node i, Node j, Node k) next to the vanilla VPP nodes; the plugin is enabled via CLI.]
PVPP Overview
• Creates a plugin based on the input P4 program
• No changes to existing VPP code base
• Compiles either a single-node or multiple-node plugin
• Multiple nodes are split by the number of tables in the input P4 program
• P4 programs can be swapped dynamically
[Figure: Multi-Node PVPP Plugin. The plugin, enabled via CLI, starts at pvpp-input and places each table in its own node (Table 1, Table 2, …, Table i, Table j, Table k), next to the vanilla VPP nodes (dpdk-input, ip4-input / ip6-input / llc-input, ip6-lookup, ip6-rewrite-transmit, …, dpdk-output).]
[Figure: PVPP compilation flow. P4 Program -> P4 Compiler (P4C: Front-end Compiler, BMv2 Mid-end Compiler, BMv2 Back-end Compiler) -> JSON -> JSON-VPP Compiler (using VPP Plugin Cog Templates) -> C Files -> VPP Plugin Directory]
Details of PVPP Plugin
• Headers are defined as C structs

    header_type ethernet_t {
        fields {
            dstAddr : 48;
            srcAddr : 48;
            etherType : 16;
        }
    }

    typedef struct {
        u8 dstAddr[6];
        u8 srcAddr[6];
        u16 etherType;
    } p4_type_Ethernet_h;

• The action interface takes pointers to all header, metadata, and runtime data; the compiler selects the correct pointer and the set of primitives to perform on the data.
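As a sketch of how such a struct might be used, the header type below follows the slide, and an action receives pointers to the header, metadata, and runtime data. Everything beyond the struct itself is an assumption made for illustration: the `u8`/`u16` typedefs, `p4_metadata_t`, `set_dmac_action`, and `ethernet_type` are hypothetical and are not taken from the generated PVPP code.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t  u8;    /* assumed to match the slide's u8  */
typedef uint16_t u16;   /* assumed to match the slide's u16 */

/* Header layout from the slide: 48 + 48 + 16 bits = 14 bytes. */
typedef struct {
    u8  dstAddr[6];
    u8  srcAddr[6];
    u16 etherType;      /* stored as on the wire (network byte order) */
} p4_type_Ethernet_h;

/* Hypothetical per-packet metadata. */
typedef struct {
    u16 egress_port;
} p4_metadata_t;

/* Hypothetical action: rewrite the destination MAC from runtime data. */
static void set_dmac_action(p4_type_Ethernet_h *eth, p4_metadata_t *meta,
                            const u8 *runtime_data) {
    (void)meta;                              /* not needed by this action */
    memcpy(eth->dstAddr, runtime_data, 6);
}

/* Read etherType in host order without depending on host endianness. */
static u16 ethernet_type(const p4_type_Ethernet_h *eth) {
    const u8 *b = (const u8 *)&eth->etherType;
    return (u16)((b[0] << 8) | b[1]);
}
```

Because every field width on the slide is a multiple of 8 bits, the P4 fields map directly onto byte arrays and a 16-bit integer with no padding.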
Details of PVPP Plugin
• A table definition contains two parts
1. A match definition that defines the type of match (EXACT, LPM) and which fields to match on
2. An action definition that contains the set of action pointers corresponding to the match result
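The two parts of a table definition can be sketched as a small C data structure: a match type plus entries, each carrying an action pointer and its runtime data. This is a hedged illustration, not PVPP's generated code. All names (`table_t`, `table_entry_t`, `table_apply`, `set_nhop`, `drop`) are hypothetical, the lookup is a linear scan for clarity, and treating next hop 0 as "dropped" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MATCH_EXACT, MATCH_LPM } match_type_t;

typedef void (*action_fn)(uint32_t *nhop, uint32_t runtime_data);

typedef struct {
    uint32_t  key;           /* value to match, e.g. an IPv4 address */
    uint8_t   prefix_len;    /* LPM only; must be in 0..32 */
    action_fn action;        /* action to run on a hit */
    uint32_t  runtime_data;  /* argument passed to the action */
} table_entry_t;

typedef struct {
    match_type_t         type;
    const table_entry_t *entries;
    size_t               n_entries;
    action_fn            default_action;   /* run on a miss */
} table_t;

static uint32_t prefix_mask(uint8_t len) {
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

static void set_nhop(uint32_t *nhop, uint32_t data) { *nhop = data; }
static void drop(uint32_t *nhop, uint32_t data) { (void)data; *nhop = 0; }

/* Find the matching entry (longest prefix for LPM) and run its action. */
static void table_apply(const table_t *t, uint32_t field, uint32_t *nhop) {
    const table_entry_t *best = NULL;
    for (size_t i = 0; i < t->n_entries; i++) {
        const table_entry_t *e = &t->entries[i];
        if (t->type == MATCH_EXACT) {
            if (e->key == field) { best = e; break; }
        } else {
            uint32_t m = prefix_mask(e->prefix_len);
            if ((field & m) == (e->key & m) &&
                (best == NULL || e->prefix_len > best->prefix_len))
                best = e;
        }
    }
    if (best) best->action(nhop, best->runtime_data);
    else      t->default_action(nhop, 0);
}
```

A production table would replace the linear scan with a hash table for EXACT and a trie for LPM; the split between match definition and action pointers stays the same.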
PVPP CLI
• Two CLIs are currently supported
1. Enable/disable the PVPP pipeline
   $ pvpp [ingress interface name]
2. CLI to install match rules for a particular table
   $ pvpp insert-rule [table name] [match value] [action name] [runtime data]
Experimental Setup
[Figure: three machines, M1, M2, M3. PVPP runs on DPDK in the middle, connected by 3 x 10G links on each side to MoonGen sender/receiver machines.]
CPU: Intel Xeon E5-2640 v3 2.6 GHz
Memory: 32 GB RDIMM, 2133 MT/s, Dual Rank
NICs: Intel X710 DP/QP DA SFP+ Cards
HDD: 1 TB 7.2K RPM NLSAS 6 Gbps
Benchmark Application
[Figure: Parse Ethernet/IPv4, followed by three match-action tables in sequence]
• IPv4_match: Match: ip.dstAddr; Action: Set_nhop, drop
• Destination MAC: Match: ip.dstAddr; Action: Set_dmac, drop
• Source MAC: Match: egress_port; Action: Set_dmac, drop
Baseline Performance
[Chart: throughput (Mpps) at 64-byte packets]
• Single Node: 7.86
• Multiple Node: 7.05
Compiler optimizations
• Remove redundant tables
• Reducing metadata access
• Bypassing redundant VPP nodes
• Reduce pointer dereference
• Caching logical HW interfaces
• Unrolling loops for multiple packet processing
Loop Unrolling
Manually fetches two packets
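A two-packet unrolled loop can be sketched as follows. The slide only states that two packets are fetched manually per iteration; the prefetch of the next pair, and the names `packet_t`, `process_one`, and `process_vector`, are assumptions added for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t ttl; } packet_t;   /* hypothetical packet type */

static inline void process_one(packet_t *p) {
    if (p->ttl > 0) p->ttl--;
}

/* Handle packets two at a time, then mop up a possible odd leftover. */
static size_t process_vector(packet_t **pkts, size_t n) {
    size_t i = 0;
    while (n - i >= 2) {
        packet_t *p0 = pkts[i];
        packet_t *p1 = pkts[i + 1];
#if defined(__GNUC__)
        /* Assumption: hint the next pair into cache while this pair runs. */
        if (n - i >= 4) {
            __builtin_prefetch(pkts[i + 2]);
            __builtin_prefetch(pkts[i + 3]);
        }
#endif
        process_one(p0);
        process_one(p1);
        i += 2;
    }
    while (i < n) process_one(pkts[i++]);   /* single-packet tail */
    return i;                               /* number of packets processed */
}
```

Working on two independent packets per iteration halves the loop overhead and gives the CPU two unrelated instruction streams to overlap.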
Optimized Performance
[Chart: throughput (Mpps) per optimization; 64-byte packets, single 10G port]
Optimization                      Single Node   Multiple Node
Baseline                          7.86          7.05
Removing Redundant Tables         9.25          8.38
Reducing Metadata Access          9.51          8.50
Loop Unrolling                    9.51          8.80
Bypassing Redundant Nodes         9.58          8.89
Reducing Pointer Dereferences     10.01         9.02
Caching Logical HW Interface      10.21         9.20
Optimized Performance
[Chart: throughput (Mpps) vs. packet size (bytes)]
Packet Size (Bytes)   Single Node   Multiple Node
64                    10.21         9.20
128                   8.07          8.07
192                   5.63          5.65
256                   4.38          4.38
Optimized Performance
[Chart: throughput (Mbps, axis 0 to 10000) vs. packet size (64, 128, 192, 256 bytes); series: Single Node, Multiple Node; values not recoverable]
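Packet rates and bit rates are related by simple arithmetic, sketched below. Whether the Mbps chart counts only frame bytes or also the 20 bytes of per-packet Ethernet overhead (8-byte preamble plus 12-byte inter-frame gap) is not stated in the slides, so the overhead is left as a parameter; `mpps_to_mbps` and `line_rate_mpps` are hypothetical helper names.

```c
/* Convert a packet rate (Mpps) to a bit rate (Mbps).
 * overhead_bytes is added to every packet: 0 counts frame bytes only,
 * 20 also counts the Ethernet preamble and inter-frame gap. */
static double mpps_to_mbps(double mpps, unsigned packet_bytes,
                           unsigned overhead_bytes) {
    return mpps * (double)(packet_bytes + overhead_bytes) * 8.0;
}

/* Maximum packet rate (Mpps) an Ethernet link of link_mbps can carry,
 * counting the 20 bytes of per-packet wire overhead. */
static double line_rate_mpps(double link_mbps, unsigned packet_bytes) {
    return link_mbps / ((double)(packet_bytes + 20u) * 8.0);
}
```

For example, the 4.38 Mpps reported at 256 bytes corresponds to 4.38 x 256 x 8 = 8970.24 Mbps of frame data, and a 10G link carries at most about 14.88 Mpps of 64-byte packets.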
Optimized Performance
[Chart: average CPU cycles per packet vs. packet size (bytes)]
Packet Size (Bytes)   Single Node   Multiple Node
64                    133.00        159.00
128                   149.00        172.30
192                   171.00        222.30
256                   194.00        255.20
Scalability
[Chart: throughput (Mpps) vs. number of CPUs; 64-byte packets across 3 x 10G ports]
Number of CPUs   Single Node   Multiple Node
1                8.52          8.14
2                17.03         16.57
3                26.40         24.14
4                35.83         33.41
5                44.23         40.69
6                53.11         49.34
Performance Comparison
[Chart: throughput (Mpps) vs. packet size (bytes)]
Packet Size (Bytes)   PVPP    PISCES (with Microflow)   PISCES (without Microflow)
64                    59.53   63.49                     30.22
128                   49.31   47.23                     30.22
192                   34.71   34.72                     30.20
256                   26.78   26.78                     26.78
Future Work
• Automated node splits based on the input program
• More compiler annotations for low-level constructs
• Extending P4 support such as data plane states
• VPP-specific P4_16 backend compiler
• Extending PVPP CLI features
Summary
[Diagram labels: PVPP, VPP, P4]
- A performant and dynamically reconfigurable P4 switch based on a different packet processing abstraction
- More improvements planned over the summer prior to public release
Questions?