NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec...
![Page 1: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/1.jpg)
technicolor.com
NICOLAS LE SCOUARNEC
JOINT WORK WITH FABIEN ANDRÉ,
STÉPHANE GOUACHE, ANTOINE MONSIFROT
![Page 2: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/2.jpg)
2
Typical internet access network
Customer Premise Equipment
(Residential Gateway)
Access Network
Internet
![Page 3: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/3.jpg)
3
vCPE Rationale
Simplify software running on millions of
embedded devices
► Easier upgrades
► Better integration
Provide visibility into home network
► Secure IoT
► Remote troubleshooting
![Page 4: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/4.jpg)
4
Building middleboxes for residential networks
Internet
Middlebox
(e.g., NAT and Firewall)
Customer Premise Equipment
(Residential Gateway)
Access Network
![Page 5: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/5.jpg)
5
What (not) to use?
NFV approach (virtualized appliances)
► One VM/container per customer
► Running existing software (e.g., OpenWRT or Linux)
► As done for example in R-CORD
Virtual Switches for traffic dispatching to VM
Does not scale to millions of VMs/containers
Not cost effective
![Page 6: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/6.jpg)
6
Which equipment to use?

| | Vendor B (HW Appliance) | Vendor C (HW Appliance) | 2x Xeon E5 v4 (40 cores) | Vendor A (VM) | StatelessNF (NSDI’17) |
|---|---|---|---|---|---|
| L4 Throughput (Simple IMIX) | 58 Mpps / 130 Gbps | 63 Mpps / 140 Gbps | 180 Mpps / 400 Gbps | 4.5 Mpps / 10 Gbps | 4 Mpps / 10 Gbps |
| Cost | 65 K$ (HW+SW) | 200 K$ (HW+SW) | 30 K$ (HW) | 21 K$ (SW) | NA |
| Redundancy model | 1+1 | 1+1 | N+1 | 1+1 | N+1 |

Objective: 180 Mpps / server, 4.5 Mpps / core
Available SW for running on COTS server
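The per-core objective follows directly from the per-server objective and the core count in the table. A small sketch of that arithmetic, which also derives the average time budget per packet (the variable names are illustrative, not from the slides):

```python
# Figures taken from the comparison table above.
SERVER_TARGET_MPPS = 180   # objective per server
CORES_PER_SERVER = 40      # 2x Xeon E5 v4

per_core_mpps = SERVER_TARGET_MPPS / CORES_PER_SERVER

# Average time budget for one packet on one core, in nanoseconds.
ns_per_packet = 1e9 / (per_core_mpps * 1e6)

print(f"{per_core_mpps} Mpps/core, about {ns_per_packet:.0f} ns per packet")
```

At this rate each core has only a couple of hundred nanoseconds per packet on average, which is why the following slides focus on avoiding context switches, locks and cross-core sharing.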
![Page 7: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/7.jpg)
7
The residential vCPE challenge
Build a middlebox (firewall, NAT, …)
for residential networks
from COTS hardware
Efficient, Reliable, Scalable
L4 connection tracking
For millions of users
![Page 8: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/8.jpg)
8
Best practices for high-performance networking software
Avoid context switches
► Use kernel-bypass systems (e.g., DPDK)
Don’t lock, don’t share
► Cross-core sharing is expensive even without explicit locking
Run-to-completion model
► Receive, process, transmit, without buffering nor blocking
Applying all these principles everywhere is non-trivial
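As a rough illustration of the run-to-completion model, the sketch below drains a receive queue in bursts and fully handles each packet before touching the next one. It is plain Python, not DPDK: `rx_queue`, `tx_queue`, `process` and the burst size are hypothetical stand-ins for a poll-mode NIC receive call, a NIC transmit call and the per-packet NAT/firewall logic.

```python
from collections import deque

def run_to_completion(rx_queue, tx_queue, process, burst=32):
    """Drain rx_queue in bursts; each packet is fully handled before the next."""
    while rx_queue:
        # Receive a burst of packets (stand-in for a poll-mode receive call).
        batch = [rx_queue.popleft() for _ in range(min(burst, len(rx_queue)))]
        for pkt in batch:
            out = process(pkt)          # uses per-core state only, no locks
            if out is not None:         # None means the packet is dropped
                tx_queue.append(out)    # hand to TX right away (stand-in for the NIC)

rx = deque(range(100))
tx = deque()
# Toy processing step: forward even "packets", drop odd ones.
run_to_completion(rx, tx, lambda p: p if p % 2 == 0 else None)
```

The same thread receives, processes and transmits, so no packet is ever handed to another thread or parked in a shared queue.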
![Page 9: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/9.jpg)
9
Reliable - sharding and replication
Shard 1
Shard 2
Shard 3
Shard 4
Shard 5
Assign both a master server and a slave server to each shard
Replicate state from master to slave for each shard
Provide reliability by design, not as an afterthought
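A minimal sketch of such a placement, assuming a simple round-robin policy (the slides do not say how masters and slaves are chosen; the function and server names are hypothetical). The only property it enforces is that a shard's slave is never on the same server as its master.

```python
def assign_shards(shards, servers):
    """Give each shard a master and a distinct slave, round-robin over servers."""
    if len(servers) < 2:
        raise ValueError("need at least two servers to place master and slave apart")
    placement = {}
    for i, shard in enumerate(shards):
        master = servers[i % len(servers)]
        slave = servers[(i + 1) % len(servers)]   # next server, never the master
        placement[shard] = (master, slave)
    return placement

placement = assign_shards(
    [f"shard-{n}" for n in range(1, 6)],
    ["srv-a", "srv-b", "srv-c"],
)
for shard, (master, slave) in placement.items():
    print(shard, "master:", master, "slave:", slave)
```

Because every server is master for some shards and slave for others, the loss of one server spreads its load over the remaining ones, which is what makes N+1 redundancy possible.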
![Page 10: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/10.jpg)
10
Replication - Availability rather than Consistency
No external DB
► Faster insertion and lookup rate (450M lookups/second on 18 cores)
► Non-blocking (no remote memory access)
Availability rather than consistency
► Networks are unreliable, applications will recover
► Yet even short outages are noticed by users
► Master does not wait for acknowledgment from slave
Efficient lock-less replication
► Batching for improved performance
► Same thread for packet processing and replication
► Traffic not interrupted during slave initialization, using support from hash table
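The replication scheme described above can be sketched as follows. This is an illustrative model, not Krononat's code: the class name, the `send_to_slave` callback and the batch size are assumptions. It shows the three properties from the slide: updates are batched, sent from the packet-processing thread itself, and never acknowledged.

```python
class ShardMaster:
    """Owns one shard's connection table and streams updates to its slave."""

    def __init__(self, send_to_slave, batch_size=4):
        self.table = {}                  # connection state, private to this thread
        self.pending = []                # updates not yet sent to the slave
        self.send_to_slave = send_to_slave
        self.batch_size = batch_size

    def track(self, flow, state):
        self.table[flow] = state         # local insert; packet processing continues
        self.pending.append((flow, state))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.send_to_slave(self.pending)   # fire and forget: no ack awaited
            self.pending = []

slave_table = {}
master = ShardMaster(lambda batch: slave_table.update(batch))
for port in range(10):
    master.track(("10.0.0.1", port), "ESTABLISHED")

# The slave may lag behind the master by up to one batch.
print(len(master.table), len(slave_table))
```

The lag between the two tables is the price of choosing availability over consistency: a failover can lose the most recent connections, which the slide argues applications will recover from.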
![Page 12: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/12.jpg)
12
Efficient (I) - Sharding to the core
Shard 1
Shard 2
Shard 3
Shard 4
Shard 5
Enforce share-nothing by binding each shard exclusively to a single CPU core
All packet processing & management done by the corresponding thread
![Page 13: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/13.jpg)
13
Efficient (II) - Expose each core to the network
Shard 1
Shard 2
Shard 3
Shard 4
Shard 5
Expose an independent identity for each core (not for each server or NIC) on the network
A single addressing mechanism between and within servers
Each core appears in the system as an independent router
06:00:00:02:12:45 / 172.24.0.1
06:00:00:02:13:A0 / 172.24.0.2
06:00:00:02:25:B2 / 172.24.0.3
06:00:00:02:F2:35 / 172.24.0.4
06:00:00:02:31:A5 / 172.24.0.5
06:00:00:02:13:AC / 172.24.0.6
06:00:00:02:45:D2 / 172.24.0.7
06:00:00:02:F9:A4 / 172.24.0.8
06:00:00:02:B2:30 / 172.24.0.9
06:00:00:02:53:BE / 172.24.0.10
06:00:00:02:DF:E3 / 172.24.0.11
06:00:00:02:A2:32 / 172.24.0.12
![Page 14: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/14.jpg)
14
Efficient (II) - Scalable load-balancing by NICs and Switches
Shard 1
Shard 2
Shard 3
Shard 4
Shard 5
Leverage existing top-of-rack switches and server-class NIC to entirely offload load-balancing
Physical L3 Switches are much more efficient than virtual switches
06:00:00:02:12:45 / 172.24.0.1
06:00:00:02:13:A0 / 172.24.0.2
06:00:00:02:25:B2 / 172.24.0.3
06:00:00:02:F2:35 / 172.24.0.4
06:00:00:02:31:A5 / 172.24.0.5
06:00:00:02:13:AC / 172.24.0.6
06:00:00:02:45:D2 / 172.24.0.7
06:00:00:02:F9:A4 / 172.24.0.8
06:00:00:02:B2:30 / 172.24.0.9
06:00:00:02:53:BE / 172.24.0.10
06:00:00:02:DF:E3 / 172.24.0.11
06:00:00:02:A2:32 / 172.24.0.12
BGP
BGP
BGP
IP Routing Table
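To make the idea concrete, here is a small sketch of how a routing table can steer traffic to an individual core. The three core identities are copied from the slide; the prefixes in `ROUTES` are hypothetical (documentation address ranges), standing in for routes that cores might announce, for example over BGP as the diagram suggests. The lookup is an ordinary longest-prefix match, which L3 switches perform in hardware.

```python
import ipaddress

# First three core identities copied from the slide (IP address -> MAC address).
CORE_IDENTITIES = {
    "172.24.0.1": "06:00:00:02:12:45",
    "172.24.0.2": "06:00:00:02:13:A0",
    "172.24.0.3": "06:00:00:02:25:B2",
}

# Hypothetical routes: destination prefix -> next hop (the IP address of one core).
ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): "172.24.0.1",
    ipaddress.ip_network("198.51.100.128/25"): "172.24.0.2",
    ipaddress.ip_network("203.0.113.0/24"): "172.24.0.3",
}

def next_hop(dst_ip):
    """Return (core IP, core MAC) for dst_ip using longest-prefix match."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    core_ip = ROUTES[best]
    return core_ip, CORE_IDENTITIES[core_ip]

print(next_hop("198.51.100.10"))
print(next_hop("198.51.100.200"))
```

Since the next hop is a core rather than a server, the switch and the NIC together deliver each packet straight to the thread that owns its shard, with no software dispatcher in between.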
![Page 15: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/15.jpg)
15
Efficient (III) – Handle reverse traffic efficiently
Shard 1
Shard 2
Shard 3
Shard 4
Shard 5
IP Routing Table
Shard 1
Shard 2
Shard 3
Shard 4
Shard 5
IP routing allows precise control over both the reverse path and the failover path
Traffic is highly asymmetrical; use VLANs to improve hardware usage
IP routing allows more control than RSS- or ECMP-based distribution
VLAN 4 / VLAN 5 / 06:00:00:52:11:45 / 172.25.0.1 06:00:00:02:12:45 / 172.24.0.1
IP Routing Table
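One way to picture the failover path is a routing table that holds two routes per shard: a preferred one through the master core and a backup through the slave core. The sketch below is an assumption-laden model, not the deck's implementation: the `RouteTable` class, the numeric preferences and the choice of cores are all hypothetical.

```python
class RouteTable:
    """Per-prefix list of (preference, next_hop); the lowest preference value wins."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop, preference):
        self.routes.setdefault(prefix, []).append((preference, next_hop))

    def withdraw(self, next_hop):
        # Drop every route through a departed core, e.g. when its routing session ends.
        for prefix in self.routes:
            self.routes[prefix] = [r for r in self.routes[prefix] if r[1] != next_hop]

    def lookup(self, prefix):
        candidates = self.routes.get(prefix, [])
        return min(candidates)[1] if candidates else None

table = RouteTable()
table.announce("shard-1", "172.24.0.1", preference=10)   # master core (assumed)
table.announce("shard-1", "172.24.0.7", preference=20)   # slave core (assumed)

before = table.lookup("shard-1")
table.withdraw("172.24.0.1")          # the master departs
after = table.lookup("shard-1")
print(before, "->", after)
```

With RSS or ECMP the hardware picks the target by hashing, so neither the reverse path nor the backup target can be chosen explicitly; explicit routes make both deterministic.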
![Page 16: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/16.jpg)
16
Our design: benefits
Distribution across servers and across cores is identical
► Simplified implementation
► Performance scales linearly across cores and across servers
Dynamic load-balancing included (dynamic routing + replication)
► Re-balance the load between servers
► Scale out and in as demand evolves: elasticity
[Figure: daily Internet traffic over 24 hours (00:00 to 00:00); unused resources: 75% potential savings (energy, cooling, …)]
![Page 17: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/17.jpg)
17
Benchmarking
Multi-core, multi-server benchmarking tool following the same principles
System under test
(large-scale and multi-server)
Traffic generator
![Page 19: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/19.jpg)
19
Performance
[Figure: bar chart of throughput in Mpps (axis 0 to 100) for Linux (e.g., R-CORD) and Krononat]
Performance (12 cores) for established connections
![Page 20: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/20.jpg)
20
Scalability
[Figure: throughput in Mpps (axis 0 to 80) versus number of cores (1 to 12), performance for established connections; series: Linux, Krononat (without replication), Krononat (with replication), Objective]
Objective: 4.5 Mpps/core
![Page 21: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/21.jpg)
21
Availability - Server departure
[Figure: histogram of occurrences (axis 0 to 80) by service interruption duration (0 to 750 ms)]
Less than 600 ms → below network timeouts
![Page 22: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/22.jpg)
22
Conclusions
Resilient distributed middlebox using COTS hardware
► 77 million packets per second on only 12 cores
• 6.4 Mpps/core, above the objective (4.5 Mpps/core)
► Recover from failures automatically without users noticing
► Cost-effective N+1 redundancy
► Redundancy and dynamic load-balancing allow elasticity
Re-usable design
► Expose each core as a distinct entity to the network
► Push per-core traffic steering to the networking equipment (NICs, switches)
► Applied to multi-server multi-core benchmarking tool
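The per-core figure quoted above can be checked from the two numbers on the slide. A short sketch (variable names are illustrative):

```python
TOTAL_MPPS = 77         # throughput reported on the slide
CORES = 12
OBJECTIVE_MPPS = 4.5    # per-core objective

per_core = TOTAL_MPPS / CORES
ratio = per_core / OBJECTIVE_MPPS

print(f"{per_core:.1f} Mpps/core, {ratio:.2f}x the objective")
```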
![Page 23: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/23.jpg)
23
References
Don’t share, Don’t lock: Large-scale Software Connection Tracking with Krononat
Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot
USENIX ATC’18
Cuckoo++ Hash Tables: High-Performance Hash Tables for Networking Applications
Nicolas Le Scouarnec
ACM/IEEE ANCS’18
![Page 24: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/24.jpg)
APPENDIX
![Page 25: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/25.jpg)
25
Building a distributed software CG-NAT/FW/…
► Bi-directional traffic
► Must filter unknown connections
L4 Load-balancers
► Maglev
► Ananta
► Fastly@NSDI
► SilkRoad
► …
Access network
► No-reverse path traffic (DSR)
► Leverage deterministic hashing
![Page 26: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/26.jpg)
26
Availability: Graceful departure
[Figure: histogram of occurrences (axis 0 to 80) by service interruption duration (0 to 750 ms); annotated phases: failure detection and recovery, load rebalancing]
![Page 27: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash](https://reader033.fdocuments.us/reader033/viewer/2022042117/5e95661dea37665f34553ba4/html5/thumbnails/27.jpg)
27
Availability: Hard Failure
[Figure: histogram of occurrences (axis 0 to 120) by service interruption duration (0 to 6000 ms); annotated phases: failure detection and recovery, load rebalancing]
Less than 7 s → below many network timeouts