Introducing FreeBSD in new environment of FreeBSD... · pfSense FreeNAS Niksun ...
FreeBSD Network Stack Optimizations for Modern...
Transcript of FreeBSD Network Stack Optimizations for Modern...
![Page 1: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/1.jpg)
FreeBSD Network Stack Optimizations for
Modern HardwareRobert N. M. WatsonFreeBSD Foundation
EuroBSDCon 2008
![Page 2: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/2.jpg)
Introduction
• Hardware and operating system changes
• TCP input and output paths
• Hardware offload techniques for TCP
• Software optimizations
2
![Page 3: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/3.jpg)
Changes in hardware
• Memory access dominates computation
➡ Cache-centered design
• Per-CPU performance growth slows
➡ Trend towards multi-processing
• Network speed growth outpaces CPUs
➡ Trend towards hardware offload (again)
3
![Page 4: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/4.jpg)
OS design responses
• Manage cache footprint and contention
• Balance of instructions vs cache misses
• Develop techniques to exploit parallel processing with increasing core density
• Network stack support for offloaded protocol assist
4
![Page 5: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/5.jpg)
TCP input and output
5
![Page 6: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/6.jpg)
TCP
6
• Transmission Control Protocol (TCP)
• Stream protocol: reliable, ordered delivery
• Acknowledgement, retransmission, congestion control, data checksums
![Page 7: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/7.jpg)
TCP input path
7
Hardware UserspaceKernel
ithread netisr software ithread user thread
Device ApplicationLinker layer
+ driverIP TCP + Socket Socket
Data stream to
application
Validate checksum,
strip IP header
Validate checksum, strip
TCP header
Reassemble segments, deliver to
socket
Interpret and strips link
layer header
Kernel copies out mbufs +
clusters
Receive, validate
checksum
Look up socket
![Page 8: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/8.jpg)
TCP output path
8
Userspace Kernel Hardware
user threadithread
2k, 4k, 9k, 16k
256
MSS MSS
Data stream from
application
TCP segmentation, header encapsulation,
checksum
IP header encapsul-
ation, checksum
Kernel copies in data to mbufs +
clusters
Application Socket TCP IPLink layer +
driver
Checksum + transmit
Ethernet frame encapsulation,
insert in descriptor ring
Device
![Page 9: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/9.jpg)
Hardware offload techniques for TCP
9
![Page 10: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/10.jpg)
Hardware offload
• Reduce instructions and cache misses
• Input and output checksum offload
• TCP segmentation offload (TSO)
• TCP large receive offload (LRO)
• Full TCP offload (TOE)
• Not discussed: iSCSI offload, RDMA
10
![Page 11: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/11.jpg)
TCP checksum offload
• Generate / validate TCP/UDP/IP checksums
• Reduce CPU instructions, cache misses
• Input path: hardware validates checksums
• Output path: hardware generates checksums
11
![Page 12: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/12.jpg)
TCP inputchecksum offload
12
Hardware UserspaceKernel
ithread user thread
Device ApplicationLinker layer
+ driverIP TCP + Socket Socket
Data stream to
application
Strip IP header
Interpret and strips link
layer header
Kernel copies out mbufs +
clusters
Receive, validate TCP, IP, ethernet
checksums
Look up socket
Strip TCP header
Reassemble segments, deliver to
socket
Move checksum validation fromnetwork stack to hardware
![Page 13: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/13.jpg)
TCP input checksum implementation
• Drivers declare support in capability flags on interface (if_hwassist)
• Checksum results (pass/fail) placed in receive descriptor entry by card
• Input routines check test for hardware-validated checksums, skipping software if possible
13
![Page 14: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/14.jpg)
TCP input checksum implementation
14
if (m->m_pkthdr.csum_flags & CSUM_DATA_VALID) { if (m->m_pkthdr.csum_flags & CSUM_PSEUDO_HDR) th->th_sum = m->m_pkthdr.csum_data; else th->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htonl( m->m_pkthdr.csum_data + ip->ip_len + IPPROTO_TCP)); th->th_sum ^= 0xffff;} else { len = sizeof(struct ip) + tlen; ... th->th_sum = in_cksum(m, len);}if (th->th_sum) goto drop;
tcp_input()
![Page 15: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/15.jpg)
TCP outputchecksum offload
15
Userspace Kernel Hardware
user threadithread
Data stream from
application
TCP segmentation, header encapsulation
IP header encapsulation
Kernel copies in data to mbufs +
clusters
Application Socket TCP IPLink layer +
driver
Checksum TCP, IP,
ethernet + transmit
Ethernet frame encapsulation,
insert in descriptor ring
Device
2k, 4k, 9k, 16k
256
MSS MSS
Move checksum generation fromnetwork stack to hardware
![Page 16: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/16.jpg)
Output checksum implementation
• TCP layer doesn’t know if offload available on interface until routing decision made
• Defer checksums to IP output routine
• mbuf header flags track deferral state
• Device driver pushes higher layer state into hardware via transmit descriptors
16
![Page 17: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/17.jpg)
Output checksum implementation
17
m->m_pkthdr.csum_flags = CSUM_TCP; m->m_pkthdr.csum_data = offsetof(struct tcphdr, th_sum); th->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htons(sizeof(struct tcphdr) + IPPROTO_TCP + len + optlen));
m->m_pkthdr.csum_flags |= CSUM_IP; sw_csum = m->m_pkthdr.csum_flags & ~ifp->if_hwassist; if (sw_csum & CSUM_DELAY_DATA) { in_delayed_cksum(m); sw_csum &= ~CSUM_DELAY_DATA; } m->m_pkthdr.csum_flags &= ifp->if_hwassist;
tcp_output()
ip_output()
![Page 18: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/18.jpg)
TCP large receive offload (LRO)
• TCP reassembles segments into streams
• Significant per-packet processing overhead for socket lookup, reassembly, delivery
• LRO: driver reassembles sequential packets to reduce number of “packets” processed
• Software implementation shared by drivers
18
![Page 19: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/19.jpg)
TCP LRO
19
Hardware Kernel
ithread user thread
Userspace
Linker layer + driverIP TCP + Socket Socket
Strip IP header
Interpret and strips link
layer header
Kernel copies out mbufs +
clusters
Receive, validate ethernet, IP, TCP
checksums
Reassemble segments
Application
Data stream to
application
Look up and deliver to socket
Strip TCP header
Move TCP segment reassemblyfrom network protocol to device driver
Device
![Page 20: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/20.jpg)
TCP LRO implementation
20
while ((staterr & E1000_RXD_STAT_DD) && (count != 0) && (ifp->if_drv_flags & IFF_DRV_RUNNING)) {... if ((!lro->lro_cnt) || (tcp_lro_rx(lro, m, 0))) { /* Pass up to the stack */ IGB_RX_UNLOCK(rxr); (*ifp->if_input)(ifp, m); IGB_RX_LOCK(rxr); i = rxr->next_to_check; }...while (!SLIST_EMPTY(&lro->lro_active)) { queued = SLIST_FIRST(&lro->lro_active); SLIST_REMOVE_HEAD(&lro->lro_active, next); tcp_lro_flush(lro, queued);}
igb_rxeof()
![Page 21: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/21.jpg)
TCP segmentation offload (TSO)
• TCP transmit processing converts data stream into a series of packets
• Per-packet overhead is significant
• Pass large chunks of data to card, let it perform segmentation based on MSS
21
![Page 22: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/22.jpg)
TSO
22
Userspace Kernel Hardware
user threadithread
Data stream from
application
TCP header encapsulation
IP header encapsulation
Kernel copies in data to mbufs + clusters
Application Socket TCP IPLink layer +
driver
Checksum + transmit
Ethernet frame encapsulation,
insert in descriptor ring
Device
2k, 4k, 9k, 16k
MSS MSS
TCP segmentation
Move TCP segmentation fromTCP layer to hardware
![Page 23: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/23.jpg)
TSO implementation
• TCP layer detects TSO support, generates segments exceeding interface MTU
• TCP tags packet with CSUM_TSO and MSS so device can implement segmentation
• If TSO is disabled/route changes, ip_output() returns error, MSS recalculated
• Device driver tags TSO data in descriptor
23
![Page 24: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/24.jpg)
TSO implementation
24
if (len > tp->t_maxseg) { if ((tp->t_flags & TF_TSO) && tcp_do_tso && ((tp->t_flags & TF_SIGNATURE) == 0) && tp->rcv_numsacks == 0 && sack_rxmit == 0 && tp->t_inpcb->inp_options == NULL && tp->t_inpcb->in6p_options == NULL && ipsec_optlen == 0) tso = 1; else { len = tp->t_maxseg; sendalot = 1; tso = 0; }}...if (tso) { m->m_pkthdr.csum_flags = CSUM_TSO; m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;}
tcp_output()
![Page 25: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/25.jpg)
TCP Offload Engine (TOE)
• Hardware implements most or all of TCP
• TCP state machine, bindings
• Segmentation reassembly
• Acknowledgement, pacing
• Sockets deliver directly to driver/hardware
• Driver delivers directly to sockets
25
![Page 26: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/26.jpg)
TOE input
26
Hardware Kernel
ithread user thread
Userspace
Driver + socket layer
Strip IP header
Interpret and strips link
layer header
Kernel copies out mbufs +
clusters
Receive, validate ethernet, IP, TCP
checksums
Reassemble segments
Application
Data stream to
application
Look up and deliver to socket
Strip TCP header
Move all portions of TCP processingup until socket deliver to hardware
Device
![Page 27: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/27.jpg)
TOE output
27
Userspace Kernel Hardware
user threadithread
Data stream from
application
TCP header encapsulation + checksum
IP header encapsulation + checksum
Kernel copies in data to mbufs + clusters
Application Socket
Ethernet frame encapsulation + checksum +
transmit
DeviceDriver
Insert in descriptor
ring
Move all portions of TCP processingafter socket send into hardware
2k, 4k, 9k, 16k
MSS MSS
TCP segmentation
![Page 28: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/28.jpg)
Side effects of offload
• “Layering violations” not invisible to users
• Hardware bugs harder to work around
• Stack instrumentation below socket
• BPF, firewalls, traffic management, etc
• TCP protocol behavior
• Not all TOEs equal: SYN, TIMEWAIT, etc.
28
![Page 29: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/29.jpg)
Software structure optimizations
29
![Page 30: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/30.jpg)
Direct dispatch
• Stack input processing in three contexts
• interrupt thread (device driver, link layer)
• netisr thread (protocol, socket deliver)
• user/kernel thread (socket copy out)
• netisr limits parallelism, so directly dispatch protocol from interrupt context
30
![Page 31: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/31.jpg)
Direct dispatch
31
Hardware UserspaceKernel
ithread user thread
ithread netisr software ithread user thread
netisr dispatch
Direct dispatch
ApplicationLinker layer
+ driverIP TCP + Socket Socket
Data stream to
application
Reassemble segments, deliver to
socket
Interpret and strips link
layer header
Kernel copies out mbufs +
clusters
Receive, validate
checksum
Device
Look up socket
Validate checksum,
strip IP header
Validate checksum, strip
TCP header
![Page 32: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/32.jpg)
SMPng Project
• FreeBSD 3.x introduced SMP support
• Userspace on multiple processors
• Kernel on a single processor at a time
• FreeBSD 5.x introduced SMPng
• Fine-grained kernel locking, parallelism
• 6.x, 7.x increase maturity and performance
32
![Page 33: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/33.jpg)
UMA - Universal Memory Allocator
• Kernel slab allocator (Bonwick 1994)
• Mature object life cycle: amortize initialization and destruction costs
• Per-CPU caches: encourage locality by allocating memory from CPU where last freed, avoid lock contention
33
Memory consumers (mbufs, sockets, ...)
UMA zone
Virtual memory
Zone Cache
CPU 0 Cache CPU 1 Cache
Alloc bucket
Free bucket
Alloc bucket
Free bucket
Bucket BucketBucket
CPU 0Consumer
CPU 1Consumer
![Page 34: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/34.jpg)
Multi-processor network stack
• Apply SMPng architecture to network stack
• Identify and lock key data structures
• Create new opportunities for parallelism
• Project essentially complete, although optimization continues
• All protocols and network device drivers run without the Giant lock
34
![Page 35: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/35.jpg)
From mutual exclusion to read-write locking
• Initial locking strategy largely mutexes
• “Mutual exclusion”
• Many data structures require stability rather than mutability in most uses
• rwlock primitive optimized in FreeBSD 7.1
• Many mutexes are becoming rwlocks
35
![Page 36: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/36.jpg)
UDP receive and transmit optimization• FreeBSD 7.1: fully parallel UDP receive and
transmit at socket and protocol layers
• rwlocks for mostly read-only state: connection lists and connections
• Specialized socket send/receive functions avoid stream socket overhead
• Limited by routing, hardware queues
36
![Page 37: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/37.jpg)
Read-mostly locks
• New primitive in FreeBSD 8.x
• Synchronization optimized for reads
• Address/connection lists, firewall rules, ..
• Read acquire with non-atomic instruction
• Write synchronizes CPUs using inter-processor interrupts (IPIs)
37
![Page 38: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/38.jpg)
Hardware devices as a source of contention
• Locking not just for data structures
• I/O to hardware also needs serialization for non-atomic sequences
• Ethernet cards one (or more) receivers, transmitters
• Serialization leads to contention, lack of software parallelism
38
![Page 39: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/39.jpg)
Multiple input queues• Parallelism for hardware
input interface
• Device delivers flows into independent queues using hashing scheme to maintain
• Each queue assigned an ithread to process input queue, allowing parallel processing of input
39
![Page 40: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/40.jpg)
Multiple output queues
• Hardware exposes multiple independent output queues
• OS assigns work based on flows to maintain ordering relative to flow
• Moderates outgoing queue lock contention
40
![Page 41: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/41.jpg)
Where next?
• Continue to optimize locking primitives
• Improve granularity on route locking
• Multiple output queue support (8.0)
• Weak CPU affinity on connections (8.0)
• Hashed locks on global structures (8.0)
• Zero-copy BPF (7.2), sockets (today)
41
![Page 42: FreeBSD Network Stack Optimizations for Modern Hardwarerobert/freebsd/2008eurobsdcon/20081019-eurobsdcon-networking.pdfFreeBSD Network Stack Optimizations for Modern Hardware Robert](https://reader030.fdocuments.us/reader030/viewer/2022040220/5e2568262705735feb67a892/html5/thumbnails/42.jpg)
Conclusion
• Hardware changes motivate significant operating system changes
• Cache-centric design
• Parallelism
• Hardware offload
• Progression over FreeBSD versions as use of techniques introduced and refined
42