Luca Deri <deri@{unipi.it,ntop.org}>
IP Traffic Monitoring at 10 Gbit and above
Luca Deri <[email protected]> - April 2008
• Introduction to 10 Gbit monitoring.
• Lessons learnt by the author while monitoring a 10 Gbit link using nProbe pre-5.x, his home-grown NetFlow probe.
• Overview of solutions for monitoring faster networks (40 and 100 Gbit).
Talk Overview
• 2003-05: IST Scampi Project (1 Gbit)
• 2005-07: IST Lobster Project (2.5 Gbit)
• 2008: Capturing and analysing traffic at 1 Gbit using commodity hardware is feasible and widespread.
Some Background
• For a few years, 10G has been used for SAN (storage area networks) and clustering applications in various flavors (e.g. Myri 10G).
• The initial 10G standard was published in 2002 and later consolidated into the IEEE 802.3-2005 standard.
• 10 Gbit is available with various PHYs (6 for fiber, 3 for copper); the most popular and cheapest is 10GBASE-SR (850 nm fiber).
10G Technology Overview [1/2]
• Retention of 802.3 MAC and frame format
• Different from other versions of Ethernet
  • No half duplex mode
  • 10G only: no 10/100/1000/10G
• Works with 802.1Q, 802.3ad, etc.
• 10 GE is still an emerging technology with only 1 million ports shipped in 2007.
• PC adapter prices are falling (< 1000 Euro); PCI-X adapters are being replaced by PCIe.
10G Technology Overview [2/2]
• High number of packets to be analyzed (10 times as much as 1 Gbit).
• CPU-based traffic analysis (as happens in most router-based NetFlow probes) is not feasible at these speeds.
• Packet filtering is very important, in particular on WANs, in order to discard early those packets that do not need to be analyzed.
10 Gbit Monitoring Challenges
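To put the packet-rate challenge in numbers, the sketch below computes the theoretical maximum Ethernet frame rate for a given link speed and frame size. It assumes the standard 20 bytes of per-frame wire overhead (8-byte preamble plus 12-byte inter-frame gap); the function name and the chosen frame sizes are illustrative and not taken from the talk.

```python
# Per-frame overhead on the wire: 8-byte preamble/SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frame rate on an Ethernet link.

    frame_bytes is the full frame size including the 4-byte CRC.
    """
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


for label, bps in (("1 Gbit", 1e9), ("10 Gbit", 10e9)):
    for size in (64, 512, 1518):
        mpps = max_frames_per_second(bps, size) / 1e6
        print(f"{label:8s} {size:5d}-byte frames: {mpps:6.2f} Mpps")
```

With minimum-size 64-byte frames a 10 Gbit link can carry roughly 14.88 Mpps, which leaves about 67 ns to handle each packet.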
• Multiple form-factor and interface variants
  • Multi-port T1/E1 to 10GbE and OC-192/STM-64
  • 40 Gig MPLS/PoS/SONET via 40G1
  • TDM, SONET, PoS, ATM, Ethernet
  • PCI to PCIe
  • Half-size and full-size
• Varied OS support
  • Most Linux distributions; Windows; FreeBSD; Solaris
• Totally secure and transparent
  • i.e. no MAC address
  • No layer 2 participation
The Endace DAG
[Figure: NinjaBox data path. A 1GbE / 10GbE interface feeds packet filters, a hash function, and a buffer (color or drop); an inverse multiplexer load-balances a single high-speed segment across CORE1 to CORE8, each running one application instance (SNORT1 to SNORT8). Stages: Condition & Steer, Load Balance, Polled DMA, Memory Map, CPU Cores & OS, Application(s).]
• Load balance (with session continuity) between multiple instances of a common application.
• 5-tuple filtering: steer distinct traffic to independent analysis tools.
• Duplicate / clone complete data to different applications running on distinct CPUs.
• Flexible load balancing and duplication for increased deployment flexibility.
[Figure: three operating modes. Packet Filtering: SNORT1 (a), SNORT2 (b), ... SNORT8 (h). Packet Duplication (cloning function): App. ‘A’ 1, App. ‘B’ 2, ... App. ‘H’ 8. Duplicate and Balance (clone and I-MUX): nProbe1 (a), nProbe2 (a), ... nProbe8.]
NinjaBox: Balance, Dup,
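Load balancing "with session continuity" means that every packet of a flow, in both directions, reaches the same application instance. The talk does not describe the hash the DAG hardware uses, so the sketch below is only one way to obtain that property: it sorts the two endpoints before hashing, with CRC32 as a stand-in hash. All names are hypothetical.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def channel_for(flow: FiveTuple, num_channels: int = 8) -> int:
    """Map a flow to a channel; both directions of a session get the same one."""
    # Order the two endpoints so that A->B and B->A produce the same key.
    lo, hi = sorted(((flow.src_ip, flow.src_port), (flow.dst_ip, flow.dst_port)))
    key = f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{flow.proto}".encode()
    # CRC32 is an assumption made for this sketch, not the DAG's actual hash.
    return zlib.crc32(key) % num_channels


fwd = FiveTuple("10.0.0.1", "192.168.1.9", 40000, 80, 6)
rev = FiveTuple("192.168.1.9", "10.0.0.1", 80, 40000, 6)
print(channel_for(fwd), channel_for(rev))  # the two values are identical
```

Because the channel depends only on the flow key, each of the 8 probe instances sees complete sessions and can build flow records without sharing state with the others.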
• NinjaBox based on dual 2.0 GHz Xeon E5335 (8 Cores), and Fedora Linux 64 bit.
• 10 Gbit DAG 8.2Z
• Started one nProbe instance on each of the 8 DAG channels (no DAG code optimizations).
• Smartbit traffic generator
• 1’000’000 IP addresses, 111-byte packets
• 8’648.64 Mbps (100% utilisation)
nProbe-DAG Test Setup
Flow Sampling   nProbe CPU Load   System Load
None            100%              8.19
1:2             73%               6.88
1:5             65%               4.84
1:10            50%               3.45
Note: no packet sampling has been used.
nProbe-DAG Test Result [1/2]
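The talk does not say how nProbe implements 1:N flow sampling. As a minimal sketch of one plausible reading, the deterministic sampler below keeps one flow out of every N that it sees; the class and method names are hypothetical and are not nProbe's API.

```python
class FlowSampler:
    """Deterministic 1:N sampler: keeps one flow out of every N seen."""

    def __init__(self, rate: int):
        if rate < 1:
            raise ValueError("sampling rate must be >= 1")
        self.rate = rate
        self.seen = 0

    def keep(self) -> bool:
        """Return True if the flow just observed should be processed."""
        self.seen += 1
        return self.seen % self.rate == 0


for rate in (1, 2, 5, 10):
    sampler = FlowSampler(rate)
    kept = sum(sampler.keep() for _ in range(1000))
    print(f"1:{rate:<2d} keeps {kept} of 1000 flows")
```

A rate of 1 corresponds to the "None" row of the table (every flow is processed).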
• Worst case test setup: tiny packets (111 bytes), short flow duration (1 min), 1 million IP address spread.
• Packet balancing across 8 nProbes/cores.
• Peak nProbe performance: 100% Packet capture and flow processing up to ~6 Mpps with no sampling.
• Using packet or flow sampling, loss is very limited if any (depends on sampling rate).
• nProbe-DAG is basically able to analyze 10Gbit with no loss using real-life traffic.
nProbe-DAG Test Result [2/2]
• 64-core CPU.
• Linux 2.6-based operating system running on board.
• Programmable in C with limited C++ support.
Tilera TILExpress64
TILExpress64 Architecture
• No need to capture packets as happens with PCs.
• 12 x 1 Gbit, or 6 x 1Gbit and 1 x 10 Gbit Interfaces (XAUI connector).
• Ability to boot from flash for creating stand-alone products.
TILExpress64 Features
• Code porting is required in order to exploit multi-core rather than multi-thread parallelism.
• Implemented a libpcap layer that hides Tile internals from the nProbe core, hence simplifying the port.
• nProbe can start either from host-PC or flash (stand-alone NetFlow probe).
TILExpress64 nProbe [1/2]
• Input traffic on the 1/10G connector(s), output flows either using one board interface or via host-PC ethernet.
• Tilera tested nProbe at 10 Gbit. They were able to keep up with network speed at 10 Gbit using a limited number of tiles (room for growth).
TILExpress64 nProbe [2/2]
cPacket’s cTap [1/6]
[Figure: cTap block diagram. Two 10G Rx/Tx port pairs; a core performing Filtering, Time Stamp, and Forwarding; four 10G Tx outputs with Optional Filtering; 1G Rx/Tx ports for config and stats.]
• Cost-effective smart “bump-in-wire” device able to handle 2 x 10 Gbit links. Scalable to 40 and 100 Gbit.
• Ability to operate at wire speed with any packet size.
• Full header and payload filtering, with regex search.
• Support for “biased” sampling.
cPacket’s cTap [2/6]
• Provisioning via Web, CLI or network for seamless integration into existing applications.
• Dynamic filter (re)configuration.
cPacket’s cTap [3/6]
• Biased sampling allows cTap to be a great solution for tackling security/DoS attacks or for monitoring a portion of the traffic flowing in a WAN trunk.
• Great and cheap solution for scaling existing applications to higher speeds.
cPacket’s cTap [4/6]
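The slides do not define "biased" sampling precisely. One possible interpretation, sketched below, is that the sampling rate depends on the traffic class: packets matching a filter of interest (for example suspected DoS traffic) are all kept, while the rest of the trunk is sampled sparsely. The class names and rates are invented for illustration and do not describe the cTap's actual configuration.

```python
from collections import Counter


class BiasedSampler:
    """Per-class 1:N sampling: interesting classes get a denser sample."""

    def __init__(self, rates: dict, default_rate: int):
        self.rates = rates              # traffic class -> N (keep 1 out of N)
        self.default_rate = default_rate
        self.counters = Counter()       # packets seen so far, per class

    def keep(self, traffic_class: str) -> bool:
        rate = self.rates.get(traffic_class, self.default_rate)
        self.counters[traffic_class] += 1
        return self.counters[traffic_class] % rate == 0


# Hypothetical policy: keep every TCP SYN, but only 1 in 100 of everything else.
sampler = BiasedSampler({"tcp_syn": 1}, default_rate=100)
kept_syn = sum(sampler.keep("tcp_syn") for _ in range(500))
kept_other = sum(sampler.keep("other") for _ in range(500))
print(kept_syn, kept_other)
```

The probe behind the tap then receives all of the traffic it cares most about, plus a thin sample of the remainder, instead of the full trunk.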
cPacket’s cTap [5/6]
• Preprocessing and pre-filtering improve performance.
• Separate traffic into “relevant”, “irrelevant” and “unknown”.
• Add tag/digest, save SW cycles, alleviate bottlenecks.
• Distribute workload to multiple resources (hardware or virtual).
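The pre-filtering idea in the bullets above can be sketched as a cheap header-only classifier that drops irrelevant packets and tags the rest, so that later stages do not have to parse them again. The policy (ports, protocols) and all names below are hypothetical; the cTap does this in hardware and its rule language is not shown in the talk.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    RELEVANT = "relevant"
    IRRELEVANT = "irrelevant"
    UNKNOWN = "unknown"


# Hypothetical policy, for illustration only.
RELEVANT_PORTS = {80, 443}     # traffic the probe must analyze
IRRELEVANT_PROTOCOLS = {89}    # e.g. OSPF routing chatter (IP protocol 89)


def classify(proto: int, dst_port: Optional[int]) -> Verdict:
    """Header-only pre-filter run before any expensive analysis."""
    if proto in IRRELEVANT_PROTOCOLS:
        return Verdict.IRRELEVANT
    if proto in (6, 17) and dst_port in RELEVANT_PORTS:  # TCP or UDP
        return Verdict.RELEVANT
    return Verdict.UNKNOWN


def preprocess(packets):
    """Drop irrelevant packets and tag the others with their verdict."""
    for proto, dst_port in packets:
        verdict = classify(proto, dst_port)
        if verdict is not Verdict.IRRELEVANT:
            yield {"proto": proto, "dst_port": dst_port, "tag": verdict.value}


packets = [(6, 80), (89, None), (17, 5353), (6, 443)]
tagged = list(preprocess(packets))
print(tagged)
```

Downstream software can then dispatch on the tag alone, for instance sending "unknown" traffic to a slower, more thorough analysis path.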
cPacket’s cTap [6/6]
Despite its name, cTap delivers more than just a tap:
• Simplicity: filter, forward, balance.
• Speed: it operates at wire rate, with any packet size, no packet loss, full payload inspection.
• Cost: < 10’000 Euro, CT-20G model.
Intel has recently introduced a few innovations in its Xeon 5000 chipset that accelerate network applications:
• I/O Acceleration Technology (I/OAT)
• QuickData Technology
• Direct Cache Access (DCA)
• MSI-X, low latency interrupts and load balancing across multiple RX queues.
Intel and 10G [1/2]
In order to accelerate the capture process, the author has implemented a new Linux driver for Intel 10G PCIe adapters that features:
• Multithreaded packet capture (one thread per RX queue, per adapter).
• Packet RX load balancing across cores: one core, one RX ring.
• Driver-based packet filtering (in-core).
• RX queue virtualization (work in progress).
Accelerating 10G
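The "one thread per RX queue" design can be illustrated with a purely structural simulation: each RX ring has a dedicated consumer thread and its own counter, so no lock is shared between consumers. This is not the author's driver code (which lives in the Linux kernel); the queue count, the modulo steering, and all names are assumptions made for the sketch.

```python
import queue
import threading

NUM_RX_QUEUES = 4   # assumption: illustrative value, not from the talk
STOP = None         # sentinel telling a worker that its ring is drained


def capture_worker(rx_ring: queue.Queue, counts: list, idx: int) -> None:
    """Consume packets from one RX ring; each worker owns exactly one ring."""
    while True:
        pkt = rx_ring.get()
        if pkt is STOP:
            break
        counts[idx] += 1  # per-ring counter, written only by this worker


rx_rings = [queue.Queue() for _ in range(NUM_RX_QUEUES)]
counts = [0] * NUM_RX_QUEUES
workers = [
    threading.Thread(target=capture_worker, args=(ring, counts, i))
    for i, ring in enumerate(rx_rings)
]
for w in workers:
    w.start()

# Simulated NIC: steer each packet to a ring based on its flow identifier,
# so that all packets of a flow are handled by the same thread.
for flow_id in range(1000):
    rx_rings[flow_id % NUM_RX_QUEUES].put(flow_id)
for ring in rx_rings:
    ring.put(STOP)
for w in workers:
    w.join()

print(counts, sum(counts))
```

In a real deployment each consumer would also be pinned to its own core, matching the "one core, one RX ring" bullet.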
• The testbed is an IXIA 400T traffic generator with 4 x 1 Gbit ports mixed into a 10G port using an HP ProCurve 3400cl-24 switch.
• Using the accelerated driver and nProbe, it has been possible to handle full 4 Gbit traffic with no loss and low CPU usage (<< 10% load).
• A new testbed will be set up in order to produce more test traffic.
• Joint work with UCL’s Click group.
Preliminary 10G Tests
Part 5: Beyond 10 Gbit
The old principle of “divide et impera” is still valid. Some solutions include:
• Endace NinjaBox 40G
• cPacket cTap
Beyond 10 Gbit
OC-768 / PoS Packet Capture
[Figure: optical tap and capture chain. Labels: 10G Metro Optical; 40G Backbone Optical; MPLS/PoS Optical Transport Switch; Router (Non-OTH); single C-Band λ drop or 1550nm; χ: splitter, loss = γ; C-Band EDFA: γ + amplifier gain; Transponder: colored λ in, optical to electrical (O-E), electrical out; 40G Framer: parse SONET / PoS; Timestamp: append ERF w/timestamp; Classify & Color: PPP or MPLS over PPP, optional drop, colorized packets; Inverse Mux: steer to 1 of 4 outputs, 10GbE w/encapsulated ERF.]
4 x 10GbE platforms capture and store output for further interrogation.
NinjaProbe 40G1
cTap
• Approach similar to NinjaBox.
• Traffic reduction facilities.
• Traffic can be balanced based on filtering rules (both header and payload).
• Behavioral traffic profiling leveraging built-in counters.
• Ability to scale to 40 and 100 Gbit.
• Endace is the only solution that allows 10G to be monitored at (almost) any packet size.
• There is some packet loss with tiny packets and nProbe, but it can be overcome with faster Xeons or with flow/packet sampling.
• Linux-based development, no need to port code, every pcap-based application can be accelerated at almost no cost.
Summary: Endace
• Excellent for building stand-alone PC-less monitoring solutions.
• Code porting is required, but the learning curve is not steep.
• Not as performant as Endace, but the new-generation Tile64 chip should be twice as fast.
• Lack of native, on-board 10G connector.
• Not suitable for single-threaded applications, as they cannot take advantage of multiple cores.
Summary: Tilera
• Innovation happens here: Intel introduces new controllers/boards every month.
• 64 Xeon CPU announced for 4Q 2008.
• Only solution able to deliver multi-Gbit monitoring at very low cost.
• Not yet able to run at 10G with small packets, but the gap is getting smaller.
• Almost linearly scalable with number of CPU cores (same as Endace).
Summary: Commodity Hw
• Suitable for NetFlow monitoring 100% of traffic (through balancing).
• Biased sampling is useful for tracking traffic peaks and DoS attacks without flooding the probes.
• Low cost: ideal for moving to 10G and above without investing much money or porting existing apps.
• Scalability at 40 and 100 Gbit.
Summary: cTap
• 10 Gbit NetFlow monitoring (not just packet capture) is possible using open source probes such as nProbe.
• The same code has been ported to all of these platforms.
• Scaling to 40 and 100 Gbit is also possible.
• The new challenges are now on the collector side: will it be able to handle
Summary
• http://www.ntop.org/nProbe.html
• http://www.endace.com
• http://www.tilera.com
• http://www.cpacket.com
• http://www.intel.com/network/connectivity/
References