Post on 03-Oct-2020
Layer 2 Tunnel
xConnect performance test
About test
The main purpose of this test was to measure router performance in Layer 2 tunnel mode when forwarding frames of different lengths. Some background information is also included for better understanding. The following equipment was used:
Cisco 892/K9, C890 Software (C890-UNIVERSALK9-M), Version 15.1(2)T2, RELEASE SOFTWARE (fc1)
JDSU SmartClass Ethernet Testers.
The Measured L1 Rate [Mbps] values in the tables are rounded, which is sufficient for this review. A more accurate value can be calculated from the measured frame rate.
For example, for 64-byte frames the L1 rate is shown as 14.9 Mbps and the frame rate is 22172 frames per second. The more accurate value is 22172 [frames/s] x 84 [bytes on wire] x 8 [bits per byte] = 14.899584 Mbps.
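The conversion above can be sketched in a few lines of Python. This is only an illustration: the function name and the constant are chosen here, and the 84 bytes on the wire are taken as the 64-byte frame plus the 8-byte preamble and the 12-byte interframe gap.

```python
# Layer 1 overhead per frame: 8-byte preamble + 12-byte interframe gap.
L1_OVERHEAD_BYTES = 20


def l1_rate_mbps(frames_per_second, frame_length_bytes):
    """Return the Layer 1 rate in Mbps for a frame length given with CRC."""
    bytes_on_wire = frame_length_bytes + L1_OVERHEAD_BYTES
    return frames_per_second * bytes_on_wire * 8 / 1_000_000


# 64-byte frames at 22172 frames/s, as in the example above.
print(l1_rate_mbps(22172, 64))
```

The same function can be applied to any row of the result tables to recover the unrounded L1 rate.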
Background
Each layer has its own unit of measure, the PDU (Protocol Data Unit):
Layer 1 Physical Layer - Bit
Layer 2 Data Link Layer - Frame
Layer 3 Network Layer - Packet
Ethernet frame
Ethernet II / DIX
Figure 1 shows the fields and lengths of an Ethernet II/DIX frame.
Field             Length            Included in the Ethernet frame length
Preamble          8 bytes           No (Layer 1 part)
Destination MAC   6 bytes           Yes
Source MAC        6 bytes           Yes
Ethertype         2 bytes           Yes
Payload           46 - 1500 bytes   Yes
CRC-32            4 bytes           Yes
Interframe gap    12 byte times     No (Layer 1 part)
Figure 1. Untagged Ethernet frame
Maximum throughput
An interframe gap is inserted between frames during transmission (Figure 2).
Figure 2. Ethernet interframe gap (12 byte times between consecutive frames)
Preamble 8 bytes + Destination MAC 6 bytes + Source MAC 6 bytes + Ethertype 2 bytes + Payload 1500 bytes + FCS 4 bytes + Interframe gap 12 byte times = 1538 bytes. Thus 1538 byte times on the wire are needed to transmit one 1518-byte untagged frame.
Layer 3 throughput cannot reach 100% of wire speed, and the achievable share depends on packet length.
1538 bytes on the wire are needed to carry 1500 bytes of L3 data: 1500/1538*100% = 97.53% for an untagged frame.
84 bytes on the wire are needed to carry 46 bytes of L3 data: 46/84*100% = 54.76% for an untagged frame. The 64-byte frame itself occupies 64/84*100% = 76.19% of the wire time.
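As a rough sketch of this arithmetic, the following Python snippet computes the share of wire time taken by the L3 data and by the L2 frame for an untagged frame. The function and constant names are illustrative only; the field sizes are those of Figure 1.

```python
PREAMBLE = 8        # bytes, Layer 1
MAC_HEADER = 14     # destination MAC 6 + source MAC 6 + Ethertype 2
CRC = 4             # bytes
INTERFRAME_GAP = 12 # byte times, Layer 1


def bytes_on_wire(payload):
    """Wire time, in byte times, needed for one untagged frame."""
    return PREAMBLE + MAC_HEADER + payload + CRC + INTERFRAME_GAP


def l3_share_percent(payload):
    """Share of wire time that carries L3 data."""
    return payload / bytes_on_wire(payload) * 100


def l2_share_percent(payload):
    """Share of wire time that carries the L2 frame (header + payload + CRC)."""
    frame = MAC_HEADER + payload + CRC
    return frame / bytes_on_wire(payload) * 100


for payload in (46, 1500):
    print(payload, bytes_on_wire(payload),
          round(l3_share_percent(payload), 2),
          round(l2_share_percent(payload), 2))
```

For the smallest payload the gap between the two shares is large (46 of 84 bytes versus 64 of 84 bytes), while for 1500-byte payloads both approach wire speed.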
L2TP encapsulation
L2TP overhead is 38 bytes: new MAC header 14 bytes + new IP header 20 bytes + L2TP header 4 bytes (Figure 3).
Figure 3 L2TP Encapsulation
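To make the overhead concrete, the sketch below derives the tunnel frame length for each frame length used in the tests. It assumes the header breakdown given in the capture analysis later in this document (new MAC header 14 bytes, new IP header 20 bytes, L2TP header 4 bytes); the names are illustrative.

```python
# L2TPv3 encapsulation overhead per frame.
NEW_MAC_HEADER = 14
NEW_IP_HEADER = 20
L2TP_HEADER = 4
L2TP_OVERHEAD = NEW_MAC_HEADER + NEW_IP_HEADER + L2TP_HEADER  # 38 bytes


def tunnel_frame_length(original_frame_length):
    """Length of the tunnel frame (with CRC) for an original frame (with CRC)."""
    return original_frame_length + L2TP_OVERHEAD


for length in (64, 128, 256, 512, 1024, 1280, 1518):
    print(length, tunnel_frame_length(length))
```

The results match the "L2 Tunnel Frame Length" column of the result tables, for example 102 bytes for a 64-byte frame and 1556 bytes for a 1518-byte frame.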
Performance test
L2 Tunnel topology
Topology: L2 Tunnel with redundant paths (Figure 4).
Routing protocol: BGP is used, but the choice of routing protocol is not important in this test.
Test type: Layer 2 RFC2544.
Two variants were tested:
1st variant: Tunnel via Gi0 port
2nd variant: Tunnel via Fa7 port (interface SVI VLAN 100)
Figure 4 Schematic diagram
The Cisco 892 router has 2 routed ports and an 8-port LAN switch. An SVI is configured to create a 3rd routed port.
The Fa7-Fa7 and Gi0-Gi0 connections carry the tunnel and must have a Layer 3 MTU of 1538 bytes (1500 + 38 bytes of L2TP overhead).
The default SVI MTU is 1514 bytes: a 1500-byte L3 packet + a 14-byte L2 MAC header, without CRC.
Because the SVI is involved in tunneling, its MTU must be 1552 bytes (1538 + 14).
NB! A bug with MTU was found during SVI testing. It is described below.
Initial router configurations:
Router R1
interface FastEthernet8
 no ip address
 xconnect 2.2.2.2 123 encapsulation l2tpv3 manual pw-class L2_TUNNEL <- define xConnect
  l2tp id 10 20 <- define tunnel session ID
interface Loopback0
 ip address 1.1.1.1 255.255.255.255
interface GigabitEthernet0
 mtu 1538 <- tunnel MTU
 ip address 192.168.2.1 255.255.255.0
interface FastEthernet7
 switchport access vlan 100
 mtu 1538 <- tunnel MTU
interface Vlan100
 mtu 1552 <- tunnel MTU
 ip address 192.168.1.1 255.255.255.0
pseudowire-class L2_TUNNEL <- define pseudo wire class
 encapsulation l2tpv3 <- define encapsulation
 protocol none <- manual mode
 ip local interface Loopback0 <- use Loopback 0 interface
router bgp 10 <- BGP configuration
 bgp router-id 1.1.1.1
 bgp log-neighbor-changes
 redistribute connected
 neighbor 192.168.1.2 remote-as 20
 neighbor 192.168.2.2 remote-as 20
 neighbor 192.168.2.2 weight 10 <- path via Gi0 is preferred (primary)
 no auto-summary
Router R2
interface FastEthernet8
 no ip address
 xconnect 1.1.1.1 123 encapsulation l2tpv3 manual pw-class L2_TUNNEL <- define xConnect
  l2tp id 20 10 <- define tunnel session ID
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
interface GigabitEthernet0
 mtu 1538 <- tunnel MTU
 ip address 192.168.2.2 255.255.255.0
interface FastEthernet7
 switchport access vlan 100
 mtu 1538 <- tunnel MTU
interface Vlan100
 mtu 1552 <- tunnel MTU
 ip address 192.168.1.2 255.255.255.0
pseudowire-class L2_TUNNEL <- define pseudo wire class
 encapsulation l2tpv3 <- define encapsulation
 protocol none <- manual mode
 ip local interface Loopback0 <- use Loopback 0 interface
router bgp 20 <- BGP configuration
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 redistribute connected
 neighbor 192.168.1.1 remote-as 10
 neighbor 192.168.2.1 remote-as 10
 neighbor 192.168.2.1 weight 10 <- path via Gi0 is preferred (primary)
 no auto-summary
The commands sh processes cpu sorted 1min and sh processes cpu history were used to gather CPU statistics. See the example below:
R1#sh processes cpu sorted 1min | i CPU utilization
CPU utilization for five seconds: 57%/55%; one minute: 38%; five minutes: 17%
R1#sh proc cpu hist
R1 07:02:20 AM Wednesday Jan 28 2015 UTC
5555555555555555555555556666655555555555555511111 333332
777777777666666666666666333336666666666666669999911111333339
100
90
80
70
60 ********************************************
50 ********************************************
40 ********************************************
30 ******************************************** ******
20 ************************************************* ******
10 ************************************************* ******
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
CPU% per second (last 60 seconds)
In the line "CPU utilization for five seconds: 57%/55%", the two values should be read as "Total CPU usage" / "CPU usage caused by traffic".
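When many such samples are collected, the summary line can be split into its fields automatically. The following is a minimal sketch that assumes the line has exactly the format shown in the example output; the function name and dictionary keys are made up for illustration.

```python
import re

# Pattern for the summary line printed by "sh processes cpu".
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_line(line):
    """Extract total and traffic-related CPU usage from the summary line."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total, traffic, one_minute, five_minutes = map(int, match.groups())
    return {
        "total_5s": total,        # "Total CPU usage"
        "traffic_5s": traffic,    # "CPU usage caused by traffic"
        "one_minute": one_minute,
        "five_minutes": five_minutes,
    }


sample = ("CPU utilization for five seconds: 57%/55%; "
          "one minute: 38%; five minutes: 17%")
print(parse_cpu_line(sample))
```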
Test results
Test results. Tunnel via Gi0 port
Original L2 Frame   L2 Tunnel Frame   Measured L1   CPU total   CPU usage caused   Measured Rate
Length [Bytes]      Length [Bytes]    Rate [Mbps]   usage [%]   by traffic [%]     [frames/s]
64                  102               14.9          99          96                 22172
128                 166               26.3          99          96                 22212
256                 294               49.9          99          96                 22554
512                 550               89.2          99          96                 20958
1024                1062              96.5          98          95                 11552
1280                1318              97.1          57          55                 9336
1518                1556              97.6          52          50                 7931
random              -                 70            60          56                 avg. 12443
Table 1 Throughput of L2 Tunnel via Gi0 and CPU utilization
Test results. Tunnel via SVI port
Original L2 Frame   L2 Tunnel Frame   Measured L1   CPU total   CPU usage caused   Measured Rate
Length [Bytes]      Length [Bytes]    Rate [Mbps]   usage [%]   by traffic [%]     [frames/s]
64                  102               10.0          99          97                 14880
128                 166               17.0          99          97                 14358
256                 294               30.0          99          97                 13587
512                 550               59.5          99          97                 13980
1024                1062              96.5          93          92                 11552
1280                1318              97.1          84          83                 9336
1518                1556              14.6          12          11                 1186
Table 2 Throughput of L2 Tunnel via Fa7 and CPU utilization
Performance dropped sharply for 1518-byte frames, which was unexpected. While looking for the cause, I found some interesting information in the buffer leak output:
R2(config)#do sh buffer leak | i Fa8|Header
Header DataArea Pool Size Link Enc Flags Input Output User
85B44984 1EC01544 DMA-1 1542 7 1 20280 Fa8 None L2X Data
85B45BB4 1EC0A944 DMA-1 1542 7 1 20280 Fa8 None L2X Data
85B4892C 1EC21B44 DMA-1 1542 7 1 20280 Fa8 None L2X Data
85B496D0 1EC28A44 DMA-1 1542 7 1 20280 Fa8 None L2X Data
85B4B6A4 1EC38D44 DMA-1 1542 7 1 20280 Fa8 None L2X Data
85B4DB04 1EC4B544 DMA-1 1542 7 1 20280 Fa8 None L2X Data
85B4F1C0 1EC56E44 DMA-1 1542 7 1 20280 Fa8 None L2X Data
......
The L2 tunnel data does not fit into the MTU of the tunnel.
I then captured traffic with Wireshark.
First, frames were captured on the path through the Gi0 interfaces.
1. Source frame (Figure 5). Length 1518 bytes.
Figure 5. Source frame 1518 bytes
The frame size is 1518 bytes. Wireshark shows 1514 bytes, that is, without the CRC. The payload pattern is 0xAA. Each frame contains 4 bytes (red selection) at the end of DATA.
2. Frame from R1 to R2 through interface Gi0 (Figure 6). MTU 1552 bytes.
Figure 6. L2TP frame. Path via Gi0
The MTU is correct. New MAC header 14 bytes + new IP header 20 bytes + L2TP header 4 bytes + original frame without CRC 1514 bytes = 1552 bytes (without CRC). Each frame contains 4 bytes (red selection) at the end of DATA.
3. Frames from R2 to Tester 2, from Tester 2 to R2, from R2 to R1 and from R1 to Tester 1.
The frames are correct and have the right MTU. The overall change in frame size is shown in Figure 7.
Figure 7. Frame length change during transmission through tunnel (via Gi0)
Next, frames were captured on the path through the Fa7 interfaces.
4. Source frames with length 1518 bytes
The frames are the same as in Figure 5.
5. Frame from R1 to R2 through interface Fa7. MTU 1552 bytes.
Router R1 encapsulates the frames into L2TP packets and sends them to router R2 via the SVI and Fa7 ports. This step is correct. The frames have the correct MTU (Figure 8).
Figure 8. L2TP frame. Path through Fa7
NB! The further steps show an anomaly. The overall change in frame size is shown in Figure 9.
Figure 9. Frame length change during transmission through tunnel (via Fa7)
6. Frames from R2 to Tester 2
When router R2 de-encapsulates frames from the tunnel to the output port, it adds 4 bytes to the end of the original frame DATA. See Figure 10.
NB! This anomaly occurs only for frame lengths of 1493 bytes or more with CRC (1489 without CRC). Frames with a length of up to 1492 bytes are forwarded correctly.
Figure 10. Frame from R2 to Tester 2. 4 Bytes are added.
The added DATA is shown in the yellow frame in Figure 10. These 4 bytes lie outside the Layer 3 packet length.
Please note that my NIC passes frames to Windows and Wireshark without the CRC, so the "bytes on wire" value shown by Wireshark does not include the 4 bytes of CRC.
Wireshark decodes the frame and sees that the length of the Layer 3 data is 1500 bytes, but additional bytes are present after the end of the packet. Wireshark assumes that they are the CRC and checks them. The check fails, because these bytes are not the Layer 2 CRC, and the warning [Ethernet Frame Check Sequence Incorrect] is shown.
7. Frames from R2 to R1
Next step. Tester 2 receives the frames with the 4 inserted bytes from R2 and sends them back toward Tester 1.
Because the incoming port Fa8 is a tunnel port, router R2 does not inspect the Layer 3 payload. The router encapsulates all incoming bytes into an L2TP packet and forwards it to router R1 via the tunnel. This procedure does not add any bytes (Figure 11).
Figure 11. L2TP frame. Direction from R2 to R1. Tunnel through Fa7.
8. Frames from R1 to Tester 1
Final step. Router R1 also adds 4 bytes to the end of the "original frame DATA" when it de-encapsulates frames from the tunnel to the output port. The "original frame" from the tunnel already has 4 additional bytes, and router R1 adds 4 more. Frames with a length of 1522 bytes (without CRC) are returned to Tester 1. See Figure 12.
Figure 12. Frame with additional 8 bytes.
The tester receives the long frames but ignores any data after the end of the packet, just as padding is ignored for short packets. The tester therefore does not report errors or lost packets. The only visible sign is that the received frames are counted as >1518/1526, although they should be counted as 1024-1518/1526.
MTU tuning
The MTU was increased on the Fa7 and VLAN 100 interfaces: 1542 bytes for Fa7 and 1556 bytes for VLAN 100. The throughput test was repeated after the MTU increase. Performance is now much better (Table 3). However, the anomaly with additional bytes remains. The router still adds 4 bytes during de-encapsulation from the tunnel to the output port for frames with a length of 1493-1518 bytes.
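One possible reading of why 4 extra bytes of MTU are enough is sketched below. It is an interpretation of the captures, not something stated by the router: it assumes that the 4 bytes added on de-encapsulation travel back through the tunnel as part of the frame (as in step 7), and it models the anomaly with an illustrative threshold of 1493 bytes. All names are hypothetical.

```python
CRC = 4
IP_AND_L2TP_HEADERS = 20 + 4   # new IP header + L2TP header
ANOMALY_THRESHOLD = 1493       # frame length with CRC where the anomaly starts
ANOMALY_BYTES = 4              # bytes added on de-encapsulation


def after_decapsulation(frame_len_with_crc):
    """Model of the observed anomaly: long frames grow by 4 bytes."""
    if frame_len_with_crc >= ANOMALY_THRESHOLD:
        return frame_len_with_crc + ANOMALY_BYTES
    return frame_len_with_crc


def tunnel_ip_packet_size(frame_len_with_crc):
    """Size of the IP packet that carries the frame (without its CRC)."""
    return frame_len_with_crc - CRC + IP_AND_L2TP_HEADERS


source = 1518
forward_packet = tunnel_ip_packet_size(source)      # R1 -> R2
at_tester2 = after_decapsulation(source)            # R2 -> Tester 2
return_packet = tunnel_ip_packet_size(at_tester2)   # R2 -> R1
at_tester1 = after_decapsulation(at_tester2)        # R1 -> Tester 1

print(forward_packet, at_tester2, return_packet, at_tester1)
```

Under these assumptions the forward packet is 1538 bytes and fits the original tunnel MTU, while the return packet is 1542 bytes. That value matches both the buffer size in the buffer leak output and the increased Fa7 MTU, and the frame returned to Tester 1 is 1526 bytes with CRC (1522 without).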
Test results. Tunnel via SVI port
Original L2 Frame   L2 Tunnel Frame   Measured L1   CPU total   CPU usage caused   Measured Rate
Length [Bytes]      Length [Bytes]    Rate [Mbps]   usage [%]   by traffic [%]     [frames/s]
64                  102               10.0          99          97                 14880
128                 166               17.0          99          97                 14358
256                 294               30.0          99          97                 13587
512                 550               59.5          99          97                 13980
1024                1062              96.5          93          92                 11552
1280                1318              97.1          84          83                 9336
1518                1556              97.3          72          71                 7908
random              -                 60            64          60                 avg. 11530
Table 3. L2 Tunnel throughput and CPU utilization
Summary
Graph 1 is the summary graph for the RFC2544 tests and shows the maximum throughput. Graph 2 shows router performance for random frame lengths, which gives a result closer to real-world throughput.
Graph 1 Throughput and CPU utilization.
[Graph: L1 throughput [Mbps] and CPU utilization [%] versus packet length [Bytes]. Series: L1 throughput, tunnel via Fa7; L1 throughput, tunnel via Gi0; CPU utilization, tunnel via Fa7; CPU total usage, tunnel via Gi0. The plotted values are those of Table 1 and Table 3.]
Graph 2. Throughput and CPU utilization for random frames
[Graph: L1 rate [Mbps] for random frame lengths. Tunnel via Gi0: 70 Mbps, CPU utilization 60%. Tunnel via Fa7: 60 Mbps, CPU utilization 64%.]
Juri Jestin
9.02.2014