1 Raffaello Secchi SPECTS 2005 – July 27, 2005 BRUTE: A High Performance and Extensible Traffic...


1 Raffaello Secchi SPECTS 2005 – July 27, 2005

BRUTE: A High Performance and Extensible Traffic Generator

Nicola Bonelli, Stefano Giordano, Gregorio Procissi, Raffaello Secchi

Department of Information Engineering, University of Pisa


Outline

● Motivations
● BRUTE Features
● Architecture design and internals
● Implementation issues
● Extensibility and modularity
● Script language
● Application Programming Interface (API)
● Programming library
● Macros
● Traffic modules
● Performance evaluation
● Fast Ethernet Scenario
● Gigabit Ethernet Scenario
● Conclusions


Motivations & Requirements

● Current open-source software tools are not suitable for high-speed networks:
  ● poor performance in terms of generated frames per second
  ● poor timing/rate accuracy in traffic generation

● Requirements:
  ● high performance and precision
  ● extensibility
  ● configurability
  ● RFC 2544 compliance

● We developed a tool that …
  ● generates high-speed flows over Fast Ethernet and Gigabit Ethernet
  ● is extensible through a modular architecture
  ● is configurable through an ad-hoc script language
  ● is IP-version independent (IPv4 and IPv6)


BRUTE Features

● What is BRUTE?
  ● BRUTE is a Linux user-space, real-time traffic engine operating at layer 2 and layer 3

● High performance
  ● Saturates a Fast Ethernet link with minimum-size (64-byte) frames
  ● Saturates a Gigabit Ethernet link with 128-byte frames

● Configuration
  ● Flexible script language that lets the user define customized traffic patterns

● Extensible design
  ● Traffic modules (C language)
  ● API (library functions and macros)
    ● Frame building, memory allocation, socket handling
    ● IP checksum
    ● Reliable statistical distributions
    ● Timing resources
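The API bullet lists an IP checksum helper. As an illustration, here is a minimal sketch of the standard Internet checksum (RFC 1071). The name `ip_checksum` and its signature are hypothetical, not BRUTE's actual API; the function sums the data as big-endian 16-bit words so the result can be stored most-significant byte first regardless of host endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Internet checksum (RFC 1071) over `len` bytes.
 * Bytes are summed as big-endian 16-bit words, so the returned value is
 * meant to be written into the header most-significant byte first. */
uint16_t ip_checksum(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t sum = 0;

    while (len > 1) {                         /* add 16-bit words */
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)                             /* pad odd trailing byte */
        sum += (uint32_t)(p[0] << 8);

    while (sum >> 16)                         /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;                    /* one's complement */
}
```

For a 20-byte IPv4 header, zero the checksum field (bytes 10 and 11), call `ip_checksum(hdr, 20)` and store the result big-endian; recomputing over the completed header then yields 0.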


Implementing Choices (1/3)

Timing issues: temporal accuracy
– A traffic generator deals with packets and inter-departure times…
  • busy-wait polling versus the system-call sleep mechanism
– gettimeofday:
  • low resolution (1 μs)
  • high latency of the time evaluation (about 500 CPU cycles)
  • goes through the system-call (interrupt) mechanism
– Reading the CPU time-stamp counter (TSC):
  • higher resolution (1 ns with a 1 GHz CPU clock)
  • lower latency, around 32 CPU cycles (Intel Pentium)
  • no interrupt (no system call)

[Figure: Timers comparison, rdtsc vs. gettimeofday (y-axis from 0 to 600)]
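The slide argues for polling the TSC instead of sleeping or calling gettimeofday. Below is a minimal sketch of that busy-wait idea, assuming an x86 CPU and a GCC or Clang toolchain, where `__rdtsc()` from `<x86intrin.h>` issues the RDTSC instruction. On other architectures the sketch falls back to `clock_gettime(CLOCK_MONOTONIC)`, which counts nanoseconds rather than cycles. The helper names `read_tsc`, `busy_wait_ticks` and `usec_to_ticks` are hypothetical, not BRUTE's API.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter with RDTSC: no system call involved. */
static inline uint64_t read_tsc(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback for non-x86 targets: monotonic clock in nanoseconds. */
static inline uint64_t read_tsc(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/* Spin until `ticks` counter units have elapsed since `start`.
 * Busy-waiting burns the CPU but avoids the wake-up latency of a sleep. */
static void busy_wait_ticks(uint64_t start, uint64_t ticks)
{
    while (read_tsc() - start < ticks)
        ;   /* poll the counter */
}

/* Convert an inter-departure time in microseconds into counter ticks,
 * given the counter frequency in Hz. */
static uint64_t usec_to_ticks(double usec, double hz)
{
    return (uint64_t)(usec * hz / 1e6);
}
```

On many CPUs the TSC rate is not guaranteed to equal the current core frequency (for example under frequency scaling), so `hz` should be calibrated rather than taken from the nominal clock.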


Internals of Linux Kernel 2.4


Implementing Choices (2/3)

Socket family
– The computational load of sendto differs according to the socket family
– PF_PACKET avoids routing and header building
– PF_PACKET bypasses the Linux Netfilter framework
– PF_PACKET allows the Ethernet frame to be customized
  • RFC 2544 suggests some tests using random MAC addresses

[Figure: Socket latency comparison, PF_PACKET vs. PF_INET (y-axis from 0 to 2500)]
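The PF_PACKET points can be illustrated with a minimal sketch. It builds an Ethernet header with a random, locally administered unicast MAC address (in the spirit of the RFC 2544 random-address tests) and hands the frame to the driver with `sendto` on an `AF_PACKET`/`SOCK_RAW` socket, following the Linux packet(7) man page. Sending requires CAP_NET_RAW (typically root). The helper names `random_mac`, `build_eth_header` and `send_raw_frame` are hypothetical and do not describe BRUTE's actual socket code.

```c
#include <arpa/inet.h>        /* htons */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/ethernet.h>     /* ETH_ALEN, ETH_P_ALL, ETHERTYPE_IP */
#include <net/if.h>           /* if_nametoindex */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Derive a random unicast MAC from 48 random bits: clear the multicast
 * bit (0x01) and set the locally-administered bit (0x02) of octet 0. */
void random_mac(uint8_t mac[ETH_ALEN], uint64_t bits)
{
    for (int i = 0; i < ETH_ALEN; i++)
        mac[i] = (uint8_t)(bits >> (8 * i));
    mac[0] = (uint8_t)((mac[0] & ~0x01) | 0x02);
}

/* Write a 14-byte Ethernet II header; returns the header length. */
size_t build_eth_header(uint8_t *frame, const uint8_t dst[ETH_ALEN],
                        const uint8_t src[ETH_ALEN], uint16_t ethertype)
{
    memcpy(frame, dst, ETH_ALEN);
    memcpy(frame + ETH_ALEN, src, ETH_ALEN);
    frame[12] = (uint8_t)(ethertype >> 8);    /* EtherType, big-endian */
    frame[13] = (uint8_t)(ethertype & 0xFF);
    return 2 * ETH_ALEN + 2;
}

/* Send one complete frame on interface `ifname` via a PF_PACKET socket:
 * the kernel performs no routing lookup and builds no IP header.
 * Returns bytes sent, or -1 on error (e.g. EPERM without CAP_NET_RAW). */
ssize_t send_raw_frame(const char *ifname, const uint8_t *frame, size_t len)
{
    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;                            /* unknown interface */

    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof addr);
    addr.sll_family = AF_PACKET;
    addr.sll_ifindex = (int)ifindex;
    addr.sll_halen = ETH_ALEN;
    memcpy(addr.sll_addr, frame, ETH_ALEN);   /* destination MAC */

    ssize_t n = sendto(fd, frame, len, 0,
                       (struct sockaddr *)&addr, sizeof addr);
    close(fd);
    return n;
}
```

A real generator would open the socket once and reuse it for every frame; it is opened per call here only to keep the sketch short.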


Implementing Choices (3/3)

● Scheduling policy
  ● Real-time requirements: a traffic generator is a typical real-time application
● Linux soft real-time SCHED_FIFO policy
  ● control over the order of execution of processes
  ● static priority assigned to the process
  ● preemption of any normal process
  ● no time slicing
● Memory locking avoids paging delays
  ● mlockall is used to disable paging
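The scheduling and memory-locking choices map onto two POSIX calls, `mlockall` and `sched_setscheduler`. A minimal sketch follows; the helper names `clamp_fifo_priority` and `enter_realtime` are hypothetical. Both calls normally need privileges (CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK, and CAP_SYS_NICE), so an unprivileged run takes the error path.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (prio < lo) return lo;
    if (prio > hi) return hi;
    return prio;
}

/* Lock all current and future pages in RAM (no paging delays), then move
 * the calling process to SCHED_FIFO with a static priority: it preempts
 * normal processes and is not time-sliced. Returns 0 or -1. */
int enter_realtime(int prio)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        munlockall();                 /* undo the lock on failure */
        return -1;
    }
    return 0;
}
```

A generator would call `enter_realtime` once at startup, before entering its busy-wait loop. Because a SCHED_FIFO process that spins never yields, it is prudent to leave another core free for the rest of the system.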


Overall Architecture

● The modular design relies on a distributed parsing algorithm
  ● The core parser handles the grammar and part of the lexical analysis
  ● Micro-parsers distributed in the traffic modules complete the lexical parsing
● The traffic engine executes the micro-engine code to generate the traffic pattern


Extensibility: T-module

• A module that implements a traffic class is called a T-module
  • only a few lines of C code define a fully customizable traffic pattern
• A T-module consists of:
  • the structure module_descriptor, which links the BRUTE core and the module
  • the structure mod_line, which defines the parameters of a specific traffic class
  • the micro-parser handler, which implements the ad-hoc lexical parser
  • the micro-engine handler, which is in charge of generating the traffic
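The four parts of a T-module, and the way the core locates a module by command name, can be pictured with a hypothetical C layout. The field names, handler signatures and the `cbr` module below are illustrative assumptions, since the slide does not show BRUTE's real `module_descriptor` and `mod_line` definitions. The micro-parser accepts only non-negative integer values, so an r-value such as `udp_data(18)` would need more work.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-line parameters of a CBR traffic class. */
struct mod_line {
    long msec;   /* duration of the line, in milliseconds */
    long rate;   /* frames per second */
    long len;    /* frame length, in bytes */
};

/* Micro-parser: interprets one "token=value" atom; 0 on success. */
typedef int (*micro_parser_t)(struct mod_line *line, const char *tok,
                              const char *val);
/* Micro-engine: generates the traffic described by one parsed line. */
typedef int (*micro_engine_t)(const struct mod_line *line);

/* Hypothetical descriptor linking the core to a T-module. */
struct module_descriptor {
    const char     *command;   /* script command, e.g. "cbr" */
    micro_parser_t  parser;
    micro_engine_t  engine;
};

static int cbr_parser(struct mod_line *line, const char *tok, const char *val)
{
    char *end;
    long v = strtol(val, &end, 10);
    if (*val == '\0' || *end != '\0' || v < 0)
        return -1;                        /* not a non-negative integer */
    if (strcmp(tok, "msec") == 0)      line->msec = v;
    else if (strcmp(tok, "rate") == 0) line->rate = v;
    else if (strcmp(tok, "len") == 0)  line->len = v;
    else return -1;                       /* token unknown to this module */
    return 0;
}

static int cbr_engine(const struct mod_line *line)
{
    /* Stub: a real micro-engine would build and send the frames; this one
     * only returns how many frames the line would generate. */
    return (int)(line->rate * line->msec / 1000);
}

static const struct module_descriptor modules[] = {
    { "cbr", cbr_parser, cbr_engine },
};

/* Core side: find the T-module that owns a script command. */
const struct module_descriptor *find_module(const char *command)
{
    for (size_t i = 0; i < sizeof modules / sizeof modules[0]; i++)
        if (strcmp(modules[i].command, command) == 0)
            return &modules[i];
    return NULL;
}
```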


BRUTE script language

<label:> command tok_1 <+->=val; tok_2 <+->=val; …

● A statement consists of:
  ● an optional label
  ● a command identifier
  ● a sequence of semicolon-terminated atoms
● An atom consists of:
  ● a token identifier (l-value)
  ● numbers, functions and variables (r-values)

cbr msec=1000; saddr=192.168.0.1; daddr=192.168.0.2; \
    rate=1000; len=udp_data(18); sport=1024; dport=1024;

lab: cbr msec=1000; rate +=1000;

loop times=10; label=lab;
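Here is a minimal sketch of how a core parser might split one statement into label, command and atoms according to the grammar above. It is a hypothetical illustration (the names `parse_statement`, `struct statement` and `struct atom` are invented). It assumes that backslash-continued lines have already been joined, and it leaves the interpretation of each r-value to the module's micro-parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATOMS 16

/* One atom: token identifier (l-value), operator and r-value text.
 * op is '=' for "tok=val", '+' for "tok+=val", '-' for "tok-=val". */
struct atom { char *tok; char op; char *val; };

struct statement {
    char *label;                  /* NULL when the statement has no label */
    char *command;
    struct atom atoms[MAX_ATOMS];
    int natoms;
};

static char *skip_ws(char *s)
{
    while (isspace((unsigned char)*s)) s++;
    return s;
}

static void rtrim(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1])) s[--n] = '\0';
}

/* Cut the next whitespace-delimited word out of *ps (in place). */
static char *next_word(char **ps)
{
    char *s = skip_ws(*ps), *w = s;
    while (*s && !isspace((unsigned char)*s)) s++;
    if (*s) *s++ = '\0';
    *ps = s;
    return w;
}

/* Split one statement, modifying the buffer in place.
 * Returns 0 on success, -1 on a syntax error. */
int parse_statement(char *s, struct statement *st)
{
    st->label = NULL;
    st->natoms = 0;

    char *word = next_word(&s);
    size_t wl = strlen(word);
    if (wl > 0 && word[wl - 1] == ':') {       /* optional "label:" */
        word[wl - 1] = '\0';
        st->label = word;
        word = next_word(&s);
    }
    if (*word == '\0') return -1;              /* missing command */
    st->command = word;

    while (*(s = skip_ws(s)) != '\0') {
        char *atom = s;
        char *semi = strchr(s, ';');
        if (!semi) return -1;                  /* atoms end with ';' */
        *semi = '\0';
        s = semi + 1;

        char *eq = strchr(atom, '=');
        if (!eq || st->natoms == MAX_ATOMS) return -1;
        struct atom *a = &st->atoms[st->natoms];
        a->op = '=';
        if (eq > atom && (eq[-1] == '+' || eq[-1] == '-')) {
            a->op = eq[-1];                    /* "+=" or "-=" */
            eq[-1] = '\0';
        }
        *eq = '\0';
        rtrim(atom);
        if (*atom == '\0') return -1;          /* empty token */
        a->tok = atom;
        a->val = skip_ws(eq + 1);
        rtrim(a->val);
        st->natoms++;
    }
    return 0;
}
```

In a distributed design like the one described, the core would then look up `st.command` in its module table and pass each `(tok, op, val)` triple to that module's micro-parser.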


API (1/2)

● Memory management
  ● Allocate and free the memory needed to hold the frame
  ● Set up the frame headers according to the parameters in the configuration file, or use random destination addresses (MAC or IP) when requested on the command line
  ● Fill the UDP data as specified in RFC 2544
● Timing management
  ● Read the CPU TSC register using architecture-dependent assembly instructions (get_cycles)
  ● Busy-wait routine in charge of introducing the inter-departure times between packets
● Frame management
  ● Update the frame with the changes needed to obtain the next one: the IP id and checksum fields, and the destination IP or MAC address according to the command-line options
  ● Forward the frame to the network device driver
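The per-frame update of the IP id and checksum does not have to re-sum the whole header: RFC 1624 gives an incremental formula for a single changed 16-bit word, HC' = ~(~HC + ~m + m'). The sketch below applies that formula. The use of RFC 1624, the IPv4 byte offsets it hard-codes, and the names `csum_update16` and `next_frame_update` are assumptions for illustration, not a description of BRUTE's frame-management code.

```c
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFF) + (s >> 16));
}

/* Incremental checksum update (RFC 1624, eqn. 3) when one 16-bit header
 * word changes from old_word to new_word: HC' = ~(~HC + ~m + m'). */
uint16_t csum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint16_t s = oc_add((uint16_t)~hc, (uint16_t)~old_word);
    s = oc_add(s, new_word);
    return (uint16_t)~s;
}

/* Hypothetical per-frame update on a 20-byte IPv4 header: increment the
 * Identification field (bytes 4-5) and patch the header checksum
 * (bytes 10-11) in place, without re-summing the whole header. */
void next_frame_update(uint8_t *ip)
{
    uint16_t old_id = (uint16_t)((ip[4] << 8) | ip[5]);
    uint16_t new_id = (uint16_t)(old_id + 1);
    uint16_t hc     = (uint16_t)((ip[10] << 8) | ip[11]);

    hc = csum_update16(hc, old_id, new_id);

    ip[4]  = (uint8_t)(new_id >> 8);
    ip[5]  = (uint8_t)new_id;
    ip[10] = (uint8_t)(hc >> 8);
    ip[11] = (uint8_t)hc;
}
```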


API (2/2)

● Random number generation
  ● Implements the Mersenne Twister algorithm
    ● quasi-infinite period ($2^{19937}-1$)
    ● about 100 CPU cycles per number (fast enough to be executed at run time)
    ● good statistical properties
● Statistical distributions
  ● Functions to generate several statistical distributions (uniform, exponential, Pareto …)

Algorithm          CPU cycles   Period           Lifetime    Entropy    Chi-square   Correlation
Linux rand         109          $16(2^{31}-1)$   9.5 hours   7.95421    0.01%        -0.04935
/dev/urandom       20100        -                -           7.99996    90.00%       -0.00016
TT800              94           $2^{800}-1$      ∞           7.356743   0.01%        0.139006
Mersenne Twister   100          $2^{19937}-1$    ∞           7.99995    50.00%       0.00028
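The Mersenne Twister in the table can be sketched with the reference MT19937 recurrence of Matsumoto and Nishimura: a 624-word state, seeding as in their `init_genrand`, a twist every 624 draws, and output tempering. This is a generic implementation, not BRUTE's code; `struct mt19937`, `mt_seed`, `mt_next` and `mt_uniform` are hypothetical names.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397

struct mt19937 { uint32_t mt[MT_N]; int idx; };

/* Seed the state as in the reference init_genrand(). */
void mt_seed(struct mt19937 *g, uint32_t seed)
{
    g->mt[0] = seed;
    for (int i = 1; i < MT_N; i++)
        g->mt[i] = 1812433253U * (g->mt[i - 1] ^ (g->mt[i - 1] >> 30))
                   + (uint32_t)i;
    g->idx = MT_N;                  /* force a twist on the first draw */
}

/* Next 32-bit output: regenerate ("twist") the whole state every 624
 * draws, then temper the selected word. */
uint32_t mt_next(struct mt19937 *g)
{
    if (g->idx >= MT_N) {
        for (int i = 0; i < MT_N; i++) {
            uint32_t y = (g->mt[i] & 0x80000000U)
                       | (g->mt[(i + 1) % MT_N] & 0x7FFFFFFFU);
            g->mt[i] = g->mt[(i + MT_M) % MT_N] ^ (y >> 1)
                     ^ ((y & 1U) ? 0x9908B0DFU : 0U);
        }
        g->idx = 0;
    }
    uint32_t y = g->mt[g->idx++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9D2C5680U;
    y ^= (y << 15) & 0xEFC60000U;
    y ^= y >> 18;
    return y;
}

/* Uniform double in [0, 1) with 32-bit resolution. */
double mt_uniform(struct mt19937 *g)
{
    return mt_next(g) * (1.0 / 4294967296.0);
}
```

With the reference seed 5489, the first output is 3499211612 and the 10000th is 4123659995, which makes a convenient self-check for any port.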


Implemented Traffic Modules (2/4)

• Poisson process
  • constant packet length
  • exponential inter-departure times
  • parameters: msec, saddr, daddr, sport, dport, len, tos, ttl, lambda


Poisson Arrival of Bursts

• Poisson Arrival of Bursts (PAB) process: $R(t) = R\,N(t)$, where $R$ is a constant bit rate
• $N(t)$ is the underlying state process
• $N(t)$ is a superposition of bursts that occur with exponential inter-arrival times (rate $\lambda$) and an arbitrary burst-length distribution $B$
  • $N(t)$ is equivalent to the number of busy servers in an $M/G/\infty$ queue with service time $B$
• For fixed $t$ (in steady state), $N(t) \sim \mathrm{Poisson}(\lambda\,E[B])$, so $R(t)/R$ is Poisson distributed
• If the $B$'s are Pareto distributed with $1 < \alpha < 2$, $R(t)$ is Long Range Dependent with Hurst parameter $H = (3 - \alpha)/2$

[Figure: sample path of N(t) against t, with burst arrivals at T1, T2, T3 and burst lengths B1, B2]
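Two consequences of the PAB definition can be worked out directly: the mean rate and the Hurst parameter for a concrete shape value. The derivation assumes the classical Pareto form with scale $\theta$, i.e. $P[B > x] = (\theta/x)^{\alpha}$ for $x \ge \theta$. The slides do not state which Pareto parameterization BRUTE's `alpha` and `theta` parameters follow.

```latex
\begin{aligned}
\mathbb{E}[N(t)] &= \lambda\,\mathbb{E}[B]
  &&\text{(mean occupancy of the } M/G/\infty \text{ queue)}\\
\mathbb{E}[R(t)] &= R\,\lambda\,\mathbb{E}[B],
  \qquad \mathbb{E}[B] = \frac{\alpha\,\theta}{\alpha - 1} \quad (\alpha > 1)\\
H &= \frac{3 - \alpha}{2}, \qquad 1 < \alpha < 2 \;\Longrightarrow\; \tfrac{1}{2} < H < 1\\
\alpha = 1.4 &\;\Longrightarrow\; H = \frac{3 - 1.4}{2} = 0.8
\end{aligned}
```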


Implemented Traffic Modules (3/4)

• PAB process
  • constant packet length
  • Poisson inter-arrival of bursts, Pareto burst lengths
  • parameters: msec, saddr, daddr, sport, dport, len, tos, ttl, alpha, theta, lambda

[Figure: empirical distribution of the PAB rate, P(rate) for rate from 0 to 12000, with the Poisson distribution envelope]

[Figure: PAB instantaneous rate, frame/sec (0 to 12000) over time in seconds (0 to 200)]


Implemented Traffic Modules (4/4)

• End-to-end delay estimation requirements:
  • Measurement methodology
    • two hosts with clocks synchronized via GPS
    • one host closed in a loopback configuration
  • Packet format
    • RUDE implements a proprietary packet format
    • using a standard RTCP Sender Report (SR), no specific receiver application is needed (tcpdump, Ethereal, AX4000…)
  • Compensation of the transmission delay from the application layer to the device driver
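Because an RTCP Sender Report carries a 64-bit NTP timestamp, any RTCP-aware receiver can compute the one-way delay as arrival time minus the sender timestamp once the clocks are GPS-synchronized. A minimal sketch of the SR layout from RFC 3550 (Section 6.4.1) with no report blocks follows. The names `unix_to_ntp` and `build_rtcp_sr` are hypothetical, and the slide does not say which SR fields BRUTE fills.

```c
#include <stddef.h>
#include <stdint.h>

/* Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01). */
#define NTP_UNIX_OFFSET 2208988800UL

/* Convert a Unix time (sec, usec) into the 64-bit NTP format:
 * 32-bit seconds and a 32-bit binary fraction of a second. */
void unix_to_ntp(uint32_t sec, uint32_t usec,
                 uint32_t *ntp_sec, uint32_t *ntp_frac)
{
    *ntp_sec  = (uint32_t)(sec + NTP_UNIX_OFFSET);
    *ntp_frac = (uint32_t)(((uint64_t)usec << 32) / 1000000U);
}

static void put32(uint8_t *p, uint32_t v)   /* big-endian store */
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Build a 28-byte RTCP Sender Report with no report blocks.
 * Returns the packet length in bytes. */
size_t build_rtcp_sr(uint8_t *buf, uint32_t ssrc, uint32_t sec, uint32_t usec,
                     uint32_t rtp_ts, uint32_t pkt_count, uint32_t octet_count)
{
    uint32_t ntp_sec, ntp_frac;
    unix_to_ntp(sec, usec, &ntp_sec, &ntp_frac);

    buf[0] = 0x80;          /* V=2, P=0, RC=0 */
    buf[1] = 200;           /* PT=200: Sender Report */
    buf[2] = 0;             /* length in 32-bit words minus one: 7 - 1 = 6 */
    buf[3] = 6;
    put32(buf + 4,  ssrc);
    put32(buf + 8,  ntp_sec);
    put32(buf + 12, ntp_frac);
    put32(buf + 16, rtp_ts);
    put32(buf + 20, pkt_count);
    put32(buf + 24, octet_count);
    return 28;
}
```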


Performance Measurement

● Internal measurement (non-invasive)
  ● A vector allocated in the device driver stores the packets' timestamps (taken with get_cycles)
  ● A kernel module dumps the timestamps off-line through a /proc entry
● Wire-line measurement
  ● Over Fast Ethernet and Gigabit Ethernet (on copper and optical fiber)
● Hardware employed
  ● Genuine Intel Pentium 4 2.40 GHz, 512 MB RAM, ASUS P4PE motherboard, 3Com 3c905c-TX Tornado Fast Ethernet adapter
  ● Dual Genuine Intel Xeon 2.66 GHz, 1 GB RAM, SuperMicro X5DPE-G2 motherboard, Intel PRO/1000LX Gigabit Ethernet Controller (fiber)
  ● Spirent AX4000 Traffic Analyzer
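Off-line, the dumped cycle timestamps can be turned into frame rates measured over windows of consecutive frames (the later plots mention windows of 100 and 1000 frames). A minimal sketch follows, assuming the timestamps are raw TSC values and the CPU frequency is known. The name `window_rates` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Given n per-frame timestamps in CPU cycles, compute the rate in
 * frames/s over consecutive windows of w frames. A window starting at
 * frame i spans w inter-departure gaps, from ts[i] to ts[i + w].
 * Writes one rate per complete window to out[] and returns the count. */
size_t window_rates(const uint64_t *ts, size_t n, double cpu_hz,
                    size_t w, double *out)
{
    size_t k = 0;
    if (w == 0)
        return 0;
    for (size_t i = 0; i + w < n; i += w) {
        uint64_t dt = ts[i + w] - ts[i];          /* cycles for w gaps */
        out[k++] = dt ? (double)w * cpu_hz / (double)dt : 0.0;
    }
    return k;
}
```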


Maximal Rate Test Comparisons

• Fast Ethernet scenario
  • Adapter: 3Com 3c905c-TX Tornado Fast Ethernet
  • Frame length: 64 bytes
  • BRUTE saturates the link capacity

[Figure: maximal rate test, fps (0 to 160000) over time in seconds (0 to 100) for brute, rude, udpgen and mgen]


Throughput vs. frame length

• Fast Ethernet scenario
  • Adapter: 3Com 3c905c-TX Tornado Fast Ethernet
  • BRUTE matches the ideal rate curve at every frame length

[Figure: throughput, frame/sec (0 to 160000) vs. frame length in byte/frame (0 to 1600) for brute, rude, udpgen and mgen]
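The ideal rate curve presumably follows from Ethernet framing overhead. Besides the frame itself (64 to 1518 bytes including the FCS), each frame occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap on the wire. A minimal sketch of that calculation follows; the function name is hypothetical.

```c
/* Ideal (line-rate) Ethernet frame rate in frames/s for a link of
 * link_bps bits/s and frames of frame_len bytes (header + payload + FCS):
 * every frame also costs 8 bytes of preamble/SFD and a 12-byte gap. */
double max_frame_rate(double link_bps, unsigned frame_len)
{
    return link_bps / ((frame_len + 8.0 + 12.0) * 8.0);
}
```

For 64-byte frames on a 100 Mbit/s link this gives about 148,810 frames/s. For 128-byte frames on a 1 Gbit/s link it gives about 844,595 frames/s, which matches the scale of the maximal-rate plots.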


Rate Bias Comparison

[Figure: rate bias, err % (-30 to 10) vs. frame/sec (0 to 160000) for brute, rude, udpgen and mgen]

• Fast Ethernet scenario
  • Adapter: 3Com 3c905c-TX Tornado Fast Ethernet
  • Error rate averaged over $10^6$ frames
  • The throughput of BRUTE is unbiased at every frame rate


Standard Deviation of Rate Comparison

[Figure: standard deviation of the rate (0 to 35000) vs. frame/sec (0 to 160000) for brute, rude, udpgen and mgen]

• Fast Ethernet scenario
  • Adapter: 3Com 3c905c-TX Tornado Fast Ethernet
  • Averages computed over a window of 100 frames
  • The standard deviation of BRUTE's rate grows linearly


Maximal Rate Test Comparison

• Gigabit Ethernet scenario
  • Adapter: Intel PRO/1000LX Gigabit Ethernet Controller

[Figure: maximal rate test, frame/sec (0 to 900000) over time in seconds (0 to 120) for brute, rude, udpgen and mgen]


Bias Error Comparison

[Figure: bias error, err % (-70 to 30) vs. frame/sec (0 to 800000) for brute, rude, udpgen and mgen]

• Gigabit Ethernet scenario
  • Adapter: Intel PRO/1000LX Gigabit Ethernet Controller
  • Averaged over $10^6$ frames


Standard Deviation Comparison

[Figure: standard deviation of the rate on a log scale (1 to $10^6$) vs. frame/sec (0 to 800000) for brute, rude, udpgen and mgen]

• Gigabit Ethernet scenario
  • Adapter: Intel PRO/1000LX Gigabit Ethernet Controller
  • Averages computed over a window of $10^3$ frames


Conclusions

● BRUTE is a real-time, extensible traffic generator:
  ● flexible architecture and extensible design
  ● several traffic modules that generate different patterns of Ethernet traffic
● High performance and a high level of precision, suitable for network benchmarking
  ● use of timing paradigms to better satisfy real-time requirements
  ● capability to generate workloads at wire speed in order to stress network devices