February 2003 FIRST Technical Colloquium February 10-11 ... · Hardware for high perf....

February 2003 FIRST Technical Colloquium February 10-11, 2003 @ Uppsala, Sweden bifrost a high performance router & firewall Robert Olsson Hans Wassen

Transcript of February 2003 FIRST Technical Colloquium February 10-11 ... · Hardware for high perf....

Page 1: February 2003 FIRST Technical Colloquium February 10-11 ... · Hardware for high perf. Networking Currently Intel has advantage. Broadcom can be a dark horse. All has NAPI drivers.

February 2003 FIRST Technical Colloquium, February 10-11, 2003 @ Uppsala, Sweden

bifrost: a high performance router & firewall

Robert Olsson, Hans Wassen

Page 2: Bifrost concept

Small size Linux distribution targeted for Flash disks (20 MB)

Optimized for networking/firewalling

Tested with selected drivers and hardware

Open platform for development and collaboration

Results and experiences shared

Page 3: Bifrost concept

Linux kernel collaboration: FASTROUTE, HW_FLOWCONTROL, new NAPI for the network stack.

Performance testing, development of tools and testing techniques

Hardware validation, support from big vendors

Detect and cure problems in the lab, not in the network infrastructure.

Test deploy (often in own network)

Page 4: Collaboration/development

The New API

Page 5: Core Problems

Heavy net load: system congestion collapse

High interrupt rates

Livelock and cache locality effects

Interrupts are just simply expensive

CPU

Interrupt driven: takes too long to drop a bad packet

Bus (PCI)

Packets still being DMAed when system overloaded

Memory bandwidth

Continuous allocs and frees to fill DMA rings

Unfairness in case of a hogger netdev

Page 6: Overall Effect

Inelegant handling of heavy net loads

System collapse

Scalability affected

System and number of NICs

A single hogger netdev can bring the system to its knees and deny service to others

[Figure: Summary 2.4 vs feedback; axis ticks 0-100 and 0-55]

March 15 report on lkml

Thread: "How to optimize routing performance" reported by Marten.Wikstron@framsfab.se

- Linux 2.4 peaks at 27Kpps
- Pentium Pro 200, 64MB RAM

Page 7: Looking inside the box

Backlog queue processing

[Diagram: incoming packets from devices are handled in IRQ context and enqueued to the backlog queue; at a later time the SoftIRQ passes them from the backlog queue to the stack. The transmit path carries forwarded and locally generated outgoing packets.]

Packet enqueued to backlog if queue not full

Page 8: BYE BYE Backlog queue

Packet stays in original queue (e.g. DMA)

Net RX softirq: for each dev in the poll list, calls dev->poll() to grab up to quota packets

Device drivers are polled from the softirq, and pkts are pulled and delivered to the network stack.

Dev driver indicates done/not done.

Done ==> we go back to IRQ mode.

Not done ==> device remains on the polling list

Breaks the net RX softirq at one jiffy or netdev_max_backlog

This is to ensure other tasks get to run
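The polling loop described here can be illustrated with a small simulation. This is only a sketch in Python, not kernel code: the function name `net_rx_softirq`, the ring sizes, and the packet `budget` (used as a stand-in for the break at one jiffy or netdev_max_backlog) are assumptions made for illustration.

```python
from collections import deque


def net_rx_softirq(poll_list, rings, quota, budget):
    """Run one simulated RX softirq pass; return packets delivered per device."""
    delivered = {dev: 0 for dev in rings}
    while poll_list and budget > 0:
        dev = poll_list.popleft()
        # Stand-in for dev->poll(): take at most `quota` packets from the ring.
        n = min(quota, budget, rings[dev])
        rings[dev] -= n
        delivered[dev] += n
        budget -= n
        if rings[dev] > 0:
            # Not done: the device remains on the poll list.
            poll_list.append(dev)
        # Done: the device leaves the list and would return to IRQ mode.
    return delivered


# Hypothetical load: eth0 is a "hogger" device, eth1 has light traffic.
rings = {"eth0": 1000, "eth1": 40}
poll_list = deque(["eth0", "eth1"])
result = net_rx_softirq(poll_list, rings, quota=64, budget=300)
print(result)           # {'eth0': 260, 'eth1': 40}
print(list(poll_list))  # ['eth0']
```

With these numbers eth1 gets all 40 of its packets delivered even though eth0 has far more queued, and eth0 stays on the poll list for the next pass.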

Page 9: A high level view of new system

[Figure: P (pkts) per netdev, divided into an interrupt area and a polling area, with the quota level marked]

- P packets to deliver to the stack (on the RX ring)
- Horizontal line shows different netdevs with different input rates
- Area under curve shows how many packets before next interrupt
- Quota enforces fair share

Page 10: Kernel support

NAPI kernel part was included in: 2.5.7 and back ported to 2.4.20

Current driver support:

e1000 -- Intel GIGE NIC's

tg3 -- BroadCom GIGE NIC's

dl2k -- D-Link GIGE NIC's

tulip (pending) -- 100 Mbit/s

Page 11: NAPI: observations & issues

Ooh, I get even more interrupts.... with polling.

As we have seen, NAPI is an interrupt/polling hybrid.

NAPI uses interrupts to guarantee low latency, and at high loads interrupts never get re-enabled. Consecutive polls occur.

The old scheme added interrupt delay to keep the CPU from being killed by interrupts.

In the NAPI case we can, for the first time, do without this delay, but it means more interrupts in low load situations.

Should we add interrupt delay just out of old habit?
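A toy model can show why an interrupt/polling hybrid takes many interrupts at low load and almost none at high load. This is a sketch under stated assumptions, not a model of any real driver: time advances in discrete ticks, `poll_capacity` is a hypothetical number of packets one poll can handle per tick, and the ring is unbounded (a real RX ring would drop packets instead).

```python
def count_interrupts(arrivals_per_tick, poll_capacity, ticks):
    """Count interrupts taken by a simulated interrupt/polling hybrid."""
    ring = 0
    irq_enabled = True
    interrupts = 0
    for _ in range(ticks):
        ring += arrivals_per_tick
        if irq_enabled and ring > 0:
            # First packet after idle: take one interrupt, then switch to polling.
            interrupts += 1
            irq_enabled = False
        if not irq_enabled:
            ring -= min(ring, poll_capacity)
            if ring == 0:
                # Ring drained: re-enable interrupts (back to IRQ mode).
                irq_enabled = True
    return interrupts


low_load = count_interrupts(arrivals_per_tick=1, poll_capacity=10, ticks=100)
high_load = count_interrupts(arrivals_per_tick=20, poll_capacity=10, ticks=100)
print(low_load, high_load)  # 100 1
```

At low load every packet costs an interrupt because the ring drains on each poll. At high load the ring never empties, so interrupts stay disabled after the first one.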

Page 12: Flexible netlab at Uppsala University

El cheapo -- Highly customizable -- We write code :-)

[Diagram: Test generator (linux) -- Ethernet -- Tested device -- Ethernet -- sink device (linux)]

* Raw packet performance
* TCP
* Timing
* Variants

Page 13: Hardware for high perf. Networking

Motherboard

CPU: Uni or multi-processor

Chipset: BX, ServerWorks, E750X

BUS/PCI-design: # PCI-BUS'es @ 133MHz

Interrupt design: PIC, IO-APIC etc

Standby Power (Wake on Lan) can be a problem with many NIC's

Page 14: Hardware for high perf. Networking

ServerWorks, Intel E750X chipset: many PCI-X hubs/bridges

And dual XEON

PCI-X is here: bus at 8.5 Gbit/s

Many vendors use Compact PCI already

[Diagram labels: CPU (x2); Processor, I/O and memory controller; Memory; PCI-X I/O bridge (x2); NIC (x2)]

Page 15: Hardware for high perf. Networking

Currently Intel has the advantage. Broadcom can be a dark horse. All have NAPI drivers.

GIGE chipsets available for PCI:

e1000, Intel -- e1000

BCM5700, Broadcom -- tg3

dl-2k, D-Link -- dl2k

Some board manufacturers switch chipset often. Chip documentation is a problem.

Page 16: Some GIGE experiments/NAPI

Ping latency/fairness under xtreme load/UP

[Figure: bar chart with series Idle and DoS. x axis: 0-8; y axis: latency in microseconds, 75-500. Data labels as extracted: 125, 117, 92, 391, 95, 379, 92, 478, 92, 344, 91, 389, 91, 426, 90, 262, 91, 211]

Idle: ping through an idle router

DoS: ping through a router under a DoS attack, 890 kpps

Very well behaved: just an increase of a couple of 100 microsec !!

Page 17: Some GIGE experiments

Pktgen sending test w. 11 GIGE

[Figure: bar chart with series Clone and Alloc. x axis: eth0-eth10; y axis: packets/sec, 0-90000]

2*XEON 1.8 GHz, packet sending @ 1518 byte

81300 pps is 1 Gbit/s

Clone = 8.5 Gbit/s

Alloc = 5.4 Gbit/s

ServerWorks X5DL8-GG, Intel e1000

Page 18: Some GIGE experiments

Pktgen sending test w. 11 GIGE

[Figure: bar chart with series Clone and Alloc. x axis: eth0-eth10; y axis: packets/sec, 0-90000]

2*XEON 1.8 GHz with HyperThreading on, packet sending @ 1518 byte

81300 pps is 1 Gbit/s

Clone = 10.0 Gbit/s

Alloc = 7.4 Gbit/s

ServerWorks X5DL8-GG, Intel e1000

Page 19: Some GIGE experiments

Aggregated sending performance from pktgen w. 11 GIGE.

[Figure: bar chart with series w/o HT and w HT, for Alloc and Clone. Axis: Kpps, 0-2000]

XEON 2*1.8 GHz @ 64 byte pkts

1.48 Mpps = 1 Gbit/s
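The packet rates quoted for a saturated 1 Gbit/s link (81300 pps at 1518 bytes, 1.48 Mpps at 64 bytes) can be reproduced with a short calculation. This is a sketch that assumes the standard Ethernet per-frame overhead of 20 bytes on the wire (preamble, start-of-frame delimiter, and inter-frame gap); the function name `max_pps` is made up for this example.

```python
# Assumed per-frame overhead on the wire: 7 bytes preamble + 1 byte SFD
# + 12 bytes inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def max_pps(frame_bytes, link_bps=1_000_000_000):
    """Theoretical maximum packets per second for a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


for size in (64, 1518):
    print(size, round(max_pps(size)))
# 64 1488095
# 1518 81274
```

The results round to the 1.48 Mpps and roughly 81300 pps figures used on the slides.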

Page 20: Forwarding performance

[Figure: Linux forwarding rate at different pkt sizes. Linux 2.5.58 UP/skb recycling, 1.8 GHz XEON. Series: Input and Throughput. x axis: packet size (64, 128, 256, 512, 1024, 1518); y axis: kpps, 0-900]

Fills a GIGE pipe -- starting from 256 byte pkts

Page 21: R&D

[Diagram labels: IOAPIC; Eth0, Eth1; CPU 0, CPU 1; TX ring; Parallelization; Serialization]

Eth1 holds skb's from different CPU's

Clearing TX-buff releases cache bouncing

For user apps the new scheduler does affinity

But for packet forwarding.... eth0->eth1 CPU0 (we can set affinity eth1 -> CPU0)

But it would be nice to use the other CPU for forwarding too. :-)
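Setting interrupt affinity on Linux is done by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. The sketch below only builds the masks and the corresponding shell commands; it does not touch /proc. The IRQ numbers 24 and 25 are hypothetical placeholders for eth0 and eth1, and the helper names are made up for this example.

```python
def cpu_mask(cpus):
    """Return the hex bitmask string selecting the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_commands(irq_to_cpus):
    """Build one shell command per IRQ that would pin it to its CPUs."""
    return [
        f"echo {cpu_mask(cpus)} > /proc/irq/{irq}/smp_affinity"
        for irq, cpus in sorted(irq_to_cpus.items())
    ]


# Hypothetical IRQ numbers: 24 for eth0, 25 for eth1, both pinned to CPU 0.
commands = affinity_commands({24: [0], 25: [0]})
for line in commands:
    print(line)
```

Pinning both interfaces of a forwarding path to the same CPU keeps the skb allocation and the TX-ring cleanup on one cache, at the cost of leaving the other CPU idle for that path.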

Page 22: R&D

Very high transaction packet memory system for GIGE and upcoming 10GE

Profiling indicates slab is not fully per-CPU

SMP-2-CPU: 300 kpps

SMP-1-CPU: 302 kpps

Counter 0 counted GLOBAL_POWER_EVENTS events

vma samples %-age symbol name

c0138e96 37970 8.23162 cache_alloc_refill

c0229490 37247 8.07488 alloc_skb

c0235e90 32491 7.04381 qdisc_restart

c0235b54 27891 6.04657 eth_type_trans

Note: setting input affinity helps. But we would like to work on the general problem.

c02296d2 25675 8.67698 skb_release_data

c0235b54 24438 8.25893 eth_type_trans

c0235e90 24047 8.12679 qdisc_restart

c0229490 18188 6.14671 alloc_skb

c0110a1c 15741 5.31974 do_gettimeofday

Page 23: R&D

[Figure: bar chart "router profile", XEON no HT, 2*1.8 GHz. Axis: Routing Throughput in kpps, 0-550. Bars: V UP gcc-3.1; V SMP2 gcc-3.1; V SMP1 gcc-3.1; V SMP2 gcc-2.95.3; RC UP gcc-3.1; RC UP gcc-2.95.3; RC SMP2 gcc-3.1; RC SMP1 gcc-3.1; IA SMP2 gcc-3.1; IA RC SMP2 gcc-3.1]

Legend: V = vanilla, UP = uniprocessor, SMP1 = SMP 1 CPU, SMP2 = SMP 2 CPU, RC = skb recycling, IA = input affinity

Profile with p4/xeon performance counters: GLOBAL_POWER_EVENTS, MISPRED_BRANCH_RETIRED, BSQ_CACHE_REFERENCE, MACHINE_CLEAR, ITLB_REFERENCE

Page 24: NAPI/SMP in production use: uu.se

PIII 933 MHz

2.4.10poll/SMP

Full Internet routing via EBGP/IBGP

[Diagram labels: Stockholm (x2); L-uu1, L-uu2; UU-1, UU-2; DMZ; Internal UU-Net; AS 2834]

Page 25: Real World use: ftp.sunet.se

PIII-933MHz, NAPI/IRQ

Load sharing & Redundancy with Router Discovery

Full Internet routing via EBGP/IBGP

[Diagram labels: Ftp0, Ftp1, Ftp2; Switch; Archive-r1, Archive-r2; GSR; Stockholm OC-48; AS 1653; AS 15980]

Page 26: IP-login -- a Linux router app: user authenticated routing

[Diagram labels: user@host; IP-login router; H, H, R, R]

Users can only reach the IP-login router. This hosts a web server.

User web requests are directed to the web server, and the user is asked for username and password.

Optional authentication server. Today TACACS.

If user/passwd is accepted:

1) Forwarding is enabled for host

2) Monitoring arping is started

Loss of arping disables forwarding.

Based on stolen code from:

Pawel Krawczyk -- tacacs client

Alexey Kuznetsov -- arping
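The IP-login flow (authenticate, enable forwarding, monitor with arping, disable on loss) can be sketched as a small state holder. This is not the authors' code and makes no real TACACS or arping calls: the class `IpLoginRouter`, its methods, and the `check_credentials` callback are hypothetical names invented for illustration.

```python
class IpLoginRouter:
    """Toy model of per-host forwarding state on an IP-login router."""

    def __init__(self, check_credentials):
        # check_credentials stands in for the authentication server lookup.
        self.check_credentials = check_credentials
        self.forwarding = set()

    def login(self, host_ip, user, password):
        """Enable forwarding for the host if the credentials are accepted."""
        if not self.check_credentials(user, password):
            return False
        self.forwarding.add(host_ip)
        return True

    def arping_result(self, host_ip, replied):
        """Feed in one monitoring result; a lost reply disables forwarding."""
        if not replied:
            self.forwarding.discard(host_ip)

    def may_forward(self, host_ip):
        return host_ip in self.forwarding


# Hypothetical credentials and address, for demonstration only.
router = IpLoginRouter(lambda user, password: (user, password) == ("alice", "s3cret"))
print(router.login("10.0.0.5", "alice", "wrong"))   # False
print(router.login("10.0.0.5", "alice", "s3cret"))  # True
router.arping_result("10.0.0.5", replied=False)
print(router.may_forward("10.0.0.5"))               # False
```

Tying forwarding to the arping monitor means a host that unplugs or goes away loses access without any explicit logout step.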

Page 27: IP-login installation at Uppsala University

Approx 1000 outlets

Page 28: A new network symbol has been seen...

The Penguin Has Landed

Page 29: References and Other Stuff

http://bifrost.slu.se

Claim they can do 435 Kpps on PIII 700

http://www.pdos.lcs.mit.edu/click/

http://www.cyberus.ca/~hadi/usenix-paper.tgz

Some other work

http://robur.slu.se/Linux/net-development/