February 2003 FIRST Technical Colloquium
February 10-11, 2003 @ Uppsala, Sweden
bifrost, a high performance router & firewall
Robert Olsson, Hans Wassen
Bifrost concept
• Small-size Linux distribution targeted for flash disks (20 MB)
• Optimized for networking/firewalling
• Tested with selected drivers and hardware
• Open platform for development and collaboration
• Results and experiences shared
Bifrost concept
• Linux kernel collaboration
• FASTROUTE, HW_FLOWCONTROL, the new NAPI for the network stack
• Performance testing, development of tools and testing techniques
• Hardware validation, support from big vendors
• Detect and cure problems in the lab, not in the network infrastructure
• Test deploy (often in our own network)
Collaboration/development

The New API
Core Problems
• Heavy net load: system congestion collapse
• High interrupt rates
• Livelock and cache locality effects
• Interrupts are simply expensive
• CPU: interrupt driven; takes too long to drop a bad packet
• Bus (PCI): packets are still being DMAed when the system is overloaded
• Memory bandwidth: continuous allocs and frees to fill DMA rings
• Unfairness in case of a hogger netdev
Overall Effect
• Inelegant handling of heavy net loads
• System collapse
• Scalability affected: system and number of NICs
• A single hogger netdev can bring the system to its knees and deny service to others
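The collapse described on these slides is often called receive livelock. A toy model makes the mechanism concrete. It assumes hypothetical per-packet costs (4 µs of interrupt work on every arrival, 6 µs of protocol processing, 1 µs of polling overhead) and one CPU-second per second; none of these figures come from the measurements in this talk. Interrupt-driven delivery peaks and then falls to zero as the input rate grows, while a polled design that lets the NIC drop excess packets without spending CPU stays flat at its maximum.

```python
# Toy receive-livelock model; all costs are hypothetical, in CPU-seconds per packet.
IRQ_COST = 4e-6    # interrupt work paid for every arriving packet
PROC_COST = 6e-6   # protocol processing for every delivered packet
POLL_COST = 1e-6   # per-packet overhead when the driver is polled instead


def delivered_interrupt(rate):
    """Packets/s reaching the stack when every arrival raises an interrupt."""
    spare = max(0.0, 1.0 - rate * IRQ_COST)  # CPU left after interrupt work
    return min(rate, spare / PROC_COST)


def delivered_polling(rate):
    """Packets/s when excess arrivals are dropped by the NIC at no CPU cost."""
    return min(rate, 1.0 / (POLL_COST + PROC_COST))


for rate in (50_000, 100_000, 200_000, 300_000):
    print(f"{rate:>7} pps offered: interrupt {delivered_interrupt(rate):>9.0f}"
          f"  polling {delivered_polling(rate):>9.0f}")
```

The polled curve is flat only because the model treats dropped packets as free; it ignores the bus and memory costs listed above.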
[Figure: Summary 2.4 vs feedback]
March 15 report on lkml
Thread: "How to optimize routing perfomance", reported by Marten.Wikstron@framsfab.se
- Linux 2.4 peaks at 27K pps
- Pentium Pro 200, 64MB RAM
Looking inside the box
[Figure: the receive path through the backlog queue. At IRQ time, incoming packets from devices are enqueued to the backlog queue if the queue is not full; otherwise it is "BYE BYE" for the packet. At a later time, the SoftIRQ does backlog queue processing and passes packets to the stack. Forwarded and locally generated outgoing packets take the transmit path.]
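The old receive path on this slide, where a packet is enqueued to the backlog only if the queue is not full, can be modelled as a bounded queue: the interrupt handler either appends a packet or drops it, and the softirq drains the queue later. This is a minimal sketch; the class and method names are hypothetical, and `max_backlog` only plays the role that `netdev_max_backlog` plays in the kernel. A dropped packet has already cost a DMA transfer and an interrupt, which is the waste listed under Core Problems.

```python
from collections import deque


class Backlog:
    """Toy model of the pre-NAPI backlog queue (names are illustrative)."""

    def __init__(self, max_backlog):
        self.queue = deque()
        self.max_backlog = max_backlog  # stands in for netdev_max_backlog
        self.dropped = 0

    def irq_receive(self, pkt):
        """Interrupt context: enqueue if the queue is not full, else drop."""
        if len(self.queue) < self.max_backlog:
            self.queue.append(pkt)
            return True
        self.dropped += 1  # "BYE BYE": the DMA and the interrupt were wasted
        return False

    def softirq(self, budget):
        """Later time: hand up to `budget` queued packets to the stack."""
        delivered = []
        while self.queue and len(delivered) < budget:
            delivered.append(self.queue.popleft())
        return delivered


backlog = Backlog(max_backlog=3)
for pkt in range(5):
    backlog.irq_receive(pkt)
print(backlog.softirq(budget=10), backlog.dropped)  # [0, 1, 2] 2
```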
• Packet stays in its original queue (e.g. DMA)
• Net rx softirq
• foreach dev in poll list
• Calls dev->poll() to grab up to quota packets
• Device drivers are polled from the softirq, and packets are pulled and delivered to the network stack.
• The device driver indicates done/not done.
• Done ==> we go back to IRQ mode.
• Not done ==> the device remains on the polling list.
• The net rx softirq breaks after one jiffy or netdev_max_backlog packets
• This is to ensure other tasks get to run
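The softirq steps listed above can be sketched as a round-robin loop over the poll list. This is a user-space model, not the kernel code: `ToyDevice` and `net_rx_softirq` are hypothetical names, `budget` stands in for the netdev_max_backlog limit, and the one-jiffy time limit is left out. A device that empties its RX ring reports done and goes back to IRQ mode; one that does not stays on the list for the next round.

```python
from collections import deque


class ToyDevice:
    """Hypothetical NIC; `ring` holds packets already DMAed to the RX ring."""

    def __init__(self, name, packets):
        self.name = name
        self.ring = deque(packets)
        self.irq_enabled = False  # interrupts are off while on the poll list

    def poll(self, quota):
        """Pull up to `quota` packets; report done when the ring is empty."""
        pkts = [self.ring.popleft() for _ in range(min(quota, len(self.ring)))]
        done = not self.ring
        if done:
            self.irq_enabled = True  # done ==> back to IRQ mode
        return pkts, done


def net_rx_softirq(poll_list, quota, budget):
    """One softirq run: poll devices round-robin until the list or budget is empty."""
    delivered = []
    while poll_list and budget > 0:
        dev = poll_list.popleft()
        pkts, done = dev.poll(min(quota, budget))
        delivered.extend(pkts)
        budget -= len(pkts)
        if not done:
            poll_list.append(dev)  # not done ==> stays on the poll list
    return delivered


eth0 = ToyDevice("eth0", range(5))
eth1 = ToyDevice("eth1", range(100, 102))
print(net_rx_softirq(deque([eth0, eth1]), quota=2, budget=100))
# [0, 1, 100, 101, 2, 3, 4]
```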
A high-level view of the new system
[Figure: P pkts per netdev, divided into an interrupt area and a polling area, with the quota marked]
• P packets to deliver to the stack (on the RX ring)
• Horizontal lines show different netdevs with different input rates
• The area under the curve shows how many packets arrive before the next interrupt
• Quota enforces a fair share
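How the quota enforces a fair share can be shown with two hypothetical devices in one softirq interval: a hogger offering 900 packets and a light user offering 100, with room for 300 deliveries. The FIFO function is a simplification that assumes a shared queue served in arrival order with evenly interleaved arrivals, so each device gets a share proportional to what it offered; the quota function polls devices round-robin. The numbers are illustrative only.

```python
def fifo_share(offered, capacity):
    """Shared queue in arrival order: shares proportional to offered load."""
    total = sum(offered.values())
    if total <= capacity:
        return dict(offered)
    return {dev: n * capacity / total for dev, n in offered.items()}


def quota_share(offered, capacity, quota):
    """Round-robin polling: each device takes at most `quota` per turn."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    left = dict(offered)
    got = {dev: 0 for dev in offered}
    while capacity > 0 and any(left.values()):
        for dev in offered:
            take = min(quota, left[dev], capacity)
            got[dev] += take
            left[dev] -= take
            capacity -= take
    return got


offered = {"hogger": 900, "light": 100}
print(fifo_share(offered, capacity=300))             # {'hogger': 270.0, 'light': 30.0}
print(quota_share(offered, capacity=300, quota=16))  # {'hogger': 200, 'light': 100}
```

Under the FIFO model the light device loses 70% of its packets to the hogger; with a per-turn quota it gets everything it offered, and the hogger absorbs the shortfall.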
Kernel support
The NAPI kernel part was included in 2.5.7 and backported to 2.4.20.
Current driver support:
• e1000: Intel GIGE NICs
• tg3: Broadcom GIGE NICs
• dl2k: D-Link GIGE NICs
• tulip (pending): 100 Mbit/s
NAPI: observations & issues
Ooh, I get even more interrupts.... with polling.
As we have seen, NAPI is an interrupt/polling hybrid. NAPI uses interrupts to guarantee low latency, and at high loads interrupts never get re-enabled; consecutive polls occur.
The old scheme added an interrupt delay to keep the CPU from being killed by interrupts. In the NAPI case we can do without this delay for the first time, but it means more interrupts in low-load situations.
Should we add an interrupt delay just out of old habit?
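The hybrid behaviour, and why NAPI can raise more interrupts at low load, shows up in a small tick-based model. The model is an assumption-laden sketch: one poll per tick, an unbounded RX ring, and no interrupt delay. At one packet per tick every packet costs an interrupt; when arrivals exceed the quota the device never leaves polling mode, so only the first packet raises an interrupt.

```python
def napi_interrupts(arrivals_per_tick, ticks, quota):
    """Toy NAPI model; returns (interrupts raised, packets delivered)."""
    ring = 0            # packets waiting in the RX ring (unbounded here)
    polling = False     # True while the device is on the poll list
    interrupts = delivered = 0
    for _ in range(ticks):
        ring += arrivals_per_tick
        if not polling and ring > 0:
            interrupts += 1   # RX interrupt: disable IRQs, schedule a poll
            polling = True
        if polling:
            n = min(quota, ring)
            ring -= n
            delivered += n
            if ring == 0:
                polling = False  # done: re-enable interrupts
    return interrupts, delivered


print(napi_interrupts(1, 1000, quota=64))    # low load:  (1000, 1000)
print(napi_interrupts(100, 1000, quota=64))  # high load: (1, 64000)
```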
Tested device
Flexible netlab at Uppsala University
* Raw packet performance
* TCP
* Timing
* Variants
[Figure: test setup. A Linux test generator and a Linux sink device, each connected to the tested device via Ethernet.]
El cheapo, highly customizable, we write code :-)
Hardware for high perf. Networking
Motherboard
• CPU: uni- or multi-processor
• Chipset: BX, ServerWorks, E750X
• Bus/PCI design: # of PCI buses @ 133 MHz
• Interrupt design: PIC, IO-APIC etc.
• Standby power (Wake on LAN) can be a problem with many NICs
Hardware for high perf. Networking
• ServerWorks, Intel E750X chipset: many PCI-X hubs/bridges, and dual XEON
• PCI-X is here: bus at 8.5 Gbit/s
• Many vendors use Compact PCI already
[Figure: block diagram with two CPUs, memory, the processor, I/O and memory controller, two PCI-X I/O bridges and NICs]
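The 8.5 Gbit/s figure follows from the bus width and clock: a 64-bit bus at 133 MHz moves 64 × 133 × 10^6 ≈ 8.5 × 10^9 bits per second at its theoretical peak. The second function adds an assumption not stated on the slide: when a router forwards between NICs on the same bus, each packet crosses it twice (DMA in, DMA out), which halves the usable forwarding bandwidth. Protocol and arbitration overhead are ignored.

```python
def bus_gbit(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus doing one transfer per clock, in Gbit/s."""
    return width_bits * clock_mhz * 1e6 / 1e9


def max_forwarding_gbit(width_bits, clock_mhz):
    """Assumes every forwarded packet crosses the same bus twice (RX and TX DMA)."""
    return bus_gbit(width_bits, clock_mhz) / 2


print(f"PCI-X 64-bit/133 MHz: {bus_gbit(64, 133):.1f} Gbit/s")              # 8.5
print(f"forwarding on one bus: {max_forwarding_gbit(64, 133):.2f} Gbit/s")  # 4.26
```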
Hardware for high perf. Networking
Currently Intel has the advantage. Broadcom can be a dark horse. All have NAPI drivers.
GIGE chipsets available for PCI:
• e1000: Intel, driver e1000
• BCM5700: Broadcom, driver tg3
• dl-2k: D-Link, driver dl2k
Some board manufacturers switch chipsets often.
Chip documentation is a problem.
Some GIGE experiments/NAPI
[Figure: Ping latency/fairness under extreme load, UP. Latency in microseconds for a ping through an idle router and for a ping through a router under a DoS attack of 890 kpps.]
Very well behaved: just an increase of a couple of 100 microseconds!!
Some GIGE experiments
[Figure: packets/sec per interface, eth0 to eth10, Clone vs. Alloc]
Pktgen sending test with 11 GIGE NICs: 2*XEON 1.8 GHz, ServerWorks X5DL8-GG, Intel e1000
Packet sending @ 1518 bytes: 81300 pps is 1 Gbit/s
Clone = 8.5 Gbit/s, Alloc = 5.4 Gbit/s
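The 81300 pps figure for 1518-byte packets (and the 1.48 Mpps figure for 64-byte packets) follows from the per-frame overhead on the wire. A sketch, assuming standard Ethernet framing: each frame also occupies an 8-byte preamble/start delimiter and a 12-byte minimum inter-frame gap, and the frame sizes already include the 4-byte FCS.

```python
PREAMBLE_SFD = 8   # bytes on the wire before every frame
MIN_IFG = 12       # minimum inter-frame gap, in byte times


def line_rate_pps(frame_bytes, link_bps=1_000_000_000):
    """Maximum frames per second for a frame size that includes the FCS."""
    bits_per_frame = (frame_bytes + PREAMBLE_SFD + MIN_IFG) * 8
    return link_bps / bits_per_frame


print(f"1518 bytes: {line_rate_pps(1518):,.0f} pps")  # 81,274 pps
print(f"  64 bytes: {line_rate_pps(64):,.0f} pps")    # 1,488,095 pps
```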
Some GIGE experiments
[Figure: packets/sec per interface, eth0 to eth10, Clone vs. Alloc]
Pktgen sending test with 11 GIGE NICs: 2*XEON with HyperThreading on, 1.8 GHz, ServerWorks X5DL8-GG, Intel e1000
Packet sending @ 1518 bytes: 81300 pps is 1 Gbit/s
Clone = 10.0 Gbit/s, Alloc = 7.4 Gbit/s
Some GIGE experiments
[Figure: aggregated sending performance from pktgen with 11 GIGE NICs, in Kpps, for Alloc and Clone, w/o HT and with HT]
XEON 2*1.8 GHz @ 64 byte pkts: 1.48 Mpps = 1 Gbit/s
Forwarding performance
[Figure: Linux forwarding rate at different pkt sizes (64, 128, 256, 512, 1024 and 1518 bytes): input and throughput in kpps vs. packet size. Linux 2.5.58 UP/skb recycling, 1.8 GHz XEON.]
Fills a GIGE pipe, starting from 256 byte pkts
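Which packet sizes a router can forward at full GIGE line rate depends on how many packets per second it handles. The sketch below uses the standard 20-byte-per-frame wire overhead (preamble plus inter-frame gap) to compute line rate per size; the forwarding rates passed in are hypothetical. Line rate is about 453 kpps at 256 bytes and about 845 kpps at 128 bytes, so filling the pipe from 256-byte packets up implies forwarding at least roughly 453 kpps.

```python
SIZES = (64, 128, 256, 512, 1024, 1518)  # packet sizes from the forwarding-rate plot


def line_rate_kpps(frame_bytes, link_bps=1e9):
    """GIGE line rate in kpps; 20 bytes of preamble + inter-frame gap per frame."""
    return link_bps / ((frame_bytes + 20) * 8) / 1e3


def smallest_line_rate_size(forwarding_kpps, sizes=SIZES):
    """Smallest size at which `forwarding_kpps` fills the link, or None."""
    for size in sorted(sizes):
        if forwarding_kpps >= line_rate_kpps(size):
            return size
    return None


for size in SIZES:
    print(f"{size:>5} bytes: {line_rate_kpps(size):7.1f} kpps to fill the pipe")
print(smallest_line_rate_size(500))  # 256 (for a hypothetical 500 kpps router)
```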
R&D
[Figure: IO-APIC delivering Eth0 and Eth1 interrupts to CPU 0 and CPU 1, contrasting parallelization with serialization on the TX ring]
Eth1 holds skbs from different CPUs. Clearing the TX buffers causes cache bouncing.
For user apps, the new scheduler does affinity.
But for packet forwarding....
eth0->eth1 CPU0 (we can set affinity eth1 -> CPU0)
But it would be nice to use the other CPU for forwarding too. :-)
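Setting affinity as in "eth1 -> CPU0" is done through /proc/irq/<irq>/smp_affinity, which takes a hexadecimal CPU bitmask. A minimal sketch; the IRQ number in the commented usage line is hypothetical (look up the NIC's real one in /proc/interrupts), and writing the file requires root.

```python
def affinity_mask(cpus):
    """Hex CPU bitmask as accepted by /proc/irq/<irq>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus):
    """Route interrupt `irq` to the given CPUs (needs root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(affinity_mask(cpus) + "\n")


print(affinity_mask({0}), affinity_mask({1}), affinity_mask({0, 1}))  # 1 2 3
# set_irq_affinity(24, {0})  # hypothetical IRQ for eth1 -> CPU0
```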
R&D
Very high transaction packet memory system for GIGE and upcoming 10GE
Profiling indicates slab is not fully per-CPU
SMP-2-CPU: 300 kpps
SMP-1-CPU: 302 kpps
Counter 0 counted GLOBAL_POWER_EVENTS events
vma samples %-age symbol name
c0138e96 37970 8.23162 cache_alloc_refill
c0229490 37247 8.07488 alloc_skb
c0235e90 32491 7.04381 qdisc_restart
c0235b54 27891 6.04657 eth_type_trans
Note: setting input affinity helps. But we would like to work on the general problem.
c02296d2 25675 8.67698 skb_release_data
c0235b54 24438 8.25893 eth_type_trans
c0235e90 24047 8.12679 qdisc_restart
c0229490 18188 6.14671 alloc_skb
c0110a1c 15741 5.31974 do_gettimeofday
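The two listings can be compared symbol by symbol with a few lines of parsing. This sketch assumes only the four-column layout shown (vma, samples, %-age, symbol name); the slide does not say which listing was taken with input affinity, so the labels "first" and "second" are neutral.

```python
FIRST = """\
c0138e96 37970 8.23162 cache_alloc_refill
c0229490 37247 8.07488 alloc_skb
c0235e90 32491 7.04381 qdisc_restart
c0235b54 27891 6.04657 eth_type_trans"""

SECOND = """\
c02296d2 25675 8.67698 skb_release_data
c0235b54 24438 8.25893 eth_type_trans
c0235e90 24047 8.12679 qdisc_restart
c0229490 18188 6.14671 alloc_skb
c0110a1c 15741 5.31974 do_gettimeofday"""


def parse_profile(text):
    """Map symbol name -> percentage from 'vma samples %-age symbol' rows."""
    shares = {}
    for line in text.splitlines():
        _vma, _samples, pct, symbol = line.split()
        shares[symbol] = float(pct)
    return shares


def compare(a, b):
    """Symbols present in both profiles, with their (first, second) shares."""
    return {sym: (a[sym], b[sym]) for sym in a if sym in b}


for sym, (p1, p2) in compare(parse_profile(FIRST), parse_profile(SECOND)).items():
    print(f"{sym:<16} {p1:6.2f}% -> {p2:6.2f}%")
```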
R&D
[Figure: router profile. Routing throughput in kpps, XEON no HT 2*1.8 GHz, for V UP gcc-3.1, V SMP2 gcc-3.1, V SMP1 gcc-3.1, V SMP2 gcc-2.95.3, RC UP gcc-3.1, RC UP gcc-2.95.3, RC SMP2 gcc-3.1, RC SMP1 gcc-3.1, IA SMP2 gcc-3.1 and IA RC SMP2 gcc-3.1]
V = vanilla, UP = uniprocessor, SMP1 = SMP 1 CPU, SMP2 = SMP 2 CPU, RC = skb recycling, IA = input affinity
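RC (skb recycling) is not described further on these slides. The general idea, reusing packet buffers instead of returning them to the allocator and fetching new ones for every RX ring refill, can be sketched as a per-CPU free list. This is a hypothetical user-space model, not the kernel patch: `RecyclingPool`, `max_per_cpu` and the 2048-byte buffer size are illustrative choices.

```python
class RecyclingPool:
    """Toy per-CPU buffer recycler in front of a general-purpose allocator."""

    def __init__(self, ncpus, buf_size=2048, max_per_cpu=64):
        self.free = [[] for _ in range(ncpus)]  # one free list per CPU
        self.buf_size = buf_size
        self.max_per_cpu = max_per_cpu
        self.fresh_allocs = 0  # times we had to fall back to the allocator

    def alloc(self, cpu):
        if self.free[cpu]:
            return self.free[cpu].pop()  # recycled buffer, no allocator call
        self.fresh_allocs += 1
        return bytearray(self.buf_size)  # stands in for a slab allocation

    def release(self, cpu, buf):
        if len(self.free[cpu]) < self.max_per_cpu:
            self.free[cpu].append(buf)  # keep it for the next RX refill
        # otherwise drop the reference and let the allocator reclaim it


pool = RecyclingPool(ncpus=2)
for _ in range(1000):            # RX refill and TX completion on the same CPU
    pool.release(0, pool.alloc(0))
print(pool.fresh_allocs)         # 1
```

When a buffer is freed on a different CPU than the one that allocated it, as in the eth0 -> eth1 forwarding case earlier, it lands on the other CPU's list, so a per-CPU design still has to handle buffers migrating between CPUs.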
Profile with P4/XEON performance counters
• GLOBAL_POWER_EVENTS
• MISPRED_BRANCH_RETIRED
• BSQ_CACHE_REFERENCE
• MACHINE_CLEAR
• ITLB_REFERENCE
NAPI/SMP production in use: uu.se
[Figure: production setup with routers UU-1 and UU-2 (PIII 933 MHz, 2.4.10 poll/SMP), full Internet routing via EBGP/IBGP, AS 2834; other labels: Stockholm (two links), DMZ, Internal, UU-Net, L-uu1, L-uu2]
Real world use: ftp.sunet.se
[Figure: ftp.sunet.se setup with FTP servers Ftp0, Ftp1 and Ftp2, a switch, routers Archive-r1 and Archive-r2 (PIII-933MHz, NAPI/IRQ), a GSR and an OC-48 link to Stockholm; full Internet routing via EBGP/IBGP, AS 1653 and AS 15980]
Load sharing & redundancy with Router Discovery
IP-login: a Linux router app for user-authenticated routing
[Figure: user@host and the IP-login router]
Users can only reach the IP-login router, which hosts a web server.
User web requests are directed to the web server, which asks for a username and password, possibly checked by an authentication server (today TACACS).
If user/passwd is accepted:
1) Forwarding is enabled for the host
2) Monitoring arping is started
Loss of arping disables forwarding.
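The login sequence above can be sketched as a small per-host state machine. Everything here is hypothetical: `check_password` stands in for the TACACS lookup, and `max_missed`, the number of unanswered arpings tolerated before forwarding is disabled, is an assumption, since the slide only says that loss of arping disables forwarding.

```python
class IpLoginGate:
    """Toy IP-login gate: a host may be forwarded only while it answers arping."""

    def __init__(self, check_password, max_missed=3):
        self.check_password = check_password  # hypothetical auth hook (e.g. TACACS)
        self.max_missed = max_missed          # assumed tolerance for lost arpings
        self.missed = {}                      # forwarding-enabled host -> misses

    def login(self, host, user, password):
        if not self.check_password(user, password):
            return False
        self.missed[host] = 0  # 1) enable forwarding, 2) start arping monitor
        return True

    def arping_result(self, host, answered):
        if host not in self.missed:
            return
        self.missed[host] = 0 if answered else self.missed[host] + 1
        if self.missed[host] >= self.max_missed:
            del self.missed[host]  # loss of arping disables forwarding

    def may_forward(self, host):
        return host in self.missed


gate = IpLoginGate(lambda user, pw: (user, pw) == ("alice", "secret"), max_missed=2)
print(gate.login("10.0.0.5", "alice", "secret"))  # True
gate.arping_result("10.0.0.5", answered=False)
gate.arping_result("10.0.0.5", answered=False)
print(gate.may_forward("10.0.0.5"))               # False
```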
Based on stolen code from:
• Pawel Krawczyk: tacacs client
• Alexey Kuznetsov: arping
IP-login installation at Uppsala University
Approx. 1000 outlets

A new network symbol has been seen...
The Penguin Has Landed
References and Other Stuff
• http://bifrost.slu.se
• http://www.pdos.lcs.mit.edu/click/ (claim they can do 435 Kpps on a PIII 700)
• http://www.cyberus.ca/~hadi/usenix-paper.tgz
• Some other work: http://robur.slu.se/Linux/net-development/