Gökhan Ünel / CHEP 2004, Interlaken / ATLAS


Performance of the ATLAS DAQ DataFlow system

• Introduction / Generalities
  – Presentation of the ATLAS DAQ components

• Functionality & Performance Measurements
  – Prototype setup
  – Event Building, ROI collection, combined systems
  – at2sim: discrete event simulation

• Conclusions
  – From prototype setup & simulations

• Outlook

N Gökhan Ünel

on behalf of the ATLAS TDAQ Group


Generalities: ATLAS DAQ

Level 1 (L1) rate: 75 kHz minimum, upgradeable to 100 kHz

Level 2 (L2) rate per ROS: 20 kHz; L2 time budget per event: 10 ms

Event Building (EB) rate: 3-3.5 kHz for 1.5-2 MByte events

Recording rate: 200 Hz for 1.5-2 MByte events

[Figure: DataFlow components (ROS, pROS, L2SV, L2PU, DFM, SFI) and their messages: Assign event, Request data, ROI data, Event data, L2 decision, L2 details, End of event, Event Clear. Input at 100 kHz; output to the Event Filter at 3 kHz.]
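A quick consistency check on these requirements is the aggregate bandwidth they imply. The Python sketch below is a back-of-envelope calculation, assuming 1 MB = 10^6 bytes and that every event is moved whole; the helper `bandwidth` is a hypothetical name used only for illustration.

```python
# Back-of-envelope bandwidth implied by the DataFlow requirements.
# Assumptions: 1 MB = 1e6 bytes; every event is transferred in full.
MB = 1e6


def bandwidth(rate_hz: float, event_bytes: float) -> float:
    """Aggregate bytes/s needed to move events of a given size at a given rate."""
    return rate_hz * event_bytes


# Event Building: 3-3.5 kHz of 1.5-2 MB events
eb_low = bandwidth(3.0e3, 1.5 * MB)
eb_high = bandwidth(3.5e3, 2.0 * MB)

# Recording: 200 Hz of 1.5-2 MB events
rec_low = bandwidth(200.0, 1.5 * MB)
rec_high = bandwidth(200.0, 2.0 * MB)

print(f"EB:        {eb_low / 1e9:.1f} - {eb_high / 1e9:.1f} GB/s")   # 4.5 - 7.0 GB/s
print(f"Recording: {rec_low / 1e6:.0f} - {rec_high / 1e6:.0f} MB/s")  # 300 - 400 MB/s
```

Under these assumptions the EB farm must absorb roughly 4.5-7 GB/s in total, while only 300-400 MB/s reaches storage.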


Matching requirements

– DataFlowManager (DFM), L2SuperVisor (L2SV):
  • Previous work (TDR) has shown that currently available hardware can match the requirements.

– ReadOutSystem (ROS), SubFarmInput (SFI):
  • The latest studies will be presented in this talk.

– L2ProcessingUnit (L2PU):
  • Since the physics algorithms for event selection are not finalized, only the time to fetch fragments from the ROSs will be compared to the computation budget.

– Networking:
  • A discrete event simulation tool will be used to scale from the prototype setup up to the final ATLAS size.


EB / L2 Setups

EB: up to 16 SFIs, up to 24 ROSs

L2: up to 14 L2PUs, up to 6 L2SVs, up to 8 ROSs

FastIron: 64 ports

T6: 31 ports

Few fast ROSs


EB only setups

[Figure: Event Building rate (Hz) vs number of SFIs for 3, 12, 18 and 24 ROSs (solid lines: 2 GHz ROS) and 5 ROSs at 3 GHz (dashed line).]

ROS: 12 emulated input channels, 1 kB/channel

SFI: no output to EF

More ROSs = bigger events!

• Small and large systems have the same maximum EB rate: no penalty as the event size grows
• A 24 ROS × 16 SFI EB system runs stably
• A faster ROS does a better job (we hit the I/O limit)

Limits:
– ROS CPU limit: 8.55 kHz × 12.4 kB = 106 MB/s
– ROS NIC limit: 9.66 kHz × 12.4 kB = 120 MB/s
– SFI NIC limit: 110 MB/s per SFI
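The limit annotations are all of the form rate × fragment size. The sketch below reproduces them and shows why more ROSs mean bigger events and therefore a lower rate per SFI. It assumes the 12.4 kB fragment is the 12 × 1 kB emulated payload plus about 0.4 kB of headers (an inference, not stated on the slide); the function names are hypothetical.

```python
# Per-node limits in the EB-only prototype, using the figures quoted above.
# Assumption: the 12.4 kB ROS fragment is 12 emulated channels x 1 kB plus
# about 0.4 kB of header overhead; 1 kB = 1e3 bytes throughout.
FRAGMENT_BYTES = 12.4e3   # bytes one ROS contributes to each event
SFI_NIC_LIMIT = 110e6     # bytes/s one SFI can receive


def ros_output(rate_hz: float) -> float:
    """Bytes/s leaving one ROS when it serves rate_hz events."""
    return rate_hz * FRAGMENT_BYTES


def event_size(n_ros: int) -> float:
    """Built event size: one fragment per ROS, so more ROSs mean bigger events."""
    return n_ros * FRAGMENT_BYTES


def max_rate_per_sfi(n_ros: int) -> float:
    """Highest EB rate (Hz) one SFI sustains before its NIC saturates."""
    return SFI_NIC_LIMIT / event_size(n_ros)


print(f"ROS CPU limit: {ros_output(8.55e3) / 1e6:.0f} MB/s")  # 106 MB/s
print(f"ROS NIC limit: {ros_output(9.66e3) / 1e6:.0f} MB/s")  # 120 MB/s
for n in (3, 12, 18, 24):
    print(f"{n:2d} ROS: {event_size(n) / 1e3:5.1f} kB/event, "
          f"at most {max_rate_per_sfi(n):4.0f} Hz per SFI")
```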


Scaling in EB throughput

• EB throughput scales linearly with the number of SFIs

• No show-stoppers

• It is possible to estimate the rate of any EB system in the prototype setup

[Figure: EB throughput (MB/s) vs number of SFIs for 3, 6, 12, 18 and 24 ROSs, with a fit for the SFI-limiting case.]
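One way to read "possible to estimate the rate of any EB system" is as a min-of-bottlenecks model: the SFIs cap throughput linearly in their number, and the ROSs cap it at their own CPU limit. The sketch below is that simplified model, not the fit shown in the figure; the default per-node limits are the prototype figures quoted for the EB-only setups, and the function names are hypothetical.

```python
# Min-of-bottlenecks estimate for an EB system.
# Assumption: throughput is capped either by the SFIs (N_SFI x per-SFI limit)
# or by the ROSs (N_ROS x per-ROS limit), whichever is smaller. The default
# limits are the prototype figures (110 MB/s per SFI NIC, 106 MB/s per
# 2 GHz ROS CPU); a fitted slope could replace sfi_limit.


def eb_throughput(n_sfi: int, n_ros: int,
                  sfi_limit: float = 110e6, ros_limit: float = 106e6) -> float:
    """Estimated EB throughput in bytes/s."""
    return min(n_sfi * sfi_limit, n_ros * ros_limit)


def eb_rate(n_sfi: int, n_ros: int, fragment_bytes: float = 12.4e3) -> float:
    """Estimated EB rate in Hz: throughput divided by the built event size."""
    return eb_throughput(n_sfi, n_ros) / (n_ros * fragment_bytes)


for n_ros in (3, 24):
    rates = [eb_rate(n_sfi, n_ros) for n_sfi in (1, 4, 16)]
    print(n_ros, "ROS:", [f"{r:.0f} Hz" for r in rates])
```

In this model a ROS-limited system runs at about 8.5 kHz whatever its size, consistent with small and large systems reaching the same maximum EB rate, while adding SFIs raises throughput linearly until the ROS cap is reached.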


SFI requirements for 3 kHz EB

[Figure: Determining the number of SFIs: SFIs needed for 3 kHz EB vs event size (kB), for SFIs running at 112 MB/s (90% bandwidth) and at 75 MB/s (60% bandwidth); the typical ATLAS event size is marked.]

Requirement: 3-3.5 kHz of EB at 60-70% bandwidth usage per SFI

• At a typical event size of 1.5 MB, 60 SFIs (2.4 GHz SMP) are enough

• Output to the EF and extra SFIs as a safety margin should be considered

100 SFIs (2.4 GHz SMP) would easily handle 3-3.5 kHz of 1.5-2 MB events
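The SFI count follows from dividing the total EB bandwidth by what each SFI is allowed to absorb. The sketch below assumes an even share of the load, a per-SFI input of 75 MB/s or 112 MB/s (about 60% and 90% of the 125 MB/s Gigabit Ethernet line rate, our reading of the figure), and no output to the EF; `sfis_needed` is a hypothetical helper.

```python
import math

# Number of SFIs needed to sustain a given EB rate.
# Assumptions: the load is shared evenly over the SFIs, each SFI is held to a
# fixed input bandwidth (75 MB/s ~ 60% or 112 MB/s ~ 90% of Gigabit Ethernet),
# and output to the Event Filter is ignored.


def sfis_needed(eb_rate_hz: float, event_bytes: float, per_sfi_bps: float) -> int:
    """Smallest whole number of SFIs whose combined input covers the EB bandwidth."""
    return math.ceil(eb_rate_hz * event_bytes / per_sfi_bps)


for rate, size in ((3.0e3, 1.5e6), (3.5e3, 2.0e6)):
    for per_sfi in (75e6, 112e6):
        n = sfis_needed(rate, size, per_sfi)
        print(f"{rate / 1e3} kHz x {size / 1e6} MB @ {per_sfi / 1e6:.0f} MB/s -> {n} SFIs")
```

At 3 kHz and 1.5 MB this reproduces the 60 SFIs quoted above; the 3.5 kHz, 2 MB corner needs 94 SFIs at 60% bandwidth, which fits within the proposed 100 before any EF output is added.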


Level 2 Rate

L2-only setup

• Dummy algorithms in the L2PUs
• 6 concurrent ROI collections per L2PU
• Linear scaling when the ROS is not the limiting factor

[Figure: L1 rate (kHz) vs number of L2PUs for 4 ROSs, 1 ROS and 1 ROS at 3 GHz; the ROS-CPU-limited region is marked.]


L2 Time budget

[Figure: dummy algorithm time (µs) vs number of 1 kB fragments (ROLs) used, for fragments from different ROSs and from the same ROS.]

• If 500 dual-processor (SMP) 3 GHz L2PUs are used:
  – 10 ms per event is available for the L2 decision at a 100 kHz L1 rate
  – Worst case, 16 ROLs all from different ROSs: < 0.8 ms

Requirement: 10 ms per event for the L2 decision; ROI fetch time << 10 ms

Longest ROI fetch: 13-16 ROLs
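The 10 ms figure is the number of L2 CPUs divided by the L1 rate. The sketch below assumes events are spread evenly over both CPUs of each of the 500 SMP L2PUs and that each CPU handles one event at a time; `l2_budget_s` is a hypothetical helper.

```python
# Per-event L2 time budget for a farm of SMP L2PUs.
# Assumptions: events are spread evenly over every CPU of every L2PU and each
# CPU works on one event at a time.


def l2_budget_s(n_nodes: int, cpus_per_node: int, l1_rate_hz: float) -> float:
    """Seconds each CPU may spend on one event without falling behind."""
    return n_nodes * cpus_per_node / l1_rate_hz


budget = l2_budget_s(500, 2, 100e3)   # 500 dual-CPU L2PUs at 100 kHz L1
worst_fetch_s = 0.8e-3                 # 16 ROLs, all from different ROSs
print(f"budget: {budget * 1e3:.1f} ms per event")                   # 10.0 ms
print(f"worst-case ROI fetch: {worst_fetch_s / budget:.0%} of it")  # 8%
```

The 0.8 ms worst-case fetch therefore uses about 8% of the budget, the figure quoted in the conclusions.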


Combined setups: EB + L2

[Figure: combined EB + L2 test bed with SFI01-SFI16, ROS01-ROS24, L2P01-L2P14, L2SV01-L2SV06, pROS and DFM, connected through a Foundry FastIron 800, a BATM T6 and a Foundry EdgeIron (EI) switch.]


3×2 combined system

[Figure: EB rate (Hz) vs number of L2PUs for 3% and 5% L2 acceptance with 2 GHz and 3 GHz ROSs, compared with simulation.]

Small system: 3 ROS × 2 SFI × up to 12 L2PUs

Since the maximum rates for EB and L2 are known, use the plateau region (the ROS CPU limit) to calculate the ROS CPU utilization for the “clear” task.
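On the plateau the ROS CPU is saturated, so the linear ROS CPU model (total CPU = EB, L2 and clear contributions) can be solved for its one unknown, the cost of clears. The sketch below assumes CPU_EB and CPU_L2 are already known from the EB-only and L2-only maximum rates; the plateau rates in the self-check are hypothetical numbers built from the 2 GHz costs, not measurements, and `clear_cost` is a hypothetical helper.

```python
# Solve the linear ROS CPU model for the cost of event clears (GHz per kHz).
# Assumption: on the plateau the ROS CPU is saturated, so the measured rates
# account for its whole clock, and CPU_EB and CPU_L2 are already known.


def clear_cost(cpu_ghz: float, r_l1_khz: float, r_l2_khz: float,
               r_eb_khz: float, cpu_l2: float, cpu_eb: float) -> float:
    """CPU_Cl from CPU = R_EB*CPU_EB + R_L2*CPU_L2 + R_L1*CPU_Cl."""
    busy_ghz = r_eb_khz * cpu_eb + r_l2_khz * cpu_l2   # spent on EB and L2
    return (cpu_ghz - busy_ghz) / r_l1_khz             # remainder goes to clears


# Self-check with hypothetical plateau rates (kHz), not measured values:
# build a consistent CPU total from the 2 GHz costs, then recover CPU_Cl.
r_l1, r_l2, r_eb = 50.0, 10.0, 1.5
cpu_total = r_eb * 0.2252 + r_l2 * 0.06061 + r_l1 * 0.0074
print(f"CPU_Cl = {clear_cost(cpu_total, r_l1, r_l2, r_eb, 0.06061, 0.2252):.4f}")  # 0.0074
```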


Analysis for ROS CPU

                                2 GHz CPU    3.06 GHz CPU
Max L2 rate (kHz)               33.0         55.5
Max EB rate (kHz)               8.6 (+)      14.5 (+, **)
CPU per 1 kHz of L2 (GHz)       0.06061      0.05564
CPU per 1 kHz of EB (GHz)       0.2252       0.20274
CPU per 1 kHz of clears (GHz)   0.0074       0.0083

(+) including clears; (**) using 2 NICs simultaneously

$$\mathrm{CPU} = R_{EB} \times \mathrm{CPU}_{EB} + R_{L2} \times \mathrm{CPU}_{L2} + R_{L1} \times \mathrm{CPU}_{Cl}$$

where $\mathrm{CPU}_{EB}$, $\mathrm{CPU}_{L2}$ and $\mathrm{CPU}_{Cl}$ are the CPU power (GHz) the ROS spends on 1 kHz of Event Building, of Level 2 ROI requests and of event clears, and $R_{EB}$, $R_{L2}$, $R_{L1}$ are the corresponding rates in kHz (clears arrive at the L1 rate).

Requirement: 100 kHz L1, 20 kHz L2, 3-3.5 kHz EB

2 GHz ROS needs: 20 × 0.06061 + 3 × 0.2252 + 100 × 0.0074 = 2.63 GHz > 2.0 GHz

3 GHz ROS needs: 20 × 0.05564 + 3 × 0.20274 + 100 × 0.0083 = 2.55 GHz < 3.06 GHz
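The two budget lines above can be checked mechanically. The sketch below encodes the measured per-kHz costs from the table and evaluates the linear model at the required rates (100 kHz L1, 20 kHz L2, 3 kHz EB); it assumes the model stays linear up to those rates, and `ros_cpu_needed` is a hypothetical helper.

```python
# ROS CPU budget check with the measured per-kHz costs (GHz per kHz).
# Assumption: CPU = R_EB*CPU_EB + R_L2*CPU_L2 + R_L1*CPU_Cl stays linear up to
# the required rates.
COSTS = {
    "2 GHz ROS": {"clock": 2.0, "l2": 0.06061, "eb": 0.2252, "clear": 0.0074},
    "3 GHz ROS": {"clock": 3.06, "l2": 0.05564, "eb": 0.20274, "clear": 0.0083},
}


def ros_cpu_needed(c: dict, r_l1: float = 100.0, r_l2: float = 20.0,
                   r_eb: float = 3.0) -> float:
    """GHz of ROS CPU needed at the given rates (all in kHz)."""
    return r_eb * c["eb"] + r_l2 * c["l2"] + r_l1 * c["clear"]


for name, c in COSTS.items():
    need = ros_cpu_needed(c)
    verdict = "fits" if need < c["clock"] else "too slow"
    print(f"{name}: needs {need:.2f} GHz of {c['clock']} GHz -> {verdict}")
```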


Combined system

The largest possible system using 2 GHz ROSs, 18 ROS × 16 SFI × 12 L2PU, runs stably

[Figure: 18 ROS × 16 SFIs with 2 GHz ROSs: EB rate (Hz) vs number of L2PUs for 3%, 5% and 10% acceptance and for EB only.]


Meeting requirements with 3 GHz ROS

Good agreement between data and simulation

3 GHz ROS can do 20 kHz L2 & 3 kHz EB at 100 kHz L1

[Figure: 5×4 system with 3.06 GHz ROSs: EB rate (Hz) vs number of L2PUs for 3%, 5% and 10% acceptance, EB only (100%), and simulation at 3%.]

Operating point: EB = 3 kHz, acceptance = 3%, L2 = 20 kHz, L1 = 100 kHz
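The same per-kHz costs also give the highest L1 rate a single ROS can follow once its L2 and EB loads are tied to L1. The sketch below assumes the operating point marked on the plot: L2 requests reach a given ROS for 20% of L1 events (20 kHz at 100 kHz), EB rate = acceptance × L1 rate (3 kHz = 3% of 100 kHz), and every L1 event is cleared. `max_l1_khz` is a hypothetical helper.

```python
# Highest L1 rate (kHz) one ROS can follow when its L2 and EB loads scale with L1.
# Assumptions: L2 requests reach this ROS for l2_frac of L1 events, the EB rate
# is acc x L1, every L1 event is cleared, and the linear CPU model holds.


def max_l1_khz(clock_ghz: float, cpu_l2: float, cpu_eb: float, cpu_clear: float,
               l2_frac: float = 0.20, acc: float = 0.03) -> float:
    """Clock divided by the CPU cost of one kHz of L1 (including its L2/EB share)."""
    ghz_per_khz_l1 = l2_frac * cpu_l2 + acc * cpu_eb + cpu_clear
    return clock_ghz / ghz_per_khz_l1


fast = max_l1_khz(3.06, 0.05564, 0.20274, 0.0083)   # 3.06 GHz ROS
slow = max_l1_khz(2.0, 0.06061, 0.2252, 0.0074)     # 2 GHz ROS
print(f"3.06 GHz ROS: up to {fast:.0f} kHz L1")      # 120 kHz
print(f"2 GHz ROS:    up to {slow:.0f} kHz L1")      # 76 kHz
```

Under these assumptions a 3.06 GHz ROS can follow about 120 kHz of L1, leaving headroom above 100 kHz, whereas a 2 GHz ROS saturates near 76 kHz.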


Final system simulation - 1

• 160 ROS × 110 SFI × N L2PU
• Using concentrating switches for the PUs (61)
• Realistic trigger menu & ROI distribution

[Figure: Impact of L2PU farm size on L1 rate, slow ROS: L1 rate (kHz) vs time (s) at 75 kHz with 252, 304, 404 and 504 L2PUs and at 100 kHz with 252 and 304 L2PUs. Annotations: stable at 75 kHz; stable at 95 kHz.]


Final system simulation - 2

at2sim: 127 ROS, 110 SFIs, 504 L2PUs with concentrator switches

[Figure: L1 rate (kHz), EB latency (ms), number of events in L2 and slowest ROS queue vs time (s).]

The final-size system runs smoothly with fast ROSs (3.06 GHz).
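at2sim itself is not shown here. To illustrate the technique it is built on, the sketch below is a minimal discrete event simulation of a single ROS queue: a time-ordered event list (a heap) of arrivals and service completions is processed strictly in time order. Deterministic arrivals at the L1 rate and a fixed per-event service time are simplifying assumptions, and `simulate_ros_queue` is a hypothetical name.

```python
import heapq

ARRIVAL, DONE = 0, 1  # at equal times, arrivals are handled before completions


def simulate_ros_queue(l1_rate_hz: float, service_s: float, n_events: int) -> int:
    """Toy discrete event simulation of one ROS; returns the peak queue length.

    Hypothetical model, not at2sim: events arrive every 1/l1_rate_hz seconds
    and the ROS serves them one at a time, service_s seconds each.
    """
    # Event list ordered by time: (time, kind)
    events = [(i / l1_rate_hz, ARRIVAL) for i in range(n_events)]
    heapq.heapify(events)
    in_system = 0   # events queued or being served
    peak = 0
    while events:
        t, kind = heapq.heappop(events)
        if kind == ARRIVAL:
            in_system += 1
            peak = max(peak, in_system)
            if in_system == 1:          # ROS was idle: start serving now
                heapq.heappush(events, (t + service_s, DONE))
        else:
            in_system -= 1
            if in_system > 0:           # next queued event starts service
                heapq.heappush(events, (t + service_s, DONE))
    return peak


print(simulate_ros_queue(100e3, 8e-6, 1000))   # 80% busy: 1
print(simulate_ros_queue(100e3, 12e-6, 1000))  # 120% busy: queue keeps growing
```

With 100 kHz arrivals, an 8 µs service time (80% utilisation) keeps at most one event in the ROS, while 12 µs (120%) makes the queue grow steadily, the kind of queue build-up the final-system runs check for.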


Conclusions - I

• A 3 GHz ROS can do 3 kHz EB & 20 kHz L2
  – We need ~140 such nodes

• A dual 2.4 GHz SFI can do 3 kHz EB at 60% of line speed
  – We need ~100 such nodes

• A dual 3 GHz L2PU can do ROI collection in less than 8% of its time budget
  – We need ~500 such nodes

• The largest test system was 18 × 16 × 12
  – No scalability/functionality problems observed


Conclusions - II

• at2sim of the final setup: 160 × 100 × ..500
  – Scaling from 20% to 100%: no surprises, no queues, no anomalies

• Network: we can handle the extreme traffic caused by ultra-fast L2PUs running without algorithms

• Prototype L2PUs running at 12.5 kHz, ~25 times faster than in the final system


Next Steps

• Test: prototype custom hardware with 2 input channels

• Preseries: 10% setup down in the ATLAS cavern
  – A bigger switch (128 ports) will be bought
  – Merge with the existing prototype setup
  – Time scale: Q2 2005

• Networking aspects: scalability & performance
  – Separate test bed
  – Dedicated hardware (line speed at any frame size)
  – Stress testing of candidate switches


Backup slides


Hardware inventory

– Networking
  • 1 EB switch: Foundry FastIron 800, 62 ports
  • 1 L2 switch: BATM T6, 31 ports
  • 1 X-over switch: Foundry EdgeIron, 10 ports

– PCs (Intel Xeon, 64-bit/66 MHz PCI)
  • 31 tower uni-processor (2.0 GHz)
    – 25 used as ROS for scaling studies
    – 6 used as L2SVs
    – 1 used as DFM
  • 16 tower dual-processor (3.06 GHz)
    – Used as L2PUs
    – 5 used as ROS for performance studies
  • 16 rack-mountable dual-processor (2.4 GHz)
    – Used as SFIs


EFD setup

[Figure: EFD test setup with a DFM, ROS1 and ROS2, one SFI, and EFD1 to EFD15.]


EFD Studies

[Figure: EB with EFD output: EB rate vs number of EFDs (0-16), compared with running without EF output.]

Single SFI, small events: worst case. 40% performance loss.


DFM & L2SV performance

[Figure: LVL1 event rate (kHz) vs number of L2PUs.]


ROS input emulation vs Prototype Hardware

[Figure: data compared with emulation.]
