Process and Data Flow Control in KLOE

E. Pasqualucci (INFN - Roma), enrico.pasqualucci@roma1.infn.it

Page 2: Outline

• System overview
• Process structure and local communication
• SNMP and remote communication
• Process control
• Data Flow Control system
• DFC monitor

Page 3: DAQ system architecture

[Figure: DAQ system architecture. Blocks shown: crates with ADC1 ... ADC16, ROCK, AUXM and VIC modules; Level-2 crates with CPU, FDDI, VIC and ROCKM; trigger chain and DFC system (VIC, CBUS); FDDI switch; CPU servers; storage system; Run Control; Monitor System.]

• ~23000 FEE channels @ 2.5 kHz φ + background (~10 kHz)
• Bandwidth: ~50 Mbytes/s (5 Kbyte/event)
• Storage: 200 Tbyte/year
• Tested with peak rates of 10 kHz in multibunch mode
• Tested at maximum required throughput using non-zero-suppressed calorimeter data
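The quoted bandwidth follows from the event rate and the event size. The short check below is only a sketch of that arithmetic: it assumes decimal units (1 Kbyte = 1000 bytes), and the "seconds at full rate" figure is derived here from the quoted numbers, not stated on the slide.

```python
# Back-of-the-envelope check of the DAQ throughput figures quoted above.
EVENT_RATE_HZ = 10_000            # ~10 kHz total rate (phi + background)
EVENT_SIZE_BYTES = 5_000          # ~5 Kbyte per event (decimal units assumed)
STORAGE_PER_YEAR_BYTES = 200e12   # 200 Tbyte/year


def bandwidth_bytes_per_s(rate_hz, event_size_bytes):
    """Sustained data rate needed to move every event off the detector."""
    return rate_hz * event_size_bytes


bandwidth = bandwidth_bytes_per_s(EVENT_RATE_HZ, EVENT_SIZE_BYTES)

# Derived quantity (not on the slide): how long the system could write at
# full rate before filling the yearly storage budget.
seconds_at_full_rate = STORAGE_PER_YEAR_BYTES / bandwidth

print(f"bandwidth: {bandwidth / 1e6:.0f} Mbyte/s")
print(f"full-rate time per year: {seconds_at_full_rate:.1e} s")
```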

Page 4: DAQ software organization

[Figure: DAQ software organization. Labels shown: Level 1 chain, VME, Level 2 (Collector, GeoVme map, Circ, Sender), FDDI switch, Farm (Receiver, dmap, Circ, Builder, Circ (Ybos), Recorder, to disk/tape), Spy dump, SpyD, SpyBuff, RSpyD, Didone, Monitor system, Farm status, Chain tools, simulation, RunCtl, CmdSrv, DFC system, SlowCtl system. Flow types: data, map data, messages, traps.]

Page 5: Process structure

• Initialization
  – Msg Q creation
  – Shmem subscription
  – Shmem space allocation for variables
• Main Loop
  – Process Event
  – Process Command
  – Idle time
• Interrupt Handler
  – Extract command from Msg Q

[Figure: shared memory layout, with columns Id, Contents and Mapping ("All").]
• Header: Process number, Pointer to 1st process, Pointer to 2nd process, Pointer to 3rd process, …
• Proc. 1: Process name, Process id, Message queue id, Process status, Last command, Last command status, Number of variables, Variable 1, Variable 2, …
• Proc. 2: …
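The shared memory layout above can be pictured as a header plus one record per process. The following is a minimal in-memory sketch, not the KLOE implementation: the class names, the example process names and ids are hypothetical, "Process number" is assumed to mean the number of subscribed processes, and a Python list stands in for the header pointers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessEntry:
    """One per-process record, with the fields listed on the slide."""
    name: str
    pid: int
    msg_queue_id: int
    status: str = "idle"
    last_command: str = ""
    last_command_status: str = ""
    variables: Dict[str, object] = field(default_factory=dict)

    @property
    def number_of_variables(self) -> int:
        return len(self.variables)


@dataclass
class ProcessTable:
    """Header: the list plays the role of the 'pointer to n-th process' slots."""
    entries: List[ProcessEntry] = field(default_factory=list)

    @property
    def process_number(self) -> int:
        # Assumption: "Process number" is the count of subscribed processes.
        return len(self.entries)

    def subscribe(self, entry: ProcessEntry) -> ProcessEntry:
        """Initialization step: a process registers its record in the table."""
        self.entries.append(entry)
        return entry

    def locate(self, name: str) -> Optional[ProcessEntry]:
        """Find a process record by name, or None if it is not subscribed."""
        for entry in self.entries:
            if entry.name == name:
                return entry
        return None


table = ProcessTable()
table.subscribe(ProcessEntry("collector", pid=1201, msg_queue_id=7,
                             variables={"events": 0}))
table.subscribe(ProcessEntry("sender", pid=1202, msg_queue_id=8))
```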

Page 6: Local communication

• Getting a variable:
  – Locate process
  – Locate variable
  – Read it ("My variable = value")
• Sending a command:
  – The sender:
    • Locates the process
    • Gets its id and message Q
    • Puts command to Q
    • Sends an interrupt signal
    • Polls on command status
  – The receiver:
    • Reads the Q
    • Writes the command and status and executes it
    • Writes the command status (acknowledgement)

[Figure: successive snapshots of the shared memory table for "My process" (My process id, My Q id) while a "Stop !" command is delivered: Last command becomes "Stop !" and Last command status goes from "Executing" to "Success".]
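The sender and receiver steps above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: a `queue.Queue` stands in for the UNIX message queue, a direct method call stands in for the interrupt signal, and the class, function and command names are hypothetical.

```python
import queue


class DaqProcess:
    """Receiver side: owns a message queue and its shared-memory fields."""

    def __init__(self, name):
        self.name = name
        self.msg_queue = queue.Queue()   # stand-in for the UNIX message queue
        self.status = "running"
        self.last_command = ""
        self.last_command_status = ""
        self.variables = {"events": 0}

    def on_interrupt(self):
        """Interrupt handler: read the Q, record the command, execute, ack."""
        command = self.msg_queue.get_nowait()
        self.last_command = command
        self.last_command_status = "Executing"
        ok = self.execute(command)
        self.last_command_status = "Success" if ok else "Fault"

    def execute(self, command):
        if command == "Stop":
            self.status = "stopped"
            return True
        return False                     # unknown commands are reported as faults


def get_variable(processes, process_name, variable_name):
    """Locate the process, then locate the variable and read it."""
    return processes[process_name].variables[variable_name]


def send_command(processes, process_name, command):
    """Sender side: locate, put the command to the Q, interrupt, poll status."""
    target = processes[process_name]     # locate the process
    target.msg_queue.put(command)        # put command to Q
    target.on_interrupt()                # stand-in for the interrupt signal
    # In a real system the sender would poll until the status leaves
    # "Executing"; here the handler has already finished.
    return target.last_command_status


processes = {"collector": DaqProcess("collector")}
```

For example, `send_command(processes, "collector", "Stop")` leaves `last_command` set to `"Stop"` and the command status set to `"Success"`.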

Page 7: Managing the DAQ network

• SNMP (Simple Network Management Protocol)
• Widely used to manage network devices
• Defined as a standard by the IETF (Internet Engineering Task Force)
• Implemented using a reliable protocol over UDP
• Used to retrieve and/or set information about:
  – network configuration
  – traffic
  – faults
  – accounting
• Managed objects defined in a Management Information Base (MIB) defined by the IETF
• Private extensions of the standard MIB are allowed
• Public-domain software allows the implementation of:
  – dedicated agents
  – utilities for remote access

Page 8: SNMP client-server policy

• MIB
  – Variables organized as a tree
• Primitives:
  – get, get-next, set
• Each device runs a daemon able to:
  – Understand MIB requests
  – Obtain required information
  – Execute required actions
• Trap mechanism
• KLOE uses SNMP to:
  – Control DAQ devices and network
  – Implement message distribution
  – Implement process control
  – Implement Data Flow Control (DFC)
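The three primitives can be illustrated with a toy in-memory agent. This is a sketch of the idea only, not an SNMP implementation: there is no network or encoding layer, `set` here creates entries freely (a real agent only writes existing writable objects), and the class name and stored values are hypothetical. OIDs are held as tuples, whose ordering in Python matches the lexicographic OID ordering used by get-next.

```python
from bisect import bisect_right


class ToyAgent:
    """In-memory stand-in for an SNMP daemon serving a MIB tree."""

    def __init__(self):
        self._values = {}                # OID tuple -> value

    def set(self, oid, value):
        self._values[tuple(oid)] = value

    def get(self, oid):
        return self._values[tuple(oid)]

    def get_next(self, oid):
        """Return (oid, value) of the first object after `oid`, or None."""
        oids = sorted(self._values)
        i = bisect_right(oids, tuple(oid))
        if i == len(oids):
            return None
        return oids[i], self._values[oids[i]]

    def walk(self, prefix):
        """Repeated get-next, restricted to the sub-tree under `prefix`."""
        prefix = tuple(prefix)
        found = []
        step = self.get_next(prefix)
        while step is not None and step[0][:len(prefix)] == prefix:
            found.append(step)
            step = self.get_next(step[0])
        return found


SYSTEM = (1, 3, 6, 1, 2, 1, 1)           # iso.org.dod.internet.mgmt.mib-2.system
SYS_DESCR = SYSTEM + (1, 0)              # sysDescr.0
SYS_NAME = SYSTEM + (5, 0)               # sysName.0

agent = ToyAgent()
agent.set(SYS_DESCR, "toy DAQ node")     # hypothetical values
agent.set(SYS_NAME, "node-a")
```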

Page 9: The command server and the KLOE MIB sub-tree

iso.org.dod.internet.mgmt.mib-2
  system(1)
    sysDescr(1)
    sysObjectID(2)
    sysUpTime(3)
    sysContact(4)
    sysName(5)
    sysLocation(6)
    sysServices(7)
  KLOE(13)
    kprocesses(1)
      kprocNumber(1)
      kprocTable(2)
        kprocEntry(1)
          kprocIndex(1)
          kprocName(2)
          kprocId(3)
          kprocMsgQId(4)
          kprocStatus(5)
          kprocLastCommand(6)
          kprocLastCommandStatus(7)
          kprocVarNumber(8)
      kprocVarTable(3)
        kprocVarEntry(1)
          kprocVarProcIndex(n,1)
          kProcVarIndex(n,2)
          kprocVarName(n,3)
          kprocVarSize(n,4)
          kprocVarType(n,5)
          kprocVarValue(n,6)
    ……..
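As an illustration of how such a sub-tree is addressed, the sketch below builds numeric object identifiers for the kprocTable columns from the numbers shown on the slide. It assumes the usual SNMP table convention in which the instance suffix is the row index (here the process index); that convention, and the helper names, are assumptions and are not confirmed by the slide.

```python
# Numeric path of iso.org.dod.internet.mgmt.mib-2
MIB_2 = (1, 3, 6, 1, 2, 1)

KLOE = MIB_2 + (13,)            # KLOE(13), as placed on the slide
KPROCESSES = KLOE + (1,)        # kprocesses(1)
KPROC_NUMBER = KPROCESSES + (1,)
KPROC_ENTRY = KPROCESSES + (2, 1)   # kprocTable(2).kprocEntry(1)

KPROC_COLUMNS = {
    "kprocIndex": 1,
    "kprocName": 2,
    "kprocId": 3,
    "kprocMsgQId": 4,
    "kprocStatus": 5,
    "kprocLastCommand": 6,
    "kprocLastCommandStatus": 7,
    "kprocVarNumber": 8,
}


def kproc_oid(column, process_index):
    """OID of one kprocTable cell; the row suffix is an assumed convention."""
    return KPROC_ENTRY + (KPROC_COLUMNS[column], process_index)


def dotted(oid):
    """Render an OID tuple in the usual dotted-decimal form."""
    return ".".join(str(part) for part in oid)


print(dotted(kproc_oid("kprocStatus", 3)))
```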

Page 10: Message system implementation

[Figure: message sequence between two nodes (Node A, Node B), involving Run Control, the Command Server, Shared Memory, the Msg Q and a DAQ Process. Step labels: send command, locate process, put command, INT, get command, write last command and status (executing), execute command, write command status (success, fault), get process variables, SNMP ack, first ack req, first ack, second ack req, second ack.]

Page 11: Remarks and performance

• Command server
  – DAQ process
    • receives commands and shares variables
  – Command distributor
• Run and process control tools
  – tcl/tk commands implemented
    • get variable, send message
  – Fortran interface for old-fashioned software
  – Portable
    • AIX, OSF1, HP-UX, Solaris, Linux, LynxOS supported
• Optimized library
  – Parallel message distribution implemented
• Performance
  – Local command ~1.2 ms
  – Remote variable reading ~1.2 ms
  – Remote command completion ~4 ms
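The slide does not say how parallel message distribution is implemented. One way to picture it is to issue the same command to several nodes concurrently instead of one after another, so that the total time stays close to a single remote command completion. The sketch below uses a thread pool for that purpose; the thread pool, the node names and the `send_remote_command` stand-in are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def send_remote_command(node, command):
    """Hypothetical stand-in for one remote command; returns its status."""
    return "Success"


def distribute(nodes, command, send=send_remote_command):
    """Send `command` to every node concurrently and collect the statuses."""
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {node: pool.submit(send, node, command) for node in nodes}
        return {node: future.result() for node, future in futures.items()}


statuses = distribute(["farm1", "farm2", "farm3"], "Stop")
```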

Page 12: Production process control

[Figure: production process control between a control node and a production node. Labels shown: OffCtl, cmdsrv, locpc, pcd, Proc_1, Proc_2, Shmem (variables). Arrow labels: command, command + start, trap, signal, check.]

Page 13: DAQ system architecture

[Figure: DAQ system architecture, repeated from page 3.]

Page 14: The DFC System

• Changes the packet distribution sequence
  – Avoids slow-down in data transmission and blocking timeouts
• Keeps latency under control

[Figure: DFC system data flow. Labels shown: DFC, DFCd, latmon, RunCtl, Receiver, Collector, TS, shmem, VIC bus. Flow labels: flow table, flow table data, commands, traps, statistics, performance stat, network and trigger stat, DFC status.]

Page 15: Receiver protocol

• Receives event sub-packets through the GigaSwitch
• Puts packets into multiple circular buffers
• Implements DFC and LatMon farm interface
• Dynamic thresholds

[Figure: receiver flow chart. Select and copy sub-event packets to EVB (1) … EVB (n); get max occupancy; if “full”, send trap “full” to the DFC system; if “empty” after “full”, send trap “empty” to the DFC system; if last # arrived, send LatMon trap (#) to LatMon. Link labels: TCP/IP on FDDI, 0.5 MB/s.]
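The "full" and "empty after full" decisions in the flow chart amount to a threshold test with memory. The sketch below shows one way to write that logic; the threshold values, the class name and the `send_trap` callback are hypothetical, and occupancies are taken as fractions between 0 and 1. The thresholds are plain attributes so they can be changed at run time, in the spirit of the dynamic thresholds mentioned above.

```python
class OccupancyMonitor:
    """Sends a "full" trap once, then an "empty" trap when buffers drain."""

    def __init__(self, full_threshold, empty_threshold, send_trap):
        if empty_threshold >= full_threshold:
            raise ValueError("empty threshold must be below full threshold")
        self.full_threshold = full_threshold
        self.empty_threshold = empty_threshold
        self.send_trap = send_trap       # callback taking "full" or "empty"
        self.full_sent = False

    def update(self, occupancies):
        """Check the most occupied circular buffer and send traps if needed."""
        worst = max(occupancies)
        if not self.full_sent and worst >= self.full_threshold:
            self.send_trap("full")
            self.full_sent = True
        elif self.full_sent and worst <= self.empty_threshold:
            self.send_trap("empty")      # only sent after a previous "full"
            self.full_sent = False
        return worst


traps = []
monitor = OccupancyMonitor(0.8, 0.2, traps.append)
for snapshot in ([0.1, 0.3], [0.9, 0.1], [0.95, 0.5], [0.5, 0.4], [0.1, 0.15]):
    monitor.update(snapshot)
```

With these illustrative snapshots the monitor sends one "full" trap when a buffer reaches 0.9 and one "empty" trap once every buffer is back below 0.2.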

Page 16: DFC Protocol

• Initialization:
  – Builds Network Map
  – Builds DFC map (ordered list of RECV IP addresses)
  – Creates the first table with Infinity Trigger number validity
• Main Loop:
  – Wait for “trap”
  – On trap (full/empty):
    • Reads the last trigger number from Trigger Supervisor
    • Creates next table
    • Modifies the validity of the previous table
  – Sends auto-test traps

[Figure: DFC data in VME shared memory. DFC map: max number of tables, N. of RECV nodes, IP addresses. Flow tables: each with a validity trigger and a set of flags, e.g. 111111…1111 followed by 111101…1111.]
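The table handling in the main loop can be sketched as follows. This is an illustration under stated assumptions, not the KLOE code: the validity of the previous table is set to the last trigger number plus a fixed `margin` (a simplification of the validity computation), the maximum number of tables is not enforced, and the class names and IP addresses are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FlowTable:
    flags: List[int]               # one flag per receiver node, 1 = enabled
    validity: float = math.inf     # last trigger number using this table


class DFC:
    def __init__(self, receiver_addresses, margin):
        self.dfc_map = list(receiver_addresses)   # ordered list of RECV nodes
        self.margin = margin                      # assumed fixed safety margin
        # First table: all receivers enabled, infinite trigger validity.
        self.tables = [FlowTable([1] * len(self.dfc_map))]

    def on_trap(self, kind, address, last_trigger):
        """Handle a "full"/"empty" trap given the last trigger number."""
        node = self.dfc_map.index(address)
        flags = list(self.tables[-1].flags)
        flags[node] = 0 if kind == "full" else 1
        # Close the previous table, then open the next one.
        self.tables[-1].validity = last_trigger + self.margin
        self.tables.append(FlowTable(flags))

    def table_for(self, trigger):
        """Return the flow table that applies to a given trigger number."""
        for table in self.tables:
            if trigger <= table.validity:
                return table


dfc = DFC(["10.0.0.1", "10.0.0.2", "10.0.0.3"], margin=50)
dfc.on_trap("full", "10.0.0.2", last_trigger=1000)
```

After the trap, triggers up to 1050 still use the first table, while later triggers skip the receiver that reported "full".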

Page 17: DFC algorithm and performance

• Validity:
  – $v = t_0 + (t_{tr} + (t_{dfc} + k_{dfc}))\,(n + k) + \ldots$
    • $k = 5$
  – autotest
• DFCd reaction time (trap):
  – 1.2 ms
• DFC reaction time:
  – $t_{local} \sim 1.2$ ms
  – trigger interaction ~6-7 ms
  – $t_{dfc} \sim O(10^{-2})$ ms
  – total 10 ms
• DFC-L2 interaction rate:
  – ~1 table / 50 ms (sustained)
• DFC “dead time” implemented

Page 18: The DFC status monitor

Page 19: Packet latency

• Latency measurements:
  – SNMP traps sent to LatMon:
    • Collector trap when the packet # is released for sender
    • Receiver trap when all the sub-packets # arrived
• Test for receiver’s buffers
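The latency measurement pairs two traps that carry the same packet number. A minimal sketch of that bookkeeping is shown below; the class and method names are hypothetical, and timestamps are passed in explicitly (in seconds) instead of being read from a clock, so that the example is deterministic.

```python
class LatMon:
    """Pairs collector and receiver traps by packet number."""

    def __init__(self):
        self._released = {}      # packet number -> release timestamp
        self.latencies = {}      # packet number -> measured latency

    def collector_trap(self, packet_number, timestamp):
        """Packet # has been released for the sender."""
        self._released[packet_number] = timestamp

    def receiver_trap(self, packet_number, timestamp):
        """All sub-packets of packet # have arrived at the receiver."""
        start = self._released.pop(packet_number, None)
        if start is None:
            return None          # no matching collector trap was seen
        latency = timestamp - start
        self.latencies[packet_number] = latency
        return latency


latmon = LatMon()
latmon.collector_trap(42, timestamp=10.000)
latency = latmon.receiver_trap(42, timestamp=10.004)
```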

Page 20: Summary

• A fast and reliable message system has been implemented using standard UNIX mechanisms and the SNMP protocol
• Very simple to use
  – process template + command definition
  – Fortran and tcl/tk interface
• Allows full process control
• A Data Flow Control system has been developed using the message system and SNMP traps
• It allows network traffic to be redirected, taking into account the dynamics of the whole system
• Dynamic redefinition of thresholds
• It successfully ran during KLOE data acquisition