Process and Data Flow Control in KLOE E. Pasqualucci (INFN - Roma) [email protected].
Outline
• System overview
• Process structure and local communication
• SNMP and remote communication
• Process control
• Data Flow Control system
• DFC monitor
DAQ system architecture
[Figure: DAQ system architecture. Labels: front-end crates (ADC1 ... ADC16, ROCK, AUXM, VIC); CPU crates (ROCKM, VIC, CPU, FDDI); FDDI switch; CPU servers; storage system; Run Control; Monitor System; trigger chain; DFC system; Level-2 crates (VIC, CBUS, FDDI).]
• ~23000 FEE channels @ 2.5 kHz φ + background (~10 kHz)
• Bandwidth: ~50 Mbyte/s (5 Kbyte/event)
• Storage: 200 Tbyte/year
• Tested with peak rates of 10 kHz in multibunch mode
• Tested at maximum required throughput using non-zero-suppressed calorimeter data
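The quoted bandwidth is consistent with the event size multiplied by the total rate. The check below is a minimal sketch, assuming 1 Kbyte = 1000 bytes and that the ~10 kHz total rate is the one behind the ~50 Mbyte/s figure (the slide does not state this explicitly). The function name is illustrative only.

```python
# Back-of-envelope check of the DAQ bandwidth quoted on the slide.
EVENT_SIZE_BYTES = 5_000   # ~5 Kbyte per event (from the slide)
TOTAL_RATE_HZ = 10_000     # ~10 kHz total rate (from the slide)


def bandwidth_mbytes_per_s(event_size_bytes: int, rate_hz: int) -> float:
    """Average throughput in Mbyte/s for a given event size and event rate."""
    return event_size_bytes * rate_hz / 1e6


print(bandwidth_mbytes_per_s(EVENT_SIZE_BYTES, TOTAL_RATE_HZ))  # 50.0
```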
DAQ software organization
[Figure: DAQ software organization. Labels: Level 2 (VME, GeoVme map, Collector, Circ, Sender); FDDI switch; Farm (Receiver, Circ, Builder, Circ (Ybos), Recorder, to disk/tape, dmap, Spy dump); Farm status; Chain tools; simulation; Level 1 chain; SpyD, RSpyD, SpyBuff, Didone; Monitor system; SlowCtl system; DFC system; CmdSrv (two instances); RunCtl. Flows: data, map data, messages, traps.]
Process structure
• Initialization
– Msg Q creation
– Shmem subscription
– Shmem space allocation for variables
• Main loop
– Process event
– Process command
– Idle time
• Interrupt handler
– Extract command from Msg Q
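The three phases above can be sketched as a small skeleton. This is an illustration only, not the KLOE process template: a `queue.Queue` stands in for the UNIX message queue, a plain dictionary stands in for the shared-memory variables, and the interrupt is simulated by calling `on_interrupt()` directly. All class and method names are hypothetical.

```python
import queue


class DaqProcess:
    """Toy stand-in for a DAQ process (hypothetical names throughout)."""

    def __init__(self, name):
        # Initialization: create the message queue and allocate the variables.
        self.name = name
        self.msg_q = queue.Queue()      # stands in for a UNIX message queue
        self.variables = {"events": 0}  # stands in for shared-memory variables
        self.pending = None             # command extracted by the handler
        self.running = True

    def on_interrupt(self):
        # Interrupt handler: only extracts the command from the queue.
        try:
            self.pending = self.msg_q.get_nowait()
        except queue.Empty:
            self.pending = None

    def process_event(self):
        self.variables["events"] += 1

    def process_command(self):
        if self.pending == "stop":
            self.running = False
        self.pending = None

    def main_loop(self, max_iterations):
        # Main loop: process event, process command, idle time.
        for _ in range(max_iterations):
            if not self.running:
                break
            self.process_event()
            if self.pending is not None:
                self.process_command()
            # idle time would go here (for example a short sleep)


proc = DaqProcess("collector")
proc.main_loop(3)        # three events processed
proc.msg_q.put("stop")   # a sender puts a command into the queue ...
proc.on_interrupt()      # ... and "interrupts" the process
proc.main_loop(3)        # one more event, then the stop command is handled
```

Keeping the handler limited to extracting the command, and executing it from the main loop, mirrors the split shown on the slide.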
[Figure: shared-memory layout (columns: Id, Contents, Mapping). All processes map the header and the process records (Proc. 1, Proc. 2, …).]
• Header: process number, pointer to 1st process, pointer to 2nd process, pointer to 3rd process, …
• Process record: process name, process id, message queue id, process status, last command, last command status, number of variables, variable 1, variable 2, …
• Example record after subscription: My process, My process id, My Q id, process status, last command, last command status, number of variables, variable 1, variable 2, …
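The layout above maps naturally onto a table of per-process records. The sketch below models it with Python dataclasses rather than a real shared-memory segment; the field names follow the slide, while the class names, the `subscribe` method and the list used in place of the header pointers are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """One process record, with the fields listed on the slide."""
    name: str
    pid: int
    msg_q_id: int
    status: str = "idle"
    last_command: str = ""
    last_command_status: str = ""
    variables: dict = field(default_factory=dict)

    @property
    def n_variables(self):
        return len(self.variables)


@dataclass
class ProcessTable:
    """Header plus records; a list replaces the header's pointers."""
    records: list = field(default_factory=list)

    @property
    def process_number(self):
        return len(self.records)

    def subscribe(self, record):
        # A process registers its own record in the table.
        self.records.append(record)

    def locate(self, name):
        for record in self.records:
            if record.name == name:
                return record
        raise KeyError(f"no such process: {name}")

    def get_variable(self, process_name, variable_name):
        # Locate the process, then locate the variable in its record.
        return self.locate(process_name).variables[variable_name]


table = ProcessTable()
table.subscribe(ProcessRecord("collector", pid=101, msg_q_id=7,
                              variables={"events": 42}))
table.subscribe(ProcessRecord("sender", pid=102, msg_q_id=8))
```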
Local communication
• Getting a variable:
– Locate process
– Locate variable (e.g. My variable = value)
• Sending a command:
– The sender:
  • Locates the process
  • Gets its id and message Q
  • Puts command to Q
  • Sends an interrupt signal
  • Polls on command status
– The receiver:
  • Reads the Q
  • Writes the command and status and executes it
  • Writes the command status (acknowledgement)
• Example: after "Stop !" is sent, the record of the receiving process shows last command "Stop !" with status "Executing", then "Success"
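The sender and receiver steps can be sketched as a small handshake. This is a simulation under stated assumptions, not the KLOE library: a `queue.Queue` replaces the message queue, a dictionary replaces the shared-memory record, and the interrupt signal is replaced by a direct call, so the command has already completed when the sender starts polling. All names are hypothetical.

```python
import queue


class Receiver:
    """Receiving process: reads the Q, writes status, executes, acknowledges."""

    def __init__(self, name):
        self.name = name
        self.msg_q = queue.Queue()
        self.record = {"status": "running", "last_command": "",
                       "last_command_status": ""}
        self.history = []  # every command status written, kept for illustration

    def _write_status(self, status):
        self.record["last_command_status"] = status
        self.history.append(status)

    def execute(self, command):
        if command == "stop":
            self.record["status"] = "stopped"
            return True
        return False  # unknown command

    def on_interrupt(self):
        command = self.msg_q.get_nowait()      # reads the Q
        self.record["last_command"] = command  # writes the command ...
        self._write_status("executing")        # ... and its status
        ok = self.execute(command)
        self._write_status("success" if ok else "fault")  # acknowledgement


def send_command(table, process_name, command, max_polls=10):
    receiver = table[process_name]  # locate the process (its id and Q)
    receiver.msg_q.put(command)     # put the command to the Q
    receiver.on_interrupt()         # stands in for the interrupt signal
    for _ in range(max_polls):      # poll on the command status
        status = receiver.record["last_command_status"]
        if status in ("success", "fault"):
            return status
    return "timeout"


table = {"collector": Receiver("collector")}
result = send_command(table, "collector", "stop")
```

With a real signal the receiver would run asynchronously, and the polling loop would need a delay between reads of the command status.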
Managing the DAQ network
• SNMP (Simple Network Management Protocol)
• Widely used to manage network devices
• Defined as a standard by the IETF (Internet Engineering Task Force)
• Implemented over UDP, with reliability added through acknowledgements
• Used to retrieve and/or set information about:
– network configuration
– traffic
– faults
– accounting
• Managed objects are defined in a Management Information Base (MIB) defined by the IETF
• Private extensions of the standard MIB are allowed
• Public-domain software allows the implementation of:
– dedicated agents
– utilities for remote access
SNMP client-server policy
• MIB
– Variables organized as a tree
• Primitives:
– get, get-next, set
• Each device runs a daemon able to:
– Understand MIB requests
– Obtain required information
– Execute required actions
• Trap mechanism
• KLOE uses SNMP to:
– Control DAQ devices and network
– Implement message distribution
– Implement process control
– Implement Data Flow Control (DFC)
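The three primitives are easy to picture on a toy MIB. The sketch below keeps variables in lexicographic OID order, which is what makes get-next (and therefore a tree walk) possible. It is an in-memory model only, not an SNMP agent or any real SNMP library; the class name and the stored values are invented for illustration, while the two OIDs are the standard sysDescr.0 and sysName.0.

```python
import bisect


class MibTree:
    """In-memory model of a MIB: OID tuples kept in lexicographic order."""

    def __init__(self):
        self._values = {}  # OID tuple -> value
        self._oids = []    # sorted list of OID tuples

    def set(self, oid, value):
        oid = tuple(oid)
        if oid not in self._values:
            bisect.insort(self._oids, oid)
        self._values[oid] = value

    def get(self, oid):
        return self._values[tuple(oid)]

    def get_next(self, oid):
        """Return (oid, value) of the first variable after `oid`, or None."""
        i = bisect.bisect_right(self._oids, tuple(oid))
        if i == len(self._oids):
            return None
        nxt = self._oids[i]
        return nxt, self._values[nxt]

    def walk(self, prefix):
        """Yield every (oid, value) below `prefix` using repeated get-next."""
        prefix = tuple(prefix)
        oid = prefix
        while True:
            nxt = self.get_next(oid)
            if nxt is None or nxt[0][:len(prefix)] != prefix:
                return
            yield nxt
            oid = nxt[0]


SYS_DESCR = (1, 3, 6, 1, 2, 1, 1, 1, 0)  # system.sysDescr.0
SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)   # system.sysName.0

mib = MibTree()
mib.set(SYS_NAME, "example-node")        # hypothetical value
mib.set(SYS_DESCR, "example DAQ node")   # hypothetical value
system_vars = list(mib.walk((1, 3, 6, 1, 2, 1, 1)))
```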
The command server and the KLOE MIB sub-tree
iso.org.dod.internet.mgmt.mib-2
  system(1)
    sysDescr(1)
    sysObjectID(2)
    sysUpTime(3)
    sysContact(4)
    sysName(5)
    sysLocation(6)
    sysServices(7)
  KLOE(13)
    kprocesses(1)
      kprocNumber(1)
      kprocTable(2)
        kprocEntry(1)
          kprocIndex(1)
          kprocName(2)
          kprocId(3)
          kprocMsgQId(4)
          kprocStatus(5)
          kprocLastCommand(6)
          kprocLastCommandStatus(7)
          kprocVarNumber(8)
      kprocVarTable(3)
        kprocVarEntry(1)
          kprocVarProcIndex(n,1)
          kProcVarIndex(n,2)
          kprocVarName(n,3)
          kprocVarSize(n,4)
          kprocVarType(n,5)
          kprocVarValue(n,6)
    ……..
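As an illustration of how such a sub-tree is addressed, the sketch below composes numeric OIDs for the kprocTable columns. The sub-identifiers are the ones shown on the slide, and mib-2 is 1.3.6.1.2.1. The ordering entry.column.row follows the usual SNMP table convention and is an assumption here, since the slide does not spell out the instance encoding. The helper names are hypothetical.

```python
MIB_2 = (1, 3, 6, 1, 2, 1)         # iso.org.dod.internet.mgmt.mib-2
KLOE = MIB_2 + (13,)               # KLOE(13), as shown on the slide
KPROCESSES = KLOE + (1,)           # kprocesses(1)
KPROC_ENTRY = KPROCESSES + (2, 1)  # kprocTable(2).kprocEntry(1)

# Column numbers of kprocEntry, as listed on the slide.
KPROC_COLUMNS = {
    "kprocIndex": 1,
    "kprocName": 2,
    "kprocId": 3,
    "kprocMsgQId": 4,
    "kprocStatus": 5,
    "kprocLastCommand": 6,
    "kprocLastCommandStatus": 7,
    "kprocVarNumber": 8,
}


def kproc_oid(column, row):
    """OID of one kprocTable cell (assumed layout: entry.column.row)."""
    return KPROC_ENTRY + (KPROC_COLUMNS[column], row)


def dotted(oid):
    """Render an OID tuple in dotted notation."""
    return ".".join(str(n) for n in oid)


print(dotted(kproc_oid("kprocStatus", 2)))  # 1.3.6.1.2.1.13.1.2.1.5.2
```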
Message system implementation
[Figure: message system implementation. Node A: Run Control. Node B: Command Server, Msg Q, Shared Memory, DAQ Process.]
• Run Control: locate process, send command, get process variables; first ack request, second ack request
• Command Server: SNMP ack; put command (Msg Q); INT; first ack, second ack
• DAQ Process: get command; write last command and status (executing); execute command; write command status (success, fault)
Remarks and performance
• Command server
– DAQ process
  • Receives commands and shares variables
– Command distributor
• Run and process control tools
– tcl/tk commands implemented
  • get variable, send message
– Fortran interface for old-fashioned software
– Portable
  • AIX, OSF1, HP-UX, Solaris, Linux, LynxOS supported
• Optimized library
– Parallel message distribution implemented
• Performance
– Local command ~1.2 ms
– Remote variable reading ~1.2 ms
– Remote command completion ~4 ms
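Parallel message distribution matters because remote commands cost a few milliseconds each. The sketch below shows the general idea with a thread pool: the same command is sent to every node concurrently and the acknowledgements are collected. It is not the KLOE library; `send_remote_command` is a hypothetical stand-in that only sleeps for the ~4 ms quoted above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_remote_command(node, command, latency_s=0.004):
    """Hypothetical stand-in for one remote command (~4 ms on the slide)."""
    time.sleep(latency_s)
    return node, "success"


def distribute(nodes, command):
    """Send `command` to all nodes in parallel; return {node: ack}."""
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = pool.map(lambda n: send_remote_command(n, command), nodes)
        return dict(results)


acks = distribute(["node1", "node2", "node3"], "stop")
```

Sent one after another, N commands would take roughly N times the single-command latency; sent in parallel, the total approaches the latency of the slowest node.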
Production process control
[Figure: production process control. Nodes: control node, production node. Components: OffCtl, cmdsrv, locpc, pcd, Proc_1, Proc_2, Shmem (variables). Arrows: command, command + start, trap, signal, check.]
The DFC System
• Changes the packet distribution sequence
– Avoids slow-down in data transmission and blocking timeouts
• Keeps latency under control
[Figure: DFC system. Components: DFC, DFCd, latmon, Collector, Receiver, RunCtl, TS, shmem, VIC bus. Data exchanged: flow table, flow table data, commands, traps, statistics, performance stat, network and trigger stat, DFC status.]
Receiver protocol
• Receives event sub-packets through the GigaSwitch
• Puts packets into multiple circular buffers
• Implements the DFC and LatMon farm interface
• Dynamic thresholds
[Figure: receiver flow. Labels: flow tables; TCP/IP on FDDI; 0.5 MB/s (three links); EVB (1) … EVB (n).]
• Select and copy sub-event packets
• Get max occupancy
– If "full": send trap "full" to the DFC system
– If "empty" after "full": send trap "empty" to the DFC system
• If last # arrived: send LatMon trap (#) to LatMon
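The "full" and "empty after full" conditions amount to a two-state check on the buffer occupancy. The sketch below is one possible reading, not the KLOE receiver code: it assumes two thresholds (a high one for "full", a lower one for "empty") that can be changed at run time, and occupancies given as fractions between 0 and 1. All names are hypothetical.

```python
class OccupancyMonitor:
    """Decides which trap, if any, a receiver should send."""

    def __init__(self, high, low):
        self.full = False
        self.set_thresholds(high, low)

    def set_thresholds(self, high, low):
        # Dynamic thresholds: they may be redefined while running.
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.high = high
        self.low = low

    def update(self, occupancies):
        """Return "full", "empty" or None for the current buffer occupancies."""
        max_occ = max(occupancies)  # get max occupancy over all buffers
        if not self.full and max_occ >= self.high:
            self.full = True
            return "full"           # trap to the DFC system
        if self.full and max_occ <= self.low:
            self.full = False
            return "empty"          # sent only after a previous "full"
        return None


monitor = OccupancyMonitor(high=0.8, low=0.3)
traps = [monitor.update(occ) for occ in
         ([0.1, 0.5], [0.9, 0.2], [0.95], [0.5], [0.2], [0.1])]
```

Using a lower threshold for "empty" than for "full" prevents a receiver from sending a trap on every small fluctuation around a single threshold.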
DFC Protocol
• Initialization:
– Builds network map
– Builds DFC map (ordered list of RECV IP addresses)
– Creates the first table with infinite trigger-number validity
• Main loop:
– Waits for "trap"
– On trap (full/empty):
  • Reads the last trigger number from the Trigger Supervisor
  • Creates the next table
  • Modifies the validity of the previous table
– Sends auto-test traps
[Figure: DFC data in VME shared memory.]
• DFC map: max number of tables, number of RECV nodes, IP addresses
• Flow tables: each holds a validity trigger and one flag per RECV node (e.g. 111111…1111, 111101…1111)
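The table mechanism can be sketched as a list of flow tables, each valid for a range of trigger numbers and carrying one flag per receiver. The code below is an illustration under assumptions, not the KLOE DFC: the `margin` parameter stands in for the validity offset, and the choice of destination (trigger number modulo the number of enabled receivers) is invented, since the slides do not give the distribution rule. All names are hypothetical.

```python
import math


class FlowTables:
    """Ordered flow tables: [first_trigger, last_trigger, flags]."""

    def __init__(self, receivers):
        self.receivers = list(receivers)  # DFC map: ordered RECV addresses
        # First table: every receiver enabled, infinite validity.
        self.tables = [[0, math.inf, [True] * len(self.receivers)]]

    def on_trap(self, receiver, kind, last_trigger, margin=0):
        """Handle a "full" or "empty" trap sent by one receiver."""
        previous = self.tables[-1]
        switch_at = last_trigger + margin  # first trigger of the new table
        flags = list(previous[2])
        flags[self.receivers.index(receiver)] = (kind == "empty")
        previous[1] = switch_at - 1        # modify validity of previous table
        self.tables.append([switch_at, math.inf, flags])

    def destination(self, trigger):
        """Receiver for a trigger number (assumed rule: modulo enabled nodes)."""
        for first, last, flags in self.tables:
            if first <= trigger <= last:
                enabled = [r for r, ok in zip(self.receivers, flags) if ok]
                if not enabled:
                    raise LookupError("no receiver enabled")
                return enabled[trigger % len(enabled)]
        raise LookupError(f"no table valid for trigger {trigger}")


ft = FlowTables(["recv0", "recv1", "recv2"])
ft.on_trap("recv1", "full", last_trigger=100, margin=5)
```

Events already in flight keep using the previous table, because its validity extends up to the trigger just before the switch point.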
DFC algorithm and performance
• Validity:
– v = t_0 + (t_tr + (t_dfc + k_dfc)) * (n + k) + …
  • k = 5
– autotest
• DFCd reaction time (trap):
– 1.2 ms
• DFC reaction time:
– t_local ~ 1.2 ms
– trigger interaction ~ 6-7 ms
– t_dfc ~ O(10^-2) ms
– total 10 ms
• DFC-L2 interaction rate:
– ~ 1 table / 50 ms (sustained)
• DFC "dead time" implemented
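A short arithmetic check puts the reaction-time figures in context. The sketch below sums the itemized contributions and converts the quoted total into a number of triggers at the 10 kHz peak rate mentioned earlier in the talk. It assumes that O(10-2) ms means about 0.01 ms. The itemized values sum to less than the quoted 10 ms, and the slide does not itemize the difference.

```python
T_LOCAL_MS = 1.2           # local command time (from the slide)
T_TRIGGER_MS = (6.0, 7.0)  # trigger interaction, lower and upper value
T_DFC_MS = 0.01            # assumed reading of "O(10-2) ms"
TOTAL_MS = 10.0            # total quoted on the slide
PEAK_RATE_HZ = 10_000      # peak trigger rate quoted earlier in the talk


def triggers_during(reaction_ms, rate_hz):
    """Number of triggers issued while the DFC is still reacting."""
    return round(reaction_ms * 1e-3 * rate_hz)


low = T_LOCAL_MS + T_TRIGGER_MS[0] + T_DFC_MS
high = T_LOCAL_MS + T_TRIGGER_MS[1] + T_DFC_MS
print(round(low, 2), round(high, 2))            # 7.21 8.21
print(triggers_during(TOTAL_MS, PEAK_RATE_HZ))  # 100
```

At the peak rate, about a hundred triggers are issued during one reaction time, so a new table can only take effect some way ahead of the last trigger number read.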
The DFC status monitor
Packet latency
• Latency measurements:
– SNMP traps sent to LatMon:
  • Collector trap when the packet # is released for sender
  • Receiver trap when all the sub-packets # arrived
• Test for receiver's buffers
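The two traps are enough to measure latency per packet number: the collector trap marks the start and the receiver trap marks the end. The sketch below shows that bookkeeping with explicit timestamps. It is not the LatMon code; the class and method names are hypothetical and times are taken to be in seconds.

```python
class LatencyMonitor:
    """Pairs collector and receiver traps by packet number."""

    def __init__(self):
        self._released = {}  # packet number -> time of the collector trap
        self.latencies = {}  # packet number -> measured latency (s)

    def on_collector_trap(self, packet_no, t):
        # The packet has been released for the sender.
        self._released[packet_no] = t

    def on_receiver_trap(self, packet_no, t):
        # All sub-packets with this number have arrived at the receiver.
        start = self._released.pop(packet_no, None)
        if start is None:
            return None      # no matching collector trap was seen
        self.latencies[packet_no] = t - start
        return self.latencies[packet_no]


latmon = LatencyMonitor()
latmon.on_collector_trap(17, t=1.0)
latency = latmon.on_receiver_trap(17, t=1.25)
```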
Summary
• A fast and reliable message system has been implemented using standard UNIX mechanisms and the SNMP protocol
• Very simple to use
– process template + command definition
– Fortran and tcl/tk interface
• Allows full process control
• A Data Flow Control system has been developed using the message system and SNMP traps
• It allows network traffic to be redirected, taking into account the dynamics of the whole system
• Dynamic redefinition of thresholds
• It ran successfully during KLOE data acquisition