ORNL is managed by UT-Battelle, LLC for the US Department of Energy
Silicon MAPS Readout in ALICE and sPHENIX
J. Schambach
Oak Ridge National Laboratory
(for the ALICE ITS Upgrade & sPHENIX MVTX collaborations)
➢ ALICE ITS
➢ ALICE Readout Paradigm
➢ ALICE ITS Detector
➢ ALPIDE Sensor
➢ Readout Electronics
➢ ALICE CRU & DAQ
➢ sPHENIX MVTX
➢ MVTX Readout Scheme
➢ sPHENIX Backend: FELIX
➢ sPHENIX DAQ Architecture
The New (Run-3) Computing Model: ALICE O2 in a Nutshell
• Data of all interactions are shipped from the detector to the online farm in triggerless, continuous mode: 3.4 TByte/s in the heavy-ion run.
• Baseline correction, zero suppression, cluster finding; no event is discarded; average compression factor 6.6 → 500 GByte/s.
• Data reduction by massively parallel, on-the-fly, single-pass online tracking; only reconstructed data go to data storage; average compression factor 5.5 → 90 GByte/s.
• Asynchronous event reconstruction with final calibration, with a delay of a few hours.
• Data storage: 1 year of compressed data; bandwidth: write 170 GB/s, read 270 GB/s; capacity: 60 PB.
• 20 GByte/s to Tier 0; further export to Tiers 1 and Analysis Facilities (e.g. ORNL).
Paradigm shift compared to Run 1 and Run 2. Requirements:
• LHC min-bias Pb-Pb at 50 kHz: ~100× more data than Run 1 + Run 2
• Measure rare processes, where standard triggering techniques are very inefficient if not impossible
• 50 kHz exceeds the TPC's inherent rate (drift time ~100 µs) ➔ requires continuous readout
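As a sanity check on these figures, a back-of-the-envelope sketch in Python (all numbers taken from this slide; the stage names are just labels) tracing the compression chain from detector to storage:

```python
# Trace the Run-3 compression chain using the rates quoted on this slide.
detector_out_gb_s = 3.4e3                 # 3.4 TByte/s out of the detector, HI run

after_sync = detector_out_gb_s / 6.6      # zero suppression + clustering, factor 6.6
print(f"after synchronous stage: {after_sync:.0f} GByte/s")   # ~515, quoted 500

after_tracking = after_sync / 5.5         # single-pass online tracking, factor 5.5
print(f"to data storage: {after_tracking:.0f} GByte/s")       # ~94, quoted 90

# Storage: 60 PB capacity, written at up to 170 GB/s.
hours_to_fill = 60e15 / 170e9 / 3600
print(f"time to fill 60 PB at 170 GB/s: {hours_to_fill:.0f} h")  # ~98 h of writing
```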
New ITS Layout
▶ 7-layer barrel, based on CMOS MAPS sensors ("ALPIDE")
▶ Coverage: r = 22 – 400 mm, |η| ≤ 1.22
▶ ~24k pixel chips (a 12.5-Gpixel camera)
▶ ~10 m² active area
▶ 192 staves in total: Inner Barrel with 3 inner layers (0.3% X₀), Outer Barrel with 2 middle + 2 outer layers (1% X₀)
[Figure: barrel layout — Inner Barrel (27 cm) and Outer Barrel (150 cm); staves per layer: 48, 42, 30, 24, 20, 16, 12]
ALPIDE Sensor Technology
[Figure: pixel cross-section (not to scale) — NWELL collection diode and NMOS/PMOS transistors in PWELL/NWELL, with a deep PWELL shielding the circuitry wells; P⁻ epitaxial layer (N_A ~ 10¹³ cm⁻³) on a P⁺⁺ substrate (N_A ~ 10¹⁸ cm⁻³), wells at N_A ~ 10¹⁶ cm⁻³; electrons generated in the epitaxial layer are collected on the diode]
Pixel sensor in a 180 nm CMOS imaging process (TowerJazz): 3 nm thin gate oxide, 6 metal layers
Key Features:
• 29 µm × 27 µm pixel pitch, chip size 30 mm × 15 mm, 50 µm (IB) / 100 µm (OB) thick
• Continuously active, ultra-low-power front-end (40 nW/pixel)
• Ultra-low-power matrix readout (< 200 mW for the whole chip)
• Global shutter: triggered acquisition or continuous readout
• High-speed serial output link: 1.2 Gbit/s (IB), 400 Mbit/s (OB)
[Figure: chip floorplan — 0.5-Mpixel matrix (512 rows × 1024 pixel columns) with in-pixel amplifier, comparator, and threshold (AMP, COMP, TH), zero suppression in the readout regions, and a buffering-and-interface periphery]
ALPIDE Principle of Operation
[Figure: pixel chain — collection electrode (C_det ~ 2.5 fF at V_bb = −6 V, C_in ~ 1.6 fF) feeding PIX_IN; analog front-end (Amp + Comp with threshold) producing OUT_A and OUT_D; multi-event buffer memory (STATE latch) gated by STROBE; test-charge injection via VPULSE_* through C_inj = 160 aF]
[Figure: waveforms — PIX_IN steps by ΔV = Q/C in ~10 ns on a particle hit, peaks in ~2 µs, and resets in ~1 ms; a STROBE between t1 and t2 latches OUT_D into STATE]
Front-end acts as a delay line:
• Sensor and front-end are continuously active
• Upon a particle hit, the front-end forms a pulse with ~1–2 µs peaking time ("analog delay")
• A threshold is applied to form a binary pulse
• The hit is latched into memory if a strobe is applied during the binary pulse
Ultra-low-power front-end circuit: 40 nW/pixel
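To make the latching rule concrete, a minimal behavioral sketch in Python (hypothetical function names, not ALPIDE design code): a hit is stored only if the strobe window overlaps the binary front-end pulse.

```python
# Behavioral model of in-pixel hit latching (illustrative only).
# A hit makes the binary output OUT_D go high for roughly the peaking time (~2 us);
# the hit lands in the multi-event buffer only if the STROBE overlaps that window.

PULSE_NS = 2000   # ~2 us binary pulse ("analog delay")

def hit_latched(hit_ns, strobe_start_ns, strobe_len_ns):
    """True iff the strobe window overlaps the binary pulse [hit, hit + PULSE_NS)."""
    strobe_end_ns = strobe_start_ns + strobe_len_ns
    return strobe_start_ns < hit_ns + PULSE_NS and hit_ns < strobe_end_ns

# Triggered mode: a short strobe shortly after the hit captures it.
print(hit_latched(hit_ns=0, strobe_start_ns=100, strobe_len_ns=100))    # True
# A strobe arriving after the pulse has decayed misses the hit.
print(hit_latched(hit_ns=0, strobe_start_ns=2500, strobe_len_ns=100))   # False
```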
Triggered and Continuous Mode Operation
The ITS plans to operate in two modes:
• Triggered mode: pixels are latched with a short strobe window, based on an external trigger, and then read out.
• Continuous mode: pixels are latched using long, periodic strobe windows with short inter-strobe gaps (≈100 ns); each strobe initiates a readout.
[Figure: strobe timelines — isolated strobes 1, 2, 3 in triggered mode; back-to-back strobes n−1, n, n+1, n+2, … in continuous mode]
Matrix Readout
[Figure: double-column architecture — two columns of 512 pixel front-ends each, with STATE and RESET lines, feeding a priority encoder that outputs a 10-bit ADDR with VALID/SELECT handshaking; periphery with data, bias, clock, and control + trigger, plus a pulser]
Pixel matrix – hit-driven architecture:
• The priority encoder sequentially provides the addresses of all hit pixels present in a double column
• No activity if there is no hit (no free-running clock) ➔ low power
Low-power matrix readout: ~2 mW, 0.3 pJ/bit
C_in ≈ 5 fF at the collection electrode (2 × 2 pixel volume); Q_in(MIP) ≈ 1300 e⁻ → V ≈ 40 mV
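To illustrate the hit-driven scheme, a small Python model (a sketch of the behavior, not the encoder's combinational logic) that drains a double column the way the priority encoder does, one hit address at a time:

```python
# Hit-driven double-column readout: the priority encoder repeatedly outputs the
# address of the first latched STATE bit (VALID), which is then reset (SELECT),
# until no hits remain. With no hits there is simply no activity -> low power.

def read_double_column(state):
    """state: 1024 latched hit bits (2 x 512 pixels). Returns hit addresses in order."""
    addresses = []
    addr = next((i for i, hit in enumerate(state) if hit), None)
    while addr is not None:
        addresses.append(addr)      # 10-bit ADDR driven out with VALID asserted
        state[addr] = 0             # SELECT clears the pixel that was just read
        addr = next((i for i, hit in enumerate(state) if hit), None)
    return addresses

state = [0] * 1024
for pixel in (3, 17, 900):          # three latched hits
    state[pixel] = 1
print(read_double_column(state))    # [3, 17, 900]
```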
Stave Module Configurations
• Inner Layer Stave: 9 data lines (1.2 Gb/s each), 1 clock, 1 control → 9 data pairs to the READOUT UNIT (RU); data output: 8.6 Gbps payload
• Middle Layer Stave: (4+4+4+4) data, (1+1+1+1) clock, (1+1+1+1) control → 16 data pairs at 400 Mb/s each; data output: 5.1 Gbps payload
• Outer Layer Stave: (7+7+7+7) data, (1+1+1+1) clock, (1+1+1+1) control → 28 data pairs at 400 Mb/s each; data output: 8.9 Gbps payload
Each stave is served by a POWER BOARD (PB) providing regulated power, alongside its RU.
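The quoted payload rates are consistent with 8b/10b encoding on the sensor links (8 payload bits per 10 line bits), as a quick check shows:

```python
# Payload = number of links x line rate x 8b/10b efficiency (8 payload bits per 10).
def payload_gbps(n_links, line_rate_gbps, efficiency=8 / 10):
    return n_links * line_rate_gbps * efficiency

print(payload_gbps(9, 1.2))    # IB: 8.64  -> quoted 8.6 Gbps
print(payload_gbps(16, 0.4))   # ML: 5.12  -> quoted 5.1 Gbps
print(payload_gbps(28, 0.4))   # OL: 8.96  -> quoted 8.9 Gbps
```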
ITS Readout Hierarchy
One READOUT UNIT (RU) for each STAVE.
• Detector links into the RU: IL 9 × 960 Mb/s; ML 16 × 320 Mb/s; OL 28 × 320 Mb/s
• GBT links, RU to CRU: 3 per RU at 3200 Mb/s each; 8 RUs per CRU → 24 links per CRU
• CRU to FLP: 11 FLPs, ~110 Gb/s into RAM over PCIe
• FLP to EPN: 100 Gb/s NIC, < 40 GB/s
• LTU/CTP provides the ALICE timing and trigger: LHC bunch clock (40.08 MHz) plus timing and trigger messages
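Putting the fan-in together (a sketch; the per-layer stave counts are taken from the layout slide):

```python
# Fan-in of the ITS readout hierarchy.
staves_per_layer = [12, 16, 20, 24, 30, 42, 48]   # L0 ... L6, 192 staves in total
rus = sum(staves_per_layer)                        # one Readout Unit per stave
print(rus)                                         # 192

gbt_links = rus * 3                                # 3 GBT links per RU at 3.2 Gb/s
crus = rus // 8                                    # 8 RUs per CRU -> 24 links per CRU
print(gbt_links, crus)                             # 576 GBT links into 24 CRUs
```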
Readout Unit – Top Level Diagram
[Figure: top-level diagram — main SRAM-based FPGA with flash configuration memory; auxiliary flash-based FPGA; GBT-SCA; 3 × GBTx; CANbus; USB 3.0; 4 × I²C sensor buses. The flash-based FPGA is rad-hard by design; the main FPGA's configuration RAM has a low SEU cross section per bit (not zero!), and SEFIs remain possible.]
Hardware includes:
• Xilinx Kintex UltraScale KU060 FPGA (main programmable logic device)
• Microsemi ProASIC3 FPGA A3PE600L (for UltraScale scrubbing & configuration)
• 4 rad-hard GBT fiber-optic transmitters (3.2 Gb/s each)
• 2 rad-hard GBT fiber-optic receivers (3.2 Gb/s each)
• Rad-hard Slow Controls Adapter (GBT-SCA) for monitoring and control
• 12-channel twinax copper interconnects to the stave
• Mezzanine to allow different stave configurations ("transition board")
• Power board control interface via the mezzanine
• CANbus interface as a power board interface in case GBT is not available
ITS Readout Unit
[Photo: RU board — Kintex UltraScale FPGA, ProASIC3, 3 × GBTx, flash memory, transition board, and the interface to the sensors]
Hardware & Firmware Radiation Protection Measures
• Radiation-hard components from the LHC "Radiation Hard Optical Link Project" for Gigabit-speed communication with the control room (3.2 Gbps payload):
– "GBTx" transceiver ASIC, "VTRx" opto-electronics, "Slow Controls Adapter" (SCA) ASIC
• Triplication of FPGA I/Os at the hardware level for important signals
• SRAM-based FPGAs are sensitive to Single-Event Upsets (SEUs):
– "Scrubbing" (periodic refresh of the FPGA CRAM) via a flash-based FPGA
– Scrubbing combined with Triple Modular Redundancy (TMR): majority voters on firmware modules and FSM states
– ECC protection of memories/FIFOs (1-bit correction, multi-bit detection), as sketched below
• Expected improvement of the mean time between failures: > 3 orders of magnitude vs. unprotected designs
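For illustration, a minimal SECDED sketch in the spirit of the memory/FIFO protection above, using a Hamming(8,4) code (4 data bits, 3 parity bits, 1 overall parity bit) for brevity; the code layout actually used in the firmware is not specified here.

```python
# SECDED sketch: Hamming(8,4) plus overall parity. Corrects any single-bit error
# and detects (without correcting) double-bit errors, as used for memories/FIFOs.

def secded_encode(nibble):
    c = [0] * 8                                    # c[1..7]: Hamming(7,4); c[0]: parity
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]                      # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]                      # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]                      # parity over positions with bit 2 set
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]   # overall parity
    return c

def secded_decode(c):
    syndrome = 0
    for pos in range(1, 8):                        # syndrome points at the flipped bit
        if c[pos]:
            syndrome ^= pos
    overall = 0
    for bit in c:
        overall ^= bit
    if syndrome and not overall:                   # two flips: parity balances out
        return None, "double-bit error detected"
    c = list(c)
    if overall:                                    # exactly one flip: repair it
        c[syndrome] ^= 1                           # syndrome 0 -> overall parity bit
    data = c[3] | c[5] << 1 | c[6] << 2 | c[7] << 3
    return data, "corrected" if overall else "ok"

word = secded_encode(0b1011)
word[6] ^= 1                                       # inject a single-event upset
print(secded_decode(word))                         # (11, 'corrected')
```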
ALICE Common Readout Unit (CRU) v2
• Altera Arria 10 FPGA
• Custom-designed cooling
• Avago MiniPODs: 4 × 12 TX, 4 × 12 RX
• 48 optical outputs (MTP), 48 optical inputs (MTP)
• USB Blaster II
• x16 Gen3 PCIe (CFM rev 3.0); 128 Gb/s
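Where the 128 Gb/s figure comes from (a quick check; PCIe Gen3 runs at 8 GT/s per lane with 128b/130b encoding):

```python
lanes = 16
raw_gb_s = lanes * 8.0                        # 8 GT/s per lane -> 128 Gb/s raw
payload_gb_s = raw_gb_s * 128 / 130           # 128b/130b line encoding
print(raw_gb_s, round(payload_gb_s, 1))       # 128.0, 126.0 (before protocol overhead)
```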
Server/GPU Farms for Data Acquisition and Online Processing
Data acquisition servers & electronics:
• 11 "First Level Processors" (FLP), each with 2 CRUs & 8 TB of local storage
• Storage farm for commissioning: 200 TB local disk server + 500 TB cloud storage
• Event Processing Nodes (GPU farms)
sPHENIX – MVTX
• MVTX parameters: L = 271 mm; 3-layer sensor barrel equivalent to the ITS IB: 48 staves, 432 chips
• MVTX Readout Units, PUs & other services located outside the magnet on the platform: much lower radiation than in the ITS
• sPHENIX cost-effectively reuses the ITS electronics as built, replacing the ALICE CRU with the ATLAS FELIX backend
• ALICE ITS mechanics modified to fit into sPHENIX
• Proposal document: sPH-HF-2018-001, https://indico.bnl.gov/event/4072/
MVTX Readout System
[Figure: readout chain spanning Interaction Region → Experimental Hall → Counting House. A 9-sensor stave (ALPIDE sensors on an FPC, mounted on the cold plate) connects over 5–10 m of twinax to the Readout Unit front end: data (9 × 1.2 Gbps), clock, control/trigger, plus regulated power from the power board. The RU connects over GBT optical links to the FELIX back end: data (3 × 4.8 Gbps), control, and trigger. FELIX sits on PCIe in the DAQ and slow-control host and receives clock/trigger from the sPHENIX GTM; the power supplies sit with the services.]
sPHENIX "Back End": ATLAS FELIX
[Photo: FELIX card with the sPHENIX timing mezzanine]
Architecture highlights:
• Xilinx Kintex UltraScale KU115 FPGA
• 48 bidirectional GBT links
• 16-lane Gen3 PCIe
• Mezzanine site for the sPHENIX timing system card
Performance:
• PCIe TX > 100 Gb/s
sPHENIX DAQ Architecture
[Figure: rack-room dataflow — calorimeters, INTT, and MBD: FEM → DCM2 → SEB; TPC and MVTX: FEE → FELIX → EBDC; the SEBs and EBDCs feed the event builder through a network switch to a farm of ATPs, which write to buffer boxes (interim storage) and onward to the RACF/HPSS]
Acronyms: FEE Front End Electronics; FELIX Front End LInk eXchange; EBDC Event Buffer and Data Compressor; ATP Assembly and Trigger Processor; Buffer Box interim storage; FEM Front End Module; DCM2 Data Collection Module; SEB Sub-Event Buffer
MVTX hardware:
• 48 staves
• 48 FEE (Readout Units v2.0)
• 6 FELIX v2.0
• 6 EBDC servers
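A quick consistency check of the MVTX fan-in, combining these counts with the link rates on the readout-system slide:

```python
staves, felix_cards = 48, 6                         # one RU per stave
rus_per_felix = staves // felix_cards               # 8 RUs, i.e. 24 of the 48 GBT links
max_payload_gbps = staves * 9 * 1.2 * 8 / 10        # 9 links/stave, 1.2 Gb/s, 8b/10b
print(rus_per_felix, round(max_payload_gbps))       # 8, ~415 Gb/s aggregate into 6 EBDCs
```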
Outlook towards the EIC
• Silicon MAPS is recognized as a leading technology for EIC detector applications
• High granularity, low power, fast, radiation tolerant readout is essential
• Highly parallel, untriggered, continuous readout avoids deadtime and biases due to standard triggering techniques
• We have actively developed two such applications (readout for ALICE ITS upgrade and sPHENIX MVTX).
• ORNL will actively participate in the upcoming EIC EOI process with strong continued interest in this area.
BACKUP
ALICE Upgrades (currently being installed in ALICE)
ALICE central barrel tracking from very low (100 MeV/c) to very high p_T (100 GeV/c):
• Silicon: r = 22 – 400 mm
• Gas (TPC, TRD): r = 88 – 368 cm
➔ large BL² (bending power), comparable to CMS
O2 System Overview
[Figure (2018-03-07): trigger and readout topology — the CTP and LTUs distribute ↑TTS-FTL & ↓busy to MCH, MID, ITS, MFT, TPC, ZDC, and TRD (ITS and MFT are unidirectional, no busy; TRD uses ↑TTC & ↓busy), and ↑TTS & ↓busy over TTC to ACO, EMC, HMP, and PHS; ←trigger also reaches the ITS/MFT/TRD front-ends. GBT front-end links (9215 data links, 4039 trigger links) carry →data, ←trigger, and ←configuration from FIT, ITS, MCH, MFT, MID, TOF, TPC, TRD, and ZDC into the CRUs (buffer, processing, TTS interface; continuous & triggered readout), which sit on the PCIe bus in the FLPs (→data & ←configuration) connected to O2 & DCS. Legacy detectors (ACO, EMC, HMP, PHS) use DDL1/DDL2 front-end links (→data & ←configuration) directly into O2 & DCS. PON TTS links: 500; GBT & TTC: (192 + 80) & 1 PCIe.]
Acronyms: CRU Common Read-out Unit; CTP Central Trigger Processor; DCS Detector Control System; FLP First Level Processor; FTL Fast Trigger Links; GBT Gigabit Transceiver; LTU Local Trigger Unit; O2 Online and Offline Computing System; TTS Trigger and Timing Distribution System
ITS Integration into O2 and Trigger System
[Figure: the ITS FEE connects via GBT optical links to the CRU, which sits on PCIe in O2; the CTP interfaces to the CRU for trigger and timing; DCS reaches the FEE through O2/the CRU and via a separate CAN bus]
ALPIDE Architecture including TMR
[Figure: 16 double columns per region, 32 readout regions — Region Readout (1…32), each with a 128 × 24b DPRAM, into a 32:1 data MUX; Top Readout Unit and Data Management Unit (readout sequencing, 24b × 40 MHz); Data Transmission Unit (8b/10b, 30b × 40 MHz, PLL, serializer, driver) to the serial out port (1200 / 600 / 400 Mbps); control bus logic with differential control port (40 Mbps), configuration registers, pixel config, 8b DACs, 11b ADC, bandgap + temperature sensor; trigger inputs]
ALPIDE elements potentially SEU sensitive:
1. Pixel logic: mask and pulse registers (not protected). Mitigation: refresh pixel mask bits in the background.
2. Periphery logic: state machines, FIFO pointers, counters, and configuration registers with TMR.
3. Top Readout Unit and Data Management Unit FIFOs with Hamming protection.
4. Region Readout DPRAM partially protected: format protected, hit data not protected.
5. Data Transmission Unit: PLL and serializer with TMR.
[Figure: basic Triple Modular Redundancy register cell — three flip-flops FF0–FF2 share IN and RESET_VALUE; the voting logic forms OUT by majority, flags SEU_ERROR on disagreement, and feeds the voted value back for auto-correction]
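A behavioral model of this register cell (a sketch, not the ALPIDE netlist): three flip-flops hold copies of the stored bit; the voting logic outputs the majority, raises SEU_ERROR on any disagreement, and writes the voted value back so a single upset is scrubbed automatically.

```python
# Triple Modular Redundancy register cell, modeled behaviorally.
class TmrRegister:
    def __init__(self, reset_value=0):
        self.ff = [reset_value] * 3                # FF0, FF1, FF2 hold the same bit

    def clock(self, d=None):
        if d is not None:                          # normal write: all three FFs load D
            self.ff = [d] * 3
        out = 1 if sum(self.ff) >= 2 else 0        # majority voting logic
        seu_error = len(set(self.ff)) > 1          # flag any mismatch between copies
        self.ff = [out] * 3                        # auto-correction from the voter
        return out, seu_error

reg = TmrRegister()
reg.clock(1)                                       # store a 1
reg.ff[2] ^= 1                                     # SEU flips one of the three copies
print(reg.clock())                                 # (1, True): output still correct
print(reg.clock())                                 # (1, False): the upset was scrubbed
```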
Modules (HICs) with ALPIDE Chips
• ITS Inner Barrel module: 9 chips with common clock + control + trigger and independent data lines; serial outputs at 1200 Mbps
• ITS Outer Barrel module: 2 groups of chips, each a master + 6 slaves sharing clock, control, and local data within the group; serial out at 400 Mbps per group
• Only the master interfaces to the external world (clock + control + trigger) and bridges the control and data transfer
Main Challenge: Expected Radiation Levels
• Neither the TID nor the fluence levels at the location of the RU are a concern for material degradation or component reliability in modern microelectronics.
• High-energy hadrons are the real concern, as they can cause latch-ups in electronic components and SEUs in logic devices.
• Testing showed that latch-ups are not a problem, while proper design is required to mitigate SEU effects.

Position with respect to beam, and total radiation levels:
r [cm] | z [cm] | Name | TID [krad] | 1 MeV n_eq fluence [cm⁻²] | High-energy hadron flux* [kHz cm⁻²] | Charged particle flux [kHz cm⁻²]
2.2 | −13.5 ÷ 13.5 | ITS L0 | 2734 | 1.7 × 10¹³ | 765 (770) | 890 (910)
40 | −73.7 ÷ 73.7 | ITS L6 | 20 | 8.1 × 10¹¹ | 3.4 (4.9) | 4.5 (6.7)
100 | 330 | RU | ≈ 5 | ≈ 1.6 × 10¹¹ | 0.86 | 1.7
258 | −260 ÷ 260 | TPC Out FEE | 0.86 | 1.4 × 10¹¹ | 0.27 (0.37) | 0.2 (0.3)
Radiation Validation: Detailed Impact

Failure mode | Affected section | MTBF, IB | MTBF, OB | MTBF, whole ITS | Corrective action | Downtime per occurrence
Sensor data lane | 1 sensor (IB), ½ module (OB) | 22–40 h | 4–6 h | 3–5 h | Self-repairing | < 1 s
GBT data* | 1 full stave | 29 h | 10 h | 7 h | 30% self-repairing, 70% reset by slow control | < 5 s
Clock resources | 1 full stave | Negligible | Negligible | Negligible | Reset by slow control | < 5 s
Transceiver settings | 1 sensor, IB only | > 932 h | – | – | Reset by slow control | < 5 s
Flash memory | 1 full stave | Negligible | Negligible | Negligible | FLASH reprogramming (30 s beam off, 15 min beam on) | 30 s – 15 min
PA3¹ | 1 full stave | 172 h | 58 h | 43 h | Reset by slow control | 0 s
DC-DC power glitch² | 1 full stave | 294 h | 98 h | 72 h | Power cycle | < 10 s

* Data for the non-TMR block; the final version will use a TMR-protected block. This failure mode also includes sensor control and clock failures.
¹ When the PA3 gets stuck, the main FPGA is not compromised, so no downtime occurs. TMR of key blocks in the PA3 firmware will further improve this.
² Assuming 200 RUs with 8 DC-DC converters each (a 20% overestimate).
ALICE Data Acquisition: First Level Processors with CRUs
[Figure: dataflow sensor → Readout Unit → CRU → O2 FLP, with FLP and CRU architecture block diagrams — up to 28 data lines per RU; up to 3 GBT links per RU (3.2 Gb/s each); 8 RUs per CRU; 2 CRUs per FLP]
Pb-Pb @ 100 kHz
[Plot: average data throughput of the MASTER sensor for Pb-Pb interactions at a 100 kHz average rate, in triggered and continuous mode and for different analogue shaping times; the outer-layer bandwidth limit is indicated. A worst-case noise scenario is assumed, which matters only for the outer layers.]
FELIX Firmware