Status of the LLTT(TPU) project G.Punzi (Pisa) on behalf of the LLTT group Meeting with INFN referees 20/3/2014

Page 1:

Status of the LLTT(TPU) project

G. Punzi (Pisa), on behalf of the LLTT group

Meeting with INFN referees, 20/3/2014

Page 2:

Brief recap

- Feb. 2013: LHCb Trigger workshop: first LHCb presentation

- June 2013: Feasibility study presented at the LLT workshop
  - Received an official list of questions from management

- 4 Oct 2013: LLT workshop: answers to questions + simulation of a baseline system
  - Asked to be reviewed to become a project for the upgrade

- LHCb outlook for the online evolves considerably

- December 2013: Presentation to LHCb week.

- 1-2 Feb. 2014: Presentations to LHCb Trigger Workshop

- 1 Mar 2014: Talk at international instrumentation conference (INSTR-14)

- 10 Mar 2014: Internal note presented for review

- 18 Mar 2014: Presentation to Technical Board meeting

TODAY: presentation to INFN referees.

- 31 Mar 2014: Presentation to LHCb external review committee

Page 3:

Effects of changes in readout design

- DAQ structure evolved to bi-directional EB
- Baseline readout has moved FPGA cards to EB
- LLT hardware shrunk, most (or all) functions moved into EB CPUs (“software LLT”)
- BIG strategy choice: push investments upward
- But HLT wants to keep a “safety net” (LLT)
- LLTT more substantial hardware: → Track Processing Unit (TPU), connected to EB
- Regarded by trigger group mostly as HLT co-processor, or pre-processor
- Raises the bar considerably!
- Can still fuel a software-LLTT functionality
- Safety net as well...

Page 4:

Hardware Status

Page 5:

Architecture

[Diagram blocks: tracking layers, switching network, cellular engines, fitter, output to DAQ]

- Separate trigger-DAQ path
- Custom switching network delivers hits to appropriate cells
- Data organized by cell coordinates
- Blocks of cellular processors
- Track finding and parameter determination

Page 6:

Basic Principles

For each cellular unit in the parameter space (u,v), calculate a weighted response, summing over all hits and all layers.

Tracks appear as peaks in the parameter space. A track is found as a cluster of excited cells.

Trigger & Tracking Workshop - M.J. Morello
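The response calculation can be sketched in a few lines of Python. This is only an illustration, not the firmware: the Gaussian form of the weight, the parameter `sigma`, and the names `cell_response`, `receptors` and `hits_by_layer` are assumptions introduced here.

```python
import math

def cell_response(receptors, hits_by_layer, sigma):
    """Sum a distance-dependent weight over all layers and all hits for one cell.

    receptors:     {layer: (x, y)} where the cell's base track crosses each layer
    hits_by_layer: {layer: [(x, y), ...]} measured hits
    sigma:         width of the weight function (assumed Gaussian here)
    """
    total = 0.0
    for layer, (rx, ry) in receptors.items():
        for hx, hy in hits_by_layer.get(layer, []):
            # squared distance between the hit and the cell's receptor
            d2 = (hx - rx) ** 2 + (hy - ry) ** 2
            total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total
```

A cell whose base track passes exactly through one hit on each of N layers gets a response of N; hits far from the receptors contribute essentially nothing.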

Page 7:

Hit delivery by the switching logic

Hits must be delivered only to the cells that need them (there can be more than one)

The switch network “knows” where to deliver hits

All information about the network of connections is embedded in the network via distributed LUTs
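A software analogue of LUT-driven hit delivery might look like the sketch below. The coarse binning of the detector, the `pitch` and `reach` parameters and the function names are hypothetical; the real switch distributes its LUTs across the network nodes rather than holding one dictionary.

```python
import math
from collections import defaultdict

def build_lut(cell_receptors, pitch, reach=1):
    """Map each coarse detector bin (layer, ix, iy) to the cells that need its hits.

    cell_receptors: {cell_id: {layer: (x, y)}}
    pitch:          size of a coarse detector bin (hypothetical granularity)
    reach:          how many neighbouring bins around a receptor also feed the cell
    """
    lut = defaultdict(set)
    for cell_id, receptors in cell_receptors.items():
        for layer, (x, y) in receptors.items():
            ix, iy = math.floor(x / pitch), math.floor(y / pitch)
            for dx in range(-reach, reach + 1):
                for dy in range(-reach, reach + 1):
                    lut[(layer, ix + dx, iy + dy)].add(cell_id)
    return lut

def route_hit(lut, layer, x, y, pitch):
    """Return the sorted list of cells that must receive this hit (possibly empty)."""
    key = (layer, math.floor(x / pitch), math.floor(y / pitch))
    return sorted(lut.get(key, ()))
```

The table is filled once from the geometry, so routing a hit at run time is a single lookup, and a hit can be copied to several cells or dropped if no cell needs it.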

Page 8:

Cellular engine

Calculates the weight of each hit for its cell

Deals with surrounding cells as well.

Handles time-skew between events

In second stage performs local clustering in parallel, and queues results to output
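The local clustering step can be illustrated as a search for local maxima in the grid of cell responses. This is a sequential sketch under assumptions (a rectangular grid, an 8-neighbour comparison, a fixed `threshold`); in the hardware each engine would compare itself with its neighbours in parallel.

```python
def local_maxima(grid, threshold):
    """Return (row, col) of cells above threshold that no neighbour exceeds.

    grid: list of equal-length rows of cell responses.
    Ties are kept: two equal adjacent cells are both reported.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    found = []
    for i in range(n_rows):
        for j in range(n_cols):
            value = grid[i][j]
            if value <= threshold:
                continue
            # responses of the (up to 8) surrounding cells
            neighbours = [
                grid[a][b]
                for a in range(max(i - 1, 0), min(i + 2, n_rows))
                for b in range(max(j - 1, 0), min(j + 2, n_cols))
                if (a, b) != (i, j)
            ]
            if all(value >= n for n in neighbours):
                found.append((i, j))
    return found
```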

Page 9:

Track parameter estimation by cluster Center-of-Mass

[Block diagram: 12 ENGINE blocks share one CoM UNIT through multiplexers (MUX), with a REQ/ACK/DATA handshake. The CoM unit combines the 3×3 neighbourhood of a cell (offsets -1, 0, +1 along each axis; cells labelled z+/z0/z- and p-/p0/p+) using registers (REG; signals labelled 16, 16 and 22) and a pipelined divider that produces the engine output.]

Due to data reduction out of the engine, a 1:12 ratio is sufficient to keep up with the data flow

Final parameter determination can be done on EB CPUs to achieve full “offline-compliance”
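As an illustration of the centre-of-mass estimate, the sketch below takes the weighted mean of the cell indices over the 3×3 neighbourhood of a cluster centre (offsets -1, 0, +1 on each axis). The function name and the floating-point arithmetic are assumptions; the firmware uses fixed-width registers and a pipelined divider.

```python
def centre_of_mass(grid, i, j):
    """Weighted mean of cell indices over the 3x3 neighbourhood of (i, j).

    grid: list of equal-length rows of cell responses; cells outside the
    grid are ignored. Returns fractional (row, col) coordinates.
    """
    total = sum_i = sum_j = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            a, b = i + di, j + dj
            if 0 <= a < len(grid) and 0 <= b < len(grid[0]):
                w = grid[a][b]
                total += w
                sum_i += w * di
                sum_j += w * dj
    if total == 0:
        return (float(i), float(j))  # empty neighbourhood: keep the cell centre
    # the two divisions correspond to the divider stage
    return (i + sum_i / total, j + sum_j / total)
```

The fractional indices would then be converted to track parameters through the known (u,v) value of each cell.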

Page 10:

Studies of Stratix-V capacity

TPU completely implemented in firmware

All main components:
- Switch
- Engines
- CoM

implemented in VHDL and placed in FPGA

Fit ~750 engines/chip on Stratix-V
- exact number depends on details (time-ordering of pixel data, etc.)

Arria 10 allows double the logic at the same price, with lower power consumption

Page 11:

Simulation and Timing

[Simulation waveform of one CELL. Signals shown: Hit_data, LAY, XY, address, Intersect_x_y, d_x, d_y, d_x^2, d_y^2, sum_square, weight, Ready]

- Exceeds 350 MHz clock frequency
- 40 MHz throughput
- Total latency < 1 µs! (not accounting for I/O)
- Much better than AM
- Originally intended as “Low Level Track Trigger”
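The quoted figures imply a simple timing budget, worked out below. The function and its return values are illustrative; only the inputs (350 MHz clock, 40 MHz event rate, 1 µs latency bound) come from the slide.

```python
def timing_budget(clock_mhz, event_rate_mhz, latency_us):
    """Derive per-event and in-flight figures from clock, event rate and latency."""
    cycles_per_event = clock_mhz / event_rate_mhz   # clock ticks available per event
    latency_cycles = latency_us * clock_mhz         # latency expressed in clock ticks
    events_in_flight = latency_us * event_rate_mhz  # events inside the pipeline at once
    return cycles_per_event, latency_cycles, events_in_flight

cycles, latency, in_flight = timing_budget(350, 40, 1.0)
print(cycles, latency, in_flight)
```

With these inputs each event has 8.75 clock cycles of throughput budget, a 1 µs latency corresponds to 350 cycles, and up to 40 events are in the pipeline at the same time.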

Page 12:

Tracking Configuration

VELO+UT, 16+2 layers

- Split into two separate telescopes for ease of cabling
- Covers longable tracks

Page 13:

Layer configurations

Page 14:

Layer configuration acceptances

Page 15:

Integration in the DAQ

- ATC40 scheme (not the baseline anymore)

- Shows the map of connections

- Additional optical links needed to copy data to the TPU cards

[Diagram labels: TPU; to Event Builder]

Page 16:

TPU integrated in the EB

TPU appears to the EB as an additional “virtual detector” producing tracks

Page 17:

Data flow inside EB

- Local CPUs can be used to refine FPGA output
- Availability of TRACKS in the Event Builder
  – Can control rate by confirming LLT muon (hadron) with stiff track
  – In the “partial reconstruction” scheme, could have HLT1 inside EB

TPU behaves as a virtual “track detector”

[Diagram labels: tracks in pre-EB; tracks in post-EB; small flows in TPU boxes]

Page 18:

Lab test with TEL62

- We have a plan for testing the retina algorithm with real FPGAs (Stratix-3), in a simplified configuration.

- This is lower-speed, but helps us demonstrate that we can put together and operate a complete system

- We exploit TEL62 boards, that are compatible with current LHCb DAQ, and can be easily inserted in the system (agreement with local DAQ experts)

- TEL62 boards have been ordered together with the NA62 order, and will arrive soon (in about a month).

- Pisa has lab space for a bench test. TEL62 is used for both sequence generation and “retina” implementation.

- Work ongoing in Pisa on connection boards and logistics.

Page 19:

Preparations for test on silicon telescope @Milano

- Second stage of testing planned with cosmic rays (CR) in a Si telescope being built in Milano

- Details in UT talk yesterday.

Page 20:

Document presented to LHCb Technical Board, 10/3/2014

Page 21:

Performance parameters

Page 22:

Simulation

• At least 3 VELO hits in the last 8 VELO stations
• At least 1 hit in each axial UT layer

Fiducial region: |u| < 0.35, |v| < 0.35 (about theta < 50 mrad); |z| < 15 cm; electron rejection.

Some details on the LHCb simulation used:
• Ebeam = 7 TeV
• nu = 7.6 (L = 2×10^33) and nu = 11.4 (L = 3×10^33); bunch crossing: 25 ns, with spillover
• Geometry: DDDB: dddb-20131025, CONDDB: sim-20130830-vc-md10
• VeloUT offline reconstruction: Brunel v44r9 with default settings
• Performance on small-angle telescope with 8 VELO + 2 UT

Page 23:

Mapping of detector to receptor cell array

Intersection of “base tracks” with detectors gives a map of “nerve endings”. This encodes the information about the geometry

Every hit on the detector produces a signal on nearby receptors, depending on distance

(I skip over several subtleties. For instance, effective operation requires the distribution to be non-uniform.)

Not unlike the distribution of photoreceptors in visual system – but it is all virtual in our case, that is, implemented in the internal LUT of the system.
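A toy version of the mapping step is sketched below. It assumes straight base tracks from the origin, with (u,v) taken as the track position on a virtual plane at `z_ref`, and a regular grid of cells; as noted above, the real distribution is non-uniform. All names are illustrative.

```python
def receptor_map(u_values, v_values, layer_z, z_ref):
    """Intersect each base track with each layer to get its receptor positions.

    u_values, v_values: cell coordinates on the virtual plane at z_ref
    layer_z:            z position of each detector layer
    Returns {(iu, iv): {layer: (x, y)}}, which can be stored in lookup tables.
    """
    cells = {}
    for iu, u in enumerate(u_values):
        for iv, v in enumerate(v_values):
            # straight line through the origin: position scales linearly with z
            cells[(iu, iv)] = {
                layer: (u * z / z_ref, v * z / z_ref)
                for layer, z in enumerate(layer_z)
            }
    return cells
```

The map depends only on the detector geometry, so it is computed once (here analytically, in the slides from simulated tracks) and loaded into the LUTs.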

Page 24:

Simulated full LHCb events (µ=7.6)

[Figure labels: Generated; Out of acceptance]

Used ~45,000 cell engines

C++ code, can be inserted in standard analysis code
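For a rough sense of scale, the engine count can be combined with the ~750 engines/chip figure quoted for Stratix-V. Pairing these two numbers is an assumption made here for illustration, not a statement about the actual board design.

```python
import math

def chips_needed(n_engines, engines_per_chip):
    """Number of FPGAs needed to host n_engines, rounded up to whole chips."""
    return math.ceil(n_engines / engines_per_chip)

print(chips_needed(45_000, 750))    # Stratix-V density quoted in the slides
print(chips_needed(45_000, 1_500))  # if a chip with twice the logic hosted twice the engines
```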

Page 25:

Efficiency/Uniformity

p , pT

Page 26:

Efficiency/Uniformity

z , IP

Page 27:

Efficiency/Uniformity

(u,v) ~ (θx, θy)

Page 28:

Momentum resolution

σk = 0.0102

σk = 0.0126

Page 29:

Physics performance and robustness

Page 30:

COSTING

Page 31:

Detailed cost estimate from the online group

- Estimated at current prices: 940 kCHF
- Does not account for savings from moving to Arria-10
- Assumes using identical boxes to the EB for simplicity

– Some further savings still possible

Page 32:

Is the TPU cost effective, TODAY?

Timing of the piece of code yielding the performance we have been comparing to: 3.8 ms (standalone CPU-2012)

It was later understood that this piece of code performs further extra work (backward VELO layers)

We have no piece of code doing exactly the TPU work on the same sample, with the same performance

Various estimates:
- (16/26)*2.3+1.5 = 2.9 ms
- (%GEC) 60% * 3.8 = 2.3 ms
- 3.8 ms - (VELO10) = 2.4 ms

Cost of (naked) CPU: ~120 CHF/core → 1 ms @ 40 MHz = 4.8 MCHF

TPU equivalent: 10 to 15 MCHF
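The arithmetic behind these figures can be checked with a short script. It assumes one core processes one event at a time, so sustaining 40 MHz with 1 ms per event keeps 40,000 cores busy; the function name is illustrative.

```python
def cpu_farm_cost_mchf(time_per_event_ms, event_rate_mhz=40.0, chf_per_core=120.0):
    """Cost in MCHF of the CPU cores needed to sustain the event rate."""
    # 1 ms per event at 1 MHz keeps 1000 cores busy
    cores = time_per_event_ms * event_rate_mhz * 1000.0
    return cores * chf_per_core / 1e6

print(cpu_farm_cost_mchf(1.0))  # 4.8, the cost per millisecond quoted above
for t_ms in (2.3, 2.4, 2.9):    # the three timing estimates from the slide
    print(t_ms, round(cpu_farm_cost_mchf(t_ms), 2))
```

The three timing estimates give roughly 11 to 14 MCHF of CPU, consistent with the quoted 10 to 15 MCHF range.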

Page 33:

Is the TPU cost effective, TODAY?

TPU clearly cost-effective solution at present time

Page 34:

Projections to 2020

“It is always difficult to make predictions,especially about the future”

– Yogi Berra

Page 35:

Timings presented by HLT group

Page 36:

How the HLT group plans to use the TPU

Does not include:
- Multicore inefficiency
- Data/MC effects

Page 37:

CPU cost projections from online group

Page 38:

Assumptions behind the 8ms

- TPU price 2020 = 2014
- CPU price drop 16x
- No inefficiency factor for 400 jobs/node
- Additional 2x to CPU for other uses
- Full cost of TPU vs. “scalable” cost of CPU

Page 39:

Conclusions of TB (18/3/2014)

Page 40:

Summary

• We designed a system capable of track reconstruction at 40MHz with offline-like performance and ~1µs latencies.

• Cost of TPU is an order of magnitude smaller than today's CPU solutions

• Projections to the upgrade era made by the HLT group and online group predict that the CPU solution will become more convenient. These projections are based on some assumptions.

• The TB recommended a CPU-only solution as baseline for TDR.

Page 41:

People

• A. Abba (MI)
• F. Bedeschi (PI)
• F. Caponio (MI)
• M. Citterio (MI)
• D. Corbino (CERN)
• A. Cusimano (MI)
• A. Geraci (MI)
• S. Leo (PI)
• F. Lionetto (PI)
• P. Marino (PI)
• M.J. Morello (PI)
• N. Neri (MI)
• A. Piucci (PI)
• G. Punzi (PI)
• L. Ristori (PI)
• F. Ruffini (PI)
• F. Spinella (PI)
• S. Stracka (PI)
• D. Tonelli (CERN)
• J. Walsh (PI)

Many thanks to all people who contributed to the development of this design

Page 42:

BACKUP

Page 43:

Basic principle

We inject real hits (x_r, y_r)_k in the detector layers. For each cellular unit i in the parameter space (u,v), calculate the response R_i, summing over all hits and all layers.

Tracks are peaking structures in the parameter space. Find a track as clusters of excited cells

Trigger&Tracking Workshop- M.J. Morello
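One common way to write such a response is with a Gaussian weight on the hit-receptor distance. The Gaussian form, the width σ, and the receptor coordinates (x_{ik}, y_{ik}) of cell i on layer k are assumptions made here for illustration; the slide does not state the formula explicitly.

```latex
R_i = \sum_{k} \sum_{r} \exp\!\left( -\frac{s_{ikr}^{2}}{2\sigma^{2}} \right),
\qquad
s_{ikr}^{2} = \left( x_{r} - x_{ik} \right)^{2} + \left( y_{r} - y_{ik} \right)^{2}
```

Here k runs over detector layers and r over the hits (x_r, y_r)_k on layer k.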

Page 44:

Tracking Efficiency

Reconstructed “offline” VELO+UT tracks using official LHCb-MC Bs→φφ with mu=7.6.

— Require on offline reconstructed tracks

p >3 GeV/c

pT > 500 MeV/c

— and a geometrical acceptance (retina acceptance)

20 < theta < 60 mrad

Found that ~95% of offline tracks have a compatible match within the geometric acceptance of our track processor.

— All VELO and UT hits without any requirements sent to the LLTT.
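The selection on offline tracks listed above can be expressed as a simple predicate. The function names and the choice of units for the arguments are illustrative; the cut values are those quoted on the slide.

```python
def in_retina_acceptance(p_gev, pt_mev, theta_mrad):
    """Apply the slide's cuts: p > 3 GeV/c, pT > 500 MeV/c, 20 < theta < 60 mrad."""
    return p_gev > 3.0 and pt_mev > 500.0 and 20.0 < theta_mrad < 60.0

def accepted_fraction(tracks):
    """Fraction of (p_gev, pt_mev, theta_mrad) tuples that pass the cuts."""
    if not tracks:
        return 0.0
    return sum(in_retina_acceptance(*t) for t in tracks) / len(tracks)
```

The efficiency quoted on the slide is then the share of tracks passing this selection that have a compatible match in the track processor.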

Page 45:

Simulated full LHCb events (µ=7.6)

[Figure labels: Generated; Out of acceptance; Full LHCb-MC]

NB: the system can be simulated with 100% accuracy; C++ code is available to users

Page 46:
Page 47:
Page 48:

Software simulation 4/10/13: Benchmark study on 6 VELOPIX layers + 2 UT planes

Used 36,000 cell units in (r,φ) parameters

(r,φ): polar coordinates on the virtual plane of track intersection.

Mapping using LHCb-MC ParticleGun.

Tracks from Official Production Bs→φφ LHCb-MC.

L = 2 × 10^33 cm^-2 s^-1, sqrt(s) = 14 TeV, mu = 7.6

DDDBtag = “dddb-20130408”

CondDBtag = “simcond-20121001-vc-md100”

No kinematics cuts applied.

No requirement on hits.

Page 49:

Layer configurations

Page 50:

Layer configuration acceptances