On the use of NI PXI platform for High reliability and...

23
On the use of NI PXI platform for High reliability and availability systems: LHC Collimators low level control a study case Dr. A. Masi CERN, Switzerland EN Dept, STI group [email protected] BIG PHYSICS Symposium, Austin 2011 Control, Diagnostic and Measurement for Physics Systems and Experiments

Transcript of On the use of NI PXI platform for High reliability and...

Page 1: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

On the use of NI PXI platform for High

reliability and availability systems:

LHC Collimators low level control a study case

Dr. A. Masi

CERN, Switzerland

EN Dept, STI group

[email protected]

BIG PHYSICS Symposium, Austin 2011

Control, Diagnostic and Measurement for Physics

Systems and Experiments

Page 2: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

Outline

• The LHC collimation system

• Reliability and Availability Requirements

• The control architecture

• Reliability and availability review

• Lessons learnt by the 2010 operation

• Are NI PXI systems suitable candidates for high reliability and availability control systems ???

• Conclusions

Page 3: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Secondary halo

Primary Collimator

(TCP)

Tertiary halo

Tertiary Collimator

(W)

Quartiaryhalo

6s 7s

15

Offset

(s)

10

5

0

-5

-10

-15

Secondary Collimator

(TCS)

~10s

The collimation system must carry out 2 main functions:

1. Beam cleaning, i.e. removing stray particles which may induce quenches in SC components. 2 out of 100k particles enough to trigger quench …

Several types of collimators at multiple locations are required to ensure

efficient beam cleaning ...

Very complex system (100+ LHC Collimators)Total of 108

collimators

(100 movable).

Two jaws (4 motors)

per collimator!

Page 4: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

The collimation system must carry out 2 main functions:

2. Machine Protection, i.e. shielding the other machine components from the catastrophic consequences of beam orbit errors.

LHC beam parameters:

• Stored energy 350 MJ per beam (factor 100 higher than others) …

• Beam transverse density ~1 GJ/mm2 (factor 1000 higher than others) …

• This makes the LHC beam highly destructive!!! Collimators are the closest

elements to the beam ...

Carbon/

carbon jaw Graphite jaw

5 full intensity shots ranging

from 1 to 5 mm, 7.2ms …

Each impact energy equivalent

to more than ½ kg of TNT

Robustness Test at 450 GeV 3.2x1013 protons

Collimator jaws after impact … no signs of mechanical damage• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 5: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

A collimator has two parallel jaws

Each jaw is controllable in position and angle

The jaws positioning accuracy is function of the beam size

(1/10 beam size). At top energy 20 um accuracy is

required

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 6: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

Collimator Axes Motor type: Stepping Motors

Controlled in open loop ( 4 for the jaws `axes

+ 1 for the vertical axis)Steps Loss Detection: Resolvers ( 4 for the jaws`

axes) Positions Survey: LVDT sensors in redundant

number (5 for the axes absolute position and

2 for the jaws` gap)

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 7: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

Control system parameters

Axes to control 555

Positioning sensors to monitor in RT 750

Resolvers to read synchronously

with the motors` steps

400

Limit switches to acquire 1200

Control system requirements

Axes positioning accuracy few µm

Axes motion synchronization below 1 ms

Response delay to a digital

start trigger

100 µs

Position sensors RT survey

frequency

100 Hz

Reliability Very high

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 8: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

System reliability is “an attribute of any system that

consistently produces the same results, preferably meeting or

exceeding its specifications.”

Availability :

Accelerator is available (called a mission) for nine months

per year 24/7. Short technical stops (5 days) are scheduled

in average every 2 months.

Goal:

• Reduce the number of failures in operation and the

consequent LHC downtime

• Possibility to mask a failure till the next scheduled

technical stop and/or fix it by remote avoiding

access to the LHC tunnel

Goal: No failure on the control system can put in danger the

LHC operation safety Failures in the collimation system

have to be detected and eventually the LHC beam dumped

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 9: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

12

Resolvers15 stepping motors

CMW Infrastructure

CORBA, RDA, JMS

FESA Gateway Linux

DIM CLIENT + Low Level FESA Class

MDCPXI Pentium Core Duo 2.16 Ghz 2 Gb RAM

OS Pharlap (LabView RT)

DIM Server

MDC

21 LVDTs

PRS PXI Pentium Core Duo 2.16 Ghz 2 Gb RAM

OS Pharlap (LabView RT)

DIM Server

PRS

Central Control Applications

Timing Network

Optical fibre

DIM (Distributed Information

Management

System) is a communication

system for distributed / mixed

environments based on the

client/server paradigm

The DIM system is currently available

for mixed platform environments

comprising the operating systems :

VMS, several Unix flavors (Linux,

Solaris, HP-UX, Darwin, etc.) Windows

NT/2000/XP and the real time OSs:

OS9, LynxOs and VxWorks. It uses as

network support TCP/IP

(See details at http://dim.web.cern.ch/dim/ )

The DIM Server library was

successfully compiled for

Pharlap

MDC PRS

DIM server DIM server

DIM Client

FESA Class Collimator LLCS ß CSS

Operator GUI

MD

C_

Co

mm

an

d

MD

C_

Ac

qu

isit

ion

MD

C_

Se

ttin

gs

MD

C_

Ac

k

MD

C_

Err

ors

MD

C_

Po

stM

ort

em

MD

C_

Co

mm

an

d

MD

C_

Ac

qu

isit

ion

MD

C_

Se

ttin

gs

MD

C_

Ac

k

MD

C_

Err

ors

MD

C_

Po

stM

ort

em

CO

MM

AN

D (

Op

era

tor)

AC

K

AC

QU

ISIT

ION

OP

ER

AT

OR

SE

TT

ING

S O

PE

RA

TO

R

Expert GUI

CO

MM

AN

D (

Ex

pe

rt)

SE

TT

ING

S E

xp

ert

AC

QU

ISIT

ION

Ex

pe

rt

AC

K

CPU

RT Controller 8106

FPGA 7833R FPGA 7833R FPGA 7833R

FIFO FIFO FIFO

DMA

TRIGGER

TC

P/IP

co

mm

un

ica

tio

n

TRIGGER TRIGGER

4 Resolvers

5 stepping motors

Compact RIO

I/O modules

Stepping

Motors Drivers

4 Resolvers

5 stepping motors

Compact RIO

I/O modules

Stepping

Motors Drivers

4 Resolvers

5 stepping motors

Compact RIO

I/O modules

Stepping

Motors Drivers

CPU

RT Controller 8106

FPGA

7831R

DAQ

6143

DMA

TRIGGER

TC

P/I

P c

om

mu

nic

atio

n

Output Buffer

DAQ

6143

7 LVDTs

DAQ

6143

DAQ

6143

7 LVDTs

DAQ

6143

DAQ

6143

7 LVDTs

Output Buffer Output Buffer

RTSI/TRG0

RTSI/TRG1

RTSI/TRG2

250 kHz

DAC clock

The new Front-End Software

Architecture (FESA) is a

comprehensive framework

for designing, coding and

maintaining LynxOS/Linux

equipment-software that

provides a stable functional

abstraction of accelerator

devices

See details at:

http://project-

fesa.web.cern.ch/project-fesa

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 10: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

CMW Infrastructure

CORBA, RDA, JMS

FESA Gateway Linux

DIM CLIENT + Low Level FESA Class

MDC PRS

Central Control Applications

Timing Network

Optical fibre

MDC MDCPRSPRS

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 11: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

• Redundancy at sensors level:

Position sensors : 1 LVDT per collimator axis (4) + 2

redundant to measure the collimator gap to easily detect

failures

Stepping motors used in open loop + resolvers used

in addition to detect steps lost

• Redundancy at system level:

MDC and PRS: Critical task of positioning and position

verification are split on two separated PXI system

Position axes monitoring with Energy Limits (network

synchronized ) and position limits (timing synchronized)

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 12: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

Watchdog timers on CPU and FPGA to detect stuck conditions

Check of the Compact RIO module working status in the FPGA

Watchdog timers to verify Energy limits communication over the network

• Software implementation fail safe:

Dual core processor to: reduce the work loadseparate the critical tasks from the network communications

implement reciprocal watchdog timers

CPU

RT Controller 8106

FPGA 7833R FPGA 7833R FPGA 7833R

FIFO FIFO FIFO

DMA

TRIGGER

TC

P/IP

co

mm

un

ica

tio

n

TRIGGER TRIGGER

4 Resolvers

5 stepping motors

Compact RIO

I/O modules

Stepping

Motors Drivers

4 Resolvers

5 stepping motors

Compact RIO

I/O modules

Stepping

Motors Drivers

4 Resolvers

5 stepping motors

Compact RIO

I/O modules

Stepping

Motors Drivers

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 13: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

• Improvements on the NI PXI Systems:

FTP

Socket (RCP)

RT platforms diskless with network boot

Solid State Drive (SDD) as boot backup

PXI diagnostic data monitored (CPU and chassis T, memory usage, CPU load)

Remote reset based on safety PLC using the power supply control connector

KVM switches to redirect the Keyboard, Video and Mouse over the network

The Linux PXE server ensure the

software images coherence on the

106 PXI systems !!!!

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 14: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

The low level control software has been

designed to be fully configurable

Proper tools have been developed to make

easier the management, the maintenance and

the diagnostic of the system:

The software configuration is performed via a proper Configuration tool fetching the parameters by a Collimators Software Configuration Database The coherence of the software images is ensured by a PXE booting system

• The control system management:

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 15: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

• A sensor failed can be easily and quickly bypassed

remotely to reduce the downtime to few minutes. The

problem will be fixed at the next scheduled technical

stop

• Redundancy at sensors level:

• Power Supply Redundancy:

• All the power supplies of the custom electronics have

been designed redundant to make totally transparent to

the operation a power supply failure

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 16: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

LHC Collimators downtime analysis over 1 year operation

• No failure has put in danger the machine safety

?

• The system availability in 2010 has been of 99.54 %

(37.55 h/ 8184 operation hours)

• Many failures registered in 2010 have been definitively

fixed

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 17: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

• According to the failure data accumulated so far:

LHC Collimators NI PXI Platforms Failure Analysis

• Only 1 FPGA card failed so far

• The weak part has shown to be the PXI chassis

power supply. The MTBF evaluated on 2010

operation time gives 178716 h (20.4 years)

comparable with the data provided by NI this

means roughly a failure every 2 months and half

• LHC Collimator project is an important testbench

for NI PXI systems• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 18: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

Definitively YES but with the following recipe:

• AMT features (remote reboot via network, KVM

features, remote bios upload)

• Systems diskless with network booting

• Recommended use of parallel architecture:

Multiprocessors and basically FPGA to split the

execution of critical tasks on different levels and

units

• On-line monitoring and verification of the

systems critical parameters ( CPU and chassis T,

CPU load, memory usage)

• High Availability PXI chassis

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 19: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

Can the 8 Slot PXI 1042 Chassis be considered High

Availability ?????

• According to the requirements of the LHC

Collimators project NO:

• MTBF at least 1 order of magnitude lower than

the other components (i.e. compact RIO

modules, FPGA cards)

• The basic concept of power supply

redundancy is not followed

• According to the standard IEC 61508 on the

Safety Integrity Levels in continuous mode of

operation the PXI 1042 is a SIL 1. For High R&A

systems chassis SIL 3 or better 4 is recommended

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

MTBF~23 years

MTBF~230 years

Page 20: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

Has NI available PXI chassis High Availability as

standard product ?????

At my knowledge so far NO for the PXI bus but….

A custom development is in progress for the LHC

Collimators project

Redundant power supply hot swappable

Redundant Cooling system

Compact sizes

Remote reset and monitoring of chassis

parameters (T, airflows) via an independent

network port

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 21: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

The LHC Collimators low level control system is characterized by tight requirements in terms of Reliability and Availability

The low level RT platforms chosen are the PXI systems Intel Dual core from NI

Particular design choices are implemented at the architecture level to improve the system reliability as well as technical enhancements of the PXI platforms have been applied

Proper tools have been developed to make easier the maintenance and the management o this challenging control system

The new PXI chassis High Availability will contribute to further improve the system availability

• The LHC collimation

system

• The control

architecture

• R&A review

• Lesson learnt by

2010 operation

• NI PXI platforms

suitable for high R&A

systems

• Conclusions

Page 22: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

We are indebted to NI for the precious support received. In particular the System Engineering Group (Doug, Brent and Christian) and all the R&D people that were involved in this project as well as the Big Physics team (Stefano,

Murali, Joel, Christian).

Last but not least Chris, Matthew and Giuseppe

Eric Brauel and his team for the current work on the High Availability PXI Chassis

• The LHC collimation

system

• Motorization

solution

• Control

requirements

• Low Level Control

System

• Measured

perfomances

• Conclusions

Page 23: On the use of NI PXI platform for High reliability and ...download.ni.com/pub/gdc/tut/high_reliablity.pdf · PXI Platforms for High Reliability and Availability Systems, Big Physics

A. Masi,

PXI Platforms for High Reliability and Availability Systems, Big Physics Symposium Austin 2011

Thank you very much for your attention

• The LHC collimation

system

• Motorization

solution

• Control

requirements

• Low Level Control

System

• Measured

perfomances

• Conclusions