Post on 20-Jan-2016
Managed by UT-Battelle for the Department of Energy
Best Ever Alarm System Tool
Xihui Chen,
Katia Danilova,
Kay Kasemir
SNS/ORNL
kasemirk@ornl.gov
April 2009
Previous Attempts
First ALH, then soft-IOCs and EDM generated from ALH config. (Pam Gurd)
Issues
– GUI: Static layouts, N clicks to see (some of the) active alarms
– Configuration .. was bad: Always too many alarms. Changes required contacting one of the 2 experts, wait ~days, restart CA gateway, hope that nothing else broke
– Information: Operator guidance? Related displays? Most frequent alarm? Timeline of alarm?
New End-User View: Alarm Table
All current alarms
– new, ack’ed
Sort by PV, Descr., Time, Severity, …
Optional: Annunciate or Enunciate
Acknowledge one or multiple alarms
– Select by PV or description
– BNL/RHIC type un-ack’
Another View: Alarm Tree
All alarms
– Disabled, inactive, new, ack’ed
Hierarchical
– Optionally only show active alarms
– Ack’/Un-ack’ PVs or sub-tree
Guidance, Related Displays, Commands
Basic Text
Start EDM screen
Open web page
Run ext. command
Hierarchical: Including info of parent entries
Merges Guidance etc. from all selected alarms
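The hierarchical lookup and merging described on this slide can be pictured with a small sketch. It is not the BEAST implementation: the `AlarmTreeItem` class, the function names, the example guidance texts, and the root-first ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AlarmTreeItem:
    """One node of a hypothetical alarm tree (area, subsystem, or PV)."""
    name: str
    guidance: List[str] = field(default_factory=list)
    parent: Optional["AlarmTreeItem"] = None


def guidance_for(item: AlarmTreeItem) -> List[str]:
    """Collect guidance from the root entry down to the item itself."""
    chain = []
    node = item
    while node is not None:
        chain.append(node)
        node = node.parent
    result = []
    for node in reversed(chain):  # root first, selected item last (assumed order)
        result.extend(node.guidance)
    return result


def merged_guidance(selected: List[AlarmTreeItem]) -> List[str]:
    """Merge guidance of all selected alarms, dropping duplicates, keeping order."""
    merged = []
    for item in selected:
        for text in guidance_for(item):
            if text not in merged:
                merged.append(text)
    return merged


# Hypothetical tree: Linac -> RF -> two alarm PVs
linac = AlarmTreeItem("Linac", ["Call the linac on-call engineer if unsure"])
rf = AlarmTreeItem("RF", ["Check the RF overview screen"], parent=linac)
temp = AlarmTreeItem("RF:Temp", ["Check cooling water flow"], parent=rf)
power = AlarmTreeItem("RF:Power", ["Reset the amplifier"], parent=rf)

for line in merged_guidance([temp, power]):
    print(line)
```

Selecting both PVs lists the shared parent guidance once, followed by the PV-specific entries. Related displays and commands could be merged the same way.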
.. Within CSS
Alarms
History of PV
EPICS Config.
CSS Context Menus Connect the Tools
Send alarm PV to any other CSS PV tool
E-Log Entries
“Logbook” from context menu creates text w/ basic info about selected alarms. Edit, submit.
Pluggable implementation, not limited to Oracle-based SNS ELog
.. may require Authentication/Authorization
Log in/out while CSS is running
Online Configuration Changes
Add PV or Subsystem
1. Right-click on ‘parent’
2. “Add …”
3. Enter name
Online. No search for config files, no restarts.
Configure PV
Again online
Especially useful for operators to update guidance and related screens.
Logging
.. into generic CSS log, also used for error/warn/info/debug messages
Alarm Server: State transitions, Annunciations
Alarm GUI: Ack/Un-Ack requests, Config changes
Generic Message History Viewer
– Example w/ Filter on TEXT=CONFIG
Logging: Get timeline
Example: Filter on TYPE, PV
1. PV triggers, clears, triggers again
2. Alarm Server latches alarm
3. Alarm Server annunciates
4. Problem fixed
5. Ack’ed by operator
6. All OK
All Sorts of Web Reports
Technical View
[Architecture diagram. Components: IOCs; Alarm Server (Current Alarms: Acknowledged? Transient? Annunciated?); Alarm Cfg & State RDB; Alarm Client GUI and other CSS Applications; JMS2Speech; JMS2RDB; Message RDB; Tomcat Reports]
[Connections: PV Updates (Channel Access, …) from the IOCs; JMS topics LOG, TALK, ALARM_CLIENT, ALARM_SERVER carrying Alarm Updates, Ack’ and Config Updates, Annunciations, Log Messages]
General Alarm Server Behavior
Latch highest severity, or non-latching
– like ALH “ack. transient”
Annunciate
Chatter filter à la ALH
– Alarm only if severity persists some minimum time
– .. or alarm happens >=N times within period
Optional formula-based alarm enablement:
– Enable if “(pv_x > 5 && pv_y < 7) || pv_z==1”
– … but we prefer to move that logic into IOC
When acknowledging MAJOR alarm, subsequent MINOR alarms not annunciated
– ALH would again blink/require ack’
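The latching and acknowledge rules on this slide can be illustrated with a small state machine. This is a minimal sketch, not the alarm server's code: the `LatchingAlarm` class, its attribute names, and the reduced severity set (OK, MINOR, MAJOR) are assumptions, and the chatter filter and formula-based enablement are left out.

```python
SEVERITY_ORDER = {"OK": 0, "MINOR": 1, "MAJOR": 2}


class LatchingAlarm:
    """Hypothetical latching alarm for a single PV."""

    def __init__(self, pv):
        self.pv = pv
        self.current = "OK"       # severity the PV reports right now
        self.latched = "OK"       # highest severity since the alarm became active
        self.acknowledged = False
        self.annunciations = []   # messages that would be spoken

    def update(self, severity):
        """Handle a new severity received from the PV."""
        self.current = severity
        if SEVERITY_ORDER[severity] > SEVERITY_ORDER[self.latched]:
            # Escalation: latch the higher severity and ask for a new ack
            self.latched = severity
            self.acknowledged = False
            self.annunciations.append(f"{severity} alarm: {self.pv}")
        elif severity == "OK" and self.acknowledged:
            # PV recovered after the operator acknowledged: all OK again
            self.latched = "OK"
            self.acknowledged = False
        # Otherwise: same or lower severity keeps the latched state, no annunciation

    def acknowledge(self):
        """Operator acknowledges the latched alarm."""
        if self.latched == "OK":
            return
        if self.current == "OK":
            # Problem already gone: acknowledging clears the alarm
            self.latched = "OK"
            self.acknowledged = False
        else:
            self.acknowledged = True


alarm = LatchingAlarm("Hypothetical:Temp")
alarm.update("MAJOR")      # latched and annunciated
alarm.acknowledge()
alarm.update("MINOR")      # below the acknowledged MAJOR: not annunciated
print(alarm.latched, alarm.acknowledged, len(alarm.annunciations))
```

In this sketch the MINOR update after the acknowledged MAJOR leaves the alarm acknowledged and adds no annunciation, which is the behavior the last bullet contrasts with ALH.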
Best Ever Alarm System Tools, Indeed
.. but Tools are only half the issue
Good configuration requires plan & follow-up.
B. Hollifield, E. Habibi, "Alarm Management: Seven (??) Effective Methods for Optimum Performance", ISA, 2007
Alarm Philosophy
Goal:
Help operators take correct actions
– Alarms with guidance, related displays
– Manageable alarm rate (<150/day)
– Operators will respond to every alarm (corollary to manageable rate)
What’s a valid alarm?
DOES IT REQUIRE IMMEDIATE OPERATOR ACTION?
– What action? Alarm guidance!
– Not “make elog entry”, “tell next shift”, …
– Consider consequence of no action
Is it the best alarm?
– Would other subsystems, with better PVs, alarm at the same time?
How are alarms added?
Alarm triggers: PVs on IOCs
– But more than just setting HIGH, HIHI, HSV, HHSV
– HYST is a good idea
– Dynamic limits, enable based on machine state, ...
Requires thought, communication, documentation
Added to alarm server with
– Guidance: How to respond
– Related screen: Reason for alarm (limits, …), link to screens mentioned in guidance
– Link to rationalization info (wiki)
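Why HYST is a good idea is easiest to see on a noisy reading that hovers around the HIGH limit. The sketch below shows the general hysteresis idea in simplified form, not the EPICS record code; the function names and the sample readings are made up for illustration.

```python
def high_alarm_with_hysteresis(values, high, hyst):
    """Return the alarm state (True/False) after each value.

    The alarm is raised when a value reaches `high` and is cleared
    only once the value drops below `high - hyst`.
    """
    states = []
    in_alarm = False
    for value in values:
        if not in_alarm and value >= high:
            in_alarm = True
        elif in_alarm and value < high - hyst:
            in_alarm = False
        states.append(in_alarm)
    return states


def count_triggers(states):
    """Count transitions from 'no alarm' to 'alarm'."""
    previous = [False] + states[:-1]
    return sum(1 for prev, cur in zip(previous, states) if cur and not prev)


# Hypothetical readings that wobble around a HIGH limit of 10.0
readings = [9.8, 10.1, 9.9, 10.0, 9.9, 9.4, 9.8]

without_hyst = high_alarm_with_hysteresis(readings, high=10.0, hyst=0.0)
with_hyst = high_alarm_with_hysteresis(readings, high=10.0, hyst=0.5)
print(count_triggers(without_hyst))  # 2 triggers: the alarm chatters
print(count_triggers(with_hyst))     # 1 trigger: a single, stable alarm
```

With no hysteresis the same excursion triggers twice; with a deadband of 0.5 it triggers once and clears only after the reading has clearly recovered.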
Impact/Consequence Grid
Columns: Category; So What; Minor Consequence; Major Consequence
Personnel Safety: PPS independent from EPICS?
Environment, Public: Can EPICS cause contained spill of mercury? Uncontained spill??
Cost (Beam Production, Downtime, Beam Quality): No effect, beam off < 1 sec?; beam off < 10 min, < $10000; beam off > 10 min, > $10000
Mostly: How long will beam be off?
.. combined with Response Time
Time to Respond Minor Consequence Major Consequence
>30 Minutes NO_ALARM MINOR
10..30 minutes MINOR MAJOR
<10 minutes MAJOR MAJOR + Annunciate
– This part is still evolving…
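The grid above is small enough to write down as a lookup function. This is only a sketch of the table as shown; the function name and the `(severity, annunciate)` return shape are assumptions, and since the slide says this part is still evolving, the thresholds should be read as an example.

```python
def alarm_severity(minutes_to_respond, major_consequence):
    """Map response time and consequence to (severity, annunciate).

    Follows the response-time grid: more than 30 minutes, 10..30 minutes,
    less than 10 minutes, each for a minor or major consequence.
    """
    if minutes_to_respond > 30:
        return ("MINOR" if major_consequence else "NO_ALARM", False)
    if minutes_to_respond >= 10:
        return ("MAJOR" if major_consequence else "MINOR", False)
    # Less than 10 minutes: always MAJOR, annunciated for major consequences
    return ("MAJOR", major_consequence)


# Hypothetical case: about 10 minutes to prevent a trip with major consequence
print(alarm_severity(10, major_consequence=True))
```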
Example: Elevated Temp/Press/Res.Err./…
Immediate action required?
– Do something to prevent interlock trip
Impact, Consequence?
– Beam off: Reset & OK, 5 minutes?
– Cryo cold box trip: Off for a day?
Time to respond?
– 10 minutes to prevent interlock?
MINOR? MAJOR?
Guidance: “Open Valve 47 a bit, …”
Related Displays: Screen that shows Temp, Valve, …
“Safety System” Alarms
Protection Systems not per se high priority
– Action is required, but we’re safe for now, it won’t get worse if we wait
Pick One
– “Mommy, I need to gooo!”
– “Mommy, I went”
(Does it require operator action? How much time is there?)
Avoid Multiple Alarm Levels
Bad Example: Old SNS ‘MEBT’ Alarms
Each amplifier trip: ≥ 3 ~identical alarms, no guidance
Rethought w/ subsystem engineer, IOC programmer and operators: 1 better alarm
Alarms for Redundant Pumps
Alarm Generation: Redundant Pumps the wrong way
Control System
– Pump1 on/off status
– Pump2 on/off status
Simple Config setting: Pump Off => Alarm:
– It’s normal for the ‘backup’ to be off
– Both running is usually bad as well, except during tests or switchover
– During maintenance, both can be off
Redundant Pumps
Control System
– Pump1 on/off status
– Pump2 on/off status
– Number of running pumps
– Configurable number of desired pumps
Alarm System: Running == Desired?
– … with delay to handle tests, switchover
Same applies to devices that are only needed on-demand
[Screenshot: “Required Pumps: 1” setting]
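The "Running == Desired, with delay" check might look like the following sketch. On the slide this logic is split between the control system (counting pumps) and the alarm system (comparing, delaying); here it sits in one hypothetical `PumpCountAlarm` class for brevity, and the 30-second delay is an arbitrary example value.

```python
class PumpCountAlarm:
    """Alarm when the number of running pumps differs from the desired
    number for longer than a delay (hypothetical helper)."""

    def __init__(self, desired, delay_s):
        self.desired = desired          # configurable number of desired pumps
        self.delay_s = delay_s          # tolerated mismatch time in seconds
        self._mismatch_since = None     # time when the mismatch started

    def update(self, now_s, pump_states):
        """pump_states: iterable of booleans, True = pump is running."""
        running = sum(1 for on in pump_states if on)
        if running == self.desired:
            self._mismatch_since = None
            return False
        if self._mismatch_since is None:
            self._mismatch_since = now_s
        # Alarm only when the mismatch has persisted for the full delay
        return now_s - self._mismatch_since >= self.delay_s


check = PumpCountAlarm(desired=1, delay_s=30)
print(check.update(0, [True, False]))     # backup off: normal, no alarm
print(check.update(10, [True, True]))     # switchover starts: both on, no alarm yet
print(check.update(20, [False, True]))    # switchover done: back to one pump
print(check.update(100, [False, False]))  # both off: mismatch starts
print(check.update(140, [False, False]))  # still off after 40 s: alarm
```

Setting `desired` to 0 during maintenance keeps both-off from alarming, which is the case the simple "Pump Off => Alarm" configuration gets wrong.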
Weekly Review: How Many? Top 10?
A lot of information available
How often did PV trigger?
For how long?
When?
Temporary issue? Or need HYST, alarm delay, fix to hardware?
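The questions "how often" and "for how long" can be answered from logged state transitions. The sketch below assumes a time-ordered list of `(timestamp_s, pv, severity)` tuples, which is a made-up stand-in for the real message log, and the function names are hypothetical.

```python
from collections import defaultdict


def alarm_statistics(events):
    """Return {pv: (trigger_count, total_seconds_in_alarm)}.

    events: time-ordered (timestamp_s, pv, severity) tuples.
    Alarms still active at the end of the list are counted as
    triggers, but their open-ended duration is not added.
    """
    count = defaultdict(int)
    total = defaultdict(float)
    active_since = {}
    for timestamp, pv, severity in events:
        if severity != "OK" and pv not in active_since:
            active_since[pv] = timestamp      # alarm starts
            count[pv] += 1
        elif severity == "OK" and pv in active_since:
            total[pv] += timestamp - active_since.pop(pv)   # alarm ends
    return {pv: (count[pv], total[pv]) for pv in count}


def top_alarms(stats, n=10):
    """PVs with the most triggers first."""
    return sorted(stats.items(), key=lambda item: item[1][0], reverse=True)[:n]


# Hypothetical log excerpt
events = [
    (0, "A", "MAJOR"), (60, "A", "OK"),
    (100, "B", "MINOR"),
    (120, "A", "MINOR"), (150, "A", "OK"),
    (400, "B", "OK"),
]
stats = alarm_statistics(events)
for pv, (triggers, seconds) in top_alarms(stats):
    print(pv, triggers, seconds)
```

A PV with many short alarms points toward HYST or an alarm delay, while one long alarm suggests a hardware problem or a stale entry.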
Weekly Check: Stale, Forgotten?
What about the DESY Alarm System?
[Architecture diagram. Labels: IOC, Interconnection Server, JMS, Filters, LOG, ALARM, Filt. Alrm, CSS, Other, RDB, JMS2RDB, LDAP. GUI: Similar to SNS GUI shown here]
No Channel Access Monitor of selected alarm PVs!
IOCs push all alarms via new protocol into Interconn. Server.
Design Choices
Similar alarm table and tree GUIs
JMS for communication
– slightly different messages, though
DESY IOCs send all alarms, then filtered in AMS
– DESY: All IOC alarms should show up in AMS, zero additional configuration
– At SNS, how many of the 350000 PVs would send alarms? We want to make the addition of alarms simple, but not automatic, and encourage guidance, related displays.
DESY/SNS: LDAP vs. RDB for configuration/state
– Choice was based on available infrastructure.
JMS Listeners
– SNS: Logger, Annunciator
– DESY: Logger, Send SMS, EMail, Voice Mail
AMS – Alarm Message System
Configuration Views
- AMS is a JMS (Java Message Service) based information system.
- It offers different options for message distribution:
  - SMS
  - Voice mail
  - Another JMS Topic
- Messages are sent on the basis of filtered PVs. (Filters can be combined: AND/OR, sequence)
- The recipients are users or user groups. User groups can be used in two ways:
  - Send to all users
  - Send to one user after another until a user confirms the message
Users and user groups, as well as filters and actions, are configured in the AMS configuration view.
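The two ideas on this slide, combining filters with AND/OR and the two ways of using a user group, can be sketched generically. None of this is AMS code: the message fields, the helper names, and the `send` callback (which returns True when a user confirms) are assumptions for illustration.

```python
def all_of(*filters):
    """AND combination: message passes only if every filter accepts it."""
    return lambda message: all(f(message) for f in filters)


def any_of(*filters):
    """OR combination: message passes if at least one filter accepts it."""
    return lambda message: any(f(message) for f in filters)


def notify_all(users, message, send):
    """Group mode 1: send to every user in the group."""
    for user in users:
        send(user, message)
    return list(users)


def notify_until_confirmed(users, message, send):
    """Group mode 2: send to one user after another until one confirms.

    Returns the users that were actually contacted.
    """
    contacted = []
    for user in users:
        contacted.append(user)
        if send(user, message):
            break
    return contacted


# Hypothetical filter: MAJOR severity AND (RF or Cryo PV)
is_major = lambda m: m["severity"] == "MAJOR"
is_rf = lambda m: m["pv"].startswith("RF:")
is_cryo = lambda m: m["pv"].startswith("Cryo:")
wanted = all_of(is_major, any_of(is_rf, is_cryo))

# Hypothetical transport stub: only "bob" confirms
send_stub = lambda user, message: user == "bob"

message = {"pv": "Cryo:ColdBox:Temp", "severity": "MAJOR"}
if wanted(message):
    print(notify_until_confirmed(["alice", "bob", "carol"], message, send_stub))
```

In the example the escalation stops at the second user, so the third one is never contacted.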
Slide info from Helge Rickens, DESY
AMS
[Screenshot callouts: Editor to configure a Filter. Different views to select User, User-Group, Filter condition, Filter and Alarm Topics]
Slide info from Helge Rickens, DESY
Summary
BEAST operational since Feb’09
– Needs a logo
– For now without BEAUtY
– DESY AMS is similar and has been operational for longer
Pick either, but good configuration requires work in any case
– Started with previous “annunciated” alarms: ~300, no guidance, no related displays. Now ~330, all with guidance, rel. displays
– “Philosophy” helps decide what gets added and how: Immediate Operator Action? Consequence? Response Time?
– Weekly review spots troubles and tries to improve configuration