DemoSystem2004
RTES Collaboration (NSF ITR grant ACI-0121658)
FALSE II Workshop, Real-Time and Embedded Technology & Applications Symposium (IEEE)
San Francisco, March 7-10, 2005
7 March 2005 FALSE II Workshop 2
Introduction
- Our context: BTeV, a 20 TeraHz real-time system
- Our solution: MIC, ARMORs, VLAs
- The demonstration system
- Demonstration
- Comments
BTeV High Energy Physics Experiment: A 20 TeraHz Real-Time System
- Input: 800 GB/s (2.5 MHz crossing rate)
- Level 1:
  - processing: ~190 µs per event, with crossings every 396 ns
  - 528 "8 GHz" G5 CPUs (factor of 50 event reduction)
  - high-performance interconnects
- Level 2/3:
  - Level 2 processing: 5 ms (factor of 10 event reduction)
  - Level 3 processing: 135 ms (factor of 2 event reduction)
  - 1536 "12 GHz" CPUs, commodity networking
- Output: 200 MB/s (4 kHz) = 1-2 Petabytes/year
The Problem
- Monitoring, fault tolerance, and fault mitigation are crucial
- In a cluster of this size, processes and daemons are constantly hanging/failing without warning or notice
- Software reliability depends on:
  - physics detector-machine performance
  - program testing procedures, implementation, and design quality
  - behavior of the electronics (front-end and within the trigger)
- Hardware failures will occur! (one to a few per week)
The Problem (continued)
"Given the very complex nature of this system, where thousands of events are simultaneously and asynchronously cooking, issues of data integrity, robustness, and monitoring are critically important and have the capacity to cripple a design if not dealt with at the outset… BTeV [needs to] supply the necessary level of 'self-awareness' in the trigger system." [June 2000 Project Review]
BTeV’s Response: RTES
The Real Time Embedded Systems Group
- A collaboration of five institutions:
  - University of Illinois
  - University of Pittsburgh
  - Syracuse University
  - Vanderbilt University (PI)
  - Fermilab
- Funded by NSF ITR grant ACI-0121658
- Physicists and computer scientists/electrical engineers at BTeV institutions, with expertise in:
  - high-performance, real-time system software and hardware
  - reliability and fault tolerance
  - system specification, generation, and modeling tools
- Working on fault management in large computing clusters
RTES Goals for BTeV
- High availability
- Fault handling infrastructure capable of:
  - accurately identifying problems (where, what, and why)
  - compensating for problems (shifting the load, changing thresholds)
  - automated recovery procedures (restart / reconfiguration)
  - accurate accounting
  - extensibility (capturing new detection/recovery procedures)
  - policy-driven monitoring and control
- Dynamic reconfiguration to adjust to potentially changing resources
RTES Goals for BTeV (continued)
- Faults must be detected/corrected as soon as possible
  - semi-autonomously, with as little human intervention as possible
  - distributed and hierarchical monitoring and control
- Life-cycle maintainability and evolvability
  - to deal with new algorithms, new hardware, and new versions of the OS
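The distributed, hierarchical monitoring and control called for above can be sketched as follows. Every class, method name, and severity level here (LocalMonitor, RegionalManager, GlobalManager, "low"/"medium"/"high") is a hypothetical illustration, not the actual RTES interface: faults are mitigated at the lowest level that can handle them and escalate otherwise.

```python
# Hypothetical sketch of local -> regional -> global fault escalation.

class GlobalManager:
    def handle(self, region, node, fault):
        # Last resort: schedule a reconfiguration involving the whole region.
        return f"global: reconfigure {region}/{node} after {fault}"

class RegionalManager:
    def __init__(self, region, global_mgr):
        self.region = region
        self.global_mgr = global_mgr

    def handle(self, node, fault, severity):
        if severity == "medium":
            # The region can compensate by shifting load off the node.
            return f"{self.region}: shift load away from {node} ({fault})"
        return self.global_mgr.handle(self.region, node, fault)

class LocalMonitor:
    def __init__(self, node, regional_mgr):
        self.node = node
        self.regional_mgr = regional_mgr

    def report(self, fault, severity):
        if severity == "low":
            # e.g. restart a hung process without human intervention
            return f"{self.node}: restart ({fault})"
        return self.regional_mgr.handle(self.node, fault, severity)
```

Escalation keeps human intervention to a minimum: only faults that neither the local nor the regional level can mitigate reach the global manager.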
The RTES Solution
[Architecture diagram spanning design time and runtime. Design-and-analysis side: models of algorithms, fault behavior, and resources are analyzed for performance, diagnosability, and reliability; synthesis generates the runtime configuration, with feedback from the running system and an experiment control interface driving modeling and reconfiguration. Runtime side: a Global Fault Manager and Global Operations Manager oversee Region Operations Managers and a Region Fault Manager; below them, Local Fault and Operations Managers run over ARMOR/Linux on the L2/3 CISC/RISC nodes and ARMOR/DSP on the L1 DSP nodes, each hosting the trigger algorithms; the nodes are tied together by logical data and control networks. Real-time constraints range from soft at the top of the hierarchy to hard at the bottom.]
RTES Deliverables
A hierarchical fault management system and toolkit:
- Model Integrated Computing
  - GME (Generic Modeling Environment) system modeling tools and application-specific "graphic languages" for modeling system configuration, messaging, fault behaviors, user interface, etc.
- ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability)
  - a robust framework for detecting and reacting to faults in processes
- VLAs (Very Lightweight Agents, for limited-resource environments)
  - to monitor/mitigate at every level: DSPs, supervisory nodes, Linux farm, etc.
Configuration through Modeling
- Multi-aspect tool, with separate views of:
  - hardware: components and physical connectivity
  - executables: configuration and logical connectivity
  - fault handling behavior, using hierarchical state machines
- Model interpreters can generate the system image
  - at the code-fragment level (for fault handling)
  - download scripts and configuration
- Modeling "languages" are application specific
  - shapes, properties, associations, constraints
  - appropriate to the application/context
  - system model, messaging, fault mitigation, GUI, etc.
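The idea of a model interpreter generating the system image can be sketched with a toy model. The dictionary schema and the function below are invented for illustration; real GME interpreters traverse far richer models:

```python
# Hypothetical sketch: a model interpreter walks a system model and emits
# a download/configuration script. Schema invented for illustration.

model = {
    "regions": [
        {"name": "region1",
         "workers": [{"host": "btrigw101", "app": "FilterApp"},
                     {"host": "btrigw102", "app": "FilterApp"}]},
    ],
}

def generate_config(model):
    """Emit one deployment line per worker, grouped by region."""
    lines = []
    for region in model["regions"]:
        lines.append(f"[{region['name']}]")
        for w in region["workers"]:
            lines.append(f"deploy {w['app']} on {w['host']}")
    return "\n".join(lines)
```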
Modeling Environment
[Screenshots of the GME modeling environment showing three views: hardware configuration, process dataflow, and fault handling.]
ARMOR
Adaptive, Reconfigurable, and Mobile Objects for Reliability:
- Multithreaded processes composed of replaceable building blocks
- Provide error detection and recovery services to user applications
- A hierarchy of ARMOR processes forms the runtime environment:
  - system management, error detection, and error recovery services are distributed across ARMOR processes
  - the ARMOR runtime environment is itself self-checking
- 3-tiered ARMOR support of user applications:
  - completely transparent and external support
  - enhancement of standard libraries
  - instrumentation with the ARMOR API
ARMOR Scalable Design
- ARMOR processes are designed to be reconfigurable:
  - internal architecture structured around event-driven modules called elements
  - elements provide the functionality of the runtime environment, error-detection capabilities, and recovery policies
  - deployed ARMOR processes contain only the elements necessary for the required error detection and recovery services
- ARMOR processes are resilient to errors, leveraging multiple detection and recovery mechanisms:
  - internal self-checking mechanisms to prevent failures and limit error propagation
  - state protected through checkpointing
  - detection of and recovery from errors
- The ARMOR runtime environment is fault-tolerant and scalable: 1-node, 2-node, and N-node configurations
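A minimal sketch of the element idea, using invented names rather than the real ARMOR interfaces: a microkernel dispatches events only to the elements configured into it, so a deployed process carries exactly the detection and recovery services it needs.

```python
# Hypothetical sketch of an event-driven microkernel with pluggable elements.

class Microkernel:
    def __init__(self):
        self.subscribers = {}   # event name -> list of handlers

    def add_element(self, element):
        # Configure in only the elements this deployment needs.
        for event, handler in element.handlers().items():
            self.subscribers.setdefault(event, []).append(handler)

    def dispatch(self, event, payload):
        # Deliver the event to every element that registered for it.
        return [h(payload) for h in self.subscribers.get(event, [])]

class CrashDetectionElement:
    def handlers(self):
        return {"proc_exit": lambda pid: f"restart {pid}"}

class ExecutionTimeElement:
    def handlers(self):
        # 135 ms is the Level 3 budget quoted earlier in the talk.
        return {"event_done": lambda ms: "slow" if ms > 135 else "ok"}

kernel = Microkernel()
kernel.add_element(CrashDetectionElement())
kernel.add_element(ExecutionTimeElement())
```

Swapping in a different set of elements changes the process's detection and recovery behavior without touching the microkernel.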
ARMOR System: Basic Configuration
- Fault Tolerant Manager (FTM): highest-ranking manager in the system
- Heartbeat ARMOR: detects and recovers from FTM failures
- Daemons: detect ARMOR crash and hang failures
- ARMOR processes: provide a hierarchy of error detection and recovery; ARMORs are protected through checkpointing and internal self-checking
- Execution ARMOR: oversees an application process (e.g., the various Trigger Supervisor/Monitors)
[Diagram: daemons host the FTM, the Heartbeat ARMOR, and an Execution ARMOR supervising an application process, all connected over the network.]
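Heartbeat-style detection, as between the Heartbeat ARMOR and the FTM, can be sketched as follows; the timeout value and the interface are illustrative, not the real ARMOR mechanism:

```python
# Hypothetical sketch of heartbeat-based failure detection: a monitored
# process is presumed failed once its last heartbeat is older than the
# timeout, triggering recovery.

class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = {}

    def beat(self, name, now):
        # Record the most recent heartbeat from a monitored process.
        self.last_beat[name] = now

    def failed(self, now):
        # Everything that has not beaten within `timeout` is presumed dead.
        return sorted(n for n, t in self.last_beat.items()
                      if now - t > self.timeout)

mon = HeartbeatMonitor(timeout=3.0)
mon.beat("FTM", now=0.0)
mon.beat("ExecARMOR", now=2.0)
```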
ARMOR: Internal Structure
Adaptive, Reconfigurable, and Mobile Objects for Reliability
[Diagram: on Node 1, a Local Manager ARMOR is built from elements (TCP connection mgmt., named-pipe mgmt., process mgmt., detection policy, recovery policy) around an ARMOR microkernel, alongside its daemon; on Node 2, an Execution Controller ARMOR with process mgmt. and network elements supervises the trigger application; the daemons on each node communicate with remote daemons (Node 3).]
Very Lightweight Agents
- Minimal footprint
- Platform independence: employable everywhere in the system!
- Monitors hardware and software
- Handles fault detection & communications with higher-level entities
[Diagram: a VLA sits between the OS kernel/hardware and the physics application on the Linux Level 2/3 farm nodes and L2/L3 manager nodes, reporting upward through the network API.]
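A minimal sketch of what a VLA's monitoring check might look like; the metric names and thresholds below are invented for illustration, not taken from the RTES VLAs:

```python
# Hypothetical sketch of a very lightweight agent check: poll a few cheap
# local metrics and report threshold violations to the higher-level entity.

THRESHOLDS = {"cpu_load": 0.95, "mem_used": 0.90, "disk_used": 0.98}

def vla_check(metrics, thresholds=THRESHOLDS):
    """Return messages for the higher-level manager, one per violation."""
    reports = []
    for name, limit in sorted(thresholds.items()):
        value = metrics.get(name)
        if value is not None and value > limit:
            reports.append(f"ALERT {name}={value:.2f} limit={limit:.2f}")
    return reports
```

Keeping the check this simple is the point: the same loop can run on a DSP, a supervisory node, or a Linux farm worker.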
BTeV Prototype Farms at Fermilab
[Photos: the 16-node Apple G5 farm and Infiniband switch forming the L1 trigger farm, and L2/3 trigger racks F1-F5 holding the L2/3 trigger workers.]
L2/3 Prototype Farm Setup
[Diagram: farm nodes connected by 100BT and 1000BT Ethernet.]
L2/3 Prototype Farm Components
- Using PCs from old PC farms at Fermilab
- 3 dual-CPU manager PCs:
  - boulder (1 GHz P3): meant for data server
  - iron (2.8 GHz P4): httpd, BB, and Ganglia gmetad
  - slate (500 MHz P3): httpd, BB
- Managers have a private network through a 1 Gbps link (bouldert, iront, slatet)
- 15 dual-CPU (500 MHz P3) workers (btrigw2xx)
- 84 dual-CPU (1 GHz P3) workers (btrigw1xx)
- No plans to add more, but they may be replaced with faster ones
- Ideal for RTES: 11 workers already have problems!
The Demonstration System Components
- Matlab as GUI engine; GUI defined by GME models
- Elvin publish/subscribe networking (everywhere); messages defined by GME models
- RunControl (RC) state machines, defined by GME models
- ARMORs, with custom elements defined by GME models
- FilterApp, DataSource
  - actual physics trigger code
  - a file reader supplies physics/simulation data to the FilterApp
- Demo "faults" are encoded onto Source-Worker data messages for execution on the Worker
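The publish/subscribe pattern can be sketched with a toy in-process bus. Note that Elvin itself matches subscriptions on message content, while this simplified stand-in matches on subject strings only:

```python
# Toy stand-in for a publish/subscribe router: components subscribe to
# subjects and receive every message later published to them.

class Bus:
    def __init__(self):
        self.subs = {}   # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self.subs.setdefault(subject, []).append(callback)

    def publish(self, subject, message):
        # Deliver to all subscribers; drop if nobody is listening.
        for cb in self.subs.get(subject, []):
            cb(message)

bus = Bus()
log = []
bus.subscribe("rc.state", log.append)       # e.g. a GUI watching RunControl
bus.publish("rc.state", "RUNNING")
bus.publish("fault.report", "hung filter")  # no subscriber -> dropped
```

Decoupling publishers from subscribers is what lets the GUI, the managers, and the workers all share one messaging fabric.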
The Demonstration System Architecture
[Diagram: on the public network, laptops running Matlab/Elvin and the Ganglia server on iron; on the private network, boulder runs the Elvin router and the Global RC/ARMOR, which supervises three Regional RC/ARMORs, each overseeing Workers running RC, VLA, ARMOR, and the FilterApp; a DataSource file reader feeds the workers.]
BB and Ganglia
For traditional monitoring: http://iron.fnal.gov/
Matlab
For RTES/Demo-specific monitoring
Comments
- This is an integrated approach, from hardware to physics applications
- Standardization of resource monitoring, management, error reporting, and integration of recovery procedures can make operating the system more efficient, and make the system possible to comprehend and extend
- There are real-time constraints that must be met: scheduling and deadlines; numerous detection and recovery actions
- The product of this research will:
  - automatically handle simple problems that occur frequently
  - be as smart as the detection/recovery modules plugged into it
Comments (continued)
The product can lead to better or increased:
- system uptime, by compensating for problems or predicting them instead of pausing or stopping the experiment
- resource utilization: the system will use the resources that it needs
- understanding of the operating characteristics of the software
- ability to debug and diagnose difficult problems
Further Information
- General information about RTES: http://www-btev.fnal.gov/public/hep/detector/rtes/
- General information about BTeV: http://www-btev.fnal.gov/
- Information about GME and the Vanderbilt ISIS group: http://www.isis.vanderbilt.edu/
Further Information (continued)
- Information about ARMOR technology: http://www.crhc.uiuc.edu/DEPEND/projects-ARMORs.htm
- Talks from our last workshop: http://false2002.vanderbilt.edu/program.php
- Wiki (internal, today): whcdf03.fnal.gov/BTeV-wiki/DemoSystem2004
- Elvin publish/subscribe networking: www.mantara.com
Demonstration Outline
- GME: SIML, DTML, custom elements, Matlab GUI, meta language
- Building and deployment
- Runtime: ARMORs, VLAs; fault detection and mitigation
- Reconfigure and redeploy: 2x, 4x
Demonstration Slides
MIC Based System Modeling
- Design a domain-specific modeling language
  - provision concepts for all the different aspects of the system
  - express their interactions
  - a "super" system-wide modeling language
- Implement a suite of translators
  - generate fault-mitigation behaviors
  - generate system build configurations
  - link with user code/libraries
SIML: System Integration Modeling Language
- Models component hierarchy and interactions
- Loosely specified model of computation
- Models information relevant for system configuration
System Architecture
[Model screenshot: RunControl, Manager, and Router components. The model captures the configuration information: how many regions, how many worker nodes inside each region, and node identification information.]
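A sketch of how such configuration information might be expanded into concrete node identities by a translator; the model schema and the naming convention below are invented for illustration:

```python
# Hypothetical sketch: expand region/worker counts from a SIML-style model
# into identifiers for every node in the system.

siml_model = {"regions": 2, "workers_per_region": 3}

def expand(model):
    """Produce (region, worker) identifiers for every node in the system."""
    return [(f"region{r}", f"worker{r}.{w}")
            for r in range(1, model["regions"] + 1)
            for w in range(1, model["workers_per_region"] + 1)]
```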
Data Type Modeling Language (DTML)
- Modeling of data types and structures
- Configure marshalling/demarshalling interfaces for communication
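What generated marshalling/demarshalling code does can be sketched with Python's struct module; the record layout and field names below are invented for illustration, not taken from DTML:

```python
# Hypothetical sketch of marshalling/demarshalling a fixed-layout record
# into bytes for transmission, as generated code from a type model would.
import struct

# Invented "EventRecord": 32-bit event id, 16-bit node id, float64 timestamp,
# little-endian with no padding.
FORMAT = "<IHd"

def marshal(event_id, node_id, timestamp):
    return struct.pack(FORMAT, event_id, node_id, timestamp)

def demarshal(buf):
    return struct.unpack(FORMAT, buf)
```

Generating both sides from one model keeps sender and receiver layouts consistent by construction.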
Fault Mitigation Modeling Language (1)
[Model screenshot with three panels: (A) configuration of the ARMOR infrastructure, (B) modeling of fault mitigation strategies, (C) specification of communication flow.]
Fault Mitigation Modeling Language (2)
- A model translator generates fault-tolerance strategies and a communication-flow strategy from FMML models
- Strategies are plugged into the ARMOR infrastructure as ARMOR elements
- The ARMOR infrastructure uses these custom elements to provide customized fault-tolerant protection to the application
[Diagram: the behavior aspect of the FMML model is a state machine of the form

  switch (cur_state) {
  case NOMINAL: if (time < 100) { next_state = FAULT; } break;
  case FAULT:   if (…) { next_state = NOMINAL; } break;
  }

which the translator turns into C++ callback and message-marshalling code, loaded onto the ARMOR microkernel as the fault-tolerant and communication custom elements.]
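The NOMINAL/FAULT state machine from the model's behavior aspect can be written as a runnable function. The NOMINAL-state guard (time < 100) is the one shown in the model; the FAULT-state guard is elided in the slide and is assumed here to be the complementary test:

```python
# Runnable sketch of the FMML behavior-aspect state machine. The FAULT
# guard (time >= 100) is an assumption, not taken from the model.

def next_state(cur_state, time):
    if cur_state == "NOMINAL":
        return "FAULT" if time < 100 else "NOMINAL"
    if cur_state == "FAULT":
        return "NOMINAL" if time >= 100 else "FAULT"
    raise ValueError(f"unknown state: {cur_state}")
```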
User Interface Modeling Language
- Enables reconfiguration of user interfaces
- Structural and data-flow code is generated from models
- The user interface is produced by running the generated code
[Example user interface model shown.]
User Interface Generation
[Screenshots: the generator transforms the user-interface model into the running Matlab GUI.]
GME meta language(s) –Meta Model
Versioning/Build System
- An equivalent XML representation of the GME models is stored in the CVS tree
- UDM tools provide forward and backward translation between MGA and XML
- Model transformers are developed with UDM
  - they can be built for Windows and Linux platforms
  - the model transformation code is also stored in CVS
[Diagram: the language specification and the models feed a translator, which emits source files and artifacts; the compiler/linker combines the source files and object artifacts into executables.]
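The XML representation is what makes the stored models portable across tools and platforms. A sketch of reading such a model back; the element and attribute names are invented, and real UDM-exported GME models are much richer:

```python
# Hypothetical sketch: parse an XML model (as checked into CVS) and pull
# out configuration facts a build-time transformer would need.
import xml.etree.ElementTree as ET

xml_model = "<region name='region1'><worker host='btrigw101'/></region>"

def worker_hosts(xml_text):
    """Return the host of every worker element in the model."""
    root = ET.fromstring(xml_text)
    return [w.get("host") for w in root.iter("worker")]
```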
Versioning/Build System (continued)
Multi-stage build:
1. Compile the model transformers
2. Makefiles invoke the model transformers on the stored models and generate code artifacts (behavior/config code)
3. Generated code is placed in the existing source tree
4. The source tree is compiled and linked to produce binaries
5. Additional tools traverse the run tree and distribute the components to all the nodes
[Diagram: metamodels (the language specification) and canonical XML domain models feed the UDM translators in the build tree; the resulting domain artifacts and source files are compiled and linked into the system executables in the run tree.]
ARMOR Configuration in RTES
[Diagram: the FTM, Global Manager, and Heartbeat run on the heartbeat/source node, alongside the Event Source and its Exec ARMOR; Regional Managers 1 and 2 supervise workers (Worker 1.1, 1.2, 2.1), on each of which an Exec ARMOR oversees Filter 1, Filter 2, and the Event Builder; the Elvin router connects the GUI and the nodes, with both Elvin and ARMOR messages flowing between them.]
Execution ARMOR in Worker
In each worker, the Exec ARMOR supervises Filter 1, Filter 2, and the Event Builder.
- ARMOR microkernel
- Infrastructure elements:
  - msg table, named pipe, msg routing
  - process mgmt, app id mgmt
  - crash detection
- Custom elements:
  - Elvin/ARMOR msg converter
  - hang detection
  - node status report
  - filter crash report
  - bad data report
  - execution time report
  - memory leak report
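One of the custom elements above, the execution time report, can be sketched as follows. The 135 ms budget is the Level 3 processing time quoted earlier in the talk; the function interface itself is an illustrative assumption:

```python
# Hypothetical sketch of an execution-time-report custom element: flag
# events whose processing time exceeded the budget (135 ms, the Level 3
# figure from the BTeV numbers; the interface is invented).

BUDGET_MS = 135

def execution_time_report(samples_ms):
    """Return one report line per event that blew the processing budget."""
    return [f"event {i}: {t} ms over {BUDGET_MS} ms budget"
            for i, t in enumerate(samples_ms) if t > BUDGET_MS]
```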