Software Fault Tolerance (SWFT)
How to Design, Develop and Evaluate Robust SW and OS’s
Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de
Prof. Neeraj Suri
Abdelmajid Khelil (Majid)
Constantin Sârbu (Dinu)
Brahim Ayari
Dept. of Computer Science
TU Darmstadt, Germany
Outline of today’s lecture
• Course info
• Course goals
• Research related to course
© DEEDS Group, SWFT WS ‘07
Related Courses
• Lectures
  • SW/OS Fault-Tolerance
  • Kanonik: Introduction to Trusted Systems
• Seminars
  • Embedded Mobile Computing
  • Secure and Reliable OS
• Labs
  • Selected Topics in Dependable SW & Mobile Computing
Course Info
• Lecture (in English): Wed. (11:40am - 1:20pm), C120
• Exercises (in English and German): Thu., 3. DS (10:45-13:20), C110
  • Starts 25th 2007
• Course webpage: http://www.deeds.informatik.tu-darmstadt.de
Grading Related
• Credit points: 7.5 - SWS: 5 (2+3)
• Exam
  • Mid-term exam: 25% (E or G), December 17th, 2007
  • Presentations: 24% (E or G)
  • Final exam: 51% (E or G)
• Exercises: (E or G)
  • Practice + presentations
• Lab stuff: Optional
  • Do some live programming
  • Gain some practical experience
  • Please take this opportunity!
    • May improve your grade (bonus points)
    • If you have a suggestion for a lab, discuss it with us!
Learn more...
• We have a selection of sub-projects related to this lecture
  • Will be targeted to the interests of students
  • For examples, see the slides on Research@DEEDS
• We offer
  • Bachelor/Master's theses
  • HiWi (student assistant) positions
  • Fun
Course Goals
• Learn software fault tolerance concepts
• Learn how to develop robust programs
  • How to deal with software bugs
  • Software fault tolerance: continuation of service in the face of failures
• Learn concepts and mechanisms to build software fault tolerance tools
• Learn how to evaluate and test robust SW/OS
• Learn some SW issues related to (a) mobile SW and (b) security
Course Outline
1. Introduction/Concepts of SWFT
2. SW-FT Mechanisms: Design Aspects
  • Process pairs, selective retries, graceful degradation, …
  • Checkpointing, N-copy programming (NCP), N-version programming (NVP), micro-reboots, …
  • Robust programming, …
3. Evaluation of fault-tolerant SW & OS’s
  • SW reliability
  • SW/OS stress testing
  • Hardening of OS’s, patching
  • OS driver profiling and testing
4. Transactional/Mobile SW
  • Mobile transactions (FT, recovery, ..)
  • Wireless sensor networks (energy-efficient FT, spatial/temporal redundancy, ..)
5. SW and Security: Buffer overflows etc.
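To make one of the mechanisms from item 2 concrete, here is a minimal sketch of N-version programming (NVP) with a majority voter. It is an illustration only and not taken from the course material: the names `n_version_execute`, `NoMajorityError`, `square_a`, `square_b` and `square_faulty` are hypothetical, and the three "versions" are toy functions standing in for independently developed implementations of the same specification.

```python
from collections import Counter
from typing import Callable, Sequence


class NoMajorityError(Exception):
    """Raised when the versions do not reach a majority agreement."""


def n_version_execute(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and return the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A crashing version simply contributes no vote.
            continue
    if not results:
        raise NoMajorityError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    # Require a strict majority of ALL versions, not just of the survivors.
    if votes * 2 <= len(versions):
        raise NoMajorityError(f"no majority among {results}")
    return value


# Three hypothetical, independently written versions of "square a number".
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_faulty(x: int) -> int:
    # Deliberately injected bug: wrong sign for negative inputs.
    return x * abs(x)
```

Calling `n_version_execute([square_a, square_b, square_faulty], -3)` masks the faulty version because two of the three versions agree. The voter fails explicitly, rather than guessing, when no strict majority exists.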
Literature
• Most lectures will be based on research papers
  • URLs of papers available via class page
  • References on slides (available on web)
• Coverage for exams is primarily (a) the lecture content and (b) issues covered in the exercises, so attending is important
Research@DEEDS Related to Course
DEEDS: Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de
Dependable Embedded Systems & Software (DEEDS)
The Spread: Dependability, Safety, Security
• X-by-Wire: Safety-Service Critical Systems (Aerospace/Automotive)
  • System Architecture Design
  • Protocols (Synchronous, Membership, Diagnosis, Recovery, Scheduling)
  • Dependability Evaluation
  • Verification & Validation (V&V)
    • Experimental fault injection (PROPANE)
    • Formal Methods
• Distributed Systems: Byzantine Consensus, Failure Detectors, Verification
• Mobile/WSN Networks: Fault-tolerant protocols, routing, reliability analysis
• CPU Architectures: Energy-efficient FT, Transient Resilience, …
• Operating Systems: OS Robustness Evaluation, Driver Testing & Evaluation, Vulnerability Profiling, Embedded/Desktop OS, …
Automotive/Aerospace (Federated Systems)
[Figure: federated architecture. Layers: Applications, Middleware, Resources (distributed resources: nodes + comm.). Functions shown: Comm., Diagnostics, Steering, Env. Ctrl., Braking, Engine/Flight Control, Navigation, User I/O, Multimedia, Body Elec.]
Multiple nodes, varied criticality; buses, clusters, bridges (HW, SW), …
Managing Complexity: Federated to Integrated
[Figure: integrated architecture. High-level services (app1, app2, …, appn) on top of a re-usable core (technology + domain invariant) made of core services and MW + Arch, running on shared, distributed/networked platforms.]
• Applications will change
  • Multi-Domain Solutions: Automotive, Aerospace, Control
• Compositional Framework (+Tools)
  • Integrates diverse criticality apps
  • Delineation over integration for functionality and safety
  • Flexible building blocks & interfaces
• Technologies will change
• Benefits
  • Design flexibility, short time-to-market
  • Reduced number of nodes
  • Reduced complexity and cost
P1: X-by-Wire Protocols
• On-Line Diagnosis
  • Enhance sustained autonomic system operations
  • Self-healing
    • On-line recovery (transient faults)
  • Self-diagnosing
    • Maintenance actions (permanent faults)
• Challenges
  • Avoid overreaction to transient faults
    • The cure can be worse than the disease!
  • Support mixed-criticality applications
    • From X-by-Wire to Comfort applications
  • Portability for time-triggered (TT) platforms
    • Add-on, middleware approach
Contact: Marco Serafini ([email protected])
P2: Mobile Database Systems
• Mobile transactions
• Commit protocols
• Challenges: Frequent perturbations
  • Heterogeneity
    • Wireless links (WLAN, UMTS, …)
    • Mobile nodes (laptops, PDAs, …)
  • Failures
    • Unpredictable disconnections
    • Node/Communication failures
• Infrastructure-based vs. ad-hoc
  • Mobile Ad-hoc NETworks (MANETs)
  • Wireless Sensor Networks (WSNs)
[Figure labels: Wired Network, WLAN, UMTS, GPRS, WAVE]
Contacts:
Brahim Ayari ([email protected])
Abdelmajid Khelil ([email protected])
P3: Dependable Ad-hoc Sensor Networks
• Applications
  • Car2Car communication
    • Cooperative driving
    • Announcements
  • Tracking & monitoring
  • Measurement
  • Disaster rescue
• Research challenges
  • Energy (efficiency, maintenance, ..)
  • Frequent failures (detection, diagnosis, ..)
  • Safety-critical applications
  • Reliable communication
[Figure labels: WAVE, WLAN, ZigBee]
Contacts:
Faisal Karim ([email protected])
Abdelmajid Khelil ([email protected])
P4: Energy Efficient Dependable Systems
• Trends
  • Heterogeneous systems
  • Increased dependence upon technology
  • Mobility: low voltage, smaller noise margins, more transient errors
  • Increased complexity
  • Integration/communication between systems
• Energy efficient fault tolerance
  • Evaluate
  • Characterize
  • Optimize/trade-off
• Dimensions
  • Design-time vs. run-time
  • Time vs. space
  • System level vs. component level
  • Service degradations and reconfiguration
Contact: Neeraj Suri ([email protected])
P5: Robustness Evaluation of Embedded OS/SW
Problem:
• SW systems are vulnerable to errors in Commercial-Off-The-Shelf (COTS) components.
• Characterization of the impact of 3rd party SW is hard.
Approach:
• Focus on device drivers
• Error propagation analysis using fault injection
• Robustness enhancing wrappers
Applications:
• Verification
• COTS integration (acceptance)
• Robustness enhancement
[Figure: bar chart, 0% to 100%, showing the share of outcomes (No failure, Class 1, Class 2, Class 3) for the drivers cerfio_serial, 91C111 and atadisk, each with bars labeled BF, DT, FZ and CM]
Contact: Andréas Johansson ([email protected])
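The approach on this slide (fault injection at the driver interface plus robustness-enhancing wrappers) can be sketched in a few lines. This is a toy model under stated assumptions, not the group's tooling: `driver_read` is a hypothetical stand-in for a driver service over a 64-byte buffer, `flip_bit` models a single bit-flip in a parameter, and `robust_read` is an illustrative wrapper that checks parameters before they reach the driver.

```python
import random
from typing import Callable


def flip_bit(value: int, bit: int) -> int:
    """Bit-flip error model: invert one bit of an integer parameter."""
    return value ^ (1 << bit)


def driver_read(offset: int, length: int) -> bytes:
    """Hypothetical stand-in for a driver service over a 64-byte device buffer."""
    buffer = bytes(range(64))
    if offset < 0 or length < 0 or offset + length > len(buffer):
        # Models a failure that would propagate out of the driver.
        raise MemoryError("access outside device buffer")
    return buffer[offset:offset + length]


def robust_read(offset: int, length: int) -> bytes:
    """Wrapper: reject out-of-range parameters before they reach the driver."""
    if not (0 <= offset <= 64 and 0 <= length <= 64 - offset):
        return b""  # contained error: defined empty result instead of a crash
    return driver_read(offset, length)


def count_failures(service: Callable[[int, int], bytes], trials: int, seed: int) -> int:
    """Inject one bit flip into `offset` per trial and count uncaught failures."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        offset = flip_bit(rng.randrange(32), rng.randrange(16))
        try:
            service(offset, 16)
        except MemoryError:
            failures += 1
    return failures
```

Running `count_failures` once on `driver_read` and once on `robust_read` with the same seed compares how many injected errors escape with and without the wrapper. In this toy setup the wrapper contains all of them, at the cost of returning an empty result that the caller must handle.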
P6: Improved Testing of Device Drivers
Problem:
• Faulty COTS drivers used in modern OSs have a significant impact on system reliability.
• They are hard to test because they execute in kernel space and are delivered without source code.
Applications
• Black-box testing for COTS drivers
• Profiling, debugging
• System activity monitoring
Research aims
• Profile driver behavior at runtime
• Expedite driver testing by focusing on runtime activity
• Test methods tuned to OS/driver operational profiles
[Figure labels: Application 1 … Application p, System Services, OS kernel, Driver, Monitor, Hardware Layer]
Contact: Constantin Sârbu ([email protected])