An Experimental Evaluation on Reliability Features of N-Version Programming
Authors
Xia Cai, Michael R. Lyu and Mladen A. Vouk
International Symposium on Software Reliability Engineering 2005 (ISSRE’05)
Presented by Onur TÜRKYILMAZ
Outline
Introduction
Motivation
Experimental evaluation
• Fault analysis
• Failure probability
• Fault density
• Reliability improvement
Discussions
Conclusion
Introduction
N-version programming is one of the main techniques for software fault tolerance
It has been adopted in some mission-critical applications
Yet, its effectiveness is still an open question
• What is the reliability enhancement of NVP?
• Is the fault correlation between multiple versions a big issue that affects the final reliability?
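To make the basic NVP mechanism concrete, here is a minimal majority-vote sketch in Python. This is a hypothetical illustration only; the versions in the experiment are real RSDIMU programs, and `nvp_vote` is not from the study:

```python
from collections import Counter

def nvp_vote(outputs):
    """Majority vote over the outputs of N independently developed versions.

    Returns the most common output if it reaches a strict majority,
    otherwise None (no consensus: the fault-tolerant system fails)."""
    winner, count = Counter(outputs).most_common(1)[0]
    return winner if count > len(outputs) / 2 else None

# Three versions, one faulty: the vote masks the single failure.
print(nvp_vote([42, 42, 41]))  # -> 42
print(nvp_vote([42, 41, 40]))  # -> None (no majority)
```

Correlated faults undermine exactly this mechanism: if two versions fail identically on the same input, the wrong answer can win the vote.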
Research questions
What is the reliability improvement of NVP?
Is fault correlation a big issue that will affect the final reliability?
What kind of empirical data can be comparable with previous investigations?
Motivation
To address the reliability and fault correlation issues in NVP
To conduct a comparable experiment with previous empirical studies
To investigate the “variant” and “invariant” features in NVP
Experimental background
Some features of the experiment:
• Complexity
• Large population
• Well-defined
• Statistical failure and fault records
Previous empirical studies:
• UCLA Six-Language project
• NASA 4-University project
• Knight and Leveson's experiment
• Lyu-He study
Experimental setup
RSDIMU avionics application
34 program versions
A team of 4 students
Comprehensive testing exercised:
• Acceptance testing: 800 functional test cases and 400 random test cases
• Operational testing: 100,000 random test cases
Failures and faults collected and studied
Qualitative as well as quantitative comparisons with NASA 4-University project performed
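The operational-testing setup can be pictured as a harness that runs random test cases through each version and records mismatches against an oracle. This is a hypothetical Python sketch; `operational_test`, the toy oracle, and the fault in version "B" are all invented for the example and do not reflect the actual RSDIMU test generator:

```python
import random

def operational_test(versions, oracle, n_cases=1000, seed=0):
    """Run random test cases against each version; record the ids of the
    cases on which a version's output differs from the oracle's."""
    rng = random.Random(seed)
    failures = {name: [] for name in versions}
    for case_id in range(n_cases):
        x = rng.uniform(-1.0, 1.0)      # stand-in for one random sensor input
        expected = oracle(x)
        for name, program in versions.items():
            if program(x) != expected:
                failures[name].append(case_id)
    return failures

# Toy versions: "B" carries a fault that only a small input subdomain triggers.
oracle = lambda x: round(x * x, 6)
versions = {"A": oracle,
            "B": lambda x: round(x * x, 6) if x < 0.9 else 0.0}
fails = operational_test(versions, oracle)
```

Recording which test cases each version fails, rather than just failure counts, is what makes the coincident-failure analysis later in the talk possible.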
Experimental description
Geometry:
• Estimating the vehicle acceleration using eight redundant accelerometers (sensors)
• Sensors mounted on the four triangular faces of a semioctahedron
Comparisons between the two projects
Qualitative comparisons
• General features
• Fault analysis in development phase & operational test
Quantitative comparisons
• Failure probability
• Fault density
• Reliability improvement
General features comparison
Faults in development phase
Distribution of related faults
Fault analysis in development phase
Common related faults:
• Display module (easiest part)
• Calculation in wrong frame of reference
• Initialization problems
• Missing certain scaling computations
Faults in NASA project only:
• Division by zero
• Incorrect conversion factor
• Wrong coordinate system
Fault analysis in development phase (cont'd)
Both cause and effect of some related faults remain the same
Related faults occurred in both easy and difficult subdomains
Some common problems, e.g., initialization problem, exist for different programming languages
The most fault-prone module is the easiest part of the application
Faults in operational test
Input/Output domain classification
Normal operations are classified as:
Si,j = { i sensors previously failed and j of the remaining sensors fail | i = 0, 1, 2; j = 0, 1 }
Exceptional operations: Sothers
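A small Python sketch of this classification; the function name and the integer encoding of sensor states are assumptions made for illustration:

```python
def classify_case(previously_failed, newly_failed):
    """Map a test case to its operational subdomain Si,j:
    i sensors previously failed, j of the remaining sensors fail now.
    Cases outside i in {0,1,2}, j in {0,1} fall into Sothers."""
    i, j = previously_failed, newly_failed
    if i in (0, 1, 2) and j in (0, 1):
        return f"S{i},{j}"
    return "Sothers"

print(classify_case(1, 0))  # -> S1,0
print(classify_case(2, 2))  # -> Sothers
```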
Failures in operational test
States S0,0, S1,0 and S2,0 are more reliable than states S0,1, S1,1, S2,1
Exceptional state reveals most of the failures
The failure probability in S0,1 is the highest
The programs exhibit high reliability on average
Coincident failures
Two or more versions fail on the same test case, whether or not their outputs are identical
The percentage of coincident failures versus total failures is low:
• Version 22: 25/618 = 4%
• Version 29: 32/2760 = 1.2%
• Version 32: (25+32)/1351 = 4.2%
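Counting coincident failures from per-version failure records can be sketched as follows. This is hypothetical Python; the toy data is invented and does not reproduce the study's numbers:

```python
from collections import Counter

def coincident_failure_stats(failures):
    """failures: dict mapping version name -> set of test-case ids it failed.
    A case is coincident if two or more versions fail on it.
    Returns, per version: (coincident failures, total failures, percentage)."""
    hit_count = Counter(case for cases in failures.values() for case in cases)
    coincident_cases = {c for c, n in hit_count.items() if n >= 2}
    stats = {}
    for name, cases in failures.items():
        co = len(cases & coincident_cases)
        pct = 100.0 * co / len(cases) if cases else 0.0
        stats[name] = (co, len(cases), pct)
    return stats

# Toy data: only case 5 is failed by both versions.
failures = {"v22": {1, 5, 9}, "v29": {5, 7}}
stats = coincident_failure_stats(failures)
```

Note the definition deliberately ignores whether the failing outputs agree; identical wrong outputs are the dangerous subcase for a voter.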
Failure bounds for 2-version system
Lower and upper bounds for coincident failure probability under Popov et al model
Test profiles:
• DP1: normal test cases without sensor failures dominate the test profile
• DP2: between DP1 and DP3
• DP3: test cases evenly distributed across all subdomains

Version pair               DP1 lower    DP1 upper    DP2 lower    DP2 upper    DP3 lower    DP3 upper
(22,34)                    0.000007     0.000130     0.000342     0.006721     0.000353     0.008396
(29,34)                    0.000000     0.000001     0.000009     0.000131     0.000047     0.000654
Average in our project     1.25×10^-8   2.34×10^-7   6.26×10^-7   0.000012     7.13×10^-7   0.000016
Average in NASA project    2.32×10^-7   0.000007     0.000023     0.000103     0.000072     0.000276
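As a simplified stand-in for the Popov et al. model (not its exact formulas), per-subdomain Fréchet bounds already illustrate the idea: given each version's failure probability within each subdomain, one can bound the coincident failure probability without assuming independence. The function and profile below are invented for illustration:

```python
def coincident_bounds(subdomains):
    """subdomains: list of (p_d, q1_d, q2_d) triples: the probability of the
    subdomain under the test profile, and each version's failure probability
    within it. Returns Frechet-style (lower, upper) bounds on the probability
    that both versions fail on the same demand."""
    lower = sum(p * max(0.0, q1 + q2 - 1.0) for p, q1, q2 in subdomains)
    upper = sum(p * min(q1, q2) for p, q1, q2 in subdomains)
    return lower, upper

# Toy profile: one common subdomain with rare failures, one rare hard subdomain.
profile = [(0.99, 0.001, 0.002), (0.01, 0.3, 0.5)]
lo, hi = coincident_bounds(profile)
```

The spread between lower and upper bound is what the choice of test profile (DP1 vs DP3) moves around in the table above: weighting rare, hard subdomains more heavily widens the coincident-failure exposure.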
Quantitative comparison in operational test
NASA 4-University project: 7 out of 20 versions passed the operational testing
Coincident failures were found among 2 to 8 versions
5 out of 7 faults were not observed in our project
Invariants
Reliable program versions with low failure probability
Similar number of faults and fault density
Distinguishable reliability improvement for NVP, with 10^2 to 10^4 times enhancement
Related faults observed in both difficult and easy parts of the application
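The 10^2 to 10^4 enhancement can be put in context with the textbook reliability model for majority voting, which assumes independent version failures (an assumption the fault-correlation results in this study qualify). This sketch is a generic illustration, not the paper's estimation method:

```python
from math import comb

def majority_failure_prob(q, n):
    """Failure probability of an n-version majority-voting system when each
    version fails independently with probability q: the system fails when
    more than half the versions fail."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(k_min, n + 1))

q = 0.001                        # assumed single-version failure probability
p3 = majority_failure_prob(q, 3)
improvement = q / p3             # roughly 1/(3q) for small q, here a few 10^2
```

Correlated (coincident) failures break the independence assumption, which is why p3 under-estimates the real system failure probability when related faults exist.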
Variants
Compared with the NASA project, our project shows:
• Some faults not observed
• Fewer failures
• Fewer coincident failures
• Only 2-version coincident failures (versus 2- to 8-version coincident failures in the NASA project)
• An overall reliability improvement that is an order of magnitude larger
Discussions
The improvement in this project may be attributed to:
• stable specification
• better programming training
• experience in NVP experiment
• cleaner development protocol
• different programming languages & platforms
Discussions (cont'd)
The hard-to-detect faults are triggered only by rare input subdomains
New testing strategies are needed to detect such faults:
• Code coverage?
• Domain analysis?
Conclusion
An empirical investigation was performed to evaluate reliability features through a comprehensive comparison of two NVP projects
NVP can provide a distinguishable improvement in final reliability, according to the empirical study conducted
The small number of coincident failures provides supportive evidence for NVP
Possible attributes that may affect the reliability improvement are discussed
Thank you !
Q & A