A Large-Scale Industrial Case Study on Architecture-based Software Reliability Analysis
© ABB Group April 9, 2023 | Slide 1
A Large-Scale Industrial Case Study on Architecture-based Software Reliability Analysis
Heiko Koziolek, Bastian Schlich, Carlos Bilich, ABB Corporate Research, 2010-11-01
Architecture-based Software Reliability Analysis (ABSRA)
What?
Typical questions of software architects concerning reliability
“What is the reliability (probability of failure-free operation) of my system?”
“How do individual components contribute to the system reliability?”
“Which architectural alternative is best for reliability?”
“Where should I introduce fault-tolerance mechanisms?”
“How should I distribute my limited testing effort among components?”
Additional questions by ABB
“How much more reliable is a new architecture than a former one?”
“Does ABSRA work on large-scale systems?”
Architecture-based Software Reliability Analysis (ABSRA)
How?
[Figure: ABSRA workflow. Boxes: software components, control flow, and reliabilities (R = 0.995, R = 0.982, R = 0.937); Markov model; Markov model solution; predicted system reliability (R = 0.9923). Arrows: combine, transform, solve, improve.]
Related work
Existing empirical studies
“… very little effort has been devoted to the validation of architecture-based software reliability techniques.”
[Gokhale2007, IEEE Transactions on Dependable and Secure Computing, Vol. 4, No. 1]
Source                       Name    Year  Lang.  LOC         # Components
[Gokhale2004, Perf. Eval.]   SHARPE  1998  C      35,000      30
[Goseva2001, ISSRE]          ESA     2001  C      10,000      3
[Goseva2005, ISSRE]          GCC     2005  C      350,000     13
[Wang2005, JSS]              SMS     2006  C/C++  13,000      15
[Goseva2006, ISSRE]          IDN     2006  C      11,000      6
Our paper                    ABB     2010  C++    >3,000,000  8 (>100)
System under study: Process control system
System under study: Process control system
Topology
[Figure: system topology. Elements shown: Internet, firewall, remote workplaces, plant/office network, network isolation device, redundant network, workplaces, servers, controllers, fieldbus, remote I/O and field devices.]
System under study: Process control system
Subsystems within the servers
Which steps are required for ABSRA?
Estimate component failure probabilities
Estimate transition probabilities
Construct the Markov model
Exploit the results
Estimate component failure probabilities
Existing methods
Code metrics [Nagappan2006]
• Validity debated
Reliability growth modeling [IEEE Std 1633-2008]
• Requires component failure reports
Random/statistical testing [Miller1992]
• Does not scale, difficult to apply on components
Fault injection [Gokhale2004]
• Does not determine the current reliability
Explicit failure modeling [Cheung2008]
• Accuracy unknown
Reliability growth modeling
General principle
$$g(\lambda_i, \psi(i), \alpha) = \frac{[\psi(i)]^{\alpha}\, \lambda_i^{\alpha-1}\, \exp(-\psi(i)\,\lambda_i)}{\Gamma(\alpha)}, \quad \lambda_i > 0$$
Littlewood/Verrall model
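As a rough illustration of the general principle, the sketch below evaluates the gamma density that the Littlewood/Verrall model places on the failure rate λ_i of the i-th inter-failure interval, with shape α and rate ψ(i). The linear form ψ(i) = β0 + β1·i, the function names, and all parameter values are assumptions made for this example; the slides do not state which ψ was fitted or which values CASRE produced.

```python
import math


def lv_density(lam: float, alpha: float, psi_i: float) -> float:
    """Gamma density of the failure rate lam with shape alpha and rate psi_i."""
    if lam <= 0:
        return 0.0
    return (psi_i ** alpha) * lam ** (alpha - 1) * math.exp(-psi_i * lam) / math.gamma(alpha)


def psi_linear(i: int, beta0: float, beta1: float) -> float:
    """Assumed linear growth function psi(i) = beta0 + beta1 * i."""
    return beta0 + beta1 * i


def expected_failure_rate(i: int, alpha: float, beta0: float, beta1: float) -> float:
    """Mean of the gamma distribution: alpha / psi(i)."""
    return alpha / psi_linear(i, beta0, beta1)


# Hypothetical parameters, chosen only to show the trend.
ALPHA, BETA0, BETA1 = 2.0, 10.0, 5.0
for i in (1, 5, 10):
    print(i, expected_failure_rate(i, ALPHA, BETA0, BETA1))
```

With β1 > 0, ψ(i) grows with each fixed failure, so the expected failure rate α/ψ(i) falls; this is the reliability growth that curve fitting tries to capture.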
Reliability growth modeling
Using the Littlewood/Verrall model on one subsystem
Filtered subsystem bug list
Release dates
Curve fitting in CASRE 3.0: http://www.openchannelsoftware.com/projects/CASRE_3.0/
Reliability growth modeling
Result
R1= ...
R8= ...
R4= ...
R3= ...
R5= ...
R6= ...
R7= ...
R2= ...
Estimate component transition probabilities
Existing methods
Exploiting design documents [Gokhale2007]
• Only static dependencies in SW architecture
Profiling [Goseva2005]
• Complicated filtering of data required
Manual code instrumentation
• Can be time-consuming
Estimate component transition probabilities
Profiling with proprietary tools
Set up and ran the system
Example trace from profiling
Self-coded script
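The slides do not show the profiling trace format or the self-coded script, so the following is only a minimal sketch of the underlying idea: count how often control passes from one component to another and normalize the counts per source component. The trace representation (an ordered list of component names), the function name, and the example data are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def estimate_transition_probabilities(trace: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next = dst | current = src) from one ordered component trace."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    # Each consecutive pair in the trace is one observed control transfer.
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    probabilities: Dict[str, Dict[str, float]] = {}
    for src, successors in counts.items():
        total = sum(successors.values())
        probabilities[src] = {dst: n / total for dst, n in successors.items()}
    return probabilities


# Hypothetical trace: each entry names the component that received control.
example_trace = ["A", "B", "A", "C", "A", "B", "C"]
transition_probabilities = estimate_transition_probabilities(example_trace)
for src in sorted(transition_probabilities):
    print(src, transition_probabilities[src])
```

Real profiling data would first need the filtering mentioned under the existing methods (for example, dropping calls into libraries outside the modeled components) before counts like these are meaningful.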
Construct the Markov model
Existing state-based methods
[Littlewood1979]
[Cheung1980]
[Laprie1984]
[Kubat1989]
[Gokhale1998]
[Ledoux1999]
[Gokhale1998-2]
[Goseva-Popstojanova2001]
Cheung model
Adding failure & end states, compute reliability
[Cheung1980]
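A minimal sketch of the computation behind the Cheung model, under the usual assumptions: component 0 is the entry point, the last component is the exit, P[i][j] is the probability that control passes from component i to j, and R[i] is the probability that component i executes correctly. With Q[i][j] = R[i]·P[i][j] and S = (I − Q)^(-1), the system reliability is S[0][n−1]·R[n−1]. The function names and the three-component example are hypothetical and are not values from the ABB study.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def cheung_reliability(p: Matrix, r: List[float]) -> float:
    """System reliability S[0][n-1] * R[n-1], with S = (I - Q)^-1."""
    n = len(r)
    i_minus_q = [
        [(1.0 if i == j else 0.0) - r[i] * p[i][j] for j in range(n)]
        for i in range(n)
    ]
    # Solving (I - Q) x = e_last yields the last column of S; x[0] is S[0][n-1].
    e_last = [0.0] * (n - 1) + [1.0]
    last_column = solve_linear(i_minus_q, e_last)
    return last_column[0] * r[n - 1]


# Hypothetical example: component 0 calls 1 (60 %) or 2 (40 %); 1 always calls 2.
P = [[0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
R = [0.99, 0.98, 0.97]
print(cheung_reliability(P, R))
```

The failure state does not need to appear explicitly in the code: the probability mass missing from each row of Q is exactly the probability of moving to the failure state.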
Exploit the results
Possibilities
Estimate system reliability [Cheung1980]
• Hard to validate against the reliability experienced by customers
Conduct sensitivity analysis [Gokhale2002]
• Study system reliability for varying component failure rates
Assess costs of bugs [Cheung1980]
• Quantify the effect of an error in a component
Evaluate design alternatives [Goseva2001]
• Values for new components need to be guessed
Allocate test budgets efficiently [Pietrantuono2010]
• Test critical components more often
Sensitivity Analysis
Impact of varying subsystem failure rates
http://www.prismmodelchecker.org/
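The slide points to PRISM for this analysis; the sketch below does not use PRISM. It is a plain-Python illustration of the same idea: lower one component's reliability at a time and recompute the system reliability. Here the system reliability is obtained by fixed-point iteration on v[i] = R[i]·Σ_j P[i][j]·v[j], with the exit component fixed at v[n−1] = R[n−1]. The function names, the perturbation size, and the three-component example are hypothetical, and the example varies per-execution reliabilities rather than time-based failure rates.

```python
from typing import Dict, List

Matrix = List[List[float]]


def system_reliability(p: Matrix, r: List[float],
                       tol: float = 1e-13, max_iter: int = 100_000) -> float:
    """Probability of reaching correct termination from component 0.

    Assumes component 0 is the entry and the last component is the exit.
    """
    n = len(r)
    v = [0.0] * n
    v[n - 1] = r[n - 1]
    for _ in range(max_iter):
        new_v = [r[i] * sum(p[i][j] * v[j] for j in range(n)) for i in range(n - 1)]
        new_v.append(r[n - 1])
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v[0]
        v = new_v
    raise RuntimeError("fixed-point iteration did not converge")


def sensitivity(p: Matrix, r: List[float], delta: float) -> Dict[int, float]:
    """System reliability when each component's reliability drops by delta in turn."""
    results: Dict[int, float] = {}
    for i in range(len(r)):
        perturbed = list(r)
        perturbed[i] = max(0.0, perturbed[i] - delta)
        results[i] = system_reliability(p, perturbed)
    return results


# Hypothetical example: component 0 calls 1 (60 %) or 2 (40 %); 1 always calls 2.
P = [[0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
R = [0.99, 0.98, 0.97]
baseline = system_reliability(P, R)
impact = sensitivity(P, R, delta=0.01)
for component, value in impact.items():
    print(component, round(baseline - value, 7))
```

In this toy example the components that lie on every execution path (0 and 2) reduce the system reliability more than component 1, which is reached in only 60 % of runs; this kind of ranking is what guides where to add fault tolerance or testing effort.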
Evaluation
Cost estimations in person hours (best/worst case)
Conclusions
Lessons learned
Getting failure and transition probabilities is hard
• Time-consuming, error-prone, limited automation
• Main obstacle for ABSRA is data collection
Currently rather simple models
• Do not cover technologies, concurrency, or hardware
• Difficult to evaluate architecture alternatives
• Limited decision support from the predictions
Lack of empirical studies in literature
• Predominantly small systems
• Often dubious techniques for estimating failure rates
• Replicated case studies needed