Failure Mode Assumptions and Assumption Coverage David Powell.
![Page 1: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/1.jpg)
Failure Mode Assumptions and Assumption Coverage
David Powell
![Page 2: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/2.jpg)
Fault-Tolerance
Key questions
– How may components fail? → Prevention strategies
– At what rate may they fail? → The amount of redundancy needed
– What are the important types of faults? → Types of redundancy needed
– What is the relation between dependability, redundancy and faults? → General FT design guidelines
![Page 3: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/3.jpg)
An F-T Paradox/Dilemma
More faulty
More redundancy
More possibility of faults
???
![Page 4: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/4.jpg)
Solution: Some Key Steps
Classify, quantify and verify the assumptions
![Page 5: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/5.jpg)
Types of Failures
![Page 6: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/6.jpg)
Overview
Single-user service
– Service model
– Potential errors
Multiple-user service
– Service model
– Potential errors
![Page 7: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/7.jpg)
Single-user Service Model
Service items: $s_i$, $i = 1, 2, \ldots$
Values of $s_i$: $vs_i$
Observation time of $s_i$: $ts_i$
Service model: $s_i = \langle vs_i, ts_i \rangle$
An omniscient observer
![Page 8: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/8.jpg)
Correctness Model
Service item $s_i$ is correct iff
$(vs_i \in SV_i) \land (ts_i \in ST_i)$
$SV_i$ and $ST_i$ are respectively the specified sets of values and times for service item $s_i$
![Page 9: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/9.jpg)
Potential Errors
Arbitrary value error: $s_i$ : $vs_i \notin SV_i$
Noncode error: $s_i$ : $vs_i \notin CV$ (CV defines a code)
Arbitrary timing error: $s_i$ : $ts_i \notin ST_i$
Early timing error: $s_i$ : $ts_i < \min(ST_i)$
Late timing error: $s_i$ : $ts_i > \max(ST_i)$
Omission error: $s_i$ : $ts_i = \infty$
Impromptu error: $s_i$ : $(vs_i = \ldots) \land (ts_i = \ldots)$
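The error definitions on this slide can be turned into a small classifier. The following is only a sketch under stated assumptions: `ServiceItem` and `classify` are hypothetical names, the specified time set $ST_i$ is modelled as a closed interval `[st_min, st_max]`, an omitted item is represented by an observation time of infinity, and value errors are only reported for items that were actually delivered. Impromptu errors are left out.

```python
import math
from dataclasses import dataclass
from typing import Hashable


@dataclass(frozen=True)
class ServiceItem:
    value: Hashable  # vs_i: the observed value
    time: float      # ts_i: observation time; math.inf means never delivered


def classify(item, sv, st_min, st_max, cv):
    """Return the set of error labels that apply to one service item.

    sv             -- set of specified values (SV_i)
    st_min, st_max -- bounds of the specified time set (ST_i as an interval,
                      which is an assumption of this sketch)
    cv             -- set of valid code words (CV)
    """
    errors = set()
    omitted = item.time == math.inf

    # Timing domain: any time outside ST_i is an arbitrary timing error,
    # refined into early or late.
    if not (st_min <= item.time <= st_max):
        errors.add("arbitrary timing")
        if item.time < st_min:
            errors.add("early timing")
        else:
            errors.add("late timing")
    if omitted:
        errors.add("omission")

    # Value domain: only checked when something was delivered (assumption).
    if not omitted and item.value not in sv:
        errors.add("arbitrary value")
        if item.value not in cv:
            errors.add("noncode value")
    return errors


if __name__ == "__main__":
    sv, cv = {5}, {5, 6}
    print(classify(ServiceItem(5, 1.0), sv, 0.0, 2.0, cv))
    print(classify(ServiceItem(7, 3.0), sv, 0.0, 2.0, cv))
    print(classify(ServiceItem(None, math.inf), sv, 0.0, 2.0, cv))
```

With this encoding an omission is automatically also a late timing error, because infinity exceeds any finite upper bound of the specified times.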
![Page 10: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/10.jpg)
Multi-user Service Model
Service item $s_i = \{s_i(1), s_i(2), \ldots, s_i(n)\}$
Service model: $\langle vs_i(u), ts_i(u) \rangle$, for all $i$, $u$
New issue: "consistency"
![Page 11: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/11.jpg)
Correctness Model
– $vs_i(u)$: the value of service item $i$ on process $u$
– $vs_i$: the value of service item $i$
– $SV_i$: the set of specified values of service item $i$
– $ts_i(u)$: the observation time of service item $i$ on process $u$
– $ST_i(u)$: the range of specified observation times of service item $i$ on process $u$
– $\ldots_{uv}$: the time bound of related occurrences
![Page 12: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/12.jpg)
Examples of Potential Errors
Consistent value error
Consistent timing error
Semi-consistent value error
![Page 13: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/13.jpg)
Failure Mode Assumptions
An attempt to formalize the concept of an assumed failure mode by assertions on the sequences of service items delivered by a component
![Page 14: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/14.jpg)
Examples of Value Error Assertions
No value errors occur ($V_{none}$):
$\forall i,\ vs_i \in SV_i$
The only value errors that occur are noncode value errors ($V_n$):
$\forall i,\ (vs_i \in SV_i) \lor (vs_i \notin CV)$
Arbitrary value errors can occur ($V_{arb}$):
$\forall i,\ (vs_i \in SV_i) \lor (vs_i \notin SV_i)$
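Read as predicates over a whole sequence of delivered values, the three value assertions can be checked mechanically. This is a minimal sketch, assuming for simplicity that every service item shares the same specified value set `sv` (the slides allow a different $SV_i$ per item); the function names are hypothetical.

```python
def v_none(values, sv):
    """V_none: every delivered value is a specified value."""
    return all(v in sv for v in values)


def v_n(values, sv, cv):
    """V_n: any value that is not specified must at least be outside the code."""
    return all(v in sv or v not in cv for v in values)


def v_arb(values, sv):
    """V_arb: a tautology, so every sequence satisfies it."""
    return all(v in sv or v not in sv for v in values)


if __name__ == "__main__":
    sv, cv = {1, 2}, {1, 2, 3}
    for trace in ([1, 2], [1, 2, "#"], [1, 3]):
        print(trace, v_none(trace, sv), v_n(trace, sv, cv), v_arb(trace, sv))
```

Any sequence that satisfies `v_none` also satisfies `v_n`, and any sequence that satisfies `v_n` also satisfies `v_arb`, which is the kind of ordering an implication graph over assertions captures.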
![Page 15: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/15.jpg)
Examples of Timing Error Assertions
No timing error occurs ($T_{none}$)
The only timing errors are omission errors ($T_O$)
The only timing errors are late timing errors ($T_L$)
The only timing errors are early timing errors ($T_E$)
Arbitrary timing errors can occur ($T_{arb}$)
Permanent omission/crash ($T_p$)
Bounded omission degree ($T_{Bk}$)
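The last two assertions constrain the pattern of omissions over a sequence rather than a single item. The sketch below shows one way to check such patterns. It rests on assumptions the slide does not spell out: an omission is encoded as an observation time of infinity, "permanent omission" is read as "once an item is omitted, every later item is omitted too", and "bounded omission degree" is read as "at most `k` consecutive omissions". The functions look only at the omission pattern and do not check that delivered items are on time.

```python
import math


def is_omission(ts):
    """An omitted item is modelled as observed at time infinity (assumption)."""
    return ts == math.inf


def permanent_omission(times):
    """True if no item is delivered after the first omission (crash-like)."""
    crashed = False
    for ts in times:
        if is_omission(ts):
            crashed = True
        elif crashed:
            return False
    return True


def max_consecutive_omissions(times):
    """Length of the longest run of consecutive omissions."""
    longest = run = 0
    for ts in times:
        run = run + 1 if is_omission(ts) else 0
        longest = max(longest, run)
    return longest


def bounded_omission_degree(times, k):
    """True if there are never more than k omissions in a row (assumed reading)."""
    return max_consecutive_omissions(times) <= k


if __name__ == "__main__":
    inf = math.inf
    print(permanent_omission([1.0, 2.0, inf, inf]))
    print(bounded_omission_degree([1.0, inf, inf, 4.0, inf], 2))
```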
![Page 16: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/16.jpg)
Timing Error Implications
![Page 17: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/17.jpg)
Failure Mode Assertions (FMA)
A complete FMA entails an assertion on errors occurring in both the value and time domains
By taking the Cartesian product of the two domains, we get a family of FMAs
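The Cartesian product can be enumerated directly. This sketch uses only the example assertions named on the earlier slides, written as plain strings; since those slides present them as examples, the resulting count describes this illustrative family and is not a claim about the complete classification.

```python
from itertools import product

# Example assertions from the value and timing slides, as plain labels.
VALUE_ASSERTIONS = ["Vnone", "Vn", "Varb"]
TIMING_ASSERTIONS = ["Tnone", "TO", "TL", "TE", "Tarb", "Tp", "TBk"]

# Each complete FMA pairs one value assertion with one timing assertion.
fmas = list(product(VALUE_ASSERTIONS, TIMING_ASSERTIONS))

if __name__ == "__main__":
    print(len(fmas))
    for value_assertion, timing_assertion in fmas:
        print(value_assertion, timing_assertion)
```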
![Page 18: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/18.jpg)
FMA Implication Graph
![Page 19: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/19.jpg)
So what?
The FMA classification and implication graph can serve as a guideline for designing families of FT algorithms that can process errors of increasing severity.
![Page 20: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/20.jpg)
Assumption Coverage
Establishing a link between the assumed component failure mode and system dependability
(The design of an FT system relies on the assumptions it makes)
(The dependability of an FT system is related to the failure mode it assumes)
![Page 21: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/21.jpg)
Motivation
Components may fail
They may fail in a bad way, which leads to a violation of the system's assumptions
The system, in turn, can fail
Question: to what degree can a component FMA prove to be true in the real system?
![Page 22: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/22.jpg)
The Coverage of the Assumption
Definition
$P(X) = \Pr\{X = \text{true} \mid \text{component failed}\}$
$P(V_{arb} \land T_{arb}) = 1$
$P(V_{none} \land T_{none}) = 0$
![Page 23: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/23.jpg)
Coverage of an FT system
$PS(X) = \Pr\{\text{correct error processing} \mid X = \text{true}\} \times \Pr\{X = \text{true} \mid \text{component failed}\}$
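A short numeric illustration of this product may help. The probabilities below are invented purely for the example, and `system_coverage` is a hypothetical helper name.

```python
def system_coverage(p_processing, p_assumption):
    """PS(X) = Pr{correct error processing | X} * Pr{X | component failed}."""
    for p in (p_processing, p_assumption):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_processing * p_assumption


if __name__ == "__main__":
    # Illustrative numbers only: error processing succeeds 99% of the time
    # when the assumption holds, and the assumption holds for 99.9% of failures.
    print(system_coverage(0.99, 0.999))
    # With the weakest assumption (coverage 1), only error processing matters.
    print(system_coverage(0.99, 1.0))
```

The product makes the trade-off visible: a stronger failure mode assumption may make error processing easier, but if the assumption itself holds less often, the overall coverage drops.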
![Page 24: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/24.jpg)
Influence of Assumption Coverage on System Dependability
A Case Study
![Page 25: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/25.jpg)
The System
– A system of n processors
– Connected via a unidirectional message-passing bus
– Each processor carries out the same computation steps
– The result of each processing step is communicated to all other processors
– Each process has a decision function (DF)
– The DF is applied to the results received from other processors
– …
– Each processor and its associated bus is viewed as a single component
![Page 26: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/26.jpg)
Fail-Silent Processor-bus
A fail-silent processor
– Only has semi-consistent value errors
– Always produces messages on time
– Or ceases to produce messages forever
– If a message is delivered to a processor, it is delivered to all processors with a consistent fixed delay
![Page 27: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/27.jpg)
Fail-Consistent Processor Bus
Only semi-consistent value errors may occur
Faulty processors may send erroneous values
Consistent timing errors may occur
![Page 28: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/28.jpg)
Fail-Uncontrolled Processor Bus
Arbitrary timing errors
Arbitrary value errors
![Page 29: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/29.jpg)
Implications of Assumption Coverage
Failure mode relations
Coverage relations
![Page 30: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/30.jpg)
Dependability Expressions From Markov Models
$r = e^{-\lambda t}$
$\lambda$ = failure rate
![Page 31: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/31.jpg)
A Life-Critical Application
System reliability objective: $R > 1 - 10^{-9}$ over 10 hours
Single processor reliability:
– $r = e^{-\lambda t}$
– $1/\lambda$ = 5 years
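To see how far a single processor is from the objective, the numbers on this slide can be plugged into $r = e^{-\lambda t}$. The sketch below assumes a 365-day year when converting $1/\lambda$ = 5 years into hours; `unreliability` is a hypothetical helper name. It uses `math.expm1` so that the small quantity $1 - r$ is computed without cancellation error.

```python
import math

HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year


def unreliability(mttf_hours, mission_hours):
    """Return 1 - r, where r = exp(-lambda * t) and lambda = 1 / mttf_hours."""
    failure_rate = 1.0 / mttf_hours
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-failure_rate * mission_hours)


single = unreliability(5 * HOURS_PER_YEAR, 10)
target = 1e-9  # allowed unreliability from the objective R > 1 - 10^-9

if __name__ == "__main__":
    print(f"single-processor unreliability over 10 hours: {single:.3e}")
    print(f"shortfall factor versus the objective: {single / target:.1e}")
```

Under these assumptions a single processor is unreliable at roughly the $10^{-4}$ level over 10 hours, about five orders of magnitude short of the objective, which is why the case study turns to redundant configurations.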
![Page 32: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/32.jpg)
![Page 33: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/33.jpg)
A Money-Critical Application
It is about the availability of the system rather than its reliability
Please look at the paper for more details
![Page 34: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/34.jpg)
Unavailability vs. Coverage
![Page 35: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/35.jpg)
Conclusion
A formalism for describing component failure modes
Multiplicity of value and timing errors
The notion of assumption coverage
The relation between dependability, availability and assumption coverage
![Page 36: Failure Mode Assumptions and Assumption Coverage David Powell.](https://reader036.fdocuments.us/reader036/viewer/2022081512/56649f395503460f94c55e41/html5/thumbnails/36.jpg)
Thank you