
Brookhaven National Laboratory

U.S. Department of Energy

REVIEW OF TRADITIONAL METHODS FOR MODELING DIGITAL SYSTEMS

Digital I&C Risk TWG Meeting

April 11-12, 2007

Alan S. Kuritzky

Division of Risk Assessment and Special Projects

Office of Nuclear Regulatory Research

(301-415-6255, [email protected])

Tsong-Lun Chu, Gerardo Martinez-Guridi, Meng Yue, and John Lehner

Brookhaven National Laboratory

Outline of Presentation

• Background and objectives

• Review of traditional methods

• Development of criteria for evaluating reliability models of digital systems

• Selection of applications of the methods for review

• Comparison of applications against criteria

• Identification of capabilities and limitations in state-of-the-art of modeling digital systems

• Conclusions, including traditional methods selected

• Supporting information

Background

• At present, there are no consensus methods for quantifying the reliability of digital systems

• NRC is conducting research to support the development of regulatory guidance for the use of risk information related to digital systems in licensing actions of nuclear power plants

• NRC is investigating several potential methods for digital system reliability modeling:

• Traditional methods supported by failure modes and effects analysis (FMEA) and data analysis

• Markov models coupled with the cell-to-cell mapping technique supported by advanced digital system test-based methods

• Dynamic flowgraph methodology

Overview of Traditional Methods Research

• Previous activities in support of NRC digital Instrumentation and Control (I&C) research:

• Literature search of methods and applications

• White papers on issues associated with modeling digital systems

• Analyses of failure experience

• FMEAs of digital systems

• Current and future activities:

• Traditional methods selection (current task)

• Candidate method illustration

• Pilot application of candidate methods to two digital systems

• Integration of digital system models into a PRA

Objectives of Current Task

• Develop criteria for evaluating reliability models of digital systems.

• These draft criteria could eventually provide input to the technical basis for risk-informed decision-making.

• Review applications using ‘traditional’ risk methods, such as fault tree and Markov methods, against the criteria to determine the capabilities and limitations of the state-of-the-art.

• Identify the most promising traditional methods for modeling and quantitatively assessing the reliability of digital systems.

Approach

• Review traditional methods for modeling digital systems

• Fault Tree / Event Tree, Markov, SINTEF, Reliability Prediction Methods, NASA (software reliability approach)

• In addition, review a simplified analytical method used for a Japanese ABWR.

• Develop criteria for evaluating digital system models

• Capture the unique features of digital systems that affect system reliability

• Identify existing applications of the methods

• Advanced reactor PRAs, plant-specific models

• Identify the capabilities and limitations of the existing applications by comparing them against the developed criteria.

Fault Tree / Event Tree (FT/ET) Method

• The FT/ET method has become the standard for reliability modeling in the PRA community throughout the world.

• FT/ET has already been used for modeling digital systems.

• The FT/ET method is flexible because:

• It has been used for a wide variety of applications for many years in the computer, aerospace, chemical, and many other industries.

• Its building blocks can be used to construct models of the relevant features of the systems of a nuclear power plant (NPP).

• The FT/ET method is powerful because:

• By combining system models into an overall model of the NPP, it is well suited to identifying detailed failure modes of the plant, represented by combinations of failures of system components.

• It can quantitatively evaluate the detailed failure modes of the plant.

• Limitations of the FT/ET method:

• It treats the timing of events in accident sequences implicitly.

• It considers interactions with plant processes only implicitly, in an approximate way.
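
As a minimal illustration of how a fault tree is quantified once its minimal cut sets are known, the Python sketch below evaluates a hypothetical two-channel actuation fault tree. The basic-event names (CPU_A, CPU_B, SW_CCF, PS), their probabilities, and the cut sets are assumptions chosen for illustration, not values from any reviewed application. Assuming independent basic events, it compares the rare-event approximation commonly used in PRA quantification with the exact union probability from inclusion-exclusion.

```python
from itertools import combinations
from math import prod

# Hypothetical basic events and per-demand failure probabilities (illustrative only).
basic_events = {
    "CPU_A": 1e-3,   # processor, channel A
    "CPU_B": 1e-3,   # processor, channel B
    "SW_CCF": 1e-4,  # software common-cause failure disabling both channels
    "PS": 5e-4,      # power supply shared by both channels
}

# Hypothetical minimal cut sets for the top event "no actuation signal".
cut_sets = [{"CPU_A", "CPU_B"}, {"SW_CCF"}, {"PS"}]


def cut_set_probability(cut_set, p):
    """Probability that every basic event in the cut set occurs (independence assumed)."""
    return prod(p[event] for event in cut_set)


def rare_event_approximation(cut_sets, p):
    """Rare-event approximation (an upper bound): sum of the cut-set probabilities."""
    return sum(cut_set_probability(cs, p) for cs in cut_sets)


def exact_top_event(cut_sets, p):
    """Exact probability of the union of cut sets by inclusion-exclusion."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        for combo in combinations(cut_sets, k):
            merged = set().union(*combo)  # events shared by cut sets are counted once
            total += (-1) ** (k + 1) * cut_set_probability(merged, p)
    return total


approx = rare_event_approximation(cut_sets, basic_events)
exact = exact_top_event(cut_sets, basic_events)
print(f"rare-event: {approx:.4e}, exact: {exact:.4e}")
```

Inclusion-exclusion grows exponentially with the number of cut sets, so it is only practical for toy models like this one; production PRA codes typically rely on the rare-event approximation or the min-cut upper bound.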

Markov Method

• The Markov method has been used for modeling NPP systems.

• It has also been used for modeling digital systems.

• The Markov method is flexible because:

• It allows explicit modeling of the different states that a system can reach during its operation, regardless of the type of system.

• The Markov method is powerful because:

• A digital system may be able to detect failures and reconfigure itself during operation. In addition, the system may be repaired and thus return to its original configuration. This method allows explicit and detailed modeling of these reconfigurations.

• It explicitly treats failure and repair times within the model.

• Limitations of the Markov method:

• The number of states can grow very rapidly with the complexity of the system, making the model very difficult to analyze.

• It considers interactions with plant processes only implicitly, in an approximate way.

• Integration with a fault tree / event tree model is not straightforward.
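
The sketch below illustrates, under stated assumptions, how a Markov model treats failure and repair explicitly. It assumes a hypothetical system of two identical channels, each failing at rate lam, with a single repair crew restoring failed channels at rate mu; the state numbering and rates are illustrative. The steady-state probabilities are obtained by solving the balance equations with a small hand-written Gaussian elimination, so the example needs only the Python standard library. The final loop shows the state-explosion limitation: with n two-state components, the system can have up to 2**n states.

```python
def steady_state(Q):
    """Solve pi @ Q = 0 with sum(pi) = 1 by Gaussian elimination with partial pivoting.

    Q is a continuous-time Markov generator: off-diagonal entries are transition
    rates and each row sums to zero.
    """
    n = len(Q)
    # Balance equations written as A x = b, with A = Q transposed ...
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # ... replacing one (redundant) balance equation by the normalization condition.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def two_channel_generator(lam, mu):
    """States: 0 = both channels up, 1 = one failed, 2 = both failed (one repair crew)."""
    return [
        [-2 * lam, 2 * lam, 0.0],
        [mu, -(lam + mu), lam],
        [0.0, mu, -mu],
    ]


lam, mu = 1e-4, 0.1  # hypothetical: 1e-4/h failure rate per channel, 10 h mean repair time
pi = steady_state(two_channel_generator(lam, mu))
print(f"P(both channels failed) = {pi[2]:.3e}")

# State explosion: n two-state components give up to 2**n system states.
for n_components in (2, 8, 16, 32):
    print(n_components, 2 ** n_components)
```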

SINTEF Method

• The SINTEF method is an adaptation, for the Norwegian oil industry, of a method specified in international standard IEC 61508. It uses data collected from offshore platforms and provided in a companion handbook.

• It models a system in terms of a Markov model and solves the model by introducing some simplifying assumptions, such that analytical expressions can be derived.

• It explicitly models fault coverages, and safe and dangerous failures.

• Limitations of the SINTEF method:

• It ignores combinations of failures of components from different subsystems.

• It assumes that common-cause failure (CCF) dominates the subsystem unavailability, so independent random failures of components are not considered.

• The estimates of coverages, of the hardware failure fraction of the dangerous failure rates, and of the beta factors are based on undocumented expert judgment.
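
To make the limitation on independent failures concrete, the sketch below compares two approximations for the probability of failure on demand (PFD) of a hypothetical 1oo2 (one-out-of-two) pair. The formulas are the common simplified ones from the IEC 61508 literature with repair times and detected failures neglected; they are not the SINTEF handbook's own parameterization. All inputs (dangerous failure rate, coverage, proof-test interval tau, beta factor) are assumptions. lambda_du applies the fault coverage to the dangerous failure rate, the CCF-only form keeps just the beta-factor term, and the second form adds the independent double-failure term.

```python
def lambda_du(lambda_d, coverage):
    """Dangerous undetected failure rate: dangerous failures not caught by diagnostics."""
    return lambda_d * (1.0 - coverage)


def pfd_1oo2_ccf_only(l_du, tau, beta):
    """CCF-only approximation for a 1oo2 pair: only a common-cause failure defeats both."""
    return beta * l_du * tau / 2.0


def pfd_1oo2_with_independent(l_du, tau, beta):
    """Adds the independent double-failure term ((1 - beta) * l_du * tau)**2 / 3."""
    independent = ((1.0 - beta) * l_du * tau) ** 2 / 3.0
    return pfd_1oo2_ccf_only(l_du, tau, beta) + independent


# Hypothetical inputs: dangerous failure rate 2e-6/h, 90% diagnostic coverage,
# annual proof test (tau = 8760 h), beta factor 2%.
l_du = lambda_du(2e-6, 0.9)
ccf_only = pfd_1oo2_ccf_only(l_du, 8760.0, 0.02)
full = pfd_1oo2_with_independent(l_du, 8760.0, 0.02)
print(f"CCF only: {ccf_only:.3e}, with independent failures: {full:.3e}")
print(f"share neglected by the CCF-only form: {(full - ccf_only) / full:.1%}")
```

With these inputs the independent term adds roughly 5% to the CCF-only result; for smaller beta factors or longer test intervals its share grows, which is why neglecting it should be justified rather than assumed.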

Reliability Prediction Methods (RPMs)

• RPMs, such as PRISM and the Military Handbook 217, are methods for estimating the failure rate of a circuit board in terms of the failure rates of its components.

• A major assumption is that the components of the board are configured in series.

• RPMs are commonly used by the defense and telecommunication industries.

• A capability of the RPMs is estimating the failure rates of components at a detailed level, taking into consideration adjustment factors such as the operating environment.

• Limitations of the RPMs:

• They cannot be used to model systems with components configured in parallel.

• The RPMs produce failure rates using empirical formulas containing the adjustment factors. However, the technical basis of the formulas and factors is not publicly available.

• They produce point estimates of the failure rates without considering data uncertainties.

• The Military Handbook 217 has been criticized as being inaccurate.
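
The series assumption behind RPMs can be sketched as a parts-count-style sum: each part's base failure rate is scaled by adjustment factors and the results are added. The part list, base rates, and the quality and environment factors below are hypothetical placeholders, not values from the Military Handbook 217 or PRISM, whose actual models differ in detail.

```python
from math import exp

# Hypothetical parts list for one circuit board:
# (part type, quantity, base failure rate per 1e6 h, quality factor, environment factor)
PARTS = [
    ("microprocessor", 1, 0.05, 1.0, 2.0),
    ("memory chip", 4, 0.01, 1.0, 2.0),
    ("capacitor", 40, 0.0005, 3.0, 2.0),
]


def board_failure_rate(parts):
    """Series assumption: any part failure fails the board, so adjusted rates simply add."""
    return sum(qty * base * pi_q * pi_e for _, qty, base, pi_q, pi_e in parts)


def board_reliability(rate_per_million_hours, hours):
    """Constant-failure-rate reliability R(t) = exp(-lambda * t)."""
    return exp(-rate_per_million_hours * 1e-6 * hours)


rate = board_failure_rate(PARTS)
print(f"board failure rate: {rate:.3f} per 1e6 h")
print(f"one-year reliability: {board_reliability(rate, 8760):.5f}")
```

Because the board rate is a plain sum, the model cannot express redundancy: a pair of parallel parts that fails only when both fail has reliability 1 - (1 - R)**2, which does not correspond to adding constant failure rates. This is the parallel-configuration limitation noted above.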

NASA (Software Reliability Approach)

• The NASA PRA Procedures Guide presents a framework for considering software failures in a PRA.

• Hardware failure conditions are used to define the boundary conditions for modeling software failures.

• For each boundary condition, a software failure probability should be estimated using a software reliability model.

• Limitations of the NASA approach:

• It is only used for including quantitative software reliability measures in a PRA and does not address modeling of digital system hardware.
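
One way to read the NASA framework is as a conditional-probability calculation: each hardware-defined boundary condition carries its own software failure probability, estimated by some software reliability model. The sketch below combines them by the law of total probability, assuming the conditions are mutually exclusive and exhaustive. The condition names and numbers are hypothetical. In an actual PRA the same logic would more typically be built into the fault tree by pairing each condition with its conditional software failure event.

```python
import math

# Hypothetical boundary conditions defined by hardware failure states:
# name -> (probability of the condition, conditional software failure probability).
BOUNDARY_CONDITIONS = {
    "all hardware available": (0.995, 1e-5),
    "one channel failed": (0.004, 5e-5),
    "input sensor degraded": (0.001, 2e-4),
}


def software_failure_probability(conditions):
    """Total probability over mutually exclusive, exhaustive boundary conditions."""
    total_weight = sum(p_cond for p_cond, _ in conditions.values())
    if not math.isclose(total_weight, 1.0):
        raise ValueError("boundary-condition probabilities must sum to 1")
    return sum(p_cond * p_sw for p_cond, p_sw in conditions.values())


print(f"{software_failure_probability(BOUNDARY_CONDITIONS):.3e}")
```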

Observations from the Review of Methods

• Fault Tree / Event Tree, Markov, and SINTEF methods are general, and some of their applications are reviewed in detail in this study.

• In addition, an application of a simplified analytical method used for a Japanese ABWR is reviewed.

• RPMs can be considered sources of failure data for probabilistic analysis. Applications of the associated methods were not further examined as part of this study.

• The NASA approach is used only for including quantitative software reliability measures in a PRA, and no applications of this approach were reviewed.

Considerations Supporting the Model Evaluation Criteria

• The goal of a reliability model of a digital system is to account for those design features that have the potential to affect its reliability, so that the model can be used to:

• assess the system’s reliability, and

• address issues associated with the features.

• The modeling should be supported by an analysis, such as an FMEA, which:

• identifies different failure modes of the components,

• identifies potential ways the failure could propagate,

• identifies potential dependencies, and

• determines how the failures could be detected and mitigated.

Considerations Supporting the Review Criteria (2)

• Software failures (including CCF) should be included in the model. Their inclusion should be consistent with the way software failure occurs, i.e., how error triggering events/inputs occur and the operating contexts.

• Modeling dependencies is a basic requirement of a PRA. In addition to using communication networks, digital systems use buses and voting devices, and share hardware in some situations. A model should account for all dependencies, both within the digital system and with other plant systems and equipment.

Considerations Supporting the Review Criteria (3)

• Time-dependent interactions include those between an I&C system and the controlled plant physical processes. There does not appear to be consensus in the PRA technical community about the characteristics of systems (analog or digital) that must be treated by considering time-dependent interactions explicitly.

• Human errors, such as faults introduced during hardware or software upgrades and errors arising from inadequate man-machine interfaces, should be taken into account.

Considerations Supporting the Review Criteria (4)

• An advantage of digital systems is their capability to self-test online and their potential to mitigate detected failures. Ideally, realistic modeling should take the following into consideration:

1) the test is usually not capable of detecting all possible failures,

2) a self-detected failure has to be transmitted to other parts of the system in order for the system to respond to it,

3) if a different component, e.g., a watchdog timer, is used to detect failures of a component, then this dependency has to be modeled,

4) a self-test feature can cause a failure, and

5) a fault tolerant feature that is already built into the failure data should not also be credited in the reliability model.
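
Points 1 and 5 above can be illustrated with a simple channel-unavailability sketch. It assumes that failures detected by self-test are repaired within a mean time mttr, while undetected failures stay hidden for half the periodic test interval on average (a linear approximation valid when the failure rate times the interval is small). All numbers are hypothetical. The second call shows the double-credit problem: if the failure rate was taken from data that already count only failures the self-test missed, applying the coverage factor again understates unavailability.

```python
def channel_unavailability(lam, coverage, mttr, test_interval):
    """Mean unavailability of one channel with imperfect self-test.

    Detected failures (fraction `coverage`) are repaired after `mttr` hours;
    undetected ones remain hidden for half the periodic test interval on average.
    """
    lam_detected = coverage * lam
    lam_undetected = (1.0 - coverage) * lam
    return lam_detected * mttr + lam_undetected * test_interval / 2.0


# Hypothetical values: 1e-5/h failure rate, 90% coverage, 8 h repair, 720 h (monthly) test.
credited = channel_unavailability(1e-5, 0.9, 8.0, 720.0)

# Suppose instead the 1e-5/h rate came from field data that already count only
# failures the self-test missed. Then coverage must not be credited a second time:
correct = channel_unavailability(1e-5, 0.0, 8.0, 720.0)
print(f"coverage applied: {credited:.2e}; data already reflect coverage: {correct:.2e}")
```

With these inputs the double-credited result (about 4.3e-4) is roughly eight times lower than the consistent one (3.6e-3).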

Considerations Supporting the Review Criteria (5)

• Quality data (e.g., applicable, source provided, parameter estimation documented) should be provided, especially in the case of modeling fault tolerance features and CCFs.

• Uncertainty analyses

• Modeling - Important assumptions and sources of uncertainties should be identified.

• Parameter - Uncertainty assessment of failure parameters should be performed with the uncertainties propagated.

• Ideally, a reliability model of a digital system should be easily integrated with the existing PRA, such that the dependencies of the digital system and the rest of the modeled plant are properly accounted for in the PRA.
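
Parameter uncertainty propagation is commonly done by Monte Carlo sampling. The sketch below assumes lognormal uncertainty for each basic event, specified by a median and an error factor (EF, the ratio of the 95th to the 50th percentile), and propagates it through a hypothetical set of minimal cut sets using the rare-event sum. Event names, medians, error factors, the sample size, and the seed are all illustrative assumptions. For simplicity the two processors are sampled independently; a fuller analysis would usually correlate the state-of-knowledge uncertainty of identical components.

```python
import math
import random
import statistics

# Hypothetical basic events: (median probability, error factor = 95th / 50th percentile).
EVENTS = {
    "CPU_A": (1e-3, 5.0),
    "CPU_B": (1e-3, 5.0),
    "SW_CCF": (1e-4, 10.0),
    "PS": (5e-4, 3.0),
}
CUT_SETS = [("CPU_A", "CPU_B"), ("SW_CCF",), ("PS",)]


def lognormal_parameters(median, error_factor):
    """Convert (median, EF) to the mean and standard deviation of ln(p)."""
    return math.log(median), math.log(error_factor) / 1.645


def propagate(events, cut_sets, samples=5000, seed=1):
    """Sample each basic event independently and evaluate the rare-event sum per sample."""
    rng = random.Random(seed)
    params = {name: lognormal_parameters(*spec) for name, spec in events.items()}
    results = []
    for _ in range(samples):
        # Clamp at 1.0 so a sampled value is always a valid probability.
        p = {name: min(1.0, rng.lognormvariate(mu, sigma)) for name, (mu, sigma) in params.items()}
        results.append(sum(math.prod(p[e] for e in cs) for cs in cut_sets))
    return results


top = propagate(EVENTS, CUT_SETS)
cuts = statistics.quantiles(top, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
print(f"5th: {cuts[0]:.2e}  median: {statistics.median(top):.2e}  "
      f"mean: {statistics.mean(top):.2e}  95th: {cuts[-1]:.2e}")
```

The mean of the resulting distribution exceeds its median, a typical consequence of skewed lognormal inputs, so a point estimate evaluated at the medians can understate the expected value.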

Evaluation Criteria of Reliability Models of Digital Systems

• Eight main categories of criteria were identified, and a total of 48 detailed criteria were developed.

• Each criterion consists of a background discussion and a requirement description.

• The relative importance of individual criteria varies.

• This variation will need to be evaluated when the criteria are considered as input to the technical basis for risk-informed decision-making.

Eight Categories of Review Criteria

1 Level of Detail of the Probabilistic Model

2 Identification of Failure Modes of the Digital System

3 Modeling of Software Failures

4 Modeling of Dependencies

5 Modeling of Human Errors

6 Ease of Integration with a PRA Model

7 Probabilistic Data

8 Documentation and Results

Example Review Criteria

Criterion:

2.2 Are the failure modes of features, such as communication, voting, and synchronization, identified to support modeling?

Important because:

• These design features are potential sources of dependencies between redundant channels and between systems

• To a large extent these design features are unique to digital systems
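
Why voting and other shared features matter for dependencies can be shown with a small k-out-of-n sketch. It assumes hypothetical 2-out-of-4 trip logic with identical, independent channels, plus a common-cause failure term and a shared voting device treated as independent single-point contributors. All probabilities are per demand and illustrative.

```python
from math import comb


def at_least_k_of_n_fail(n, k, q):
    """Probability that at least k of n independent, identical channels fail (each with prob q)."""
    return sum(comb(n, i) * q ** i * (1 - q) ** (n - i) for i in range(k, n + 1))


def fail_to_actuate_2oo4(q_channel, q_ccf=0.0, q_voter=0.0):
    """2-out-of-4 trip logic fails to actuate if 3 or more channels fail,
    or if a common-cause failure occurs, or if the shared voting device fails.
    The three contributors are treated as independent events."""
    q_independent = at_least_k_of_n_fail(4, 3, q_channel)
    return 1.0 - (1.0 - q_independent) * (1.0 - q_ccf) * (1.0 - q_voter)


# Hypothetical per-demand probabilities.
print(f"independent only: {fail_to_actuate_2oo4(1e-3):.2e}")
print(f"with CCF and voter: {fail_to_actuate_2oo4(1e-3, q_ccf=1e-5, q_voter=1e-6):.2e}")
```

With these numbers the independent contribution is about 4e-9, so the result is dominated by the CCF and voter terms, which is why the failure modes of such shared features need to be identified.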

Example Review Criteria

Criteria:

7.1 Were the data obtained from the operating experience of the same component being evaluated?

7.4 If generic data is used, is it of the same generic type of component?

Important because:

• It is desirable to use data that realistically represent the failure characteristics of the component being evaluated

• Generic data may not be fully applicable to the component being evaluated

• Generic data have large uncertainties

Selection of Applications of the Methods for Review

• Relevance to domestic nuclear industry.

• Availability of documentation.

• Inclusion of applications of fault tree, Markov, and SINTEF methods.

Six Applications Compared Against Criteria

• Advanced reactor vendor PRAs. Fault tree models for

• AP1000

• ESBWR

• Plant-specific fault tree model for

• ESFAS of Korean National Standard Plant (Westinghouse 80+ design)

• Simplified model of RPS and ESFAS of a Japanese ABWR

• Markov model of Tricon platform

• Example of SINTEF method.

Six Applications Compared Against Criteria (2)

• The characteristics of the digital system model were compared to the criteria.

• No attempt was made to validate the models.

• A description of the system, if applicable, was prepared.

• A summary of comments on each application as it relates to the review criteria was prepared.

Six Applications Compared Against Criteria (3)

• Each application was evaluated against each criterion to determine if the application satisfied the criterion.

• The evaluation involved considerable judgment, so it was fairly subjective.

• The way the applications satisfied each of the 48 criteria represents the current state-of-the-art.

• The maximum number of criteria satisfied by any one application was 16 (out of 48).

• Twenty-one criteria were not addressed by any of the applications; nine criteria were only addressed by one.

• The 3 FT/ET models satisfied the highest number of criteria.
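
The summary counts above (the maximum number of criteria satisfied by one application, and the criteria addressed by no application or by only one) follow from a simple application-by-criterion satisfaction matrix. The sketch below shows that tally on an invented three-application, six-criterion matrix; the application names, criterion IDs, and entries are hypothetical and do not reproduce the actual review results.

```python
from collections import Counter

# Hypothetical satisfaction matrix: application -> IDs of the criteria it satisfies.
ALL_CRITERIA = ["1.1", "2.2", "3.1", "4.1", "7.1", "7.4"]
SATISFIED = {
    "app A": {"1.1", "2.2", "7.4"},
    "app B": {"1.1", "4.1"},
    "app C": {"1.1"},
}


def summarize(all_criteria, satisfied):
    """Tally how many criteria each application satisfied and how often each criterion was met."""
    per_app = {app: len(ids) for app, ids in satisfied.items()}
    times_met = Counter(cid for ids in satisfied.values() for cid in ids)
    return {
        "max satisfied by one application": max(per_app.values()),
        "addressed by no application": [c for c in all_criteria if times_met[c] == 0],
        "addressed by exactly one": [c for c in all_criteria if times_met[c] == 1],
    }


print(summarize(ALL_CRITERIA, SATISFIED))
```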

Observations Common to All Applications Reviewed

• Main strengths of applications:

• CCFs of hardware within a system were usually modeled. However, data for CCF of digital components appear scarce.

• Individual failures and CCFs of software were explicitly included in the logic model. However, quantification of these failures is still an issue.

• Main limitations of applications:

• Lack of understanding of possible failure modes and effects.

• Lack of applicable failure parameter data.

• Inadequate quantitative software reliability methods.

• Inadequate treatment of uncertainties.

Application-Specific Observations on the Modeling of the AP1000

• Application of FT/ET method.

• Main strengths:

• Software failures were explicitly included in the logic model.

• CCF was modeled across the boundaries of systems.

• System models were integrated with the overall PRA model.

• Propagation of data uncertainty was carried out.

• Main limitations:

• The model was developed at a higher level than that of the microprocessor.

• Main areas of uncertainty due to lack of information:

• Documentation on identifying failure modes was not available.

• The documentation indicates that a model for quantifying software reliability was employed.

• No further information about this model was available.

Application-Specific Observations on the Modeling of the ESBWR

• Application of FT/ET method.

• Main strengths:

• Software CCFs were explicitly included in the logic model.

• Voting logic was modeled.

• System models were integrated with the overall PRA model.

• Main limitations:

• The model was developed at a higher level than that of the microprocessor.

• Identification of failure modes, e.g., using FMEA, was not carried out.

• Main areas of uncertainty due to lack of information:

• Estimation of software CCF parameters was not documented.

• The basis for the value of fault coverage was not documented.

Application-Specific Observations on the Modeling of the KSNPP

• Application of FT/ET method.

• Main strengths:

• Modeling was developed at the microprocessor level.

• Logic modeling of fault-tolerant features was performed.

• System models were integrated with the overall PRA model.

• Main limitations:

• Software failures were not included in the logic model.

• Main areas of uncertainty due to lack of information:

• Documentation on identifying failure modes was not available.

Application-Specific Observations on the Modeling of the SINTEF Example

• Application of SINTEF method.

• Main strengths:

• Software failures were explicitly included in the logic model.

• Fault coverages and safe and dangerous failures were explicitly modeled.

• Main limitations:

• Dependencies on support systems were not modeled.

• The data in the SINTEF data handbook come from the operating experience of offshore platforms. Hence, their applicability to the nuclear industry is questionable.

• The estimates of coverages, of the hardware failure fraction of the dangerous failure rates, and of the beta factors were based on undocumented expert judgment.

• Main areas of uncertainty due to lack of information:

• The available documentation does not contain detailed information about the design and modeling of fault-tolerant features.

• For example, the basis for modeling the coverage of many components was not provided.

Application-Specific Observations on the Tricon Model

• Application of Markov method.

• Main strengths:

• Modeling was developed at the microprocessor level.

• Both spurious trip and failure on demand were modeled.

• Fault coverages and safe and dangerous failures were explicitly modeled.

• Main limitations:

• Software failures were not included in the logic model.

• Dependencies on support systems were not modeled.

• A data bus (Tribus) is a potential single point of failure of the Tricon module, but its failure was not considered.

• The Markov model was developed for a single module, not a complete system.

• Main areas of uncertainty due to lack of information:

• The available documentation does not contain detailed information about the design and modeling of fault-tolerant features.

• For example, the basis for modeling the coverage of many components was not provided.

Application-Specific Observations on the Modeling of the Reviewed ABWR

• Application of a simplified analytical method.

• Main strengths:

• Software failures were explicitly included in the logic model.

• Main limitations:

• Simplified modeling was performed.

• Dependencies on support systems were not modeled.

• Main areas of uncertainty due to lack of information:

• A model for quantifying software reliability was employed, but its technical basis was not provided.

Main Limitations in the State-of-the-Art Based on the Applications Reviewed

• Limitations that appeared in all applications:

• Lack of understanding of possible failure modes and effects.

• Lack of applicable failure parameter data.

• Inadequate quantitative software reliability methods.

• Other limitations were found that can be overcome by careful application of the methods, e.g., inadequate treatment of uncertainties.

• The limitations identified are associated with supporting analyses for, or implementation of, the methods applied, and are not necessarily limitations of the methods themselves.

Lack of Understanding of Possible Failure Modes and Effects

• The level of detail of the models did not appear to be appropriate to model failure modes that have been observed in current digital I&C applications.

• Potential failures due to the use of communication networks, voting, and synchronization (e.g., inter-channel communication) were not considered.

• Propagation of failures through interconnections within a digital system and with the rest of the plant was not considered.

• Basis for effectiveness of fault tolerance features, e.g., self-diagnostics, watchdog timers, and surveillance tests, was not provided.

Lack of Applicable Digital Failure Parameter Data

• Raw failure data is not publicly available, e.g., proprietary manufacturer data.

• Estimated hardware failure parameters are based on proprietary data, and the analysis is not publicly documented, e.g., advanced reactor PRAs and Reliability Prediction Methods.

• Data extracted from PRISM have large variability, and the failure rates BNL estimated from them have very large error factors.

• Data for important parameters, such as hardware failure rates, CCF parameters, and fault coverages, are scarce.

• In some cases, the applications derived some parameters using judgment without any additional documentation.

Inadequate Quantitative Software Reliability Methods

• The National Research Council recommended that software failures be included in a reliability model.

• The comparison of the applications to the criteria further confirmed that no commonly accepted quantitative software reliability methods exist for safety critical applications.

Inadequate Treatment of Uncertainties

• Modeling uncertainty:

• Some assumptions were made in developing the reliability models without providing their basis. Examples are:

• Assuming that digital features such as data networks and buses do not contribute to system failure.

• SINTEF’s assumption that hardware CCF dominates system unreliability, and that individual failures do not need to be modeled.

• Use of “fault coverage” may give double credit to fault tolerant features.

• Parameter uncertainty:

• Only one application assessed parameter uncertainties. This application also propagated the uncertainties.

Conclusions

• A detailed set of criteria was developed to assess the PRA models of digital systems.

• The review criteria are applicable to all reliability models of digital systems, and can be used to support:

• Development of a regulatory guide (RG) that is specific to digital systems.

• Update of general PRA guidance to address digital systems.

• The criteria were applied to six applications of four traditional reliability modeling methods, and each application was assessed for the extent to which it satisfied the criteria.


Conclusions (2)

• Limitations that appear applicable to all applications:

• Lack of understanding of possible failure modes and effects.

• Lack of applicable failure parameter data.

• Inadequate quantitative software reliability methods.

• The evaluation of the applications revealed limitations in the way the methods were applied, e.g., in uncertainty analysis.

• Fault tree/event tree (FT/ET) and Markov methods were selected as the most powerful and flexible traditional methods for modeling digital systems.

• The methods themselves do not inherently have the limitations of the applications studied.

• If the limitations above are overcome, it may be possible to develop reasonable reliability models of digital systems using FT/ET and Markov methods.
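As a minimal illustration of the Markov method named above (not the model of any reviewed application), the sketch below solves for the steady-state probabilities of a hypothetical two-channel system. It assumes a per-channel failure rate lam, a single repair crew with repair rate mu, and states 0, 1, and 2 for the number of failed channels. The generator matrix, the rates, and the function names are assumptions for illustration; exact Fraction arithmetic keeps the result easy to check against the birth-death closed form.

```python
from fractions import Fraction


def two_channel_generator(lam, mu):
    """Generator matrix Q for a hypothetical two-channel system.

    State i = number of failed channels. Either channel can fail from state 0
    (total rate 2*lam); one repair crew restores one channel at a time (rate mu).
    """
    return [
        [-2 * lam, 2 * lam, 0],
        [mu, -(lam + mu), lam],
        [0, mu, -mu],
    ]


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(Q)
    # Row i of A holds column i of Q, so A x = 0 restates pi Q = 0.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace it with normalization.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


if __name__ == "__main__":
    lam = Fraction(1, 1000)  # assumed failures per hour, per channel
    mu = Fraction(1, 10)     # assumed repairs per hour
    pi = steady_state(two_channel_generator(lam, mu))
    print("P(both channels failed) =", pi[2], "≈", float(pi[2]))
```

For lam = 1/1000 and mu = 1/10 this gives exactly 1/5101 (about 1.96e-4), matching the closed form 2(lam/mu)^2 / (1 + 2 lam/mu + 2 (lam/mu)^2). The same solver accepts any irreducible generator, which is where Markov models earn their flexibility: states for detected, undetected, and reconfigured conditions can be added without changing the solution step.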


Next Steps

• External peer review of the criteria for evaluating reliability models of digital systems and the comparison of the selected applications against the criteria.

• Application of traditional FT/ET and Markov methods to two digital systems, using the insights from this review and the best features of the current state of the art.


Supporting Information

• The remaining slides provide information supporting the presentation:

• Areas identified that would enhance the state-of-the-art of modeling digital systems

• Summary of review of applications of traditional methods


Areas That Would Enhance the State-of-the-Art

• Development of methods for defining and identifying failure modes and effects of digital systems.

• How failure modes would propagate from their sources to the rest of the system and other systems of the plant.

• Consideration of communication networks, voting and synchronization.
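Voting logic, one of the features listed above, is often quantified in a first-pass model with a k-out-of-n binomial formula. The sketch below assumes n identical, independent channels, each failing on demand with probability q, and it deliberately ignores CCF and failures of the voting unit itself, both of which a realistic model must address. The function name and the numbers are illustrative assumptions.

```python
from math import comb


def k_out_of_n_failure(k, n, q):
    """Probability that a k-out-of-n voting arrangement fails on demand.

    The arrangement works if at least k of n channels work, so it fails when
    more than n - k channels fail. Channels are assumed independent and
    identical, each failing with probability q; CCF and voter failures are
    left out of this sketch.
    """
    return sum(
        comb(n, failed) * q**failed * (1 - q) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


if __name__ == "__main__":
    q = 0.01  # assumed per-channel failure probability
    for k, n in [(1, 2), (2, 3), (2, 4)]:
        print(f"{k}-out-of-{n}: {k_out_of_n_failure(k, n, q):.3e}")
```

With q = 0.01, a 2-out-of-4 arrangement gives about 3.97e-6. Numbers this small are easily dominated by dependencies such as CCFs or a shared voter, which is why those features need explicit modeling.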


Areas That Would Enhance the State-of-the-Art (2)

• Development of methods and parameter databases for modeling

1) all relevant digital components,

2) breakdown of failure rates into failure modes,

3) self-diagnostics, reconfiguration, and surveillance,

4) use of other components to detect failures, e.g., watchdog timers and microprocessors, and

5) the fraction of failures that can be detected.
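Items 3) and 5) above concern self-diagnostics and the fraction of failures they detect (the fault coverage). A common first approximation, sketched below, splits a channel's failure rate into a detected part, which is repaired promptly, and an undetected part, which persists until the next surveillance test. The rare-event formula, the parameter names, and the example values are assumptions for illustration, not parameters from the reviewed applications.

```python
def channel_unavailability(lam, coverage, repair_time, test_interval):
    """Approximate average unavailability of one channel (valid for lam * test_interval << 1).

    lam:           total channel failure rate (per hour)
    coverage:      fraction of failures detected by self-diagnostics (0..1)
    repair_time:   mean time to repair a detected failure (hours)
    test_interval: surveillance test interval that reveals undetected failures (hours)
    """
    detected = coverage * lam * repair_time                   # down only while being repaired
    undetected = (1 - coverage) * lam * test_interval / 2     # down for half a test interval on average
    return detected + undetected


if __name__ == "__main__":
    for c in (0.0, 0.9):
        u = channel_unavailability(lam=1e-5, coverage=c, repair_time=8, test_interval=730)
        print(f"coverage={c:.1f}: unavailability={u:.3e}")
```

With lam = 1e-5 per hour, an 8-hour repair time, and a 730-hour test interval, raising coverage from 0 to 0.9 lowers the estimate from about 3.65e-3 to about 4.37e-4, so the coverage value largely drives the result. This sensitivity is one reason double crediting matters: if lam were already a rate of undetected failures only, multiplying it by (1 - coverage) again would credit the diagnostics twice.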


Areas That Would Enhance the State-of-the-Art (3)

• Investigating the issue of potential double crediting in modeling fault-tolerant features such as self-diagnostics.

• Developing better databases for hardware CCFs of digital components.

• Developing software reliability methods for quantifying the likelihood of failures of application and support software.

• Developing methods for modeling software CCF across system boundaries due to common support software.

• Improving methods for addressing the modeling uncertainties associated with digital systems.
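The emphasis on hardware CCF data, and SINTEF's assumption that hardware CCF dominates system unreliability, can be illustrated with the standard beta-factor model, sketched below for a hypothetical 1-out-of-2 redundant pair. The beta-factor model is used here only as a familiar illustration; the review does not state which CCF model each application used, and q and beta are assumed values.

```python
def one_out_of_two_failure(q, beta):
    """Beta-factor estimate for a 1-out-of-2 redundant pair (rare-event approximation).

    q:    total failure probability of one train
    beta: assumed fraction of that probability due to common cause
    """
    independent = ((1 - beta) * q) ** 2  # both trains fail independently
    common_cause = beta * q              # a single CCF event fails both trains
    return independent + common_cause


if __name__ == "__main__":
    q = 1e-3
    for beta in (0.0, 0.01, 0.05):
        total = one_out_of_two_failure(q, beta)
        share = beta * q / total
        print(f"beta={beta:.2f}: P(fail)={total:.3e}, CCF share={share:.0%}")
```

With q = 1e-3, beta = 0.05 raises the estimate from 1e-6 to about 5.1e-5, with CCF contributing roughly 98 percent. A beta value that is poorly supported by data therefore moves the result by more than an order of magnitude.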


Review Summary: Level of Detail and Failure Modes

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 1. LEVEL OF DETAIL OF THE PROBABILISTIC MODEL | | | | | | |
| 1.1 Does the level of detail capture all the system’s design features that affect the system’s reliability? | N | N | N | N | N | N |
| 1.2 Is the level of detail at the level of the microprocessors? | N | N | Y | N | Y | ? |
| 1.3 Is the model at a higher level than the microprocessors, but with realistic data? | N | N | NA | Y | NA | N |
| 2. IDENTIFICATION OF FAILURE MODES OF THE DIGITAL SYSTEM | | | | | | |
| 2.1 Is a method used for identifying the failure modes of a digital system? | ? | ? | ? | N | ? | N |
| 2.2 Are the failure modes of features, such as communication, voting, and synchronization, identified to support modeling? | ? | N | N | N | N | N |
| 2.3 Is justification provided that the design requirements of the digital system are unambiguous, complete, and consistent? | N | N | ? | N | N | N |


Review Summary: Modeling of Software Failures

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 3. MODELING OF SOFTWARE FAILURES | | | | | | |
| 3.1 Are software failures explicitly modeled? | Y | Y | N | Y | N | Y |
| 3.2 Is the way software failures are modeled consistent with the way software failures occur? | Y | Y | NA | N | NA | Y |
| 3.3 Does the PRA model “application” and “support” software? | ? | N | NA | N | NA | N |
| 3.4 Does the software reliability model take into consideration the context/boundary conditions? | ? | N | NA | N | NA | N |


Review Summary: Modeling of Dependencies (1)

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 4. MODELING OF DEPENDENCIES | | | | | | |
| 4.1.1 Are failures in communication networks/buses modeled? | N | N | N | N | N | NA |
| 4.1.2 Is inter-channel failure propagation modeled? | N | N | N | N | NA | N |
| 4.1.3 Is intra-channel failure propagation modeled? | N | ? | N | N | N | ? |
| 4.1.4 Is inter-system failure propagation modeled? | NA | ? | N | NA | NA | ? |
| 4.2.1 Are CCFs of hardware within a system modeled? | Y | Y | Y | Y | N | Y |
| 4.2.2 Are CCFs of software within a system modeled? | Y | Y | N | ? | N | Y |
| 4.2.3 Are CCFs of hardware components in different systems modeled? | Y | N | NA | NA | NA | ? |
| 4.2.4 Are CCFs of software in different systems modeled? | ? | N | NA | NA | NA | ? |


Review Summary: Modeling of Dependencies (2)

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 4.3.1 Are the dependencies of a system on the electrical power buses modeled? | Y | Y | Y | N | ? | N |
| 4.3.2 Are the dependencies of a system on cooling modeled? | Y | N | N | N | N | N |
| 4.4 Are dependencies due to sharing of hardware considered? | ? | N | Y | NA | NA | N |
| 4.5.1 Are sensor failures modeled? | Y | Y | Y | Y | Y | N |
| 4.5.2 Are failures related to the components performing voting modeled? | ? | Y | N | N | N | N |
| 4.6.1 Are the failure modes that the fault-tolerant features can detect identified? | Y | N | Y | N | N | N |
| 4.6.2 Did the PRA avoid giving double credit to the fault-tolerant features? | ? | N | Y | N | N | ? |
| 4.6.3 If the detection of a failure of a component depends on other components, was this type of dependency modeled? | NA | ? | Y | N | N | NA |
| 4.6.4 Did the PRA consider the possibility that a fault-tolerant feature may fail to operate properly? | ? | ? | Y | N | N | N |


Review Summary: Modeling of Human Errors and Ease of Integration with a PRA Model

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 5. MODELING OF HUMAN ERRORS | | | | | | |
| 5.1 Are failures when upgrading a system modeled? | N | NA | N | ? | N | NA |
| 5.2 Are failures related to the man-machine interfaces modeled? | ? | ? | N | N | N | N |
| 6. EASE OF INTEGRATION WITH A PRA MODEL | | | | | | |
| 6.1 Can the model be efficiently integrated with a PRA? | Y | Y | Y | Y | N | ? |
| 6.2 If a model of a system has been integrated with a PRA, have all relevant dependencies been properly accounted for? | Y | Y | Y | NA | NA | ? |


Review Summary: Probabilistic Data (1)

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 7. PROBABILISTIC DATA | | | | | | |
| Probabilistic Data for Hardware | | | | | | |
| 7.1 Were the data obtained from the operating experience of the same component being evaluated? | ? | NA | NA | ? | NA | NA |
| 7.2 Were the sources of raw data documented? | Y | NA | NA | Y | NA | NA |
| 7.3 Is the method used in estimating the parameters documented? | N | NA | NA | N | NA | NA |
| 7.4 If generic data are used, are they of the same generic type of component? | NA | Y | N | NA | N | N |
| 7.5 If generic data are used, were they collected from an operating environment similar to that of a nuclear power plant? | NA | Y | ? | NA | N | Y |
| 7.6 If generic data are used, were their sources documented? | NA | Y | Y | NA | Y | Y |


Review Summary: Probabilistic Data (2)

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 7.7 Did the data consider the operating environment (such as temperature and humidity)? | N | Y | Y | Y | N | N |
| 7.8 Were the data for CCF properly analyzed? | ? | ? | N | ? | N | N |
| 7.9 Were the data for modeling fault-tolerant features, e.g., coverage, properly analyzed? | ? | ? | N | N | N | ? |
| 7.10 Is the model used for evaluating the probability of failure of basic events provided? | ? | Y | N | N | NA | Y |
| Probabilistic Data for Software | | | | | | |
| 7.11 Is the method used to assess parameters of software failure, such as failure probability, technically justified? | N | N | NA | N | NA | N |
| 7.12 Was parameter uncertainty properly analyzed? | ? | ? | N | N | N | N |


Review Summary: Documentation and Results

| Criterion | AP1000 | ESBWR | KSNPP | SINTEF | Tricon | Japan ABWR |
|---|---|---|---|---|---|---|
| 8. DOCUMENTATION AND RESULTS | | | | | | |
| 8.1 Are the assumptions made in developing the logic model and probabilistic data documented? | Y | Y | Y | Y | Y | N |
| 8.2 Are assumptions made in developing the PRA realistic and technically justified? | Y | Y | ? | N | N | N |
| 8.3 Are the dominant failure modes of the reliability model documented? | N | N | ? | ? | ? | N |
| 8.4 Does the documentation discuss uncertainty in modeling? | N | N | N | N | N | N |
| 8.5 Was the propagation of uncertainty done properly? | Y | ? | N | N | N | N |
| Total Number of Satisfied Criteria | 15 | 16 | 14 | 8 | 4 | 7 |