Dynamic Risk Analysis Using Alarm Databases to Improve Process Safety and Product Quality: Part II—Bayesian Analysis

Ankur Pariyani and Warren D. Seider
Dept. of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA 19104

Ulku G. Oktem
Risk Management and Decision Processes Center, Wharton School, University of Pennsylvania, Philadelphia, PA 19104

Masoud Soroush
Dept. of Chemical and Biological Engineering, Drexel University, Philadelphia, PA 19104

DOI 10.1002/aic.12642
Published online May 23, 2011 in Wiley Online Library (wileyonlinelibrary.com).

Part II presents step (iii) of the dynamic risk analysis methodology; that is, a novel Bayesian analysis method that utilizes near-misses from distributed control system (DCS) and emergency shutdown (ESD) system databases—to calculate the failure probabilities of safety, quality, and operability systems (SQOSs) and probabilities of occurrence of incidents. It accounts for the interdependences among the SQOSs using copulas, which occur because of the nonlinear relationships between the variables and behavior-based factors involving human operators. Two types of copula functions, multivariate normal and Cuadras–Auge copula, are used. To perform Bayesian simulation, the random-walk, multiple-block, Metropolis–Hastings algorithm is used. The benefits of copulas in sharing information when data are limited, especially in the cases of rare events such as failures of override controllers, and automatic and manual ESD systems, are presented. In addition, product-quality data complement safety data to enrich near-miss information and to yield more reliable results. Step (iii) is applied to a fluidized-catalytic-cracking unit (FCCU) to show its performance. © 2011 American Institute of Chemical Engineers AIChE J, 58: 826–841, 2012

Keywords: abnormal events, near-misses, copulas, unsafe incidents, process safety, product quality, Bayesian theory, Monte-Carlo simulation, probabilistic risk analysis, petroleum refinery

Introduction

Despite the significant improvements in the reliability of instruments and safety practices in the chemical process industries (CPIs) over the last two decades, the potential for serious accidents has increased because of the operation of chemical plants at significantly higher production capacities. Consequently, there is an increased interest in improving risk-assessment techniques. This is especially challenging because accidents are rare events, having very low frequencies of occurrence. Normally, the probabilities of events or incidents are estimated using observed frequencies, but for rare events, data are too sparse.

Alternatively, estimates of the probabilities of the occurrence of accidents are computed using probabilistic risk analysis (PRA) based on Bayesian theory. Since the late 1970s and early 1980s,1–4 this technique has received much attention, particularly in the nuclear industry.5–19 A system (or plant) is decomposed into individual components—for example, power generators, pumps, and heat exchangers—and their failure probabilities are estimated using statistical analysis and expert knowledge. The latter is especially helpful when the data are limited.20–23

Correspondence concerning this article should be addressed to W. D. Seider.

The dependencies among the systems in chemical plants are crucial. Often, there are direct causal relationships between the variables (e.g., temperature and pressure in a chemical reactor), or lurking variables governing interdependences (e.g., human operators or electrical power interrupts—affecting several adjacent equipment items). Consequently, to obtain more practical and reliable risk estimates, it is important to account for these dependencies. Clearly, the explicit modeling of some dependencies is difficult—especially as humans exhibit behavior-based factors that strongly influence their actions, especially during upsets or emergencies. Besides psychological factors, physiological factors (work conditions, weather, etc.) often play a significant role.24

The use of accident sequence precursor (ASP) data in PRA incorporates these dependencies among the component parts.14,25–30 To facilitate this, copulas,31,32 which are multivariate functions used to model the joint probability distribution of the random variables, have been utilized to model the dependencies among different components of the system.25,26,33–36 The copula functions have many attractive features, as will be discussed in the subsections on SQOS Interactions and Copulas, Copulas, and Multivariate Normal and Cuadras–Auge Copulas—especially in permitting the combination of univariate marginal distributions from different distribution families through their correlations. Although widely used in banking and financial sectors, their use in the risk assessment of chemical and nuclear plants has been very limited. Yi and Bier26 and Danaher and Smith37 present excellent case studies on the benefits of copulas in modeling the dependences between the random variables in nuclear plants and marketing, respectively.

In our previous work,25,33 Bayesian analysis using copulas was used for dynamic risk analysis to estimate the failure probabilities of various critical incident scenarios for chemical plants using hypothetical accident precursor data in response to abnormal events. Prior distributions were updated dynamically with incident consequence data to yield mean failure and incident probabilities (using Monte-Carlo integrations) for fixed correlation matrices.

This article, Part II, presents step (iii) of the three-step dynamic risk analysis methodology. First, in Part I, methods were introduced to transform the large distributed control system (DCS) and emergency shutdown (ESD) alarm databases into event-trees and set-theoretic formulations. In this part, a novel Bayesian analysis method using copulas is presented that utilizes the near-miss information in set-theoretic formulations, to yield risk levels in chemical plants and estimates for failure probabilities of the safety, quality, and operability systems (SQOSs) as well as their correlation matrix. The Bayesian model accounts for the interdependences among the SQOSs using copulas, which occur because of the nonlinear relationships between the variables and behavior-based factors involving human operators. As in Part I, the DCS and ESD databases for a large-scale fluidized-catalytic-cracking unit (FCCU) are utilized over a study period divided into 13 equal-time periods. The unit has about 150–200 alarmed variables and as many as 5000–10,000 alarm occurrences per day—as a result of 500–1000 abnormal events daily. For more specifics, see the Case Study 2 in Part I.

The organization of the remainder of this article is as follows. The section entitled Preliminaries introduces the Bayesian model and copulas. The elements of the Bayesian simulation using the multiple-block, Metropolis–Hastings algorithm are presented in the Bayesian Simulation using Random-Walk, Multiple-Block, Metropolis–Hastings Algorithm. Next, the section entitled Results includes a discussion on the selection of the copulas, and contrasting engineering and statistical perspectives, followed by the Conclusions.

Preliminaries

Classical methods, including maximum likelihood estimation,38 focus on the calculation of point estimates. Often, they unreliably estimate the occurrence of rare events due to limited data. Alternatively, in PRA using Bayesian theory, estimates of incident frequencies (with uncertainties) are obtained. For every component, such as sensors and controllers, failure probabilities are estimated using failure data over many years and incident frequencies are modeled as a function of the failure rates of the individual components. But, because incidents normally involve interacting components, the failure probabilities of the individual components are normally inadequate to reliably estimate the probabilities of incidents. To be effective, PRA requires an explicit representation of the dependencies between components, which may not always be feasible, resulting in biases in the estimated incident frequencies.

Consequently, herein, in the Bayesian analysis, prior/expert knowledge is used and interactions of the SQOSs are modeled. In addition, vast amounts of near-miss data in DCS and ESD system databases, over extended periods of time, are utilized to obtain dynamic updates of the SQOS failure probabilities and more reliable probability estimates of rare incidents. As will be shown, the risks associated with low-probability, high-consequence incidents are predicted more accurately.

In this section, the basics of the Bayesian analysis are presented, beginning with a brief review of Bayesian theory, followed by the probability distributions for the SQOSs. The causal relationship between the SQOSs is outlined and the use of copulas is discussed. In addition, the likelihood, posteriors, and priors for the Bayesian model are presented.

Bayesian theory

Bayesian analysis is a statistical approach to reasoning under uncertainty. Herein, the uncertainty associated with the failure probability of a SQOS, h, is modeled initially using a probability density function, f(h), called the prior distribution. An improved failure probability distribution, f(h|Data), called a posterior distribution (expressed in Eq. 1), is inferred using the near-miss data obtained from the DCS and ESD databases. On the basis of Bayes rule, f(h|Data) is proportional to the distribution of data conditional on h, g(Data|h), the likelihood, multiplied by the prior distribution of h, f(h)

$$f(h \mid \mathrm{Data}) = \frac{g(\mathrm{Data} \mid h)\, f(h)}{\int g(\mathrm{Data} \mid h)\, f(h)\, dh} \tag{1}$$

In general, all Bayesian analyses begin with prior probability distributions for unobservable parameters (typically denoted as h), normally based on prior or expert information. Then, using the data, the priors are updated to give posterior distributions. Clearly, as more data are added, the influence of the prior distribution on the resulting posterior distribution decreases.

Similarly, in the Bayesian analysis herein, the failure probabilities of the SQOSs are estimated with the help of prior knowledge and shared information from other SQOSs (using copulas). The near-miss data, in the set-theoretic formulation in Part I, are used to update the prior failure probabilities to obtain posterior failure probabilities. The latter probabilities permit a projection of the frequency of incidents. With more data points, estimates of their posterior failure probabilities become more accurate and certain.

Also, the method of conjugate priors is used to model the unobservable parameters. Their posterior distributions belong to the same family of distributions as their prior distributions. Here, the prior distributions are called conjugate priors to their likelihood distributions. For example, the Gamma distribution is a family of conjugate priors to the Poisson distribution. For more details on Bayesian theory, refer to Gelman et al.39 and Bolstad.40

In addition, the posterior estimates of the correlation matrix of the SQOSs, denoted as W, are computed herein. The Bayesian simulation to obtain the marginal posterior distributions for hs and W is carried out using the Metropolis–Hastings algorithm. The algorithm to implement these steps is shown in Figure 1. The results of the statistical analysis are obtained using the R and MATLAB packages.

Probability distributions for the SQOSs

As discussed in Part I, five SQOSs are defined for a typical chemical plant. They are (a) the basic process control system (BPCS) or SQOS1, (b) operator (machine + human) corrective actions, level I, or SQOS2, (c) operator (machine + human) corrective actions, level II, or SQOS3, (d) the override controller, or SQOS4, and (e) automatic ESD, or SQOS5.

The number of abnormal events in a time period is the failure count of the first SQOS (i.e., the BPCS denoted as SQOS1) in that period. Because the abnormal events in any period occur with an average rate, and independently of the last period, herein, the number of abnormal events in time period t, nt, is assumed to follow a Poisson distribution with the (average) failure rate of SQOS1 denoted as h1. That is

$$n_t \sim \mathrm{Poisson}(h_1) \tag{2a}$$

and

$$f(n_t \mid h_1) \propto (h_1)^{n_t} e^{-h_1} \quad \text{(likelihood distribution)} \tag{2b}$$

Likelihood data on nt can be the number of abnormal events for all the primary variables, $n_T$ (see Table 5, Part I), or the number of abnormal events for pP1, $n_{pP1}$ (see Table 6, Part I) or the number of abnormal events for all the primary process variables, $n_{pP}$ (see Table 7, Part I). As discussed above, the Gamma family of distributions is the conjugate family for Poisson observations (i.e., a Gamma prior when combined with Poisson likelihood leads to a Gamma posterior). Therefore, to simplify calculations, h1 is assumed to have a Gamma prior distribution with a1 and b1 as shape parameters

$$f(h_1) \propto (h_1)^{a_1 - 1} e^{-b_1 h_1} \tag{3}$$

For the other four SQOSs, as the outcomes of their actions on any abnormal event can have one of two possible outcomes, "success" and "failure," they are modeled as independent and identical Bernoulli trials with failure probabilities, hj, j = 2,…, 5. The prior distributions for hj are assumed to be Beta distributions (conjugate priors) with the shape vector [aj, bj], j = 2,…, 5.

Figure 1. Overall schematic of the Bayesian analysis framework.

SQOS interactions and copulas

The interactions between the performances of the SQOSs (measured as failure rates or probabilities) are due to the following.

Nonlinear Relationships Between the Variables. As discussed in Part I and by Pariyani et al.,41 because of the interdependent relationship between the variables of a process, the effects of the controlling actions of the SQOSs on variables/group of variables get channeled to other variables/groups of variables, thereby, influencing the controlling actions of the other SQOSs. For example, the impact of corrective actions of the override controllers on the primary variables get channeled to the secondary variables, which return the majority of the secondary variables to their green-belt zones (or normal operating ranges)—this clearly improves the failure probabilities of the corrective actions of the human operators.

Behavior-Based Factors. Stressor performance-shaping factors24,41 associated with human operators are likely to influence the performances of human-operated SQOSs. These, in turn, influence other SQOSs through the nonlinear effects of their variables. For example, increases in the failure rates of the BPCSs, h1, are likely to deteriorate the performances of the operators (i.e., increase h2 and h3). These, in turn, could counter the corrective actions of SQOS1 and SQOS4. As SQOS5, the ESD system, also involves human interventions, its failure probability, h5, is also likely to be influenced by changes in h1, h2, h3, and h4.

Herein, these causal relationships between the SQOSs are accounted for using a dependence matrix, which consists of pairwise interaction parameters.

Because the random variables, hk (k = 1, 2,…, 5), are not independent, with their marginal distributions belonging to different families, copulas are used to describe their joint distribution. Note that for these multivariate distributions, given a parent parametric distribution function, the marginals are derived by integration. Consequently, the marginals are from the same family; for example, the marginals of a multivariate normal distribution are also normal. By contrast, with copula modeling, the marginals, which need not be from the same family, are combined and interactions between the random variables are accounted for using the dependence matrix.31,37

To describe the dependences between the related random variables, measures of dependences, such as rank-correlation measures (e.g., Spearman's ρ and Kendall's τ), which are insensitive to the marginal distributions of the raw data, must be used. For example, when random variables are not normally distributed, Pearson's or linear correlations (which assume normality) may no longer be meaningful measures of dependence. The important feature of copula modeling is that it permits the use of rank-correlation measures.

Another advantage of the dependence matrix is that it enables the random variables to interact among each other and share information. This is particularly important when data on random variables are limited; for example, associated with rare events. In such situations, information associated with other random variables is channeled through the elements of dependence matrix (i.e., interaction parameters). Clearly, the choices of copulas determine the dependences among random variables, as will be discussed in the sections entitled Copulas, Likelihood, Joint Posterior and Prior Distributions, and Multivariate Normal and Cuadras–Auge Copulas, and the Results.

Copulas

A copula is a function that links a multivariate cumulative distribution to its univariate cumulative marginals. Stated formally, according to the Sklar theorem,42 given a joint cumulative distribution function (CDF), F(h1, h2,…, hN), for the random variables, h1, h2,…, hN, with marginal CDFs, F1(h1), F2(h2),…, FN(hN), F can be written as a function of its marginals

$$F(h_1, h_2, \ldots, h_N) = C[F_1(h_1), F_2(h_2), \ldots, F_N(h_N)] \tag{4}$$

where C is the copula (function). Given that Fk and C are differentiable, the joint density distribution, f(h1, h2,…, hN), can be written as

$$f(h_1, h_2, \ldots, h_N) = c\big(F_1(h_1), F_2(h_2), \ldots, F_N(h_N)\big) \times \prod_{k=1}^{N} f_k(h_k) \tag{5}$$

where fk(hk) is the density distribution corresponding to Fk(hk) (denoted as uk hereafter), and $c = \partial^N C/(\partial u_1 \cdots \partial u_N)$ is called the copula density distribution.

Therefore, when hk, k = 1,…, N, are independent, c = 1 and $f(h_1, h_2, \ldots, h_N) = \prod_{k=1}^{N} f_k(h_k)$. Here, the copula is $C[u_1, u_2, \ldots, u_N] = u_1 u_2 \cdots u_N$, also known as the "independence copula." Note that no restrictions are placed on the marginal distributions; for example, a bivariate distribution can be constructed using beta and gamma distributions. The copulas do not impact the marginal distributions of the associated random variables and only model the dependences between them—they have no other role.

Several copula types have been reported.31,32,43,44 Here, three types are reviewed:

Elliptical copulas are derived from common multivariate (elliptical) distributions, which can be extended to arbitrary dimensions. They have many parameters, which facilitate data regression. The Gaussian copula (derived from the multivariate normal distribution) and the t-copula (derived from multivariate Student's t-distribution) are two of the most common elliptical copulas. Compared with other copulas, the Gaussian copula has a nearly full range (-1, 1) of pairwise correlation coefficients—yielding a general and robust copula, which is used in most applications, including dependences between the SQOSs in the Results. The Gaussian copula, however, lacks the tail dependence; that is, the probability of observing extreme observations in all random variables at once. This limitation can be addressed by using t-copula or Archimedean copula.

Archimedean copulas are suitable for low-dimensional systems (n ≤ 2) because of their simple closed functional forms. For n-dimensional distributions, serial iterates of Archimedean copulas are constructed, but these do not provide arbitrary pairwise correlations. Furthermore, restrictions on Kendall's τ limits reduce the usefulness of these copulas.26 Examples include the Gumbel, Frank, and Clayton copulas.31,32,43

Marshall-Olkin copulas may be derived from a simple stochastic process model called a Poisson shock model, where components are subjected to fatal shocks, following Poisson processes.32 One special copula in this family is the Cuadras–Auge copula45 having an n-dimensional form capable of modeling arbitrary pairwise correlations.26 It is used herein to model the dependences between the SQOSs.

Likelihood, joint posterior, and prior distributions

The joint likelihood distribution, L(Data|h), is the product of likelihood distributions of the SQOSs

$$L(\mathrm{Data} \mid h) = (h_1)^{n_t} e^{-h_1} \prod_{j=2}^{N_s} (h_j)^{K_t^j} (1 - h_j)^{L_t^j} \tag{6}$$

where Ns is the number of SQOSs (i.e., five for the FCCU), and $K_t^j$ and $L_t^j$ denote the failure and success counts for SQOSj in time period, t. For the total study period, T, the $K_T^j$ and $L_T^j$ are the failure and success counts for all the primary variables, as given in Table 5, Part I, $K_{pP1}^j$ and $L_{pP1}^j$ are for pP1 (Table 6, Part I), and $K_{pP}^j$ and $L_{pP}^j$ are for all the primary process variables (Table 7, Part I). Clearly, once data are added, the joint likelihood distribution is a function of the hs.

As shown in Figure 1, the joint posterior distribution (or target distribution), which is a function of the failure rate and probabilities of the SQOSs and their correlation matrix, is proportional to their joint prior distribution multiplied by their joint likelihood distribution

$$f(h_1, h_2, \ldots, h_{N_s}, W \mid \mathrm{Data}) \propto \underbrace{(h_1)^{n_t} e^{-h_1} \prod_{j=2}^{N_s} (h_j)^{K^j} (1 - h_j)^{L^j}}_{\text{likelihood}} \times \underbrace{f(h_1, h_2, \ldots, h_{N_s} \mid W) \times f(W)}_{\text{joint prior}} \tag{7}$$

where W is the Spearman rank correlation matrix between the failure rate and probabilities of the SQOSs, and Data is the set of success and failure counts for SQOS1–5, as given in Tables 5–7 of Part I.

The joint prior distribution of hs, conditional on their correlation matrix, combines the marginal prior distributions of hs and their dependences using the copula density distribution, c, which is a function of the elements of the dependence matrix; that is

$$f(h_1, h_2, \ldots, h_{N_s} \mid W) \propto c\big(F_1(h_1), F_2(h_2), \ldots, F_{N_s}(h_{N_s})\big) \times \prod_{k=1}^{N_s} f_k(h_k) \tag{8}$$

As discussed earlier, the prior distributions for the hs are

$$f_1(h_1) \propto (h_1)^{a_1 - 1} e^{-b_1 h_1} \tag{9a}$$

$$f_j(h_j) \propto (h_j)^{a_j - 1} (1 - h_j)^{b_j - 1}, \quad j = 2, \ldots, 5 \tag{9b}$$

where $a = [a_1, a_2, a_3, a_4, a_5]^T$ and $b = [b_1, b_2, b_3, b_4, b_5]^T$ are the shape-factor vectors. These vectors are selected to characterize the distributions before data are entered. They often reflect prior knowledge of the process or theoretical assumptions—and are especially important when few data points are available. Of course, expert knowledge or data for similar systems (or process units) should be used when available.

The prior distribution for the Spearman rank correlation matrix, W, is taken as the Inverse-Wishart distribution

$$f_w(W) \propto |S_{\mathrm{init}}|^{m/2}\, |W|^{-(m+n+1)/2} \exp\!\big(-0.5 \times \mathrm{tr}(S_{\mathrm{init}} (W)^{-1})\big) \tag{10}$$

where Sinit and m are the initial scale matrix and the number of degrees of freedom, respectively. Herein, the latter is assumed to be 10 because there are 10 different pairwise correlation coefficients for five SQOSs.

Multivariate normal and Cuadras–Auge copulas

Because of their abilities to model arbitrary pairwise correlations (both positive and negative) between the random variables, the multivariate normal and Cuadras–Auge copulas are used herein, with details provided in the next subsections.

Multivariate Normal Copula. The multivariate normal copula has been widely used in risk analysis.31,32,35,37,43 Similar to multivariate normal distributions, it models dependencies (using pairwise correlations), but does so for random variables with arbitrary marginals.31,35

The multivariate normal copula density is

$$c(u_1, u_2, \ldots, u_5) = \exp\!\left[ -\{\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_5)\} (R^{-1} - I) \{\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_5)\}^T / 2 \right] \Big/ |R|^{1/2} \tag{11}$$

where R, $\Phi^{-1}$, and I are the product-moment correlation matrix, the normal inverse transformation, and the 5 × 5 identity matrix. The Spearman correlation matrix, W, with elements wkq, can be constructed from R, with elements rkq, using the following relationship35,46

$$r_{kq} = 2 \sin(\pi w_{kq} / 6) \tag{12}$$

This copula density applies for positive and negative correlations.

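Cuadras–Auge copula. The Cuadras and Auge copula density for positive correlations is

$$c(u_1, u_2, \ldots, u_5) = \frac{\partial^5 C}{\partial u_1\, \partial u_2\, \partial u_3\, \partial u_4\, \partial u_5} = \prod_{k=1}^{5} \left( \prod_{h=1}^{5-k} \big(1 - a_{k-(k+h)}\big)\; u_k^{\,\prod_{h=1}^{5-k} \left(1 - a_{k-(k+h)}\right) - 1} \right) \tag{13a}$$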

where

uk _� � � � _�un (13b)


and

$$u_k = F_k(h_k) = \int_0^{h_k} f_k(h_k)\, dh_k, \quad k = 1, \ldots, 5 \tag{13c}$$

Note that _� denotes[ or �, as appropriate. Here, the correlation coefficient, $a_{k-q}$, for interactions between hk and hq (k, q = 1,…, 5, k ≠ q) is related to the Spearman rank correlation coefficient, w(hk, hq), by

$$w(h_k, h_q) = \frac{3 a_{k-q}}{4 - a_{k-q}} \quad \text{for } k \neq q \tag{14}$$

High w(hk, hq) indicates strong interaction between hk and hq—and zero indicates independence.

The constraint of Eq. 13b de-emphasizes the larger CDFs and ensures that uk in Eq. 13a is sorted in descending order for each iteration of the M–H algorithm. For example, when u2 > u3 > u1 > u5 > u4, the copula density is

$$\begin{aligned} c(u_1, u_2, \ldots, u_5) = {} & \Big( (1 - a_{2-1})(1 - a_{2-3})(1 - a_{2-4})(1 - a_{2-5})\; u_2^{(1 - a_{2-1})(1 - a_{2-3})(1 - a_{2-4})(1 - a_{2-5}) - 1} \Big) \\ & \times \Big( (1 - a_{3-1})(1 - a_{3-4})(1 - a_{3-5})\; u_3^{(1 - a_{3-1})(1 - a_{3-4})(1 - a_{3-5}) - 1} \Big) \\ & \times \Big( (1 - a_{1-4})(1 - a_{1-5})\; u_1^{(1 - a_{1-4})(1 - a_{1-5}) - 1} \Big) \\ & \times \Big( (1 - a_{5-4})\; u_5^{(1 - a_{5-4}) - 1} \Big) \end{aligned} \tag{15}$$

On the basis of the study of Cuadras and Auge,45 the functional form of copula having negative correlation values is developed herein. When $a_{k-q} < 0$, uk is replaced with (1 − uk) and $a_{k-q}$ is replaced with $(-a_{k-q})$ in Eqs. 13a and 13b. Also, Eq. 14 becomes

$$w(h_k, h_q) = w_{kq} = \frac{3 a_{k-q}}{4 + a_{k-q}} \quad \text{for } k \neq q \tag{16}$$

The resulting function defines the copula density for negative correlations. Note that the copula density is piecewise differentiable continuous on wkq ∈ [−1, 1]. It is continuous over the entire domain, but nondifferentiable at wkq = 0.
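The two copula densities of this subsection, written out as an added Python example (not code from the paper, whose results were obtained with R and MATLAB). It evaluates Eq. 11 with R built from a Spearman matrix through Eq. 12, and Eq. 13a for positive correlations only; the function names, the 0-based symmetric matrix `a[k][q]` and the sample values are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

def spearman_to_pearson(w):
    """Eq. 12: product-moment r_kq from the Spearman coefficient w_kq."""
    return 2.0 * math.sin(math.pi * w / 6.0)

def cholesky(R):
    """Lower-triangular L with L L^T = R; raises if R is not positive definite."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("R is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def gaussian_copula_density(u, W):
    """Eq. 11 for any dimension, with W the Spearman rank correlation matrix."""
    n = len(u)
    R = [[1.0 if k == q else spearman_to_pearson(W[k][q]) for q in range(n)]
         for k in range(n)]
    # An element-wise Eq. 12 transform need not give a valid R; cholesky() detects that.
    L = cholesky(R)
    z = [NormalDist().inv_cdf(x) for x in u]           # normal inverse transformation
    y = []                                             # solve L y = z, so z^T R^-1 z = y^T y
    for i in range(n):
        y.append((z[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y) - sum(v * v for v in z)   # z^T (R^-1 - I) z
    sqrt_det = math.prod(L[i][i] for i in range(n))        # |R|^(1/2)
    return math.exp(-0.5 * quad) / sqrt_det

def a_to_spearman(a_kq):
    """Eq. 14 (positive correlations)."""
    return 3.0 * a_kq / (4.0 - a_kq)

def spearman_to_a(w_kq):
    """Eq. 14 solved for a_kq: a = 4w / (3 + w)."""
    return 4.0 * w_kq / (3.0 + w_kq)

def cuadras_auge_density(u, a):
    """Eq. 13a for positive correlations; a[k][q] = a[q][k] lies in [0, 1)."""
    # Eq. 13b: process the u_k in descending order (ties keep their input order).
    order = sorted(range(len(u)), key=lambda k: u[k], reverse=True)
    density = 1.0
    for pos, k in enumerate(order):
        later = order[pos + 1:]                        # variables with a smaller CDF value
        p = math.prod(1.0 - a[k][q] for q in later)    # empty product is 1 for the smallest u
        density *= p * u[k] ** (p - 1.0)
    return density

# Constructed sample values (not taken from the paper).
SAMPLE_A = [[0.0 if k == q else 0.05 * (k + q + 1) for q in range(5)] for k in range(5)]
SAMPLE_U = [0.5, 0.9, 0.7, 0.1, 0.3]   # u2 > u3 > u1 > u5 > u4, the ordering used for Eq. 15
SAMPLE_W = [[1.0 if k == q else 0.3 for q in range(5)] for k in range(5)]

if __name__ == "__main__":
    print("multivariate normal copula density:", gaussian_copula_density(SAMPLE_U, SAMPLE_W))
    print("Cuadras-Auge copula density:       ", cuadras_auge_density(SAMPLE_U, SAMPLE_A))
```

Inside an M–H run, a proposed W for which `cholesky` raises has no valid multivariate normal copula and would be rejected.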

Bayesian Simulation Using Random-Walk, Multiple-Block, Metropolis–Hastings Algorithm

As shown in Figure 1, the marginal posterior distributions of the hs and their correlation matrix, W, are obtained using the Metropolis–Hastings (M–H) algorithm—a family of Markov chain simulation methods. These draw samples from approximate distributions (known as proposal or jumping distributions) and correct them to better approximate the target posterior distribution.

The powerful M–H algorithm produces a correlated sequence of draws from the target density that may be difficult to sample using independence sampling methods.39,40,47,48 To define the algorithm, one needs to begin with a suitable starting point, $h^{(0)}$, for which $f(h^{(0)} \mid \mathrm{Data}) > 0$; for example, maximum likelihood estimates (MLEs) of the failure rate and probabilities. For the next iteration, two steps are involved: (a) sample a proposal value, $h_1^*$, from a proposal distribution, $J_{t1}$ (discussed in the subsection entitled Proposal Distributions for the Failure Rate and Probabilities of SQOSs) and (b) accept the proposal value as the next iterate in the Markov chain according to the probability, min(r1, 1), where

$$r_1 = \frac{f(h_1^*, h_2^0, \ldots, h_5^0, W^0 \mid \mathrm{Data})}{f(h_1^0, h_2^0, \ldots, h_5^0, W^0 \mid \mathrm{Data})} \cdot \frac{J_{t1}(h_1^0 \mid h_1^*)}{J_{t1}(h_1^* \mid h_1^0)} \tag{17}$$

More specifically, first a random number is obtained between 0 and 1. When r1 < 1, the proposal value is accepted when that random number is less than r1. Otherwise, the proposal value is rejected and the current h1 is retained. The ratio, r1, is proportional to the relative change in the posterior distribution using the proposal value and indicates the goodness of the match of the proposal value to the target distribution compared with the previous value. This relative change is multiplied by $J_{t1}(h_1^0 \mid h_1^*) / J_{t1}(h_1^* \mid h_1^0)$ to account for asymmetric jumping in the proposal distribution. When the proposal distribution is symmetric, this factor is unity.

For high-dimensional systems, such as the FCCU, a single block M–H algorithm that rapidly converges to the target density can be difficult to construct. In such cases, the variate space can be discretized into smaller blocks, with Markov chains constructed in the smaller blocks. This modified algorithm is referred to as the multiple-block, Metropolis–Hastings algorithm.48 Note that the proposal distributions are chosen to be random-walk distributions with the mean at iteration t equal to the previous iteration value.

Using these two steps, a single iteration of the multiple-block, M–H algorithm is completed by updating each block sequentially, using the above probabilities of a move, given the current value for the other blocks. Many iterations are implemented until the initial state is indiscernible. Typically, the initial samples are not completely valid because the Markov chain has not been stabilized. These burn-in samples are discarded. The remaining values are accepted and represent samples from the posterior distribution. Note that samples are taken from regions where the posterior probability is high and the chains begin to mix in these regions. When effective, the proposal density matches the shape of the target distribution, which is usually unknown.

Next, a procedure is presented to carry out the Bayesiansimulation to obtain the marginal posterior distributions of hand W.

Procedure for Bayesian simulation

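In algorithmic form, the following recursive procedure is used to obtain simulated values:

(1) Specify initial values, θ(0) and W(0).

(2) Repeat for t = 1, 2, …, M (number of iterations)

(a) Propose θ1* using Eq. 18:

$$ \text{Set } \theta_1^t = \begin{cases} \theta_1^* & \text{with probability } \min(r_1, 1) \\ \theta_1^{t-1} & \text{otherwise} \end{cases} $$

$$ \text{with } r_1 = \frac{f(\theta_1^*, \theta_2^{t-1}, \ldots, \theta_5^{t-1}, W^{t-1} \mid \mathrm{Data})}{f(\theta_1^{t-1}, \theta_2^{t-1}, \ldots, \theta_5^{t-1}, W^{t-1} \mid \mathrm{Data})} \cdot \frac{J_1^t(\theta_1^{t-1} \mid \theta_1^*)}{J_1^t(\theta_1^* \mid \theta_1^{t-1})} $$

(b) Repeat for j = 2, …, 5:

Propose θj* using Eq. 19:

$$ \text{Set } \theta_j^t = \begin{cases} \theta_j^* & \text{with probability } \min(r_j, 1) \\ \theta_j^{t-1} & \text{otherwise} \end{cases} $$

$$ \text{with } r_j = \frac{f(\theta_1^t, \ldots, \theta_{j-1}^t, \theta_j^*, \theta_{j+1}^{t-1}, \ldots, \theta_5^{t-1}, W^{t-1} \mid \mathrm{Data})}{f(\theta_1^t, \ldots, \theta_{j-1}^t, \theta_j^{t-1}, \theta_{j+1}^{t-1}, \ldots, \theta_5^{t-1}, W^{t-1} \mid \mathrm{Data})} \cdot \frac{J_j^t(\theta_j^{t-1} \mid \theta_j^*)}{J_j^t(\theta_j^* \mid \theta_j^{t-1})} $$

(c) Propose W* using Eq. 20:

$$ \text{Set } W^t = \begin{cases} W^* & \text{with probability } \min(r_w, 1) \\ W^{t-1} & \text{otherwise} \end{cases} $$

$$ \text{with } r_w = \frac{f(\theta_1^t, \theta_2^t, \ldots, \theta_5^t, W^* \mid \mathrm{Data})}{f(\theta_1^t, \theta_2^t, \ldots, \theta_5^t, W^{t-1} \mid \mathrm{Data})} \cdot \frac{J_w^t(W^{t-1} \mid W^*)}{J_w^t(W^* \mid W^{t-1})} $$

(3) Save {(θ(1), W(1)), (θ(2), W(2)), …, (θ(M), W(M))}.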

Next, the proposal distributions for θs and their correlation matrix, W, are discussed.

Proposal distributions for the failure rate and probabilities of SQOSs

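The proposal distribution for failure rate of SQOS1 is taken to be

$$ \theta_1^* \sim \mathrm{Gamma}(A_1, B_1) \tag{18a} $$

with expected value

$$ E(\theta_1^*) = \frac{A_1}{B_1} = \theta_1^{t-1} \tag{18b} $$

that is

$$ \theta_1^* \sim \mathrm{Gamma}\left(A_1, \frac{A_1}{\theta_1^{t-1}}\right) \tag{18c} $$

with the conditional proposal distribution

$$ J_1^t(\theta_1^* \mid \theta_1^{t-1}) = \frac{\left(A_1/\theta_1^{t-1}\right)^{A_1}}{\Gamma(A_1)} (\theta_1^*)^{A_1 - 1} e^{-\left(A_1/\theta_1^{t-1}\right)\theta_1^*} \tag{18d} $$

and

$$ \mathrm{var}(\theta_1^*) = \frac{(\theta_1^{t-1})^2}{A_1} \tag{18e} $$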

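The proposal distributions for the failure probabilities of SQOS2–5 are taken as

$$ \theta_j^* \sim \mathrm{Beta}(A_j, B_j), \quad j = 2, \ldots, 5 \tag{19a} $$

with

$$ E(\theta_j^*) = \frac{A_j}{A_j + B_j} = \theta_j^{t-1}, \quad j = 2, \ldots, 5 \tag{19b} $$

Therefore

$$ \theta_j^* \sim \mathrm{Beta}\left(A_j, A_j\left(\frac{1 - \theta_j^{t-1}}{\theta_j^{t-1}}\right)\right), \quad j = 2, \ldots, 5 \tag{19c} $$

with the conditional proposal distribution

$$ J_j^t(\theta_j^* \mid \theta_j^{t-1}) = \frac{\Gamma\left(\frac{A_j}{\theta_j^{t-1}}\right)}{\Gamma(A_j)\,\Gamma\left(A_j\,\frac{1 - \theta_j^{t-1}}{\theta_j^{t-1}}\right)} (\theta_j^*)^{A_j - 1} (1 - \theta_j^*)^{A_j \frac{1 - \theta_j^{t-1}}{\theta_j^{t-1}} - 1} \tag{19d} $$

and

$$ \mathrm{var}(\theta_j^*) = \frac{(\theta_j^{t-1})^2 (1 - \theta_j^{t-1})}{A_j + \theta_j^{t-1}} \tag{19e} $$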

The variances of the proposal distributions are displayed as a function of the failure rate and probabilities in Figure 2.

Figure 2. Variance of proposal distributions as a function of (a) failure rate (Eq. 18e), and (b) failure probabilities (Eq. 19e), with Aj = 1.7, j = 2, …, 5. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Note that Ak, k = 1, …, 5, are free parameters that determine the convergence and autocorrelation of the samples. From the above functional forms of the variances, some information regarding the Ak can be inferred. When the variances of the proposal distribution are very small (using high Ak), the acceptance rates of samples (i.e., the fraction of proposed samples accepted) are very high. However, these samples tend to move slowly about the space and/or become trapped in the vicinity of local minima, resulting in high autocorrelations when they are not well "mixed." On the other hand, when the variances are very high (using small Ak), the samples tend to experience jumps, resulting in less autocorrelation, but lower acceptance rates of the moves—because of the increased likelihood of jumping to regions having a lower probability density. In summary, the autocorrelations and acceptance rates of the moves are sensitive to Ak, with clear tradeoffs between them. For the simulation studies herein, the following A gave low autocorrelation and moderately high sample acceptance rates: A1 ∈ [30, 40], A2 ∈ [35, 45], A3 ∈ [15, 25], A4 ∈ [1.5, 2], and A5 ∈ [1.5, 2].

Proposal distribution for the correlation matrix

The proposal distribution for the correlation matrix is chosen to be a random-walk, Inverse-Wishart distribution with its expected value at iteration t equal to its previous iteration value. Therefore

$$ \Gamma^* \sim \text{Inv-Wishart}_{\nu}(S^{-1}) \tag{20a} $$

where Γ* is the unconstrained correlation matrix (proposed), S is the scale matrix, and ν is the number of degrees of freedom (assumed to be 10). The elements of Γ* are scaled to obtain the scaled correlation matrix, W*

$$ W^* = \mathrm{diag}\left((\Gamma^*)^{-1/2}\right)\,\Gamma^*\,\mathrm{diag}\left((\Gamma^*)^{-1/2}\right) \tag{20b} $$

This transformation sets the diagonal elements of Γ* to unity and scales the other elements to the range [−1, 1].

Furthermore

$$ E(W^*) = (\nu - n - 1)^{-1} S = W^{t-1} \tag{20c} $$

Therefore

$$ S = (\nu - n - 1)\,W^{t-1} = (10 - 5 - 1)\,W^{t-1} = 4\,W^{t-1} \tag{20d} $$

Thus, the probability density function is

$$ J_w^t(W^* \mid W^{t-1}) \propto |S|^{\nu/2}\,|W^*|^{-(\nu + n + 1)/2} \exp\left(-0.5\,\mathrm{tr}\left(S (W^*)^{-1}\right)\right) \tag{20e} $$

Note that the Inverse-Wishart distribution gives positive-definite samples (matrices)—a condition required for correlation matrices.39,49

Alternatively, elements of the correlation matrix can be sampled individually from different univariate proposal distributions (e.g., beta distributions)—without maintaining a positive-definite correlation matrix. Then, the sample elements that destroy the positive-definiteness of the matrix are rejected and re-sampled until the proposal matrix becomes positive-definite. The entire proposal matrix is accepted or rejected using the M–H algorithm. This individual sampling algorithm yielded a much lower acceptance rate for the correlation matrix than that obtained using the Inverse-Wishart distribution. In addition, the posterior variances of the elements were much higher, introducing more uncertainty and convergence problems. Hence, the random-walk, Inverse-Wishart distribution was used as the proposal distribution for the correlation matrix.

Results

Results for the simulation studies are presented next. Several Markov chains of 70,000 iterations for different priors and M–H starting points converged rapidly for θ1, θ2, and θ3 (the differences between the means and variances of the posterior distributions for different priors and M–H starting points were found to be negligible). In addition, a low autocorrelation was found among the samples.

The results were obtained using "Jeffrey's priors" (noninformative prior distributions, invariant under reparameterization of the parameters) and "uniform priors." However, the Jeffrey prior for SQOS1 yields a1 = 0.5, b1 = 0, which is improper (as b1 > 0 for proper priors). Therefore, a1 and b1 are assumed to be equal to 0.01. For SQOS2–4, the parameters (for Jeffrey's priors) for the beta distributions are a2, a3, a4, b2, b3, b4 = 0.5. For SQOS5, a uniform prior is assumed; that is, a5 = 1, b5 = 1. Note that a Jeffrey prior for SQOS5 led to significant autocorrelation among the SQOS5 posterior samples. This does not occur when b5 > a5 > 0.5. In summary, a = [0.01, 0.5, 0.5, 0.5, 1]^T and b = [0.01, 0.5, 0.5, 0.5, 1]^T were used.

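For the correlation matrix, the following prior value was used:25

$$ W_{init} = \begin{bmatrix} 1 & 0.8 & 0.7 & 0.6 & 0.5 \\ 0.8 & 1 & 0.8 & 0.7 & 0.6 \\ 0.7 & 0.8 & 1 & 0.8 & 0.7 \\ 0.6 & 0.7 & 0.8 & 1 & 0.8 \\ 0.5 & 0.6 & 0.7 & 0.8 & 1 \end{bmatrix} \tag{21} $$

Sinit was assumed to be

$$ S_{init} = (\nu - n - 1) \times E(W_{init}) = 4 \times W_{init} = 4 \times \begin{bmatrix} 1 & 0.8 & 0.7 & 0.6 & 0.5 \\ 0.8 & 1 & 0.8 & 0.7 & 0.6 \\ 0.7 & 0.8 & 1 & 0.8 & 0.7 \\ 0.6 & 0.7 & 0.8 & 1 & 0.8 \\ 0.5 & 0.6 & 0.7 & 0.8 & 1 \end{bmatrix} \tag{22} $$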

Here, also, several prior matrices yielded rapid convergence to their posterior matrices.

Results for pP1 using the multivariate normal and Cuadras–Auge copulas are presented next. Later, results for all the primary and primary process variables are compared to show the benefits of using quality data in dynamic risk analyses. Finally, engineering and statistical perspectives on copula modeling are contrasted.

Results for pP1 using the multivariate normal copula

On the basis of the likelihood data in Table 6, Part I, the histograms of the marginal posterior failure rate and probabilities of the SQOSs, θ, for pP1 (over the entire study period, consisting of 13 equal periods), calculated using multivariate normal copula, are presented in Figure 3. Their 95% posterior intervals and means are
θ1 = [135.781, 149.029], E(θ1) = 142.624
θ2 = [0.061, 0.087], E(θ2) = 0.074
θ3 = [0.782, 0.909], E(θ3) = 0.851
θ4 = [0.001, 0.054], E(θ4) = 0.017
θ5 = [0.0005, 0.719], E(θ5) = 0.215

Because of the availability of sufficient data for SQOS1–3 (Table 6, Part I), the 95% posterior intervals and means for θ1, θ2, and θ3 were insensitive to the prior values. However, for SQOS4–5, because of far fewer data points, their failure probabilities vary significantly with the parameters of the prior distributions. For these results, a4 = b4 = 0.5 and a5 = b5 = 1, with prior means, E(θ4) = E(θ5) = 0.5, were used.

Over 13 periods, on average, 142 abnormal events occurred in each period. The mean posterior failure probability (and associated variance) of the operators level I corrective actions is fairly low [E(θ2) = 0.074]—indicating their robust performance. However, for operators level II corrective actions, the failure probability is abnormally high [E(θ3) = 0.851], indicating their difficulties in keeping the variables within their orange-belt zones. Stated differently, the probability that pP1 moves from its yellow- to its orange-belt zone is just 7.4%, whereas the probability of moving from its orange- to its red-belt zone is 85%. Clearly, less-likely, moderately-critical abnormal events are very likely to propagate into most-critical abnormal events due to the ineffective level II control actions (i.e., SQOS3). Hence, for the FCCU, to prevent the occurrence of most-critical abnormal events, it is important to prevent the occurrence of moderately-critical abnormal events.

The failure probability of the override controller is also quite low, E(θ4) = 0.017. However, its variance is high, as compared with that of SQOS1–3. Also, the failure probability of the ESD system is relatively uncertain, due to the availability of few data points. In these cases, past performances (over several months and years) and/or expert knowledge regarding their failures, are desirable. The latter could be derived from the dynamic risk analysis of near-miss data from related plants. Note that similar results can be obtained for individual time periods.

Figure 3. Marginal posterior distributions for θ using multivariate normal copula.

Given estimates of the failure rate and probabilities of the SQOSs, incident probabilities can be estimated; that is, probabilities of the occurrence of an ESD or accident. Two types of incident probabilities are computed herein: (a) incident probabilities per period and (b) incident probabilities per abnormal event. The probability of occurrence of an ESD per period, $p^{tp}_{ESD}$, is obtained by multiplying the failure rate and the probabilities of SQOS1–4 and the success probability of SQOS5. And, the probability of occurrence of an accident in each period, $p^{tp}_{Accident}$, is obtained by multiplying the failure rate of SQOS1 and the failure probabilities of SQOS2–5. Similarly, the probability of the occurrence of an ESD per abnormal event, $p^{AE}_{ESD}$, is obtained by multiplying the failure probabilities of SQOS2–4, and the success probability of SQOS5, whereas the probability of occurrence of an accident per abnormal event, $p^{AE}_{Accident}$, is obtained by multiplying the failure probabilities of SQOS2–5. Symbolically

$$ p^{tp}_{ESD} = \theta_1 \theta_2 \theta_3 \theta_4 (1 - \theta_5) \tag{23a} $$

$$ p^{tp}_{Accident} = \theta_1 \theta_2 \theta_3 \theta_4 \theta_5 \tag{23b} $$

$$ p^{AE}_{ESD} = \theta_2 \theta_3 \theta_4 (1 - \theta_5) \tag{23c} $$

$$ p^{AE}_{Accident} = \theta_2 \theta_3 \theta_4 \theta_5 \tag{23d} $$

To obtain the posterior distributions of the incident probabilities, random sampling from the distributions of the failure rate and probabilities of the SQOSs is used. Figure 4 presents histograms of the probabilities of occurrence of an ESD and an accident, per period. Their 95% posterior intervals and means are as follows
$p^{tp}_{ESD}$ = [5.6e-3, 0.443], E($p^{tp}_{ESD}$) = 0.124
$p^{tp}_{Accident}$ = [1.4e-5, 0.159], E($p^{tp}_{Accident}$) = 0.032

Similarly, Figure 5 presents histograms of the probabilities of occurrence of an ESD and an accident, per abnormal event. Their 95% posterior intervals and means are as follows
$p^{AE}_{ESD}$ = [3.9e-5, 3.1e-3], E($p^{AE}_{ESD}$) = 8.7e-4
$p^{AE}_{Accident}$ = [2.4e-7, 1.1e-3], E($p^{AE}_{Accident}$) = 2.1e-4

Thus, based on the performances of the SQOSs during the study period, on average, the probability of the occurrence of an ESD associated with pP1 is 0.124 per period and 8.7e-4 per abnormal event. That is, an ESD is likely to occur in approximately 1 of 8 periods, or in 1 of 1150 abnormal events. Similarly, an accident is likely to occur in approximately 1 of 31 periods (1/0.032) or in 1 of 4762 abnormal events. Note that while no accidents occurred during the study period and no trips (ESDs) occurred over several periods, the Bayesian analysis estimates finite probabilities of trips and accidents for all periods—using abnormal events history data and prior information. Also, because the variance in the failure probability of SQOS5 is large, the variances in the probabilities of an ESD and an accident are high. With additional data, this uncertainty can be reduced.
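Steps (a) and (b) of the simulation procedure, with the Gamma and Beta random-walk proposals of Eqs. 18 and 19 and the incident probabilities of Eq. 23, written out as an added Python example (not the authors' code). It assumes independent SQOSs, so the copula term and the W block of step (c) are omitted; the event counts are constructed, the Poisson/binomial likelihood is an assumption, and all function names are hypothetical.

```python
import math
import random

# Constructed sample data (NOT the paper's Table 6, Part I): total abnormal
# events over 13 periods, and (failures K, successes L) for SQOS2..SQOS5.
T_PERIODS, N_EVENTS = 13, 1854
COUNTS = {2: (137, 1717), 3: (117, 20), 4: (2, 115), 5: (0, 2)}
# Prior parameters a_k, b_k and proposal parameters A_k quoted in the text.
PRIOR_A = {1: 0.01, 2: 0.5, 3: 0.5, 4: 0.5, 5: 1.0}
PRIOR_B = {1: 0.01, 2: 0.5, 3: 0.5, 4: 0.5, 5: 1.0}
A_PROP = {1: 35.0, 2: 40.0, 3: 20.0, 4: 1.7, 5: 1.7}
EPS = 1e-12  # proposals this close to 0 or 1 are rejected for numerical safety

def log_post(theta):
    """Unnormalised log posterior for independent SQOSs (no copula term).
    Assumed likelihood: Poisson counts for theta1, binomial for theta2..5."""
    lp = (PRIOR_A[1] - 1 + N_EVENTS) * math.log(theta[1])
    lp -= (PRIOR_B[1] + T_PERIODS) * theta[1]
    for j in range(2, 6):
        fails, succ = COUNTS[j]
        lp += (PRIOR_A[j] - 1 + fails) * math.log(theta[j])
        lp += (PRIOR_B[j] - 1 + succ) * math.log1p(-theta[j])
    return lp

def propose(k, current, rng):
    """Draw theta_k* with mean equal to the current value (Eqs. 18c, 19c)."""
    a = A_PROP[k]
    if k == 1:
        return rng.gammavariate(a, current / a)  # shape A1, rate A1/current
    return rng.betavariate(a, a * (1.0 - current) / current)

def log_j(k, x, given):
    """log J_k(x | given), the proposal densities of Eqs. 18d and 19d."""
    a = A_PROP[k]
    if k == 1:
        rate = a / given
        return (a * math.log(rate) - math.lgamma(a)
                + (a - 1) * math.log(x) - rate * x)
    b = a * (1.0 - given) / given
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def run_chain(n_iter=30000, burn_in=5000, seed=1):
    """Multiple-block random-walk M-H: update theta1..theta5 one at a time."""
    rng = random.Random(seed)
    theta = {1: 100.0, 2: 0.5, 3: 0.5, 4: 0.5, 5: 0.5}  # starting values
    cur_lp = log_post(theta)
    samples, accepted = [], dict.fromkeys(theta, 0)
    for t in range(n_iter):
        for k in range(1, 6):
            old, new = theta[k], propose(k, theta[k], rng)
            if new <= EPS or (k > 1 and new >= 1.0 - EPS):
                continue  # out-of-range proposal: keep the previous value
            theta[k] = new
            new_lp = log_post(theta)
            # log r_k: posterior ratio plus correction J(old|new) / J(new|old)
            log_r = new_lp - cur_lp + log_j(k, old, new) - log_j(k, new, old)
            if math.log(1.0 - rng.random()) < log_r:
                cur_lp = new_lp
                accepted[k] += 1
            else:
                theta[k] = old
        if t >= burn_in:  # discard burn-in samples
            samples.append(dict(theta))
    return samples, {k: accepted[k] / n_iter for k in accepted}

def summary(values):
    """Mean and 95% posterior interval of a list of draws."""
    v, n = sorted(values), len(values)
    return {"mean": sum(v) / n, "lo": v[int(0.025 * n)], "hi": v[int(0.975 * n) - 1]}

def incident_probabilities(samples):
    """Eqs. 23a-23d evaluated draw by draw."""
    out = {"ESD_per_period": [], "accident_per_period": [],
           "ESD_per_event": [], "accident_per_event": []}
    for s in samples:
        chain = s[2] * s[3] * s[4]  # SQOS2, SQOS3 and SQOS4 all fail
        out["ESD_per_event"].append(chain * (1 - s[5]))
        out["accident_per_event"].append(chain * s[5])
        out["ESD_per_period"].append(s[1] * chain * (1 - s[5]))
        out["accident_per_period"].append(s[1] * chain * s[5])
    return {name: summary(vals) for name, vals in out.items()}

def conjugate_means():
    """Closed-form posterior means; valid only because blocks are independent."""
    means = {1: (PRIOR_A[1] + N_EVENTS) / (PRIOR_B[1] + T_PERIODS)}
    for j, (fails, succ) in COUNTS.items():
        means[j] = (PRIOR_A[j] + fails) / (PRIOR_A[j] + PRIOR_B[j] + fails + succ)
    return means

if __name__ == "__main__":
    draws, acc = run_chain()
    for k in range(1, 6):
        print(k, summary([d[k] for d in draws]), "acceptance", round(acc[k], 2))
    print(incident_probabilities(draws))
```

Because the blocks are independent and the priors conjugate in this reduced setting, `conjugate_means()` gives the exact posterior means, which makes it a direct check that the correction for the asymmetric proposals is coded in the right direction.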

The probabilities of ESDs and accidents can help to assess the compliance of chemical plants with national and international safety standards. For example, the Instrumentation, Systems, and Automation Society (ISA)50 presents safety integrity limits (SILs) to measure the level of risk-reduction provided by ESD systems. More specifically, when the probability of failure under demand (PFD) lies between $10^{-2}$ and $10^{-3}$, the SIL is set at 2. Herein, for the FCCU, the mean probability of an ESD per abnormal event is on the order of $10^{-3}$. Consequently, the associated SIL = 2 is compliant with the ISA safety standard.

Figure 4. Probability distributions for the occurrence of an ESD and an accident per period, calculated using multivariate normal copula.

Figure 5. Probability distributions for the occurrence of an ESD and an accident per abnormal event, calculated using multivariate normal copula.

Next, Figures 6 and 7 present box and whisker plots41 for the probabilities of occurrence of an ESD and an accident, per period and per abnormal event, respectively, starting from the first period and cumulated through the end of each of the 13 periods. For example, the box and whisker plot for periods 1–6 (abscissa) signifies the risk estimates for the first six periods of the study period.

The figures show how the risk estimates (cumulative values) change with time periods. For the first four periods, the risk estimates experience a consistent decrease, followed by a sudden increase in the fifth period. This was due to increases in failure probabilities of levels I and II corrective actions, due to special operations in period 5. Following that, the risk estimates decrease slightly and attain a steady value. These figures provide insights on the relative performances during different time periods and can be used to compare risk levels associated with different plants and work conditions.

Also, note that beginning with the first period, with fewer data points, the variances are higher. As data are added, the variances decrease, accompanied by decreases in the influence of prior information.

Next, histograms of the posterior distributions of the elements of the Spearman rank correlation matrix, W, for the overall study period, calculated using the multivariate normal copula, are presented in Figure 8. Their 95% posterior intervals and means are
w12 = [−0.18, 0.92], E(w12) = 0.50
w13 = [−0.24, 0.87], E(w13) = 0.41
w14 = [−0.32, 0.79], E(w14) = 0.29
w15 = [−0.39, 0.75], E(w15) = 0.23
w23 = [−0.23, 0.88], E(w23) = 0.47
w24 = [−0.31, 0.84], E(w24) = 0.38
w25 = [−0.39, 0.78], E(w25) = 0.28
w34 = [−0.21, 0.90], E(w34) = 0.47
w35 = [−0.30, 0.87], E(w35) = 0.39
w45 = [−0.24, 0.89], E(w45) = 0.47

Figure 6. Box and Whisker plots for probabilities of an ESD and an accident per period, for cumulative periods (outliers are not shown).

Figure 7. Box and Whisker plots for probabilities of an ESD and an accident per abnormal event, for cumulative periods (outliers are not shown).

As seen, the mean values of the correlation coefficients are positive, with their distributions skewed to the left. On average, there is a significantly positive correlation of the SQOSs with their nearest neighbors, a moderately positive correlation with their next nearest neighbors, a weak positive correlation with their second nearest neighbors, and a very weak positive correlation with their third nearest neighbors, as summarized in Table 1.

In summary, this multivariate normal copula yields positive correlations between the failure rate and probabilities of the SQOSs. Next, results using the Cuadras–Auge copula are presented.

Results for pP1 using the Cuadras–Auge copula

Histograms of the marginal posterior failure rate and probabilities of the SQOSs, θ, for pP1 (over the entire study period, consisting of 13 equal periods) were also calculated using the Cuadras–Auge copula. Their 95% posterior intervals and means for θ are
θ1 = [136.32, 148.94], E(θ1) = 142.55
θ2 = [0.061, 0.086], E(θ2) = 0.072
θ3 = [0.768, 0.896], E(θ3) = 0.838
θ4 = [0.003, 0.066], E(θ4) = 0.024
θ5 = [7.78e-3, 0.709], E(θ5) = 0.247

Figure 8. Marginal posterior distributions for the elements of W calculated using multivariate normal copula.

Table 1. Correlation Strengths Between SQOSs Calculated Using Multivariate Normal Copula

| Distance Between SQOSs | Correlation Strength Between SQOSs |
|---|---|
| Nearest neighbor | Significant (0.47–0.50) |
| First nearest neighbor | Moderate (0.38–0.41) |
| Second nearest neighbor | Weak (0.28–0.29) |
| Third nearest neighbor | Very weak (0.23) |

A comparison of the means using the two copulas, Bayesian analysis with independent SQOSs, and classical statistics, is presented in Table 2. The latter neither accounts for prior information nor interactions among the SQOSs—with failure probabilities based solely on the likelihood data in Part I.

Because of the availability of sufficient data for SQOS1–3, the means of their failure rate and probability distributions, calculated using the above methods, differ slightly. Also, the impact of the shared information on their θs, obtained using the two copula models, is very weak. However, as SQOS4–5 involve much less data, their θs depend on the method and copula, and the prior distribution parameters. For these rare events, as the data are limited, the Bayesian approach with copulas provides better failure probability estimates, accounting for the prior/expert belief about their failures as appropriate.

For the Cuadras–Auge copula, the 95% posterior intervals and means for the elements of correlation matrix are
w12 = [−0.31, 0.87], E(w12) = 0.32
w13 = [−0.33, 0.82], E(w13) = 0.25
w14 = [−0.43, 0.75], E(w14) = 0.12
w15 = [−0.43, 0.69], E(w15) = 0.11
w23 = [−0.28, 0.88], E(w23) = 0.33
w24 = [−0.35, 0.79], E(w24) = 0.22
w25 = [−0.44, 0.75], E(w25) = 0.17
w34 = [−0.34, 0.86], E(w34) = 0.34
w35 = [−0.40, 0.76], E(w35) = 0.22
w45 = [−0.25, 0.85], E(w45) = 0.39

Similarly, the distributions are skewed to the left. Although their mean values are positive, they are smaller than those using the multivariate normal copula. On average, the correlation strengths between the SQOSs are less than using the multivariate normal copula, as shown in Table 3.

Thus, the strength of the correlations between the SQOSs is contingent on the copula used for modeling. It is important to note, however, that the trends obtained from the two copulas are similar—the correlation strengths increase as the SQOSs assume closer proximity. Furthermore, the copula modeling validates the theoretical assumptions that the actions of the SQOSs are related to each other (because of nonlinear relationships between the variables and behavior-based factors). These results can persuade plant personnel to take preventive measures to avoid shutdowns and accidents. For example, when θ2 or θ3 increase, the onset of a special-cause(s) is likely to trigger the ESDs more often, increasing their likelihood of failure.

If sufficient data for SQOS4–5 were available, based on the results for SQOS1–3 in Table 2, the copulas would likely have negligible impact on their posterior failure probabilities. Here, the copulas are needed to calculate the correlation strengths—which would not differ significantly with copula choice.

Similar results were obtained for the likelihood data in Table 4, Part I, for all the primary variables—and in Table 6 for all the primary process variables.

Next, the maximum entropy principle is applied to select the best copula for the FCCU case study.

Maximum entropy principle

Copula selection can be important, especially when data are limited. For multivariate systems, copula properties like tail dependences, closed-form expressions, symmetricity, and similar, can strongly influence the selection. More recently, it is recommended that noninformative copulas be selected.26 In one approach, an entropy function is maximized.51–53 For the multivariate prior distribution, f(θ1, θ2, …, θ5), the differential entropy, Ent, is

$$ \mathrm{Ent}[f(\theta_1, \theta_2, \ldots, \theta_5)] = -\int \cdots \int f(\theta_1, \theta_2, \ldots, \theta_5) \times \ln[f(\theta_1, \theta_2, \ldots, \theta_5)]\, d\theta_1 \ldots d\theta_5 \tag{24a} $$

or

$$ \mathrm{Ent}[f(\theta_1, \theta_2, \ldots, \theta_5)] = \mathrm{Ent}[f_1(\theta_1)] + \sum_{j=2}^{5} \mathrm{Ent}[f_j(\theta_j)] + \mathrm{Ent}[f_w(W)] + \mathrm{Ent}[c(\theta_1, \theta_2, \ldots, \theta_5)] \tag{24b} $$

Table 2. Comparison of Mean Failure Probabilities (for the Entire Study Period)

| | Mean (Cuadras–Auge Copula) | Mean (Multivariate Normal Copula) | Means (SQOSs with Zero Correlation) | MLE (Classical) |
|---|---|---|---|---|
| θ1 | 142.5 | 142.6 | 142.7 | 142.8 |
| θ2 | 0.072 | 0.074 | 0.074 | 0.074 |
| θ3 | 0.838 | 0.851 | 0.844 | 0.847 |
| θ4 | 0.024 | 0.017 | 0.021 | 0.017 |
| θ5 | 0.247 | 0.215 | 0.250 | 0 |

Table 3. Correlation Values Using the Two Copulas

| Distance Between SQOSs | Correlation Strength Between SQOSs: Cuadras–Auge Copula | Correlation Strength Between SQOSs: Multivariate Normal Copula |
|---|---|---|
| Nearest neighbor | Moderate (0.32–0.39) | Significant (0.47–0.50) |
| First nearest neighbor | Weak (0.22–0.25) | Moderate (0.38–0.41) |
| Second nearest neighbor | Very weak (0.12–0.17) | Weak (0.28–0.29) |
| Third nearest neighbor | Negligible (0.11) | Very weak (0.23) |

Table 4. Comparison of the Mean Failure Probabilities and the Probabilities of Occurrence of ESDs and Accidents (Calculated Using Multivariate Normal Copula) – With and Without Quality Data

| Mean | θ1 | θ2 | θ3 | θ4 | θ5 | Prob. of an ESD per Abnormal Event | Prob. of an Accident per Abnormal Event |
|---|---|---|---|---|---|---|---|
| All primary variables | 195.69 | 0.057 | 0.822 | 0.017 | 0.121 | 7.25e-4 | 1.04e-4 |
| Primary process variables only | 156.69 | 0.069 | 0.852 | 0.018 | 0.106 | 9.64e-3 | 1.17e-4 |


That is, the entropy of the multivariate prior distribution is the sum of the entropies of the individual marginal PDFs plus the entropy of the copula. Here, the entropy is maximized when the random variables are independent. With no information shared among the univariate marginals, the data are "most randomly" distributed.54 When the random variables are dependent (given their correlation matrix), it can be shown that the differential entropy is maximized using the multivariate normal copula.55,56 It follows that, for a given prior correlation matrix, the multivariate normal copula has a higher entropy than the Cuadras–Auge copula and thus, is less informative. Therefore, herein, the multivariate normal copula is the preferred prior.

Product-quality data in risk analysis

Referring to Part I and Pariyani et al.,41 the safety performance of any process is characterized by its primary variables, which include both primary process and primary quality variables. When a primary process or primary quality variable moves outside its green-belt zone, a safety problem is likely to occur. However, although quality variables are causally related to process variables, abnormal events associated with primary quality variables do not necessarily imply the movement of primary process variables out of their green-belt zones, and vice versa. Hence, the abnormal events history of the primary quality variables should also influence the Bayesian risk analysis. In practice, however, quality variable data are often not estimated or recorded.

In this subsection, the impact on risk estimates of neglecting primary quality variable data is shown. Using the likelihood data in Tables 5 and 7 of Part I, Table 4 compares the mean failure probabilities and the probabilities of ESDs and accidents (calculated using Bayesian analysis with multivariate normal copula)—with and without product-quality data.

As expected, there is an increase in the mean θ1 when all primary variables are used. However, when the product-quality data are excluded, the failure probabilities, θ2, θ3, and θ4, are higher, as well as the probabilities of an ESD, and the occurrence of an accident. In particular, the latter two increase by 33% and 12%. Clearly, the quality data provide more near-miss information, yielding improved risk estimates.

Engineering and statistical perspectives

For the FCCU DCS and ESD system databases, while the correlation strengths between the failure probabilities of the SQOSs are positive, they may not be statistically significant (because of limited data, with P-levels close to 0.15, greater than the standard cutoff value of 0.05). With availability of more data, the statistical significance of correlations can be improved.

From an engineering perspective, the nonlinear relationships between the variables and behavior-based factors clearly establish interactions between the SQOSs. These are modeled properly using copulas. Furthermore, the correlation strengths in Table 3 are expected and corroborate the use of copula modeling. Also, the copulas share information (through correlations), providing more accurate risk estimates associated with rare events.

Conclusions

The Bayesian analysis method presented herein is capable of handling large numbers of abnormal events over extended periods of time. This use of dynamic alarm databases for risk analysis permits a more rigorous and reliable assessment of the performance of the various regulatory and protection systems, and importantly, calculation of the probabilities of shutdowns and accidents. For processing plants, the nonlinear relationships between the variables and behavior-based factors cause the SQOSs to share complex interactions with each other. To account for these interdependencies in the Bayesian model and calculate correlation coefficients between the SQOSs, copulas are needed. Although they add an additional complexity in calculations, they provide a mechanism to utilize the shared information between the SQOSs and to estimate the posterior values of elements of dependence matrix.

For the FCCU case study, several specific conclusions are drawn. Moderately critical abnormal events were very likely (85% chance) to propagate into most-critical abnormal events, primarily due to the poor controlling actions of the level II operators (i.e., SQOS3). Hence, it is concluded that to prevent the occurrence of most-critical abnormal events, it is almost as important to prevent the occurrence of moderately-critical abnormal events.

Posterior correlation coefficients between the failure rate and probabilities of the SQOSs were estimated. The correlation strengths between the SQOSs are moderately to significantly positive and increase as the SQOSs gain closer proximity. Both copulas yielded positive (average) correlations (left-skewed distributions, with positive means) between the SQOSs as expected based on their physical interactions.

For rare events, like failures of the override controller, and automatic and manual ESDs, the Bayesian approach with copulas provides better failure probability estimates, accounting for the prior/expert belief about their failures as appropriate. With sufficient data, copulas permit the reliable estimation of posterior correlation coefficients. Using the maximum entropy principle, the multivariate normal copula, with a given correlation matrix, is less informative and is preferred over the Cuadras–Auge copula. It is to be noted that the maximum entropy principle provides a qualitative way to identify the "most noninformative" copula. Soon, we will work on developing a quantitative method to analyze the information gain (using Information theory) associated with each copula. Finally, the quality data complement the data associated with the primary process variables (safety data), and thus, provide more near-miss information for Bayesian analysis, providing better risk estimates.

Acknowledgments

Partial support for this research from the National Science Foundation through grant CTS-0553941 is gratefully acknowledged. The helpful insights and comments of Professors Edward George and Shane Jensen of the Statistics Department, Wharton School at the Univ. of Pennsylvania, and Professor Michael Carchidi, Mechanical Engineering and Applied Mechanics Department at Penn, are appreciated.


Notation

Acronyms

ASP ¼ accident sequence precursorBPCS ¼ basic process control systemCCPS ¼ Center for Chemical Process Safety (AIChE)CDF ¼ cumulative distribution functionCO ¼ continued operation

CPIs ¼ chemical process industriesDCS ¼ distributed control systemESD ¼ emergency shutdown

FCCU ¼ fluid catalytic cracking unitISA ¼ Instrumentation, Systems, and Automation Society

MLE ¼ maximum likelihood estimatePDF ¼ probability density functionPFD ¼ probability of failure under demandPRA ¼ probabilistic risk analysisrvs ¼ random variablesSIL ¼ safety integrity limit

SQOS ¼ safety, quality, and operability system

English letters

ak, bk ¼ parameters of Gamma prior distribution for h1aj, bj ¼ parameters of Beta prior distribution for hj (j ¼

2,…, 5)A1, B1 ¼ parameters of Gamma proposal distribution for h1Aj, Bj ¼ parameters of Beta proposal distribution for hj (j

= 2,…, 5)
c = copula density
C = copula function
Data = likelihood data; set of success and failure counts for SQOS1–5
Ent[f_k(θ_k)] = differential entropy of the prior distribution for θ_k (k = 1,…, 5)
E(θ_k) = expected value of θ_k
f_k(θ_k) = prior probability density function for θ_k (k = 1,…, 5)
F_k(θ_k) = prior CDF for θ_k (k = 1,…, 5)
f_k(θ_k|Data) = marginal posterior distribution for θ_k (k = 1,…, 5)
f_w(W) = prior distribution for the Spearman-rank correlation matrix
f(θ_1, θ_2, …, θ_{N_s}, W) = joint prior distribution of failure rate and probabilities of the SQOSs and their correlation matrix
g(Data|θ) = general likelihood function in Eq. 1
I = identity matrix
J^t_k = proposal distribution for θ_k (k = 1,…, 5)
J^t_w = proposal distribution for the Spearman-rank correlation matrix
K_{jt}, L_{jt} = failure and success counts for SQOS j in time period t
K_{jT}, L_{jT} = failure and success counts for SQOS j associated with all primary variables during the entire study period
K_{jpP1}, L_{jpP1} = failure and success counts for SQOS j associated with pP1 variable during the entire study period
K_{jpP}, L_{jpP} = failure and success counts for SQOS j associated with primary process variables during the entire study period
L(Data|θ) = likelihood function for SQOSs
n_t = number of abnormal events in a time period
n_T = number of abnormal events for all primary variables (see Table 5, Part I)
n_{pP1} = number of abnormal events for pP1 (see Table 6, Part I)
n_{pP} = number of abnormal events for all primary process variables (see Table 7, Part I)
N_s = number of SQOSs
pP1 = primary process variable 1
p^{AE}_{Accident} = probability of occurrence of an accident, per abnormal event
p^{AE}_{ESD} = probability of occurrence of an ESD, per abnormal event
p^{tp}_{Accident} = probability of occurrence of an accident, per time period
p^{tp}_{ESD} = probability of occurrence of an ESD, per time period
r_{kq} = element (row k, column q) of product-moment correlation matrix, R
R = product-moment correlation matrix
S = scale matrix
u_k = CDF of θ_k
w_{ij} = element (row i, column j) of Spearman-rank correlation matrix
W = Spearman-rank correlation matrix

Greek letters

α_{k−q} = correlation coefficient between θ_k and θ_q
ν = degrees of freedom
Φ^{−1} = normal inverse transformation
θ_1 = failure rate of BPCS (SQOS1)
θ_2 = failure probability of level I corrective actions (SQOS2)
θ_3 = failure probability of level II corrective actions (SQOS3)
θ_4 = failure probability of override controller (SQOS4)
θ_5 = failure probability of ESD system (SQOS5)
θ = vector comprising failure rate and probabilities of SQOSs
Γ = unconstrained correlation matrix

Subscripts

h = counter in copula expression in Eq. 15a
j = counter for failure probabilities, j = 2,…, 5
k = counter for failure rate and probabilities, k = 1,…, 5
q = counter for failure rate and probabilities in Eqs. 15 and 16; k = q
t = a general time period
w = symbol representing association with the Spearman-rank correlation matrix

Superscripts

* = proposed value
(0) = starting value
AE = abnormal event
init = initial value
t = iteration counter
tp = time period

Literature Cited

1. Apostolakis G. Probability and risk assessment: the subjectivisticviewpoint and some suggestions. Nuclear Safety. 1978;19:305–315.

2. Parry GW, Winter PW. Characterization and evaluation of uncer-tainty in probabilistic risk analysis. Nuclear Safety. 1981;22:28–42.

3. McCormick NJ. Reliability and Risk Analysis: Methods and NuclearPower Applications. New York: Academic Press, 1981.

4. Garrick BJ. Recent case studies and advancements in probabilisticrisk assessment. Risk Anal. 1984;4:267–279.

5. Wu J. Overview of probabilistic risk assessment and applications.Progress in Safety Science and Technology: Proceedings of the2004 International Symposium on Safety Science and Technology,Part B. Beijing, China, 2004;4:1743–1748.

6. Siu NO, Kelly DL. Bayesian parameter estimation in probabilisticrisk assessment. Reliab Eng Syst Saf. 1998;62:89–116.

7. Vose D. Quantitative Risk Analysis: A Guide to Monte Carlo Simu-lation Modeling. New York: John Wiley & Sons, 1996.

8. Anderson E, Hattis D, Matalas N, Bier V, Kaplan S, Burmaster D,Conrad S, Ferson S. Foundations. Risk Anal. 1999;19:47–68.

9. Bedford T, Cooke R. Probabilistic Risk Analysis: Foundations andMethods. Cambridge, UK: Cambridge University Press, 2001.

10. Kaplan S. The words of risk analysis. Risk Anal. 1997;17:407–417.11. Kumamoto H, Henley E. Probabilistic Risk Assessment and Man-

agement for Engineers and Scientists. Piscataway, NJ: IEEE Press,1996.

840 DOI 10.1002/aic Published on behalf of the AIChE March 2012 Vol. 58, No. 3 AIChE Journal

12. Pate-Cornell M. Uncertainties in risk analysis: six levels of treat-ment. Reliab Eng Syst Saf. 1996;54:95–111.

13. Winkler R. Uncertainty in probabilistic risk assessment. Reliab EngSyst Saf. 1996;54:127–132.

14. Kirchsteiger C. Impact of accident precursors on risk estimates fromaccident databases. J Loss Prev Process Ind. 1997;10:159–167.

15. Rasmussen NC. The application of probabilistic risk assessmenttechniques to energy technologies. Annu Rev Energy. 1981;6:123–138.

16. Ortwin R. Three decades of risk research: accomplishments and newchallenges. J Risk Res. 1998;1:49–71.

17. Apostolakis G. The concept of probability in safety assessment oftechnological systems. Science. 1990;250:1359–1364.

18. Kaplan S, Garrick BJ. On the quantitative definition of risk. RiskAnal. 1981;1:11–37.

19. Berenblut BJ, Cave L, Munday G. An application of PRA and cost/benefit analysis to the problems of loss prevention in the chemicalindustry. Probabilistic Safety Assessment and Risk Management:PSA. 1987;3:1029–1035.

20. Szwed P, Rene van Dorp J, Merrick JRW, Mazzuchi TA, Singh A.A Bayesian paired comparison approach for relative accident proba-bility assessment with covariate information. Eur J Operational Res.2006;169:157–177.

21. Moslesh A, Bier V, Apostolakis G. A critique of current practice forthe use of expert opinions in probabilistic risk assessment. ReliabEng Syst Saf. 1988;20:63–85.

22. Pulkkinen U. Methods for combination of expert judgments. ReliabEng Syst Saf. 1993;40:111–118.

23. Cooke RM. Experts in Uncertainty: Opinion and Subjective Proba-bility in Science. Oxford, UK: Oxford University Press, 1991.

24. Swain AD, Guttmann HE. Handbook of Human Reliability Analy-sis with Emphasis on Nuclear Power Plant Applications. Wash-ington, DC: US Nuclear Regulatory Commission, 1983; NUREG/CR-1278.

25. Meel A, Seider WD. Plant-specific dynamic failure assessment usingBayesian theory. Chem Eng Sci. 2006;61:7036–7056.

26. Yi W, Bier VM. An application of copulas to accident precursoranalysis. Manage Sci. 1998;44:S257–S270.

27. Goossens LHJ, Cooke RM. Applications of some risk assessmenttechniques: formal expert judgment and accident sequence precur-sors. Saf Sci. 1997;26:35–47.

28. Oliver RM, Yang HJ. Bayesian updating of event tree parameters topredict high risk incidents. In: Oliver RM, Smith, JQ, editors. Influ-ence Diagrams, Belief Nets and Decision Analysis. Chichester, UK:John Wiley & Sons, 1990.

29. Bier VM. Statistical methods for the use of accident precursor datain estimating the frequency of rare events. Reliab Eng Syst Saf.1993;41:267–280.

30. Meel A, O’Neill LM, Seider WD, Oktem U, Keren N. Operationalrisk assessment of chemical industries by exploiting accident data-bases. J Loss Prev Process Ind. 2007;20:113–127.

31. Nelsen RB. An introduction to copulas. Lecture Notes in Statistics.Springer, New York, 1999.

32. Embrechts P, Lindskog F, McNeil A. Modeling dependence withcopulas and applications to risk management. In: Rachev S, editor.Handbook of Heavy Tailed Distributions in Finance. Elsevier inAmsterdam, The Netherlands. 2003:331–385.

33. Meel A, Seider WD. Real-time risk analysis of safety systems. Com-put Chem Eng. 2008;32:827–840.

34. Kelly DL. Using copulas to model dependence in simulation riskassessment. Proceedings of IMECE 2007 – ASME International Me-chanical Engineering Congress and Exposition. Seattle, Washington,2008;14:81–89.

35. Clemen RT, Reilly T. Correlation and copulas for decision and riskanalysis. Manage Sci. 1999;45:208–224.

36. Jouini MN, Clemen RT. Copula models for aggregating expert opin-ions. Oper Res. 1996;44:444–457.

37. Danaher P, Smith M. Modeling multivariate distributions using cop-ulas: applications in marketing. Marketing Sci. 2009;30:4–21.

38. Larsen RJ, Marx ML. An Introduction to Mathematical Statisticsand Its Applications, 3rd ed. NJ: Prentice Hall, 2001.

39. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis,2nd ed. Chapman & Hall/CRC, Boca Raton, FL, 2004.

40. Bolstad WM. Introduction to Bayesian Statistics, 1st ed. Hoboken,NJ: John Wiley & Sons, 2004.

41. Pariyani A, Seider WD, Oktem UG, Soroush M. Incident investiga-tion and dynamic analysis of large alarm databases in chemicalplants: a fluidized-catalytic-cracking unit case study. Ind Eng ChemRes. 2010;49:8062–8079.

42. Sklar A. Random variables, joint distribution functions, and copulas.Kybernetika. 1973;9:449–460.

43. Trivedi PK, Zimmer DM. Copula Modeling: An Introduction forPractitioners. Now Publishers Inc., Hanover, MA, 2007. Availableonline at http://chess.uchicago.edu/ccehpe/hew_papers/Fall2007/Trivedi%2011_29_07.pdf.

44. Joe H. Relative entropy measures of multivariate dependence. J AmStat Assoc. 1989;84:157–164.

45. Cuadras CM, Auge J. A continuous general multivariate distributionand its properties. Commun Stat-Theory Methods. 1981;A10:339–353.

46. Krushal W. Ordinal measures of association. J Am Stat Assoc. 1958;53:814–861.

47. Robert CP, Casella G. Monte Carlo Statistical Methods. Springer,New York, 2004.

48. Chib S, Greenberg E. Understanding the Metropolis–Hastings algo-rithm. Am Statistician. 1995;49:327–335.

49. http://en.wikipedia.org/wiki/Covariance_matrix. Access date April16, 2011

50. Instrumentation, Systems, and Automation Society, ANSI/ISA-S84.01–1996, Application of safety instrumented system for the pro-cess industries, ISA, 1996.

51. Kapur JN. Measures of Information and Their Applications. JohnWiley and Sons, New Delhi, 1994.

52. Ogunnaike BA. Random Phenomena: Fundamentals of Probability andStatistics for Engineers. FL: CRC Press, Taylor & Francis Group, 2010.

53. Jaynes ET. Prior probabilities. IEEE Trans Syst Sci Cybern.1968;4:227–241.

54. Joe H. Multivariate Models and Dependence Concepts. CRC Press,London, 1997.

55. Malevergne Y, Sornette D. Multivariate Weibull distributions forasset returns: I. Finance Lett. 2004;2:16–32.

56. Miller DJ, Liu W. On the recovery of joint distributions from lim-ited information. J Econ. 2002;107:259–274.

Manuscript received Aug. 12, 2010, and revision received Mar. 11, 2011.
