Source: isif.org/fusion/proceedings/fusion2013

Measures of Conflicting Evidence in Bayesian Networks for Classification

Max Krüger
Hochschule Furtwangen University (HFU)

Robert-Gerwig-Platz 1, D-78120 Furtwangen, Germany
E-mail: [email protected]

Abstract— Classification selects one out of finitely many classes based on available input and is applied to many kinds of application areas, e.g., diagnosis, monitoring, scoring, and pattern recognition. Often, classification is accomplished by use of Bayesian Networks which operate on evidence provided by sources, and calculate probabilities of classes as outcome. Sometimes, pieces of evidence from several sources provide substantially different but reliable information, indicating disparate classes. Such conflicting evidence has different reasons, e.g., measurement flaws, and can result in misclassification with severe consequences. Domain experts typically have a good intuitive understanding of evidence that should be judged as conflicting in given applications. Based on Bayesian Network literature, five measures for conflicting evidence are depicted and discussed. In an air defense scenario, these conflict measures are compared with domain experts’ intuitive understanding of conflicts. Some aspects of configuration of such measures are discussed. Finally, coherence of different measures of conflicting evidence is surveyed in a second, maritime scenario.

Keywords: Conflict measure, Bayesian Network, conflicting evidence, classification, identification, configuration, maritime and air surveillance systems

I. INTRODUCTION

Classification can be defined as ”[...] a decision or forecast [...] on the basis of currently available information [...], in which each new [classification] case must be assigned to one of a set of predefined classes on the basis of observed attributes or features” [1, p. 1]. Classification has manifold applications, e.g., technical and medical diagnosis and monitoring, credit scoring, and pattern recognition, see [1], [2]. In civil and military surveillance systems, classification is also referred to as identification, i.e. the assignment of an identity to a tracked object, see [3], [4], [5]. Most of these applications have a strong need for automated assistance, compare [6].

Bayesian Networks provide an established framework for classification tasks, see e.g., [2, ch. 2.11], [7, ch. 8]. Different technical and non-technical sources provide an object’s observed attributes or features (so-called evidence, findings or source declarations) as input for a Bayesian Network, which is then used to calculate probabilities of classes as outcome [7, ch. 8]. If pieces of evidence from several sources provide substantially different, reliable information strongly indicating disparate classes, they are referred to as conflicting evidence, compare [7, ch. 8].

In the context of Bayesian Networks for classification, this paper compares different approaches to measure (or even define) conflicts of evidence with each other and with the intuitive understanding of conflicts by domain experts. It extends our previous work in [8].

The outline of this paper is as follows: In section II the application of Bayesian Networks for classification is described. Conflicting evidence and different measurement approaches are introduced and discussed in section III. In section IV, these approaches are compared with domain experts’ intuitive understanding of conflicts in an air surveillance application example. Section V discusses some aspects of configuring conflict measures for application. In the following section VI, coherence of the different conflict measures is studied in a second application scenario taken from maritime surveillance. Finally, section VII outlines conclusions and future work.

II. BAYESIAN NETWORKS FOR CLASSIFICATION

Formally defined according to [7, p. 265], a classifier is a function cl : D1 × ... × DN → C, where Di is the finite set of all evidence that source Si can declare and C denotes the finite set of possible classification results, i.e. classes.

Following [2, p. 61], a simple but quite well-working classification tool is given by the Naïve Bayes’ Rule and uses the Theorem of Bayes

p(ci|d1, ..., dN) = [ p(d1, ..., dN|ci) · p(ci) ] / [ ∑_{j=1}^{K} p(d1, ..., dN|cj) · p(cj) ]    (1)

to calculate the posterior probabilities of all possible classification results c1, ..., cK ∈ C, given the declared pieces of evidence d1, ..., dN from N sources. A classification result can be determined by choosing the class ci with the highest posterior probability (Bayesian Decision Rule, [2, p. 23]) or by minimization of misclassification costs, see [1, pp. 13-14].
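The posterior computation in equation (1) can be sketched as follows; the prior and likelihood values are purely illustrative assumptions, not taken from the paper:

```python
# Naive-Bayes posterior over classes, following equation (1).
# Priors and likelihoods below are illustrative values only.

def naive_bayes_posterior(priors, likelihoods):
    """priors: p(c_j) for j = 1..K; likelihoods: p(d_1, ..., d_N | c_j),
    which under the naive assumption is a product of per-source terms."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)                     # denominator of equation (1)
    return [j / total for j in joint]      # posterior p(c_j | d_1, ..., d_N)

priors = [0.5, 0.3, 0.2]                   # illustrative p(c_j)
likelihoods = [0.02, 0.10, 0.40]           # illustrative p(d_1, ..., d_N | c_j)
post = naive_bayes_posterior(priors, likelihoods)
best = max(range(len(post)), key=lambda j: post[j])  # Bayesian Decision Rule
```

Choosing `best` implements the Bayesian Decision Rule mentioned above; a cost-based decision would replace the `max` by a minimization of expected misclassification costs.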

The Naïve Bayes Rule can be replaced by a Bayesian Network, which is a graphical probabilistic model, see [7], [9], [10]. According to [7, pp. 33-34], a Bayesian Network is defined as a Directed Acyclic Graph (DAG). Each node represents a discrete random variable Ai with i ∈ {1, ..., N}, being in one out of a finite number of node-dependent, mutually exclusive states ai,1, ..., ai,Ni, i.e. Ai = ai,j. Dependencies between these variables are modeled by directed edges, with causes as parent nodes and effects as child nodes. Let pa(Ai) denote the parent node set of node Ai. Every node Ai owns a Conditional Probability Table (CPT), containing the conditional probabilities

p(Ai = ai,j | Ap1 = b1, ..., ApK = bK)    (2)


for all (given) state combinations b1, ..., bK of all parent nodes Ap1, ..., ApK ∈ pa(Ai) of node Ai. Formally, a declared piece of evidence Ai = ai is an ascertainment of the node variable Ai being in a particular state ai.

Nodes of a Bayesian Network can be grouped into three types: query, evidence and intermediary nodes [10, pp. 84-85]. The user is interested in the (unknown) states of query nodes but gains evidence only for some evidence nodes. Intermediary nodes connect other nodes and are possibly unobservable, but uninteresting for users. Typically, they result from a traceable, correct modeling. For classification applications, classes are modeled by query nodes Ci and the evidence nodes D1, ..., DN correspond to the observable features provided by sources. Note that unlike in Naïve Bayes, in more general Bayesian Networks resulting classes can be modeled by use of a combination C1 × ... × CL of query nodes.

Every Bayesian Network embodies a unique joint probability distribution p(A1, ..., AN) of its node variables. Following [7, pp. 36-37], the joint probabilities can be determined by use of the Chain Rule for Bayesian Networks:

p(A1, ..., AN) = ∏_{i=1}^{N} p(Ai | pa(Ai)) .    (3)

In Bayesian Networks, in general the states of node variables are not known. If sources provide pieces of evidence for some nodes, their variable states are unveiled and, because of node dependencies, state probabilities of other nodes must be updated. Classification with Bayesian Networks is based on this inference mechanism between source nodes and class nodes. For that purpose, in a Bayesian Network for classification, posterior probabilities p(C1, ..., CL | D1 = d1, ..., DN = dN) of states of class variables C1, ..., CL are calculated for given evidence on nodes D1, ..., DN. More details on Bayesian Networks can be found in [7], [9], [10].
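The Chain Rule (3) and the inference step can be illustrated on a minimal two-node network; the network (Class → Sensor) and all probability values are assumptions for illustration only:

```python
# Joint probability via the Chain Rule for Bayesian Networks, equation (3):
# p(A_1, ..., A_N) = prod_i p(A_i | pa(A_i)).
# The two-node network Class -> Sensor below is an illustrative assumption.

p_class = {"friend": 0.7, "foe": 0.3}        # root node: pa(Class) is empty
p_sensor = {                                 # CPT: p(Sensor | Class)
    ("friend", "pos"): 0.9, ("friend", "neg"): 0.1,
    ("foe", "pos"): 0.2,    ("foe", "neg"): 0.8,
}

def joint(cls, sensor):
    # Chain rule: p(Class, Sensor) = p(Class) * p(Sensor | Class)
    return p_class[cls] * p_sensor[(cls, sensor)]

# Inference: posterior over the query node Class given evidence Sensor = "pos".
evid = "pos"
unnorm = {c: joint(c, evid) for c in p_class}
z = sum(unnorm.values())
posterior = {c: v / z for c, v in unnorm.items()}
```

The same pattern, product of CPT entries followed by normalization over the query states, underlies the posterior computation p(C1, ..., CL | D1 = d1, ..., DN = dN) described above.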

III. CONFLICTING EVIDENCE

Essentially, domain experts define ’conflicting evidence’ as dissonant information from several sources. In doing so, ’dissonance’ is understood as ”[...] the extent to which information is explicitly contradictory or conflicting” [11, criterion II.8.2.1.5]. Intuitively, such definitions appear very plausible on that level, but they are too vague for use on a technical level in Bayesian Networks. Therefore, starting with the definition of a surprise index by Habbema [12], several authors traced conflicting evidence into a Bayesian context, see e.g., [12], [13], [14], [15], [7, pp. 174-179]. But questions arise as to how well particular conflict definitions reflect the domain experts’ intuitive understanding of conflicting evidence and how coherent different conflict measures are. We will address these questions in sections IV and VI. For the rest of this paper, we will refer to conflicting evidence shortly as conflicts.

Conflicts depend on the underlying model and fusion approach [7, pp. 174-179]. E.g., only if fratricide is not modeled, a valid IFF mode 4 reply of an object is in conflict with evidence of attack on own forces. Conflicting evidence in a Bayesian context is a discrepancy between model and declared pieces of evidence [7, p. 99] and can have different reasons: (i) facing rare cases, (ii) situations not covered by the underlying Bayesian Network model, as well as (iii) flaws or inaccuracies of sensor measurements and raw data evaluation [7, pp. 174-179], [14]. In any of these cases, the classification process with Bayesian Networks is not reliable anymore, and users need to be informed in order to judge the overall reliability of resulting classes [6], [8].

Each conflict measure might serve as a conflict definition in Bayesian Networks, but we take the depicted domain experts’ definition as the basis, without having a precise definition on the (technical) Bayesian Network level. Therefore in the following, different approaches to describing conflicts are regarded as conflict measures rather than as conflict definitions.

In the following, conflicts between a source and all other active sources are considered, in order to locate, for example, the failure of a particular source [8]. In this paper we exclude conjoined-sources comparison, as performed for example in [7, pp. 174-179]. Nevertheless, the concepts are expected to be easily exchangeable. Subsequently, all explained conflict measures use likelihood vectors. Declared evidence of the particular source Si in focus is converted into a Source Likelihood Vector SLVi := (p(di|cj))_{j=1,...,M} and evidence of all other sources is combined into the Combined Likelihood Vector CLV−i := (p(d−i|cj))_{j=1,...,M}. Here we define d−i := (d1, ..., di−1, di+1, ..., dN) for convenient notation, denote C := C1 × ... × CL for the combination of all query variables C1, ..., CL, and by c1, ..., cM list all (combined) states of C. Note that scaling of likelihood vectors preserves conflict-relevant information.

Conflict measures to be defined should comply with some requirements. Despite the fact that a definition of conflicting evidence exists only on the domain experts’ level, certain (soft) requirements for conflict measures can be formulated on the level of Bayesian Networks [8]:

(R1) Equal likelihood vectors should not conflict with each other, since they carry the same information.

(R2) There should be no conflict between an approximately uniform likelihood vector (i.e. p(di|cj) ≈ 1/M or p(d−i|cj) ≈ 1/M for all j) and any other likelihood vector, since uniform likelihood vectors carry no conflict-relevant information.

(R3) Conflicts between pieces of evidence should not depend on the prior probabilities p(c) of classes c.

(R4) Most of the conflict-relevant information between likelihood vectors is coded in the highest-valued class components of at least one likelihood vector. Adding a new disparate class cM+1 with very low probabilities p(di|cM+1), p(d−i|cM+1), p(d1, ..., dN|cM+1) should not annihilate a conflict. Such additions could result, e.g., from a fragmentation of existing classes with such probabilities.

At least requirement (R3) differs from an established approach (see [7, pp. 174-179]) and might be subject to discussion. In the rest of this section, different conflict measures are described, which will subsequently be compared in sections IV and VI. The surprise index [12] is considered computationally intractable [14], so we exclude it from the comparison.


A. Distance Conflict Measures

Using distance measures is an obvious approach to defining conflict measures: If the distance between the likelihood vectors SLVi and CLV−i exceeds a given threshold ε, conflicting evidence is indicated. In [16] the use of the taxicab distance is proposed: For two scaled likelihood vectors X = (x1, ..., xM) and Y = (y1, ..., yM), a conflict is indicated by the Taxicab Conflict Measure iff

∑_{i=1}^{M} |xi − yi| > εcab

is true for a preset threshold εcab.

Alternatively, the Euclidean Conflict Measure can be used, which indicates a conflict between two scaled likelihood vectors X, Y iff

√( ∑_{i=1}^{M} (xi − yi)² ) > εEuc

holds for a preset threshold εEuc.

Obviously, every distance conflict measure fulfills requirement (R1), because the distance between two equal vectors is zero. But the Taxicab and Euclidean Conflict Measures both need unrealistically high values of εcab = 1.66567 or εEuc = 0.912323 for declaring no conflict between the likelihood vectors

X = (9995/10000, 1/10000, 1/10000, 1/10000, 1/10000, 1/10000) and
Y = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6).

So both distance conflict measures do not comply with requirement (R2).
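The two threshold values can be reproduced directly from the example vectors; a short check:

```python
import math

# Taxicab and Euclidean distances between the example likelihood vectors
# from the text; a conflict is indicated once the distance exceeds the
# respective threshold, so "no conflict" requires at least these values.
X = [9995 / 10000] + [1 / 10000] * 5   # near-certain vector
Y = [1 / 6] * 6                        # uniform vector

taxicab = sum(abs(x - y) for x, y in zip(X, Y))
euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))
# taxicab is approximately 1.66567, euclid approximately 0.912323,
# matching the thresholds quoted in the text.
```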

B. Thresholds Conflict Measure

This conflict measure is based on upper and lower thresholds and is also proposed in [16]: According to this Thresholds Conflict Measure, two likelihood vectors are indicated as conflicting iff indices i, j with i ≠ j exist, where at least one of the following three conditions

(C1) xi > εup and yj > εup, or

(C2) xi > εup and yi < εlow, or

(C3) yi > εup and xi < εlow

is met for preset upper threshold εup and lower threshold εlow. Note that the Thresholds Conflict Measure fulfills requirement (R2) if not misconfigured. But it violates requirement (R1), compare [8]: For εup < 0.5 and arbitrary εlow, the equal likelihood vectors X = Y = (0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0) are in conflict according to condition (C1), whereas εup = 0.499 is a high value for an upper threshold.
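A minimal sketch of this measure; the default thresholds are the calibrated values from Table I and serve only as illustrative defaults:

```python
# Thresholds Conflict Measure: conflict iff condition (C1), (C2), or (C3)
# holds for some indices i != j. Default thresholds follow Table I.

def thresholds_conflict(x, y, eps_up=0.37, eps_low=0.16):
    M = len(x)
    for i in range(M):
        for j in range(M):
            if i != j and x[i] > eps_up and y[j] > eps_up:   # (C1)
                return True
        if x[i] > eps_up and y[i] < eps_low:                 # (C2)
            return True
        if y[i] > eps_up and x[i] < eps_low:                 # (C3)
            return True
    return False
```

With eps_up = 0.499, the equal ten-component vectors X = Y = (0.5, 0.5, 0, ..., 0) from the text are declared conflicting via (C1), which is exactly the (R1) violation described above.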

C. Probabilistic Coherence Conflict Measures

An established conflict measure is given in [7, pp. 175-176], [14]: For declared pieces of evidence d1, ..., dM a conflict is indicated iff

log2( p(d1) · ... · p(dM) / p(d1, ..., dM) ) > 0 .    (4)

The underlying idea is simple: In a coherent situation that is covered by the model, correctly declared pieces of evidence support each other and are positively correlated [7, p. 175]. We keep this basic idea but apply several changes [8]: Application of log2 is omitted and a threshold εcoh is introduced in order to suppress small fluctuations. Applied to a particular source Si and the combination of all other sources, line (4) turns into

[ ∑_{j=1}^{M} p(di|cj) · p(cj) ] · [ ∑_{j=1}^{M} p(d−i|cj) · p(cj) ] / [ ∑_{j=1}^{M} p(d1, ..., dN|cj) · p(cj) ] > (1 + εcoh) .    (5)

Finally, we use (1/M)_{j=1,...,M} instead of (p(cj))_{j=1,...,M} as prior probabilities in line (5). This is in order to comply with requirement (R3), because we do not want a bias by a priori probabilities of classes. The resulting conflict measure [8] is called the Coherence Conflict Measure and indicates conflicting evidence iff for a preset threshold εcoh > 0 the following condition is true:

[ ∑_{j=1}^{M} p(di|cj) ] · [ ∑_{j=1}^{M} p(d−i|cj) ] / ∑_{j=1}^{M} p(d1, ..., dN|cj) > (1 + εcoh) · M .    (6)

Obviously, ∑_{j=1}^{M} p(d1, ..., dN|cj) needs to be calculated in addition to the likelihood vectors SLVi and CLV−i.

The Coherence Conflict Measure complies with requirement (R3), but not with (R4), as line (6) shows for an increasing number of classes M. This problem is due to our modification of using (1/M)_{j=1,...,M} instead of (p(cj))_{j=1,...,M} as prior probabilities.

To reinforce requirement (R4), we modify the Coherence Conflict Measure [8]: For likelihood vectors SLVi and CLV−i we assume w.l.o.g. for all k = 1, ..., M − 1:

max{ p(di|c_{k+1}) / ∑_{j=1}^{M} p(di|cj) , p(d−i|c_{k+1}) / ∑_{j=1}^{M} p(d−i|cj) } ≤ max{ p(di|c_k) / ∑_{j=1}^{M} p(di|cj) , p(d−i|c_k) / ∑_{j=1}^{M} p(d−i|cj) } .    (7)

The Extended Coherence Conflict Measure then indicates conflicting evidence iff there exists M̃ with 2 ≤ M̃ ≤ M and

[ ∑_{j=1}^{M̃} p(di|cj) ] · [ ∑_{j=1}^{M̃} p(d−i|cj) ] / ∑_{j=1}^{M̃} p(d1, ..., dN|cj) > (1 + εext) · M̃    (8)

for a preset threshold εext > 0 [8]. By its construction, the Extended Coherence Conflict Measure obviously fulfills requirement (R4).
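A sketch combining the class ordering assumed in (7) with the subset test of (8); as before, the combined likelihood is approximated by the product of the two vector components (an independence assumption), and εext = 0.248 from Table I serves only as an illustrative default:

```python
def extended_coherence_conflict(slv, clv, eps_ext=0.248):
    # Extended Coherence Conflict Measure, equations (7)-(8), as a sketch:
    # order classes by the larger of their two normalized likelihood
    # components (assumption (7)), then apply the ratio test of (6) to
    # every leading subset of size m = 2..M.
    s_tot, c_tot = sum(slv), sum(clv)
    order = sorted(range(len(slv)),
                   key=lambda j: max(slv[j] / s_tot, clv[j] / c_tot),
                   reverse=True)
    for m in range(2, len(slv) + 1):
        top = order[:m]
        num = sum(slv[j] for j in top) * sum(clv[j] for j in top)
        den = sum(slv[j] * clv[j] for j in top)  # combined likelihood,
        if num / den > (1 + eps_ext) * m:        # assuming independence
            return True
    return False
```

In this sketch, appending an extra class with negligible likelihoods leaves the leading subsets unchanged, so a conflict detected for small M̃ survives, which is the point of requirement (R4).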

For the sake of a short denotation, we subsequently refer to conflict measures simply as measures. Because only conflict measures are considered, there should be no confusion.

IV. CONFLICT MEASURES VERSUS EXPERT’S INTUITION

The leading definition of conflicting evidence in section III was given on the domain experts’ level, but all described conflict measures operate on the technical Bayesian Network level. So the question arises how well the defined conflict measures


Table I. NUMBER OF DEVIATIONS FOR CONFLICT MEASURE APPROACH VS. INTUITION

Conflict measure:                  Taxicab           Euclid            Thresholds              Coherence        Extended Coherence
Parameter:                         εcab = 0.41       εEuc = 0.4765     εup = 0.37, εlow = 0.16 εcoh = 0.052     εext = 0.248

lowest source uncertainty          ( 40 / 40 / 0 )   ( 35 / 35 / 0 )   ( 6 / 3 / 3 )           ( 5 / 3 / 2 )    ( 7 / 7 / 0 )
lower source uncertainty           ( 31 / 31 / 0 )   ( 25 / 25 / 0 )   ( 6 / 3 / 3 )           ( 5 / 3 / 2 )    ( 5 / 5 / 0 )
below medium source uncertainty    ( 15 / 15 / 0 )   ( 10 / 10 / 0 )   ( 6 / 3 / 3 )           ( 5 / 3 / 2 )    ( 3 / 3 / 0 )
above medium source uncertainty    ( 3 / 3 / 0 )     ( 2 / 2 / 0 )     ( 6 / 3 / 3 )           ( 4 / 1 / 3 )    ( 3 / 3 / 0 )
higher source uncertainty          ( 7 / 1 / 6 )     ( 11 / 1 / 10 )   ( 6 / 3 / 3 )           ( 4 / 1 / 3 )    ( 2 / 2 / 0 )
highest source uncertainty         ( 38 / 0 / 38 )   ( 49 / 0 / 49 )   ( 10 / 3 / 7 )          ( 6 / 1 / 5 )    ( 5 / 2 / 3 )

Sum for all uncertainty levels     ( 134 / 90 / 44 ) ( 132 / 73 / 59 ) ( 40 / 18 / 22 )        ( 29 / 12 / 17 ) ( 25 / 22 / 3 )

Explanation: ( x / y / z ) denotes the total number x of deviations, the sum y of false positives, and the sum z of false negatives.

reflect the domain experts’ intuitive understanding. To address this question, the experimental comparison that we conducted in [8] is extended in terms of considered conflict measures and viewed results:

A. Aerial Application Scenario

Based on a fictive technical and operational scenario in air defense, we asked domain experts to judge, for combinations of two pieces of evidence from different sources, whether they would consider a particular combination of evidence as conflicting or not. In the given operational scenario in [8], tracked objects are to be classified using the following 6 classes: ’Own Force Military and Civil’, ’Not-Aligned Military and Civil’, and ’Enemy Force Military and Civil’ [16]. Classification is conducted by Bayesian means and on the basis of evidence from 10 different sources:

• IFF Mode 4 (Identification Friend or Foe),

• PPLI Link 16 (Precise Participant Location and Identification),

• Max Speed,

• Civil Flight Plan,

• Military Mission Plan,

• Hostile Attack,

• Jamming,

• ESM (Electronic Support Measures),

• Visual Classification, and

• Visual Identification.

B. Comparison with Expert’s Intuition

Numerical configuration data of the scenario in [8] were not known to the experts, so their ratings were rather intuitive, but nevertheless very consistent. In a predefined Bayesian configuration data set with different source/sensor measurement uncertainty levels, every combination of two pieces of evidence from different sources corresponds to a pair of likelihood vectors SLVi and CLV−i. Altogether, we have a set of 804 versatile test cases, and for each case an operational experts’ judgement on conflicting or not. Almost every combined experts’ rating was unmistakable. Note that 450 cases were marked as conflicting. To this set of test cases we apply the conflict measures and compare their results with the experts’ judgement in terms of the total number of deviations, the sum of false positive and the sum of false negative deviations [8].

In order to compare conflict measures with the domain experts’ intuitive understanding, for each measure we minimized the total number of deviations as the primary criterion and the sum of false negatives as the secondary criterion by calibrating the ε-parameters. This configuration of conflict measures’ parameters is described in section V. Note that the Euclid Measure was not included in [8].

Numerical results of our comparison are shown in table I. For each conflict measure, this table lists the ε-parameter(s) that produced minimal deviations between conflict measure and experts’ intuition. The corresponding sums of deviations are listed as a whole as well as itemized for different source uncertainty levels. In the parentheses, the first entry is the total number of deviations, the second entry gives the sum of false positives (i.e. falsely declared conflicts), and the third entry is the sum of false negatives (i.e. undetected conflicts) of the considered measure.

At a glance, the Thresholds, Coherence, and Extended Coherence Conflict Measures show a surprisingly good overall accuracy, with total deviation rates of 5.0%, 3.6%, and 3.1%. The Taxicab and Euclid Measures are much worse, with deviation rates of 16.7% and 16.4%. The Thresholds Measure shows only slightly worse performance than the probabilistic coherence measures. Noncompliance with requirement (R4) by the Coherence Conflict Measure seems to be relevant only for a larger number M of classes; the Coherence and Extended Coherence Measures show almost the same performance.

In this scenario and under the configuration used, the sensitivities, i.e. the portions of detected cases with conflicts (compare [17, pp. 340-343]), are 95.1%, 96.2%, 99.3% for the Thresholds, Coherence, and Extended Coherence Conflict Measures, and 90.2%, 86.9% for the Taxicab and Euclid Measures, respectively. The corresponding specificities, i.e. the portions of cases correctly declared as non-conflicting (compare [17, pp. 340-343]), are 95.0%, 96.6%, 93.8% for the Thresholds, Coherence, and Extended Coherence Measures, and 74.6%, 79.4% for the Taxicab and Euclid Conflict Measures. Note that in this scenario the prevalence [17, pp. 342-345] of conflicting evidence is 56.0%, a high value from an operational point of view.
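These rates follow directly from the deviation counts in Table I together with the case counts (804 cases, 450 of them judged conflicting by the experts); a quick recomputation:

```python
# Sensitivity, specificity, and prevalence recomputed from the "Sum" row
# of Table I (804 cases in total, 450 judged conflicting by the experts).
cases, conflicting = 804, 450
non_conflicting = cases - conflicting

# (false positives, false negatives) per measure, from Table I:
measures = {
    "Taxicab":            (90, 44),
    "Euclid":             (73, 59),
    "Thresholds":         (18, 22),
    "Coherence":          (12, 17),
    "Extended Coherence": (22, 3),
}

prevalence = conflicting / cases                       # 56.0%
rates = {
    name: ((conflicting - fn) / conflicting,           # sensitivity
           (non_conflicting - fp) / non_conflicting)   # specificity
    for name, (fp, fn) in measures.items()
}
```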

For the distance measures, the total number of deviations seems to depend strongly on the source uncertainty levels, whereas this dependency is moderate at most, or not traceable at all, for the three other conflict measures. Both distance conflict measures show an apparent trade-off between false positives and false negatives, whereas for the other measures we cannot discern this effect, maybe due to the small deviation numbers within the uncertainty levels.

Besides a generally low number of total deviations, the Extended Coherence Measure also shows a very low number of false negatives. As we discuss in section V, undetected conflicts are more serious than a few additional falsely declared ones. Calibration of the Extended Coherence Conflict Measure with the lowered parameter εext = 0.23 provides no false negatives for all 804 cases, with only three additional false positives [8]. For the Coherence Measure, setting the parameter to εcoh = 0 yields 54 false positive deviations and still 12 false negatives. This is an indication that in the Coherence Conflict Measure a threshold is needed to suppress small fluctuations.

Summarizing, in this first part of the experimental comparison, we included an additional conflict measure and looked at more details compared to [8]. The different conflict measures in this scenario reflected the domain experts’ intuitive understanding of conflicting evidence from moderately well up to very well, if configured optimally. Aspects of finding a good configuration are discussed in the following section.

V. ASPECTS OF APPLICATION

An appropriate choice of the ε-parameter(s) is a crucial performance factor for all considered conflict measures, as our experience from the comparison in section IV pointed out. In order to optimize the ε-parameter(s), we need input from the domain experts’ level as to which combinations of pieces of evidence should be seen as conflicts. There are at least three different options of configuration, i.e. ways to provide this input: exemplary conflicts, conflict ratio, and conflict learning.

Exemplary Conflicts:

In this approach, domain experts provide an exemplary set of conflicting evidence according to their intuitive judgement. Based on this input, the ε-parameter(s) can be optimized in terms of total deviations as the primary and the sum of false negatives as the secondary criterion. Usually, this exemplary set covers only a small subset of all (intuitive) conflict cases, so some margin should be left in the ε-parameters in order to prevent overfitting to these special cases.

Minimizing the sum of false negatives and minimizing the sum of false positives are conflicting goals. Based on the possible reasons for conflicting evidence described in section III, a false negative seems far more severe than a false positive: Users might have no other indication of a potential misclassification. In contrast, false positives simply cause a user to question the classification case critically, but only as long as there are not too many of them. By the choice of the total number of deviations as the primary criterion, the sum of false positives is limited anyway.

The exemplary-conflicts configuration approach requires the definition of conflicts by hand for a not too small number of cases, and is therefore costly. For our experimental comparison in section IV, this approach has been applied with the modification that we defined conflicts for every (relevant) combination of two pieces of evidence. Then the margin for avoiding overfitting can be set to zero.

Figure 1. Littoral area at Scott Islands (taken from [18])

Conflict Ratio:

Alternatively, a conflict ratio for all combinations of two relevant pieces of evidence is preselected. Then the ε-parameters for the different conflict measures can be determined: Running through all combinations of two pieces of evidence, the given ratio shall be met as closely as possible. Note that this task can be conducted automatically. The choice of an optimal ratio should be based on experience in similar classification tasks. We will use this configuration approach in our second comparison in section VI.
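The conflict-ratio approach can be sketched as a simple grid search; the helper names `calibrate_eps` and `taxicab`, the toy evidence pairs, and the search grid are all illustrative assumptions, with the taxicab measure standing in for any one-parameter conflict measure:

```python
# Conflict-ratio configuration (sketch): choose the threshold eps so that
# the fraction of evidence pairs declared conflicting comes as close as
# possible to a preselected target ratio.

def calibrate_eps(pairs, measure, target_ratio, grid):
    def ratio(eps):
        return sum(measure(p, eps) for p in pairs) / len(pairs)
    return min(grid, key=lambda eps: abs(ratio(eps) - target_ratio))

# Illustrative one-parameter measure: taxicab distance vs. threshold.
def taxicab(pair, eps):
    x, y = pair
    return sum(abs(a - b) for a, b in zip(x, y)) > eps

# Toy likelihood-vector pairs; distances are roughly 1.6, 0.1, 1.4, 0.04.
pairs = [([0.9, 0.1], [0.1, 0.9]), ([0.55, 0.45], [0.5, 0.5]),
         ([0.9, 0.1], [0.2, 0.8]), ([0.52, 0.48], [0.5, 0.5])]
eps = calibrate_eps(pairs, taxicab, target_ratio=0.5,
                    grid=[i / 100 for i in range(1, 200)])
```

With a target ratio of 0.5, any threshold separating the two small distances from the two large ones is optimal, so exactly two of the four pairs end up declared conflicting.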

Conflict Learning:

As a third option for configuring the ε-parameters, a learning approach can be applied: while classification is running, users of the Bayesian Network point out evidence that appears to them to be a false negative or a false positive. No pre-configuration is needed, but the initial performance of the conflict measures can be improved by applying another configuration approach in advance. Obviously, conflict learning is only applicable with experienced classification operators. Unfortunately, it distracts them from their main task.
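One minimal way to realize such a feedback loop is an incremental threshold update; the function name, the 'fn'/'fp' report labels, and the fixed step size below are our illustrative assumptions rather than anything prescribed by the paper:

```python
def learn_epsilon(eps, feedback, step=0.05):
    """Adapt epsilon incrementally from operator feedback.

    feedback -- sequence of reports: 'fn' for a missed conflict,
                'fp' for a spuriously declared one.
    """
    for report in feedback:
        if report == 'fn':
            eps -= step  # declare conflicts more eagerly
        elif report == 'fp':
            eps += step  # declare conflicts more conservatively
    return eps

# two missed conflicts and one spurious one lower the threshold on balance
print(learn_epsilon(0.5, ['fn', 'fn', 'fp']))  # ends near 0.45
```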

Independent of the chosen approach, configuration of the Thresholds Conflict Measure is more difficult because it depends on two ε-parameters. Straightforward calibration of this conflict measure can result in ignoring one of the conflict subtypes (C1) or (C2),(C3), see subsection III-B. In particular, the lower threshold εlow must not be chosen too small. In general, the conflict subtype ratio between (C1) and (C2),(C3) needs to be monitored during parameter configuration of the Thresholds Measure.

VI. COHERENCE OF DIFFERENT CONFLICT MEASURES

In section IV we analyzed how well the conflict measures reflect domain experts' operational understanding of conflicts in the given scenario, taken from air defense. Based on a second scenario, smugglers' detection, we now analyze the coherence of the different conflict measures in this example, i.e., their consensus on declared conflicts.

A. Maritime Application Scenario

A Ship Locating and Tracking scenario (see [20]) in the context of illegal immigration detection in a maritime environment was defined by the Evaluation of Techniques for Uncertainty Representation Working Group (ETURWG). Based on this scenario, a Bayesian Network for classification was constructed in [18]. We reuse this scenario for our subsequent comparison, whereas the Bayesian Network is slightly modified in order to make the calculation of conditional probabilities less time-consuming. A context summary of this classification scenario is given in [18]:

Figure 2. Bayesian Network for smuggler detection [18, slightly modified] (created using GeNIe & SMILE [19])

”The scenario [20] [citation number adjusted] is located in littoral waters, exemplarily at Scott Islands, Canadian West Coast near Vancouver, see figure 1. A sealane from/to Asia runs in West-East direction and a major tanker route passes Scott Islands in North-South direction. Both routes cross in large fishing grounds. Besides cargo and oil tanker traffic, there is a lot of fishing and leisure activity in the area. Military and governmental vessels supervise the area.

From intelligence sources it is known that people smugglers intend to transport illegal immigrants on cargo ships and offload them in several trips with zodiacs from ship to coast. All vessels and boats involved in smuggling try to hide their activities and spoof identities by imitating fishery or leisure behavior and by use of other measures. The identification [i.e., classification] task is to find the objects involved in smuggling and to discriminate them from regular commercial, fishery, private and military/governmental traffic.” [18]

Figure 2 shows the modified Bayesian Network, generated using GeNIe & SMILE [19], that is used for the subsequent comparison of conflict measures. The network has 33 nodes with 37 modeled dependencies. It consists of the two query nodes 'Smuggling Intention' and 'Type Affiliation', marked with (C) behind the name, 7 intermediary nodes marked with (I), and 24 evidence nodes describing sources, marked with (S). Altogether, there are 10 (combined) possible classification results and 627 test cases, i.e., combinations of two (relevant) pieces of evidence from different sources. Note that not every piece of source evidence is relevant, i.e., actually declarable based on a source measurement. For example, a Match of source 'AIS Spoofing (S)' can actually be declared, whereas the corresponding No Match exists only for modeling purposes and subsumes possible states not modeled: AIS spoofing not detected, not present, or vessel not equipped with AIS, for example.
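The enumeration of such test cases can be sketched as follows. The sources, states, and helper name are hypothetical toy data; only the rule itself, pairing two relevant pieces of evidence from different sources, is taken from the text:

```python
from itertools import combinations

def cross_source_pairs(relevant_states):
    """All pairs of relevant evidence coming from two DIFFERENT sources.

    relevant_states -- dict: source node -> its relevant states
                       (states kept only for modeling purposes are omitted)
    """
    pieces = [(src, state)
              for src, states in relevant_states.items()
              for state in states]
    return [(a, b) for a, b in combinations(pieces, 2) if a[0] != b[0]]

# hypothetical sources and states, loosely inspired by the scenario
relevant = {'AIS Spoofing': ['Match'],
            'Speed': ['Low', 'High'],
            'Course': ['ToCoast']}
print(len(cross_source_pairs(relevant)))  # 5 cross-source test cases
```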

B. Discussion of Coherence

There is no domain experts' judgment available as a reference for conflicting test cases in this scenario. In order to compare the conflict measures with each other, we therefore chose the following procedure: configuration of the ε-parameter(s) is conducted for the preselected conflict ratios 5%, 10%, 20%, 30%, 40% and all conflict measures. Note that it was not always possible to find ε-parameters for each conflict measure which attain the exact conflict ratio. Moreover, due to the non-negativity of εcoh, the Coherence Measure can only be configured up to a 38% conflict ratio.

After applying each conflict measure to the common set of test cases, the percentages of identical conflict indications by varying combinations of different measures were calculated. These percentages are calculated as the number of test cases with common conflicts, i.e., identically declared by all contributing measures, divided by the intended number of conflicts for the selected conflict ratio. In particular, we looked at all combinations of two, four and five conflict measures. The percentages of identically declared test cases by different conflict measures are listed in table II. The first column denotes which conflict measures are compared in each row and which are not, marked as 'x' or '–'. The following columns each correspond to a preselected conflict ratio between 5% and 40%.


Table II. COHERENCE OF DIFFERENT CONFLICT MEASURES

Percentage of identical conflicts of different measures. Measure order in the row labels: (Taxicab, Euclid, Thresholds, Coherence, Extended Coherence); columns give the preselected conflict ratio.

Measures           5%        10%       20%       30%       40%
( – – – x x )      93%       96%       92%       95%       —
( – – x – x )      41%-57%   48%-57%   45%-59%   53%-64%   61%-70%
( – – x x – )      38%-54%   45%-61%   42%-57%   48%-59%   —
( – x – – x )      83%       80%       81%       74%       81%
( – x – x – )      80%       75%       79%       72%       —
( – x x – – )      45%-57%   48%-65%   53%-65%   62%-76%   62%-77%
( x – – – x )      83%       86%       81%       72%       83%
( x – – x – )      83%       85%       78%       71%       —
( x – x – – )      38%-57%   48%-61%   48%-59%   57%-75%   59%-73%
( x x – – – )      80%       89%       87%       95%       95%
( – x x x x )      38%-54%   45%-57%   42%-53%   46%-56%   —
( x – x x x )      32%-54%   45%-56%   41%-53%   45%-54%   —
( x x – x x )      73%       73%       73%       66%       —
( x x x – x )      35%-57%   45%-56%   44%-55%   48%-56%   55%-64%
( x x x x – )      32%-54%   45%-56%   41%-53%   45%-54%   —
( x x x x x )      32%-54%   45%-56%   41%-53%   45%-54%   —

Explanation: ’x’ or ’–’ denote conflict measure involved or not involved in comparison. ’—’ denotes no value available.

Entries marked '—' correspond to test cases with a missing configuration parameter for the Coherence Conflict Measure.
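The percentage computation described above can be sketched as follows; the function name and toy data are our own:

```python
def coherence_percentage(conflict_sets, intended_conflicts):
    """Share of test cases declared conflicting by ALL compared measures,
    relative to the intended number of conflicts for the chosen ratio.

    conflict_sets      -- one set of conflicting test-case ids per measure
    intended_conflicts -- intended number of conflicts for the ratio
    """
    common = set.intersection(*conflict_sets)
    return 100.0 * len(common) / intended_conflicts

# toy example: two measures agree on 3 of the 4 intended conflicts
print(coherence_percentage([{1, 2, 3, 4}, {2, 3, 4, 5}], 4))  # 75.0
```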

As intuitively expected, the Coherence and Extended Coherence Measure show a distinctive consensus on which test cases should be declared as conflicting and which not: their percentages of identical indications range between 92% and 96% over all conflict ratios. The combination of the Taxicab and Euclid Measure shows a similar consensus, ranging from 89% to 95% for preselected conflict ratios ≥ 10%. For the 5% ratio, the consensus is slightly reduced to 80%, presumably because the Euclid metric puts more emphasis on large differences in single vector components.

Surprisingly, each distance conflict measure harmonizes quite well with each probabilistic coherence measure (percentages: 71% - 86%). Even all four measures taken together still agree in their indications in at least 2 out of 3 conflicting test cases on average.

In comparisons involving the Thresholds Measure, the percentages of consensus vary by up to 22% depending on the ε-parameters used. Table II shows a range of percentages for these comparisons, obtained from different parameter configurations of εup and εlow. Depending on the configuration, all five conflict measures together sometimes agree in only 32% of the cases, whereas without the Thresholds Measure the other four conflict measures reach at least 66% of identical conflict indications. The coherence of this measure with the other conflict measures is therefore poor, which is also underlined by all one-by-one comparisons involving the Thresholds Measure.

For combinations of conflict measures without the Thresholds Measure, the percentages are rather stable over the different preselected conflict ratios, with fluctuations rarely reaching 15%. This holds in particular for the combination of the two probabilistic coherence measures. Finally, the Coherence Measure's underlying idea given in section III-C suggests the interpretation that a conflict ratio of 40% is inadequately high for this application scenario.

Concluding for this scenario: if set to the same conflict ratio, the different distance and probabilistic coherence measures agree on their conflict indications in most cases. This coherence supports the assumption that they really measure conflicting evidence as defined on the domain experts' level. In contrast, the Thresholds Measure seems to be no adequate conflict measure. We assume that the good performance of the Thresholds Measure in the first comparison in section IV is due to the rather artificial character of the aerial scenario, with a deliberately high conflict ratio of 56% and many distinctive conflicts. Besides the challenge of configuring it, the Thresholds Measure seems to have problems with less distinctive conflicting evidence.

VII. CONCLUSIONS

In this work, different measures for the detection of conflicting evidence in Bayesian Networks for classification have been compared with domain experts' intuitive understanding and with each other in terms of coherence. In an experimental comparison, optimally configured probabilistic coherence measures reflected the domain experts' judgement very well and outperformed the other measures. Possible approaches for configuration have been discussed. In the second experimental comparison, a good to very good coherence between all measures of conflicting evidence has been observed, with the exception of the Thresholds Measure. Based on our experiments, the Coherence Conflict Measure appears to be a good choice in Bayesian Networks for classification if the number of classes is rather small. For a larger number of classes, the Extended Coherence Conflict Measure can be an alternative.

Next to optimizing the configuration procedure, future work will need to address the following question: if applied to other Bayesian Network applications, how do domain experts judge evidence indicated as conflicting by probabilistic coherence measures? In addition, the coherence of conflict measures in randomly created Bayesian Networks will be investigated.


REFERENCES

[1] D. Michie, D. Spiegelhalter, and C. Taylor, Eds., Machine Learning, Neural and Statistical Classification. New York, London, Toronto, Sydney, Tokyo, Singapore: Ellis Horwood Limited, 1994.

[2] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York, Chichester, Weinheim, Brisbane, Singapore, Toronto: John Wiley & Sons, Inc., 2001.

[3] D. L. Hall and S. A. McMullen, Mathematical Techniques in Multisensor Data Fusion, 2nd ed. Boston, London: Artech House Publishers, 2004.

[4] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Boston, London: Artech House Publishers, 1999.

[5] E. Waltz and J. Llinas, Multisensor Data Fusion. Boston, London: Artech House Publishers, 1990.

[6] M. Krüger and N. Kratzke, “Monitoring of Reliability in Bayesian Identification,” in Proceedings of the 12th International Conference on Information Fusion (Seattle (WA), USA, 06 - 09 July 2009). ISIF, July 2009.

[7] F. V. Jensen and T. D. Nielsen, Bayesian Networks and Decision Graphs. New York: Springer Science, 2007.

[8] M. Krüger and D. Hirschhauser, “Source Conflicts in Bayesian Identification,” in INFORMATIK 2009: Im Focus das Leben, ser. Lecture Notes in Informatics (vol. 154). Gesellschaft für Informatik e.V. (GI), 2009, pp. 2485–2490.

[9] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. Cambridge (MA), USA: The MIT Press, 2009.

[10] A. Darwiche, Modeling and Reasoning with Bayesian Networks. Cambridge (MA), USA: Cambridge University Press, 2009.

[11] Evaluation of Techniques for Uncertainty Representation Working Group (ETURWG), “Evaluation Criteria: II. URREF Ontology (ver. 1),” http://eturwg.c4i.gmu.edu/?q=URREF Ontology (accessed on 23.02.2013), International Society for Information Fusion (ISIF), 2011.

[12] J. Habbema, “Models diagnosis and detection of diseases,” in de Dombal et al., editors, Decision Making and Medical Care. North-Holland, 1976, pp. 399–411.

[13] F. Jensen, B. Chamberlain, T. Nordahl, and F. Jensen, “Analysis in HUGIN of data conflict,” in Proceedings of the 6th Conference on Uncertainty in Artificial Intelligence. Boston (MA): Association for Uncertainty in Artificial Intelligence, 1990, pp. 546–554.

[14] K. B. Laskey, “Conflict and surprise: Heuristics for model revision,” in Proceedings of the 7th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers, 1991, pp. 197–204.

[15] Y.-G. Kim and M. Valtorta, “On the detection of conflicts in diagnostic Bayesian networks using abstraction,” in P. Besnard and S. Hanks, editors, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers, 1995, pp. 362–367.

[16] STANAG 4162 (edition 2): Identification Data Combining Process (IDCP), NATO Standardization Agency (NSA), Brussels, 2009, (NATO unclassified).

[17] J. L. Peacock and P. J. Peacock, Oxford Handbook of Medical Statistics. New York: Oxford University Press, 2011.

[18] M. Krüger, J. Ziegler, and K. Heller, “A Generic Bayesian Network for Identification and Assessment of Objects in Maritime Surveillance,” in Proceedings of the 15th International Conference on Information Fusion (Singapore, 09 - 12 July 2012). ISIF, July 2012, pp. 2309–2316.

[19] Decision Systems Laboratory of the University of Pittsburgh, “GeNIe software package for Bayesian Networks,” http://genie.sis.pitt.edu.

[20] Evaluation of Techniques for Uncertainty Representation Working Group (ETURWG), “Use case 1: Ship Locating and Tracking,” http://eturwg.c4i.gmu.edu/?q=UseCase1, International Society for Information Fusion (ISIF), accessed on 3rd March 2011, (webpage and accompanying document “Higher-level fusion ship locating and tracking scenario for ETUR” by Pierre Valin (DRDC, Canada)).