© 2013 Carnegie Mellon University
Analyzing a Multi-Legged Argument Using Eliminative Argumentation
John B. Goodenough, Charles B. Weinstock, Ari Z. Klein, Neil Ernst
December 2013
2Analyzing Multi-Legged ArgumentsGoodenough, Dec 2013© 2013 Carnegie Mellon University
Copyright 2013 ACM
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
This material has been approved for public release and unlimited distribution.
This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at permission@sei.cmu.edu.
DM-0000795
Multi-Legged Argument
Informal definition
• “Independent” evidence and argument supporting the same claim, e.g., proving and testing
How much confidence does each leg contribute?
How can independence of the legs be determined?
First, what does it mean to have confidence in a claim?
Gaining Confidence in a Claim
A classic philosophical problem:
• How should evidence be used to evaluate belief in a hypothesis?
Use induction
• Enumerative: Support increases as confirming instances are found
• Eliminative: Support increases as reasons for doubt are eliminated
Enumerative induction: using past experience as the basis for predicting future behavior
Eliminative induction (light bulb example): Power? Bulb OK? Wired?
Confidence increases as doubts are eliminated
Eliminative Argumentation
Multi-Legged Arguments
• How much confidence does each leg contribute?
  – Depends on the extent to which doubts are eliminated in each leg
• How can independence of the legs be determined?
  – Look at dependencies among the doubts
An eliminative argument is visualized in a confidence map, which shows reasons for doubt graphically.
An eliminative argument shows reasons for doubting an argument’s conclusion and why those doubts are eliminated
Make doubts and inference rules explicit in a confidence map
C1.1 Light turns on
C2.1 Switch is connected
C2.2 Power is available
C2.3 Light bulb is functional
Ev3.1 Examination results showing bulb doesn't rattle when shaken
Lightbulb CM with rebutting defeaters
Rebutting defeaters (attack claim validity): R → ~C
C1.1 Light turns on
R2.1 Unless switch is not connected
R2.2 Unless no power available
R2.3 Unless bulb is dead
IR2.4 If these reasons for failure are eliminated, the light will turn on
UC3.3 Unless there are unidentified reasons for failure
Lightbulb CM with IR and rebutting defeaters
Inference rule asserts: ~R → C
Lightbulb CM with UC, IR, and R
Undercutting defeater (attacks rule sufficiency)
Lightbulb CM with IR3.2 and UC
R2.3 Unless bulb is dead
Ev3.1 Examination results showing bulb doesn't rattle when shaken
UM4.1 But the examiner is hard of hearing
IR3.2 If bulb doesn't rattle when shaken, bulb is good
UC4.2 Unless bulb is not incandescent type
Eliminative Argumentation
Finding doubts
• Attack claim (rebutting defeater)
  – Why claim may be false
  – Example: R2.3 Unless bulb is dead
• Attack evidence (undermining defeater)
  – Why evidence may be invalid
  – Example: UM4.1 But the examiner is hard of hearing
• Attack inference (undercutting defeater)
  – Premise OK; conclusion uncertain
  – Example: UC4.2 Unless bulb is not incandescent type
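The three kinds of doubt can be captured in a small data structure. The sketch below is illustrative only: the class names `DefeaterKind` and `Defeater` are hypothetical, and the rule that a label's prefix (R, UM, UC) determines its kind is an assumption read off the labels used on the confidence maps.

```python
from dataclasses import dataclass
from enum import Enum


class DefeaterKind(Enum):
    """The three kinds of doubt and what each one attacks."""
    REBUTTING = "attacks the claim"
    UNDERMINING = "attacks the evidence"
    UNDERCUTTING = "attacks the inference"


# Label prefixes as they appear on the confidence maps. The two-letter
# prefixes are checked before the one-letter "R" prefix.
_PREFIXES = (
    ("UM", DefeaterKind.UNDERMINING),
    ("UC", DefeaterKind.UNDERCUTTING),
    ("R", DefeaterKind.REBUTTING),
)


@dataclass(frozen=True)
class Defeater:
    label: str
    text: str

    @property
    def kind(self) -> DefeaterKind:
        # Assumption: the label prefix alone identifies the defeater kind.
        for prefix, kind in _PREFIXES:
            if self.label.startswith(prefix):
                return kind
        raise ValueError(f"{self.label!r} is not a defeater label")


# The three example defeaters from the light bulb confidence map.
examples = [
    Defeater("R2.3", "Unless bulb is dead"),
    Defeater("UM4.1", "But the examiner is hard of hearing"),
    Defeater("UC4.2", "Unless bulb is not incandescent type"),
]

for d in examples:
    print(f"{d.label}: {d.kind.name.lower()} defeater, {d.kind.value}")
```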
A Multi-Legged Argument Example
Given a system reliability claim:
• pfd < 10^-3 (with 99% certainty)
What evidence and argument will give confidence in this claim?
• 4603 random successful test executions would be necessary and sufficient
But suppose we can only execute 4000 tests?
• How much confidence in the claim in this case? Why?
• What other evidence could increase our confidence in the claim?
• Static analysis? How much would confidence increase? Why?
Example is based on a multi-legged argument suggested by Bloomfield and Littlewood (2003)
Multi-Legged AC
C1.1 The system is acceptably reliable
Cx1.1a The system is acceptably reliable if pfd < 0.001 (with 99% statistical certainty)
S2.1 Argue using statistical testing results
C3.1 No failures have been observed in a sequence of 4603 operationally random test executions
Cx3.1a Littlewood & Wright 1997
Ev4.1 A sequence of 4000 operationally random tests showing no failure occurrences
S2.2 Argue over absence of statically detectable coding errors
C3.2 The code contains no statically detectable coding errors
Ev3.2 Static analysis results showing no statically detectable coding errors
ML CM with prob calculation
C1.1 The system is acceptably reliable
Cx1.1a The system is acceptably reliable if pfd < 0.001 (with 99% statistical certainty)
R2.1 Unless at least one failure is observed in a sequence of 4000 (or fewer) operationally random test executions
J2.1a One failure in 4603 or fewer executions is sufficient to contradict the claim [Littlewood & Wright 1997]
Ev3.1 4000 operationally random tests showing no failure occurrences
IR2.2 If no failures are observed in a sequence of 4000 operationally random test executions, then the system is acceptably reliable
UC3.2 Unless fewer than 4603 operationally random tests are executed successfully
Values shown on the map: 1.00 and 0.55, where 0.55 = (1 – 0.001)^603
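The 0.55 value can be checked numerically. One reading, offered here as an assumption, is that 603 is the shortfall between the 4603 executions required and the 4000 actually run, so the expression is the probability that the 603 missing tests would all have succeeded if pfd were exactly 0.001. The variable names are hypothetical.

```python
REQUIRED_TESTS = 4603   # executions needed for the claim
EXECUTED_TESTS = 4000   # executions actually available
PFD_BOUND = 0.001       # claimed bound on probability of failure on demand

# Assumed interpretation: the exponent is the number of tests not run.
shortfall = REQUIRED_TESTS - EXECUTED_TESTS

# Probability that all missing tests would pass at pfd = 0.001.
confidence = (1.0 - PFD_BOUND) ** shortfall

print(shortfall, round(confidence, 2))
```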
Validity of Ev3.1
Ev3.1 4000 operationally random tests showing no failure occurrences (1.00)
UM4.1 But the operational profile is inaccurate
UM4.2 But the test selection process is not random
UM4.3 But the oracle sometimes misclassifies failed tests as successes
Statistical CM with joint calc
The defeaters in each subtree are INDEPENDENT, i.e., the truth of one defeater does not imply the truth of another. In particular,
• The validity of the rule is unaffected by whether the evidence is valid, and vice versa
0.80 for R2.3
C1.1 The system is acceptably reliable
Cx1.1a The system is acceptably reliable if pfd < 0.001 (with 99% statistical certainty)
R2.3 Unless there are statically detectable coding errors
Ev3.3 Static analysis results showing no static coding errors
UM4.5 But the static analysis overlooked some statically detectable errors
IR2.4 If there are no statically detectable coding errors, then the system is acceptably reliable
UC3.4 Unless there is no basis for determining how much pfd is reduced by the presence of statically detectable coding errors
UC3.5 Unless non-static coding errors are present that increase the pfd
UC3.6 Unless design errors exist that increase the pfd
Values shown on the map: 0.80, 0.20, 0.16
Calculation of static error confidence
These defeaters are INDEPENDENT, i.e., the truth of one defeater does not imply the truth of another. In particular,
• The validity of the rule is unaffected by whether the evidence is valid, and vice versa
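Because the defeaters are independent, the values on the static analysis map can be combined by multiplication. The sketch below is a reading of the map, not a statement of the authors' calculation: it assumes 0.80 is the confidence that R2.3 is eliminated (as the slide title states), that 0.20 is the confidence in inference rule IR2.4, and that the leg confidence is their product. The function name is hypothetical.

```python
import math


def leg_confidence(defeater_eliminated: float, rule_valid: float) -> float:
    """Joint confidence in one leg, assuming the two factors are independent."""
    return defeater_eliminated * rule_valid


# Assumed attribution: 0.80 for R2.3, 0.20 for IR2.4.
static_leg = leg_confidence(0.80, 0.20)

# The statistical leg's map shows 0.55 and 1.00; their product is 0.55.
statistical_leg = leg_confidence(0.55, 1.00)

print(round(static_leg, 2), round(statistical_leg, 2))
```

Under these assumptions the static analysis leg evaluates to the 0.16 shown on the map.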
Calculation of multi-leg confidence
0.62 = 1 – (1 – 0.55)(1 – 0.16)
C1.1 The system is acceptably reliable (0.62)
Cx1.1a The system is acceptably reliable if pfd < 0.001 (with 99% statistical certainty)
Statistical testing leg (0.55):
R2.1 Unless at least one failure is observed in a sequence of 4000 (or fewer) operationally random test executions
IR2.2 If no failures are observed in a sequence of 4000 operationally random test executions, then the system is acceptably reliable
Static analysis leg (0.16):
R2.3 Unless there are statically detectable coding errors
IR2.4 If there are no static coding errors, then the system is acceptably reliable
Other values shown on the map: 1.00, 0.80, 0.20
Each leg is independent of the other because defeaters in each leg are independent:
• The truth of a defeater in one leg does not imply the truth of a defeater in an “independent” leg, i.e.,
  – The validity of an argument in one leg is unaffected by defects in another leg
Multi-legged argument definitions
“Standard” definition
• “Independent” evidence and argument supporting the same claim, e.g., proving and testing
Eliminative argument definition
• Two or more argument legs rooted at claim C whose defeaters are independent
  – Two defeaters are independent if the truth of one defeater does not affect the truth of the other
  – An argument leg is defined by an inference rule connecting rebutting defeaters to claim C
Protection Against Argumentation Error
A multi-legged argument is more robust
• Because defeaters are independent
  – If one leg is defective (i.e., if some defeater is true), the other leg still provides some reason to believe the parent claim
  – Argument is more likely to hold up in the future as more info (making a defeater true) becomes available
Reduction in Doubt
Each leg attacks alternative (top-level) reasons for doubt
• Alternative: different (top-level) defeaters and inference rules, i.e.,
  – (∧ ~R_i) → C, (∧ ~R_j) → C where R_i ≠ R_j
  – Defeaters in each leg are independent
Probability that at least one leg is valid is 1 – (prob no leg is valid)
• Prob no leg is valid: ∏(1 – L_i)
• Assumes L_i are independent
• L_i = 0 implies information from that leg does not increase confidence in parent claim
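The combination rule can be written as a short function. This is a minimal sketch under the independence assumption stated above; the name `combine_legs` is hypothetical, and the leg values 0.55 and 0.16 are those of the statistical testing and static analysis legs in the example.

```python
import math
from typing import Iterable


def combine_legs(leg_confidences: Iterable[float]) -> float:
    """Probability that at least one leg is valid: 1 - prod(1 - L_i).

    Valid only if the legs (their defeaters) are independent.
    """
    return 1.0 - math.prod(1.0 - L for L in leg_confidences)


# The two-legged example: statistical testing (0.55) and static analysis (0.16).
two_legs = combine_legs([0.55, 0.16])

# A leg with L_i = 0 leaves the confidence from the other leg unchanged.
with_useless_leg = combine_legs([0.55, 0.0])

print(round(two_legs, 2), round(with_useless_leg, 2))
```

The first result reproduces the 0.62 obtained for the parent claim; the second illustrates that a leg with zero confidence adds nothing.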
Summary
How much confidence does each leg contribute?
• Depends on the extent to which defeaters are eliminated in a given leg
  – Various rules can be used to determine partial defeater elimination
How can independence of two legs be determined?
• By determining that doubts are not shared among the legs
  – i.e., the truth of a defeater in one leg does not imply the truth of a defeater in the other leg
Summary
Eliminative argumentation (identification of doubts and their elimination) provides a framework for building confidence in an argument and in properties of a system
• Confidence maps are a visualization of an eliminative argument
• They explicitly document reasons for doubt and their elimination
An assurance case provides an argument as justification for a claim
We seek to provide justification for belief in a claim
We do so by identifying and eliminating defeaters (doubts) relevant to the claim and the argument
Contact Information
John B. Goodenough
SEI Fellow (retired)
Telephone: +1 412-268-6391
Email: jbg@sei.cmu.edu

Charles B. Weinstock
Senior Member of the Technical Staff
Telephone: +1 412-268-7719
Email: weinstock@sei.cmu.edu

Ari Z. Klein
Ph.D. Candidate – Rhetoric
Telephone: +1 412-268-7700
Email: azklein@sei.cmu.edu

U.S. Mail
Software Engineering Institute
4500 Fifth Avenue
Pittsburgh, PA 15213-2612
USA
OBJECTIONS
Objections
What if a relevant defeater has not been identified?
What if a defeater cannot be completely eliminated?
Not all defeaters are of equal importance. How is this handled?
Eliminative induction (Baconian probability) seems rather weak (compared to Bayesian probability or enumerative induction). What is being gained (and lost) with this approach?
The potential number of defeaters seems incredibly large for a real system. Is this approach practical?
What if a defeater is unidentified?
Assurance cases are inherently defeasible; it is always possible that something has been omitted
• Complete confidence (n|n) only reflects what is known at a particular point in time
Uncertainty about completeness is itself a reason for doubt that needs to be recognized and countered
• “Not all hazards have been identified”
• Assessment of a case must consider this as a reason for doubting the adequacy of the case
  – Eliminative argumentation provides a method for identifying where sources of doubt can be found
Eliminative argumentation provides ways of thinking about and explaining why one should have confidence in a case, or a claim
• The approach does not, of course, guarantee a sound case
• But it helps in developing sound and complete arguments
Incomplete Defeater Elimination
We have addressed this in our examples
We accept that in practical cases, there will always be some residual doubt
• The issue is whether the remaining doubts are considered significant or not
The general principle is that uneliminated lower level doubts propagate to higher level claims
• The goal is to formulate lower level defeaters that can be eliminated by appropriate evidence and inference rules
Differential Defeater Importance
The elimination of some defeaters seems more important (in some intuitive sense) than others. A strict eliminative induction (Baconian) approach treats all uneliminated defeaters equally.
• Consider hazards identified in a safety analysis. All above a certain threshold must be eliminated/mitigated
  – Assessing their relative importance/likelihood is not profitable
This is a current subject of research
Why Use Eliminative Argumentation
With eliminative argumentation, we learn something concrete about why a system works
• With enumerative induction, we at best only learn something statistical (although this can be valuable knowledge)
Eliminative argumentation avoids “confirmation bias”
• To the extent evidence eliminates defeaters, we know an argument cannot be invalid for all situations covered by these defeaters
Practical Considerations
The amount of evidence and argument for a real system is inherently quite large
• Can eliminative argumentation provide a more thorough and cost-effective basis for developing confidence in system behavior?
• Are assurance efforts more effective and focused?
More research is needed