
Journal of Experimental Psychology: Human Learning and Memory
Vol. 6, No. 2, March 1980

Reasons for Confidence

Asher Koriat
University of Haifa, Haifa, Israel

Sarah Lichtenstein and Baruch Fischhoff
Decision Research, a Branch of Perceptronics, Eugene, Oregon

People are often overconfident in evaluating the correctness of their knowledge. The present studies investigated the possibility that assessment of confidence is biased by attempts to justify one's chosen answer. These attempts include selectively focusing on evidence supporting the chosen answer and disregarding evidence contradicting it. Experiment 1 presented subjects with two-alternative questions and required them to list reasons for and against each of the alternatives prior to choosing an answer and assessing the probability of its being correct. This procedure produced a marked improvement in the appropriateness of confidence judgments. Experiment 2 simplified the manipulation by asking subjects first to choose an answer and then to list (a) one reason supporting that choice, (b) one reason contradicting it, or (c) one reason supporting and one reason contradicting. Only the listing of contradicting reasons improved the appropriateness of confidence. Correlational analyses of the data of Experiment 1 strongly suggested that confidence depends on the amount and strength of the evidence supporting the answer chosen.

One remarkable characteristic of human memory is its knowledge of its own content. Judgments of confidence in the correctness of recall and recognition performance are moderately valid predictors of that performance (Murdock, 1966; Tulving & Thomson, 1971). Even when unable to recall an item from memory, people can estimate with some accuracy whether the item will be retrieved or recognized in subsequent tests (Blake, 1973; Gruneberg & Monks, 1974; Hart, 1965). Although some systematic distortions in evaluating one's own knowledge have been noted (Koriat & Lieblich, 1977), on the whole, people seem able to monitor successfully the contents of their memories. This ability has been regarded as important, if not indispensable, to the effective operation of a memory system (Hart, 1967).

This research was supported by the Advanced Research Projects Agency of the Department of Defense, and was monitored by the Office of Naval Research under Contract N00014-79-C-0029 (ARPA Order No. 3668) to Perceptronics, Inc. Our thanks to Mark Layman and Barbara Combs for their help, and to Helmut Jungermann, Henry Montgomery, and Paul Slovic for comments on an earlier draft. Requests for reprints should be sent to Sarah Lichtenstein, Decision Research, a branch of Perceptronics, 1201 Oak Street, Eugene, Oregon.

Copyright 1980 by the American Psychological Association, Inc. 0096-1515/80/0602-0107$00.75



Somewhat different conclusions have been recently emphasized by investigators in the area of decision making. In these studies (reviewed by Lichtenstein, Fischhoff, & Phillips, 1977), confidence judgments were elicited as assessments of the probability that a statement is true. Appropriateness of confidence was measured by comparing these assessed probabilities with the observed relative frequencies of being correct (hit rates). An individual is well calibrated if, over the long run, for all answers assigned a given probability, the proportion correct equals the probability assigned. The general conclusion from these studies is that people are rather poorly calibrated. Although higher probabilities are typically associated with larger hit rates, in an absolute sense the probabilities and hit rates diverge considerably. The major systematic deviation from perfect calibration is overconfidence, an unwarranted belief in the correctness of one's answers (Lichtenstein et al., 1977). Typical results have shown, for example, an observed hit rate of .6 associated with assessed probabilities of .70, whereas for answers assigned probabilities of .90, only about 75% are correct. Recent studies using a variety of question and response formats found that answers endorsed with absolute certainty (i.e., p = 1.00) were wrong about 20% of the time (Fischhoff, Slovic, & Lichtenstein, 1977). Subjects had enough faith in these expressions of certainty that they were willing to risk money on them.

Although overconfidence has been a pervasive bias in a variety of tasks, existing experimental data leave the underlying psychological process(es) quite unclear. Building on some speculations by Fischhoff et al. (1977) and Koriat and Lieblich (1977), the present work looks at the origins of unwarranted certainty. To assess one's confidence in the truth of a statement, one first arrives at a confidence judgment based on internal cues or "feelings of doubt" (Adams & Adams, 1961). The judgment is then transformed into a quantitative expression, such as the probability that the statement is correct. Unwarranted certainty might be linked either to the confidence judgment or to the translation of that judgment into a number.

The monotonic relationship between assessed probability and percentage correct indicates that people can monitor the correctness of their answers with some success. Thus, miscalibration may be due primarily to inappropriate translation of those feelings into probabilistic terms. Such translation difficulties might be correctable by proper training. In fact, a calibration training program that provided calibration feedback after each of 11 sessions of 200 items was found to be effective by Lichtenstein and Fischhoff (in press). Their finding that almost all the improvement was accomplished after the first round of feedback suggests that what subjects learned was to make simple adjustments in the magnitude of the subjective probabilities they reported.

These results, however, do not preclude the possibility that miscalibration arises from the way in which information is used and evaluated when making confidence judgments rather than from the translation of these judgments into probabilities. The fact that the improvement in Lichtenstein and Fischhoff's (in press) subjects failed to generalize to probability assessment tasks with the same response mode but different content suggests that translation problems are not the whole story.

One information-processing mechanism that would produce overconfidence is to rely more heavily on considerations consistent with a chosen answer than on considerations contradicting it. Such a predisposition could express itself either during memory search and retrieval, by gradually biasing the search toward evidence supporting a tentatively preferred answer, or in a postdecisional stage in which the evidence is reviewed and confidence is assessed. Whichever is the case, forcing subjects to write down as much pertinent evidence as possible before choosing an answer or evaluating its validity should reduce the selective bias in the utilization of evidence and result in improved calibration.


In the first experiment to be reported, subjects were given two-alternative general-knowledge questions. For each question they were asked to choose the correct alternative and state the probability that their choice was correct. The debiasing manipulation presented an additional task: to write down all possible reasons that argue for or against both alternatives and to rate the strength of each reason. We anticipated that a balanced survey of the pertinent evidence would reduce overconfidence and improve calibration. To the extent that confidence judgments are biased in favor of the selected alternative, the amount of evidence supporting that answer should be a better predictor of confidence than the amount of evidence inconsistent with it.

Experiment 1

Method

Stimuli. Six sets of 10 questions each were selected from the 150 general-knowledge questions used in Experiment 3 of Lichtenstein and Fischhoff (1977). The sets were closely matched in terms of item difficulty, as measured by the percentage of subjects answering each correctly in that experiment. The range of item difficulties used was from 32% to …, with the mean for each of the six sets varying from 64.2% to 64.9%. Five additional questions from the same pool were used for warm-up. The questions covered a wide variety of topics including history, literature, geography, and nature. All had a two-alternative format. For example, "The Sabines were part of (a) ancient India or (b) ancient Rome."

Subjects. The subjects were 73 paid volunteers who responded to an ad in the University of Oregon student newspaper. Each subject attended one of two identical group sessions.

Procedure. Each subject answered two of the six sets of questions, the first under control instructions and the second under reasons instructions. The control instructions directed subjects to choose the correct alternative for each question and to judge the probability that the chosen alternative was correct by writing down a probability between .5 and 1.0. Fifteen questions then followed, the first 5 of which were treated as warm-up and discarded from analyses.

Instructions for the reasons condition directed the subject to

put all the possible reasons that you can find favoring and opposing each of the answers. Such reasons may include facts that you know, things you vaguely remember, assumptions that make you believe that one answer is likely to be correct or incorrect, gut feelings, associations, and the like.

Subjects were required to write each reason in the appropriate cell of a 2 × 2 table depending on whether it spoke for Answer a, for Answer b, against Answer a, or against Answer b. They were urged to provide reasons for all four cells of the table and to formulate each statement in a manner that conveyed their degree of certainty in it (e.g., "I know for sure . . . ," "I vaguely remember . . ."). Finally, subjects were asked to rate each reason on a 7-point scale according to how strongly it spoke for or against the corresponding answer (1 was labeled "weakest possible"; 7 was labeled "strongest possible").

Each of the 10 questions in the reasons condition appeared on a separate page. The two possible answers were placed above the columns of a large 2 × 2 table; the words "reasons for" and "reasons against" labeled the rows. A space was provided at the bottom of each page to choose an answer and provide a probability assessment. The 10 pages of questions were shuffled before the compilation of each booklet.

For the first group of subjects, 3 min were allowed for each question. Since some subjects found this procedure annoying, in the second session only the first question was timed. Subjects were instructed to continue working on the remaining items at their own pace, spending an average of 3 min on each item.

Results

Five subjects did not provide probability assessments for all the questions or used the probability scale inappropriately. Their data were excluded from the following analyses. Each of the six sets of questions was answered by 9-13 of the remaining 68 subjects in the control condition and 10-12 subjects in the reasons condition. Since preliminary analyses revealed no systematic differences between the data of the two sessions, they were combined in the following analyses.

The appropriateness of confidence judgments can be represented by a calibration curve showing the percentage of correct responses associated with each subjective probability value. With perfect calibration, probability equals percentage correct and the calibration curve becomes the identity line. With overconfidence, the curve lies under the identity line; with underconfidence, it lies over the identity line.
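To make the construction of such curves concrete, here is a minimal illustrative sketch (ours, not part of the original study; the function names and toy data are invented for illustration). It groups assessed probabilities into the categories described in Footnote 1 and computes the proportion correct within each category, along with the over/underconfidence index defined below (the signed difference between mean assessed probability and overall proportion correct).

```python
# Illustrative sketch (not from the paper): a calibration curve and the
# over/underconfidence index from (assessed probability, correct) pairs.
# Probabilities are grouped as in Footnote 1: .5-.59, .6-.69, ..., .9-.99, and 1.0.
from collections import defaultdict

def calibration_curve(responses):
    """responses: iterable of (assessed_probability, correct), with correct in {0, 1}."""
    bins = defaultdict(list)
    for p, correct in responses:
        # Truncate to the lower bound of the category; the epsilon guards against
        # floating-point error (e.g., 0.6 * 10 evaluating to 5.999...).
        category = min(int(p * 10 + 1e-9), 10) / 10
        bins[category].append(correct)
    return {cat: sum(hits) / len(hits) for cat, hits in sorted(bins.items())}

def over_under_confidence(responses):
    """Signed difference between mean assessed probability and overall proportion correct."""
    probs = [p for p, _ in responses]
    hits = [c for _, c in responses]
    return sum(probs) / len(probs) - sum(hits) / len(hits)

# Toy data: three answers assessed at .9 (two of them correct), one at .6 (correct).
data = [(.9, 1), (.9, 1), (.9, 0), (.6, 1)]
print(calibration_curve(data))      # {0.6: 1.0, 0.9: 0.666...}
print(over_under_confidence(data))  # about +0.075, i.e., overconfidence
```

A perfectly calibrated set of responses would make every value in the returned dictionary equal to its key, so the plotted points would fall on the identity line.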


Figure 1 presents the calibration curves for all subjects and items for the control and reasons conditions.1 The interval around each point represents the 50% credible interval using uniform priors (Phillips, 1973). The calibration curve for the reasons condition is clearly superior to that of the control condition. The control condition displays the overconfidence typically observed in previous studies. Some overconfidence is also observed in the reasons condition, but it is mostly confined to probability assessments of .9 and 1.0. Table 1 presents the proportions of correct responses and mean probability assessments for the two conditions as well as several measures of the quality of the probability responses.
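The paper does not spell out how these intervals are computed beyond the citation to Phillips (1973); the standard construction for a binomial proportion with a uniform prior, which we assume here, is:

```latex
% Assumed construction (our gloss): with c correct answers out of n responses in a
% probability category and a uniform Beta(1,1) prior, the posterior on the hit rate is
\theta \mid c, n \;\sim\; \mathrm{Beta}(c + 1,\; n - c + 1),
\qquad
\text{50\% credible interval} \;=\; \bigl[\,Q_{0.25}(\theta),\; Q_{0.75}(\theta)\,\bigr],
```

where Q_{0.25} and Q_{0.75} are the .25 and .75 quantiles of this posterior distribution.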

Figure 1. Calibration curves for the control and reasons conditions (Experiment 1) with 50% credible intervals. (Horizontal axis: assessed probability; vertical axis: proportion correct.)

Table 1
Summary Indices for the Reasons and Control Conditions

Indices                         Control    Reasons    t
Proportion correct              .629       .669       1.65
Mean probability assessments    .720       .697       2.34
Over/underconfidence            +.091      +.028      2.36*
Brier total                     .246       .209       2.81**
Knowledge                       .233       .221       1.63
Calibration                     .022       .005       2.69**
Resolution                      .009       .018       1.64

* p < .05. ** p < .01.

The over/underconfidence measure reported in Table 1 is the signed difference between the mean assessed probability and the overall proportion correct. The next entry in Table 1, the Brier score (Brier, 1950), is a strictly proper quadratic scoring rule which serves as a general measure of the quality of probability assessments. The lower the score, the better the performance; a perfect score of 0 can be earned only by always choosing the correct answer and assigning to it the probability of 1.0. The score can go as high as 1; a score of .25 will be earned by always choosing an alternative at random and assigning it a probability of .5. The three measures following the Brier score in Table 1 are the three additive partitions of the Brier score developed by Murphy (1973). The knowledge component is a function of proportion correct and is ideally 0. The calibration score is the weighted mean of the squared differences between the data points in Figure 1 and the identity line. Resolution reflects the slope of calibration curves. This term is subtracted from knowledge and calibration to get the Brier score; thus, the larger it is, the better. (For further details on these measures, see Lichtenstein et al., 1977.)

These measures are extremely unstable when based on small samples, such as the 10 responses given by each subject in each condition. Thus each measure in Table 1 was calculated over the data from all subjects. A modified jackknifing analysis (Mosteller & Tukey, 1968) was used to compare the two conditions. The indices listed in Table 1 were calculated for the control and reasons data 68 times, each time leaving out a different subject. The results of t-test comparisons of the control and reasons conditions over the 68 reduced samples are presented in the last column of Table 1.
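A sketch of that leave-one-out scheme might look as follows (our simplified reading of the procedure, not the authors' code; index_fn stands for any of the summary indices, such as the calibration or Brier score, and the paired t test over the leave-one-out replicates is one way to realize the "modified jackknifing" described above).

```python
# Illustrative sketch: recompute a summary index with each subject left out in turn,
# for both conditions, then compare the two series of leave-one-out values.
from scipy import stats

def jackknife_compare(control_by_subject, reasons_by_subject, index_fn):
    """Each *_by_subject argument maps subject id -> list of (probability, correct) pairs;
    index_fn maps a pooled list of responses to a scalar index."""
    subjects = sorted(control_by_subject)
    control_loo, reasons_loo = [], []
    for left_out in subjects:
        pooled_control = [r for s in subjects if s != left_out for r in control_by_subject[s]]
        pooled_reasons = [r for s in subjects if s != left_out for r in reasons_by_subject[s]]
        control_loo.append(index_fn(pooled_control))
        reasons_loo.append(index_fn(pooled_reasons))
    # Paired comparison of the two conditions across the leave-one-out replicates.
    return stats.ttest_rel(control_loo, reasons_loo)
```

With 68 subjects this yields 68 replicates per condition, matching the comparisons reported in the last column of Table 1.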

1 The assessed probabilities were grouped into categories .5 to .59, .6 to .69, . . . , .9 to .99, and 1.0.


As can be seen, the calibration and Brier scores for the reasons condition are significantly superior to those obtained in the control condition. In fact, although the calibration observed for the control condition is fairly typical for items with the present level of difficulty, the calibration of the reasons condition is one of the best we have ever observed. This improvement in calibration came about through both a decrease in confidence (the subjects used .5 responses more frequently and 1.0 responses less frequently, with little change in the use of intermediate values) and an increase in the proportion of correct choices. The net result was a significant reduction in overconfidence from .091 to .028 and an improvement in calibration.

Discussion

Calibration improved in the reasons condition in the absence of any feedback regarding the adequacy of the probability judgments. This strongly suggests that cognitive biases in the assessment of uncertainty (and not merely inappropriate translation) are involved in the overconfidence observed in our control condition and in previous research.

The degree of improvement in calibration achieved by listing reasons is roughly comparable to that achieved in Lichtenstein and Fischhoff's (in press) training study. With items of comparable difficulty, the initial calibration of their 12 subjects improved from .015 to .005;2 overconfidence was reduced from .063 to .020.

Examination of the reasons supplied and their rated strength will be postponed until the results of Experiment 2 have been presented.

Experiment 2

The results of Experiment 1 are consistent with the hypothesis that overconfidence is the result of neglecting evidence that contradicts the chosen answer. According to this hypothesis, writing down both supporting and contradicting reasons should lead subjects to a more balanced weighing of the evidence. If supporting reasons are recruited naturally, then the requirement to spell out contradicting reasons is the key to the improved calibration in the reasons condition.

Experiment 2 was designed to evaluate this interpretation by contrasting the effectiveness of instructions to write down contradicting reasons with that of instructions to write down supporting reasons. The task was further simplified by requiring subjects to write only one or two reasons. Three conditions were used, each of which had subjects first choose the correct answer, then produce one or two reasons, and finally assess the probability of being correct. In the supporting condition, subjects were instructed to write down one reason speaking for the selected alternative; in the contradicting condition, they were asked to write one reason speaking against it; in the both condition, they were to write one reason of each type. If our hypothesis is correct, then writing reasons in response to the supporting instructions should produce no improvement in calibration, since those instructions roughly simulate what people normally do. By drawing attention to contrary reasons, both the contradicting and both conditions should improve calibration.

Experiment 2 was also designed to provide some information regarding two alternative explanations of the results of Experiment 1. The first involves order effects. Since the reasons condition followed the control condition for all subjects in Experiment 1, it might be argued that the improved calibration is due to some kind of learning or practice effect. Experiment 2 examined this possibility by systematically manipulating the order of the items within all conditions. A second alternative interpretation is that requiring subjects to work harder induced a more serious attitude toward the task and increased the subjects' motivation. If so, in Experiment 2 there should be little difference in calibration between the supporting and contradicting conditions and somewhat better calibration in the both condition.

2 Calibration scores are generally larger when based on smaller amounts of data. Since these are the means of calibrations computed on just 200 responses, whereas the calibrations in Table 1 were computed on 680 responses, they are not strictly comparable.

Method

Stimuli. The same 6 sets of 10 questions used in Experiment 1 were employed in Experiment 2. All subjects received three sets under regular (control) instructions and three sets under one of the three experimental instructions.

All the questions were compiled in a booklet that also contained all instructions. The control instructions appeared first, followed by 35 questions, the first 5 of which were discarded in analysis. The remainder of the booklet contained the supporting, contradicting, or both instructions followed by the remaining 30 items. The order of the sets was systematically varied so that within each condition all six sets appeared approximately equally often in each ordinal position.

Subjects. Subjects were 200 paid volunteers who responded to an ad in the University of Oregon student newspaper. Each subject attended one of five identical group sessions. There were 66 subjects in the supporting condition, 66 in the contradicting condition, and 68 in the both condition.

Procedure. The instructions for the control conditions were the same as those used in Experiment 1. The instructions for the three experimental conditions all required the subject first to choose the correct alternative, then to write down one or two reasons, and finally to assess the probability that the chosen alternative was correct. The conditions differed only in the type of reasons called for.

In the supporting condition, subjects were instructed to

write down in the space provided one reason that supports your decision. Please write the best reason you can think of that either speaks for or provides evidence for the alternative you have chosen, or speaks against or points against the alternative you rejected.

The contradicting instructions were similarly phrased except that they called for the best contradicting reason. The both instructions required one reason of each type. The rest of the instructions were similar to those used in the reasons condition of Experiment 1, specifying what might constitute a reason, and encouraging the use of statements conveying degrees of certainty in the reason provided and indicating the source of the evidence. Unlike the reasons condition, however, no strength ratings were required. The task was self-paced.

Results

Figure 2. Calibration curves for the control and experimental conditions for the supporting, contradicting, and both groups (Experiment 2). (Horizontal axis: assessed probability; vertical axis: proportion correct.)

Order effects. Since each of the six sets of 10 questions appeared with approximately equal frequency in each of the six ordinal positions, the possibility of order effects could be evaluated by comparing the calibration curves for the first, second, and third successive sets of 10 questions over all subjects, and again for the fourth, fifth, and sixth. These comparisons revealed no systematic trends within either of the two blocks of three sets. The calibration indices for the first, second, and third sets (the control condition) were .014, .017, and .014, respectively.

Table 2
Summary Indices for Experiment 2

                                Supporting (n = 66)       Contradicting (n = 55)      Both (n = 68)
Indices                         Control  Reasons  t       Control  Reasons  t         Control  Reasons  t
Proportion correct              .624     .622     .14     .638     .653     .76       .645     .646     .06
Mean probability assessments    .722     .716     .87     .713     .704     .88       .712     .703     1.22
Over/underconfidence            +.098    +.094    .22     +.074    +.052    1.19      +.067    +.057    .61
Calibration                     .020     .018     .36     .015     .009     2.10*     .012     .006     1.24

* p < .05.

Pooling over all three experimental conditions, calibration indices for the fourth, fifth, and sixth sets of items were .011, .010, and .012, respectively. It appears safe to conclude that the improvement found in Experiment 1 cannot be attributed to the order in which the control and reasons conditions were administered.

Manipulation checks. In evaluating the effects of the three experimental manipulations, 11 subjects had to be eliminated from the contradicting group because they wrote only or mostly supporting reasons. In fact, judging both from the behavior of subjects during the experiment and from the nature of the reasons offered, subjects apparently found it harder to produce a single contradicting reason than either a single supporting reason or both a supporting and a contradicting reason. Even some of the subjects retained in the contradicting group occasionally used supporting reasons. These errors seemed to reflect either momentary lapses or changes of mind as to which alternative was chosen, indicated by erasing or crossing out of a previous answer (without a search for a reason contradicting that new answer).

Calibration. The calibration curves in Figure 2 clearly show that the strongest effect was achieved by the contradicting instructions. Producing both reasons resulted in a slight but systematic improvement, whereas the supporting instructions had no effect. Jackknifing analyses similar to those employed in Experiment 1 were used to evaluate the differences in calibration between the experimental conditions and their respective controls. A summary of the results appears in Table 2.

For the supporting group, the means of the supporting and control conditions were almost identical for each of the indices investigated. Although statistically insignificant, the effects of the both instructions were in the direction of improved calibration.

The contradicting instructions, on the other hand, resulted in a significant improvement in calibration.3 Although the changes in proportion correct and mean probability were slight, their pattern is similar to that observed in the reasons condition of Experiment 1: increased proportion correct combined with reduced confidence.

Discussion

The results of the contradicting group are consistent with the idea that overconfidence derives in part from the tendency to neglect contradicting evidence and that calibration may be improved by making such evidence more salient.

Although significant, the improvement in calibration achieved by the contradicting instructions was not as marked as that obtained by the reasons instructions of Experiment 1. This may have been because the reasons instructions required a more thorough and detailed analysis of the pertinent evidence than the contradicting instructions, with their minimal requirement of specifying a single reason.

3 This difference remained significant even when the contradicting subjects who failed to follow the instructions were included in the analysis. Thus the different results found for the supporting and contradicting groups could not be attributed to the exclusion of some of the subjects from the contradicting group.

The fact that supporting instructions had no effect on performance suggests that producing a supporting reason is approximately what people normally do when asked to assess the likelihood that an answer is correct.

The failure of the both condition to produce a significant effect was unexpected. If supporting reasons have no effect and contradicting reasons improve calibration, then the net effect of both should be better calibration, assuming that the two effects are additive. The substance of the reasons the subjects supplied, however, casts doubt on this assumption. Contradicting reasons provided in the both condition were often of a somewhat different character than those in the contradicting condition. Specifically, the former were seldom independent of the supporting reason provided and were sometimes secondary to it (e.g., by qualifying it or doubting its validity). Manipulating the order in which the supporting and contradicting reasons were solicited would probably not have helped, as subjects could easily follow their natural predilection and write down the supporting reason first anyway. Another possible explanation is that in Experiment 1 subjects were asked to list reasons before choosing an answer, whereas in Experiment 2 they were asked to choose an answer before listing reasons. To see if this change in order could explain the difference between the results of Experiment 1, in which calibration did improve, and the both condition of Experiment 2, in which it did not improve, we ran a new group of 44 subjects in a modified both condition with the order reversed. After responding to 35 control items, subjects were given 30 more items and asked to list one reason for each of the two alternatives:

In the space for "a reason supporting answer a," please write the best reason you can think of that either
1. speaks for or provides evidence for alternative a, or
2. speaks against alternative b.
In the space for "a reason supporting answer b," please write the best reason you can think of that either
1. speaks for or provides evidence for alternative b, or
2. speaks against alternative a.

After writing the two reasons, the subjects selected one alternative and gave a probability. The instructions were otherwise the same as in the both condition of Experiment 2. This new both/reversed condition produced no improvement in calibration over its own control, suggesting that the order in which subjects gave reasons and chose an answer made no appreciable difference. Of course, there was no way to prevent the subject from covertly choosing an alternative before listing reasons, even when the instructions and format suggested otherwise.

The failure of the both condition to produce stronger improvement than the contradicting condition would also argue against interpreting the reasons effect in Experiment 1 as the result of increased effort. An effort hypothesis would have also predicted improved calibration with the supporting instructions, an effect that was totally absent.

Analysis of Reasons

The subjects in the reasons condition of Experiment 1 were asked to list all pertinent reasons arguing for or against each of the two alternative answers and to rate the strength of each reason on a 7-point scale. They then chose an answer and assessed the likelihood of its being right. The reasons were analyzed to gain further insight into the role of evidence in assessing confidence.

Six subjects who had failed to provide strength ratings or had used the rating scale inappropriately (e.g., used a .5-1.0 scale or provided strength ratings for the question as a whole) were eliminated from the following analyses. The remaining 62 subjects supplied an average of 3.17 reasons for each question. No subject provided more than four reasons for any single cell.


The reasons data were analyzed in terms of the type of evidence provided, the relationship between this evidence and the chosen alternative, and the relationship between the reasons and confidence.

The interrogation of memory for pertinent evidence. A content analysis of the reasons may offer valuable insight into the type of evidence people seek when choosing alternatives and assessing confidence (Collins, Warnock, Aiello, & Miller, 1975). This task was not attempted with the present data because of the great variety of substantive questions used. We did, however, examine the distribution and strength of reasons for and reasons against. This distinction is independent of the alternative chosen, unlike the distinction between supporting and contradicting reasons, which is defined in relationship to the chosen alternative. The former contrast may be most relevant to the process of memory interrogation, whereas the latter seems more pertinent to the evidence review stage.

Two aspects of the data suggest that when interrogating their memories for pertinent evidence, subjects are tuned more toward arguments for than toward arguments against. First, subjects gave more reasons for than reasons against. The average subject provided at least one reason for for 14.7 of the 20 alternatives while providing at least one reason against for only 12.3 alternatives, t(61) = 6.04, p < .001. Overall, subjects produced an average of 18.3 reasons for and 13.4 reasons against, t(61) = 7.99, p < .001. Second, reasons for were assigned higher strength ratings than reasons against. The mean rating given the first reason for was 3.8; the mean rating of the first reason against was 3.5, t(61) = 2.92, p < .005. The same pattern was found for ratings averaged across all for reasons and all against reasons given.

Choice of answer. The choice of an alternative was more highly related to for than to against reasons. Pooling over all subjects and items, there was a .61 point-biserial correlation between choice of alternative and the difference between the number of reasons for given to the two alternatives (i.e., many more reasons for were associated with chosen than with rejected alternatives). The comparable correlation between choice of alternative and the difference between number of reasons against was only .42.

A similar trend was found when predicting the choice of an alternative from the differences in the strengths of the first for and against reasons supplied to each of the alternatives. This analysis used only those instances in which for or against reasons were supplied to both alternatives. The choice of a particular alternative correlated .64 (n = 313) with the differences in the strength of the first for reasons (i.e., people were much more likely to choose the alternative with the higher strength assigned to the first reason for) and .29 (n = 241) with the differences in the strength of the first against reasons.
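For reference (our addition; the article does not give the formula), the point-biserial correlation used here is the Pearson correlation between a dichotomous variable, such as chosen versus not chosen, and a quantitative one, and can be written

```latex
r_{pb} \;=\; \frac{\bar{X}_1 - \bar{X}_0}{s_X}\,\sqrt{p\,(1 - p)},
```

where \bar{X}_1 and \bar{X}_0 are the means of the quantitative variable (here, the difference in the number or strength of reasons) for the two groups, s_X is its standard deviation over all cases (population form), and p is the proportion of cases in the first group.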

Assessment of confidence. In the following analyses, the reasons that the subjects supplied were classified according to their relationship to the alternative chosen as (a) for chosen, (b) against rejected, (c) for rejected, or (d) against chosen. The first two categories constitute supporting reasons and the last two categories constitute contradicting reasons.

Consider first the number of reasons produced. Pooling over all subjects and items (n = 620), the correlations between confidence and number of reasons were as follows: .28 with reasons for chosen, .17 with reasons against rejected, -.06 with reasons for rejected, and -.07 with reasons against chosen. Thus confidence was more strongly related to the number of supporting reasons than to the number of contradicting reasons. Notice that within the supporting category, the for reasons appear to have slightly more weight than the against reasons, a trend that is more pronounced in the strength data presented next.

Figure 3 shows the mean assessed probability as a function of the strength of the first reason, for each of the four categories of reasons (a strength of zero means that no reason of that type was given). The assessed probabilities were most strongly related to the strength of for chosen reasons; with strengths less than 3, the mean probability was below .6, whereas for strength ratings of 7, the mean probability was .9. The correlations between assessed probability and the strength of the first reason for the data included in Figure 3 were as follows: .66 for reasons for chosen, .39 for reasons against rejected, .00 for reasons for rejected, and -.02 for reasons against chosen. Here, too, the greater relevance of supporting reasons (and to a lesser extent reasons for) is apparent.

Figure 3. Mean assessed probability as a function of the strength of the first reason for four types of reasons (Experiment 1). (Curves: for chosen, for rejected, against chosen, against rejected; horizontal axis: strength of first reason.)

Figure 3 ignores the number of reasons supplied. The sum of the strengths of all the reasons produced seems to capture better the impact of all evidence for or against an answer. Confidence correlated .56 with the sum of strengths of reasons for chosen, .37 for reasons against rejected, .03 for reasons for rejected, and .00 for reasons against chosen. Thus, the amount of contradicting evidence (for rejected, against chosen) appears to have no bearing on confidence. Once again, positive evidence (favoring the selected alternative) seems to count more than negative evidence (against the rejected alternative).

General Discussion

The present studies seem to shed some light on the process by which confidence is determined and on the origins of the most pervasive bias in calibration, overconfidence. In understanding the results, it is helpful to conceptualize the confidence assessment task as having two cognitive stages. The first stage involves searching one's knowledge; this stage ends when an answer is chosen. During the second stage, the evidence is reviewed and confidence in the chosen alternative is assessed.

We have shown that calibration can be significantly improved by requiring people to explicate all considerations that seem pertinent to their decision. The results suggest two biases in how people elicit and use their own knowledge, one bias corresponding to each cognitive stage.

The first bias involves favoring positive rather than negative evidence (i.e., reasons for over reasons against). Evidence of this bias was found in Experiment 1: Subjects produced more reasons for than reasons against, reasons for were given higher strength ratings, and both the number and strength ratings of reasons for were better predictors of the chosen answer.

The procedures of Experiment 1 may have induced or augmented this bias. Forcing subjects to choose the correct alternative may have focused their attention on reasons for. Conceivably, a bias toward reasons against might be exhibited with instructions to choose the incorrect alternative. Although the two types of instructions are logically equivalent (for two-alternative forced-choice items), they may not be psychologically equivalent. Another way of studying the possibility that format affects the tendency to produce positive evidence would be to present to four different groups the four variations of naturally dichotomous true/false items (e.g., gazpacho soup is served hot / . . . is served cold / . . . is not served hot / . . . is not served cold). A general preference for positive evidence would produce a different pattern of reasons across the four variations than would a bias induced by the positive or negative wording of the items.

The bias against negative evidence found here is similar to the difficulties people have in accepting the relevance of negative evidence in logical inference tasks (Johnson-Laird & Wason, 1977) as well as the neglect of negative examples in judgments of correlation (e.g., Smedslund, 1963).

The second bias in confidence assessment is a tendency to disregard evidence inconsistent with (contradictory to) the chosen answer. Particularly striking evidence for this bias came from Experiment 2. Asking subjects to write a supporting reason did not affect their calibration (presumably because they were already thinking of these reasons), whereas asking them to write a contradicting reason did. Although writing a contradicting reason did improve the realism of subjects' confidence assessments, the subjects found the task difficult; far more instances of producing no reason, or a wrong reason, were observed under the contradictory instructions than under the supporting instructions.

The correlational analyses of Experiment 1 showed that whatever measure was used (the number of reasons produced, the strength of the first reason, or the sum of the strengths of all the reasons), the assessment of confidence was heavily based on the evidence supporting the answer chosen, and not on the evidence supporting the rejected answer. These results give rise to a seeming paradox: How can it be that forcing people to give contradicting reasons leads to improved calibration (as shown in Experiment 2), yet correlational analysis reveals no relationship between the strength or number of contradicting reasons and the probabilistic responses? This paradox is more apparent than real. The results may be understood by supposing that the requirement to give contradictory reasons reduces one's overall confidence, so a difference in calibration is found between conditions. The correlational data, however, reflect relationships within the reasons condition. Apparently, once in that condition, even while feeling less confident in general, one still relies on supporting rather than contradicting evidence to select one's relative degree of confidence for specific items.

It is unclear whether this biased approach to evidence affects only the confidence assessment stage or also the earlier recruitment of information in choosing an answer. One could imagine an unbiased search for information that quickly generated a predilection for one of the choices. Subsequent search is then directed toward considerations supporting the favored alternative. In this light, the reasons manipulation succeeded by forcing a more systematic exploration of pertinent considerations before choosing an answer, as well as by lowering confidence during the postdecisional review process.

Again, we must be aware that the format used might have contributed to the bias. Asking subjects to give the probability that their chosen answer was incorrect could conceivably lead them to rely on contradicting (for rejected and against chosen) reasons when assessing confidence.

We do not fully understand why the both condition in Experiment 2 did not improve calibration. It may be that the bias toward supporting evidence is so strong that explicitly eliciting even one supporting reason establishes the bias unless, as in Experiment 1, all possible contradicting reasons are sought. This would explain why the contradicting reasons supplied by the both group were often subsidiary to the supporting reasons.

Although further research is clearly needed, we can derive some practical advice from the present results. People who are interested in properly assessing how much they know should work harder in recruiting and weighing evidence. However, that extra effort is likely to be of little avail unless it is directed toward recruiting contradicting reasons.

This message echoes a recurrent conclusion of research designed to reduce cognitive biases. Working harder will have little effect unless combined with a task restructuring that facilitates more optimal cognitive functioning. For example, Fischhoff (1977) found that hindsight bias was not reduced by mere exhortation to work harder or even explicit warning about its possible presence. It was, however, greatly reduced by having subjects write a short statement regarding how they would have explained the occurrence of an event that did not happen (Slovic & Fischhoff, 1977), a manipulation not unlike the present enforced search for contradictory reasons.

References

Adams, J. K., & Adams, P. A. Realism of confidence judgments. Psychological Review, 1961, 68, 33-45.

Blake, M. Prediction of recognition when recall fails: Exploring the feeling of knowing phenomenon. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 311-319.

Brier, G. W. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 1950, 78, 1-3.

Collins, A., Warnock, E. H., Aiello, N., & Miller, M. C. Reasoning from incomplete knowledge. In D. G. Bobrow & A. Collins (Eds.), Representation and understanding. New York: Academic Press, 1975.

Fischhoff, B. Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 349-358.

Fischhoff, B., Slovic, P., & Lichtenstein, S. Knowing with certainty: The appropriateness of extreme confidence. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 552-564.

Gruneberg, M. M., & Monks, J. "Feeling of knowing" and cued recall. Acta Psychologica, 1974, 38, 257-265.

Hart, J. T. Memory and the feeling of knowing experience. Journal of Educational Psychology, 1965, 56, 208-216.

Hart, J. T. Memory and the memory monitoring process. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 685-691.

Johnson-Laird, P. N., & Wason, P. C. A theoretical analysis of insight into a reasoning task. In P. N. Johnson-Laird & P. C. Wason (Eds.), Thinking. Cambridge, England: Cambridge University Press, 1977.

Koriat, A., & Lieblich, I. A study of memory pointers. Acta Psychologica, 1977, 41, 151-164.

Lichtenstein, S., & Fischhoff, B. Do those who know more also know more about how much they know? The calibration of probability judgments. Organizational Behavior and Human Performance, 1977, 20, 159-183.

Lichtenstein, S., & Fischhoff, B. Training for calibration. Organizational Behavior and Human Performance, in press.

Lichtenstein, S., Fischhoff, B., & Phillips, L. D. Calibration of probabilities: The state of the art. In H. Jungermann & G. de Zeeuw (Eds.), Decision making and change in human affairs. Amsterdam: D. Reidel, 1977.

Mosteller, F., & Tukey, J. W. Data analysis, including statistics. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology. Reading, Mass.: Addison-Wesley, 1968.

Murdock, B. B. The criterion problem in short-term memory. Journal of Experimental Psychology, 1966, 72, 317-324.

Murphy, A. H. A new vector partition of the probability score. Journal of Applied Meteorology, 1973, 12, 595-600.

Phillips, L. D. Bayesian statistics for social scientists. London: Nelson, 1973.

Slovic, P., & Fischhoff, B. On the psychology of experimental surprises. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 544-551.

Smedslund, J. The concept of correlation in adults. Scandinavian Journal of Psychology, 1963, 4, 165-173.

Tulving, E., & Thomson, D. M. Retrieval processes in recognition memory: Effects of associative context. Journal of Experimental Psychology, 1971, 87, 116-124.

Received April 13, 1979
Revision received August 6, 1979