DECISION ANALYSIS
Vol. 08, No. 1, pp. 000–000
issn 1545-8490 | eissn 1545-8504
© INFORMS

Psychological Heuristics for Making Inferences: Definition, Performance, and the Emerging Theory and Practice

Konstantinos V. Katsikopoulos
Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany, [email protected]

Laypeople as well as professionals, such as business managers and medical doctors, often use psychological heuristics. Psychological heuristics are models for making inferences that (1) rely heavily on core human capacities (such as recognition, recall, or imitation), (2) do not necessarily use all available information and process the information they use by simple computations (such as lexicographic rules or aspiration levels), and (3) are easy to understand, apply, and explain. Psychological heuristics are a simple alternative to optimization models (where the optimum of a mathematical function that incorporates all available information is computed). I review studies in business, medicine, and psychology, where computer simulations and mathematical analyses reveal conditions under which heuristics make better inferences than optimization, and vice versa. The conditions involve concepts that refer to (i) the structure of the problem, (ii) the resources of the decision maker, or (iii) the properties of the models. I discuss open problems in the theoretical study of the concepts. Finally, I organize the current results tentatively in a tree for helping decision analysts decide whether to suggest heuristics or optimization to decision makers. I conclude by arguing for a multi-method, multi-disciplinary approach to the theory and practice of inference- and decision-making.

Key words: Decision analysis: application, inference, theory. Philosophy of modeling
History: Submitted March 22, 2010; Revised July 20, 2010; Revised September 24, 2010



1. Introduction: Rethinking Heuristics

It is a well-known secret that laypeople and professionals often do not make decisions in the way prescribed by the mathematical models developed in operations research and management science. For example, whereas a widely advocated model for making decisions is multi-attribute utility theory (Keeney and Raiffa 1976), people who work "in the wild", such as military and fire officers, do not seem to use utilities or probabilities to decide (Klein and Calderwood 1991). Lay investors do not use sophisticated models to diversify their assets, but often allocate an equal amount of wealth to each asset (Benartzi and Thaler 2001). Medical doctors do not consult logistic regression for deciding whether or not patients should be admitted to the emergency room (Green and Mehr 1997). Engineers do not experiment according to statistical theory, but use simple sequential procedures (Magee and Frey 2006).

A blanket term for simple decision models that people use is heuristics. For a long time, heuristics were considered to be second best to standard decision-theoretic tools such as linear models, Bayesian networks, or classification and regression trees, and significant effort was put into steering professionals, and laypeople, away from the use of heuristics. Recently, however, evidence has been accumulating that heuristics can perform well in problems of inference such as judgment (Hogarth and Karelaia 2005a), forecasting (Goldstein and Gigerenzer 2009), and categorization (Martignon, Katsikopoulos, and Woike 2008): Heuristics have achieved competitive performance in applications in business (Astebro and Elhedhli 2006), medicine (Fischer et al. 2002), and psychology (Czerlinski, Gigerenzer, and Goldstein 1999). These results bring up the possibility that it may be acceptable, or even desirable, to allow or instruct laypeople and professionals to use heuristics under the appropriate conditions. It is a good bet that more research is needed to fully investigate this possibility. Here, I summarize the existing empirical evidence, present a synthesis of the theoretical concepts and conditions that help explain it, and organize the results for the use of decision analysts.

More specifically, the article is organized as follows. In Section 2, I define the inference problems that will be considered here. I also define what is meant by psychological heuristics and give examples. These heuristics are a simple alternative to a class of models I call optimization models, of which I also give examples. In Section 3, I review the empirical evidence from computer simulation studies that compared the performance of heuristics and optimization. Theoretical concepts that help explain the evidence and define conditions under which psychological heuristics and optimization models outperform each other are discussed. In Section 4, open problems in the theoretical study of the concepts are identified. In Section 5, I organize the results in a tree for helping decision analysts decide whether to suggest heuristics or optimization to decision makers. Section 6 concludes.

2. Psychological Heuristics and Optimization Models

2.1. Heuristics

There are three interpretations of heuristics that are relevant here (for more discussion, see Groner, Groner, and Bischof 1983). First, in operations research, heuristics refer to computationally simple models that allow ". . . quickly finding good feasible solutions" (Hillier and Lieberman 2001, p. 624). The two other interpretations of heuristics come from psychology. Kahneman, Slovic, and Tversky (1982) focused on the experimental study of psychological processes that "in general, . . . are quite useful, but sometimes lead to severe and systematic errors" (Tversky and Kahneman 1974, p. 1124), and proposed informal models (i.e., models that do not make precise quantitative predictions) of heuristics. Gigerenzer, Todd, and the ABC research group (1999) developed and tested quantitative models of heuristics that, they argued, ". . . when compared to standard benchmark strategies, . . . can be faster, more frugal, and more accurate at the same time" (Gigerenzer and Todd 1999, p. 22). Here, I conceptualize decision-making heuristics as a hybrid of these three interpretations: As Tversky and Kahneman, and Gigerenzer et al. do, I consider heuristics that are not mere computational short-cuts but have a psychological basis; as Hillier and Lieberman, and Gigerenzer et al. do, I consider quantitative models of heuristics. More specifically, by psychological heuristics I mean models for making decisions that:

(i) Rely heavily on core human capacities;

(ii) Do not necessarily use all available information, and process the information they use by simple computations;

(iii) Are easy to understand, apply, and explain.

For other definitions of heuristics in psychology, see Shah and Oppenheimer (2008) and Gigerenzer and Gaissmaier (in press). Requirements (i)-(iii) overlap with the properties of "fast and frugal heuristics" (Goldstein and Gigerenzer 2002, p. 75), with some differences. For example, I introduce the requirements of usability and transparency of heuristics, which are not in the Goldstein and Gigerenzer list. And Goldstein and Gigerenzer ask that heuristics be descriptively adequate (i.e., model people's decisions), which is not among my requirements.

Based on (i)-(iii), guessing would qualify as a psychological heuristic. I do not see a problem with this conclusion. According to Richard Bellman, "In most situations, it would be just as useful to toss a coin if there are two possibilities, and people know just as much" (1978, p. 50). Even if guessing were always a poor strategy, this article does not claim that psychological heuristics necessarily perform well.

Parts of (i)-(iii) are underspecified (e.g., what is a "core" human capacity, which computations are "simple", when is a heuristic "easy" to understand?), but the following examples should clarify their meaning. I first consider a type of inference problem that is often called a problem of judgment: The task is to judge which one of a number of objects has the higher value on a numerical criterion. For instance, objects could be companies and the criterion could be the value of their stock. Even though the correct answer is knowable, it is assumed that the decision-maker does not have access to it, but has to infer it. This assumption allows modeling interesting, and more challenging, problems of judgment where the correct answer will be available in the future, as when the criterion is the value of a company's stock five years from now. Objects are symbolized by A, B, . . . , and their criterion values by C(A), C(B), . . . (it is also assumed that there are no ties in criterion values).

A psychological heuristic that can be used to make inferences in judgment problems is the recognition heuristic (Goldstein and Gigerenzer 2002):

“If A is recognized and B is not recognized, infer that C(A) > C(B)”. (1)
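As a minimal sketch, rule (1) can be written in Python (the function name and data here are hypothetical illustrations, not part of the original formulation):

```python
# Minimal sketch of the recognition heuristic (1); names are hypothetical.
def recognition_heuristic(a, b, recognized):
    """Infer which of a, b has the higher criterion value,
    if exactly one of them is recognized."""
    if a in recognized and b not in recognized:
        return a
    if b in recognized and a not in recognized:
        return b
    return None  # heuristic does not apply: both or neither recognized

# A decision-maker who has heard of Lufthansa but not Southwest Airlines:
print(recognition_heuristic("Lufthansa", "Southwest Airlines", {"Lufthansa"}))
# prints: Lufthansa
```

Note that the sketch returns nothing when both objects are recognized; as discussed below, the heuristic then simply cannot be used.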

That is, if one object is recognized and another object is not, the recognized object is inferred to have the higher criterion value. For example, if the decision-maker has heard of Lufthansa but not Southwest Airlines, s/he can apply the heuristic and infer that Lufthansa has a higher stock price.

The recognition heuristic satisfies (i)-(iii): First, it only requires people's capacity for recognition of names, voices, and faces. This capacity can be called core in the sense that it seems to be almost effortless for people to use it; for example, a human child can recognize faces better than currently available software (with the possible exception of new anti-terrorist technologies). Second, any information beyond recognition (e.g., Lufthansa is a German company) is ignored. Third, the recognition heuristic is defined by (1), which is a straightforward statement.

If the decision-maker recognizes both Lufthansa and Southwest Airlines, the recognition heuristic cannot be used. The inference is then based on information on some characteristics of the objects that correlate, albeit imperfectly, with the criterion. For instance, if the criterion is stock price, relevant company characteristics may be the number of years that the company has been operating, whether the country of origin is a G-8 country or not, and so on. Such pieces of information are called attributes (also called aspects, cues, or features). Attributes are symbolized by a1, a2, . . . , and the values of the attributes on an object A are symbolized by a1(A), a2(A), . . . (attributes are coded so that their values are nonnegative and the correlation between each attribute and the criterion is positive).

A family of simple attribute-based inference models for judgment is lexicographic heuristics (Fishburn 1974):

“Infer C(A) > C(B) if and only if ai(A) > ai(B), where aj(A) = aj(B) for all j < i”. (2)

What does (2) mean? Attributes are inspected one at a time until an attribute is found that has different values on the two objects; then, the object with the higher value on this attribute is inferred to have the higher criterion value. For example, suppose that a decision-maker orders the country-of-origin-in-G-8 attribute first and the number-of-years attribute second. The country-of-origin-in-G-8 attribute has the same value on Lufthansa and Southwest Airlines ("yes", which would be coded as 1), and Lufthansa has a higher value on the number-of-years attribute, so the decision-maker would infer that Lufthansa has a higher stock price.

The family of lexicographic heuristics is parameterized by the rule used to order attributes. For instance, in the take-the-best heuristic (Gigerenzer and Goldstein 1996), attributes are ordered in descending order of their validity, vi:

vi = Pr[ai(A) > ai(B) | ai(A) ≠ ai(B)], where C(A) > C(B). (3)
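Rules (2) and (3) together define take-the-best; a minimal sketch in Python follows (the training pairs are hypothetical, with the first object of each pair being the one with the higher criterion value):

```python
# Sketch of take-the-best: order attributes by validity (3), then apply the
# lexicographic rule (2). Data are hypothetical; attributes are coded so that
# higher values point toward the higher criterion.

def validity(pairs, i):
    """Validity v_i estimated over pairs (attrs_A, attrs_B),
    where A is the object with the higher criterion value."""
    discriminating = [(a, b) for a, b in pairs if a[i] != b[i]]
    if not discriminating:
        return 0.0
    return sum(a[i] > b[i] for a, b in discriminating) / len(discriminating)

def take_the_best(a, b, order):
    """Return 'A' or 'B' per rule (2), or None if no attribute discriminates."""
    for i in order:
        if a[i] != b[i]:
            return "A" if a[i] > b[i] else "B"
    return None

# Hypothetical training pairs; two binary attributes per object.
pairs = [((1, 1), (0, 1)), ((1, 0), (0, 1)), ((0, 1), (1, 0))]
order = sorted(range(2), key=lambda i: validity(pairs, i), reverse=True)
print(take_the_best((1, 0), (1, 1), order))  # prints: B
```

Here attribute 0 has validity 2/3 and attribute 1 has validity 1/2, so attribute 0 is inspected first; the pair above is decided on attribute 1, the first one that discriminates.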

According to (3), the validity of an attribute is the probability that the attribute has a higher value on the object that has the higher criterion value (given that the attribute has different values on the two objects). Gigerenzer and Goldstein (1996) postulated that people are able to calculate attribute validities based on the core capacity of monitoring frequencies of events, but this claim has been challenged (Dougherty, Franco-Watkins, and Thomas 2008). Rules for ordering attributes, simpler than using validity, have also been proposed, such as ordering attributes randomly (Gigerenzer and Goldstein 1996). In any case, when the decision-maker makes an inference by using a lexicographic heuristic, s/he needs to retrieve attribute values from memory, one by one. Thus, lexicographic heuristics rely on people's capacity for what psychologists call recall, satisfying (i).

I also believe that lexicographic heuristics satisfy (ii) and (iii) in the sense that the computations they require are simpler than the computations used in optimization; for example, lexicographic heuristics use only one attribute, ai in (2), and avoid the integration of attribute values typically used in optimization models. Also, people's attribute orders and judgments can, under some conditions, be described by heuristics such as take-the-best, in the laboratory (Broeder and Newell 2008) and in choices of consumer goods such as microwaves and apartments (Ford et al. 1989).

Lexicographic heuristics can also be applied to a generalization of the judgment problem, called categorization (or classification). In categorization problems, the task is to assign an object to one of many mutually exclusive categories; for example, to decide if a patient is at a high risk of having heart disease, or is not at a high risk. A judgment problem can be reformulated as a categorization problem where the pair (A, B) is viewed as a single object and the possible categories where it could be assigned are C(A) > C(B) and C(B) > C(A).

Lexicographic heuristics for categorization can be graphically represented as trees. A tree consists of the root node, on the tree's first level, and subsequent levels with one attribute processed at each level (see Figure 1). There are two types of nodes. First, a node may specify a question about the value of the object to be categorized on an attribute; the answer then leads to another node at the next level, and the process continues in this way. The root node is of that type. For nodes of the other type there is an exit; the object is categorized and the process stops. In sum, starting from the root node and answering a sequence of questions, an exit is reached and the object is categorized. For trees to be easy for people to understand and apply, they should not have too many levels, nodes, or attributes. For example, Figure 1 shows such a tree for categorizing a patient as being at a high or low risk of having ischemic heart disease (Green and Mehr 1997).

Figure 1. A fast and frugal tree for categorizing patients as having a high or low risk of ischemic heart disease (for more details, see Green and Mehr 1997).

A tree like the one in Figure 1 can be mathematically viewed as a lexicographic heuristic (Martignon et al. 2008). Lexicographic heuristics for categorization are called fast and frugal trees (Martignon, Vitouch, Takezawa, and Forster 2003):

“A tree is fast and frugal if and only if it has at least one exit at each level”. (4)
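As a minimal sketch, rule (4) can be implemented as a list of levels, each with at least one exit; the question functions and category labels below are hypothetical placeholders loosely patterned on Figure 1:

```python
# Sketch of a fast and frugal tree per (4): one question per level, at least
# one exit at each level, and two exits at the final level.
def fast_and_frugal_tree(patient, levels):
    """levels: list of (question, exit_if_yes, exit_if_no); an exit of None
    means 'continue to the next level'."""
    for question, if_yes, if_no in levels:
        exit_label = if_yes if question(patient) else if_no
        if exit_label is not None:
            return exit_label
    raise ValueError("the last level must exit in both branches")

levels = [
    (lambda p: p["st_elevated"], "high risk", None),         # exit only on 'yes'
    (lambda p: p["chest_pain"], None, "low risk"),           # exit only on 'no'
    (lambda p: p["other_factor"], "high risk", "low risk"),  # final level: two exits
]
patient = {"st_elevated": False, "chest_pain": True, "other_factor": False}
print(fast_and_frugal_tree(patient, levels))  # prints: low risk
```

Because every level offers an exit, a categorization can be reached after any question, which is what distinguishes this structure from the fuller decision trees discussed in Section 2.2.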

According to (4), the tree of Figure 1 is fast and frugal. If a second question were asked for all patients with elevated ST segment, the tree would not have been fast and frugal. Fast and frugal trees are completely specified by rules for ordering attributes and for assigning exits to one of the possible categories. There are rules for doing so that are analogous to (3) (Martignon et al. 2008).

Medical practitioners have been positive towards fast and frugal trees. For example, Louis Cook and his team at the Emergency Medical Services Division of the New York City Fire Department used a fast and frugal tree for deciding which of the victims of the September 11 terrorist attack needed urgent care (Cook 2001). Fischer et al. (2002) used a fast and frugal tree for deciding whether children should be treated for pneumonia with antibiotics. A number of authors have argued that simple heuristics such as fast and frugal trees make the medical decision process more transparent and easier to understand and communicate to others (Pearson et al. 1994; Elwyn, Edwards, Eccles, and Rovner 2001; Reilly et al. 2002).

The next family of psychological heuristics I consider can be applied not only to judgment and categorization problems but also to another type of problem, forecasting, in which the task is to infer the criterion value of a single object. A forecasting problem can also be viewed as a categorization problem where each possible value of the criterion is a possible category (practically, this makes sense if the criterion is binary- or categorically-valued). The defining characteristic of this heuristic family is that it uses goals or aspiration levels (Simon 1955, 1956), ti:

“Make a forecast about A if and only if ai(A) > ti, for all i”. (5)

For example, a marketing manager of a company may forecast that a customer will not make any future purchases whenever the customer has not bought any of the company's products for one year or longer (in this case, there exists just one attribute). Wuebben and von Wangenheim (2008) review evidence for the use of such heuristics by professional decision-makers.

There are some inference problems and psychological heuristics I do not consider here. Social heuristics rely on core social capacities of people, such as imitation, which is unmatched among animal species. For instance, people often use the Do-what-the-majority-does heuristic (Laland 2001): "If the majority of your peers display a behavior, engage in it as well". Such heuristics are very likely used in practice, but I do not discuss them here because there has not been much work on their performance. I also do not discuss psychological heuristics that apply to inference problems other than the ones considered here, such as geographic criminal profiling (Snook et al. 2005).
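Returning to rule (5), the one-attribute customer example above can be sketched as follows (the 365-day aspiration level and the function name are hypothetical illustrations):

```python
# Sketch of the aspiration-level rule (5): make the forecast iff every
# attribute exceeds its aspiration level t_i. Threshold is hypothetical.
def aspiration_forecast(attributes, thresholds):
    """Return True iff a_i(A) > t_i for all i."""
    return all(a > t for a, t in zip(attributes, thresholds))

# Forecast "no future purchases" for a customer inactive for 400 days,
# against an aspiration level of one year (365 days):
print(aspiration_forecast([400], [365]))  # prints: True
```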

2.2. Optimization

I use the blanket term optimization models for decision-making models in which the minimum or maximum of a mathematical function that incorporates all available information about the decision problem is computed and used as a guide to making a decision (this working definition is inspired by Kimball 1958, p. 35). An example of an optimization model is multi-attribute utility theory (Keeney and Raiffa 1976).

Optimization models tend to violate the three characteristics (i)-(iii) of psychological heuristics above. To see this, consider the judgment problem of inferring which of A or B has a higher criterion value, where the inference is based on the values of attributes a1, a2, . . . , on the objects. The family of linear models (Edwards and Fasolo 2001) is stated as follows (the values of attributes are on the same scale; for example, all attributes may be binary, where ai(A) equals 1 or 0):

“Infer C(A) > C(B) if and only if Σi wi ai(A) > Σi wi ai(B), where wi ≥ 0”. (6)

That is, for each object a weighted sum of attribute values is computed, and the object with the higher sum is inferred to have the higher criterion value (if the sums are equal, the inference is made randomly). In ordinary linear regression, weights are computed by minimizing the sum of squared errors between forecasted and true criterion values (linear models such as regression, including sophisticated versions that select variables, can also be used for forecasting and categorization).

My claim is that linear models violate (ii) because they always use all available information (the values of all attributes on both objects) and process this information by both weighing and adding, which is more complex than the computing in lexicographic heuristics, which does not require adding attribute values. I also believe that linear models violate (i), because it is not clear which core capacities underlie the application of (6) (a possibility would be arithmetic, but that is a learned and effortful activity), and (iii), because understanding the weighing and adding of attributes requires more quantitative sophistication than understanding the inspection of attributes one at a time.

Another family of optimization models for making judgments is that of Bayesian models (Domingos and Pazzani 1997):

“Infer C(A) > C(B) if and only if Pr[C(A) > C(B) | ai(A), ai(B)] > 1/2”. (7)

That is, the object that has, given all available information, the higher probability of having the higher criterion value is inferred to have the higher criterion value. The probability in (7) is difficult to compute if the number of attributes is large or their interrelations are complicated (Cooper 1990). In practice, Bayesian models make simplifying assumptions about the interrelations among attributes. For example, naïve Bayes (Domingos and Pazzani 1997) assumes that attributes are conditionally independent given the criterion:

Pr[ai(A), ai(B) | aj(A), aj(B), C(A) > C(B)] = Pr[ai(A), ai(B) | C(A) > C(B)], for all i, j. (8)
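Under assumption (8), the posterior in (7) factors into a prior times a product of per-attribute likelihoods. A minimal sketch (the priors, likelihood tables, and attribute pattern below are hypothetical):

```python
# Sketch of the naive Bayes simplification (8): with attributes conditionally
# independent given the criterion, an unnormalized posterior is the prior
# times the product of per-attribute likelihoods. Numbers are hypothetical.
def naive_bayes_posterior(attrs, prior, likelihoods):
    """Unnormalized Pr[class | attrs] = Pr[class] * prod_i Pr[a_i | class]."""
    p = prior
    for i, a in enumerate(attrs):
        p *= likelihoods[i][a]
    return p

# Two classes: C(A) > C(B) (call it "A") vs C(B) > C(A) (call it "B");
# one observed pattern of two binary attributes.
post_A = naive_bayes_posterior((1, 0), 0.5, [{0: 0.2, 1: 0.8}, {0: 0.6, 1: 0.4}])
post_B = naive_bayes_posterior((1, 0), 0.5, [{0: 0.7, 1: 0.3}, {0: 0.5, 1: 0.5}])
print("A" if post_A > post_B else "B")  # prints: A
```

Because only the comparison in (7) matters, the normalizing constant can be dropped, which is what makes this computation tractable even with many attributes.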

The last family of optimization models I consider are the so-called decision trees, such as classification and regression trees (CART; Breiman, Friedman, Stone, and Olshen 1984). They can also be applied to judgment and forecasting, but I focus on their application to categorization. CART are more complex versions of fast and frugal trees, where the rules for ordering attributes require a number of statistical tests, and it is not necessarily the case that a decision can be made at each level of the tree.


2.3. Between Heuristics and Optimization: Tallying

There are some models that lie between the extremes of heuristics and optimization. For example, consider a simple linear model, tallying (also called unit- or equal-weights regression; Dawes and Corrigan 1974; Einhorn and Hogarth 1975):

wi = w = 1, for all i. (9)

Even though it uses all available information, tallying can be seen as a psychological heuristic in that it amounts to simply adding attribute values and "is not demanding from a cognitive viewpoint" (Hogarth and Karelaia 2005a). Dawes and Corrigan (1974) labeled tallying an "improper linear model", and Davis-Stober, Dana, and Budescu (in press) interpret this label as suggesting "heuristic judgment" and not a "result of any explicit optimization".
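The relation between (6) and (9) can be sketched as follows (attribute values and weights are hypothetical): tallying is simply the linear rule with every weight set to 1.

```python
# Sketch contrasting the linear rule (6) with tallying (9).
def linear_score(attrs, weights):
    """Weighted sum of attribute values, as in rule (6)."""
    return sum(w * a for w, a in zip(weights, attrs))

def tally(attrs):
    """Rule (9): the linear rule with w_i = 1 for all i."""
    return linear_score(attrs, [1] * len(attrs))

A, B = [1, 0, 1], [0, 1, 1]
weights = [0.7, 0.2, 0.1]
# Rule (6): infer C(A) > C(B) iff the weighted sum for A exceeds that for B.
print(linear_score(A, weights) > linear_score(B, weights))  # prints: True
print(tally(A) > tally(B))  # prints: False (tallying ties on this pair)
```

The example also illustrates how the two rules can disagree: the weighted comparison is decided by the heavily weighted first attribute, while tallying, which ignores the weights, produces a tie.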

2.4. Summary

Table 1 summarizes the material in this section. It includes the types of inference problems (judgment, categorization, and forecasting), and the heuristic and optimization models considered in this article.

An optimization model necessarily achieves the best possible performance only if the assumptions on which it is based are valid. This basic principle is often overlooked, and optimization models are pronounced to be best even when it has not been established that their assumptions hold, as illustrated in one of Russell Ackoff's "fables" (1979, p. 97): A very large intrasystem distribution problem was modeled as a linear programming (LP) problem, and its optimal solution was derived; the argument offered for implementing this solution was that its performance was superior, according to the LP model, to that of another solution. It is necessary to empirically compare the performance of optimization models and alternatives such as psychological heuristics, and this is a major motivation for this article.

3. The Performance of Psychological Heuristics and Optimization Models: Empirical Evidence and Theoretical Analysis

I first define the measures by which the performance of models is evaluated. The accuracy of a model is the proportion of problems in which it made the correct inference; for example, a correct judgment is that Lufthansa has a higher stock price than Southwest Airlines, and the categorization that a patient is at a high risk of having heart disease is correct if the patient subsequently suffered a heart attack. Most of the studies investigate accuracy. In a few studies, a second performance measure is investigated, the financial gain from a model's inferences.

Furthermore, there are two types of accuracy. Fitting accuracy refers to the situation where the parameters of a model (e.g., attribute weights, order of attributes) are estimated using all available data. Predictive accuracy is often measured by cross-validation, where the model parameters are estimated by using a subset of all data (called the training set) and the same parameters are applied to make decisions for the rest of the data (the test set); this process is repeated many times to average out random variation. Often, the training set comprises the attribute and criterion values of 50% of all objects. Predictive accuracy is a relevant measure of performance because it refers to decisions not yet made.

The empirical evidence I review comes from computer simulation studies. By simulations I do not mean that the datasets discussed are fictitious (they are almost always real, unless otherwise stated), but rather that the performance of models is not calculated by using closed-form equations but by simulating how an ideal agent would apply the models. I use only simulations because the goal of the review is to evaluate and understand the performance of models per se, excluding the human factor in applying the models. That is, I do not discuss (i) bootstrapping studies (Dawes and Corrigan 1974; Camerer 1981) where people provide attributes or attribute values as input to a model (or analyses using the Brunswikian lens model; Hammond 2007; Karelaia and Hogarth 2008); (ii)


Table 1. A summary of the inference problems (judgment, categorization, forecasting) and models (psychological heuristics, optimization models) considered in this article.

Judgment: Is C(A) higher than C(B)?
- Psychological heuristics (see Section 2.1, (i)-(iii)): Lexicographic heuristics (e.g., take-the-best) (see (2) and (3)); Tallying (see (9))
- Optimization models (see Section 2.2): Linear models (e.g., regression) (see (6)); Bayesian models (e.g., naïve Bayes) (see (7) and (8))

Categorization: Does A belong to C0 or C1?
- Psychological heuristics: Fast and frugal trees (see (4)); Tallying (see (9))
- Optimization models: Linear models (e.g., regression) (see (6)); Decision trees (e.g., CART; see Breiman et al. 1984)

Forecasting: What is C(A)?
- Psychological heuristics: Heuristics with aspiration levels (see (5)); Tallying (see (9))
- Optimization models: Linear models (e.g., regression) (see (6))

Note: A, B are objects (e.g., companies); C(A), C(B) are the objects' criterion values (e.g., stock price); and C0, C1 are categories where objects belong (e.g., in the NY Stock Exchange).

work that compared the accuracy of people with linear models (Meehl 1954; Grove et al. 2000); or (iii) research on the accuracy of heuristics where each decision-maker has potentially different attribute values (e.g., recognition of stock options, Boyd 2001; Ortmann et al. 2008).

Furthermore, four classes of computer simulations and theoretical analyses are not reviewed here. First, I do not discuss work on the "adaptive decision-maker" (Thorngate 1980; Payne, Bettman, and Johnson 1993), or the "tyranny of choice" (Fasolo, McClelland, and Todd 2007). I do so because in this research an accurate inference was not determined by the facts of the world, but was defined by the researchers to be the decision of an optimization model, usually a linear model. Even though there may be advantages in doing this, it biases the results and the conclusions as, for example, nonlinear models cannot be more accurate than linear models. Second, I do not review studies on forecasting in the tradition of econometrics (Makridakis and Hibon 1979; Makridakis and Taleb 2009). This body of work has accumulated over three decades and has shown that it is often very difficult to make accurate forecasts, and that simple models can outperform more complex models. What are called simple models in this research (usually computationally simple time-series models) do not match what I call psychological heuristics here. Goldstein and Gigerenzer (2009) discuss psychological heuristics and forecasting. Third, I do not review studies on the "less-is-more effect" (Goldstein and Gigerenzer 2002; Katsikopoulos, 2010; Smithson, 2010), where less information can, under some conditions, lead to more accurate decisions. The less-is-more effect is related to the use of both psychological heuristics (i.e., the recognition heuristic) and optimization models (e.g., linear models) and does not directly bear on comparing the two. There are also studies in which the performance of psychological heuristics is investigated but not compared to optimization models (McCammon and Haegeli 2007; Kattah et al. 2009), and these are also not discussed. Finally, because of space limitations, I do not consider the large literature on group decision-making (Grofman and Owen 1986; Ben-Yashar and Nitzan 1997). For studies of psychological heuristics for group decision making, see, for example, Hastie and Kameda (2005), Reimer and Hoffrage (2006), and Reimer and Katsikopoulos (2004).

The material is organized around four themes, spread over eight sections. Each theme refers to one type of psychological heuristic. First, the empirical evidence on the accuracy of lexicographic heuristics and tallying is presented (Section 3.1); this evidence comes mostly from psychology. In the subsequent sections (3.2-3.5), I present a synthesis of theoretical concepts that

Konstantinos V. Katsikopoulos: Psychological Heuristics for Making Inferences. Decision Analysis 08(1), pp. 000–000, © 0000 INFORMS

help explain the evidence and define conditions under which lexicographic heuristics and tallying do or do not achieve competitive accuracy. Second, the evidence on the relative accuracy of fast and frugal trees is discussed in Section 3.6; a lot of this evidence comes from medicine. In Section 3.7, a concept is introduced that plays a role in the accuracy of fast and frugal trees (and also of lexicographic heuristics and tallying). Third, evidence on the financial gain generated by psychological heuristics with aspiration levels is reviewed in Section 3.8; this research refers to business. Needless to say, the literature is large and omissions may have been made.

3.1. Lexicographic Heuristics and Tallying: Empirical Evidence

I first survey the results of comparisons among lexicographic heuristics, tallying, and linear models. In the seventies, Robyn Dawes and his colleagues (Dawes and Corrigan 1974; Dawes 1979) found that tallying had higher predictive accuracy than linear regression in two out of three forecasting problems. Dorans and Drasgow (1978) generated a number of artificial datasets that reflected characteristics of real forecasting problems and concluded that tallying overall outperformed a number of versions of regression. It has been, and should continue to be, emphasized, however, that there are conditions under which regression has higher predictive accuracy than tallying as, for example, when the size of the training set is large (Einhorn and Hogarth 1975). Keren and Newman (1978) further highlighted such conditions. For a review of many related studies in psychometrics, and educational and personnel psychology, see Bobko, Roth, and Buster (2007); recent contributions are McGrath (2008) and Davis-Stober, Dana, and Budescu (2010).

Czerlinski et al. (1999) performed a simulation study with 20 datasets from the fields of biology, environmental science, economics, demography, health, psychology, sociology, and transportation. The inference problem was one of judgment and the criterion varied widely, from men's and women's attractiveness, to cities' populations and homelessness rates, to obesity rates and mammals' sleep amounts, and so on. Continuous attributes were dichotomized by using the median. In fitting, regression was most accurate (77%), tallying scored 73%, and take-the-best 75%. In prediction, where the size of the training set was 50% of the whole dataset, take-the-best was most accurate (71%), and even tallying outperformed regression by 69% to 68%. When continuous attributes were not dichotomized, the predictive accuracy of take-the-best and regression was equal, 76%. More recently, in a series of papers, Hogarth and Karelaia (2005a, 2005b, 2006a, 2006b, 2007) used mostly, though not exclusively, artificial datasets, and confirmed and extended these results: Take-the-best, tallying, and linear regression all could have superior and inferior performance.

I now discuss comparisons of lexicographic heuristics and tallying with Bayesian models. Martignon and Hoffrage (2002) compared the predictive accuracy of take-the-best and tallying with two Bayesian models in the 20 datasets of Czerlinski et al., when the size of the training set equaled 50% of the whole dataset. The first model was naïve Bayes, where the conditional independence assumption in (8) is made about the attributes, and the second one was a Bayesian network where attributes are assumed dependent in a relatively simple Markov sense. Recall that the predictive accuracy of take-the-best with continuous attributes was 76%, of take-the-best with binary attributes 71%, and of tallying (with binary attributes) 69%. The predictive accuracy of naïve Bayes was 73% and of the Bayesian network 75% (both models used binary attributes).

Katsikopoulos, Schooler, and Hertwig (in press) also compared the predictive accuracy of take-the-best with continuous attributes and of tallying with that of naïve Bayes with binary attributes. This study tested very small training set sizes, from 2 to 10 objects, that is, from 3% to 15% of all objects, across 19 of the Czerlinski et al. (1999) datasets. It was found that, for 2 objects, tallying had the highest predictive accuracy and take-the-best was more accurate than naïve Bayes; for 3-10 objects, take-the-best had the highest accuracy, with naïve Bayes being more accurate than tallying. For 5-10 objects, the predictive accuracy of take-the-best exceeded that of naïve Bayes by more than 5%.

DeMiguel, Garlappi, and Uppal (2007) ran a simulation study of models for deciding how to allocate one's wealth across assets in a financial portfolio (this task is similar to a categorization problem). They tested tallying (here meaning the allocation of an equal amount of wealth to each asset) against Markowitz's (1952) mean-variance optimization model (for details, see DeMiguel et al. 2007, pp. 1921-1922), and 13 variants of the optimization model (some of them Bayesian) designed to deal with issues of statistical estimation. Tallying ignores the data on returns, whereas the optimization models use past returns to reallocate wealth. The authors used seven real portfolios (with data on the returns of the assets spanning from twenty to forty years) and one artificial portfolio. The performance of the models was evaluated according to three measures (Sharpe ratio, which is a risk-adjusted return; certainty-equivalent return; and turnover) in a test set, over many repetitions. The main result is that tallying was not consistently outperformed by any of the optimization models in any of the three measures. On the other hand, the same authors have also developed more sophisticated Bayesian models that outperformed tallying (DeMiguel et al. 2009).

Finally, in the task of ordering articles according to their relevance for a given topic (e.g., eyewitness testimony), Lee, Loughlin, and Lundberg (2002) compared the performance of a variant of take-the-best with that of a Bayesian model. The take-the-best variant outperformed the Bayesian model, in particular when there were only a few relevant articles.

In sum, there are three main findings of computer simulation studies comparing the accuracy of lexicographic heuristics and tallying with that of optimization tools such as linear and Bayesian models.
First, when all evidence is taken into account, the accuracy of lexicographic heuristics, linear models, and simple Bayesian models is not that different. It should be emphasized, however, that even a small difference of, say, 1% in accuracy could translate to large differences in the performance of a business. Second, the accuracy of heuristics is surprisingly competitive with that of linear models and simple Bayesian models, especially in prediction. Third, all models can achieve relatively superior and inferior performance. In the next four paragraphs, I discuss theoretical concepts that help explain the results.
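The two heuristics at the center of this evidence can be sketched in a few lines of code. The following is a minimal illustration (not the simulation code of any study reviewed above), which assumes binary attributes that have already been ordered by decreasing validity:

```python
def take_the_best(a, b):
    """Lexicographic heuristic: inspect the binary attributes of objects
    A and B in order of decreasing validity and decide as soon as one
    attribute discriminates between them."""
    for ai, bi in zip(a, b):
        if ai != bi:
            return 'A' if ai > bi else 'B'
    return 'guess'  # no attribute discriminates

def tallying(a, b):
    """Tallying: give every attribute unit weight and pick the object
    with the larger sum of attribute values."""
    if sum(a) != sum(b):
        return 'A' if sum(a) > sum(b) else 'B'
    return 'guess'

# The two models can disagree: A wins on the most valid attribute,
# but B has the larger tally.
print(take_the_best([1, 0, 0], [0, 1, 1]))  # A
print(tallying([1, 0, 0], [0, 1, 1]))       # B
```

Take-the-best ignores all attributes after the first discriminating one, whereas tallying ignores the validity order; this difference is exactly what the noncompensatory and fully compensatory conditions of Sections 3.2 and 3.3 formalize.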

3.2. The Flat Maximum Effect

By definition, tallying is a linear model. It turns out that lexicographic heuristics, such as take-the-best, can also be viewed as linear models. Martignon and Hoffrage (2002) analytically showed that, when attributes are binary, a lexicographic heuristic makes judgments identical to those of a linear model where attributes have noncompensatory weights (for an earlier discussion of noncompensatoriness, see Einhorn, 1970):

$w_i > \sum_{k>i} w_k$, for all $i$. (10)
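For a concrete check of this result, the sketch below (an illustration, not code from the cited papers) enumerates all pairs of three-attribute binary profiles and confirms that the lexicographic heuristic inspecting a1, then a2, then a3 makes exactly the judgments of the linear model with the noncompensatory weights 4, 2, 1:

```python
from itertools import product

def lexicographic(a, b):
    """Return 1 if A is judged higher, -1 if B, 0 if tied, deciding by
    the first binary attribute that discriminates."""
    for ai, bi in zip(a, b):
        if ai != bi:
            return 1 if ai > bi else -1
    return 0

def linear(a, b, w=(4, 2, 1)):
    """The same judgment by weighted sums; 4 > 2 + 1, so these weights
    satisfy the noncompensatory condition (10)."""
    sa = sum(wi * ai for wi, ai in zip(w, a))
    sb = sum(wi * bi for wi, bi in zip(w, b))
    return (sa > sb) - (sa < sb)

profiles = list(product([0, 1], repeat=3))
agree = all(lexicographic(a, b) == linear(a, b)
            for a in profiles for b in profiles)
print(agree)  # True
```

With binary attributes, the weighted sum 4a1 + 2a2 + a3 is simply the number whose binary digits are a1a2a3, which is why lexicographic comparison and numeric comparison coincide.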

For example, it is easy to verify that a lexicographic heuristic that first inspects a1, then a2, and finally a3, makes judgments identical to those of the linear model $4a_1 + 2a_2 + a_3$. Katsikopoulos and Fasolo (2006) noticed that the commonly used ROC attribute weights (Srivastava, Connolly, and Beach 1995) satisfy (10) for up to four attributes; and they also extended the Martignon and Hoffrage result to judgments involving more than two objects. Martignon et al. (2008) proved that the result holds for categorization problems, and thus fast and frugal trees can also be viewed as linear models (a linear model for categorization assigns object A to category C0 whenever $\sum_i w_i a_i(A) > h$, where $h$ is a parameter of the model; in tallying, $w_i = 1$).

The noncompensatory-cue-weights result means that a lexicographic heuristic is a special case of a linear model. This implies that a lexicographic heuristic has at most equal accuracy with a linear model with unrestricted cue weights. I make a digression here to comment on this implication because it can generate confusion. Here is the apparent confusion: As we saw, there are empirical demonstrations, through computer simulations, where a lexicographic heuristic has outperformed an unrestricted linear model in terms of predictive accuracy. Does that mean that statements such as the noncompensatory-cue-weights result refer to fitting accuracy? It depends on how you view the result. The noncompensatory-cue-weights result can be seen as referring only to fitting, in the sense that the process of estimating parameters in a training set and applying the same parameters to the test set is not part of the assumptions and derivation of the result. On the other hand, if cue weights are noncompensatory in the test set, then the result does refer to predictive accuracy. In the remainder, I will discuss analytical results on accuracy without making a distinction of whether they have to do with fitting or prediction.

Now let us get back to the noncompensatory-cue-weights result. Because of it, one way of investigating the accuracy of lexicographic heuristics is by studying the accuracy of the family of linear models. In fact, naïve Bayes can also be viewed as a linear model if the conditional independence assumption (8) holds: then, naïve Bayes makes identical judgments with a linear model where $w_i = \log[v_i/(1 - v_i)]$ (Katsikopoulos and Martignon 2006; for a different interpretation of this result and generalizations of it, see Ben-Yashar and Nitzan 1997). Dawes and Corrigan (1974; citing an unpublished manuscript by von Winterfeldt and Edwards 1973) suggested the existence of a flat maximum effect in the family of linear models, which can be informally stated as follows:

“The weights used in a linear model do notchange much the overall deviation betweenforecasted and true criterion values”. (11)

Lovie and Lovie (1986) provided a detailed treatment of the effect and illustrated it for credit scoring and the prediction of sudden infant death. There is converging support for the flat maximum effect. In an early paper, Wilks (1938) analytically showed that, under reasonably general conditions, if the number of attributes is sufficiently large, most linear models make "almost identical forecasts".

More specifically, Einhorn and Hogarth (1975) showed that the minimum correlation between the forecasts of regression and tallying is an increasing function of the (assumed to be constant) correlation between attributes, and a decreasing function of the number of attributes. From this result, they were able to compute that this minimum correlation "is fairly high for most applied situations" (Einhorn and Hogarth 1975, p. 171): For instance, it exceeds .6 if there are at most 10 attributes and the attribute correlation is at least .5. Ehrenberg (1982) considered the case of one attribute, and found that using a slope differing from the optimal regression slope by plus or minus 30% yields only a 4% increase in unexplained error. Bobko et al. (2007) provide a recent review of related studies; a classic study on the relative accuracy of regression and tallying is Wainer (1976). The flat maximum effect is consistent with our empirical finding that, across a number of simulation studies, the differences in accuracy among take-the-best, tallying, linear regression, and naïve Bayes are, on the whole, not that large. Two theoretical concepts introduced in the next paragraph define conditions under which tallying, take-the-best, and other linear models are more accurate than each other.
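The flat maximum effect is easy to demonstrate numerically. The sketch below (an illustration with assumed parameters, not a reproduction of any cited analysis) generates ten attributes with a constant intercorrelation of .5, a linear criterion with made-up positive weights, and compares the forecasts of fitted regression with those of tallying:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, rho = 200, 10, 0.5  # objects, attributes, intercorrelation (assumed)

# Attributes with constant pairwise correlation rho.
cov = rho * np.ones((k, k)) + (1 - rho) * np.eye(k)
X = rng.multivariate_normal(np.zeros(k), cov, size=n)

# Linear environment: positive weights plus noise.
beta = rng.uniform(0.5, 1.5, size=k)
y = X @ beta + rng.normal(scale=1.0, size=n)

# Forecasts of least-squares regression vs. tallying (unit weights).
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
forecasts_reg = X @ w_ols
forecasts_tally = X.sum(axis=1)

r = np.corrcoef(forecasts_reg, forecasts_tally)[0, 1]
print(r > 0.6)  # True: comfortably above the bound cited above
```

The exact correlation depends on the assumed weights and noise, but with positively intercorrelated attributes it is typically far above the Einhorn and Hogarth lower bound.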

3.3. Noncompensatoriness and Cumulative Dominance

Katsikopoulos and Martignon (2006) provided a necessary and sufficient condition for a lexicographic heuristic to achieve maximum accuracy among all possible models in judgment problems (this accuracy equals that of naïve Bayes). Assuming conditional independence (8) and that attributes are binary, the condition is that attributes have noncompensatory validities:

$o_i > \prod_{k>i} o_k$, where $o_i = v_i/(1 - v_i)$, for all $i$. (12)
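Condition (12) can be checked directly for given validities; the sketch below does so for three illustrative values:

```python
from math import prod

validities = [0.8, 0.67, 0.6]          # illustrative attribute validities
o = [v / (1 - v) for v in validities]  # odds: 4.0, ~2.03, 1.5

# Condition (12): each attribute's odds exceed the product of the odds
# of all later attributes (checked wherever that product is non-empty).
noncompensatory = all(o[i] > prod(o[i + 1:]) for i in range(len(o) - 1))
print(noncompensatory)  # True: 4.0 > 2.03 * 1.5, and 2.03 > 1.5
```

With equal validities (e.g., all .6), the same check fails, which is the fully compensatory case favoring tallying discussed next.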

For example, if there are three attributes with $v_1 = 0.8$, $v_2 = 0.67$, and $v_3 = 0.6$, then (12) holds.

Again assuming conditional independence (8) and binary attributes, there is also a necessary and sufficient condition for tallying to achieve maximum accuracy among all possible models in judgment (this accuracy equals that of naïve Bayes): that attributes have fully compensatory validities (Katsikopoulos and Martignon 2006):

$v_i = v$, for all $i$. (13)

In analogy to the definition of fully compensatory attribute validities, we can also define fully compensatory attribute weights, when $w_i = w$, for all $i$, as in (9).

In a series of papers, Robin Hogarth and Natalia Karelaia (2005a, 2005b, 2006a, 2006b, 2007) analyzed further the relative accuracy of linear models (including tallying) and lexicographic heuristics (such as take-the-best) for judgment problems. There are three main differences between these studies and the studies by Laura Martignon and her colleagues (Martignon and Hoffrage 2002; Katsikopoulos and Fasolo 2006; Katsikopoulos and Martignon 2006). First, Hogarth and Karelaia also considered continuous attributes (2005b, 2007). Second, they looked into issues such as the correlations among attributes, or errors in the application of the models (2005a). Third, and most importantly, unlike Martignon and her colleagues, Hogarth and Karelaia modeled the decision environment, that is, the relationship between the criterion value of an object and the attribute values of the object. A simple version of the environment model is linear:

$C(A) = \sum_i \beta_i a_i(A)$, where $\beta_i \geq 0$. (14)

Does a linear model approximate decision environments well, or is a more complex model needed? Across 113 datasets from fields such as biology, chemistry, and mechanical and manufacturing engineering, Li, Sudarsanam, and Frey (2006) found evidence for an extension of (14) with two-attribute interactions, $C(A) = \sum_i \beta_i a_i(A) + \sum_i \sum_{j>i} \beta_{i,j} a_i(A) a_j(A)$, but these datasets do not refer to decision-making. Katsikopoulos (2010) tested the two versions of the linear model, with and without interactions, in the 20 judgment problems of Czerlinski et al. (1999). The parameters (attribute and interaction weights) were estimated by cross-validation where the training set included 50% of all objects. The two versions fit equally well, in terms of (i) the rank correlation between the order of objects according to their criterion values (as estimated by each version) and the order of the objects according to their real criterion values (Kendall's $\tau$ equaled .43), and (ii) the probability that the object with the highest criterion value (as estimated by each version) is indeed the object with the real highest criterion value (.61).

Assuming an environment model makes the problem of judgment we have been studying similar to the problem of choice (Keeney and Raiffa, 1976). In fact, Hogarth and his colleagues have interpreted the environment model of (14) as a linear multi-attribute utility function.

In some of their papers, Hogarth and Karelaia made additional mathematical assumptions beyond (14), as, for example, that attributes are normally distributed random variables (2005b, 2007). They were then able to derive conditions for patterns of relative accuracy involving psychological heuristics and linear models. For example, Hogarth and Karelaia (2005b) showed that a lexicographic heuristic is at least as accurate as linear regression whenever the following condition holds (where $a_1$ is the attribute inspected first in the lexicographic heuristic, $\rho_{C,a_1}$ is the correlation between $a_1$ and the criterion value $C$, and $R^2_{\mathrm{adj}}$ is an adjusted version of the correlation coefficient of linear regression; for details see Hogarth and Karelaia 2005b, p. 118):

$\rho^2_{C,a_1} > R^2_{\mathrm{adj}}$. (15)

An informal interpretation of (15) is that the single attribute used by the lexicographic heuristic (because attributes are continuous, the first attribute inspected, $a_1$, allows making a decision almost always) has a higher correlation with the criterion than does the sum of all attributes (weighted by the regression coefficients). In a sense, the attribute structure specified by (15) is noncompensatory, as is the attribute structure specified by (10) or (12). The validities of attributes, as defined in (3), can also be interpreted as correlations (Martignon and Hoffrage 2002).

Furthermore, Hogarth and Karelaia showed that tallying is at least as accurate as linear regression whenever the following condition holds (where avg(a) is a dummy attribute whose value on an object is the average of all attribute values on that object):

$\rho^2_{a_1,\mathrm{avg}(a)} \geq R^2_{\mathrm{adj}}$. (16)

Condition (16) can be interpreted as saying that there is little variability in attribute correlations (Hogarth and Karelaia 2005b, p. 119), a condition similar to (9) or (13), which specify a fully compensatory attribute structure.

In sum, even though it is an oversimplification, it can be said that the results of Hogarth and Karelaia converge with the results of Martignon and her colleagues on the theoretical concepts used to explain and define the conditions for competitive accuracy of psychological heuristics. These concepts are noncompensatory attribute structures for lexicographic heuristics, and fully compensatory attribute structures for tallying. We do not have analytical results on how inferior the accuracy of lexicographic heuristics and tallying is when these conditions are not satisfied; all that is known is that there exist other models that outperform them.

There is another condition that guarantees competitive accuracy for lexicographic heuristics and tallying. Baucells, Carrasco, and Hogarth (2008) showed that, assuming a linear environment model (14), a lexicographic heuristic and tallying (and some linear models) achieve maximum accuracy across all possible models in a judgment problem where two or more objects are compared, if the condition of cumulative dominance (Kirkwood & Sarin, 1985) holds:

There exists $A$ so that for all $B$: $\sum_{k \leq i} a_k(A) \geq \sum_{k \leq i} a_k(B)$, for all $i$. (17)

For example, for two objects, A and B, such that $a_1(A) = 1$, $a_2(A) = 0$, $a_3(A) = 1$, and $a_1(B) = 0$, $a_2(B) = 1$, $a_3(B) = 1$, A cumulatively dominates B. The lexicographic heuristic that inspects attributes in the order $a_1$, $a_2$, and $a_3$ would judge A as having the highest criterion value, and this is correct for the linear environment model $C(A) = 5a_1(A) + 4a_2(A) + 3a_3(A)$. It can be shown that the Baucells et al. result also holds for linear environment models with attribute interactions (Li et al., 2006) if attributes are binary (Katsikopoulos, 2010).

Baucells et al. (2008) showed that cumulative dominance is relatively common. For example, given two objects with three attributes each, one object cumulatively dominates the other in 97% of all possible distributions of binary attributes across objects. The probability that cumulative dominance holds remains high when more objects and attributes are added and assumptions are made about the attributes (e.g., that they are independent, identically distributed, Bernoulli random variables). The high probability of cumulative dominance leads to a low expected value loss for lexicographic heuristics (Carrasco and Baucells 2008). On the other hand, noncompensatoriness seems to be less frequently satisfied. For example, Hogarth and Karelaia (2005a) pointed out that, in principle, attribute weights are seldom noncompensatory, and Katsikopoulos and Martignon (2006) empirically found that attribute validities were noncompensatory in three of the 20 datasets of Czerlinski et al. (1999).

The two concepts share an important commonality. They both refer to the structure of the inference problem: Cumulative dominance refers to structure in the space of attribute values, and noncompensatoriness refers to structure in the space of attribute goodness. Furthermore, both concepts express that one thing is, in a way, "superior" to other things of the same kind (objects in the case of cumulative dominance; attributes in the case of noncompensatoriness).
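Both the worked example and the 97% figure can be reproduced by enumeration; the following sketch (an illustration, not the authors' code) does both:

```python
from itertools import accumulate, product

def cum_dominates(a, b):
    """A cumulatively dominates B if every partial sum of A's attribute
    values (taken in the attribute order) is at least B's."""
    return all(sa >= sb for sa, sb in zip(accumulate(a), accumulate(b)))

# The worked example above: A = (1, 0, 1) dominates B = (0, 1, 1),
# since the partial sums are (1, 1, 2) versus (0, 1, 2).
print(cum_dominates((1, 0, 1), (0, 1, 1)))  # True

# Share of ordered pairs of three-attribute binary profiles in which
# one profile cumulatively dominates the other.
profiles = list(product([0, 1], repeat=3))
comparable = [cum_dominates(a, b) or cum_dominates(b, a)
              for a in profiles for b in profiles]
print(round(sum(comparable) / len(comparable), 2))  # 0.97 (62 of 64 pairs)
```

The only incomparable pair of profiles is (0, 1, 1) versus (1, 0, 0), whose partial sums (0, 1, 2) and (1, 1, 1) cross.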

3.4. Linear Cognitive Ability

Assuming that the attributes and the criterion are independent, identically distributed, normal random variables, Hogarth and Karelaia (2007) introduced a measure that defines a necessary and sufficient condition for the linear model with unrestricted weights to be more accurate than the linear model with noncompensatory weights (i.e., a lexicographic heuristic). This measure assumes that the decision environment is linear, as in (14). It refers to the decision-maker, characterizing his/her cognitive ability to apply a linear model in a linear environment. Assume that the decision-maker's forecasts are best (but not necessarily perfectly) described by a linear model with weights $w_i$, in a linear environment with "true" weights $\beta_i$. This decision-maker's linear cognitive ability is defined as follows ($G$ is the correlation, across all objects $A$, between $\sum_i \beta_i a_i(A)$ and $\sum_i w_i a_i(A)$; $R_s$ is the correlation, across all objects, between $\sum_i w_i a_i(A)$ and the criterion values that the decision-maker actually forecasts):

Linear cognitive ability $= G R_s$. (18)
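The two correlations in (18) can be estimated from simulated data. In the sketch below, the environment weights, the decision-maker's weights, and the noise level are all made-up illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 4
X = rng.normal(size=(n, k))             # attribute values of n objects

beta = np.array([4.0, 3.0, 2.0, 1.0])   # "true" environment weights
w = np.array([3.0, 3.0, 3.0, 1.0])      # decision-maker's linear model

true_scores = X @ beta                  # criterion under the environment
model_scores = X @ w                    # the decision-maker's linear model
forecasts = model_scores + rng.normal(scale=2.0, size=n)  # noisy execution

G = np.corrcoef(true_scores, model_scores)[0, 1]
Rs = np.corrcoef(model_scores, forecasts)[0, 1]
print(round(G * Rs, 2))  # linear cognitive ability, as in (18)
```

Here G is reduced by the gap between the assumed weights and the true ones, and Rs by the noise in applying the model; their product summarizes both sources of imperfection.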

Hogarth and Karelaia (2007) showed that the unrestricted linear model is more accurate than a lexicographic heuristic whenever the linear cognitive ability of the decision-maker exceeds a critical value. The concepts of flat maximum, noncompensatory and fully compensatory structures, cumulative dominance, and linear cognitive ability provide insights into regions of superior accuracy of heuristics or optimization. As Gigerenzer and Brighton (2009) point out, however, this kind of concept does not incorporate the process of sampling that is critically involved in making predictions for the future. The next concept refers to sampling and prediction.

3.5. The Bias-Variance Dilemma

The bias-variance dilemma is an application of sampling theory to the evaluation of models in tasks of inference such as forecasting, judgment, and categorization. For a detailed treatment, see Geman, Bienenstock, and Doursat (1992). Informally, the idea is that the predictive accuracy of a family of models can be decomposed as follows:

Prediction error $= (\mathrm{Bias})^2 + \mathrm{variance} + \mathrm{irreducible\ error}$. (19)

As an example, for the linear model family tested on 50% of all objects in a forecasting problem, bias corresponds to the average (across all sets of attribute weights estimated from all training sets with 50% of all objects) of the deviation between forecasted and true criterion values. Variance corresponds to the variability of the forecasted criterion value. Irreducible error is, for a given decision to be made, constant across all models.

According to (19), a family of models may be more predictively accurate than another family because it has comparatively low bias or because it has comparatively low variance. Such tradeoffs between bias and variance have been discussed in the context of regression models (for references, see Davis-Stober et al., in press). Holte (1993) has suggested that simple heuristics for categorization might have relatively high predictive accuracy because they have relatively low variance. Gigerenzer and Brighton (2009) developed Holte's conjecture for lexicographic heuristics, such as take-the-best, and provided illustrations consistent with it. Katsikopoulos (2010) used a computationally efficient version of (19), suggested by Lee (2004), and derived analytically a number of conditions for the relative predictive accuracy of a lexicographic heuristic and the linear model with unrestricted weights; for example, across all inference problems with three binary attributes, these models have equal predictive accuracy as measured by the so-called minimum description length (Lee, 2004). Davis-Stober et al. (in press) assumed continuous attributes and a linear environment model, and showed analytically that the expectation (over the training set) of the sum of squared errors between the "true" weights and the weights of a linear model is optimized, in a mini-max sense, by a linear model that uses a single attribute, as a lexicographic heuristic would. Based on this result, Davis-Stober et al. (2010) derived optimality conditions for the single-attribute model that differ from conditions on correlations identified elsewhere (e.g., Hogarth & Karelaia, 2005b).
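A small simulation makes the decomposition in (19) concrete. The sketch below (illustrative parameters, not tied to any cited study) repeatedly draws training sets of 10 objects from a linear environment and compares, at a fixed test point, a regression that uses a single attribute with one that uses all three:

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([3.0, 0.5, 0.5])   # assumed linear environment weights
x0 = np.array([1.0, 1.0, 1.0])     # fixed test point
true_y0 = x0 @ beta

def fit_predict(cols, n_train=10):
    """Fit least squares on a fresh training sample, using only the
    attribute columns in `cols`, and predict the criterion at x0."""
    X = rng.normal(size=(n_train, 3))
    y = X @ beta + rng.normal(scale=4.0, size=n_train)
    w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return x0[cols] @ w

error = {}
for label, cols in [("one attribute", [0]), ("all attributes", [0, 1, 2])]:
    preds = np.array([fit_predict(cols) for _ in range(2000)])
    bias2 = (preds.mean() - true_y0) ** 2
    variance = preds.var()
    error[label] = bias2 + variance
    print(f"{label}: bias^2 = {bias2:.2f}, variance = {variance:.2f}")

# With only 10 training objects and these assumed parameters, the
# one-attribute model's extra bias is more than offset by its much
# lower variance, so its total error is smaller.
```

With a larger training set or a less dominant first attribute, the ordering can reverse, which mirrors the simulation results reviewed above.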

3.6. Fast and Frugal Trees: Empirical Evidence

Brighton (2006) compared the predictive accuracy of a fast and frugal tree with CART (Breiman et al. 1984) and another popular decision tree, C4.5 (Quinlan 1990), which can also be viewed as an optimization model. He used eight of the problems of Czerlinski et al. (1999) and, in each problem, varied the size of the training set. In four of the problems, the fast and frugal tree outperformed the decision trees for all training set sizes. In the other four problems, the highest predictive accuracy was achieved by different models for different training set sizes: When the size of the training set was relatively small, the fast and frugal tree tended to do best, whereas when the size of the training set was larger, CART and C4.5 tended to do best. The latter pattern of results was also obtained by Juslin and Persson (2002) and Chater, Oaksford, Nakisa, and Redington (2003), although these authors tested only one decision problem (which was not one of the eight problems tested by Brighton). Brighton (2006) also replicated his results by using minimum description length as a measure of predictive accuracy.

Martignon et al. (2008) compared two fast and frugal trees (that differed on the rules used for ordering attributes and for assigning, at each tree level, the exit to one category) with CART and logistic regression (the analogue of linear regression for categorization problems; see Long et al. 1993). They used 30 categorization problems from the UC Irvine Machine Learning Repository, of which 11 were medical decision problems. For each problem, three sizes of the training set were tested: 90%, 50%, and 15% of all objects. The results were similar to those of Brighton (2006): When the training set size was large, either CART or logistic regression outperformed both fast and frugal trees, and when the training set size was small, a fast and frugal tree outperformed both CART and logistic regression. For example, in the 11 medical problems, when the training set included 90% of the objects, logistic regression outperformed both fast and frugal trees (79% vs. 76% and 74%); and when the training set included 15% of the objects, a fast and frugal tree scored 74%, whereas the other fast and frugal tree scored 72%, which was equal to the accuracy of CART and logistic regression.

Similar results were obtained by Fernandez, Katsikopoulos, and Shubitizde (2010). They applied fast and frugal trees and CART to the problem of detecting unexploded ordnance (i.e., munitions used in war or military practice). In cross-validation, when the training set had up to 10 (out of a total of 216) objects, fast and frugal trees had superior accuracy, whereas CART was more accurate when the training set had more than 10 objects.

The next two problems refer to fitting. Green and Mehr (1997) compared the performance of the fast and frugal tree in Figure 1 to logistic regression. Here, accuracy was separated into two measures: hit rate and correct-rejection rate. The hit rate of a model equals the proportion of patients who eventually suffered a heart attack and were (correctly) assigned to the emergency care unit. The correct-rejection rate of a model equals the proportion of patients who did not suffer a heart attack and were (correctly) assigned to a regular nursing bed. Unaided physicians followed a very defensive categorization rule, sending 90% of all patients to the emergency care unit. As a consequence, whereas the hit rate was about 90%, the correct-rejection rate was close to 0%. The fast and frugal tree produced a hit rate of 100%, while also increasing the correct-rejection rate to about 50%. Logistic regression had a free parameter that allowed it to make tradeoffs between hit and correct-rejection rates, and it produced eight pairs of rates. All of these pairs had a lower hit rate than that of the fast and frugal tree, and the correct-rejection rate was higher in five of the eight pairs.

Fischer et al. (2002) designed a fast and frugal tree (with just two attributes) for deciding whether or not to prescribe macrolide antibiotics to children suspected of having community-acquired pneumonia. Logistic regression achieved a hit rate of 75% and the fast and frugal tree achieved 72%.
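The structure of such a fast and frugal tree is easy to state in code. The sketch below is patterned on the Green and Mehr (1997) tree of Figure 1, with one attribute checked and one exit offered per level; the attribute names are my shorthand for the cues described in the text, not the authors' labels.

```python
# Sketch of a fast and frugal tree patterned on the Green and Mehr
# (1997) tree of Figure 1. Attribute names are shorthand, not the
# authors'. Each level inspects one attribute and offers one exit.

def fast_frugal_tree(patient):
    """Assign a patient (a dict of binary attributes) to a unit."""
    # Level 1: a changed ST segment exits directly to emergency care.
    if patient["st_segment_changed"]:
        return "emergency care unit"
    # Level 2: if chest pain is not the chief complaint, exit to a bed.
    if not patient["chest_pain_chief_complaint"]:
        return "regular nursing bed"
    # Level 3: the last attribute forces a decision either way.
    if patient["any_other_risk_factor"]:
        return "emergency care unit"
    return "regular nursing bed"
```

Hit rate and correct-rejection rate are then simply the proportions of heart-attack and no-heart-attack patients, respectively, that such a tree routes correctly.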

3.7. Scarce Information

The computer simulations reviewed in Section 3.6 found that fast and frugal trees tended to have higher predictive accuracy than CART and logistic regression when the training set size was relatively small. On the other hand, the opposite pattern tended to occur when the size of the training set was relatively large. Recall, from Section 3.1, that take-the-best (and, in some cases, tallying) also had higher predictive accuracy than simple Bayesian models when there was not much information available for training the models.

Based on a mathematical analysis, DeMiguel et al. (2007) concluded that, in their study of financial investment, a reason for the good performance of tallying is that the information necessary for reliably estimating the parameters of the optimization models is not available in training sets with few objects. Martignon and Hoffrage (2002) defined a decision maker as having scarce information when the number of (binary) attributes s/he has access to is smaller than the base-2 logarithm of the number of objects in the decision problem. They then showed that, in the majority of problems with not so many objects (fewer than 2^7), take-the-best was more accurate than tallying. If there are fewer than five attributes, it can be shown that the linear model with unrestricted attribute weights rarely has higher accuracy than a lexicographic heuristic (Katsikopoulos 2010).

In sum, scarce information may refer to few objects, few attributes, or few attributes per object. In any case, it tends to favor heuristics, as summarized informally below:

"When information is scarce, psychological heuristics tend to have higher predictive accuracy than optimization models, and the opposite tends to be true when information is not scarce. The reason for this pattern seems to have to do with parameter estimation." (19)
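Martignon and Hoffrage's criterion for scarceness is straightforward to compute; the helper below is only an illustration of their definition, with argument names of my choosing.

```python
import math

# Illustration of Martignon and Hoffrage's (2002) definition: a
# decision maker has scarce information when the number of binary
# attributes is smaller than the base-2 logarithm of the number of
# objects in the decision problem.

def information_is_scarce(num_attributes, num_objects):
    return num_attributes < math.log2(num_objects)
```

For instance, with 100 objects, log2(100) is about 6.6, so having 5 attributes counts as scarce information while having 8 does not.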

3.8. Heuristics with Aspiration Levels: Empirical Evidence

Astebro and Elhedhli (2006) considered the forecasting of commercial success of inventions submitted for review to the Canadian Investment Assistance Program (CIAP). They used 561 submissions to CIAP (from 1989 to 1994), 62 of which were after-the-fact considered as commercial successes and 499 of which were after-the-fact considered failures, based on whether they generated sales revenue or not. All submissions had 37 attributes (e.g., environmental impact, price, safety, technical feasibility). The authors classified each attribute as positive (if submissions with this attribute were more likely to be successes) or negative (otherwise). After having interviewed experienced CIAP reviewers, Astebro and Elhedhli came up with a heuristic that uses aspiration levels:

"Forecast that an invention will become a commercial success if and only if the number of positive attributes exceeds t and the number of negative attributes does not exceed s, where t, s ≥ 0." (20)
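Heuristic (20) can be sketched directly in code; the function takes the two attribute counts as inputs, and the names are mine, not the authors'.

```python
# Sketch of heuristic (20): forecast commercial success if and only if
# the number of positive attributes exceeds t and the number of
# negative attributes does not exceed s. Names are mine, not CIAP's.

def forecast_commercial_success(num_positive, num_negative, t, s):
    return num_positive > t and num_negative <= s
```

With, for example, t = 5 and s = 2 (the values estimated by the authors), a submission with six positive and two negative attributes is forecast to succeed, while one with five positive attributes is not.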

In addition to the aspiration levels t and s, the heuristic has a third parameter, which is the number n of attributes considered. Astebro and Elhedhli (2006) used a single training set (the 383 submissions to CIAP from 1989 to 1992, which included 39 successes) and estimated the parameters of the heuristic. In the test set, the remaining 178 submissions to CIAP from 1992 to 1994, the accuracy of the heuristic with n = 21, t = 5, and s = 2 equaled 86%, which was equal to that of a log-linear regression. Astebro and Elhedhli pointed out that the heuristic correctly predicted more successes than did the regression, which is financially important because the revenue to CIAP from submissions of successes is estimated to be 10 times higher than the revenue from failures.

Wuebben and von Wangenheim (2008) studied the prediction of future purchasing behavior of past customers. They considered a number of related decision problems, of which I focus on one: how to forecast whether a past customer will continue buying from the firm. This is a relevant decision because it makes sense to spend marketing effort (e.g., mailing of new catalogs or special deals) only on such active customers. The authors used data from three firms (airline, apparel, and music). For each firm, the purchasing behavior of at least 2,330 customers for at least 1.5 years was available. Wuebben and von Wangenheim proposed the following aspiration-level-based heuristic:

"Forecast that a past customer will continue buying from the firm if and only if s/he has made at least one purchase during the most recent t months." (21)
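Heuristic (21) can likewise be sketched in code; representing each past purchase by how many months ago it occurred is my simplification for illustration.

```python
# Sketch of heuristic (21): a past customer is forecast to keep buying
# if and only if at least one purchase falls within the most recent t
# months. Purchases are given as months-ago values (my representation).

def forecast_still_active(months_ago_of_purchases, t):
    return any(months_ago <= t for months_ago in months_ago_of_purchases)
```

With t = 9, for example, a customer whose most recent purchase was 4 months ago is forecast to remain active, while one whose most recent purchase was 11 months ago is not.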

For the airline and apparel firms, interviews with managers revealed that they were using t = 9; for the music firm, the authors set t at 6 months. Thus, the parameter of the heuristic was fixed and did not have to be estimated in a training set. The parameters of an alternative optimization model (Ehrenberg 1988) were estimated by using the purchasing behavior for the first half of the available time. The heuristic achieved higher accuracy than optimization in the airline (77% vs. 74%) and apparel firms (83% vs. 75%), and equal accuracy in the music firm (77%).

4. Open Theoretical Problems

Table 2 summarizes the main empirical results on the comparison of the performance of heuristics and optimization, and the theoretical concepts that help explain the results. The concepts in Table 2 can be classified into three categories. First, noncompensatoriness and cumulative dominance refer to the structure of the problem (objects, their criterion values, their attributes, and the values of the attributes). Second, scarce information and linear cognitive ability refer to the resources and skills of the decision maker. Finally, the flat maximum effect and the bias-variance dilemma refer to the properties of models.

These six concepts are the primitives of the emerging theory of how to make inferences. The main challenge for this theory is to relate the concepts. It could, of course, be asked whether some of these concepts belong to more than one category (for example, scarce information could also be viewed as a property of the structure of the problem). But other questions may be more useful, for example: (i) how are the concepts related causally, and (ii) how do concepts in different categories interact to determine the relative performance of models?

Instances of question (i) are whether, and under what conditions, scarce information implies noncompensatoriness, and which values of linear cognitive ability make the flat maximum effect more prevalent. Question (ii) is an explication of Herbert Simon's (1955, 1956) key idea that to understand decision making we need to analyze the interaction between the organism and its environment; an example of it is whether heuristics perform competitively when information is scarce and the variance of predictions is relatively constant across models.

5. Suggesting Heuristics or Optimization? A Tree for Decision Analysts

Even though decision analysis has been theoretically based on optimization (Howard 1968; Fishburn 1989), it has also been acknowledged that its practice may need to incorporate heuristics as well. In the words of Bell, Raiffa, and Tversky (1988): ". . . if it is too horrendously complicated to constructively formulate such an objective function, then the prescriber's operational advice might be better organized by a satisficing heuristic" (p. 19). The question is, of course, when the decision analyst should suggest the use of optimization to decision makers, and when s/he should suggest heuristics. In Figure 2, I tentatively organize the current results in the form of a tree that can help decision analysts make suggestions.

In the tree of Figure 2, the first question is whether the available information is scarce or not. I chose this question for beginning the analysis for two reasons. First, it seems easier to explain the concept of scarce information than other candidate concepts such as linear cognitive ability. Second, the effect of scarce information on the relative performance of heuristics and optimization seems to


Table 2: A summary of the empirical results (from computer simulations) reviewed and the theoretical concepts (from mathematical analyses) that explain them.

Task: Judgment: Does one object have a higher criterion value than another object?
- Empirical result: Overall, small differences in accuracy among take-the-best, tallying, linear regression, and naive Bayes (see Section 3.1). Theoretical concept: Flat maximum effect (see Section 3.2).
- Empirical result: Linear regression and naive Bayes can have higher accuracy than take-the-best and tallying (see Section 3.1). Theoretical concepts: Compensatoriness (not full) (see Section 3.3) and high linear cognitive ability (see Section 3.4).
- Empirical result: Take-the-best and tallying can have higher predictive accuracy than linear regression and naive Bayes, and vice versa (see Section 3.1). Theoretical concept: Bias-variance dilemma (see Section 3.5).

Task: Categorization: To which category does an object belong?
- Empirical result: Fast and frugal trees can have higher accuracy than CART and logistic regression, and vice versa (see Section 3.6). Theoretical concept: Scarce information (see Section 3.7).

be reliable and strong (see Sections 3.6 and 3.7). Gigerenzer and Gaissmaier (in press) put special emphasis on a similar concept, the so-called large worlds, where ". . . part of the relevant information is unknown or has to be estimated from small samples". Large worlds were discussed by L. J. Savage (1954) as a situation where optimization may not yield the best possible results (Binmore 2009 also agrees with this claim).

The second question in the tree of Figure 2 is whether the environment is linear. This is so because some current results make this assumption, namely those on cumulative dominance and linear cognitive ability. These two results are arguably the most applicable because other work assumes very strong conditions (noncompensatoriness) or has not yet produced many concrete results (bias-variance dilemma).

If the environment is not linear, then it seems that the decision analyst does not have many results to draw from, and the best course may be to engage the decision maker in an informative and open discussion of heuristics and optimization. If the environment is linear, then it comes down to whether the linear cognitive ability result can be applied or not: If it can, it should be; if it cannot, a reasonable course for the decision analyst is to suggest that either linear models or lexicographic heuristics may be used (because of the flat-maximum and cumulative-dominance results).
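The logic of the tree can be sketched as follows. The branch questions come from the text; the recommendation strings are my paraphrases of Figure 2, not the figure's exact labels, and the scarce-information branch assumes, per Section 3.7, that scarce information favors heuristics.

```python
# Sketch of the decision-analyst tree of Figure 2. The three branch
# questions come from the text; the recommendations are paraphrases.

def suggest_to_decision_maker(information_scarce, environment_linear,
                              lca_result_applies):
    if information_scarce:
        return "suggest heuristics"
    if not environment_linear:
        return "openly discuss heuristics and optimization"
    if lca_result_applies:
        return "apply the linear-cognitive-ability result"
    return "suggest either linear models or lexicographic heuristics"
```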

Figure 2: The current results tentatively organized in a tree for helping decision analysts decide whether to suggest heuristics or optimization to decision makers.

6. Summary and Concluding Remarks

More than fifty years ago, George Kimball wrote: ". . . there has arisen a temptation to claim that operations research is the study of the best way to control an operation" (1958, p. 35, emphasis in the original). He also declared that (emphasis added): "In my experience when a moderately good solution to a problem has been found, it is seldom worthwhile to spend much time trying to convert this into the 'best' solution."

The idea of not pursuing optimization indiscriminately is a familiar theme (Simon 1955; Ackoff 1979). It does not, of course, mean that optimization models are ineffective. But it necessitates that they are tested empirically and analyzed theoretically against alternatives. In fact, such work has been carried out, but it seems not yet to have penetrated the mainstream of model development and testing in operations research and management science. This kind of resistance, across the sciences of decision making, is documented and discussed by Hogarth (in press).

A major motivation for this article was to survey research that compared the performance of optimization models and an alternative: psychological heuristics. These heuristics are more than mere computational shortcuts and have a psychological basis in things that people can do almost effortlessly (e.g., recognize, recall, imitate). The mathematical basis of heuristics is simple techniques such as lexicographic algorithms and aspiration levels.

Psychological heuristics have regions of superior performance to optimization models, especially when the task is to make inferences about the future. Note that these results do not necessarily contradict the conclusions of research finding that human decision makers are outperformed by optimization models such as linear models (Meehl 1954; Dawes, Faust, and Meehl 1989). This is so for two reasons (Katsikopoulos, Pachur, Machery, and Wallin 2008): First, it is not clear that the human decision makers tested in that research used psychological heuristics as defined here; second, even if such heuristics were used, it may be that errors were made in their application.

Of course, optimization models outperform psychological heuristics in other regions of the decision space. It follows that it is necessary to develop a theory that allows decision makers to anticipate a priori under which conditions they should use optimization and under which conditions they should employ heuristics (for more on this idea, see Hogarth and Karelaia 2006a). A contribution of this article is to gather and


classify the so-far known elements of the theory and practice, which come from a number of disciplines, often outside operations research and management science.

In sum, an important lesson from comparing the performance of optimization and heuristics is that the theory and practice of inference- and decision-making can only progress if researchers apply a multi-method, multi-disciplinary approach. As has been repeatedly pointed out (March 1978; Kleinmuntz 1990; Gigerenzer 2001; Hammond 2007), no model is best under all conditions, and, as we saw here, ideas from all disciplines that relate to making decisions are needed to build a flexible and effective repertoire of models.

Acknowledgments
I thank Manel Baucells, David V. Budescu, Robin M. Hogarth, Laura Martignon, Henrik Olsson, and Ozgur Simsek for their comments and helpful discussions.

Appendix. References

Ackoff, R. L. (1979). The future of operational research is past, Journal of the Operational Research Society, 30(2), 93-104.

Astebro, T., and Elhedhli, S. (2006). The effectiveness of simple decision heuristics: Forecasting commercial success for early-stage ventures, Management Science, 52(3), 395-409.

Baucells, M., Carrasco, J. A., and Hogarth, R. M. (2008). Cumulative dominance and heuristic performance in binary multi-attribute choice, Operations Research, 56, 1289-1304.

Bell, D. E., Raiffa, H., and Tversky, A. (1988). Decision Making: Descriptive, Normative, and Prescriptive Interactions, Cambridge University Press.

Bellman, R. E. (1978). An Introduction to Artificial Intelligence: Can Computers Think? Boyd and Fraser: San Francisco, CA.

Benartzi, S., and Thaler, R. H. (2001). Naive diversification strategies in defined contribution saving plans, American Economic Review, 91(1), 79-98.

Ben-Yashar, R. C., and Nitzan, S. I. (1997). The optimal decision rule for fixed-size committees in dichotomous choice situations: The general result, International Economic Review, 38(1), 175-186.

Binmore, K. (2009). Rational Decisions, Princeton University Press.

Bobko, P., Roth, P. L., and Buster, M. A. (2007). The usefulness of unit weights in creating composite scores: A literature review, application to content validity, and meta-analysis, Organizational Research Methods, 10(4), 689-709.

Boyd, M. (2001). On ignorance, intuition, and investing: A bear market test of the recognition heuristic, Journal of Psychology, Finance, and Marketing, 2, 150-156.

Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A. (1984). Classification and Regression Trees, Chapman and Hall.

Brighton, H. (2006). Robust inference with simple cognitive models, In C. Lebiere and R. Wray (Eds.) AAAI Spring Symposium: Cognitive Science Principles Meet AI-Hard Problems (pp. 17-22), Menlo Park, CA: AAAI Press.

Broeder, A., and Newell, B. R. (2008). Challenging some common beliefs: Empirical work within the adaptive toolbox metaphor, Judgment and Decision Making, 3(3), 205-214.

Camerer, C. (1981). General conditions for the success of bootstrapping models, Organizational Behavior and Human Decision Processes, 27(3), 411-422.

Carrasco, J. A., and Baucells, M. (2008). Tight upper bounds for the expected loss of lexicographic heuristics in binary multiattribute choice, Mathematical Social Sciences, 55, 156-189.

Chater, N., Oaksford, M., Nakisa, R., and Redington, M. (2003). Fast, frugal and rational: How rational norms explain behavior, Organizational Behavior and Human Decision Processes, 90, 63-86.

Cook, L. (2001). The world trade center attack - the paramedic response: an insider's view, Critical Care, 5, 301-303.

Cooper, G. F. (1990). The computational complexity of probabilistic inference using Bayesian belief networks, Artificial Intelligence, 42, 393-405.

Czerlinski, J., Gigerenzer, G., and Goldstein, D. G. (1999). How good are simple heuristics? In G. Gigerenzer, P. M. Todd, and the ABC Research Group, Simple Heuristics that Make Us Smart (pp. 97-118). New York: Oxford University Press.

Davis-Stober, C. P., Dana, J., and Budescu, D. V. (2010). Why recognition is rational: Optimality results on single-variable decision rules, Judgment and Decision Making, 5(4), 216-229.

Davis-Stober, C. P., Dana, J., and Budescu, D. V. (in press). An improper linear estimator for multiple regression, Psychometrika.

Dawes, R. M. (1979). The robust beauty of improper linear models, American Psychologist, 34, 571-582.

Dawes, R. M., and Corrigan, B. (1974). Linear models in decision making, Psychological Bulletin, 81(2), 95-106.

Dawes, R. M., Faust, D., and Meehl, P. E. (1989). Clinical versus actuarial judgment, Science, 243, 1668-1674.

DeMiguel, V., Garlappi, L., Nogales, F. J., and Uppal, R. (2009). A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms, Management Science, 55, 798-812.

DeMiguel, V., Garlappi, L., and Uppal, R. (2007). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies, 22, 1915-1953.

Domingos, P., and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning, 29, 103-130.

Dorans, N., and Drasgow, F. (1978). Alternative weighting schemes for linear prediction, Organizational Behavior and Human Performance, 21, 316-345.

Dougherty, M. R., Franco-Watkins, A. M., and Thomas, R. (2008). Psychological plausibility of the theory of probabilistic mental models and the fast and frugal heuristics, Psychological Review, 115, 199-213.

Edwards, W., and Fasolo, B. (2001). Decision technology, Annual Review of Psychology, 52(1), 581-606.

Ehrenberg, A. S. C. (1982). Repeat-Buying: Theory and Applications. London: Griffin.

Ehrenberg, A. S. C. (1988). How good is best? Journal of the Royal Statistical Society, Series A, 145(3), 364-366.

Einhorn, H. J. (1970). The use of nonlinear, noncompensatory models in decision making, Psychological Bulletin, 73, 221-230.

Einhorn, H. J., and Hogarth, R. M. (1975). Unit weighting schemes for decision making, Organizational Behavior and Human Performance, 13, 171-192.

Elwyn, G., Edwards, A., Eccles, M., and Rovner, D. (2001). Decision analysis in patient care, The Lancet, 358, 571-574.

Fasolo, B., McClelland, G. H., and Todd, P. M. (2007). Escaping the tyranny of choice: When fewer attributes make choice easier, Marketing Theory, 7, 13-26.

Fernandez, J. P., Katsikopoulos, K. V., and Shubitizde, F. (2010). Simple geometric heuristics for the detection of unexploded ordnance. Unpublished manuscript: Dartmouth College.

Fischer, J. E., Steiner, F., Zucol, F., Berger, C., Martignon, L., Bossart, W., Altwegg, M., and Nadal, D. (2002). Using simple heuristics to target macrolide prescription in children with community-acquired pneumonia, Archives of Pediatrics, 156, 1005-1008.

Fishburn, P. C. (1974). Lexicographic orders, decisions, and utilities: A survey, Management Science, 20, 1442-1471.

Fishburn, P. C. (1989). Foundations of decision analysis: Along the way, Management Science, 35, 387-405.

Ford, J., Schmitt, N., Schechtman, S. L., Hults, B. H., and Dogherty, M. L. (1989). Process tracing methods: Contributions, problems, and neglected research questions, Organizational Behavior and Human Decision Processes, 43(1), 75-117.

Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma, Neural Computation, 4, 1-58.

Gigerenzer, G. (2001). The adaptive toolbox, In G. Gigerenzer and R. Selten (Eds.) Bounded Rationality: The Adaptive Toolbox (pp. 37-50), Cambridge, MA: MIT Press.

Gigerenzer, G., and Brighton, H. (2009). Homo heuristicus: Why biased minds make better inferences, Topics in Cognitive Science, 1, 107-143.

Gigerenzer, G., and Gaissmaier, W. (in press). Heuristic decision-making in individuals and organizations, Annual Review of Psychology.

Gigerenzer, G., and Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality, Psychological Review, 103(4), 650-669.

Gigerenzer, G., and Todd, P. M. (1999). Fast and frugal heuristics: The adaptive toolbox, In G. Gigerenzer, P. M. Todd, and the ABC Research Group, Simple Heuristics that Make Us Smart (pp. 3-34), New York: Oxford University Press.

Gigerenzer, G., Todd, P. M., and the ABC Research Group (1999). Simple Heuristics that Make Us Smart, New York: Oxford University Press.

Goldstein, D. G., and Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic, Psychological Review, 109, 75-90.

Goldstein, D. G., and Gigerenzer, G. (2009). Fast and frugal forecasting, International Journal of Forecasting, 25, 760-772.

Green, L., and Mehr, D. R. (1997). What alters physicians' decisions to admit to the coronary care unit? The Journal of Family Practice, 45, 219-226.

Grofman, B., and Owen, G. (1986). Information Pooling and Group Decision Making: Proceedings of the Second University of California Irvine Conference on Political Economy, Greenwich, CT: JAI Press.

Groner, M., Groner, R., and Bischof, W. F. (1983). Approaches to heuristics: A historical review, In R. Groner (Ed.) Methods of Heuristics (pp. 1-18), Hillsdale, NJ: Erlbaum.

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., and Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis, Psychological Assessment, 12(1), 19-30.

Hammond, K. R. (2007). Beyond Rationality: The Search for Wisdom in a Troubled Time, Oxford: Oxford University Press.

Hastie, R., and Kameda, T. (2005). The robust beauty of majority rules in group decisions, Psychological Review, 112, 494-508.

Hillier, F. S., and Lieberman, G. J. (2001). Introduction to Operations Research, New York: McGraw Hill.

Hogarth, R. M. (in press). When simple is hard to accept, In P. M. Todd, G. Gigerenzer, and the ABC Research Group, Ecological Rationality: Intelligence in the World, New York: Oxford University Press.

Hogarth, R. M., and Karelaia, N. (2005a). Simple models for multiattribute choice with many alternatives: When it does and does not pay to face trade-offs with binary attributes? Management Science, 51(12), 1860-1872.

Hogarth, R. M., and Karelaia, N. (2005b). Ignoring information in binary choice with continuous variables: When is less "more"? Journal of Mathematical Psychology, 49(2), 115-124.

Hogarth, R. M., and Karelaia, N. (2006a). Regions of rationality: Maps for bounded agents, Decision Analysis, 3, 124-144.

Hogarth, R. M., and Karelaia, N. (2006b). "Take-the-best" and other simple strategies: Why and when they work "well" with binary cues, Theory and Decision, 61, 205-249.

Hogarth, R. M., and Karelaia, N. (2007). Heuristic and linear models of judgment: Matching rules and environments, Psychological Review, 114(3), 733-758.

Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets, Machine Learning, 3(11), 63-91.

Howard, R. A. (1968). The foundations of decision analysis, IEEE Transactions on Systems Science and Cybernetics, 4, 211-219.

Juslin, P., and Persson, M. (2002). PROBabilities from EXemplars: A "lazy" algorithm for probabilistic inference from generic knowledge, Cognitive Science, 26, 563-607.

Kahneman, D., Slovic, P., and Tversky, A. (Eds.) (1982). Judgment Under Uncertainty: Heuristics and Biases, Cambridge: Cambridge University Press.

Karelaia, N., and Hogarth, R. M. (2008). Determinants of linear judgment: A meta-analysis of lens model studies, Psychological Bulletin, 134(3), 404-426.

Katsikopoulos, K. V. (2010). Heuristic choice with binary attributes: Model-based and model-free analyses. Unpublished manuscript: Max Planck Institute for Human Development.

Katsikopoulos, K. V. (2010). The less-is-more effect: Predictions and tests, Judgment and Decision Making, 5(4), 244-257.

Katsikopoulos, K. V., and Fasolo, B. (2006). New tools for decision analysts, IEEE Transactions on Systems, Man, and Cybernetics: Systems and Humans, 36(5), 960-967.

Katsikopoulos, K. V., and Martignon, L. (2006). Naive heuristics for paired comparison: Some results on their relative accuracy, Journal of Mathematical Psychology, 50, 488-494.

Katsikopoulos, K. V., Pachur, T., Machery, E., and Wallin, A. (2008). From Meehl (1954) to fast and frugal heuristics (and back): New insights into how to bridge the clinical-actuarial divide, Theory and Psychology, 18(4), 443-464.

Katsikopoulos, K. V., Schooler, L. J., and Hertwig, R. (in press). The robust beauty of mediocre information, Psychological Review.

Kattah, J. C., Talkad, A. V., Wang, D. Z., Hsieh, Y. H., and Newman-Toker, D. E. (2009). HINTS to diagnose stroke in the acute vestibular syndrome: Three-step bedside oculomotor examination more sensitive than early MRI diffusion-weighted imaging, Stroke, 40, 3504-3510.

Keeney, R. L., and Raiffa, H. (1976). Decision-making with Multiple Objectives: Preferences and Value Tradeoffs, New York: Wiley.

Keren, G., and Newman, J. R. (1978). Additional considerations with regard to multiple-regression and equal weighting, Organizational Behavior and Human Performance, 22(2), 143-164.

Kimball, G. E. (1958). A critique of operations research, Journal of the Washington Academy of Sciences, 48(2), 33-37.

Kirkwood, C. W., and Sarin, R. K. (1985). Ranking with partial information: A method and application, Operations Research, 33(1), 193-204.

Klein, G. A., and Calderwood, R. (1991). Decision models: Some lessons from the field, IEEE Transactions on Systems, Man, and Cybernetics, 21(5), 1018-1026.

Kleinmuntz, B. (1990). Why we still use our heads instead of formulas: Toward an integrative approach, Psychological Bulletin, 107, 296-310.

Laland, K. N. (2001). Imitation, social learning, and preparedness as mechanisms of bounded rationality, In G. Gigerenzer and R. Selten (Eds.) Bounded Rationality: The Adaptive Toolbox, Cambridge, MA: MIT Press.

Lee, M. D. (2004). An efficient method for the minimum description length evaluation of cognitive models, In K. Forbus, D. Gentner, and T. Regier (Eds.), Proceedings of the 26th Annual Conference of the Cognitive Science Society (pp. 807-812). Mahwah, NJ: Erlbaum.

Lee, M. D., Loughlin, N., and Lundberg, I. B. (2002). Applying one-reason decision-making: The prioritisation of literature searches, Australian Journal of Psychology, 54, 137-143.

Li, X., Sudarsanam, N., and Frey, D. D. (2006). Regularities in data from factorial experiments, Complexity, 11, 32-45.

Long, W. J., Griffith, J. L., Selker, H. P., and D'Agostino, R. B. (1993). A comparison of logistic regression to decision-tree induction in a medical domain, Computers and Biomedical Research, 26, 74-97.

Lovie, A. D., and Lovie, P. (1986). The flat maximum effect and linear scoring models for prediction, Journal of Forecasting, 5, 159-168.

Magee, C., and Frey, D. D. (2006). Experimentation and its role in engineering design: Linking a student design exercise with new results from cognitive psychology, International Journal of Engineering Education, 22(3), 478-488.

Makridakis, S., and Hibon, M. (1979). Accuracy of forecasting: An empirical investigation (with discussion), Journal of the Royal Statistical Society, Series A, 142(2), 79-145.

Makridakis, S., and Taleb, N. (2009). Decision making and planning under low levels of predictability, International Journal of Forecasting, 25, 716-733.

March, J. G. (1978). Bounded rationality, ambiguity, and the engineering of choice, The Bell Journal of Economics, 9(2), 587-608.

Markowitz, H. M. (1952). Portfolio selection, Journal of Finance, 7, 77-91.

Martignon, L., and Hoffrage, U. (2002). Fast, frugal, and fit: Simple heuristics for paired comparison, Theory and Decision, 52, 29-71.

Martignon, L., Katsikopoulos, K. V., and Woike, J. (2008). Categorization with limited resources: A family of simple heuristics, Journal of Mathematical Psychology, 52(6), 352-361.

Martignon, L., Vitouch, O., Takezawa, M., and Forster, M. (2003). Naive and yet enlightened: From natural frequencies to fast and frugal trees, In D. Hardman and L. Macchi (Eds.) Thinking: Psychological Perspectives on Reasoning, Judgment, and Decision Making (pp. 189-211), Chichester: John Wiley and Sons.

McCammon, I., and Haegeli, P. (2007). An evaluation of rule-based decision tools for travel in avalanche terrain, Cold Regions Science and Technology, 47, 193-206.

McGrath, R. E. (2008). Predictor combination in binary decision-making situations, Psychological Assessment, 20, 195-205.

Meehl, P. E. (1954). Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence. Minneapolis, MN: University of Minnesota Press.

Ortmann, A., Gigerenzer, G., Borges, B., and Goldstein, D. G. (2008). The recognition heuristic: A fast and frugal way to investment choice? In C. R. Plott and V. L. Smith (Eds.) Handbook of Experimental Economics Results: Vol. 1, Amsterdam: North-Holland.

Pachur, T., Broeder, A., and Marewski, J. N. (2008). The recognition heuristic in memory-based inference: Is recognition a non-compensatory cue? Journal of Behavioral Decision Making, 21, 183-210.

Payne, J. W., Bettman, J. R., and Johnson, E. J. (1993). The Adaptive Decision Maker, Cambridge: Cambridge University Press.

Pearson, S. D., Goldman, L., Garcia, T. B., Cook, E. F., and Lee, T. H. (1994). Physician response to a prediction rule for the triage of emergency department patients with chest pain, Journal of General Internal Medicine, 9, 241-247.

Quinlan, J. R. (1990). Decision trees and decision-making, IEEE Transactions on Systems, Man, and Cybernetics, 20, 339-346.

Reilly, B. M., Evans, A. T., Schaider, J. J., Das, K., Calvin, J. E., Moran, I. A., Roberts, R. R., and Martinez, E. (2002). Impact of a clinical decision rule on a hospital triage of patients with suspected acute cardiac ischemia in the emergency department, Journal of the American Medical Association, 248, 342-357.

Reimer, T., and Hoffrage, U. (2006). The ecological rationality of simple group heuristics: Effects of group member strategies on decision accuracy, Theory and Decision, 60, 403-438.

Reimer, T., and Katsikopoulos, K. V. (2004). The use of recognition in group decision-making, Cognitive Science, 28, 1009-1029.

Savage, L. J. (1954). The Foundations of Statistics, Yale University Press.

Shah, A. K., and Oppenheimer, D. M. (2008). Heuristics made easy: An effort-reduction framework, Psychological Bulletin, 137, 207-222.

Simon, H. A. (1955). A behavioral model of ratio-

nal choice. Quarterly Journal of Economics, 69, 99-118.


Simon, H. A. (1956). Rational choice and the structure of environments, Psychological Review, 63, 129-138.

Smithson, M. (2010). When less is more in the recognition heuristic, Judgment and Decision Making, 5(4), 230-243.

Snook, B., Zito, M., Bennell, C., and Taylor, P. J. (2005). On the complexity and accuracy of geographic profiling strategies, Journal of Quantitative Criminology, 21, 1-16.

Srivastava, J., Connolly, T., and Beach, L. R. (1995). Do ranks suffice? A comparison of alternative weighting approaches in value elicitation, Organizational Behavior and Human Decision Processes, 63(1), 112-116.

Thorngate, W. (1980). Efficient decision heuristics, Behavioral Science, 25, 219-225.

Tversky, A., and Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases, Science, 185, 1124-1130.

von Winterfeldt, D., and Edwards, W. (1973). Costs and payoffs in perceptual research, Unpublished manuscript: University of Michigan.

Wainer, H. (1976). Estimating coefficients in linear models: It don't make no nevermind, Psychological Bulletin, 83(2), 213-217.

Wilks, S. S. (1938). Weighting systems for linear functions of correlated variables when there is no dependent variable, Psychometrika, 3, 23-40.

Wuebben, M., and von Wangenheim, F. (2008). Instant customer base analysis: Managerial heuristics often "get it right", Journal of Marketing, 72, 82-93.