How bad is Human Judgment? Peter Ayton Department of Psychology City University, London.


How do psychologists study human judgment?

• Psychological experiments often compare the actual with the ideal. The actual can be measured by monitoring human decision making. The ideal is usually determined from laws in logic or statistics.

• Discrepancies show that the human brain doesn’t seem to solve problems by applying laws of logic or statistics - so how does it work?

• Because people can’t utilise vast amounts of information, the brain uses ‘heuristics’ – simple rules of thumb – to make judgments and decisions quickly.


But psychological research has undermined confidence in the quality of human judgment.

E.g. Psychologist Daniel Kahneman was awarded the 2002 Nobel Prize in Economics. He “…discovered how human judgment may take heuristic shortcuts that systematically depart from basic principles of probability”.

Noting that their intent in studying heuristic “errors” was akin to the use of optical illusions, forgetfulness, or tongue twisters to understand sight, memory, and language, Kahneman and Tversky wrote:

“Although errors of judgment are but a method by which some cognitive processes are studied, the method has become a significant part of the message.” (Kahneman & Tversky, 1982, p. 492).


Illusions

Visual and Cognitive

Is the blue on the inner left back or the outer left front?

Is the left centre circle bigger?

No, they're both the same size

It's a spiral, right?

No, these are a set of independent circles

Count the black dots

How many legs does this elephant have?

Are the horizontal lines parallel or do they slope?

How do people consider risks?

1) Relative insensitivity to probability information.

2) Driven by evaluation of qualities of outcomes (Risk as Emotions)


Judgement and Description: Effects of “unpacking” hypotheses.

E.g.

p(death from unnatural causes) = 32%

But,

p(death from accident) = 32%

p(death by homicide) = 10%

p(other unnatural causes) = 11%

SUM = 53%, well above the 32% given to the same hypothesis when it is left "packed".

Experts (stockbrokers making stock forecasts; oil engineers making safety assessments) show similar effects.
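The arithmetic of the unpacking example can be checked in a few lines. In support theory terms, a sum of component judgments that exceeds the judgment of the packed hypothesis is called subadditivity. The helper `subadditivity_ratio` is a hypothetical name used only for illustration.

```python
# Judged probabilities (%) from the unpacking example above.
packed = 32.0  # p(death from unnatural causes), judged as one category
unpacked = {
    "accident": 32.0,
    "homicide": 10.0,
    "other unnatural causes": 11.0,
}


def subadditivity_ratio(packed_judgment, component_judgments):
    """Sum of the unpacked judgments divided by the packed judgment.

    A ratio of 1 means the judgments are additive (coherent); a ratio
    above 1 means the components together received more probability
    than the category they make up.
    """
    return sum(component_judgments) / packed_judgment


total = sum(unpacked.values())
ratio = subadditivity_ratio(packed, unpacked.values())
print(f"sum of unpacked = {total:.0f}%, packed = {packed:.0f}%, ratio = {ratio:.2f}")
# prints: sum of unpacked = 53%, packed = 32%, ratio = 1.66
```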


How to Be Incoherent and Seductive: Bookmakers’ Odds and Support Theory

The Planning fallacy

• WHY does everything take longer to finish and cost more than we think it will?

• The Channel Tunnel was supposed to cost £2.6 billion. In fact, the final bill came to £15 billion. The Jubilee Line extension to the London Underground cost £3.5 billion, about four times the original estimate. There are many other examples: the London Eye, the Channel Tunnel rail link, the Dome.

• This is not an exclusively British disease. In 1957, engineers forecast that the Sydney Opera House would be finished in 1963 at a cost of A$7 million. A scaled-down version costing A$102 million finally opened in 1973. In 1969, the mayor of Montreal announced that the 1976 Olympics would cost C$120 million and "can no more have a deficit than a man can have a baby". Yet the stadium roof alone, which was not finished until 13 years after the games, cost C$120 million.

• Is gross incompetence behind such fiascos? Or a Machiavellian plot to secure approval for projects that once started cannot easily be cancelled?

• Research carried out by psychologist Roger Buehler suggests that the main cause may lie deeper. Buehler found that students consistently underestimated how long it would take them to finish their assignments. They seemed to have an over-idealised vision of a smooth future and rarely anticipated more than trivial impediments.

Partition Dependence

How you frame a question affects the answer

‘Case prime’: “Will Sunday be the hottest day of the week?”

A two-fold partition of the sample space is evoked:

Sunday versus the rest of the week (1/2).

‘Class prime’: “Will the hottest day of the week be Sunday?”

A seven-fold partition is evoked:

Sunday is one of 7 possible options (1/7).
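One way to see why the framing matters is to assume that people anchor on an "ignorance prior" that spreads probability equally over however many cells the question evokes (1/2 or 1/7 here) and then adjust only partly toward what they actually believe. The linear blend below, the belief of 1/5 and the weight of 1/2 are all assumptions chosen for illustration, not values from the studies.

```python
from fractions import Fraction


def ignorance_prior(n_partitions):
    """Equal probability for each cell of an n-fold partition."""
    return Fraction(1, n_partitions)


def blended_judgment(n_partitions, belief, weight):
    """Hypothetical linear blend of the 1/n ignorance prior and a belief.

    `weight` is an assumed pull toward the prior (0 = none, 1 = total).
    """
    return weight * ignorance_prior(n_partitions) + (1 - weight) * belief


belief = Fraction(1, 5)  # assumed belief that Sunday will be the hottest day
weight = Fraction(1, 2)  # assumed weight on the ignorance prior

case_prime = blended_judgment(2, belief, weight)   # Sunday vs the rest of the week
class_prime = blended_judgment(7, belief, weight)  # one day out of seven
print(case_prime, class_prime)  # prints: 7/20 6/35
```

The same underlying belief yields a higher judged probability under the two-fold framing, which is the direction of the partition dependence described above.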

Overconfidence

Typical experiments have presented subjects with a series of two-alternative general knowledge questions and asked them to indicate the correct answer and to state their subjective probability, expressed as a percentage, that they have selected the correct answer. E.g. Which is longer? (a) Panama Canal (b) Suez Canal. Then state how “% sure” you are.

[“% sure” responses vary from 50% (completely uncertain) to 100% (completely certain).]

Early general knowledge experiments suggested that people’s confidence judgments are poorly “calibrated”.

On a calibration curve, which plots the proportion of correct answers against expressed confidence, points below the diagonal represent overconfident responses: the expressed confidence is higher than the proportion correct.

But some experts (e.g. weather forecasters) produce very well calibrated subjective likelihood judgments in the domain of their expertise.

But not all experts are well calibrated. Experienced physicians’ probabilistic diagnoses of pneumonia are poorly calibrated.

What makes experts well calibrated? Some experts (e.g. weather forecasters) get prompt, unambiguous feedback; others (e.g. doctors) may not.
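A calibration analysis groups answers by stated confidence and compares each group's proportion correct with the confidence expressed. A group whose proportion correct falls below its confidence is a point below the diagonal. The responses below are made up, and the over/underconfidence score (mean confidence minus proportion correct) is one common summary, assumed here rather than taken from the studies mentioned.

```python
from collections import defaultdict


def calibration_table(responses):
    """Group (confidence %, correct?) pairs and return {confidence: proportion correct}."""
    groups = defaultdict(list)
    for confidence, correct in responses:
        groups[confidence].append(correct)
    return {c: sum(v) / len(v) for c, v in sorted(groups.items())}


def over_underconfidence(responses):
    """Mean confidence minus overall proportion correct; positive = overconfident."""
    mean_conf = sum(c for c, _ in responses) / len(responses) / 100
    prop_correct = sum(ok for _, ok in responses) / len(responses)
    return mean_conf - prop_correct


# Hypothetical two-alternative responses: (stated % sure, answer correct?)
responses = [
    (50, True), (50, False),
    (70, True), (70, False), (70, True), (70, False),
    (90, True), (90, True), (90, False), (90, False),
    (100, True), (100, True), (100, True), (100, False),
]

for conf, prop in calibration_table(responses).items():
    flag = "overconfident" if prop < conf / 100 else "not overconfident"
    print(f"{conf:>3}% sure: {prop:.0%} correct ({flag})")
print(f"over/underconfidence score: {over_underconfidence(responses):+.2f}")
```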

The hot-hand fallacy and the gambler’s fallacy: Two faces of Subjective Randomness?


Pinker (1997) is critical of the presumption of faulty reasoning typically accompanying observations of the gambler’s fallacy:

“It would not surprise me if a week of clouds really did predict that the trailing edge was near and the sun was about to be unmasked, just as the hundredth railroad car on a passing train portends the caboose with greater likelihood than the third car. Many events work like that. …An astute observer should commit the gambler’s fallacy. A gambling device is by definition a machine designed to defeat our intuitive predictions. It’s like calling our hands badly designed because they fail to get out of handcuffs.” (p. 346).


Gilden and Wilson (1995; 1996) have shown that for golf putting, dart throwing and auditory and visual signal detection there are streaks in performance; Adams (1995) reports “momentum” in the performance of pocket billiards players and Smith (in press) reports that horseshoe pitchers have modest hot and cold spells.

Thus, belief in the hot-hand is not always fallacious. Perhaps then people have learned to expect the hot hand from observing human performances where it occurs.
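Pinker's point is that a gambling device is built so that past outcomes carry no information. A quick simulation of a fair, independent coin illustrates this: the proportion of heads immediately after three heads in a row stays close to one half. The sequence length, streak length and seed are arbitrary choices for the sketch.

```python
import random


def prob_heads_after_streak(n_flips, streak, seed=0):
    """Estimate P(heads | the previous `streak` flips were all heads).

    Simulates one long sequence of a fair, independent coin and pools
    every flip that follows a qualifying streak.
    """
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True = heads
    after_streak = [
        flips[i]
        for i in range(streak, n_flips)
        if all(flips[i - streak:i])
    ]
    return sum(after_streak) / len(after_streak), len(after_streak)


p, count = prob_heads_after_streak(200_000, streak=3)
print(f"P(heads | 3 heads in a row) ~ {p:.3f} over {count} cases")
```

For human performance, by contrast, the studies cited above suggest that real streaks can occur, so the same calculation on putting or dart-throwing records need not come out at one half.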

Gains and losses

Samuelson’s paradox: a bet on a coin toss is offered.

Heads you win $200; tails you lose $100.

No one will take it once, but people would play it ten times.

Loss aversion: losses are weighted more heavily than gains of the same size.

Insurance and extended warranties.
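The arithmetic behind Samuelson's paradox is easy to check. One play has an expected value of $50 but a 50% chance of losing; ten plays have an expected value of $500 and lose money only when three or fewer of the ten tosses come up heads. The piecewise-linear value function and the loss-aversion coefficient of 2.25 are assumptions for illustration, not part of Samuelson's example.

```python
from math import comb

WIN, LOSE = 200, 100  # heads: win $200; tails: lose $100
LAMBDA = 2.25         # assumed loss-aversion coefficient (losses loom larger)


def outcome_distribution(n_plays):
    """(net outcome, probability) pairs for n independent plays of the bet."""
    return [
        (WIN * k - LOSE * (n_plays - k), comb(n_plays, k) / 2 ** n_plays)
        for k in range(n_plays + 1)
    ]


def expected_value(n_plays):
    return sum(p * x for x, p in outcome_distribution(n_plays))


def prob_of_loss(n_plays):
    return sum(p for x, p in outcome_distribution(n_plays) if x < 0)


def loss_averse_value(n_plays, lam=LAMBDA):
    """Gains count once, losses count `lam` times; all plays evaluated together."""
    return sum(p * (x if x >= 0 else lam * x) for x, p in outcome_distribution(n_plays))


for n in (1, 10):
    print(n, expected_value(n), round(prob_of_loss(n), 3), round(loss_averse_value(n), 1))
# prints:
# 1 50.0 0.5 -12.5
# 10 500.0 0.172 453.6
```

Under these assumptions a single play has negative value and is refused, while ten plays evaluated as one package have positive value, matching the pattern of choices described above.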

Gains and losses

Q1. Imagine that you face the following pair of concurrent decisions. First examine both decisions and then indicate the options that you prefer.

Decision I: Choose between
A. A sure gain of £2,400
B. A 25% chance to gain £10,000, and a 75% chance to gain nothing

Decision II: Choose between
C. A sure loss of £7,500
D. A 75% chance to lose £10,000, and a 25% chance to lose nothing


Most people choose A and D – hardly anyone prefers B and C. They like the sure gain in Decision I and dislike the certain loss in Decision II. But the pair of choices B and C is much better than A and D.


If you combine the outcomes of the two choices, you can add the sure gain of £2,400 in A to the risky outcomes in D. So A and D gives you:

A and D: a 25% chance to gain £2,400, and a 75% chance to lose £7,600.

Similarly, B and C can be combined: the sure loss of £7,500 in C can be subtracted from the risky outcomes in B, giving:

B and C: a 25% chance to gain £2,500, and a 75% chance to lose £7,500.

With B and C the chances of winning and losing are the same as in A and D but the amount you might win is more and the amount you might lose is less.
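The combination step can be reproduced mechanically: treat each option as a probability distribution, add the outcomes of two independent options, and check whether one combined gamble first-order stochastically dominates the other. The representation and function names below are illustrative choices, not part of the original problem.

```python
from fractions import Fraction

# Each option is a list of (probability, outcome in £) pairs.
A = [(Fraction(1), 2_400)]
B = [(Fraction(1, 4), 10_000), (Fraction(3, 4), 0)]
C = [(Fraction(1), -7_500)]
D = [(Fraction(3, 4), -10_000), (Fraction(1, 4), 0)]


def combine(first, second):
    """Distribution of the summed outcome of two independent options."""
    totals = {}
    for p1, x1 in first:
        for p2, x2 in second:
            totals[x1 + x2] = totals.get(x1 + x2, 0) + p1 * p2
    return sorted((p, x) for x, p in totals.items())


def expected(option):
    return sum(p * x for p, x in option)


def first_order_dominates(x, y):
    """True if x is never less likely than y to reach any outcome level, and sometimes more."""
    def p_at_least(option, t):
        return sum(p for p, v in option if v >= t)

    levels = {v for _, v in x} | {v for _, v in y}
    weakly = all(p_at_least(x, t) >= p_at_least(y, t) for t in levels)
    strictly = any(p_at_least(x, t) > p_at_least(y, t) for t in levels)
    return weakly and strictly


AD = combine(A, D)
BC = combine(B, C)
print("A and D:", [(str(p), x) for p, x in AD], "EV =", expected(AD))
print("B and C:", [(str(p), x) for p, x in BC], "EV =", expected(BC))
print("B and C dominates A and D:", first_order_dominates(BC, AD))
```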

Gains and losses

The same notions of loss aversion and certainty weighting can explain the sunk cost effect.

Mindful of their investment, people can’t quit (but animals can and do).

Gains and losses

The mental accounting of wine cellars. You purchase several cases of wine at $20 a bottle and, after several years, it has increased in value. You have been offered $75 a bottle.

You decide to drink a bottle to help you decide about the offer. How much does this cost you?

Possible mental accounts...

(a) Nothing (I already own it)
(b) $20 (what I paid)
(c) $20 + interest (what I paid + interest)
(d) $75 (what I am offered)
(e) A gain of $55 (I drank a $75 bottle and it only cost $20)

                            (a)   (b)   (c)   (d)   (e)
Students                    30%   10%    1%   37%   22%
Experts (wine collectors)   30%   18%    7%   20%   25%

A patient with severe chest pains is rushed to the emergency department in a hospital. The physicians must (quickly) decide: Should the patient be sent to the coronary care unit or to a regular bed with ECG telemetry?

In two Michigan hospitals, emergency physicians sent 90% of all patients to the care unit. Such “defensive” decision-making led to over-crowding, decreased quality of care, and greater health risks for patients.

Researchers taught the physicians to use the Heart Disease Predictive Instrument, an expert system consisting of a chart with some 50 probabilities and a logistic formula with which the physician, aided by a pocket calculator, computes the probability of requiring the coronary care unit for each patient. If the probability is higher than a certain value, then the patient is sent to the care unit, otherwise not.
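The general logic of such an instrument (a weighted sum of patient cues passed through a logistic function, then compared with a cut-off) can be sketched as follows. This is a generic logistic classifier of the kind described, not the Heart Disease Predictive Instrument itself: the cue names, weights, intercept and threshold are all hypothetical placeholders.

```python
import math

# Hypothetical cues, weights and cut-off for illustration only;
# these are NOT the Heart Disease Predictive Instrument's values.
INTERCEPT = -3.0
WEIGHTS = {
    "st_segment_anomaly": 2.0,
    "chest_pain_main_complaint": 1.0,
    "history_of_heart_attack": 0.8,
}
THRESHOLD = 0.5  # hypothetical cut-off probability


def probability_needs_ccu(patient):
    """Logistic model: weighted sum of 0/1 cues mapped to a probability."""
    score = INTERCEPT + sum(w * patient.get(cue, 0) for cue, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))


def assign(patient):
    """Coronary care unit if the probability exceeds the cut-off, else a regular bed."""
    if probability_needs_ccu(patient) > THRESHOLD:
        return "coronary care unit"
    return "regular bed"


patient = {"st_segment_anomaly": 1, "chest_pain_main_complaint": 1, "history_of_heart_attack": 1}
print(round(probability_needs_ccu(patient), 3), assign(patient))
```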

Physicians don’t like using this and similar systems. They don’t understand them (the systems do not conform to their intuitive thinking) and so avoid using them.

The researchers tried a third alternative: a heuristic that has the structure of physicians’ intuitions, but is based on empirical evidence. This fast and frugal tree (Figure 2) asks only a few yes-no questions. If a patient has a certain anomaly in his electrocardiogram (the so-called ST segment), he is immediately sent to the coronary care unit. No other information is required. If that is not the case, a second cue is considered: whether the patient’s primary complaint was chest pain. If this is not the case, he is immediately assigned to a regular nursing bed. No further information is sought. If the answer is yes, then a third question is asked to finally classify the patient.
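The tree described above can be written as a short chain of yes/no checks, each of which can end the search. The first two questions follow the text. The third question is not specified, so `third_cue` is a hypothetical stand-in, and mapping "yes" on it to the coronary care unit is an assumption. The function also reports how many questions were needed, to show how frugal the procedure is.

```python
def fast_and_frugal_tree(st_segment_anomaly, chest_pain_main_complaint, third_cue):
    """Classify a chest-pain patient with a three-question fast and frugal tree.

    Returns (assignment, number_of_questions_asked). `third_cue` is a
    hypothetical placeholder for the unspecified third question.
    """
    # Question 1: ST-segment anomaly on the ECG? Yes -> coronary care unit at once.
    if st_segment_anomaly:
        return "coronary care unit", 1
    # Question 2: was chest pain the primary complaint? No -> regular nursing bed.
    if not chest_pain_main_complaint:
        return "regular nursing bed", 2
    # Question 3 (assumed mapping): yes -> coronary care unit, no -> regular bed.
    return ("coronary care unit" if third_cue else "regular nursing bed"), 3


print(fast_and_frugal_tree(True, False, False))  # ('coronary care unit', 1)
print(fast_and_frugal_tree(False, False, True))  # ('regular nursing bed', 2)
print(fast_and_frugal_tree(False, True, True))   # ('coronary care unit', 3)
```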

Gaudi’s “Stereostatic Model”

Between the inverted rope-and-weight model and these painted photographs, Gaudi obtained an unorthodox, but architecturally flawless set of plans for his famous chapel, one that no engineer could have derived using traditional methods.


“Since the plan of the church was so complicated – towers and arcs emerging from unexpected places, leaning on other arcs and towers – it is practically impossible to solve the set of equations which corresponds to the requirement of equilibrium in this complex. [But through Gaudi’s model] all the computation was instantaneously done by gravity! The set of arcs arranged itself such that the whole complex is in equilibrium, but upside down.”

Dorit Aharonov, Quantum Computation, Annual Reviews of Computational Physics VI (Dietrich Stauffer, ed., 1998).

How Dogs Navigate to Catch Frisbees

According to the notion of bounded rationality (Simon, 1956; 1992), the computational limits of cognition and the structure of the environment may foster the use of "satisficing" rather than optimal strategies.

Thus for many of our decisions "fast and frugal" heuristics would be a serviceable substitute for the “proper” rule.

But not always. E.g. UK magistrates’ bail decisions are well modelled by one-reason decision models, despite the magistrates’ insistence that they look at all the information.
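Simon's contrast between satisficing and optimising can be made concrete. A satisficer examines options in the order they arrive and stops at the first one that meets an aspiration level, while an optimiser examines every option and picks the best. The option scores, the aspiration level and the function names below are hypothetical.

```python
def satisfice(options, aspiration, value):
    """Return (first option meeting the aspiration level, options examined).

    Returns (None, len(options)) if no option is good enough.
    """
    for examined, option in enumerate(options, start=1):
        if value(option) >= aspiration:
            return option, examined
    return None, len(options)


def optimise(options, value):
    """Return (best option, options examined); every option is examined."""
    return max(options, key=value), len(options)


def score(flat):
    """Hypothetical quality score of a flat, out of 10."""
    return flat[1]


flats = [("A", 5), ("B", 7), ("C", 6), ("D", 9), ("E", 8)]
print(satisfice(flats, aspiration=7, value=score))  # (('B', 7), 2)
print(optimise(flats, value=score))                 # (('D', 9), 5)
```

The satisficer settles for a slightly worse option after looking at two of the five, which is the trade of accuracy for search effort that bounded rationality describes.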

How can anyone be perfectly “rational” in a world where knowledge is limited, time is pressing, and deep thought is often an unattainable luxury?

Traditional models of unbounded rationality and optimization in cognitive science, economics, and animal behavior have tended to view decision-makers as possessing supernatural powers of reason, limitless knowledge, and endless time.

But understanding judgment and decisions in the real world requires a more psychologically plausible notion of bounded rationality.

Human Judgment and choice: Rational or irrational?


The good news is that, contrary to some views, human judgment can be very accurate – though it may not always be.

However, we are closer to understanding the conditions under which judgement may be more reliable (e.g. information formats; learning conditions with feedback).

Understanding judgement means understanding not just the mind – but how it interacts with its environment.

The Beauty Contest

The game is called a beauty contest after a famous passage in Keynes’ (1936) “The General Theory of Employment, Interest and Money”.


Keynes remarked that the stock market is like a beauty contest. He had in mind contests that were popular in England at the time, where a newspaper would print 100 photographs, and people would write in and say which six faces they liked most. Everyone who picked the most popular face was automatically entered in a raffle, where they could win a prize.

Keynes wrote, “It is not a case of choosing those [faces] which, to the best of one’s judgment, are really the prettiest, nor even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practise the fourth, fifth and higher degrees.”


In the game, each player picks a number between 0 and 100, and the winner is the player whose number is closest to two-thirds of the average of all the numbers chosen. If you played this game repeatedly, your thoughts might run as follows. You might assume that the starting average would probably be 50, so you’d guess 33. But then you’d say, hmm, if other people are as clever as I am, they will all pick 33, so I should pick 22. But if everyone else does that too, I should pick two-thirds of 22. And if you carry this through infinitely many levels of reasoning to the logical end, you’ll wind up picking zero.

Zero is what game theory predicts for this situation. Game theory is the branch of social science that analyzes strategic interactions in mathematical terms. It was founded quite a long time ago, but it has had a slow fuse: only in the last 10 or 15 years has it come to the fore in reasoning about economics and political science.

So how do people actually behave? Do they pick zero? The data here are from undergrads from Singapore, Germany, the Wharton School of Business at the University of Pennsylvania, and Caltech.

The average choice across all these experiments was around 40, so if you guessed about two-thirds of 40, or 27, you’d probably win.
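The scoring rule can be written in a few lines: compute two-thirds of the mean guess and find the guess or guesses closest to it. The ten guesses below are invented for illustration, not the experimental data. Their mean is 39.2, so the target is about 26.1 and the guess of 27 wins.

```python
def beauty_contest_winners(guesses, p=2 / 3):
    """Return the target (p times the mean guess) and the guesses closest to it."""
    target = p * sum(guesses) / len(guesses)
    best = min(abs(g - target) for g in guesses)
    return target, sorted({g for g in guesses if abs(g - target) == best})


guesses = [50, 33, 22, 40, 60, 27, 15, 0, 100, 45]  # hypothetical entries
target, winners = beauty_contest_winners(guesses)
print(f"target = {target:.2f}, winning guess(es) = {winners}")
```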

If we use these data to gauge how many steps of reasoning people are doing about other people’s reasoning, something from one to three seems reasonable. It’s clearly not the game-theory prediction of infinity, but it does demonstrate the performance of at least one step of reasoning.

Three Newspaper studies


The most popular numbers in all three experiments are two-thirds of 50 (about 33), two-thirds of this number (about 22) and the equilibria of the game (0 and 1 in The FT, 1 in Expansion and 0 in Spektrum).

The ‘steps of iterated dominance’ interpretation holds that in the beauty-contest game people reason in steps. Step 0, the preliminary step of any reasoning, corresponds to numbers that are arbitrarily distributed over the interval.

Level-1 reasoning is (2/3)·50 = 33.333. Level-2 reasoning is (2/3)·33.333 = 22.22 and so on.
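The chain of levels generalises to 50·(2/3)^k, which shrinks toward the equilibrium of zero as k grows. The sketch below treats the level-0 mean of 50 as an assumption, consistent with step-0 choices spread arbitrarily over the 0 to 100 interval.

```python
def level_k_choice(k, level0_mean=50.0, p=2 / 3):
    """Choice of a level-k reasoner who best-responds to level k-1.

    Assumes level-0 players choose arbitrarily, with a mean of 50.
    """
    return level0_mean * p ** k


for k in range(6):
    print(k, round(level_k_choice(k), 2))
```

Levels 1 and 2 reproduce the 33.33 and 22.22 above; by level 50 the choice is below one millionth, which is the zero that game theory predicts.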

University of Chicago Economics PhDs; Other Economics PhDs; CEOs; The Caltech Board (eminent in various fields)
