9.94 The cognitive science of intuitive theories J. Tenenbaum, T. Lombrozo, L. Schulz, R. Saxe.


9.94 The cognitive science of intuitive theories

J. Tenenbaum, T. Lombrozo, L. Schulz, R. Saxe

Plan for the class

• Goal:
  – Introduce you to an exciting field of research unfolding at MIT and in the broader world.

Plan for the class

• An integrated series of talks and discussions

– Today: Josh Tenenbaum (MIT), “Computational models of theory-based inductive inference”

– Tuesday: Tania Lombrozo (Harvard/Berkeley) “Explanation in intuitive theories”

– Wednesday: Rebecca Saxe (Harvard/MIT), “Understanding other minds”

– Thursday: Laura Schulz (MIT), “Theories and evidence”

– Friday: Special Mystery Guest, “When theories fail”

Plan for the class

• Requirements for credit (pass/fail, 3 units)
  – Attend the classes
  – Participate in discussions
  – Take-home quiz:

• Emailed to you this weekend (after the class)

• Due back to me by email on Wednesday, Feb. 1

If you are not registered on the list below, make sure to register and send me an email message at: jbt@mit.edu

Class list

Azeez, Zainab O
Belova, Nadezhda
Brenman, Stephanie V
Cherian, Tharian S
Clark, Abigail M
Clark, Samuel D
Curry, Justin M
Dai, Jizi
Dean, Clare
Ferranti, Darlene E
Frazier, Jonathan J
Gordon, Matthew A
Green, Delbert A
Huhn, Anika M
Hunt, Beatrice P.
Kamrowski, Kaitlin M
Kanaga, Noelle J
Kwon, Jennifer
Liu, Yicong
McGuire, Salafa'anius
Ovadya, Aviv
Poon, Samuel H
Pradhan, Nikhil T
Ren, Danan T
Rotter, Juliana C
Slagle, Amy M
Tang, Di
Taub, Daniel M
Tung, Roland
Voelbel, Kathleen
Vosoughi, Soroush
Willmer, Anjuli J
Ye, Diana F
Yuen, Grace J
Zhao, Bo

Scheduling

• Today:
  – Go to 4:30 (with a break in the middle)?

• Friday:
  – An hour earlier: 1:00 – 3:00?

The big problem of intelligence

How does the mind get so much out of so little?

The big problem of intelligence

How does the mind get so much out of so little?

[Figure panels: “Three-dimensional:” / “Two-dimensional:”]

The big problem of intelligence

How can we generalize new concepts reliably from just one or a few examples?
  – Learning word meanings

[Figure: three pictures, each labeled “horse”]

[Figure: “The objects of planet Gazoob” — three of the objects labeled “tufa”]

The big problem of intelligence

How can we generalize new concepts reliably from just one or a few examples? – Learning about new properties of categories

Cows have T4 hormones.
Bees have T4 hormones.
Salmon have T4 hormones.

Humans have T4 hormones.

Cows have T4 hormones.
Goats have T4 hormones.
Sheep have T4 hormones.

Humans have T4 hormones.

The big problem of intelligence

How do we use concepts in ways that go beyond our experience?

• “dog”

• Is it still a dog if you…– Put a penguin costume on it?

– Surgically alter it until it looks just like a penguin?

– Pre-natally inject a substance that causes it to look just like a penguin? … and it can mate with penguins and produce penguin offspring?

The big problem of intelligence

How do we use concepts in ways that go beyond our experience?

• Two cars were reported stolen by the Groveton police yesterday.

• The judge sentenced the killer to die in the electric chair for the second time.

• No one was injured in the blast, which was attributed to a buildup of gas by one town official.

• One witness told the commissioners that she had seen sexual intercourse taking place between two parked cars in front of her house.

The big problem of intelligence

How do we use concepts in ways that go beyond our experience?

Consider a man named Boris. – Is the mother of Boris’s father his grandmother?

– Is the mother of Boris’s sister his mother?

– Is the son of Boris’s sister his son?

(Note: Boris and his family were stranded on a desert island when he was a young boy.)

What makes us so smart?

• Memory?

• Logical inference?

What makes us so smart?

• Memory? No.
  – The difference between a test that you can pass on rote memory and a test that shows whether you “actually learned something”.

• Logical inference? No.
  – The difference between deductive inference and inductive inference.

Modes of inference

• Deductive inference:

All mammals have biotinic acid in their blood.
Horses are mammals.

Horses have biotinic acid in their blood.

• Inductive inference:

Horses have biotinic acid in their blood.
Horses are mammals.

All mammals have biotinic acid in their blood.

What makes us so smart?

• Intuitive theories
  – Systems of concepts that are in some important respects like scientific theories.
  – Abstract knowledge that supports prediction, explanation, exploration, and decision-making for an infinite range of situations that we have not previously encountered.

Some questions about intuitive theories

• What is their content?
• How are they represented in the mind or brain?
• How are they used to generalize to new situations?
• How are they acquired?

Some questions about intuitive theories

• What is their content?

• How are they represented in the mind or brain?

• How are they used to generalize to new situations?

• How are they acquired?

• Can they be described in computational terms?

• In what essential ways are they similar to or different from scientific theories?

• How good (accurate, comprehensive, rich) are they, under what circumstances? What can we learn from their failures?

What can we learn from perceptual or cognitive illusions?

• Goal of visual perception is to recover world structure from visual images.

• Why the problem is hard: many world structures can produce the same visual input.

[Diagram: several scene hypotheses, all consistent with the same image data]

What can we learn from perceptual or cognitive illusions?

• Goal of visual perception is to recover world structure from visual images.

• Why the problem is hard: many world structures can produce the same visual input.

• Illusions reveal the visual system’s implicit theories of the physical world and the process of image formation.

Computational models of theory-based inductive inference

Josh Tenenbaum
Department of Brain and Cognitive Sciences

Computer Science and Artificial Intelligence Laboratory

MIT

Plan for today

• A general framework for solving under-constrained inference problems
  – Bayesian inference

• Applications in perception and cognition
  – lightness perception
  – predicting the future (with Tom Griffiths)
  – learning about properties of natural species (with Charles Kemp)

Modes of inference

• Deductive inference (logic):

All mammals have biotinic acid in their blood.
Horses are mammals.

Horses have biotinic acid in their blood.

• Inductive inference (probability):

Horses have biotinic acid in their blood.
Horses are mammals.

All mammals have biotinic acid in their blood.

Bayesian inference

• Definition of conditional probability:

  P(A, B) = P(A) P(B|A) = P(B) P(A|B)

• Bayes’ rule:

  P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

• “Posterior probability”: P(h|d)
• “Prior probability”: P(h)
• “Likelihood”: P(d|h)
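For concreteness, here is a minimal Python sketch of Bayes’ rule over a discrete hypothesis space. The hypotheses and probability values are illustrative assumptions (they anticipate the coughing example on the next slide), not numbers from the lecture.

```python
# A minimal sketch of Bayes' rule over a discrete hypothesis space.
def posterior(prior, likelihood):
    """prior: dict h -> P(h); likelihood: dict h -> P(d | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())        # P(d) = sum_i P(h_i) P(d | h_i)
    return {h: p / evidence for h, p in unnormalized.items()}

# Assumed numbers for the "John is coughing" example on the next slide.
prior = {"cold": 0.50, "emphysema": 0.05, "stomach flu": 0.45}
likelihood = {"cold": 0.8, "emphysema": 0.8, "stomach flu": 0.1}   # P(coughing | h)
print(posterior(prior, likelihood))   # "cold" ends up with the highest posterior
```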

Bayesian inference

• Bayes’ rule:

  P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

• An example
  – Data: John is coughing
  – Some hypotheses:
    1. John has a cold
    2. John has emphysema
    3. John has a stomach flu
  – Prior P(h) favors 1 and 3 over 2
  – Likelihood P(d|h) favors 1 and 2 over 3
  – Posterior P(h|d) favors 1 over 2 and 3

Bayesian inference

• Bayes’ rule:

  P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

• What makes a good scientific argument? P(h|d) is high if:
  – Hypothesis is plausible: P(h) is high
  – Hypothesis strongly predicts the observed data: P(d|h) is high
  – Data are surprising: P(d) = Σ_{h_i} P(h_i) P(d|h_i) is low

Coin flipping

HHTHT

HHHHH

What process produced these sequences?

Comparing two simple hypotheses

• Contrast simple hypotheses:
  – H1: “fair coin”, P(H) = 0.5
  – H2: “always heads”, P(H) = 1.0

• Bayes’ rule:

  P(H|D) = P(D|H) P(H) / P(D)

• With two hypotheses, use the odds form:

  P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

Comparing two simple hypotheses

D: HHTHT
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^5     P(H1) = ?
P(D|H2) = 0         P(H2) = 1 − ?

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

Comparing two simple hypotheses

D: HHTHT
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^5     P(H1) = 999/1000
P(D|H2) = 0         P(H2) = 1/1000

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
                  = [(1/32) / 0] × [999 / 1] → infinity (favoring H1)

Comparing two simple hypotheses

D: HHHHH
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^5     P(H1) = 999/1000
P(D|H2) = 1         P(H2) = 1/1000

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
                  = [(1/32) / 1] × [999 / 1] = 999/32 ≈ 30 (still favoring H1)

Comparing two simple hypotheses

D: HHHHHHHHHH
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^10    P(H1) = 999/1000
P(D|H2) = 1         P(H2) = 1/1000

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
                  = [(1/1024) / 1] × [999 / 1] = 999/1024 ≈ 1 (now roughly even)
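The three comparisons above can be reproduced with a few lines of Python; this is a hedged sketch in which the 999/1000 prior is taken from the slides and everything else is straightforward arithmetic.

```python
from math import inf

def posterior_odds(seq, prior_fair=999/1000):
    """Posterior odds P(fair | D) / P(always heads | D) for a sequence of 'H'/'T' flips."""
    like_fair = 0.5 ** len(seq)                       # fair coin: each flip has probability 1/2
    like_heads = 1.0 if set(seq) <= {"H"} else 0.0    # trick coin can only produce heads
    prior_heads = 1 - prior_fair
    if like_heads == 0:
        return inf                                    # a single tail rules out "always heads"
    return (like_fair / like_heads) * (prior_fair / prior_heads)

for d in ["HHTHT", "HHHHH", "HHHHHHHHHH"]:
    print(d, posterior_odds(d))    # infinity, ~31, ~0.98 -- matching the three slides above
```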

The role of intuitive theories

The fact that HHTHT looks representative of a fair coin and HHHHH does not reflects our implicit theories of how the world works.
  – Easy to imagine how a trick all-heads coin could work: high prior probability.
  – Hard to imagine how a trick “HHTHT” coin could work: low prior probability.

Plan for today

• A general framework for solving under-constrained inference problems
  – Bayesian inference

• Applications in perception and cognition
  – lightness perception
  – predicting the future (with Tom Griffiths)
  – learning about properties of natural species (with Charles Kemp)

Gelb / Gilchrist demo

Explaining the illusion

• The problem of lightness constancy
  – Separating the intrinsic reflectance (“color”) of a surface from the intensity of the illumination.

• Anchoring heuristic:
  – Assume that the brightest patch in each scene is white.

• Questions:
  – Is this really right?
  – Why (and when) is it a good solution to the problem of lightness constancy?

Why is lightness constancy hard?

• The physics of light reflection:

L = I x R

L: luminance (light emitted from surface)

I: intensity of illumination in the world

R: reflectance of surface in the world

• The problem: Given L, solve for I and R.

Why is lightness constancy hard?

• The physics of light reflection:

L1 = I x R1

L2 = I x R2

...

Ln = I x Rn

• The problem: Given L1, …, Ln, solve for I and R1, …, Rn.

Why is lightness constancy hard?

Image data: L = {2, 4, 5, 9}

Scene hypotheses (L = I x R):
  I = 10    R = {0.2, 0.4, 0.5, 0.9}
  I = 100   R = {0.02, 0.04, 0.05, 0.09}
  I = 15    R = {0.13, 0.26, 0.33, 0.60}

A simplified theory of the visual world

• Really bright illuminants are rare.

[Plot: prior P(I) decreasing as illumination intensity I increases]

A simplified theory of the visual world

• Really bright illuminants are rare.

• Any surface color is equally likely.

[Plots: P(R_i) uniform from 0 (black) to 1 (white); P(I) decreasing in I]

A simplified theory of the visual world

• Really bright illuminants are rare.

• Observed luminances, L_i = I x R_i, are a random sample from 0 to I.

[Plots: likelihood P(L_i | I) uniform over 0 to I; prior P(I) decreasing in I]

A simplified theory of the visual world

• Really bright illuminants are rare.

• Observed luminances, L_i = I x R_i, are a random sample from 0 to I.

[Plots: the same likelihood P(L_i | I) drawn for a different illuminant level I′; prior P(I) decreasing in I]

Scene hypotheses h (with priors):
  h1: I = 10    P(h1): high
  h2: I = 15    P(h2): med
  h3: I = 100   P(h3): low

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i)

Image data d: L = {9}

Scene hypotheses h (with priors):
  h1: I = 10    P(h1): high
  h2: I = 15    P(h2): med
  h3: I = 100   P(h3): low

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i)

Image data d: L = {2, 4, 5, 9}

Prior probability alone can’t explain how inference changes with more data.

Scene hypotheses h (with priors):
  h1: I = 10    P(h1): high
  h2: I = 15    P(h2): med
  h3: I = 100   P(h3): low

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i),  with likelihood P(d|h) = (1/I)^n

Image data d: L = {9}

Scene hypotheses h (with priors and likelihoods):
  h1: I = 10    P(h1): high   P(d|h1) = 1/10^4
  h2: I = 15    P(h2): med    P(d|h2) = 1/15^4
  h3: I = 100   P(h3): low    P(d|h3) = 1/100^4

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i),  with likelihood P(d|h) = (1/I)^n

Image data d: L = {2, 4, 5, 9}

Graphing the likelihood

[Plot: p(L = l | I) for I = 10 and I = 15, with a single observed luminance l1 = 9]

p({l1} | I=10) ~ p({l1} | I=15)

Graphing the likelihood

[Plot: p(L = l | I) for I = 10 and I = 15, with observed luminances {2, 4, 5, 9}]

p({l1, l2, l3, l4} | I=10) >> p({l1, l2, l3, l4} | I=15)
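To make the size-principle point concrete, here is a rough Python sketch of the posterior over the illuminant for the two data sets above. The candidate illuminants {10, 15, 100} come from the slides; the specific decreasing prior P(I) ∝ 1/I is an assumption made only for illustration.

```python
def illuminant_posterior(luminances, candidates=(10, 15, 100)):
    """Posterior over the illuminant I given observed luminances (toy version of the slides)."""
    n = len(luminances)
    prior = {I: 1.0 / I for I in candidates}    # "really bright illuminants are rare" (1/I assumed)
    like = {I: (1.0 / I) ** n if max(luminances) <= I else 0.0 for I in candidates}
    unnorm = {I: prior[I] * like[I] for I in candidates}
    Z = sum(unnorm.values())
    return {I: p / Z for I, p in unnorm.items()}

print(illuminant_posterior([9]))            # one sample: I = 10 and I = 15 remain comparable
print(illuminant_posterior([2, 4, 5, 9]))   # four samples: I = 10 clearly dominates
```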

Explaining lightness constancy

• Anchoring heuristic: Assume that the brightest patch in each scene is white.
  – Is this really right?
  – Why (and when) is it a good solution to the problem?

• Bayesian analysis
  – Explains the computational basis for inference.
  – Explains why confidence in “brightest = white” increases as more samples are observed.

Applications to cognition

• Predicting the future (with Tom Griffiths)

• Learning about properties of natural species (with Charles Kemp)

Everyday prediction problems

• You read about a movie that has made $60 million to date. How much money will it make in total?

• You see that something has been baking in the oven for 34 minutes. How long until it’s ready?

• You meet someone who is 78 years old. How long will they live?

• Your friend quotes to you from line 17 of his favorite poem. How long is the poem?

• You see taxicab #107 pull up to the curb in front of the train station. How many cabs are in this city?

Making predictions

• You encounter a phenomenon that has existed for t_past units of time. How long will it continue into the future? (i.e., what’s t_total?)

• We could replace “time” with any other variable that ranges from 0 to some unknown upper limit (cf. lightness).

Bayesian inference

P(t_total | t_past) ∝ P(t_past | t_total) × P(t_total)
  posterior probability    likelihood          prior

Bayesian inference

P(t_total | t_past) ∝ P(t_past | t_total) × P(t_total)
  posterior probability    likelihood          prior

P(t_total | t_past) ∝ (1/t_total) × P(t_total)

Assume a random sample: 0 < t_past < t_total, so the likelihood is 1/t_total.

Bayesian inference

P(t_total | t_past) ∝ P(t_past | t_total) × P(t_total)
  posterior probability    likelihood          prior

P(t_total | t_past) ∝ (1/t_total) × (1/t_total)
  – likelihood: random sampling (0 < t_past < t_total)
  – prior: “uninformative”, P(t_total) ∝ 1/t_total

How about the maximal value of P(t_total | t_past)?

Bayesian inference

P(t_total | t_past) ∝ (1/t_total) × (1/t_total)
  (random sampling likelihood, “uninformative” prior)

What is the best guess for t_total?

[Plot: the posterior P(t_total | t_past) peaks at t_total = t_past and falls off for larger t_total]

Bayesian inference

P(t_total | t_past) ∝ (1/t_total) × (1/t_total)
  (random sampling likelihood, “uninformative” prior)

What is the best guess for t_total? Instead of the maximum, compute t such that P(t_total > t | t_past) = 0.5:

[Plot: the posterior P(t_total | t_past) with its median marked]

Bayesian inference

What is the best guess for t_total? Compute t such that P(t_total > t | t_past) = 0.5.

With P(t_total | t_past) ∝ (1/t_total) × (1/t_total) (random sampling likelihood, “uninformative” prior), the posterior is ∝ 1/t_total^2 for t_total ≥ t_past, so P(t_total > t | t_past) = t_past / t.

This yields Gott’s Rule: P(t_total > t | t_past) = 0.5 when t = 2 t_past, i.e., the best guess for t_total is 2 t_past.
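A quick numerical check of Gott’s Rule (a sketch only: an arbitrary grid and upper cutoff stand in for the improper prior):

```python
import numpy as np

def posterior_median(t_past, t_max=1e6, n=2_000_000):
    """Median of the posterior P(t_total | t_past), proportional to 1/t_total^2 on [t_past, t_max]."""
    t = np.linspace(t_past, t_max, n)       # t_total cannot be smaller than t_past
    post = 1.0 / t**2                       # likelihood * prior, unnormalized
    post /= post.sum()
    return t[np.searchsorted(np.cumsum(post), 0.5)]

print(posterior_median(30.0))   # ~60: the median guess is about twice what has been observed
```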

Evaluating Gott’s Rule

• You read about a movie that has made $78 million to date. How much money will it make in total?
  – “$156 million” seems reasonable.

• You meet someone who is 35 years old. How long will they live?
  – “70 years” seems reasonable.

• Not so simple:
  – You meet someone who is 78 years old. How long will they live?
  – You meet someone who is 6 years old. How long will they live?

The effects of priors

• Different kinds of priors P(t_total) are appropriate in different domains.

  Gott: P(t_total) ∝ t_total^(-1)

The effects of priors

• Different kinds of priors P(t_total) are appropriate in different domains.
  – e.g., wealth, contacts
  – e.g., height, lifespan
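As a sketch of how the prior changes the prediction rule (cf. the 78-year-old vs. 6-year-old cases above), the snippet below computes the posterior median under a power-law prior and under a peaked lifespan-style prior. The prior forms and parameters (mean 75, sd 15) are illustrative assumptions, not the fitted priors from the study.

```python
import numpy as np

def median_prediction(t_past, prior_pdf, t_max=10_000.0, n=300_000):
    """Posterior median of t_total given t_past, with likelihood 1/t_total and the given prior."""
    t = np.linspace(t_past, t_max, n)
    post = prior_pdf(t) / t                  # likelihood * prior, unnormalized
    post /= post.sum()
    return t[np.searchsorted(np.cumsum(post), 0.5)]

power_law = lambda t: 1.0 / t                                    # scale-free prior (assumed form)
lifespan = lambda t: np.exp(-0.5 * ((t - 75.0) / 15.0) ** 2)     # peaked prior (assumed mean 75, sd 15)

for age in (6, 35, 78):
    p_power = median_prediction(age, power_law)
    p_life = median_prediction(age, lifespan)
    print(f"{age}: power-law -> {p_power:.0f}, lifespan prior -> {p_life:.0f}")
# The power-law prior roughly doubles what you have seen (about 12, 70, 155), while the
# peaked prior predicts roughly 75 for the 6- and 35-year-olds and the mid-80s for the 78-year-old.
```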

Evaluating human predictions

• Different domains with different priors:
  – A movie has made $60 million
  – Your friend quotes from line 17 of a poem
  – You meet a 78 year old man
  – A movie has been running for 55 minutes
  – A U.S. congressman has served for 11 years
  – A cake has been in the oven for 34 minutes

• Use 5 values of t_past for each.

• People predict t_total.

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

How long did the typical pharaoh reign in ancient Egypt?

Assumptions guiding inference

• Random sampling

• Strong prior knowledge
  – Form of the prior (power-law or exponential)
  – Specific distribution given that form (parameters)
  – Non-parametric distribution when necessary

• With these assumptions, strong predictions can be made from a single observation.

Applications to cognition

• Predicting the future (with Tom Griffiths)

• Learning about properties of natural species (with Charles Kemp)

Which argument is stronger?

Cows have biotinic acid in their blood

Horses have biotinic acid in their blood

Rhinos have biotinic acid in their blood

All mammals have biotinic acid in their blood

Cows have biotinic acid in their blood

Dolphins have biotinic acid in their blood

Squirrels have biotinic acid in their blood

All mammals have biotinic acid in their blood

“Diversity phenomenon”

Osherson, Smith, Wilkie, Lopez, Shafir (1990):

• 20 subjects rated the strength of 45 arguments:

X1 have property P.

X2 have property P.

X3 have property P.

All mammals have property P.

• 40 different subjects rated the similarity of all pairs of 10 mammals.

Traditional psychological models

Osherson et al. consider two similarity-based models:

• Sum-Similarity:

  P(all mammals | X) ∝ Σ_{i ∈ mammals} Σ_{j ∈ X} sim(i, j)

• Max-Similarity:

  P(all mammals | X) ∝ Σ_{i ∈ mammals} max_{j ∈ X} sim(i, j)
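A small Python sketch of the two models; the similarity matrix is a made-up stand-in for the pairwise similarity ratings Osherson et al. collected.

```python
def sum_sim(premises, categories, sim):
    """Sum-Similarity: total similarity of every conclusion category to every premise."""
    return sum(sim[(i, j)] for i in categories for j in premises)

def max_sim(premises, categories, sim):
    """Max-Similarity: for each conclusion category, its similarity to the closest premise."""
    return sum(max(sim[(i, j)] for j in premises) for i in categories)

mammals = ["cow", "horse", "rhino", "dolphin", "squirrel"]
sim = {(i, j): (1.0 if i == j else 0.2) for i in mammals for j in mammals}   # invented similarities
sim[("horse", "cow")] = sim[("cow", "horse")] = 0.9                          # similar large herbivores

diverse = ["cow", "dolphin", "squirrel"]     # diverse premise set
narrow = ["cow", "horse", "rhino"]           # similar premise set
print(max_sim(diverse, mammals, sim), max_sim(narrow, mammals, sim))   # 4.1 > 3.4
# Max-sim gives the diverse set higher strength, in line with the diversity phenomenon.
```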

Data vs. models

[Scatter plots: human argument-strength ratings vs. model predictions. Each point represents one argument of the form:]

X1 have property P.
X2 have property P.
X3 have property P.

All mammals have property P.

Open questions

• Explaining similarity:
  – Why does Max-sim fit so well? When does it do worse?
  – Why does Sum-sim fit so poorly? When does it do better?

• Explaining Max-sim:
  – Is there some rational computation that Max-sim implements or approximates?
  – What theory about this task and domain is implicit in Max-sim?

(cf. the analysis of lightness constancy)

A simplified theory of biology

• Species generated by an evolutionary branching process.
  – A tree-structured taxonomy of species.

• Taxonomy also central in folkbiology (Atran).

Theory-based Bayesian model

Begin by reconstructing the intuitive taxonomy from similarity judgments (hierarchical clustering over chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal).

Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of the novel property (h1, h3, h6, h17, …; h0 = “all mammals”).

Theory-based Bayesian model

[Taxonomic tree over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal]

p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)

p(X | h) = 1 / size(h)^n   if x_1, …, x_n ∈ h
         = 0               if any x_i ∉ h

h0: “all mammals”
p(h): uniform
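Here is a toy Python sketch of this model. Only the size-principle likelihood and uniform prior follow the slide; the particular clusters in the hypothesis space are assumed for illustration rather than derived from the actual similarity data.

```python
all_mammals = {"cow", "horse", "rhino", "elephant", "chimp", "gorilla",
               "mouse", "squirrel", "dolphin", "seal"}
hypotheses = [
    {"cow", "horse", "rhino", "elephant"},        # "large herbivores" (assumed cluster)
    {"chimp", "gorilla"},
    {"dolphin", "seal"},
    {"mouse", "squirrel"},
    all_mammals,                                   # h0
]

def p_all_mammals(premises, hypotheses, all_mammals):
    n = len(premises)
    def like(h):                                   # size principle: p(X | h) = 1 / size(h)^n
        return (1.0 / len(h)) ** n if premises <= h else 0.0
    prior = 1.0 / len(hypotheses)                  # uniform p(h)
    Z = sum(like(h) * prior for h in hypotheses)
    return like(all_mammals) * prior / Z

print(p_all_mammals({"cow", "horse", "rhino"}, hypotheses, all_mammals))      # weak: a small cluster also covers X
print(p_all_mammals({"cow", "dolphin", "squirrel"}, hypotheses, all_mammals)) # strong: only "all mammals" covers X
```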

How taxonomy constrains induction

• Atran (1998): “Fundamental principle of systematic induction” (Warburton 1967, Bock 1973)
  – Given a property found among members of any two species, the best initial hypothesis is that the property is also present among all species that are included in the smallest higher-order taxon containing the original pair of species.

[Taxonomic tree over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal, with the “all mammals” cluster highlighted]

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Strong (0.76 [max = 0.82])

[Same tree, with the “large herbivores” cluster highlighted]

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Strong: 0.76 [max = 0.82]

Cows have property P.
Horses have property P.
Rhinos have property P.

All mammals have property P.

Weak: 0.17 [min = 0.14]

[Same tree, with the “all mammals” cluster highlighted]

Seals have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Weak: 0.30 [min = 0.14]

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Strong: 0.76 [max = 0.82]

[Scatter plots: human judgments vs. Bayes (taxonomic), Max-sim, and Sum-sim, for three argument sets — conclusion kind “all mammals” (3 examples), “horses” (2 examples), and “horses” (1, 2, or 3 examples)]

[Detail: the “all mammals”, 3-example arguments, e.g. “Seals / Dolphins / Squirrels have property P, therefore all mammals have property P” and “Cows / Dolphins / Squirrels have property P, therefore all mammals have property P”, compared across Bayes (taxonomic), Max-sim, and Sum-sim]

A simplified theory of biology

• Species generated by an evolutionary branching process.
  – A tree-structured taxonomy of species.

• Features generated by a stochastic mutation process and passed on to descendants.
  – Novel features can appear anywhere in the tree, but some distributions are more likely than others.

Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of a novel feature (h1, h3, h6, h17, …; h0 = “all mammals”).

[Taxonomic tree over chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal]

Theory-based Bayesian model

Generate hypotheses for the novel feature F via a (Poisson arrival) mutation process over branches b:

  p(F develops along b) = 1 − e^(−b)

[Figures, repeated across several slides: the taxonomy over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal, with mutations arising on different branches; each mutation pattern yields a different candidate extension for F]

Samples from the prior

• Labelings that cut the data along longer branches are more probable:

[Figure: two sampled labelings of the taxonomy (chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal); the labeling whose feature arises on a longer branch has higher prior probability]

Samples from the prior

• Labelings that cut the data along fewer branches are more probable:

[Figure: a “monophyletic” labeling (one branch cut) is more probable than a “polyphyletic” labeling (several branches cut)]
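A hedged sketch of this prior: sample which branches of a toy tree a feature arises on, using the 1 − e^(−b) arrival rule sketched above, and count how often different labelings come out. The tree and branch lengths are invented for the example.

```python
import random
from math import exp
from collections import Counter

# Toy tree: each branch is (length, species below that branch). Lengths are made up.
branches = [
    (2.0, {"horse", "cow"}),        # long branch above the two herbivores
    (0.5, {"horse"}),
    (0.5, {"cow"}),
    (2.0, {"dolphin", "seal"}),     # long branch above the two aquatic mammals
    (0.5, {"dolphin"}),
    (0.5, {"seal"}),
]

def sample_extension(rng=random):
    """F spreads to every species below each branch on which it happens to arise."""
    extension = set()
    for length, species in branches:
        if rng.random() < 1 - exp(-length):      # p(F develops along b) = 1 - e^(-b)
            extension |= species
    return frozenset(extension)

counts = Counter(sample_extension() for _ in range(100_000))
print(counts[frozenset({"horse", "cow"})])       # common: one long branch explains it
print(counts[frozenset({"horse", "dolphin"})])   # rare: needs two short, scattered branches
```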

[Taxonomic tree over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal]

p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)

p(X | h) = 1 / size(h)^n   if x_1, …, x_n ∈ h
         = 0               if any x_i ∉ h

h0: “all mammals”
p(h): “evolutionary” process (mutation + inheritance)

[Scatter plots: human judgments vs. Bayes (taxonomic) and vs. Bayes (taxonomy + mutation), alongside Max-sim and Sum-sim, for conclusion kinds “all mammals” (3 examples), “horses” (2 examples), and “horses” (1, 2, or 3 examples)]

Explaining similarity

• Why does Max-sim fit so well?
  – It is an efficient and accurate approximation to the Bayesian (evolutionary) model.

Correlation with Bayes on three-premise general arguments, over 100 simulated tree structures: mean r = 0.94. There’s also a theorem.

Biology: Summary

• Theory-based statistical inference explains inductive reasoning in folk biology.

• Mathematical modeling reveals people’s implicit theories about the world.
  – Category structure: taxonomic tree.
  – Feature distribution: stochastic mutation process + inheritance.

• Clarifies traditional psychological models.
  – Why Max-sim over Sum-sim?

Beyond taxonomic similarity

• Generalization based on known dimensions (Smith et al., 1993; Blok et al., 2002):

Poodles can bite through wire.

German shepherds can bite through wire.

Dobermans can bite through wire.

German shepherds can bite through wire.

• Generalization based on causal relations (Medin et al., 2004; Shafto & Coley, 2003):

Salmon carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.

Salmon carry E. Spirus bacteria.

Predicate type                   Generative theory
“has T4 hormones”                taxonomic tree + mutation
“can bite through wire”          directed chain + unknown threshold
“carries E. Spirus bacteria”     directed network + noisy transmission

[Diagrams (“Hypotheses”): the hypothesis spaces each generative theory defines over Classes A–G]

Island ecosystem (Shafto, Kemp, Baraff, Coley, Tenenbaum)

Species: Kelp, Human, Dolphin, Sand shark, Mako shark, Tuna, Herring

[Diagrams: the same species arranged as a taxonomy and as a food web]

Datasets vs. models — correlation r with human judgments:

                                       Bayes        Bayes      Max-sim
                                       (food web)   (tree)
  Mammal ecosystem: disease             0.75        -0.15       0.07
  Mammal ecosystem: genetic property    0.25         0.92       0.87
  Island ecosystem: disease             0.79         0.01       0.17
  Island ecosystem: genetic property    0.31         0.89       0.86

Assumptions guiding inferences

• Qualitatively different priors are appropriate for different domains of inductive generalization.

• In each domain, a prior that matches the world’s structure fits people’s inductive judgments better than alternative priors.

• A common framework for representing people’s domain models: a graph structure defined over entities or classes, and a probability distribution for predicates over that graph.
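As an illustration of the last point, here is a rough sketch in the spirit of the food-web (“noisy transmission”) theory: a graph over species plus a stochastic process for a disease-like predicate. The web, base rate, and transmission probability are all invented for the example.

```python
import random

food_web = {                         # predator -> the prey it eats (invented toy web)
    "herring": [],
    "tuna": ["herring"],
    "mako shark": ["tuna", "herring"],
    "dolphin": ["tuna", "herring"],
    "human": ["tuna", "mako shark"],
}

def sample_carriers(base_rate=0.1, transmit=0.5, rng=random):
    """Each species may get the disease on its own; it may also catch it from infected prey."""
    carriers = {s for s in food_web if rng.random() < base_rate}
    # Decide once per predator-prey link whether transmission would succeed along it.
    active = {(pred, prey) for pred in food_web for prey in food_web[pred]
              if rng.random() < transmit}
    changed = True
    while changed:                    # propagate up the web until nothing new gets infected
        changed = False
        for pred, prey in active:
            if prey in carriers and pred not in carriers:
                carriers.add(pred)
                changed = True
    return carriers

samples = [sample_carriers() for _ in range(50_000)]
given_herring = [s for s in samples if "herring" in s]
given_human = [s for s in samples if "human" in s]
print(sum("tuna" in s for s in given_herring) / len(given_herring))   # higher: tuna eat herring
print(sum("tuna" in s for s in given_human) / len(given_human))       # lower: runs against the causal arrow
# A taxonomic prior would treat these two generalizations symmetrically; the food-web prior does not.
```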

Conclusion

• The hard problem of intelligence: how do we “go beyond the information given”?

• The solution:
  – Bayesian statistical inference:

    P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

  – Implicit theories about the structure of the world, generating P(h) and P(d | h).

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Discussion

• How is this intuitive theory of biology like or not like a scientific theory?

• In what sense does the visual system have a theory of the world? How is it like or not like a cognitive theory of biology, or a scientific theory?