9.94 The cognitive science of intuitive theories J. Tenenbaum, T. Lombrozo, L. Schulz, R. Saxe.


9.94 The cognitive science of intuitive theories

J. Tenenbaum, T. Lombrozo, L. Schulz, R. Saxe

Plan for the class

• Goal:
  – Introduce you to an exciting field of research unfolding at MIT and in the broader world.

Plan for the class

• An integrated series of talks and discussions

– Today: Josh Tenenbaum (MIT), “Computational models of theory-based inductive inference”

– Tuesday: Tania Lombrozo (Harvard/Berkeley) “Explanation in intuitive theories”

– Wednesday: Rebecca Saxe (Harvard/MIT), “Understanding other minds”

– Thursday: Laura Schulz (MIT), “Theories and evidence”

– Friday: Special Mystery Guest, “When theories fail”

Plan for the class

• Requirements for credit (pass/fail, 3 units)
  – Attend the classes
  – Participate in discussions
  – Take-home quiz:

• Emailed to you this weekend (after the class)

• Due back to me by email on Wednesday, Feb. 1

If you are not registered on the list below, make sure to register and send me an email message at: jbt@mit.edu

Class list

Azeez, Zainab O
Belova, Nadezhda
Brenman, Stephanie V
Cherian, Tharian S
Clark, Abigail M
Clark, Samuel D
Curry, Justin M
Dai, Jizi
Dean, Clare
Ferranti, Darlene E
Frazier, Jonathan J
Gordon, Matthew A
Green, Delbert A
Huhn, Anika M
Hunt, Beatrice P.
Kamrowski, Kaitlin M
Kanaga, Noelle J
Kwon, Jennifer
Liu, Yicong
McGuire, Salafa'anius
Ovadya, Aviv
Poon, Samuel H
Pradhan, Nikhil T
Ren, Danan T
Rotter, Juliana C
Slagle, Amy M
Tang, Di
Taub, Daniel M
Tung, Roland
Voelbel, Kathleen
Vosoughi, Soroush
Willmer, Anjuli J
Ye, Diana F
Yuen, Grace J
Zhao, Bo

Scheduling

• Today:
  – Go to 4:30 (with a break in the middle)?

• Friday:
  – An hour earlier: 1:00 – 3:00?

The big problem of intelligence

How does the mind get so much out of so little?

The big problem of intelligence

How does the mind get so much out of so little?

[Figure panels: “Three-dimensional:” / “Two-dimensional:”]

The big problem of intelligence

How can we generalize new concepts reliably from just one or a few examples?
  – Learning word meanings

[Figure: three pictures, each labeled “horse”]

[Figure: “The objects of planet Gazoob” — three of the objects labeled “tufa”]

The big problem of intelligence

How can we generalize new concepts reliably from just one or a few examples? – Learning about new properties of categories

Cows have T4 hormones.
Bees have T4 hormones.
Salmon have T4 hormones.

Humans have T4 hormones.

Cows have T4 hormones.
Goats have T4 hormones.
Sheep have T4 hormones.

Humans have T4 hormones.

The big problem of intelligence

How do we use concepts in ways that go beyond our experience?

• “dog”

• Is it still a dog if you…– Put a penguin costume on it?

– Surgically alter it until it looks just like a penguin?

– Pre-natally inject a substance that causes it to look just like a penguin? … and it can mate with penguins and produce penguin offspring?

The big problem of intelligence

How do we use concepts in ways that go beyond our experience?

• Two cars were reported stolen by the Groveton police yesterday.

• The judge sentenced the killer to die in the electric chair for the second time.

• No one was injured in the blast, which was attributed to a buildup of gas by one town official.

• One witness told the commissioners that she had seen sexual intercourse taking place between two parked cars in front of her house.

The big problem of intelligence

How do we use concepts in ways that go beyond our experience?

Consider a man named Boris. – Is the mother of Boris’s father his grandmother?

– Is the mother of Boris’s sister his mother?

– Is the son of Boris’s sister his son?

(Note: Boris and his family were stranded on a desert island when he was a young boy.)

What makes us so smart?

• Memory?

• Logical inference?

What makes us so smart?

• Memory? No.
  – The difference between a test that you can pass on rote memory and a test that shows whether you “actually learned something”.

• Logical inference? No.
  – The difference between deductive inference and inductive inference.

Modes of inference

• Deductive inference:

All mammals have biotinic acid in their blood.
Horses are mammals.

Horses have biotinic acid in their blood.

• Inductive inference:

Horses have biotinic acid in their blood.
Horses are mammals.

All mammals have biotinic acid in their blood.

What makes us so smart?

• Intuitive theories
  – Systems of concepts that are in some important respects like scientific theories.
  – Abstract knowledge that supports prediction, explanation, exploration, and decision-making for an infinite range of situations that we have not previously encountered.

Some questions about intuitive theories

• What is their content?
• How are they represented in the mind or brain?
• How are they used to generalize to new situations?
• How are they acquired?

Some questions about intuitive theories

• What is their content?

• How are they represented in the mind or brain?

• How are they used to generalize to new situations?

• How are they acquired?

• Can they be described in computational terms?

• In what essential ways are they similar to or different from scientific theories?

• How good (accurate, comprehensive, rich) are they, under what circumstances? What can we learn from their failures?

What can we learn from perceptual or cognitive illusions?

• Goal of visual perception is to recover world structure from visual images.

• Why the problem is hard: many world structures can produce the same visual input.

[Diagram: several scene hypotheses, all consistent with the same image data]

What can we learn from perceptual or cognitive illusions?

• Goal of visual perception is to recover world structure from visual images.

• Why the problem is hard: many world structures can produce the same visual input.

• Illusions reveal the visual system’s implicit theories of the physical world and the process of image formation.

Computational models of theory-based inductive inference

Josh Tenenbaum
Department of Brain and Cognitive Sciences

Computer Science and Artificial Intelligence Laboratory

MIT

Plan for today

• A general framework for solving under-constrained inference problems
  – Bayesian inference

• Applications in perception and cognition
  – lightness perception
  – predicting the future (with Tom Griffiths)
  – learning about properties of natural species (with Charles Kemp)

Modes of inference

• Deductive inference (logic):

All mammals have biotinic acid in their blood.
Horses are mammals.

Horses have biotinic acid in their blood.

• Inductive inference (probability):

Horses have biotinic acid in their blood.
Horses are mammals.

All mammals have biotinic acid in their blood.

Bayesian inference

• Definition of conditional probability:

  P(A, B) = P(A) P(B|A) = P(B) P(A|B)

• Bayes’ rule:

  P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

• “Posterior probability”: P(h|d)
• “Prior probability”: P(h)
• “Likelihood”: P(d|h)
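For concreteness, here is a minimal Python sketch of Bayes’ rule over a discrete hypothesis space. The hypotheses and probability values are illustrative assumptions (they anticipate the coughing example on the next slide), not numbers from the lecture.

```python
# A minimal sketch of Bayes' rule over a discrete hypothesis space.
def posterior(prior, likelihood):
    """prior: dict h -> P(h); likelihood: dict h -> P(d | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())        # P(d) = sum_i P(h_i) P(d | h_i)
    return {h: p / evidence for h, p in unnormalized.items()}

# Assumed numbers for the "John is coughing" example on the next slide.
prior = {"cold": 0.50, "emphysema": 0.05, "stomach flu": 0.45}
likelihood = {"cold": 0.8, "emphysema": 0.8, "stomach flu": 0.1}   # P(coughing | h)
print(posterior(prior, likelihood))   # "cold" ends up with the highest posterior
```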

Bayesian inference

• Bayes’ rule:

  P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

• An example
  – Data: John is coughing
  – Some hypotheses:
    1. John has a cold
    2. John has emphysema
    3. John has a stomach flu
  – Prior P(h) favors 1 and 3 over 2
  – Likelihood P(d|h) favors 1 and 2 over 3
  – Posterior P(h|d) favors 1 over 2 and 3

Bayesian inference

• Bayes’ rule:

  P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

• What makes a good scientific argument? P(h|d) is high if:
  – Hypothesis is plausible: P(h) is high
  – Hypothesis strongly predicts the observed data: P(d|h) is high
  – Data are surprising: P(d) = Σ_{h_i} P(h_i) P(d|h_i) is low

Coin flipping

HHTHT

HHHHH

What process produced these sequences?

Comparing two simple hypotheses

• Contrast simple hypotheses:
  – H1: “fair coin”, P(H) = 0.5
  – H2: “always heads”, P(H) = 1.0

• Bayes’ rule:

  P(H|D) = P(D|H) P(H) / P(D)

• With two hypotheses, use the odds form:

  P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

Comparing two simple hypotheses

D: HHTHT
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^5     P(H1) = ?
P(D|H2) = 0         P(H2) = 1 − ?

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

Comparing two simple hypotheses

D: HHTHT
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^5     P(H1) = 999/1000
P(D|H2) = 0         P(H2) = 1/1000

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
                  = [(1/32) / 0] × [999 / 1] → infinity (favoring H1)

Comparing two simple hypotheses

D: HHHHH
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^5     P(H1) = 999/1000
P(D|H2) = 1         P(H2) = 1/1000

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
                  = [(1/32) / 1] × [999 / 1] = 999/32 ≈ 30 (still favoring H1)

Comparing two simple hypotheses

D: HHHHHHHHHH
H1, H2: “fair coin”, “always heads”

P(D|H1) = 1/2^10    P(H1) = 999/1000
P(D|H2) = 1         P(H2) = 1/1000

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
                  = [(1/1024) / 1] × [999 / 1] = 999/1024 ≈ 1 (now roughly even)
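The three comparisons above can be reproduced with a few lines of Python; this is a hedged sketch in which the 999/1000 prior is taken from the slides and everything else is straightforward arithmetic.

```python
from math import inf

def posterior_odds(seq, prior_fair=999/1000):
    """Posterior odds P(fair | D) / P(always heads | D) for a sequence of 'H'/'T' flips."""
    like_fair = 0.5 ** len(seq)                       # fair coin: each flip has probability 1/2
    like_heads = 1.0 if set(seq) <= {"H"} else 0.0    # trick coin can only produce heads
    prior_heads = 1 - prior_fair
    if like_heads == 0:
        return inf                                    # a single tail rules out "always heads"
    return (like_fair / like_heads) * (prior_fair / prior_heads)

for d in ["HHTHT", "HHHHH", "HHHHHHHHHH"]:
    print(d, posterior_odds(d))    # infinity, ~31, ~0.98 -- matching the three slides above
```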

The role of intuitive theories

The fact that HHTHT looks representative of a fair coin and HHHHH does not reflects our implicit theories of how the world works.
  – Easy to imagine how a trick all-heads coin could work: high prior probability.
  – Hard to imagine how a trick “HHTHT” coin could work: low prior probability.

Plan for today

• A general framework for solving under-constrained inference problems
  – Bayesian inference

• Applications in perception and cognition
  – lightness perception
  – predicting the future (with Tom Griffiths)
  – learning about properties of natural species (with Charles Kemp)

Gelb / Gilchrist demo

Explaining the illusion

• The problem of lightness constancy
  – Separating the intrinsic reflectance (“color”) of a surface from the intensity of the illumination.

• Anchoring heuristic:
  – Assume that the brightest patch in each scene is white.

• Questions:
  – Is this really right?
  – Why (and when) is it a good solution to the problem of lightness constancy?

Why is lightness constancy hard?

• The physics of light reflection:

L = I x R

L: luminance (light emitted from surface)

I: intensity of illumination in the world

R: reflectance of surface in the world

• The problem: Given L, solve for I and R.

Why is lightness constancy hard?

• The physics of light reflection:

L1 = I x R1

L2 = I x R2

...

Ln = I x Rn

• The problem: Given L1, …, Ln, solve for I and R1, …, Rn.

Why is lightness constancy hard?

Image data: L = {2, 4, 5, 9}

Scene hypotheses (L = I x R):
  I = 10    R = {0.2, 0.4, 0.5, 0.9}
  I = 100   R = {0.02, 0.04, 0.05, 0.09}
  I = 15    R = {0.13, 0.26, 0.33, 0.60}

A simplified theory of the visual world

• Really bright illuminants are rare.

[Plot: prior P(I) decreasing as illumination intensity I increases]

A simplified theory of the visual world

• Really bright illuminants are rare.

• Any surface color is equally likely.

[Plots: P(R_i) uniform from 0 (black) to 1 (white); P(I) decreasing in I]

A simplified theory of the visual world

• Really bright illuminants are rare.

• Observed luminances, L_i = I x R_i, are a random sample from 0 to I.

[Plots: likelihood P(L_i | I) uniform over 0 to I; prior P(I) decreasing in I]

A simplified theory of the visual world

• Really bright illuminants are rare.

• Observed luminances, L_i = I x R_i, are a random sample from 0 to I.

[Plots: the same likelihood P(L_i | I) drawn for a different illuminant level I′; prior P(I) decreasing in I]

Scene hypotheses h (with priors):
  h1: I = 10    P(h1): high
  h2: I = 15    P(h2): med
  h3: I = 100   P(h3): low

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i)

Image data d: L = {9}

Scene hypotheses h (with priors):
  h1: I = 10    P(h1): high
  h2: I = 15    P(h2): med
  h3: I = 100   P(h3): low

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i)

Image data d: L = {2, 4, 5, 9}

Prior probability alone can’t explain how inference changes with more data.

Scene hypotheses h (with priors):
  h1: I = 10    P(h1): high
  h2: I = 15    P(h2): med
  h3: I = 100   P(h3): low

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i),  with likelihood P(d|h) = (1/I)^n

Image data d: L = {9}

Scene hypotheses h (with priors and likelihoods):
  h1: I = 10    P(h1): high   P(d|h1) = 1/10^4
  h2: I = 15    P(h2): med    P(d|h2) = 1/15^4
  h3: I = 100   P(h3): low    P(d|h3) = 1/100^4

P(h|d) = P(d|h) P(h) / Σ_{h_i} P(d|h_i) P(h_i),  with likelihood P(d|h) = (1/I)^n

Image data d: L = {2, 4, 5, 9}

Graphing the likelihood

[Plot: p(L = l | I) for I = 10 and I = 15, with a single observed luminance l1 = 9]

p({l1} | I=10) ~ p({l1} | I=15)

Graphing the likelihood

[Plot: p(L = l | I) for I = 10 and I = 15, with observed luminances {2, 4, 5, 9}]

p({l1, l2, l3, l4} | I=10) >> p({l1, l2, l3, l4} | I=15)
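To make the size-principle point concrete, here is a rough Python sketch of the posterior over the illuminant for the two data sets above. The candidate illuminants {10, 15, 100} come from the slides; the specific decreasing prior P(I) ∝ 1/I is an assumption made only for illustration.

```python
def illuminant_posterior(luminances, candidates=(10, 15, 100)):
    """Posterior over the illuminant I given observed luminances (toy version of the slides)."""
    n = len(luminances)
    prior = {I: 1.0 / I for I in candidates}    # "really bright illuminants are rare" (1/I assumed)
    like = {I: (1.0 / I) ** n if max(luminances) <= I else 0.0 for I in candidates}
    unnorm = {I: prior[I] * like[I] for I in candidates}
    Z = sum(unnorm.values())
    return {I: p / Z for I, p in unnorm.items()}

print(illuminant_posterior([9]))            # one sample: I = 10 and I = 15 remain comparable
print(illuminant_posterior([2, 4, 5, 9]))   # four samples: I = 10 clearly dominates
```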

Explaining lightness constancy

• Anchoring heuristic: Assume that the brightest patch in each scene is white.
  – Is this really right?
  – Why (and when) is it a good solution to the problem?

• Bayesian analysis
  – Explains the computational basis for inference.
  – Explains why confidence in “brightest = white” increases as more samples are observed.

Applications to cognition

• Predicting the future (with Tom Griffiths)

• Learning about properties of natural species (with Charles Kemp)

Everyday prediction problems

• You read about a movie that has made $60 million to date. How much money will it make in total?

• You see that something has been baking in the oven for 34 minutes. How long until it’s ready?

• You meet someone who is 78 years old. How long will they live?

• Your friend quotes to you from line 17 of his favorite poem. How long is the poem?

• You see taxicab #107 pull up to the curb in front of the train station. How many cabs are in this city?

Making predictions

• You encounter a phenomenon that has existed for t_past units of time. How long will it continue into the future? (i.e., what’s t_total?)

• We could replace “time” with any other variable that ranges from 0 to some unknown upper limit (cf. lightness).

Bayesian inference

P(t_total | t_past) ∝ P(t_past | t_total) × P(t_total)
  posterior probability    likelihood          prior

Bayesian inference

P(t_total | t_past) ∝ P(t_past | t_total) × P(t_total)
  posterior probability    likelihood          prior

P(t_total | t_past) ∝ (1/t_total) × P(t_total)

Assume a random sample: 0 < t_past < t_total, so the likelihood is 1/t_total.

Bayesian inference

P(t_total | t_past) ∝ P(t_past | t_total) × P(t_total)
  posterior probability    likelihood          prior

P(t_total | t_past) ∝ (1/t_total) × (1/t_total)
  – likelihood: random sampling (0 < t_past < t_total)
  – prior: “uninformative”, P(t_total) ∝ 1/t_total

How about the maximal value of P(t_total | t_past)?

Bayesian inference

P(t_total | t_past) ∝ (1/t_total) × (1/t_total)
  (random sampling likelihood, “uninformative” prior)

What is the best guess for t_total?

[Plot: the posterior P(t_total | t_past) peaks at t_total = t_past and falls off for larger t_total]

Bayesian inference

P(t_total | t_past) ∝ (1/t_total) × (1/t_total)
  (random sampling likelihood, “uninformative” prior)

What is the best guess for t_total? Instead of the maximum, compute t such that P(t_total > t | t_past) = 0.5:

[Plot: the posterior P(t_total | t_past) with its median marked]

Bayesian inference

What is the best guess for t_total? Compute t such that P(t_total > t | t_past) = 0.5.

With P(t_total | t_past) ∝ (1/t_total) × (1/t_total) (random sampling likelihood, “uninformative” prior), the posterior is ∝ 1/t_total^2 for t_total ≥ t_past, so P(t_total > t | t_past) = t_past / t.

This yields Gott’s Rule: P(t_total > t | t_past) = 0.5 when t = 2 t_past, i.e., the best guess for t_total is 2 t_past.
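A quick numerical check of Gott’s Rule (a sketch only: an arbitrary grid and upper cutoff stand in for the improper prior):

```python
import numpy as np

def posterior_median(t_past, t_max=1e6, n=2_000_000):
    """Median of the posterior P(t_total | t_past), proportional to 1/t_total^2 on [t_past, t_max]."""
    t = np.linspace(t_past, t_max, n)       # t_total cannot be smaller than t_past
    post = 1.0 / t**2                       # likelihood * prior, unnormalized
    post /= post.sum()
    return t[np.searchsorted(np.cumsum(post), 0.5)]

print(posterior_median(30.0))   # ~60: the median guess is about twice what has been observed
```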

Evaluating Gott’s Rule

• You read about a movie that has made $78 million to date. How much money will it make in total?
  – “$156 million” seems reasonable.

• You meet someone who is 35 years old. How long will they live?
  – “70 years” seems reasonable.

• Not so simple:
  – You meet someone who is 78 years old. How long will they live?
  – You meet someone who is 6 years old. How long will they live?

The effects of priors

• Different kinds of priors P(t_total) are appropriate in different domains.

  Gott: P(t_total) ∝ t_total^(-1)

The effects of priors

• Different kinds of priors P(t_total) are appropriate in different domains.
  – e.g., wealth, contacts
  – e.g., height, lifespan
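As a sketch of how the prior changes the prediction rule (cf. the 78-year-old vs. 6-year-old cases above), the snippet below computes the posterior median under a power-law prior and under a peaked lifespan-style prior. The prior forms and parameters (mean 75, sd 15) are illustrative assumptions, not the fitted priors from the study.

```python
import numpy as np

def median_prediction(t_past, prior_pdf, t_max=10_000.0, n=300_000):
    """Posterior median of t_total given t_past, with likelihood 1/t_total and the given prior."""
    t = np.linspace(t_past, t_max, n)
    post = prior_pdf(t) / t                  # likelihood * prior, unnormalized
    post /= post.sum()
    return t[np.searchsorted(np.cumsum(post), 0.5)]

power_law = lambda t: 1.0 / t                                    # scale-free prior (assumed form)
lifespan = lambda t: np.exp(-0.5 * ((t - 75.0) / 15.0) ** 2)     # peaked prior (assumed mean 75, sd 15)

for age in (6, 35, 78):
    p_power = median_prediction(age, power_law)
    p_life = median_prediction(age, lifespan)
    print(f"{age}: power-law -> {p_power:.0f}, lifespan prior -> {p_life:.0f}")
# The power-law prior roughly doubles what you have seen (about 12, 70, 155), while the
# peaked prior predicts roughly 75 for the 6- and 35-year-olds and the mid-80s for the 78-year-old.
```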

Evaluating human predictions

• Different domains with different priors:
  – A movie has made $60 million
  – Your friend quotes from line 17 of a poem
  – You meet a 78 year old man
  – A movie has been running for 55 minutes
  – A U.S. congressman has served for 11 years
  – A cake has been in the oven for 34 minutes

• Use 5 values of t_past for each.

• People predict t_total.

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

How long did the typical pharaoh reign in ancient Egypt?

Assumptions guiding inference

• Random sampling

• Strong prior knowledge
  – Form of the prior (power-law or exponential)
  – Specific distribution given that form (parameters)
  – Non-parametric distribution when necessary

• With these assumptions, strong predictions can be made from a single observation.

Applications to cognition

• Predicting the future (with Tom Griffiths)

• Learning about properties of natural species (with Charles Kemp)

Which argument is stronger?

Cows have biotinic acid in their blood

Horses have biotinic acid in their blood

Rhinos have biotinic acid in their blood

All mammals have biotinic acid in their blood

Cows have biotinic acid in their blood

Dolphins have biotinic acid in their blood

Squirrels have biotinic acid in their blood

All mammals have biotinic acid in their blood

“Diversity phenomenon”

Osherson, Smith, Wilkie, Lopez, Shafir (1990):

• 20 subjects rated the strength of 45 arguments:

X1 have property P.

X2 have property P.

X3 have property P.

All mammals have property P.

• 40 different subjects rated the similarity of all pairs of 10 mammals.

Traditional psychological models

Osherson et al. consider two similarity-based models:

• Sum-Similarity:

  P(all mammals | X) ∝ Σ_{i ∈ mammals} Σ_{j ∈ X} sim(i, j)

• Max-Similarity:

  P(all mammals | X) ∝ Σ_{i ∈ mammals} max_{j ∈ X} sim(i, j)
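A small Python sketch of the two models; the similarity matrix is a made-up stand-in for the pairwise similarity ratings Osherson et al. collected.

```python
def sum_sim(premises, categories, sim):
    """Sum-Similarity: total similarity of every conclusion category to every premise."""
    return sum(sim[(i, j)] for i in categories for j in premises)

def max_sim(premises, categories, sim):
    """Max-Similarity: for each conclusion category, its similarity to the closest premise."""
    return sum(max(sim[(i, j)] for j in premises) for i in categories)

mammals = ["cow", "horse", "rhino", "dolphin", "squirrel"]
sim = {(i, j): (1.0 if i == j else 0.2) for i in mammals for j in mammals}   # invented similarities
sim[("horse", "cow")] = sim[("cow", "horse")] = 0.9                          # similar large herbivores

diverse = ["cow", "dolphin", "squirrel"]     # diverse premise set
narrow = ["cow", "horse", "rhino"]           # similar premise set
print(max_sim(diverse, mammals, sim), max_sim(narrow, mammals, sim))   # 4.1 > 3.4
# Max-sim gives the diverse set higher strength, in line with the diversity phenomenon.
```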

Data vs. models

[Scatter plots: human argument-strength ratings vs. model predictions. Each point represents one argument of the form:]

X1 have property P.
X2 have property P.
X3 have property P.

All mammals have property P.

Open questions

• Explaining similarity:
  – Why does Max-sim fit so well? When does it do worse?
  – Why does Sum-sim fit so poorly? When does it do better?

• Explaining Max-sim:
  – Is there some rational computation that Max-sim implements or approximates?
  – What theory about this task and domain is implicit in Max-sim?

(cf. the analysis of lightness constancy)

A simplified theory of biology

• Species generated by an evolutionary branching process.
  – A tree-structured taxonomy of species.

• Taxonomy also central in folkbiology (Atran).

Theory-based Bayesian model

Begin by reconstructing the intuitive taxonomy from similarity judgments (hierarchical clustering over chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal).

Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of the novel property (h1, h3, h6, h17, …; h0 = “all mammals”).

Theory-based Bayesian model

[Taxonomic tree over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal]

p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)

p(X | h) = 1 / size(h)^n   if x_1, …, x_n ∈ h
         = 0               if any x_i ∉ h

h0: “all mammals”
p(h): uniform
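Here is a toy Python sketch of this model. Only the size-principle likelihood and uniform prior follow the slide; the particular clusters in the hypothesis space are assumed for illustration rather than derived from the actual similarity data.

```python
all_mammals = {"cow", "horse", "rhino", "elephant", "chimp", "gorilla",
               "mouse", "squirrel", "dolphin", "seal"}
hypotheses = [
    {"cow", "horse", "rhino", "elephant"},        # "large herbivores" (assumed cluster)
    {"chimp", "gorilla"},
    {"dolphin", "seal"},
    {"mouse", "squirrel"},
    all_mammals,                                   # h0
]

def p_all_mammals(premises, hypotheses, all_mammals):
    n = len(premises)
    def like(h):                                   # size principle: p(X | h) = 1 / size(h)^n
        return (1.0 / len(h)) ** n if premises <= h else 0.0
    prior = 1.0 / len(hypotheses)                  # uniform p(h)
    Z = sum(like(h) * prior for h in hypotheses)
    return like(all_mammals) * prior / Z

print(p_all_mammals({"cow", "horse", "rhino"}, hypotheses, all_mammals))      # weak: a small cluster also covers X
print(p_all_mammals({"cow", "dolphin", "squirrel"}, hypotheses, all_mammals)) # strong: only "all mammals" covers X
```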

How taxonomy constrains induction

• Atran (1998): “Fundamental principle of systematic induction” (Warburton 1967, Bock 1973)
  – Given a property found among members of any two species, the best initial hypothesis is that the property is also present among all species that are included in the smallest higher-order taxon containing the original pair of species.

[Taxonomic tree over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal, with the “all mammals” cluster highlighted]

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Strong (0.76 [max = 0.82])

[Same tree, with the “large herbivores” cluster highlighted]

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Strong: 0.76 [max = 0.82]

Cows have property P.
Horses have property P.
Rhinos have property P.

All mammals have property P.

Weak: 0.17 [min = 0.14]

[Same tree, with the “all mammals” cluster highlighted]

Seals have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Weak: 0.30 [min = 0.14]

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Strong: 0.76 [max = 0.82]

[Scatter plots: human judgments vs. Bayes (taxonomic), Max-sim, and Sum-sim, for three argument sets — conclusion kind “all mammals” (3 examples), “horses” (2 examples), and “horses” (1, 2, or 3 examples)]

[Detail: the “all mammals”, 3-example arguments, e.g. “Seals / Dolphins / Squirrels have property P, therefore all mammals have property P” and “Cows / Dolphins / Squirrels have property P, therefore all mammals have property P”, compared across Bayes (taxonomic), Max-sim, and Sum-sim]

A simplified theory of biology

• Species generated by an evolutionary branching process.
  – A tree-structured taxonomy of species.

• Features generated by a stochastic mutation process and passed on to descendants.
  – Novel features can appear anywhere in the tree, but some distributions are more likely than others.

Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of a novel feature (h1, h3, h6, h17, …; h0 = “all mammals”).

[Taxonomic tree over chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal]

Theory-based Bayesian model

Generate hypotheses for the novel feature F via a (Poisson arrival) mutation process over branches b:

  p(F develops along b) = 1 − e^(−b)

[Figures, repeated across several slides: the taxonomy over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal, with mutations arising on different branches; each mutation pattern yields a different candidate extension for F]

Samples from the prior

• Labelings that cut the data along longer branches are more probable:

[Figure: two sampled labelings of the taxonomy (chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal); the labeling whose feature arises on a longer branch has higher prior probability]

Samples from the prior

• Labelings that cut the data along fewer branches are more probable:

[Figure: a “monophyletic” labeling (one branch cut) is more probable than a “polyphyletic” labeling (several branches cut)]
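A hedged sketch of this prior: sample which branches of a toy tree a feature arises on, using the 1 − e^(−b) arrival rule sketched above, and count how often different labelings come out. The tree and branch lengths are invented for the example.

```python
import random
from math import exp
from collections import Counter

# Toy tree: each branch is (length, species below that branch). Lengths are made up.
branches = [
    (2.0, {"horse", "cow"}),        # long branch above the two herbivores
    (0.5, {"horse"}),
    (0.5, {"cow"}),
    (2.0, {"dolphin", "seal"}),     # long branch above the two aquatic mammals
    (0.5, {"dolphin"}),
    (0.5, {"seal"}),
]

def sample_extension(rng=random):
    """F spreads to every species below each branch on which it happens to arise."""
    extension = set()
    for length, species in branches:
        if rng.random() < 1 - exp(-length):      # p(F develops along b) = 1 - e^(-b)
            extension |= species
    return frozenset(extension)

counts = Counter(sample_extension() for _ in range(100_000))
print(counts[frozenset({"horse", "cow"})])       # common: one long branch explains it
print(counts[frozenset({"horse", "dolphin"})])   # rare: needs two short, scattered branches
```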

[Taxonomic tree over elephant, squirrel, chimp, gorilla, horse, cow, rhino, mouse, dolphin, seal]

p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)

p(X | h) = 1 / size(h)^n   if x_1, …, x_n ∈ h
         = 0               if any x_i ∉ h

h0: “all mammals”
p(h): “evolutionary” process (mutation + inheritance)

[Scatter plots: human judgments vs. Bayes (taxonomic) and vs. Bayes (taxonomy + mutation), alongside Max-sim and Sum-sim, for conclusion kinds “all mammals” (3 examples), “horses” (2 examples), and “horses” (1, 2, or 3 examples)]

Explaining similarity

• Why does Max-sim fit so well?
  – It is an efficient and accurate approximation to the Bayesian (evolutionary) model.

Correlation with Bayes on three-premise general arguments, over 100 simulated tree structures: mean r = 0.94. There’s also a theorem.

Biology: Summary

• Theory-based statistical inference explains inductive reasoning in folk biology.

• Mathematical modeling reveals people’s implicit theories about the world.
  – Category structure: taxonomic tree.
  – Feature distribution: stochastic mutation process + inheritance.

• Clarifies traditional psychological models.
  – Why Max-sim over Sum-sim?

Beyond taxonomic similarity

• Generalization based on known dimensions (Smith et al., 1993; Blok et al., 2002):

Poodles can bite through wire.

German shepherds can bite through wire.

Dobermans can bite through wire.

German shepherds can bite through wire.

• Generalization based on causal relations (Medin et al., 2004; Shafto & Coley, 2003):

Salmon carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.

Salmon carry E. Spirus bacteria.

Predicate type                   Generative theory
“has T4 hormones”                taxonomic tree + mutation
“can bite through wire”          directed chain + unknown threshold
“carries E. Spirus bacteria”     directed network + noisy transmission

[Diagrams (“Hypotheses”): the hypothesis spaces each generative theory defines over Classes A–G]

Island ecosystem (Shafto, Kemp, Baraff, Coley, Tenenbaum)

Species: Kelp, Human, Dolphin, Sand shark, Mako shark, Tuna, Herring

[Diagrams: the same species arranged as a taxonomy and as a food web]

Datasets vs. models — correlation r with human judgments:

                                       Bayes        Bayes      Max-sim
                                       (food web)   (tree)
  Mammal ecosystem: disease             0.75        -0.15       0.07
  Mammal ecosystem: genetic property    0.25         0.92       0.87
  Island ecosystem: disease             0.79         0.01       0.17
  Island ecosystem: genetic property    0.31         0.89       0.86

Assumptions guiding inferences

• Qualitatively different priors are appropriate for different domains of inductive generalization.

• In each domain, a prior that matches the world’s structure fits people’s inductive judgments better than alternative priors.

• A common framework for representing people’s domain models: a graph structure defined over entities or classes, and a probability distribution for predicates over that graph.
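As an illustration of the last point, here is a rough sketch in the spirit of the food-web (“noisy transmission”) theory: a graph over species plus a stochastic process for a disease-like predicate. The web, base rate, and transmission probability are all invented for the example.

```python
import random

food_web = {                         # predator -> the prey it eats (invented toy web)
    "herring": [],
    "tuna": ["herring"],
    "mako shark": ["tuna", "herring"],
    "dolphin": ["tuna", "herring"],
    "human": ["tuna", "mako shark"],
}

def sample_carriers(base_rate=0.1, transmit=0.5, rng=random):
    """Each species may get the disease on its own; it may also catch it from infected prey."""
    carriers = {s for s in food_web if rng.random() < base_rate}
    # Decide once per predator-prey link whether transmission would succeed along it.
    active = {(pred, prey) for pred in food_web for prey in food_web[pred]
              if rng.random() < transmit}
    changed = True
    while changed:                    # propagate up the web until nothing new gets infected
        changed = False
        for pred, prey in active:
            if prey in carriers and pred not in carriers:
                carriers.add(pred)
                changed = True
    return carriers

samples = [sample_carriers() for _ in range(50_000)]
given_herring = [s for s in samples if "herring" in s]
given_human = [s for s in samples if "human" in s]
print(sum("tuna" in s for s in given_herring) / len(given_herring))   # higher: tuna eat herring
print(sum("tuna" in s for s in given_human) / len(given_human))       # lower: runs against the causal arrow
# A taxonomic prior would treat these two generalizations symmetrically; the food-web prior does not.
```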

Conclusion

• The hard problem of intelligence: how do we “go beyond the information given”?

• The solution:
  – Bayesian statistical inference:

    P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)

  – Implicit theories about the structure of the world, generating P(h) and P(d | h).

Cows have property P.
Dolphins have property P.
Squirrels have property P.

All mammals have property P.

Discussion

• How is this intuitive theory of biology like or not like a scientific theory?

• In what sense does the visual system have a theory of the world? How is it like or not like a cognitive theory of biology, or a scientific theory?