Bayesian models of inductive learning and reasoning
Josh Tenenbaum, MIT
Department of Brain and Cognitive Sciences / Computer Science and AI Lab (CSAIL)

Collaborators: Charles Kemp, Tom Griffiths, Pat Shafto, Vikash Mansinghka, Amy Perfors, Lauren Schmidt, Chris Baker, Noah Goodman
Everyday inductive leaps
How can people learn so much about the world from such limited evidence?
– Learning concepts from examples
“horse” “horse” “horse”
Learning concepts from examples
“tufa”
“tufa”
“tufa”
Everyday inductive leaps
How can people learn so much about the world from such limited evidence?
– Kinds of objects and their properties
– The meanings of words, phrases, and sentences
– Cause-effect relations
– The beliefs, goals and plans of other people
– Social structures, conventions, and rules
The solution
Prior knowledge (inductive bias).
The solution
Prior knowledge (inductive bias).
– How does background knowledge guide learning from sparsely observed data?
– What form does background knowledge take, across different domains and tasks?
– How is background knowledge itself acquired?
The challenge: Can we answer these questions in precise computational terms?
Modeling goals
• Principled quantitative models of human inductive inferences, with broad coverage and a minimum of free parameters and ad hoc assumptions.
• An understanding of how and why human learning and reasoning works, as a species of rational (approximately optimal) statistical inference given the structure of natural environments.
• A two-way bridge to artificial intelligence and machine learning.
Bayesian inference
• Bayes’ rule:
• An example
– Data: John is coughing
– Some hypotheses:
1. John has a cold
2. John has lung cancer
3. John has a stomach flu
– Likelihood P(d|h) favors 1 and 2 over 3
– Prior probability P(h) favors 1 and 3 over 2
– Posterior probability P(h|d) favors 1 over 2 and 3
P(h|d) = P(d|h) P(h) / Σ_{h_i ∈ H} P(d|h_i) P(h_i)
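As a concrete toy illustration, the posterior for the coughing example can be computed directly from Bayes' rule. The numerical priors and likelihoods below are invented for illustration, not taken from data:

```python
# Sketch of Bayes' rule for the coughing example.
# The prior and likelihood values are illustrative assumptions.

priors = {"cold": 0.5, "lung cancer": 0.01, "stomach flu": 0.49}
# P(coughing | h): coughing is likely given a cold or lung cancer,
# unlikely given a stomach flu.
likelihoods = {"cold": 0.8, "lung cancer": 0.9, "stomach flu": 0.1}

# Posterior: P(h|d) = P(d|h) P(h) / sum_i P(d|h_i) P(h_i)
unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
z = sum(unnormalized.values())
posterior = {h: p / z for h, p in unnormalized.items()}

best = max(posterior, key=posterior.get)
print(best)  # "cold": the only hypothesis with high prior AND high likelihood
```

The example makes the slide's point numerically: the likelihood favors cold and lung cancer, the prior favors cold and stomach flu, and only cold scores well on both.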
1. How does background knowledge guide learning from sparsely observed data?
Bayesian inference:
2. What form does background knowledge take, across different domains and tasks?
Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories.
3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction.
Flexible nonparametric models in which complexity grows with the data.
The Bayesian modeling toolkit
Hhii
i
hPhdP
hPhdPdhP
)()|(
)()|()|(
A case study: learning about objects and their properties
“Property induction”, “category-based induction” (Rips, 1975; Osherson, Smith et al., 1990)
Gorillas have T9 hormones.
Seals have T9 hormones.
Squirrels have T9 hormones.
→ Horses have T9 hormones.

Gorillas have T9 hormones.
Chimps have T9 hormones.
Monkeys have T9 hormones.
Baboons have T9 hormones.
→ Horses have T9 hormones.

Gorillas have T9 hormones.
Seals have T9 hormones.
Squirrels have T9 hormones.
→ Flies have T9 hormones.

“Similarity”, “Typicality”, “Diversity”
• 20 subjects rated the strength of 45 arguments:
X1 have property P. (e.g., Cows have T4 hormones.)
X2 have property P.
X3 have property P.
All mammals have property P. [General argument]
• 20 subjects rated the strength of 36 arguments:
X1 have property P.
X2 have property P.
Horses have property P. [Specific argument]
Experiments on property induction (Osherson, Smith, Wilkie, Lopez, Shafir, 1990)
[Figure: species × features matrix — rows: Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant; columns: observed features plus a new property whose values are unknown (“?”)]

85 features for 50 animals (Osherson & Wilkie feature rating task), e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, …
Property induction as a computational problem

[Figure: scatter plot of model predictions vs. data for similarity-based models. Each point represents one argument:
X1 have property P.
X2 have property P.
X3 have property P.
→ All mammals have property P.]
Beyond similarity in induction
• Reasoning based on dimensional thresholds: (Smith et al., 1993)
• Reasoning based on causal relations: (Medin et al., 2004; Coley & Shafto, 2003)
Poodles can bite through wire.
→ German shepherds can bite through wire.

Dobermans can bite through wire.
→ German shepherds can bite through wire.

Salmon carry E. Spirus bacteria.
→ Grizzly bears carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.
→ Salmon carry E. Spirus bacteria.
The Bayesian modeling toolkit
Model overview

F: form — e.g., a tree with species at the leaf nodes. [P(form)]
S: structure — a particular tree over mouse, squirrel, chimp, gorilla. [P(structure | form)]
D: data — observed features F1–F4 for each species, plus a new property (“Has T9 hormones?”) with mostly unobserved values. [P(data | structure)]
Hypotheses h, with prior P(h)

[Figure: candidate extensions of the new property over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, each assigned a prior probability]

Observed examples X:
Horses have T9 hormones.
Rhinos have T9 hormones.
Cows have T9 hormones.

Prediction:
P(Y | X) = Σ_{h consistent with X, Y} P(h) / Σ_{h consistent with X} P(h)
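This hypothesis-averaging computation can be sketched in a few lines. The uniform prior over all subsets of species used here is a deliberately uninformative placeholder for the structured priors discussed below; the species list is illustrative:

```python
from itertools import combinations

# Hypothesis averaging for property induction:
# P(Y|X) = sum of P(h) over h consistent with X and Y,
#          divided by the sum of P(h) over h consistent with X.

species = ["horse", "cow", "chimp", "gorilla", "mouse"]

# Enumerate every non-empty subset of species as a candidate extension h.
hypotheses = []
for r in range(1, len(species) + 1):
    for subset in combinations(species, r):
        hypotheses.append(frozenset(subset))

# Placeholder prior: uniform over all candidate extensions.
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

def predict(observed, query):
    """P(query has the property | observed species have it)."""
    observed = frozenset(observed)
    num = sum(prior[h] for h in hypotheses if observed | {query} <= h)
    den = sum(prior[h] for h in hypotheses if observed <= h)
    return num / den

p = predict({"horse", "cow"}, "gorilla")  # = 0.5 under the uniform prior
```

Under the uniform prior every unobserved species gets probability 0.5, which is exactly why the prior, not the averaging machinery, must carry the knowledge that drives similarity, typicality, and diversity effects.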
[Figure: candidate property extensions over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, with prior P(h)]

Why not just enumerate all logically possible hypotheses along with their relative prior probabilities?

Where does the prior come from?
Knowledge-based priors
Chimps have T9 hormones.
→ Gorillas have T9 hormones.   (taxonomic similarity)

Poodles can bite through wire.
→ Dobermans can bite through wire.   (jaw strength)

Salmon carry E. Spirus bacteria.
→ Grizzly bears carry E. Spirus bacteria.   (food web relations)
P(D|S): How the structure constrains the data of experience

• Define a stochastic process over structure S that generates candidate property extensions h.
– Intuition: properties should vary smoothly over structure.

[Figure: a property y defined over structure S — smooth: P(h) high; not smooth: P(h) low]

p(y | S) ∝ exp( −(1/4) Σ_{i,j} (y_i − y_j)² / d_ij − (1/(2σ²)) Σ_i y_i² )

d_ij = length of the edge between i and j (= ∞ if i and j are not connected)

This is equivalent to a Gaussian prior, y ~ N(0, Σ), with Σ⁻¹ given by the (regularized) graph Laplacian of S (Zhu, Lafferty & Ghahramani, 2003). Candidate extensions h are then generated from y: p(h | y).
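A minimal sketch of this smoothness prior, using an exponent constant of 1/2 rather than the slide's 1/4 (the constants differ only by a rescaling); the 4-node chain, unit edge lengths, and regularization value are illustrative assumptions:

```python
import numpy as np

# Smoothness prior over a graph (after Zhu, Lafferty & Ghahramani, 2003):
# y ~ N(0, Sigma) with Sigma^{-1} given by the regularized graph Laplacian.

edges = [(0, 1), (1, 2), (2, 3)]  # a chain: mouse - squirrel - chimp - gorilla
n = 4
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0       # w_ij = 1/d_ij with unit edge lengths

L = np.diag(W.sum(axis=1)) - W    # graph Laplacian
precision = L + 1e-3 * np.eye(n)  # small ridge so the prior is proper

def log_prior(y):
    """log p(y|S) up to an additive constant; smoother y scores higher."""
    return -0.5 * y @ precision @ y

smooth = np.array([1.0, 1.0, -1.0, -1.0])   # changes once along the chain
rough = np.array([1.0, -1.0, 1.0, -1.0])    # flips at every edge
print(log_prior(smooth) > log_prior(rough))  # True
```

Note that y @ precision @ y expands to Σ_{i,j} w_ij (y_i − y_j)² plus the ridge term, so the prior directly penalizes properties that change across edges of S.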
[Figure: Structure S — a graph over Species 1–10 — generating Data D, a species × features matrix]

85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, …

Modeling feature covariance based on distance in a graph (Zhu et al., 2003; cf. Sattath & Tversky, 1977)

Modeling feature covariance based on distance in two-dimensional space (Lawrence, 2004; Smola & Kondor, 2003; cf. Shepard, 1987)
[Figure: Structure S over Species 1–10 generating Data D — observed features plus a new, partially observed property (“?”)]

Gorillas have property P.
Mice have property P.
Seals have property P.
→ All mammals have property P.

Cows have property P.
Elephants have property P.
→ Horses have property P.
Testing different priors

[Figure: model-vs-data fits for Tree and 2D priors across argument sets, illustrating the role of inductive bias — correct bias, wrong bias, too weak a bias, too strong a bias]
Spatially varying properties

Geographic inference task: “Given that a certain kind of Native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?”

[Figure: model-vs-data fits for Tree and 2D priors on this task]
Property type:        “has T9 hormones”   “can bite through wire”   “carry E. Spirus bacteria”
Theory (structure):   taxonomic tree      directed chain            directed network
Theory (process):     diffusion process   drift process             noisy transmission

[Figure: hypothesis spaces over Classes A–G generated by each structure + process]

[Figure: model-vs-data fits for a biological property and a disease property under Tree and Web models]
“Given that A has property P, how likely is it that B does?”
[Figure: a food web and a taxonomic tree over Kelp, Human, Dolphin, Sand shark, Mako shark, Tuna, Herring]

e.g., P = “has X cells” (taxonomic tree); P = “has X disease” (food web)
Summary so far
• A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations:
– Qualitatively different priors are appropriate for different domains of property induction.
– In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors.
– A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph.
• Remaining question: How can we learn appropriate structures for different domains?
Model overview

F: form — Tree, Chain, Space
S: structure — a tree, chain, or spatial configuration over mouse, squirrel, chimp, gorilla
D: data — features F1–F4 for each species
Discovering structural forms

[Figure: Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle organized two ways — as a tree (Linnaeus) and as a linear order, the “great chain of being”, extended with Rock, Plant, Angel, God]
People can discover structural forms
• Scientific discoveries
– Tree structure for biological species: the “great chain of being” (1579), Linnaeus’s Systema Naturae (1735: Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens), evolutionary trees (1837)
– Periodic structure for chemical elements
• Children’s cognitive development
– Hierarchical structure of category labels
– Clique structure of social groups
– Cyclical structure of seasons or days of the week
– Transitive structure for value
Typical structure learning algorithms assume a fixed structural form:
– Flat clusters: K-Means, mixture models, competitive learning
– Line: Guttman scaling, ideal point models
– Tree: hierarchical clustering, Bayesian phylogenetics
– Circle: circumplex models
– Euclidean space: MDS, PCA, factor analysis
– Grid: self-organizing map, generative topographic mapping
The ultimate goal

A “Universal Structure Learner”: Data → Representation
(subsuming K-Means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, …)
A “universal grammar” for structural forms
[Figure: each structural form paired with its generative process]

Node-replacement graph grammars

[Figure: the production for the Line form, with a sample derivation]
F: form — Tree, Linear, Grid (the prior over forms favors simplicity)
S: structure — a tree, line, or grid over mouse, squirrel, chimp, gorilla
D: data — features F1–F4 for each species (P(D|S) favors smoothness; Zhu et al., 2003)
Learning algorithm
• Evaluate each form in parallel.
• For each form, heuristic search over structures based on greedy growth from a one-node seed.

[Figure: applicable data matrices — animals × features, cases × judges, objects × objects (similarities)]
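A schematic version of the greedy growth step, with a stand-in scoring function in place of the true objective log P(D|S) + log P(S|F). The toy data, the agreement-based score, and the restriction to contiguous splits are all simplifying assumptions:

```python
import itertools

# Greedy structure growth: start from a one-node seed (all objects in one
# cluster), repeatedly try splitting a node, and keep the best-scoring split.

def score(partition, data):
    """Placeholder for log P(D|S) + log P(S|F): reward within-cluster
    feature agreement, penalize disagreement and extra clusters."""
    total = 0.0
    for cluster in partition:
        for a, b in itertools.combinations(cluster, 2):
            total += sum(1 if fa == fb else -1
                         for fa, fb in zip(data[a], data[b]))
        total -= 1.0  # mild complexity penalty per cluster node
    return total

def greedy_grow(objects, data):
    partition = [list(objects)]  # one-node seed
    while True:
        best, best_score = None, score(partition, data)
        for i, cluster in enumerate(partition):
            for k in range(1, len(cluster)):  # contiguous splits only
                cand = partition[:i] + [cluster[:k], cluster[k:]] + partition[i+1:]
                s = score(cand, data)
                if s > best_score:
                    best, best_score = cand, s
        if best is None:
            return partition  # no split improves the score: stop growing
        partition = best

data = {
    "mouse":    (1, 0, 1, 0),
    "squirrel": (1, 0, 1, 0),
    "chimp":    (0, 1, 0, 1),
    "gorilla":  (0, 1, 0, 1),
}
result = greedy_grow(["mouse", "squirrel", "chimp", "gorilla"], data)
# result: [["mouse", "squirrel"], ["chimp", "gorilla"]]
```

The real algorithm searches over graph structures of a given form rather than flat partitions, but the control flow is the same: grow from a seed, keep only moves that improve the score.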
Structural forms from relational data

Primate troop (“x beats y”) → dominance hierarchy
Bush administration (“x told y”) → tree
Prison inmates (“x likes y”) → cliques
Kula islands (“x trades with y”) → ring
Development of structural forms as more data are observed
“blessing of abstraction”
Beyond “Nativism” versus “Empiricism”
• “Nativism”: Explicit knowledge of structural forms for core domains is innate.
– Atran (1998): The tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”.
– Chomsky (1980): “The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.”
• “Empiricism”: General-purpose learning systems without explicit knowledge of structural form.
– Connectionist networks (e.g., Rogers and McClelland, 2004).
– Traditional structure learning in probabilistic graphical models.
Conclusion

Bayesian inference over hierarchies of structured representations provides a framework to understand core questions of human cognition:
– What is the content and form of human knowledge, at multiple levels of abstraction?
– How does abstract domain knowledge guide learning of new concepts?
– How is abstract domain knowledge learned? What must be built in?
– How can domain-general learning mechanisms acquire domain-specific representations? How can probabilistic inference work together with symbolic, flexibly structured representations?

[Figure: the form–structure–data hierarchy (F, S, D) over mouse, squirrel, chimp, gorilla with features F1–F4]
Learning word meanings
Principles → Structure → Data
Principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias

Causal learning and reasoning (Griffiths, Tenenbaum, et al.)
Principles → Structure → Data
“Universal Grammar” → Grammar → Phrase structure → Utterance → Speech signal

Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG):
S → [NP VP]
NP → [Det Adj Noun RelClause]
RelClause → [Rel NP V]
VP → [VP NP]
VP → [Verb]

P(grammar | UG)
P(phrase structure | grammar)
P(utterance | phrase structure)
P(speech | utterance)

(cf. Chater and Manning, 2006)
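The level P(phrase structure | grammar) can be made concrete with a toy probabilistic context-free grammar; the rules, probabilities, and vocabulary below are invented for illustration and are not the grammar from the slide:

```python
import random

# Sampling a phrase structure (and its utterance) from a toy PCFG,
# illustrating P(phrase structure | grammar) and
# P(utterance | phrase structure).

grammar = {
    "S":    [(["NP", "VP"], 1.0)],
    "NP":   [(["Det", "Noun"], 0.7), (["Det", "Adj", "Noun"], 0.3)],
    "VP":   [(["Verb"], 0.4), (["Verb", "NP"], 0.6)],
    "Det":  [(["the"], 1.0)],
    "Adj":  [(["small"], 1.0)],
    "Noun": [(["mouse"], 0.5), (["gorilla"], 0.5)],
    "Verb": [(["sees"], 1.0)],
}

def sample(symbol, rng):
    """Recursively expand a nonterminal into a (symbol, children) tree."""
    if symbol not in grammar:
        return symbol  # terminal word
    rules, weights = zip(*grammar[symbol])
    rhs = rng.choices(rules, weights=weights)[0]
    return (symbol, [sample(s, rng) for s in rhs])

def yield_words(tree):
    """Read the utterance off the leaves of the phrase structure."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in yield_words(child)]

rng = random.Random(0)
tree = sample("S", rng)             # a phrase structure
sentence = " ".join(yield_words(tree))  # the corresponding utterance
```

Learning a grammar then corresponds to inference one level up the hierarchy: scoring candidate grammars by how probable they make observed utterances, weighted by P(grammar | UG).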
Vision as probabilistic parsing (Han & Zhu, 2006; cf. Zhu, Yuanhao & Yuille, NIPS 2006)

Goal-directed action (production and comprehension) (Wolpert, Doya and Kawato, 2003)
The big picture
• What we need to understand: the mind’s ability to build rich models of the world from sparse data.
– Learning about objects, categories, and their properties
– Language comprehension and production
– Scene understanding
– Causal inference
– Understanding other people’s actions, plans, thoughts, goals
• What do we need to understand these abilities?
– Bayesian inference in probabilistic generative models
– Hierarchical models, with inference at all levels of abstraction
– Structured representations: graphs, grammars, logic
– Flexible representations, growing in response to observed data