Semantic Cognition: A Parallel Distributed Processing Approach James L. McClelland Center for the Neural Basis of Cognition and Departments of Psychology and Computer Science, Carnegie Mellon Timothy T. Rogers Center for the Neural Basis of Cognition and now MRC Cognition and Brain Sciences Unit, UK CNBC A Joint Project of Carnegie Mellon and the University of Pittsburgh



Approaches to Semantic Cognition

• Concepts and their Properties

– Is Socrates Mortal?

• Hierarchical Propositional Models

– Quillian, 1968; Collins and Quillian, 1969

• Theory-Theory and Related Approaches

– Murphy and Medin, 1985; Gopnik and Wellman, 1994; Keil, 1991; Carey, 1985

• Parallel Distributed Processing

– Hinton, 1981; Rumelhart and Todd, 1993; McRae, De Sa, and Seidenberg, 1997

Plan for This Talk

• Compare a distributed, connectionist model that learns from exposure to information about the relations between concepts and their properties to the ‘classical’ Hierarchical Propositional Approach.

• Show how the model accounts for a set of phenomena that have been introduced in support of ‘Theory Theory’

• Conclude with a brief consideration of where we are in the development of a theory of semantic cognition.

Initial Motivations for the Model

• Provide a connectionist alternative to traditional hierarchical propositional models of conceptual knowledge representation.

• Account for development of conceptual knowledge as a gradual process involving progressive differentiation.

Quillian’s Hierarchical Propositional Model
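
The figure for this slide is not reproduced here. As a minimal sketch of how a hierarchical propositional store of this kind answers a query, the code below keeps each proposition once, at the most general node it is true of, and inherits it through ISA links. The miniature hierarchy, the `ISA` and `PROPS` tables, and the `holds` helper are illustrative assumptions, not the contents of the original figure.

```python
# Hypothetical miniature hierarchy; not the network shown in the slide's figure.
ISA = {"robin": "bird", "bird": "animal", "animal": "living thing"}

# Each proposition is stored once, at the most general node it is true of.
PROPS = {
    "living thing": {("can", "grow")},
    "animal": {("can", "move")},
    "bird": {("can", "fly")},
}

def holds(concept, relation, value):
    """Walk up the ISA chain until the proposition is found or the chain ends."""
    node = concept
    while node is not None:
        if (relation, value) in PROPS.get(node, set()):
            return True
        node = ISA.get(node)
    return False

print(holds("robin", "can", "grow"))  # True, inherited from "living thing"
print(holds("robin", "can", "swim"))  # False, stored nowhere on the chain
```

In this scheme the answer to a query depends on explicit traversal of stored links, which is the point of contrast with the weight-based approach described on the next slide.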

The Parallel Distributed Processing Approach

• Processing occurs via propagation of activation among simple processing units.

• Knowledge is stored in the weights on connections between the simple processing units.

• Propositions are not stored directly.

– The ability to produce complete propositions from partial probes arises through the activation process, based on the knowledge stored in the weights.

• Learning occurs via adjustment of the connections.

• Semantic knowledge is gradually acquired through repeated exposure, mirroring the gradual nature of cognitive development.
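
The first three bullets can be illustrated with a minimal sketch: a partial probe activates input units, activation propagates through weighted connections, and the completion appears as a pattern of output activations. The layer sizes, the weight values, and the logistic activation function are assumptions chosen for illustration; they do not reproduce the parameters of any model in this talk.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def propagate(inputs, weights, biases):
    """Each receiving unit takes logistic(weighted sum of sender activations + bias)."""
    return [
        logistic(sum(w * a for w, a in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical weights: 2 input units -> 2 hidden units -> 3 output units.
W_hidden = [[4.0, -4.0], [-4.0, 4.0]]
b_hidden = [-2.0, -2.0]
W_out = [[6.0, -6.0], [-6.0, 6.0], [3.0, 3.0]]
b_out = [-3.0, -3.0, -1.0]

probe = [1.0, 0.0]  # partial probe: only the first input unit is active
hidden = propagate(probe, W_hidden, b_hidden)
output = propagate(hidden, W_out, b_out)
print([round(a, 2) for a in output])
```

Nothing in the code stores a proposition as such; what the probe retrieves is determined entirely by the numbers in `W_hidden` and `W_out`.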

The Rumelhart Model

Activation

The Training Data:

All propositions true of items at the bottom level of the tree, e.g.:

Robin can {grow, move, fly}
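
One way such propositions might be turned into training patterns, sketched under assumptions: the item and the relation are each coded as a one-hot input vector, and the target marks every attribute that completes the proposition. Only the robin example comes from the slide; the salmon entry, the resulting attribute list, and the `encode` helper are hypothetical.

```python
# Only ("robin", "can") is taken from the slide; the other entry is illustrative.
PROPOSITIONS = {
    ("robin", "can"): {"grow", "move", "fly"},
    ("salmon", "can"): {"grow", "move", "swim"},
}

ITEMS = sorted({item for item, _ in PROPOSITIONS})
RELATIONS = sorted({rel for _, rel in PROPOSITIONS})
ATTRIBUTES = sorted(set().union(*PROPOSITIONS.values()))

def one_hot(value, vocabulary):
    return [1 if v == value else 0 for v in vocabulary]

def encode(item, relation):
    """Return (item input, relation input, attribute target) for one pattern."""
    target_set = PROPOSITIONS[(item, relation)]
    target = [1 if a in target_set else 0 for a in ATTRIBUTES]
    return one_hot(item, ITEMS), one_hot(relation, RELATIONS), target

print(ATTRIBUTES)
print(encode("robin", "can"))
```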

Error

Any Questions?

Differentiation in Development

The Rumelhart Model

Trajectories of Representations Through State Space over Time

Any Questions?

Tenets of Theory Theory

• Intuitive domain knowledge of relations between items and their properties is used to decide:

– which categories are ‘good’ ones and which properties are central to particular concepts

– how properties should be generalized from one category to another

• Many proponents suggest that some theory-like knowledge (or constraints on acquiring such knowledge) must be available ‘initially’.

• Others emphasize reorganization of knowledge through experience, but provide very little discussion of how experience leads to reorganization.

Three Phenomena Supporting Theory Theory

• Category goodness and feature importance.

• Differential importance of properties in different concepts.

• Reorganization of conceptual knowledge.

Effects of Coherent Variation of Properties on Learning

• Attributes that vary together create the concepts that populate the taxonomic hierarchy, and determine which properties are central to a given concept.

• Where sets of attributes vary together, they exert a strong effect on learning.

– Items with co-varying properties stay together through semantic space and form the clusters corresponding to super-ordinate concepts.

• Arbitrary properties (those that do not co-vary with others) are very difficult to learn, even when frequency is controlled.

– They control a late stage of differentiation in which individual items within clusters become conceptually distinct.
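
To make the contrast between coherent and arbitrary properties concrete, the sketch below scores each property by its mean absolute correlation with every other property across items. This score is a descriptive statistic chosen for illustration, not the learning mechanism of the network, and the item-by-property matrix is hypothetical.

```python
from math import sqrt

# Hypothetical item-by-property matrix (rows: 4 items, columns: 4 properties).
# Columns 0-2 co-vary across items; column 3 is arbitrary with respect to them.
PATTERNS = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

def column(j):
    return [row[j] for row in PATTERNS]

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def coherence(j):
    """Mean absolute correlation of property j with every other property."""
    others = [k for k in range(len(PATTERNS[0])) if k != j]
    return sum(abs(correlation(column(j), column(k))) for k in others) / len(others)

for j in range(4):
    print(j, round(coherence(j), 2))
```

Every property here occurs in exactly half of the items, so frequency is equated; the properties differ only in how much they co-vary with the rest.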

Coherence Training Environment

No Category Labels are Provided!

Figure: Items 1 to 16 by Properties (Coherent, Incoherent)

Effect of Coherence on Learning

Effect of Coherence on Representation

Extended model for remaining simulations

Progressive Differentiation of Category Structure Without Names

Figure panels: 300 Epochs and 1200 Epochs, each divided into plants | animals

Any Questions?

Three Phenomena Supporting Theory Theory

• Category goodness and feature importance.

• Differential importance of properties in different concepts.

• Reorganization of conceptual knowledge.

Differential Importance (Macario, 1991)

• 3-4 yr old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right)

– Children then must choose another one that will “be the same kind of thing to eat” or that will be “the same kind of thing to play with”.

– In the first case they tend to choose the object with the same color.

– In the second case they will tend to choose the object with the same shape.

Adjustments to Training Environment

• To address this we added some new property units and created clear cases of feature-dependencies in the model:

• Among the plants:

– All trees are large

– All flowers are small

– Either can be bright or dull

• Among the animals:

– All birds are bright

– All fish are dull

– Either can be small or large

• Though partially counter-factual, these assignments allow us to explore domain specificity of feature dependencies in the model.
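
The assignments above can be written out and checked mechanically. The sketch below enumerates every exemplar the rules allow and asks, for each domain, whether a feature separates the two categories in that domain. The dictionary layout and the `exemplars` and `diagnostic` helpers are assumptions made for illustration; only the assignments themselves come from the slide.

```python
from itertools import product

# Assignments stated on the slide; the dictionary layout is an assumption.
FIXED = {
    "tree": ("plant", {"size": "large"}),
    "flower": ("plant", {"size": "small"}),
    "bird": ("animal", {"brightness": "bright"}),
    "fish": ("animal", {"brightness": "dull"}),
}
VALUES = {"size": ["large", "small"], "brightness": ["bright", "dull"]}

def exemplars():
    """Enumerate every (domain, category, features) combination the rules allow."""
    for category, (domain, fixed) in FIXED.items():
        free = [f for f in VALUES if f not in fixed]
        for combo in product(*(VALUES[f] for f in free)):
            yield domain, category, {**fixed, **dict(zip(free, combo))}

def diagnostic(domain, feature):
    """True if each value of `feature` picks out exactly one category in `domain`."""
    seen = {}
    for d, category, feats in exemplars():
        if d == domain:
            seen.setdefault(feats[feature], set()).add(category)
    return all(len(cats) == 1 for cats in seen.values())

for domain in ("plant", "animal"):
    print(domain, {f: diagnostic(domain, f) for f in VALUES})
```

Size separates trees from flowers but not birds from fish, and brightness does the reverse, which is the domain-specific dependency the training environment is meant to contain.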

Testing Feature Importance

• After partial learning, the model is shown eight test objects:

– Four “Animals”:

• All have skin

• All combinations of bright/dull and large/small

– Four “Plants”:

• All have roots

• All combinations of bright/dull and large/small

• Representations are generated by using back-propagation, training the item-to-representation weights only.

• Representations are then compared to see which animals are treated as most similar, and also which plants are treated as most similar.
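
The procedure of fitting a representation for a test object while every other weight stays frozen can be sketched in a reduced form. Here a new item unit sends weights `v` to two representation units, and only `v` is trained by gradient descent on the output error. The frozen weights `W` and biases `B` are hand-set, hypothetical values standing in for a partially trained network that has learned much about size and little about brightness; the similarity pattern printed at the end follows from that assumption and is not a result from the talk.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical frozen weights from 2 representation units to three property
# units (roots, large, bright). The small brightness weights are an assumption.
W = [[2.0, 0.0], [0.0, 3.0], [0.1, 0.1]]
B = [1.0, 0.0, 0.0]

def outputs(rep):
    return [logistic(sum(w * r for w, r in zip(row, rep)) + b) for row, b in zip(W, B)]

def fit_representation(target, steps=500, lr=0.5):
    """Train only the new item unit's weights v; W and B are never changed."""
    v = [0.0, 0.0]
    for _ in range(steps):
        rep = [logistic(x) for x in v]  # the item unit's activation is 1
        err = [o - t for o, t in zip(outputs(rep), target)]  # cross-entropy gradient
        for j in range(len(v)):
            back = sum(err[k] * W[k][j] for k in range(len(W)))
            v[j] -= lr * back * rep[j] * (1.0 - rep[j])
    return [logistic(x) for x in v]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Three hypothetical "plant" test objects, with targets (roots, large, bright).
large_bright = fit_representation([1, 1, 1])
large_dull = fit_representation([1, 1, 0])
small_bright = fit_representation([1, 0, 1])
print(round(distance(large_bright, large_dull), 2))    # differ in brightness only
print(round(distance(large_bright, small_bright), 2))  # differ in size only
```

Because the fitted representation can only reflect what the frozen weights already encode, objects that differ on a feature the network knows little about end up with similar representations.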

(One unit is added for each test object)

Similarities of Obtained Representations

Size is relevant for Plants

Brightness is relevant for Animals

Differential Feature Importance

• The simulation suggests that domain-general learning mechanisms can learn that different features are important for different concepts.

• The network has acquired domain-specific knowledge of just the sort theory theorists claim children know about concepts.

• It does so from the distributions of properties of concepts, without the aid of initial domain knowledge.

Phenomena Supporting Theory Theory

• Category goodness and feature importance.

• Differential importance of properties in different concepts.

• Reorganization of conceptual knowledge.

Conceptual Reorganization (Carey, 1985)

• Carey demonstrates that young children ‘discover’ the unity of plants and animals as living things only around the age of 10.

• She suggests that the concept of living thing coalesces from the assimilation of different kinds of information, including:

– Need for nutrients

– What it means to be dead vs. alive

– Reproductive properties

Conceptual Reorganization in the Model

• Our simulation model provides a vehicle for exploring how conceptual reorganization can occur.

– The model is capable of forming initial representations based on superficial appearances

– Later, it can discover shared structure that cuts across several different relational contexts, and use the emergent common structure as a basis for a deeper organization.

Reorganization Simulation

• We consider the coalescence of the superordinate categories plant and animal, in a situation where the training data initially supports a superficial organization based on appearance properties.

• In each training pattern, the input is an item and one of the three relations: ISA, HAS, or CAN.

• The target includes all of the superficial appearance properties (IS properties) plus the properties appropriate for the relation.

• The model quickly learns representations that capture the superficial IS properties.

• Later, it reorganizes these representations as it learns the relation-dependent properties.
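
The construction of training patterns described in the second and third bullets can be sketched as follows: each pattern pairs an item with one relation, and its target is the union of the item's IS properties and the properties for that relation. The two items and all of their property values are hypothetical, chosen so that the items look alike on the surface but differ in their relation-dependent properties.

```python
# Hypothetical item knowledge; the property values are illustrative only.
KNOWLEDGE = {
    "robin": {
        "IS": {"red", "small"},
        "ISA": {"bird", "animal"},
        "HAS": {"wings", "feathers"},
        "CAN": {"fly", "move", "grow"},
    },
    "rose": {
        "IS": {"red", "small"},
        "ISA": {"flower", "plant"},
        "HAS": {"petals", "roots"},
        "CAN": {"grow"},
    },
}
RELATIONS = ("ISA", "HAS", "CAN")

def training_patterns():
    """Yield (item, relation, target); IS properties appear in every target."""
    for item, facts in KNOWLEDGE.items():
        for relation in RELATIONS:
            yield item, relation, facts["IS"] | facts[relation]

for item, relation, target in training_patterns():
    print(item, relation, sorted(target))
```

Since the IS properties are present in every target, they are available to learn from on every pattern, whereas each set of relation-dependent properties appears in only one pattern out of three per item.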

Organization of Conceptual Knowledge at Different Points in Development

Phenomena Supporting Theory Theory

• Category goodness and feature importance.

• Differential importance of properties in different concepts.

• Reorganization of conceptual knowledge.

Summary

• The model exhibits several characteristics of human cognition that motivated the appeal to naïve domain theories.

• The model does these things simply by adjusting the weights on connections among simple processing units, and by propagating signals backward and forward through these weighted connections.

Relationship between the Model and Theory Theory

• There is a sense in which the knowledge in the connections plays the role of the informal domain theories advocated by theory theorists, and one might be tempted to suggest that the model is ‘merely an implementation’ of the theory theory.

• However, it differs from the theory theory in several very important ways:

– It provides explicit mechanisms indicating how domain knowledge influences semantic cognition.

– The PDP model avoids bringing in unwanted aspects of what we generally mean by ‘theory’.

– It offers a learning process that provides a means for the acquisition of such knowledge.

– It demonstrates that some of the sorts of constraints theory-theorists have suggested might be innate can in fact be acquired from experience.

Conclusions

• In our view the ‘theory theory’ should be viewed as more of a pre-theoretical heuristic than an actual theory of semantic cognition.

• Our own proposals, built on Hinton’s and Rumelhart’s, are far from the final word, and do not constitute a complete theory at this point.

• Our hope is that they will contribute, along with the work of many others, to the ongoing development of an adequate and complete theory of semantic cognition.