Memory Modeling & Knowledge Representation - KIT

Cognitive Modeling: Memory Modeling & Knowledge Repr.

Memory Modeling & Knowledge Representation

Felix Putze

16.5.2013

Lecture „Cognitive Modeling“

SS 2013


Structure of Lecture

• Introduction and Motivation

• Memory Modeling

• Knowledge Representation


Why do memory modeling?

• Any process that spans a period of time has to cope with limited human memory capacity
  • Memory capacity is a robust indicator of general intelligence

• Memory access is neither guaranteed to succeed nor instantaneous
  • Modeling memory performance is relevant for predicting errors

• For Human-Machine-Interaction: The user has a limited capability of remembering and recalling
  • Not all presented information is stored or available at all times

• Interaction systems should know what is on the user's mind and what is not
  • Which information can the system implicitly refer to?


Requirements for a Memory Model

• There are a number of questions a memory model should be able to answer:
  • How is memory organized?
  • Which items are currently active in the person's mind?
  • How is new information integrated?
  • Is a certain piece of information retrievable?
  • What is associated with a certain input?


Types of Memory

• Squire (1992) distinguishes several types of memory and associates them with different parts of the brain:
  • Declarative Memory: Explicit and conscious recollection of…
    • facts (semantic memory, e.g. "France is a country in Europe.")
    • events (episodic memory, e.g. "Last summer, I spent my holidays in France.")
  • Procedural Memory: Implicitly learned skills (e.g. riding a bicycle)
  • Priming: Automated associations caused by frequent repetition
  • Conditioning: Automatic stimulus-response pairs (e.g. Pavlov's dogs)

• In this lecture, we will focus on semantic memory


Short-term and long-term Memory

• Short-term memory: Storage for a limited number of items
  • Small capacity
  • Limited storage duration (seconds) because of decay
  • Longer storage requires rehearsal, i.e. periodic repetition
  • Acoustically and visually coded (e.g. multiple phonetically similar items are hard to keep in memory)

• Long-term memory:
  • Nearly unlimited capacity
  • Items can last for years without rehearsal
  • Items are mostly coded and retrieved semantically, but there is also a phonetic component (tip-of-the-tongue effect)

• Other types of memory: sensory memory, working memory

• The existence of distinct memory systems in the brain is controversial; experiments support both views


The magic number 7 (+/- 2)

• Miller (1956): Determined the capacity of short-term memory to be about 7 items
  • Estimated by having people recall sequences of digits or words
  • Performance is very good for around five to six items
  • Performance degrades rapidly for more items

• Miller's conclusion: Memory span is not a function of the encoding length in bits, but a function of the number of elements

• Later, Miller acknowledged that the "magic number" was a coincidence and heavily context-dependent


Chunking and Mnemonics

• How can people remember long phone numbers if their short-term memory is limited to 7 (or fewer) elements?
  • Most people do not remember the number 0123456789 as 0-1-2-3-4-5-6-7-8-9 but as 01-23-45-67-89 (or similar)
  • This division of information into smaller pieces is called chunking
  • This is also a question of skill: A trained person can chunk a stream of binary digits into larger blocks, convert them to decimal numbers and remember those

• There are many other mnemonic techniques:
  • Make use of linguistic or phonetic similarities
  • Construct images or stories to connect multiple items into one (e.g. "man", "horse", "fish" → a man riding on a horse hunting a fish)
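The two chunking examples from this slide can be sketched in a few lines of Python. This is only an illustration: the function names `chunk` and `binary_to_decimal_chunks` and the chunk sizes are assumptions made for this sketch, not part of the lecture.

```python
def chunk(sequence: str, size: int) -> list[str]:
    """Split a string of symbols into consecutive chunks of at most `size` symbols."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def binary_to_decimal_chunks(bits: str, size: int = 3) -> list[int]:
    """Recode a binary string as one decimal number per chunk of `size` bits."""
    return [int(block, 2) for block in chunk(bits, size)]


phone = "0123456789"
print(len(phone), "single digits")             # ten items to remember
print(chunk(phone, 2))                         # five two-digit chunks
print(binary_to_decimal_chunks("101100111"))   # nine bits recoded as three numbers
```

In both cases the amount of information stays the same, but the number of elements to be held in short-term memory drops.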


Controversy regarding memory limitations

• There are many conflicting viewpoints on memory limitations:

• A general limit exists but is lower than seven (≈ 4 if chunking and mnemonic techniques are not possible)

• The acoustic encoding of items in short-term memory influences this capacity:
  • For long words (which take longer to speak), only shorter sequences can be remembered
  • Memory span decreases when remembering phonetically similar words

• There are specialized parts of short-term memory with separate capacity limits

• There is no limitation of short-term memory at all (observed limitations are an effect of general scheduling conflicts)

• There is no special faculty for short-term memory at all, only an attention limitation on generic memory


Influence of Emotion on Memory

• Emotion-congruent information is encoded better
  • In a happy mood, we encode more "happy" facts than "sad" ones

• With high arousal, central information is encoded better
  • …while peripheral information is encoded worse

• Yerkes-Dodson law: The relation between arousal and performance is described as an "inverted U-curve"

• Consequence: Do not study memory as an isolated concept!




• Memory Modeling



Atkinson and Shiffrin's Memory Model

• Incoming information is extracted from parts of the sensory input, initially stored in STM and later either transferred to LTM or displaced → linear process

• Monolithic modeling (one model for each type of information)

http://upload.wikimedia.org/wikipedia/en/4/41/Multi-store-diagram(psychology).png


Components of Atkinson and Shiffrin's Model

• Sensory Memory:
  • Specialized for different sensory inputs (e.g. visual, auditory, …)
  • Lasts for a very short time (milliseconds for visual, a few seconds for auditory information)
  • Contains raw data, used to select relevant information (partial report)
  • Decoupled from other components (localized, unconscious)

• Short term memory:
  • Keeps currently relevant information
  • Duration of 15-30 seconds (unless rehearsed)
  • Bottleneck between raw data from sensors and unlimited long term memory

• Long term memory:
  • Information which is rehearsed often enough is stored here


Baddeley‘s Memory Model

• Model of short-term (or working) memory
  • Three slave systems for different types of information
  • Controlled by a central executive



• The phonological loop consists of two main parts:
  • Phonological store: contains ca. 2 seconds of audio information
  • Phonological rehearsal: performs periodic rehearsal to keep information available ("inner voice")
  • Evidence: Suppression of rehearsal impairs memory

• The visuo-spatial sketchpad is divided into two components:
  • Visual cache: forms, color
  • Inner scribe: spatial information, movement (planning)

• Visually presented information can also be transferred to the phonological loop by verbalization

• The separation between the phonological and the visual system explains differences in dual-tasking: Combining one acoustic and one visual task is easier than combining two tasks of the same kind



• Central Executive:
  • attention
  • retrieval strategies
  • episode forming

• Episodic buffer:
  • Added in 2000 as third slave system
  • Contains concrete, multimodal "episodes"
  • Introduced to explain memory which is not limited to one channel
  • Also explains the ability to memorize a longer sequence of words which form a "story"
  • Still less well defined than the other two subsystems


Cowan‘s memory model


Cowan‘s memory model

• No distinction between long-term and short-term memory

• No division into modality-specific components

• Short-term memory is implicitly represented as activated items in memory

• Activation decays over time unless it is refreshed

• A subset of the activated items forms the focus of attention

• Theoretical foundation of the ACT-R memory model


Decay and Rehearsal

• All presented models maintain a set of active items in short-term memory

• How is the capacity of short-term memory limited?
  • Most common explanation: Temporal effects
  • Decay of information: unconditional fading-out of activation
  • Temporal distinctiveness: memory traces become less distinguishable over time

• How can information be kept in working memory?
  • Rehearsal, i.e. self-induced repetition of information (overt or covert)
  • Represented in many models (Atkinson and Shiffrin, Baddeley)

[Figure: a sequence of items along a time axis]

http://www.uni-regensburg.de/Fakultaeten/phil_Fak_II/Psychologie/Psy_II/beautycheck/english/durchschnittsgesichter/w(01-64)_gr.jpg

http://www.uni-regensburg.de/Fakultaeten/phil_Fak_II/Psychologie/Psy_II/beautycheck/english/durchschnittsgesichter/m(01-32)_gr.jpg






Primacy and Recency Effects

• Consider serial memory tasks, e.g. remembering information presented in linear order

• Recency effect: Items at the end of the list are remembered better than average
  • Decay has not yet taken place
  • Items are still highly active in STM

• Primacy effect: Items at the beginning of the list are remembered better than average
  • Early on, more resources are available for encoding information in LTM
  • Can be rehearsed more often?

• Note that this effect on memory also influences which information is retrieved for decision making
  • The presentation order of arguments induces a bias on their perceived importance


Problems of Decay and Rehearsal Models

• Experiments show that time is not the only controlling variable of forgetting → what happens during this time is relevant! (Lewandowsky, Oberauer & Brown, 2009)
  • A single distractor stimulus before recall strongly impairs performance
  • Waiting additional time before recall does not lead to a comparable loss
  • Amnesia patients show much better recall after one hour when placed in interference-minimizing conditions (e.g. a quiet room)

• Rehearsal may not play a key role in retaining information
  • Even when rehearsal is suppressed, items are not lost over time
  • Items which are marked as irrelevant are still retrievable, although they should not be rehearsed anymore
  • Modeling the primacy effect requires rapid rehearsal of early items, which implies neglect of more recent ones


Interference-based Forgetting

• Several approaches for modeling forgetting by interference:
  • Process-based interference: processing activity up to 500 ms after item presentation draws on an attentional bottleneck and disrupts consolidation
  • Interference by feature overwriting: When two items share certain features, only one may retain those and the others lose them
  • Interference by superposition: Items are superimposed in a composite memory structure → the representation blurs with more items
  • Interference by cue overload: Too many items are associated with a given retrieval cue

• Consider activation patterns in a neural network as item representations
  • The number of activation patterns is finite
  • Crowded memory = less distinction between patterns




• Memory Modeling



First Order Logic

• First order logic is a traditional and still widely used knowledge representation scheme

• Express knowledge in the form of logical clauses
  • "All humans are mortal." = $\forall x:\ \mathrm{human}(x) \rightarrow \mathrm{mortal}(x)$

• First order logic is typically used for modeling logical, conscious reasoning and deduction
  • Given certain knowledge, can the user arrive at a certain conclusion?

• Limited to deductive processes
  • No representation of inductive processes ("learning from examples")
  • No easy representation of "fuzzy", associative processes
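The deductive step behind "All humans are mortal" can be sketched as naive forward chaining. This is a minimal illustration for unary predicates only; the fact `human(socrates)`, the rule encoding and the function name `forward_chain` are assumptions made for this sketch.

```python
# Facts are (predicate, constant) pairs; each rule reads
# "for all x: premise(x) -> conclusion(x)".
facts = {("human", "socrates")}
rules = [("human", "mortal")]


def forward_chain(facts, rules):
    """Apply every rule to every matching fact until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, constant in list(derived):
                new_fact = (conclusion, constant)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


knowledge = forward_chain(facts, rules)
print(("mortal", "socrates") in knowledge)  # True: derived by deduction
print(("mortal", "zeus") in knowledge)      # False: nothing is known about zeus
```

The sketch only ever derives what follows from the given clauses; it cannot generalize from examples, which is the limitation named on the slide.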


Cyc: A Database of Human Knowledge

• Under development since 1984 by the company Cycorp; a large collection of everyday knowledge ("water is wet")

• Currently contains ~500,000 items and ~5,000,000 facts

• A free version, OpenCyc, exists (a subset of Cyc)

• Developed for language generation and language understanding
  • An inference engine is able to deduce facts from the knowledge base

• Cyc uses higher order logic to increase its expressiveness:
  • A micro-theory describes the context in which a statement is valid
  • For example, the statement "vampires fear garlic" is (only) true in the context "mythology"
  • Introduces a modal operator: isTrue(context, assertion)
  • → Beyond First Order Logic


Cyc Example

• (isa BurningOfPapalBull SocialGathering)
  • The burning of the papal bull is an instance of "SocialGathering"

• (relationInstanceExistsMin BurningOfPapalBull attendees UniversityStudents 40)
  • At least 40 students attended the event

• (isa BurningOfPapalBull-Document CombustionProcess)

• (properSubEvent BurningOfPapalBull-Document BurningOfPapalBull)
  • The actual burning event (as part of the social event)

• (relationInstanceExists inputsDestroyed BurningOfPapalBull-Document (CopyOfConceptualWorkFn PapalBull-ExcommunicationCW))
  • The thing destroyed is a member of the functionally defined collection "copies of the conceptual work PapalBull-ExcommunicationCW"


Frames

• Developed by Marvin Minsky in 1975

• A Frame is a prototype of a certain context and bundles relevant attributes and relations
  • Related to the schema theory of cognitive psychology

• A Frame consists of a name and several attributes ("slots"), which can contain…
  • atomic values or
  • references to other frames or
  • nothing (to represent partial knowledge)

• For each attribute, a frame can define…
  • a range of potentially allowed values
  • default values to represent the standard case

• New input is matched against the currently "active" frame, which depends on the context


Example of Frame-based Modeling

• Frames are similar to CS concepts such as UML class/object diagrams or Entity-Relationship models

Class:

[Course
  Title: String
  NumStudents: Positive Integer
  Teacher: Person (i.e. another frame)
]

Instance:

[Course
  Title: "Grundbegriffe der Informatik"
  NumStudents: <empty>
  Teacher: [Person
    FirstName: "Tanja"
    FamilyName: "Schultz"
  ]
]
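Since frames resemble classes and objects, the Course example can be sketched with Python dataclasses. This is one possible mapping, not the lecture's notation: the snake_case slot names, the use of `None` for an empty slot and the range check in `__post_init__` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first_name: Optional[str] = None
    family_name: Optional[str] = None


@dataclass
class Course:
    title: Optional[str] = None
    num_students: Optional[int] = None   # None represents an empty slot
    teacher: Optional[Person] = None     # reference to another frame

    def __post_init__(self):
        # Range restriction of the slot: NumStudents must be a positive integer.
        if self.num_students is not None and self.num_students <= 0:
            raise ValueError("num_students must be a positive integer")


course = Course(
    title="Grundbegriffe der Informatik",
    teacher=Person(first_name="Tanja", family_name="Schultz"),
)
print(course.num_students)          # None: partial knowledge about this course
print(course.teacher.family_name)   # Schultz
```

The class definitions play the role of the frame prototype, and the `course` object is an instance with one slot left unfilled.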


Memory Modeling in ACT-R

• The main building block of knowledge representation in ACT-R (chunk) is essentially a frame

• Semantic memory is handled by the declarative module
  • The declarative module makes no distinction between long-term and short-term memory (cf. Cowan's model)
  • Each item is associated with an activation value


Matching and Memory Retrieval

• How does retrieval work in ACT-R?
  • A partially filled chunk is put into the retrieval buffer of the declarative module, e.g. (sum-fact arg1: 5 arg2: 2 result: <empty>)
  • All chunks stored in declarative memory are checked for whether they match this chunk in type and in the filled slots, e.g. (sum-fact arg1: 5 arg2: 2 result: 7) matches, but (sum-fact arg1: 5 arg2: 3 result: 8) and (mult-fact arg1: 5 arg2: 2 result: 10) do not

• All matching chunks are potential retrieval results
  • If multiple matches are found, only the one with the highest activation is (potentially) returned
  • From the result's activation, a retrieval probability is calculated and the chunk is returned on success
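The retrieval steps above can be sketched as follows. This is not ACT-R code: the dictionary representation of chunks, the activation values, and the threshold and noise parameters are assumptions for illustration. The logistic mapping from activation to retrieval probability follows the form commonly used in ACT-R.

```python
import math
import random

# Declarative memory: each chunk has a type, slots and an (assumed) activation value.
memory = [
    {"type": "sum-fact", "slots": {"arg1": 5, "arg2": 2, "result": 7}, "activation": 1.2},
    {"type": "sum-fact", "slots": {"arg1": 5, "arg2": 3, "result": 8}, "activation": 2.0},
    {"type": "mult-fact", "slots": {"arg1": 5, "arg2": 2, "result": 10}, "activation": 1.5},
]


def matches(chunk, request):
    """A chunk matches if the type agrees and every filled request slot agrees."""
    return chunk["type"] == request["type"] and all(
        chunk["slots"].get(name) == value
        for name, value in request["slots"].items()
        if value is not None  # None marks an empty slot
    )


def retrieval_probability(activation, threshold=0.0, noise=0.4):
    """Logistic mapping from activation to success probability (parameter values assumed)."""
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise))


def retrieve(request, rng=random):
    """Return the matching chunk with the highest activation, or None on failure."""
    candidates = [c for c in memory if matches(c, request)]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["activation"])
    # Retrieval may still fail: succeed only with the computed probability.
    return best if rng.random() < retrieval_probability(best["activation"]) else None


request = {"type": "sum-fact", "slots": {"arg1": 5, "arg2": 2, "result": None}}
print([c["slots"]["result"] for c in memory if matches(c, request)])  # [7]
```

Note that the chunk with the highest activation overall (5 + 3 = 8) is never returned here, because it fails the match on `arg2`.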


Activation

• Activation of a chunk is the sum of two components (plus noise):

• Base Activation: Depends on the frequency and recency of stimulations of a chunk:

  $B_i = \log\left(\sum_{j=1}^{n} \frac{1}{\sqrt{t_j}}\right)$

  • $t_j$: age of the jth activation of chunk i

• Spreading Activation (associative* activation):

  $S_i = \sum_{j=1}^{n} \frac{1}{n} \cdot \begin{cases} 0, & \text{if } i \text{ and } j \text{ are not associated} \\ (S - \log(fan_j)), & \text{else} \end{cases}$

  • The sum runs over all chunks associated with the content of the goal buffer
  • $fan_j$: number of chunks of which j is the value of a slot

• * chunk j is associated with chunk i if j is the value of a slot in i
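The two activation components can be computed as in the sketch below. It assumes the usual ACT-R form of base activation with a decay parameter (0.5 here) and a maximum associative strength S (2.0 here). Both values and the function names are assumptions for illustration.

```python
import math


def base_activation(ages, decay=0.5):
    """B_i = log(sum_j t_j ** -decay); `ages` are the times since each past activation."""
    return math.log(sum(t ** -decay for t in ages))


def spreading_activation(fans, max_strength=2.0):
    """S_i = sum_j (1/n) * (S - log(fan_j)) over the n source chunks j.

    `fans` holds fan_j for each source chunk that is associated with chunk i.
    Sources that are not associated contribute 0 but still count towards n,
    so they are passed as None.
    """
    n = len(fans)
    return sum((max_strength - math.log(f)) / n for f in fans if f is not None)


# Frequency and recency: a chunk used often and recently has a higher base activation.
print(base_activation([1.0, 5.0, 20.0]) > base_activation([20.0]))   # True

# Fan effect: a source with a small fan spreads more activation than one with a large fan.
print(spreading_activation([2]) > spreading_activation([50]))        # True
```

The total activation of a chunk is then the sum of both values plus a noise term, which is left out here.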


Spreading Activation: Example

[Figure: person chunks and the chunks that appear as their slot values]

• Person 1: Job: Student, Sex: male
• Person 2: Job: Teacher, Sex: female
• Person 3: Job: Chancellor, Sex: female
• Person 4: Job: Student, Sex: female
• Person 5: Job: Farmer, Sex: male

• Female: chunk is associated with many items → large fan → weak spreading
• Chancellor: chunk is associated with few items → small fan → strong spreading


Validation of Fan Effect

• Experiment lets participants learn the facts in the left column. When given the probes, they have to identify those which occurred in the training set (target probes)

• It is easier to identify those sentences for which at least one component (person or location) was rare in the training corpus


Semantic Networks

• Goal of knowledge representation is the modeling of facts and their relationships

• A natural formalism is a graph with nodes representing facts and edges representing relationships

• Different forms of networks exist:
  • Are edges themselves semantically annotated?
  • Are edges directed?
  • Are edges weighted?

[Figure: two example networks (1 and 2), each linking the nodes London and Paris with a "north-of" edge]


Examples of semantic networks: Hierarchies

• Focus on „is-a“ relations

• Example from Porphyry, 300 AD:


Examples of semantic networks: KL-ONE

• KL-ONE: Developed in 1979 by Brachman

• Knowledge representation framework for AI


Examples of semantic networks: MultiNet

• Multi-Layer architecture, focus on language understanding


Characteristics of Semantic Networks

• No predefined ontology and attributes
  • Different types and levels of information can be combined
  • Network can contain meta-knowledge and self-description
  • Additional effort to decode semantics

• Natural tool for the representation of associations
  • Spreading can be modeled as a breadth-first-search process
  • → Each node spreads a fraction of its activation evenly across its neighbors (fan effect!)

• Can use well-studied graph algorithms for the analysis of the network
  • connected components, cliques, …
  • distance metrics, shortest paths, …
  • topology analysis
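Spreading as a breadth-first process can be sketched as below. The toy network, the spreading fraction of 0.5 and the depth limit are assumptions made for this illustration. Real systems differ in how they damp and stop the spreading.

```python
from collections import deque

# Undirected toy network as adjacency lists (nodes and edges are made up).
graph = {
    "dog": ["animal", "bark", "pet"],
    "cat": ["animal", "pet"],
    "animal": ["dog", "cat"],
    "pet": ["dog", "cat"],
    "bark": ["dog"],
}


def spread(graph, source, initial=1.0, fraction=0.5, max_depth=2):
    """Breadth-first spreading: each node passes on `fraction` of what it received,
    split evenly among its neighbors, up to `max_depth` steps from the source."""
    activation = {node: 0.0 for node in graph}
    activation[source] = initial
    queue = deque([(source, initial, 0)])
    while queue:
        node, received, depth = queue.popleft()
        if depth == max_depth:
            continue
        share = received * fraction / len(graph[node])
        for neighbor in graph[node]:
            activation[neighbor] += share
            queue.append((neighbor, share, depth + 1))
    return activation


from_bark = spread(graph, "bark")
from_dog = spread(graph, "dog")
# "bark" has a single neighbor, so "dog" receives its whole spread share;
# "dog" has three neighbors, so "bark" receives only a third of it.
print(from_bark["dog"] > from_dog["bark"])  # True
```

Dividing the share by the number of neighbors is what produces the fan effect in this sketch.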


Partial Matching in a Semantic Network

• LTMc: memory model designed as a replacement for the ACT-R declarative module
  • Models memory as a semantic network; nodes represent concepts and their relations

• When doing retrieval, activate the nodes representing the request (triggering spreading)
  • "How many animals of each type did Moses bring to the Ark?"
  • Activate nodes ANIMAL, MOSES, ARK, QUANTITY

• The connected component with the highest overall activation is returned as the result
  • In the example, this cluster will probably contain the node TWO

• Allows partial matching, i.e. returning non-perfect matches to retrieval requests (it was Noah who built the ark!)
  • Called the Moses Illusion (Erickson and Mattson, 1981)


ConceptNet

• Created at MIT Media Lab

• Huge common sense database represented as semantic net

• Not developed by experts but using a crowdsourcing approach
  • Data is entered by users of a webpage
  • People play a "Game with a Purpose" (e.g. association games)
  • Data can later be validated and weighted by other judges

• Contains subjective associations

• Easily accessible using Python interfaces


Verbosity: A Game with a Purpose

Describer’s view

Guesser’s view


ConceptNet: Example

http://csc.media.mit.edu/node/49


Knowledge Database WordNet

• Lexical database of the English language in the form of a semantic network

• Developed since 1985 by George A. Miller at Princeton

• Main unit forming the nodes of the network: Synsets (groups of synonyms with a short description)

• Models semantic relations (mostly language-oriented) between synsets

• Contains more than 110,000 synsets


WordNet: Example graph of hypernyms


Information in WordNet

• Some relations are allowed at word level (e.g. antonym = of opposite meaning), but the majority is defined on synset level

• Examples of relations in WordNet:
  • Holonyms (part-of), e.g. "family" is a holonym of "mother"
  • Hypernyms (kind-of), e.g. "animal" is a hypernym of "dog"

• WordNet also contains short definitions in plain text for each term

• Also contains additional linguistic information, e.g. syntactic constraints on the use of certain words


Soundness of the WordNet Graph

• The WordNet ontology is represented such that more abstract generalizations of a word are further up in the ontology and take longer to retrieve
  • Poodle is-a Dog vs. Poodle is-a Animal

• This is in accordance with spreading applied to the WordNet graph

• Introduction of the concept of evocation, which measures how much one concept brings to mind another
  • Evocation creates a much denser network with weighted edges


Neural Knowledge Representation

• Encode information in the structure of a neural network
  • Train the network by presenting input patterns, i.e. by stimulating neurons
  • When presented with learned (or similar) input patterns, the network should recognize them
  • Information is not encoded explicitly but within the structure and state of the network

• Example: Hopfield Networks
  • Model of associative memory
  • Can retrieve a memorized pattern from partial input
  • Based on the Hebbian learning rule


Hopfield Nets

• Recurrent artificial neural networks
  • Each neuron is connected to every other neuron
  • Symmetric weights $w_{ij} = w_{ji}$, no self-loops ($w_{ii} = 0$)

• Discrete case: Neurons take the binary value -1 or 1 (can be seen as off and on states)

• All neurons can be both input and output

• The values of the neurons form the input vector of the next step:

  $s'_j(t) = \sum_i w_{ij}\, s_i(t)$

  $s_j(t+1) = \operatorname{sgn}(s'_j(t))$

[Figure: neuron j receives the inputs s_1, s_2, …, s_N through the weights w_1j, w_2j, …, w_Nj and computes s'_j]


Hopfield Nets as Associative Memory

• Human ability to retrieve information associated with an (incomplete) cue

• Hopfield Nets as content-addressable associative memory
  • Several different activation patterns can be learned in a network
  • For any input pattern, the network produces a similar stored pattern
  • Autoassociative memory: pattern completion of noisy or partial data
  • Can reliably store up to about 0.138 * #neurons different patterns

• Asynchronous network recall:
  1. Set the pattern as input to the neurons
  2. Pick a neuron randomly
  3. Update its state
  4. Go to 2 until the state does not change anymore

• Synchronous recall is also possible
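A small pure-Python sketch of Hebbian storage and asynchronous recall follows. Some details are assumptions: the 1/N weight scaling, the convention sgn(0) = +1, and visiting the neurons in random sweeps. The sweeps replace single random picks so that "state does not change" can be checked over all neurons.

```python
import random


def train(patterns):
    """Hebbian learning: w_ij = (1/N) * sum over patterns of p_i * p_j, with w_ii = 0."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, rng, max_sweeps=100):
    """Asynchronous recall: update one neuron at a time in random order
    until a full sweep changes nothing."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for j in rng.sample(range(n), n):
            field = sum(weights[i][j] * state[i] for i in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[j]:
                state[j] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train(stored)
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # first stored pattern with one flipped value
print(recall(weights, noisy, random.Random(0)) == stored[0])  # True
```

The two stored patterns are orthogonal, which keeps this tiny example well within the capacity of an 8-neuron network.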


Hopfield Nets as Associative Memory


Learning in Hopfield Nets

• Memorization of new information:
  • Activate neurons corresponding to features of the item
  • Increase the weight of edges between nodes of equal activation (mutual stimulation)
  • Decrease the weight of edges between nodes of different activation (mutual suppression)

[Figure: network storing the item "Mouse" with the feature nodes Small?, Flying?, Carnivore?, Mammal?, Swimming?; green = high positive weight, red = high negative weight]


Retrieval in Hopfield Nets

• Features of partial input stimulus are activated in network

• Update the activation of each node based on its neighbors
  • Each active node stimulates or suppresses activation of its neighbors
  • Each inactive node stimulates or suppresses activation of its neighbors

• Repeat this process based on the new activation values
  • Iterative process, finally converges to a stable state

[Figure: network with the feature nodes Small?, Flying?, Carnivore?, Mammal?, Swimming?; green = high positive weight, red = high negative weight]


Belief

• Up to now, knowledge was either (subjectively) false or true, i.e. part of the individual knowledge base or not
  • Fuzziness was part of the model concerning the activation value, not the truth of a piece of information
  • Not a realistic assumption

• Introduce belief: Degree to which some information is considered to be valid
  • Example: I estimate the probability that P ≠ NP to be 95% (I am "pretty sure", but there is room for doubt)

• Belief is subjective, depends on prior assumptions and experience or observations

• Need to find a formalism to model and manipulate belief


Probability according to Bayes

• Representation of belief as probability

• Probability according to Bayes: "Confidence in the personal assessment of an issue."
  • Can be different for different individuals with different background and experience

• Allows modeling the probability of non-stochastic and unique events
  • Example: P(student A passes the exam on cognitive modeling)
  • This is not possible in classic frequentist statistics, which is defined based on the frequency of events


Bayes‘ Theorem

• Important instrument: Bayes' Theorem

• Bayes' Theorem allows the combination of…
  • a-priori knowledge P(A) with…
  • information from a cue B to…
  • calculate the a-posteriori probability P(A|B)

• Allows combining a-priori assumptions about the world with observations of the world to calculate a belief
  • Belief is high if we assume the item to be true and see evidence for it
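Written out, Bayes' Theorem and a small worked example look as follows. The numbers are assumptions chosen only for illustration: a prior P(A) = 0.3 and a cue B that is seen with probability 0.8 if A is true and 0.2 otherwise.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad
P(B) = P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)

% Worked example with P(A) = 0.3, P(B \mid A) = 0.8, P(B \mid \neg A) = 0.2:
P(B) = 0.8 \cdot 0.3 + 0.2 \cdot 0.7 = 0.38, \qquad
P(A \mid B) = \frac{0.8 \cdot 0.3}{0.38} \approx 0.63
```

Observing the cue raises the belief in A from 0.3 to about 0.63, because the cue is much more likely when A is true.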


Bayesian Networks

• Want to model joint distribution of multiple variables

• A Bayesian Network is a directed acyclic graph:
  • Nodes = Random Variables
  • Arcs = Direct Causality
  • Each node contains a conditional probability distribution dependent on its parents in the graph

• Using Bayes‘ theorem, we can infer probabilities of some nodes given information on some of the others

[Figure: example network with the nodes family-out, bowel-problem, lights-on, dog-out, hear-bark]
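Inference in such a network can be sketched by brute-force enumeration of the joint distribution. The node names come from the slide. The arcs and all probability values are assumptions that follow the well-known "family-out" textbook example: family-out → lights-on, family-out and bowel-problem → dog-out, dog-out → hear-bark.

```python
from itertools import product

# Structure and numbers are illustrative assumptions; they are not given on the slide.
P_FAMILY_OUT = 0.15
P_BOWEL_PROBLEM = 0.01
P_LIGHTS_ON = {True: 0.6, False: 0.05}                   # given family-out
P_DOG_OUT = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.97, (False, False): 0.30}  # given (family-out, bowel-problem)
P_HEAR_BARK = {True: 0.7, False: 0.01}                   # given dog-out


def bernoulli(p_true, value):
    """Probability of a binary variable taking `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(fo, bp, lo, do, hb):
    """Joint probability as the product of each node's conditional distribution."""
    return (bernoulli(P_FAMILY_OUT, fo) * bernoulli(P_BOWEL_PROBLEM, bp)
            * bernoulli(P_LIGHTS_ON[fo], lo) * bernoulli(P_DOG_OUT[(fo, bp)], do)
            * bernoulli(P_HEAR_BARK[do], hb))


def posterior_family_out(hear_bark):
    """P(family-out | hear-bark) by summing the joint over all hidden variables."""
    numerator = denominator = 0.0
    for fo, bp, lo, do in product([True, False], repeat=4):
        p = joint(fo, bp, lo, do, hear_bark)
        denominator += p
        if fo:
            numerator += p
    return numerator / denominator


print(posterior_family_out(True) > P_FAMILY_OUT)    # True: barking raises the belief
print(posterior_family_out(False) < P_FAMILY_OUT)   # True: silence lowers it
```

Enumeration is exponential in the number of variables. It is used here only because the network is tiny.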


The Bayesian Brain

• Bayesian coding hypothesis: The brain represents information probabilistically
  • Coding and computing with probability density functions

• Not limited to memory, targets noisy perception, planning and action execution

• Instead of deterministically modeling a concept X, model its probability density function p(X)

• Natural and expressive representation of uncertainty

• May present a generic framework for modeling cognition

• Allows seamless integration of models with statistical machine learning techniques
