Memory Modeling & Knowledge Representation - KIT
Cognitive Modeling: Memory Modeling & Knowledge Representation
Memory Modeling & Knowledge Representation
Felix Putze
16.5.2013
Lecture „Cognitive Modeling“
SS 2013
Structure of Lecture
• Introduction and Motivation
• Memory Modeling
• Knowledge Representation
Why do memory modeling?
• Any process that spans a period of time has to cope with limited human memory capacity
  • Memory capacity is a robust indicator of general intelligence
• Memory access is neither guaranteed to succeed nor instantaneous
  • Modeling memory performance is relevant for predicting errors
• For human-machine interaction: the user has a limited capability of remembering and recalling
  • Not all presented information is stored or available at all times
• Interaction systems should know what is on the user's mind and what is not
  • Which information can the system implicitly refer to?
Requests to a Memory Model
• There are a number of questions a memory model should be able to answer:
  • How is memory organized?
  • What items are currently active in the human's mind?
  • How is new information integrated?
  • Is a certain bit of information retrievable?
  • What is associated with a certain input?
Types of Memory
• Squire (1992) distinguishes several distinct types of memory and associates them with different parts of the brain:
  • Declarative memory: explicit and conscious recollection of…
    • facts (semantic memory, e.g. “France is a country in Europe.”)
    • events (episodic memory, e.g. “Last summer, I spent my holidays in France.”)
  • Procedural memory: implicitly learned skills (e.g. riding a bicycle)
  • Priming: automated associations caused by frequent repetition
  • Conditioning: automatic stimulus-reflex pairs (e.g. Pavlov's dogs)
• In this lecture, we will focus on semantic memory
Short-term and long-term Memory
• Short-term memory: storage for a limited number of items
  • Small capacity
  • Limited storage duration (seconds), decay
  • Longer storage duration requires rehearsal, i.e. periodic repetition
  • Acoustically and visually coded (e.g. multiple phonetically similar items are hard to keep in memory)
• Long-term memory:
  • Nearly unlimited capacity
  • Items can last for years without rehearsal
  • Items are mostly coded and retrieved semantically; however, there is a phonetic component (tip-of-the-tongue effect)
• Other types of memory: sensory memory, working memory
• The existence of distinct memory systems in the brain is controversial; experiments support both positions
The magic number 7 (+/- 2)
• Miller (1956): determined the capacity of short-term memory to be about 7 items
  • Estimated by having people recall sequences of digits or words
  • Performance is very good for around five to six items
  • Performance degrades rapidly for more items
• Miller’s conclusion: memory span is not a function of the encoding length in bits, but of the number of elements
• Later, Miller acknowledged that the “magic number” was a coincidence and heavily context-dependent
Chunking and Mnemonics
• How can people remember a longer phone number if their short-term memory is limited to 7 (or fewer) elements?
  • Most people do not remember the number 0123456789 as 0-1-2-3-4-5-6-7-8-9 but as 01-23-45-67-89 (or similar)
  • This grouping of items into a smaller number of larger units is called chunking
  • This is also a question of skill: a trained person can chunk a stream of binary digits into larger blocks, convert them to decimal numbers and remember those
• There are many other mnemonic techniques:
  • Make use of linguistic or phonetic similarities
  • Construct images or stories to connect multiple items into one (e.g. „man“, „horse“, „fish“ → a man riding on a horse hunting a fish)
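The chunking idea can be illustrated with a minimal sketch. The function names `chunk` and `binary_to_decimal_chunks` and the block sizes are assumptions made for this illustration only; the point is merely that the number of items to remember shrinks while the information stays the same.

```python
def chunk(digits: str, size: int = 2) -> list[str]:
    """Split a digit string into consecutive chunks of at most `size` digits."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def binary_to_decimal_chunks(bits: str, size: int = 3) -> list[int]:
    """Group a binary stream into blocks and recode each block as one decimal number."""
    return [int(block, 2) for block in chunk(bits, size)]


number = "0123456789"
pairs = chunk(number)                       # ten single digits become five pairs
recoded = binary_to_decimal_chunks("101000111001")  # twelve bits become four numbers

print(len(number), "->", len(pairs), "items:", pairs)
print(recoded)
```

With these (hypothetical) block sizes, the phone number is held as five items instead of ten, and the binary stream as four items instead of twelve.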
Controversy regarding memory limitations
• There are a lot of conflicting viewpoints on memory limitation:
  • A general limit exists but is lower than seven (≈ 4 without the possibility of chunking or mnemonic techniques)
  • The acoustic encoding of items in short-term memory influences this capacity:
    • For long words (which take longer to speak), only shorter sequences can be remembered
    • Memory span decreases when remembering phonetically similar words
  • There are specialized parts of short-term memory with separate capacity limits
  • There is no limitation of short-term memory at all (observed limitations are an effect of general scheduling conflicts)
  • There is no special faculty for short-term memory at all, only an attention limitation on generic memory
Influence of Emotion on Memory
• Emotion-congruent information is encoded better
  • In a happy mood, we encode more „happy“ facts than „sad“ ones
• With high arousal, central information is encoded better
  • …while peripheral information is encoded worse
• Yerkes-Dodson law: the relation between arousal and performance is described as an „inverted U-curve“
• Consequence: do not study memory as an isolated concept!
Structure of Lecture
• Introduction and Motivation
• Memory Modeling
• Knowledge Representation
Atkinson and Shiffrin's Memory Model
• Incoming information is extracted from parts of the sensory input, initially stored in STM and later transferred to LTM or displaced → linear process
• Monolithic modeling (the same model applies to every type of information)
Components of Atkinson and Shiffrin's Model
• Sensory memory:
  • Specialized for different sensory inputs (e.g. visual, auditory, …)
  • Lasts for a very short time (milliseconds for visual, a few seconds for auditory information)
  • Contains raw data, used to select relevant information (partial report)
  • Decoupled from other components (localized, unconscious)
• Short-term memory:
  • Keeps currently relevant information
  • Duration of 15-30 seconds (unless rehearsed)
  • Bottleneck between raw data from the senses and unlimited long-term memory
• Long-term memory:
  • Information which is rehearsed often enough is stored here
Baddeley‘s Memory Model
• Model of short-term (or working) memory
  • Three slave systems for different types of information
  • Controlled by a central executive
Baddeley‘s Memory Model
• The phonological loop consists of two main parts:
  • Phonological store: contains about 2 seconds of audio information
  • Phonological rehearsal: performs periodic rehearsal to keep information available („inner voice“)
  • Evidence: suppression of rehearsal impairs memory
• The visuo-spatial sketchpad is divided into two components:
  • Visual cache: forms, color
  • Inner scribe: spatial information, movement (planning)
• Visually presented information can also be transferred to the phonological loop by verbalization
• The separation between the phonological and the visual system explains differences in dual-tasking: combining one acoustic and one visual task is easier than combining two tasks of the same kind
Baddeley‘s Memory Model
• Central executive:
  • attention
  • retrieval strategies
  • episode forming
• Episodic buffer:
  • Added in 2000 as a third slave system
  • Contains concrete, multimodal “episodes”
  • Introduced to explain memory which is not limited to one channel
  • Also explains the ability to memorize a longer sequence of words which form a “story”
  • Still less well defined than the other two subsystems
Cowan‘s memory model
• No distinction between long-term and short-term memory
• No division in modality-specific components
• Short-term memory is implicitly represented as activated items in memory
• Activation decays over time unless it is refreshed
• A subset of the activated items forms the focus of attention
• Theoretical foundation of the ACT-R memory model
Decay and Rehearsal
• All presented models maintain a set of active items in short-term memory
• How is the capacity of short-term memory limited?
  • Most common explanation: temporal effects
  • Decay of information: unconditional fading-out of activation
  • Temporal distinctiveness: memory traces become less distinguishable over time
• How to keep information in working memory?
  • Rehearsal, i.e. self-induced repetition of information (overt or covert)
  • Represented in many models (Atkinson and Shiffrin, Baddeley)
[Figure: items along a time axis]
Primacy and Recency Effects
• Consider serial memory tasks, i.e. remembering information presented in linear order
• Recency effect: items at the end of the list are remembered better than average
  • Decay has not yet taken place
  • Items are still highly active in STM
• Primacy effect: items at the beginning of the list are remembered better than average
  • Early on, more resources are available for encoding information in LTM
  • Can be rehearsed more often?
• Note that this effect on memory also influences which information is retrieved for decision making
  • The presentation order of arguments induces a bias on their importance
Problems of Decay and Rehearsal Models
• Experiments show that time is not the only controlling variable of forgetting → what happens during this time is relevant! (Lewandowsky, Oberauer & Brown, 2009)
  • A single distractor stimulus before recall strongly impairs performance
  • Waiting additional time before recall does not lead to a comparable loss
  • Amnesia patients show much better recall after one hour when placed in interference-minimizing conditions (e.g. a quiet room)
• Rehearsal may not play a key role in retaining information
  • Even when rehearsal is suppressed, items are not lost over time
  • Items which are marked as irrelevant are still retrievable even though they should not be rehearsed anymore
  • Modeling the primacy effect requires rapid rehearsal of early items, which implies neglect of more recent ones
Interference-based Forgetting
• Several approaches for modeling forgetting by interference
  • Process-based interference: processing activity up to 500 ms after item presentation draws on an attentional bottleneck and disrupts consolidation
  • Interference by feature overwriting: when two items share certain features, only one may retain them; the others lose them
  • Interference by superposition: items are superimposed in a composite memory structure → the representation blurs with more items
  • Interference by cue overload: too many items are associated with a given retrieval cue
• Consider activation patterns in a neural network as item representations
  • The number of activation patterns is finite
  • Crowded memory = less distinction between patterns
Structure of Lecture
• Introduction and Motivation
• Memory Modeling
• Knowledge Representation
First Order Logic
• First-order logic is a traditional and still widely used knowledge representation scheme
• Express knowledge in the form of logical clauses
  • „All humans are mortal.“ = $\forall x: \mathrm{human}(x) \rightarrow \mathrm{mortal}(x)$
• First-order logic is typically used for modeling logical, conscious reasoning and deduction
  • Given certain knowledge, can the user arrive at a certain conclusion?
• Limited to deductive processes
  • No representation of inductive inference (“learning from examples”)
  • No easy representation of “fuzzy”, associative processes
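As a rough illustration of deduction over such clauses, the following sketch applies rules of the restricted form "for all x: premise(x) → conclusion(x)" to a set of facts until nothing new can be derived. The individual `socrates`, the function name `deduce` and the data layout are assumptions for this example; full first-order logic needs a far more general inference procedure.

```python
def deduce(facts, rules):
    """Forward chaining for rules of the form: for all x, premise(x) -> conclusion(x).

    facts: set of (predicate, individual) pairs
    rules: list of (premise, conclusion) predicate-name pairs
    """
    known = set(facts)
    changed = True
    while changed:                      # repeat until a pass derives nothing new
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(known):
                derived = (conclusion, individual)
                if predicate == premise and derived not in known:
                    known.add(derived)
                    changed = True
    return known


facts = {("human", "socrates")}         # hypothetical individual
rules = [("human", "mortal")]           # "All humans are mortal."
knowledge = deduce(facts, rules)
print(("mortal", "socrates") in knowledge)
```

Note that the procedure only ever derives what the rules entail: it cannot learn a new rule from examples, which is the limitation named above.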
Cyc: A Database of Human Knowledge
• Under development since 1984 by the company Cycorp; a large collection of everyday knowledge („water is wet“)
• Currently contains ~500,000 items and ~5,000,000 facts
• A free version, OpenCyc, exists (a subset of Cyc)
• Developed for language generation and language understanding
  • An inference engine is able to deduce facts from the knowledge base
• Cyc uses higher-order logic to increase its expressiveness:
  • A micro-theory describes the context in which a statement is valid
  • For example, the statement „vampires fear garlic“ is (only) true in the context „mythology“
  • Introduces a modal operator: isTrue(context, assertion)
  • Beyond first-order logic
Cyc Example
• (isa BurningOfPapalBull SocialGathering)
  • The burning of the papal bull is an instance of „SocialGathering“
• (relationInstanceExistsMin BurningOfPapalBull attendees UniversityStudents 40)
  • At least 40 students attended the event
• (isa BurningOfPapalBull-Document CombustionProcess)
• (properSubEvent BurningOfPapalBull-Document BurningOfPapalBull)
  • The actual burning event (as part of the social event)
• (relationInstanceExists inputsDestroyed BurningOfPapalBull-Document (CopyOfConceptualWorkFn PapalBull-ExcommunicationCW))
  • The thing destroyed is a member of the functionally defined collection „copies of the conceptual work PapalBull-ExcommunicationCW“
Frames
• Developed by Marvin Minsky in 1975
• A frame is a prototype of a certain context and bundles relevant attributes and relations
  • Related to the schema theory of cognitive psychology
• A frame consists of a name and several attributes („slots“), which can contain…
  • atomic values,
  • references to other frames, or
  • nothing (to represent partial knowledge)
• For each attribute, a frame can define…
  • a range of potentially allowed values
  • default values to represent the standard case
• New input is matched against the currently “active” frame, which depends on the context
Example of Frame-based Modeling
• Frames are similar to the CS concepts of UML class/object diagrams or entity-relationship models
Class:
[Course
  Title: String
  NumStudents: Positive Integer
  Teacher: Person (i.e. another frame)
]
Instance:
[Course
  Title: „Grundbegriffe der Informatik“
  NumStudents: <empty>
  Teacher: [Person
    FirstName: „Tanja“
    FamilyName: „Schultz“
  ]
]
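The Course frame above could be written down, for instance, with Python dataclasses. This is only one possible encoding, and the choice of `None` for an empty slot and a `ValueError` for the range check are assumptions of this sketch, not part of frame theory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first_name: Optional[str] = None
    family_name: Optional[str] = None


@dataclass
class Course:
    title: Optional[str] = None
    num_students: Optional[int] = None   # None stands for <empty> (partial knowledge)
    teacher: Optional[Person] = None     # slot that refers to another frame

    def __post_init__(self):
        # Range of allowed values for the slot: a positive integer.
        if self.num_students is not None and self.num_students <= 0:
            raise ValueError("num_students must be a positive integer")


# The instance from the example: NumStudents is left empty.
course = Course(
    title="Grundbegriffe der Informatik",
    teacher=Person(first_name="Tanja", family_name="Schultz"),
)
print(course.num_students)          # the empty slot
print(course.teacher.family_name)   # value reached through the nested frame
```

Default values for the "standard case" map naturally onto field defaults; here all defaults are simply "unknown".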
Memory Modeling in ACT-R
• The main building block of knowledge representation in ACT-R (the chunk) is essentially a frame
• Semantic memory is handled by the declarative module
  • The declarative module makes no distinction between long-term and short-term memory (cf. Cowan's model)
  • Each item is associated with an activation value
Matching and Memory Retrieval
• How does retrieval work in ACT-R?
  • A partially filled chunk is put into the retrieval buffer of the declarative module, e.g. (sum-fact arg1: 5 arg2: 2 result: <empty>)
  • All chunks stored in declarative memory are checked for whether they match this chunk in type and in the filled slots, e.g. (sum-fact arg1: 5 arg2: 2 result: 7) matches, but (sum-fact arg1: 5 arg2: 3 result: 8) and (mult-fact arg1: 5 arg2: 2 result: 10) do not
  • All matching chunks are potential retrieval results
    • If multiple matches are found, only the one with the highest activation is (potentially) returned
  • From the result's activation, a retrieval probability is calculated and the chunk is returned on success
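The retrieval steps can be sketched as follows. The dictionary layout of chunks, the activation values and the function names are made up for this illustration. The logistic mapping from activation to retrieval probability, with a threshold and a noise parameter, is the form commonly given for ACT-R; it is not stated on the slide and should be read as an assumption here, as should the parameter values.

```python
import math
import random


def matches(request, chunk):
    """A chunk matches if it has the same type and agrees on every filled slot."""
    if request["type"] != chunk["type"]:
        return False
    return all(chunk["slots"].get(name) == value
               for name, value in request["slots"].items()
               if value is not None)            # None marks an <empty> slot


def retrieval_probability(activation, threshold=0.0, noise=0.4):
    """Assumed logistic mapping from activation to probability of success."""
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise))


def retrieve(request, memory, rng=random):
    """Return the most active matching chunk, or None on retrieval failure."""
    candidates = [c for c in memory if matches(request, c)]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["activation"])
    if rng.random() < retrieval_probability(best["activation"]):
        return best
    return None


memory = [
    {"type": "sum-fact", "slots": {"arg1": 5, "arg2": 2, "result": 7}, "activation": 1.2},
    {"type": "sum-fact", "slots": {"arg1": 5, "arg2": 3, "result": 8}, "activation": 2.0},
    {"type": "mult-fact", "slots": {"arg1": 5, "arg2": 2, "result": 10}, "activation": 1.5},
]
request = {"type": "sum-fact", "slots": {"arg1": 5, "arg2": 2, "result": None}}

result = retrieve(request, memory)
print(result["slots"]["result"] if result else "retrieval failed")
```

Because retrieval is probabilistic, repeated calls can occasionally fail even though a matching chunk exists, which is exactly the behavior a memory model needs in order to predict errors.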
Activation
• Activation of a chunk is the sum of two components (plus noise):
• Base activation: depends on the frequency and recency of stimulations of a chunk:
  $B_i = \log\left(\sum_{j=1}^{n} \frac{1}{\sqrt{t_j}}\right)$
  where $t_j$ is the age of the $j$th activation of chunk $i$
• Spreading activation (associative* activation):
  $S_i = \sum_{j=1}^{n} \frac{1}{n} \cdot \begin{cases} 0, & \text{if } i \text{ and } j \text{ are not associated} \\ S - \log(\mathrm{fan}_j), & \text{else} \end{cases}$
  where the sum runs over all chunks associated with the content of the goal buffer and $\mathrm{fan}_j$ is the number of chunks of which $j$ is a slot value
• * chunk j is associated with chunk i if j is an attribute of a slot in i
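A small numeric sketch may help to read the two formulas. The decay exponent 0.5 corresponds to the square root in the base activation; the constant S (set to 2.0 here), the fan values and the chunk contents are assumptions chosen only for illustration, and the noise term is left out.

```python
import math


def base_activation(ages, decay=0.5):
    """B_i = log(sum_j t_j ** -decay); ages t_j are the times since past activations."""
    return math.log(sum(t ** -decay for t in ages))


def spreading_activation(slot_values, sources, fan, max_strength=2.0):
    """S_i = sum over goal-buffer chunks j of (1/n) * (S - log(fan_j)) if associated, else 0."""
    n = len(sources)
    if n == 0:
        return 0.0
    total = 0.0
    for j in sources:
        if j in slot_values:                     # j is a slot value of chunk i
            total += (max_strength - math.log(fan[j])) / n
    return total


# Frequent and recent use gives a higher base activation than a single old use.
recent_and_frequent = base_activation([1.0, 4.0])
old_and_rare = base_activation([100.0])

# Hypothetical fan values: "female" occurs in three chunks, "chancellor" in one.
fan = {"female": 3, "chancellor": 1}
sources = ["chancellor", "female"]               # content of the goal buffer
teacher = {"teacher", "female"}                  # slot values of one chunk
chancellor = {"chancellor", "female"}            # slot values of another chunk

s_teacher = spreading_activation(teacher, sources, fan)
s_chancellor = spreading_activation(chancellor, sources, fan)
print(recent_and_frequent > old_and_rare, s_chancellor > s_teacher)
```

The chunk that shares the rare value (small fan) with the goal buffer receives more spreading activation than the chunk that only shares the common one.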
Spreading Activation: Example
Person 1 Job: Student Sex: male
Person 2 Job: Teacher Sex: female
Person 3 Job: Chancellor Sex: female
Person 4 Job: Student Sex: female
Person 5 Job: Farmer Sex: male
Female: chunk is associated with many items → large fan → weak spreading
Chancellor: chunk is associated with few items → small fan → strong spreading
Validation of Fan Effect
• Experiment lets participants learn the facts in the left column. When given the probes, they have to identify those which occurred in the training set (target probes)
• It is easier to identify those sentences for which at least one component (person or location) was rare in the training corpus
Semantic Networks
• Goal of knowledge representation is the modeling of facts and their relationships
• A natural formalism is a graph whose nodes represent facts and whose edges represent relationships
• Different forms of networks exist:
  • Are edges themselves semantically annotated?
  • Are edges directed?
  • Are edges weighted?
[Figure: example graphs with the nodes London and Paris (edge labels: north-of, 1, 2)]
Examples of semantic networks: Hierarchies
• Focus on „is-a“ relations
• Example from Porphyry, 300 AD:
Examples of semantic networks: KL-ONE
• KL-ONE: Developed in 1979 by Brachman
• Knowledge representation framework for AI
Examples of semantic networks: MultiNet
• Multi-Layer architecture, focus on language understanding
Characteristics of Semantic Networks
• No predefined ontology and attributes
  • Different types and levels of information can be combined
  • The network can contain meta-knowledge and self-description
  • Additional effort to decode semantics
• Natural tool for the representation of associations
  • Spreading can be modeled as a breadth-first search process
  → Each node spreads a fraction of its activation evenly across its neighbors (fan effect!)
• Can use well-studied graph algorithms for the analysis of the network
  • connected components, cliques, …
  • distance metrics, shortest paths, …
  • topology analysis
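Spreading as a breadth-first process could look roughly like this. The toy graph, the fraction of 0.5 and the depth limit are assumptions for the sketch; in this simplified version a node keeps its own activation and only passes copies of a fraction onward.

```python
from collections import deque


def spread(graph, sources, fraction=0.5, max_depth=2):
    """Breadth-first spreading: each reached node passes `fraction` of the
    activation it received on, split evenly across its neighbors."""
    activation = {node: 0.0 for node in graph}
    queue = deque()
    for node in sources:
        activation[node] += 1.0
        queue.append((node, 1.0, 0))
    while queue:
        node, amount, depth = queue.popleft()
        neighbors = graph[node]
        if depth == max_depth or not neighbors:
            continue                              # depth limit stops the spreading
        share = amount * fraction / len(neighbors)  # large fan -> small share
        for neighbor in neighbors:
            activation[neighbor] += share
            queue.append((neighbor, share, depth + 1))
    return activation


# Hypothetical undirected network given as adjacency lists.
graph = {
    "dog": ["animal", "bark"],
    "animal": ["dog", "cat", "bird", "fish"],
    "bark": ["dog"],
    "cat": ["animal"],
    "bird": ["animal"],
    "fish": ["animal"],
}
result = spread(graph, ["dog"])
print(result)
```

In the toy graph, "animal" has four neighbors and therefore passes on only a small share to each, while "bark" has a single neighbor and passes its whole share back: the fan effect in miniature.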
Partial Matching in a Semantic Network
• LTMc: memory model designed as a replacement for the ACT-R declarative module
  • Models memory as a semantic network; nodes represent concepts and their relations
• When doing retrieval, activate the nodes representing the request (triggering spreading)
  • „How many animals of each type did Moses bring to the Ark?“
  • Activate nodes ANIMAL, MOSES, ARK, QUANTITY
• The connected component with the highest overall activation is returned as the result
  • In the example, this cluster will probably contain the node TWO
• Allows partial matching, i.e. returning non-perfect matches to retrieval requests (it was Noah who built the ark!)
  • Called the Moses illusion (Erickson and Mattson, 1981)
ConceptNet
• Created at MIT Media Lab
• Huge common sense database represented as semantic net
• Not developed by experts but using a crowdsourcing approach
  • Data is entered by users of a webpage
  • People play a “Game with a Purpose” (e.g. association games)
  • Data can later be validated and weighted by other judges
• Contains subjective associations
• Easily accessible using Python interfaces
Verbosity: A Game with a Purpose
[Figure: screenshots of the describer’s view and the guesser’s view]
ConceptNet: Example
Knowledge Database WordNet
• Lexical database of the English language in the form of a semantic network
• Developed since 1985 by George A. Miller at Princeton
• Main unit forming the nodes of the network: synsets (groups of synonyms with a short description)
• Models semantic relations (mostly language-oriented) between synsets
• Contains more than 110,000 synsets
WordNet: Example graph of hypernyms
Information in WordNet
• Some relations are defined at the word level (e.g. antonym = of opposite meaning), but the majority are defined at the synset level
• Examples of relations in WordNet:
  • Holonyms (part-of), e.g. „family“ is a holonym of „mother“
  • Hypernyms (kind-of), e.g. „animal“ is a hypernym of „dog“
• WordNet also contains short definitions in plain text for each term
• Also contains additional linguistic information, e.g. syntactic constraints on the use of certain words
Soundness of the WordNet Graph
• The WordNet ontology is organized such that more abstract generalizations of a word are further up in the ontology and take longer to retrieve
  • Poodle is-a Dog vs. Poodle is-a Animal
• This is in accordance with spreading activation applied to the WordNet graph
• Introduction of the concept of evocation, which measures how much one concept brings to mind another
  • Evocation creates a much denser network with weighted edges
Neural Knowledge Representation
• Encode information in the structure of a neural network
  • Train the network by presenting input patterns, i.e. by stimulating neurons
  • When stimulated with learned (or similar) input patterns, the network should recognize them
  • Information is not encoded explicitly but within the structure and state of the network
• Example: Hopfield networks
  • Model of associative memory
  • Can retrieve a memorized pattern from partial input
  • Based on the Hebbian learning rule
Hopfield Nets
• Recurrent artificial neural networks
  • Each neuron is connected to every other neuron
  • Symmetric weights $w_{ij} = w_{ji}$, no self-loops: $w_{ii} = 0$
  • Discrete case: neurons take the binary value -1 or 1 (can be seen as off and on states)
  • All neurons can be both input and output
  • The values of the neurons form the input vector for the next step
• Update of neuron $j$ from the states $s_1, s_2, \ldots, s_N$ via the weights $w_{1j}, w_{2j}, \ldots, w_{Nj}$:
  $s'_j = \sum_i w_{ij} s_i$
  $s_j(t+1) = \mathrm{sgn}(s'_j(t))$
Hopfield Nets as Associative Memory
• Human ability to retrieve information associated with an (incomplete) cue
• Hopfield nets as content-addressable associative memory
  • Several different activation patterns can be learned in a network
  • For any input pattern, produces a similar stored pattern
  • Autoassociative memory: pattern completion of noisy or partial data
  • Can reliably store up to about 0.138 * #neurons different patterns
• Asynchronous network recall
  1. Set the pattern as input to the neurons
  2. Pick a neuron randomly
  3. Update its state
  4. Go to 2 until the state does not change
• Synchronous recall is also possible
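A compact sketch of storage and asynchronous recall in plain Python. The Hebbian weight rule with a 1/N scaling, the convention sgn(0) = 1 and the stopping criterion (a full random-order sweep without any change) are assumptions of this sketch; the slide's "pick a neuron randomly" loop is approximated by sweeping over all neurons in random order.

```python
import random


def train(patterns):
    """Hebbian storage: w_ij grows where units i and j agree, shrinks where they differ."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                       # no self-loops: w_ii stays 0
                    w[i][j] += p[i] * p[j] / n
    return w


def sgn(x):
    return 1 if x >= 0 else -1                   # assumed convention for x == 0


def recall(w, cue, rng, max_sweeps=100):
    """Asynchronous recall: update neurons one at a time until nothing changes."""
    s = list(cue)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for j in rng.sample(range(n), n):        # visit every neuron in random order
            new = sgn(sum(w[i][j] * s[i] for i in range(n)))
            if new != s[j]:
                s[j] = new
                changed = True
        if not changed:                          # stable state reached
            break
    return s


stored = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([stored])

noisy = list(stored)
noisy[0] = -noisy[0]                             # corrupt one unit of the cue
print(recall(w, noisy, random.Random(0)) == stored)
```

With a single stored pattern and one corrupted unit, the remaining units outvote the error, so the network settles back into the stored pattern regardless of the update order.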
Learning in Hopfield Nets
• Memorization of new information:
  • Activate the neurons corresponding to the features of the item
  • Increase the weight of edges between nodes of equal activation (mutual stimulation)
  • Decrease the weight of edges between nodes of different activation (mutual suppression)
[Figure: network storing the item „Mouse“ with the feature nodes Small?, Flying?, Carnivore?, Mammal?, Swimming?; green = high positive weight, red = high negative weight]
Retrieval in Hopfield Nets
• Features of the partial input stimulus are activated in the network
• Update the activation of each node based on its neighbors
  • Each active node stimulates neighbors connected by a positive weight and suppresses neighbors connected by a negative weight
  • Each inactive node has the opposite effect on its neighbors
• Repeat this process based on the new activation values
  • Iterative process, finally converges to a stable state
[Figure: feature nodes Small?, Flying?, Carnivore?, Mammal?, Swimming?; green = high positive weight, red = high negative weight]
Belief
• Up to now, knowledge was either (subjectively) false or true, i.e. part of the individual knowledge base or not
  • Fuzziness was part of the model concerning the activation value, not the truth of a piece of information
  • Not a realistic assumption
• Introduce belief: the degree to which some information is considered to be valid
  • Example: I estimate the probability that P ≠ NP to be 95% (I am “pretty sure”, but there is room for doubt)
• Belief is subjective and depends on prior assumptions and on experience or observations
• Need to find a formalism to model and manipulate belief
Probability according to Bayes
• Representation of belief as probability
• Probability according to Bayes: „Confidence in the personal assessment of an issue.“
  • Can be different for different individuals with different background and experience
  • Allows modeling the probability of non-stochastic and unique events
  • Example: P(student A passes the exam on cognitive modeling)
  • This is not possible in classic frequentist statistics, which defines probability based on the frequency of events
Bayes‘ Theorem
• Important instrument: Bayes‘ theorem
• Bayes‘ theorem allows the combination of…
  • a-priori knowledge P(A) with…
  • information from a cue B to…
  • calculate the a-posteriori probability P(A|B)
• Allows combining a-priori assumptions about the world with observations of the world to calculate a belief
  • Belief is high if we assume the item to be true and see evidence for it
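Written out, Bayes' theorem reads P(A|B) = P(B|A) · P(A) / P(B). A small worked example follows; all numbers and the event descriptions are invented for illustration, and the function name `posterior` is an assumption of this sketch.

```python
def posterior(prior: float, likelihood: float, likelihood_given_not: float) -> float:
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) expanded over A and not-A."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers:
#   A = "student passes the exam",         prior P(A)  = 0.7
#   B = "student attended every lecture",  P(B|A)      = 0.9
#                                          P(B|not A)  = 0.5
p = posterior(prior=0.7, likelihood=0.9, likelihood_given_not=0.5)
print(round(p, 3))
```

Here the cue raises the belief from the prior of 0.7 to roughly 0.81, because attending every lecture is more likely among students who pass than among those who do not.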
Bayesian Networks
• Want to model the joint distribution of multiple variables
• A Bayesian network is a directed acyclic graph:
  • Nodes = random variables
  • Arcs = direct causality
  • Each node contains a conditional probability distribution dependent on its parents in the graph
• Using Bayes‘ theorem, we can infer probabilities of some nodes given information on some of the others
[Figure: example network with the nodes family-out, bowel-problem, lights-on, dog-out and hear-bark]
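Inference in such a network can be sketched by brute-force enumeration of the joint distribution. The arcs are not recoverable from the transcript, so the structure used here (family-out → lights-on, family-out → dog-out, bowel-problem → dog-out, dog-out → hear-bark) is assumed from the usual textbook version of this example, and the probability values are illustrative assumptions, not taken from the lecture.

```python
from itertools import product

# Illustrative conditional probability tables (assumed values).
P_FO = 0.15                                       # P(family-out)
P_BP = 0.01                                       # P(bowel-problem)
P_LO = {True: 0.6, False: 0.05}                   # P(lights-on | family-out)
P_DO = {(True, True): 0.99, (True, False): 0.90,  # P(dog-out | family-out, bowel-problem)
        (False, True): 0.97, (False, False): 0.30}
P_HB = {True: 0.7, False: 0.01}                   # P(hear-bark | dog-out)

NAMES = ("fo", "bp", "lo", "do", "hb")


def bernoulli(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(fo, bp, lo, do, hb):
    """Joint probability: product of each node's probability given its parents."""
    return (bernoulli(P_FO, fo) * bernoulli(P_BP, bp) *
            bernoulli(P_LO[fo], lo) * bernoulli(P_DO[(fo, bp)], do) *
            bernoulli(P_HB[do], hb))


def query(target, evidence):
    """P(target = True | evidence), summing the joint over all consistent worlds."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


print(round(query("fo", {}), 3))                          # belief without evidence
print(round(query("fo", {"lo": True}), 3))                # lights are on
print(round(query("fo", {"lo": True, "hb": False}), 3))   # lights on, no barking heard
```

Enumeration is exponential in the number of variables and only works for tiny networks like this one; it is used here because it makes the link between the conditional tables and the resulting belief easy to follow.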
The Bayesian Brain
• Bayesian coding hypothesis: the brain represents information probabilistically
  • Coding and computing with probability density functions
  • Not limited to memory; targets noisy perception, planning and action execution
• Instead of deterministically modeling a concept X, model its probability density function p(X)
• Natural and expressive representation of uncertainty
• May present a generic framework for modeling cognition
• Allows seamless integration of models with statistical machine learning techniques