Principled Probabilistic Inference and Interactive Activation


Page 1: Principled Probabilistic Inference and Interactive Activation

Principled Probabilistic Inference and Interactive Activation

Psych209, January 25, 2013

Page 2: Principled Probabilistic Inference and Interactive Activation

A Problem For the Interactive Activation Model

• Data from many experiments give rise to a pattern corresponding to ‘logistic additivity’

• And we expect such a pattern from a Bayesian point of view.

• Unfortunately, the original interactive activation model does not exhibit this pattern.

• Does this mean that the interactive activation model is fundamentally wrong – i.e., that processing is strictly feedforward (as Massaro believed)?

• If not, is there a principled basis for understanding interactive activation as principled probabilistic inference?

Page 3: Principled Probabilistic Inference and Interactive Activation

Joint Effect of Context and Stimulus Information in Phoneme Identification (/l/ or /r/)

From Massaro & Cohen (1991)

Page 4: Principled Probabilistic Inference and Interactive Activation

Massaro’s Model

• Joint effects of context and stimulus obey the fuzzy logical model of perception:

p(r|Sij) = ti cj / (ti cj + (1 − ti)(1 − cj))

• Here ti is the stimulus support for r given input i, and cj is the contextual support for r given context j.

• Massaro sees this model as having a strictly feed-forward organization:

[Flow diagram: Evaluate stimulus and Evaluate context feed into Integration, which feeds into Decision.]

Page 5: Principled Probabilistic Inference and Interactive Activation

• Massaro’s model implies ‘logistic additivity’:

logit(pij) = log(pij/(1 − pij)) = log(ti/(1 − ti)) + log(cj/(1 − cj))

The pij on this graph corresponds to the p(r|Sij) on the preceding slide

[Graph: logit(pij) plotted for stimuli ranging from L-like to R-like. Different lines refer to different context conditions: r means ‘favors r’, l means ‘favors l’, n means ‘neutral’.]
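A quick numeric check of this identity, using made-up support values t and c (nothing here comes from Massaro & Cohen's data), confirms that the FLMP combination rule gives exactly logistic-additive predictions:

```python
import numpy as np

def logit(x):
    return np.log(x / (1 - x))

t = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # stimulus support for /r/ (made-up values)
c = np.array([0.2, 0.5, 0.8])             # contextual support for /r/ (made-up values)

# FLMP combination rule: p(r|Sij) = ti*cj / (ti*cj + (1-ti)*(1-cj))
num = t[:, None] * c[None, :]
p = num / (num + (1 - t)[:, None] * (1 - c)[None, :])

# Logistic additivity: logit(pij) = logit(ti) + logit(cj)
print(np.allclose(logit(p), logit(t)[:, None] + logit(c)[None, :]))   # True
```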

Page 6: Principled Probabilistic Inference and Interactive Activation

Ideal logistic-additive pattern (upper right) vs. mini-IA simulation results (lower right).

Page 7: Principled Probabilistic Inference and Interactive Activation

Massaro’s argument against the IA model

• In the IA model, feature information gets used twice, once on the way up and then again on the way back down…

• Feeding the activation back in this way, he suggested, distorts the process of correctly identifying the target phoneme.

Page 8: Principled Probabilistic Inference and Interactive Activation

Should we agree and give up on interactivity?

• Perception of each letter is influenced by the amount of information about every other letter

– So, it would be desirable to have a way for each letter to facilitate perception of others while it itself is being facilitated.

• In speech, there are both ‘left’ and ‘right’ context effects

• Examples of ‘right’ context effects:

– ‘?ift’ vs ‘?iss’

– ‘the ?eel of the {shoe/wagon/orange/leather}’

• As we discussed before, there are knock-on effects of context that appear to penetrate the perceptual system, as well as support from neurophysiology

Page 9: Principled Probabilistic Inference and Interactive Activation

What was wrong with the Interactive Activation model?

• The original interactive activation model ‘tacked the variability on at the end’, but neural activity is intrinsically stochastic.

• McClelland (1991) incorporated intrinsic variability in the computation of the net input to the IA model:

• Rather than choosing probabilistically based on relative activations, we simply choose the alternative with the highest activation after settling.

• Logistic additivity is observed.

neti = biasi + Σj aj wij + noise (the noise term is the ‘intrinsic variability’)
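As a rough illustration of this modification, here is a minimal sketch of a net-input computation with Gaussian noise added; the function name, noise level, and array shapes are made up for the example rather than taken from McClelland (1991):

```python
import numpy as np

def noisy_net_input(a, W, bias, sigma=0.1, rng=np.random.default_rng()):
    """Net input with intrinsic variability: bias + weighted input + Gaussian noise.
    a: activations of sending units; W: weights (receiving x sending);
    bias: per-unit bias; sigma: noise standard deviation (arbitrary here)."""
    return bias + W @ a + rng.normal(0.0, sigma, size=bias.shape)

# Example with arbitrary numbers
a = np.array([1.0, 0.0, 0.5])
W = np.array([[0.2, -0.1, 0.4],
              [0.0, 0.3, -0.2]])
bias = np.array([0.1, -0.1])
print(noisy_net_input(a, W, bias))
```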

Page 10: Principled Probabilistic Inference and Interactive Activation

Can we Relate IA to Principled Probabilistic Inference?

• We begin with a probabilistic generative model

• We then show how a variant of the IA model samples from the correct posterior of the generative model

Page 11: Principled Probabilistic Inference and Interactive Activation

The Generative Model

• Select a word with probability p(wi)

• Generate letters with probability p(ljp|wi), where ljp is letter j in position p

• Generate feature values with probability p(fvdp|ljp), where fvdp indicates that feature dimension d in position p takes value v

• Note that features are specified as ‘present’ or ‘absent’
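The following sketch samples from a toy version of this generative model; the sizes and probability tables are random placeholders rather than the parameters of the actual word model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the real model uses 1129 words, 4 positions, 26 letters,
# and 14 binary (present/absent) features per position.
n_words, n_pos, n_letters, n_feats = 5, 4, 26, 14

p_w = np.full(n_words, 1.0 / n_words)                                   # p(wi)
p_l_given_w = rng.dirichlet(np.ones(n_letters), size=(n_words, n_pos))  # p(ljp|wi)
p_f_given_l = rng.uniform(size=(n_letters, n_feats))                    # p(feature present | letter)

def generate():
    w = rng.choice(n_words, p=p_w)                        # select a word
    letters = [rng.choice(n_letters, p=p_l_given_w[w, p]) for p in range(n_pos)]
    features = [(rng.uniform(size=n_feats) < p_f_given_l[l]).astype(int)
                for l in letters]                         # 1 = present, 0 = absent
    return w, letters, features

print(generate())
```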

Page 12: Principled Probabilistic Inference and Interactive Activation

The Neural Network Model

• Network is viewed as consisting of several multinomial variables each represented by a pool of units corresponding to mutually exclusive alternatives.

• There are:

– 4*14 feature level variables, each with two alternative possible values (not well depicted in figure)

– 4 letter level variables, each with 26 possible values.

– 1 word level variable, with 1129 possible values.

• Connection weights are bi-directional, but their values are the logs of the top-down probabilities given in the generative model.

• There are biases only at the word level, corresponding to the logs of the p(wi).
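A small sketch of how the weights and biases just described could be set up, with made-up probability tables standing in for the real word model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_words, n_pos, n_letters = 1129, 4, 26

# Made-up generative probabilities standing in for the real ones
p_w = rng.dirichlet(np.ones(n_words))                                   # p(wi)
p_l_given_w = rng.dirichlet(np.ones(n_letters), size=(n_words, n_pos))  # p(ljp|wi)

word_bias = np.log(p_w)              # biases only at the word level: log p(wi)
W_word_letter = np.log(p_l_given_w)  # bidirectional weights: logs of the top-down probabilities
```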

Page 13: Principled Probabilistic Inference and Interactive Activation

The Neural Network Model

• An input, assumed to have been produced by the generative model, is clamped on the units at the feature level.

• The letter and word level variables are initialized to 0.

• Then, we alternate updating the letter and word variables

– Letters can be updated in parallel or sequentially

– Word updated after all of the letters

• Updates occur by calculating each unit’s net input based on active units that have connections to it (and the bias at the word level), then setting the activations using the softmax function.

• A state of the model consists of one active word, four active letters, and 4*14 active features.

• The hidden state consists of one active word and four active letters. We can view each state as a composite hypothesis about what underlying path might have produced the feature values clamped on the input units.

• After a ‘burn in period’, the network visits hidden states with probability proportional to the posterior probability that the partial path corresponding to the hidden state generated the observed features.
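Here is a sketch of the alternating softmax updates described above, assuming the log-probability weights and biases from the previous slide and hypothetical array names; it illustrates the idea rather than reproducing the published simulation code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_hidden_state(features, logW_lw, logW_fl, log_bias_w,
                        n_sweeps=200, rng=np.random.default_rng()):
    """Sketch of the alternating-update sampler.
    features   : list over positions of 0/1 feature-value vectors (clamped input)
    logW_lw    : logW_lw[w, p, l] = log p(letter l in position p | word w)
    logW_fl    : logW_fl[l, d, v] = log p(feature d has value v | letter l)
    log_bias_w : log p(word w); biases exist only at the word level
    """
    n_words, n_pos, n_letters = logW_lw.shape
    word = None                                  # word and letter variables start "off"
    letters = np.zeros(n_pos, dtype=int)
    for _ in range(n_sweeps):
        # Update each letter from the clamped features (bottom-up) and the word (top-down).
        for p in range(n_pos):
            net = np.zeros(n_letters) if word is None else logW_lw[word, p, :].copy()
            for d, v in enumerate(features[p]):
                net += logW_fl[:, d, v]
            letters[p] = rng.choice(n_letters, p=softmax(net))
        # Update the word from its bias and the currently active letters.
        net_w = log_bias_w + sum(logW_lw[:, p, letters[p]] for p in range(n_pos))
        word = rng.choice(n_words, p=softmax(net_w))
    # After a burn-in period, (word, letters) is a sample from the posterior
    # over hidden states given the clamped features.
    return word, letters
```

Each sweep samples the letters from p(letter | word, features) and then the word from p(word | letters), so after burn-in the visited hidden states are distributed according to the joint posterior of the generative model.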

Page 14: Principled Probabilistic Inference and Interactive Activation

Sampled and Calculated Probabilities for a Specific Display (? = a random set of feature values)

Mirman et al., Figure 14


Page 15: Principled Probabilistic Inference and Interactive Activation

Alternatives to the MIAM Approach

• For the effect of context in a specific position:

– Calculate p(wi|other letters) for all words

– Use this to calculate p(ljp|context)

• Pearl’s procedure:

– Calculate p(wi|all letters)

– Divide the contribution of position p back out when calculating p(ljp|context) for each position

– This produces the correct marginals for each multinomial variable but doesn’t specify their joint distribution (see next slide)
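A sketch of Pearl's "divide back out" computation for this two-level model, with hypothetical array names; it returns only the per-position letter marginals, which is exactly the limitation the slide notes:

```python
import numpy as np

def letter_marginals(p_w, p_l_given_w, letter_likelihoods):
    """Sketch of Pearl's procedure for this model (hypothetical array names).
    p_w[w]                   : prior p(wi)
    p_l_given_w[w, p, l]     : p(ljp | wi)
    letter_likelihoods[p, l] : p(observed features in position p | letter l)
    """
    # Evidence each letter position sends up to the word level.
    lam = np.einsum('wpl,pl->wp', p_l_given_w, letter_likelihoods)
    post_w = p_w * lam.prod(axis=1)               # proportional to p(wi | all letters)
    n_words, n_pos, n_letters = p_l_given_w.shape
    marginals = np.zeros((n_pos, n_letters))
    for p in range(n_pos):
        # Divide position p's own contribution back out before passing context down.
        context = post_w / lam[:, p]              # ~ p(wi | letters in the other positions)
        marginals[p] = letter_likelihoods[p] * (context @ p_l_given_w[:, p, :])
        marginals[p] /= marginals[p].sum()        # p(ljp | all features)
    return marginals
```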

Page 16: Principled Probabilistic Inference and Interactive Activation

Joint vs. marginal posterior probabilities

• Can you make sense of the given features?

• In the Rumelhart font, considering each position separately, the likely letters are:

– {H,F}, {E,O}, {X,W}

• Known words are:

– HOW, HEX, FEW, FOX

• There are constraints between the word and letter possibilities not captured by just listing the marginal probabilities

• These constraints are captured in samples from the joint posterior.
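A toy calculation (with made-up, uniform probabilities) makes the point concrete: the marginals make both letters in each pair equally likely, while the joint posterior rules out combinations such as HOX that correspond to no known word:

```python
# Toy illustration with uniform, made-up probabilities.
words = ['HOW', 'HEX', 'FEW', 'FOX']

# Joint posterior over hidden states: only letter strings that are known words
# receive probability mass (equal mass here for simplicity).
joint = {w: 1.0 / len(words) for w in words}

# Marginal posteriors for each position, obtained by summing the joint.
marginals = [{}, {}, {}]
for w, pr in joint.items():
    for pos, letter in enumerate(w):
        marginals[pos][letter] = marginals[pos].get(letter, 0.0) + pr

print(marginals)              # each position: two letters at 0.5 each
print(joint.get('HOX', 0.0))  # 0.0: individually likely letters, but jointly impossible
```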

Page 17: Principled Probabilistic Inference and Interactive Activation

Some Key Concepts

• A generative model as the basis for principled probabilistic inference

• Perception as a probabilistic sampling process

• A sample from the joint posterior as a compound hypothesis

• Joint vs. marginal posteriors

• Interactive neural networks as mechanisms that implement principled probabilistic sampling