Principled Probabilistic Inference and Interactive Activation


Page 1: Principled Probabilistic Inference and Interactive Activation

Principled Probabilistic Inference and Interactive Activation

Psych209, January 25, 2013

Page 2: Principled Probabilistic Inference and Interactive Activation

A Problem For the Interactive Activation Model

• Data from many experiments give rise to a pattern corresponding to ‘logistic additivity’

• And we expect such a pattern from a Bayesian point of view.

• Unfortunately, the original interactive activation model does not exhibit this pattern.

• Does this mean that the interactive activation model is fundamentally wrong – i.e., that processing is strictly feedforward (as Massaro believed)?

• If not, is there a principled basis for understanding interactive activation as principled probabilistic inference?

Page 3: Principled Probabilistic Inference and Interactive Activation

Joint Effect of Context and Stimulus Information in Phoneme Identification (/l/ or /r/)

From Massaro & Cohen (1991)

Page 4: Principled Probabilistic Inference and Interactive Activation

Massaro’s Model

• Joint effects of context and stimulus obey the fuzzy logical model of perception:

p(r|Sij) = ti cj / (ti cj + (1 − ti)(1 − cj))

• Here ti is the stimulus support for r given input i, and cj is the contextual support for r given context j.

• Massaro sees this model as having a strictly feed-forward organization:

[Flow diagram: Evaluate stimulus and Evaluate context feed into Integration, which feeds into Decision.]

Page 5: Principled Probabilistic Inference and Interactive Activation

• Massaro’s model implies ‘logistic additivity’:

logit(pij) = log(pij/(1 − pij)) = log(ti/(1 − ti)) + log(cj/(1 − cj))

The pij on this graph corresponds to the p(r|Sij) on the preceding slide

[Graph: logit(pij) plotted for stimuli ranging from L-like to R-like. Different lines refer to different context conditions: r means ‘favors r’, l means ‘favors l’, n means ‘neutral’.]
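A quick numeric check of this identity, using made-up support values t and c (nothing here comes from Massaro & Cohen's data), confirms that the FLMP combination rule gives exactly logistic-additive predictions:

```python
import numpy as np

def logit(x):
    return np.log(x / (1 - x))

t = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # stimulus support for /r/ (made-up values)
c = np.array([0.2, 0.5, 0.8])             # contextual support for /r/ (made-up values)

# FLMP combination rule: p(r|Sij) = ti*cj / (ti*cj + (1-ti)*(1-cj))
num = t[:, None] * c[None, :]
p = num / (num + (1 - t)[:, None] * (1 - c)[None, :])

# Logistic additivity: logit(pij) = logit(ti) + logit(cj)
print(np.allclose(logit(p), logit(t)[:, None] + logit(c)[None, :]))   # True
```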

Page 6: Principled Probabilistic Inference and Interactive Activation

Ideal logistic-additive pattern (upper right) vs. mini-IA simulation results (lower right).

Page 7: Principled Probabilistic Inference and Interactive Activation

Massaro’s argument against the IA model

• In the IA model, feature information gets used twice, once on the way up and then again on the way back down…

• Feeding the activation back in this way, he suggested, distorts the process of correctly identifying the target phoneme.

Page 8: Principled Probabilistic Inference and Interactive Activation

Should we agree and give up on interactivity?

• Perception of each letter is influenced by the amount of information about every other letter

– So, it would be desirable to have a way for each letter to facilitate perception of others while it itself is being facilitated.

• In speech, there are both ‘left’ and ‘right’ context effects

• Examples of ‘right’ context effects:

– ‘?ift’ vs ‘?iss’

– ‘the ?eel of the {shoe/wagon/orange/leather}’

• As we discussed before, there are knock-on effects of context that appear to penetrate the perceptual system, as well as support from neurophysiology

Page 9: Principled Probabilistic Inference and Interactive Activation

What was wrong with the Interactive Activation model?

• The original interactive activation model ‘tacked the variability on at the end’, but neural activity is intrinsically stochastic.

• McClelland (1991) incorporated intrinsic variability in the computation of the net input to the IA model:

• Rather than choosing probabilistically based on relative activations, we simply choose the alternative with the highest activation after settling.

• Logistic additivity is observed.

neti = biasi + Σj aj wij + noise (the noise term is the ‘intrinsic variability’)
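As a rough illustration of this modification, here is a minimal sketch of a net-input computation with Gaussian noise added; the function name, noise level, and array shapes are made up for the example rather than taken from McClelland (1991):

```python
import numpy as np

def noisy_net_input(a, W, bias, sigma=0.1, rng=np.random.default_rng()):
    """Net input with intrinsic variability: bias + weighted input + Gaussian noise.
    a: activations of sending units; W: weights (receiving x sending);
    bias: per-unit bias; sigma: noise standard deviation (arbitrary here)."""
    return bias + W @ a + rng.normal(0.0, sigma, size=bias.shape)

# Example with arbitrary numbers
a = np.array([1.0, 0.0, 0.5])
W = np.array([[0.2, -0.1, 0.4],
              [0.0, 0.3, -0.2]])
bias = np.array([0.1, -0.1])
print(noisy_net_input(a, W, bias))
```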

Page 10: Principled Probabilistic Inference and Interactive Activation

Can we Relate IA to Principled Probabilistic Inference?

• We begin with a probabilistic generative model

• We then show how a variant of the IA model samples from the correct posterior of the generative model

Page 11: Principled Probabilistic Inference and Interactive Activation

The Generative Model

• Select a word with probability p(wi)

• Generate letters with probability p(ljp|wi), where ljp is letter j in position p

• Generate feature values with probability p(fvdp|ljp), where fvdp indicates that feature dimension d in position p takes value v

• Note that features are specified as ‘present’ or ‘absent’
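The following sketch samples from a toy version of this generative model; the sizes and probability tables are random placeholders rather than the parameters of the actual word model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the real model uses 1129 words, 4 positions, 26 letters,
# and 14 binary (present/absent) features per position.
n_words, n_pos, n_letters, n_feats = 5, 4, 26, 14

p_w = np.full(n_words, 1.0 / n_words)                                   # p(wi)
p_l_given_w = rng.dirichlet(np.ones(n_letters), size=(n_words, n_pos))  # p(ljp|wi)
p_f_given_l = rng.uniform(size=(n_letters, n_feats))                    # p(feature present | letter)

def generate():
    w = rng.choice(n_words, p=p_w)                        # select a word
    letters = [rng.choice(n_letters, p=p_l_given_w[w, p]) for p in range(n_pos)]
    features = [(rng.uniform(size=n_feats) < p_f_given_l[l]).astype(int)
                for l in letters]                         # 1 = present, 0 = absent
    return w, letters, features

print(generate())
```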

Page 12: Principled Probabilistic Inference and Interactive Activation

The Neural Network Model

• Network is viewed as consisting of several multinomial variables each represented by a pool of units corresponding to mutually exclusive alternatives.

• There are:

– 4*14 feature level variables, each with two alternative possible values (not well depicted in figure)

– 4 letter level variables, each with 26 possible values.

– 1 word level variable, with 1129 possible values.

• Connection weights are bi-directional, but their values are the logs of the top-down probabilities given in the generative model.

• There are biases only at the word level, corresponding to the logs of the p(wi).
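A small sketch of how the weights and biases just described could be set up, with made-up probability tables standing in for the real word model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_words, n_pos, n_letters = 1129, 4, 26

# Made-up generative probabilities standing in for the real ones
p_w = rng.dirichlet(np.ones(n_words))                                   # p(wi)
p_l_given_w = rng.dirichlet(np.ones(n_letters), size=(n_words, n_pos))  # p(ljp|wi)

word_bias = np.log(p_w)              # biases only at the word level: log p(wi)
W_word_letter = np.log(p_l_given_w)  # bidirectional weights: logs of the top-down probabilities
```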

Page 13: Principled Probabilistic Inference and Interactive Activation

The Neural Network Model

• An input, assumed to have been produced by the generative model, is clamped on the units at the feature level.

• The letter and word level variables are initialized to 0.

• Then, we alternate updating the letter and word variables

– Letters can be updated in parallel or sequentially

– Word updated after all of the letters

• Updates occur by calculating each unit’s net input based on active units that have connections to it (and the bias at the word level), then setting the activations using the softmax function.

• A state of the model consists of one active word, four active letters, and 4*14 active features.

• The hidden state consists of one active word and four active letters. We can view each state as a composite hypothesis about what underlying path might have produced the feature values clamped on the input units.

• After a ‘burn in period’, the network visits hidden states with probability proportional to the posterior probability that the partial path corresponding to the hidden state generated the observed features.
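Here is a sketch of the alternating softmax updates described above, assuming the log-probability weights and biases from the previous slide and hypothetical array names; it illustrates the idea rather than reproducing the published simulation code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_hidden_state(features, logW_lw, logW_fl, log_bias_w,
                        n_sweeps=200, rng=np.random.default_rng()):
    """Sketch of the alternating-update sampler.
    features   : list over positions of 0/1 feature-value vectors (clamped input)
    logW_lw    : logW_lw[w, p, l] = log p(letter l in position p | word w)
    logW_fl    : logW_fl[l, d, v] = log p(feature d has value v | letter l)
    log_bias_w : log p(word w); biases exist only at the word level
    """
    n_words, n_pos, n_letters = logW_lw.shape
    word = None                                  # word and letter variables start "off"
    letters = np.zeros(n_pos, dtype=int)
    for _ in range(n_sweeps):
        # Update each letter from the clamped features (bottom-up) and the word (top-down).
        for p in range(n_pos):
            net = np.zeros(n_letters) if word is None else logW_lw[word, p, :].copy()
            for d, v in enumerate(features[p]):
                net += logW_fl[:, d, v]
            letters[p] = rng.choice(n_letters, p=softmax(net))
        # Update the word from its bias and the currently active letters.
        net_w = log_bias_w + sum(logW_lw[:, p, letters[p]] for p in range(n_pos))
        word = rng.choice(n_words, p=softmax(net_w))
    # After a burn-in period, (word, letters) is a sample from the posterior
    # over hidden states given the clamped features.
    return word, letters
```

Each sweep samples the letters from p(letter | word, features) and then the word from p(word | letters), so after burn-in the visited hidden states are distributed according to the joint posterior of the generative model.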

Page 14: Principled Probabilistic Inference and Interactive Activation

Sampled and Calculated Probabilities for a Specific Display (? = a random set of feature values)

Mirman et al., Figure 14


Page 15: Principled Probabilistic Inference and Interactive Activation

Alternatives to the MIAM Approach

• For the effect of context in a specific position:

– Calculate p(wi|other letters) for all words

– Use this to calculate p(ljp|context)

• Pearl’s procedure:

– Calculate p(wi|all letters)

– Divide the contribution of position p back out when calculating p(ljp|context) for each position

– This produces the correct marginals for each multinomial variable but doesn’t specify their joint distribution (see next slide)
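A sketch of Pearl's "divide back out" computation for this two-level model, with hypothetical array names; it returns only the per-position letter marginals, which is exactly the limitation the slide notes:

```python
import numpy as np

def letter_marginals(p_w, p_l_given_w, letter_likelihoods):
    """Sketch of Pearl's procedure for this model (hypothetical array names).
    p_w[w]                   : prior p(wi)
    p_l_given_w[w, p, l]     : p(ljp | wi)
    letter_likelihoods[p, l] : p(observed features in position p | letter l)
    """
    # Evidence each letter position sends up to the word level.
    lam = np.einsum('wpl,pl->wp', p_l_given_w, letter_likelihoods)
    post_w = p_w * lam.prod(axis=1)               # proportional to p(wi | all letters)
    n_words, n_pos, n_letters = p_l_given_w.shape
    marginals = np.zeros((n_pos, n_letters))
    for p in range(n_pos):
        # Divide position p's own contribution back out before passing context down.
        context = post_w / lam[:, p]              # ~ p(wi | letters in the other positions)
        marginals[p] = letter_likelihoods[p] * (context @ p_l_given_w[:, p, :])
        marginals[p] /= marginals[p].sum()        # p(ljp | all features)
    return marginals
```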

Page 16: Principled Probabilistic Inference and Interactive Activation

Joint vs. marginal posterior probabilities

• Can you make sense of the given features?

• In the Rumelhart font, considering each position separately, the likely letters are:

– {H,F}, {E,O}, {X,W}

• Known words are:

– HOW, HEX, FEW, FOX

• There are constraints between the word and letter possibilities not captured by just listing the marginal probabilities

• These constraints are captured in samples from the joint posterior.
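A toy calculation (with made-up, uniform probabilities) makes the point concrete: the marginals make both letters in each pair equally likely, while the joint posterior rules out combinations such as HOX that correspond to no known word:

```python
# Toy illustration with uniform, made-up probabilities.
words = ['HOW', 'HEX', 'FEW', 'FOX']

# Joint posterior over hidden states: only letter strings that are known words
# receive probability mass (equal mass here for simplicity).
joint = {w: 1.0 / len(words) for w in words}

# Marginal posteriors for each position, obtained by summing the joint.
marginals = [{}, {}, {}]
for w, pr in joint.items():
    for pos, letter in enumerate(w):
        marginals[pos][letter] = marginals[pos].get(letter, 0.0) + pr

print(marginals)              # each position: two letters at 0.5 each
print(joint.get('HOX', 0.0))  # 0.0: individually likely letters, but jointly impossible
```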

Page 17: Principled Probabilistic Inference and Interactive Activation

Some Key Concepts

• A generative model as the basis for principled probabilistic inference

• Perception as a probabilistic sampling process

• A sample from the joint posterior as a compound hypothesis

• Joint vs. marginal posteriors

• Interactive neural networks as mechanisms that implement principled probabilistic sampling