A COGNITIVE ARCHITECTURE FOR SENSORY PROCESSING …cip2014.conwiz.dk/files/principe_cip2014.pdf ·...

A COGNITIVE ARCHITECTURE FOR SENSORY PROCESSING Jose C. Principe, Ph.D. Computational NeuroEngineering Laboratory (CNEL) University of Florida [email protected] www.cnel.ufl.edu


Page 1: A COGNITIVE ARCHITECTURE FOR SENSORY PROCESSING …cip2014.conwiz.dk/files/principe_cip2014.pdf ·  · 2014-05-28A COGNITIVE ARCHITECTURE FOR SENSORY PROCESSING Jose C. Principe,

A COGNITIVE ARCHITECTURE

FOR SENSORY PROCESSING

Jose C. Principe, Ph.D.

Computational NeuroEngineering Laboratory (CNEL)

University of Florida

[email protected]

www.cnel.ufl.edu


Acknowledgments

My students:

Rakesh Chalasani

Goktug Cinar

This work was partially supported by ONR grant N00014-10-1-0375


Outline

• Brief overview

• Cognitive Sensory Processing

• Generative Hierarchical Models

• Convolution Models

• Conclusions

Principe J., Chalasani R., “Cognitive Architecture for Sensory Processing”, Proceedings of the IEEE, vol. 102, no. 4, pp. 514-525, 2014


Sensory Processing and Features

• What are good features for sensory processing?

• Note: this is crucial: “you cannot make good omelets with rotten eggs”.

• We really don’t know, but we keep using the sensory space (audio, video) almost exclusively to find them (SIFT, HOG).

• Does this make sense? Probably not, because the sensed signals are complex and variable (differences in illumination, shape, noise, context), while we need invariants in MODEL space!

• Are there alternatives? Of course there are!


Sensory Processing and Features

• We, biological organisms, solved this problem long ago (otherwise we would be extinct!).

• Perception is an ACTIVE PROCESS, while our sensory signal processing is PASSIVE!

• Fuster: “Perception is memory updating”

• Think of object-background segregation (visual or auditory)

• “We see what we want to see”, i.e. our brain disambiguates the sensory signals according to our expectations


Sensory Processing and Features

• Hermann von Helmholtz (1821-1894): “the perceptual system is an inference engine whose function is to infer the probable causes of the sensory input”

• Cognitive science has provided an impressive increase in knowledge of how the brain works. I recommend Joaquin Fuster as “a must read”:

• Cortex and the mind, Memory in the cortex, Prefrontal cortex

• The issue for us, signal processing and machine learning experts, is how to translate these concepts mathematically, i.e. how to create a computational theory of perception.

• Marr: Vision; Bregman: Computational Auditory Scene Analysis


Neuro Anatomy of Visual System

• We share Helmholtz’ view that cortical function evolved to explain sensory inputs. As such we seek to understand the role of processing and stored experience in a machine learning framework for the decoding of sensory input.

• Therefore the goal is to create computational systems that explain the world using rich internal representations that can be made stable and discriminative for fast recall.

[Figure: visual system diagram, labeled RETINA]

Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex. 1991 Jan-Feb;1(1):1-47.


Cognitive Model for Object Recognition in Video

Goal: develop a bidirectional, dynamical, adaptive, self-organizing, distributed and hierarchical model for sensory cortex processing using approximate Bayesian inference.


Cognitive Model for Object Recognition in Video

Why Cognitive?

Because it learns and infers autonomously to represent the external world, and uses this knowledge to disambiguate future inputs.

Bidirectional:

Goals/memory from the top levels can be used as time signals and mixed with incoming sensory data. It uses the architecture as constraints for the optimization.


Cognitive Model for Object Recognition in Video

Dynamic Elements:

Long- and short-term memory (as a signal) for in situ computation.

Naturally handles time with functional mappings, encodes uncertainty, learns online, and implements smoothness constraints.


Cognitive Model for Object Recognition in Video

Generative models:

Self-organizing, because the model predicts the external inputs; recognition becomes an inverse problem.

Efficient way to parameterize the posterior and incorporate priors and uncertainty about causes.

Perceptual inference becomes online inference about latent or hidden causes of inputs.

Learning becomes just optimization of parameters.


Cognitive Model for Object Recognition in Video

Hierarchical and distributed architecture:

Partitions computation, provides multiple scales for time and space, and creates a uniform architecture (same code) that can be parallelized.

Allows the use of the architecture as a constraint for the optimization.


Cognitive Models for Perception

Previous Work

• Predictive coding as a statistical model

• Olshausen and Field [1996] – Learning sparse codes for natural

images using L1-norm regularization.

• Rao and Ballard [1997] – Dynamic Models with space-time

receptive fields using Kalman filter.

• Lee and Mumford [2003] – Particle filtering for hierarchical

inference with empirical priors using the top-down prediction.

• Friston [2008] – Hierarchical Dynamic Models in generalized

coordinates of motion and empirical priors for continuous-time

signals.

CNEL, University of Florida, University of Florida 13


Cognitive Models for Perception

Previous Work

• Deep learning networks use greedy layer-wise unsupervised methods to build a hierarchical model from data. The goal is to learn encoding and decoding concurrently (with different regularization) using feedforward models

• Restricted Boltzmann Machine (RBM) – weight sharing

• Auto Encoders – denoising

• Sparsification – predictive sparse decompositions

• Convolution networks can also be used for the full image

• Our approach, called Deep Predictive Coding Networks (DPCN), relies on an efficient inference procedure to get a more accurate latent representation (no encoder)


Sensory Processing Functional Principles: Hierarchical Dynamical Model with Unknown Inputs

• Generalized state space model with additive noise:

$$x_t = A x_{t-1} + B u_t + v_t$$

$$y_t = C x_t + D u_t + n_t$$

$y_t$ – Observations; $x_t$ – Hidden states; $u_t$ – Causal states

• Hidden states model the history and the internal state.

• Causes model the “inputs” driving the system.

• Empirical Bayesian priors create a hierarchical model: the layer on the top tries to predict the causes for the layer below.
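As a rough illustration of the generalized state space model on this slide (x_t = A x_{t-1} + B u_t + v_t, y_t = C x_t + D u_t + n_t), the following sketch generates states and observations from a given sequence of causes. The function names, the toy matrices and the Gaussian noise with a shared standard deviation are assumptions made for the example, not part of the talk.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]

def simulate(A, B, C, D, causes, x0, noise_std=0.0, seed=0):
    """Generate hidden states and observations from a sequence of causes u_t."""
    rng = random.Random(seed)
    x, states, observations = list(x0), [], []
    for u in causes:
        v = [rng.gauss(0.0, noise_std) for _ in range(len(A))]  # state noise v_t
        n = [rng.gauss(0.0, noise_std) for _ in range(len(C))]  # observation noise n_t
        x = vadd(matvec(A, x), matvec(B, u), v)  # x_t = A x_{t-1} + B u_t + v_t
        y = vadd(matvec(C, x), matvec(D, u), n)  # y_t = C x_t + D u_t + n_t
        states.append(x)
        observations.append(y)
    return states, observations

# Toy example (hypothetical values): two hidden states, one cause, no noise.
A = [[0.5, 0.0], [0.0, 0.5]]
B = [[1.0], [0.0]]
C = [[1.0, 1.0]]
D = [[0.0]]
states, observations = simulate(A, B, C, D, causes=[[1.0], [0.0]], x0=[0.0, 0.0])
```

In the toy run the cause is switched on for one step and the hidden state then decays, which is the sense in which the states carry the history while the causes act as inputs.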


Sensory Processing Functional Principles: Hierarchical Models

• Inferred causes act as observations to the higher linear dynamical system.

• Priors on the causes become empirical and are set by the top-down prediction.

• Causes in a particular layer are updated using the prediction error in the layer below.

• This forms explicit forward and backward connectivity between the layers.


Sensory Processing Functional Principles: Hierarchical Dynamical Model with Unknown Inputs

• On a patch of the video image over time:

• 1) Feature extraction (inferring states - xt) by creating an overcomplete state representation of the dynamics in the patch

• 2) Pooling (inferring causes - ut) to extract invariants on the image across patches.


Computational Model – Single Layer: Feature extraction (inferring states)

Let y be a p-dimensional sequence of a 2D patch from the same location of a video sequence.

To infer the states x we use a dynamic sparse coding (DSC) model that maps y onto an overcomplete dictionary of k filters (k > p).

The energy function is the negative log likelihood:

$$-\log P(y_t, x_t \mid C, A) = E_1(y_t, x_t, C, A) = \|y_t - C x_t\|_2^2 + \lambda \|x_t - A x_{t-1}\|_1 + \gamma \|x_t\|_1$$

This optimization is not trivial because of the two $\ell_1$ constraints.
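To make the three terms of the dynamic sparse coding energy concrete, here is a minimal sketch that evaluates E1 = ||y_t - C x_t||_2^2 + lambda ||x_t - A x_{t-1}||_1 + gamma ||x_t||_1 for given vectors. The helper names and the toy numbers are hypothetical; the sketch only evaluates the energy and does not perform the inference itself.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def l1(v):
    """l1 norm of a vector."""
    return sum(abs(a) for a in v)

def dsc_energy(y, x, x_prev, C, A, lam, gamma):
    """Data fit + l1 penalty on the state innovation + l1 sparsity of the state."""
    residual = [yi - ci for yi, ci in zip(y, matvec(C, x))]         # y_t - C x_t
    innovation = [xi - ai for xi, ai in zip(x, matvec(A, x_prev))]  # x_t - A x_{t-1}
    return sum(r * r for r in residual) + lam * l1(innovation) + gamma * l1(x)

# Toy overcomplete example (p = 2 observations, k = 3 states); values are made up.
C = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
energy = dsc_energy(y=[1.0, 2.0], x=[1.0, 0.0, 0.0], x_prev=[0.0, 0.0, 1.0],
                    C=C, A=A, lam=0.5, gamma=0.1)
```

The two l1 terms are both non-smooth in x_t, which is why a plain gradient step is not enough and a proximal or smoothing scheme is needed.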


Computational Model – Single Layer: Sparse Coding in Dynamical Networks


Computational Model – Single Layer: Sparse Coding in Dynamical Networks

Chalasani, R., and Principe, J.C., “Dynamic Sparse Coding with Smoothing Proximal Gradient Method”, Proc. ICASSP 2014, Florence, Italy


Computational Model – Single Layer: Sparse Coding in Dynamical Networks

• Example: State estimation with known parameters

[Figure: steady state rMSE versus observation dimensions for the Kalman Filter, the Proposed method, and Sparse Coding]

Synthetic data: Gaussian process generated with sparseness (500 states, 20 non-zero elements). Observation matrix C is Gaussian, A is a permutation matrix.

Sparse coding using FISTA (fast iterative shrinkage-thresholding algorithm).
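For readers unfamiliar with the FISTA baseline mentioned here, the sketch below applies the standard algorithm to the plain sparse coding problem, minimizing 0.5 ||y - C x||_2^2 + gamma ||x||_1. It is not the proposed dynamic method, which has a second l1 term. The step size from the squared Frobenius norm, the iteration count and the function names are assumptions chosen to keep the example dependency-free.

```python
import math

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink every entry toward zero by tau."""
    return [math.copysign(max(abs(a) - tau, 0.0), a) for a in v]

def fista(C, y, gamma, n_iter=200):
    """Minimize 0.5 * ||y - C x||_2^2 + gamma * ||x||_1 over x (C must be nonzero)."""
    p, k = len(C), len(C[0])
    # Squared Frobenius norm: a simple upper bound on the gradient's Lipschitz constant.
    lipschitz = sum(c * c for row in C for c in row)
    x = [0.0] * k
    z = list(x)   # extrapolated point
    t = 1.0       # momentum parameter
    for _ in range(n_iter):
        residual = [sum(C[i][j] * z[j] for j in range(k)) - y[i] for i in range(p)]
        grad = [sum(C[i][j] * residual[i] for i in range(p)) for j in range(k)]
        x_new = soft_threshold([zj - gj / lipschitz for zj, gj in zip(z, grad)],
                               gamma / lipschitz)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# With an identity dictionary the minimizer is the soft-thresholded observation.
x_hat = fista([[1.0, 0.0], [0.0, 1.0]], y=[3.0, 0.5], gamma=1.0)
```

With C equal to the identity the problem separates per coordinate, so the result can be checked by hand: the small entry is set to zero and the large one is shrunk by gamma.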


Computational Model – Single Layer: Pooling (inferring causes)

Learn invariant representations by taking advantage of spatial relationships in local neighborhoods.

A small group of states x representing contiguous patches are added (sum pooled).

Infer the d-dimensional causes by minimizing the energy functional

$$-\log P(x_t, u_t \mid B) = E_2(u_t, x_t, B) = \sum_{n=1}^{N} \left( \sum_{k=1}^{K} \left| \gamma_k \cdot x_{t,k}^{(n)} \right| \right) + \beta \|u_t\|_1, \qquad \gamma_k = \gamma_0 \left[ \frac{1 + \exp(-[B u_t]_k)}{2} \right]$$

This $\ell_1$ minimization also indirectly establishes nonlinear relations between causes and states.
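The pooling energy can be read as a sparsity penalty on the states whose weight gamma_k = gamma_0 (1 + exp(-[B u_t]_k)) / 2 is modulated by the causes. The following sketch evaluates that energy for N patches; the absolute value around each weighted state, the function names and the toy values are assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cause_energy(u, patch_states, B, gamma0, beta):
    """Sum-pooled, cause-modulated sparsity of the states plus l1 sparsity of the causes."""
    gammas = [gamma0 * (1.0 + math.exp(-b)) / 2.0 for b in matvec(B, u)]
    pooled = sum(abs(g * xk) for x in patch_states for g, xk in zip(gammas, x))
    return pooled + beta * sum(abs(ui) for ui in u)

# Toy example: K = 2 states per patch, N = 2 contiguous patches, d = 1 cause.
B = [[1.0], [1.0]]
patches = [[1.0, -2.0], [0.5, 0.0]]
energy_inactive = cause_energy([0.0], patches, B, gamma0=0.3, beta=0.1)
energy_active = cause_energy([2.0], patches, B, gamma0=0.3, beta=0.1)
```

An active cause lowers the penalty on the states it connects to through B, so in this toy case the total energy drops even though the cause itself pays the beta penalty.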


Computational Model – Single Layer: Interpretation as free energy

The components of the energy functionals form a generative model specified as the log likelihood of observations, causes and parameters as

$$-\log P(y_t, x_t, u_t, \theta) = -\log P(y_t \mid x_t, u_t, \theta) - \log P(x_t, u_t \mid \theta) - \log P(\theta)$$

$$= \sum_{n=1}^{N} \left( \frac{1}{2} \|y_t^{(n)} - C x_t^{(n)}\|_2^2 + \lambda \|x_t^{(n)} - A x_{t-1}^{(n)}\|_1 + \gamma \|x_t\|_1 + \sum_{k=1}^{K} \left| \gamma_{t,k}\, x_{t,k}^{(n)} \right| \right) + \beta \|u_t\|_1 - \log P(\theta)$$

The latent variables can be efficiently inferred using proximal gradient methods.


Computational Model – Single Layer: Learning the parameters

The model parameters $\theta = \{A, B, C\}$ are learned by dual estimation on the combined cost function, alternating inference (state estimation) with parameter updating (parameter estimation).

The parameters are updated using gradient descent with an additional temporal smoothness term, where the parameters follow $\theta_t = \theta_{t-1} + z_t$.

For fixed x and u the gradients can be computed as

$$\nabla_A E = \mathrm{sign}(x_t - A_t x_{t-1})\, x_t^T + \varsigma (A_t - A_{t-1})$$

$$\nabla_B E = \left( \exp(-B u_t) \cdot x_t \right) u_t^T + \varsigma (B_t - B_{t-1})$$

$$\nabla_C E = (y_t - C_t x_t)\, x_t^T + \varsigma (C_t - C_{t-1})$$

Matrices C and B are column normalized.
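One parameter-estimation step of the dual estimation loop might look like the sketch below, shown for the observation matrix C only and with the states held fixed. It moves C along (y_t - C x_t) x_t^T, pulls it toward its previous value as a stand-in for the temporal smoothness term, and then normalizes the columns. The learning rate, the smoothness weight, the unit l2 column norm and the function name are all assumptions.

```python
import math

def update_dictionary(C, C_prev, y, x, lr=0.5, smooth=0.1):
    """One gradient step on C for fixed states x, followed by column normalization."""
    p, k = len(C), len(x)
    residual = [y[i] - sum(C[i][j] * x[j] for j in range(k)) for i in range(p)]
    # (y - C x) x^T reduces the data-fit error; the smoothness term keeps C near C_prev.
    C_new = [[C[i][j] + lr * (residual[i] * x[j] - smooth * (C[i][j] - C_prev[i][j]))
              for j in range(k)] for i in range(p)]
    # Normalize every column to unit l2 norm (the choice of norm is an assumption).
    for j in range(k):
        norm = math.sqrt(sum(C_new[i][j] ** 2 for i in range(p)))
        if norm > 0.0:
            for i in range(p):
                C_new[i][j] /= norm
    return C_new

# Toy example: the first dictionary column rotates toward the observation.
C0 = [[1.0, 0.0], [0.0, 1.0]]
C1 = update_dictionary(C0, C0, y=[1.0, 1.0], x=[1.0, 0.0])
```

In a full loop this step would alternate with inference of x and u for each new frame, which is the alternation the slide describes as state estimation followed by parameter estimation.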


Learned Features and Invariances: Receptive Fields (feature detectors)

• Video database (Van Hateren’s)

• Input: 17 x 17 patches from different video sequences.

• States – 400 dim, causes 100 dim; pooling 2x2

Measurement Matrix C – bases. Each small block is a column.

Layer 1 causes (matrix B) are composed from neighborhoods.


Learned Features and Invariances: Receptive Fields (feature detectors)

When a receptive field is active (left) the model predicts it will be active later (right).

Scatter plot of the 15 strongest connections of matrix A from current time to next time (bars are pi/6).

[Figure panels: t; t+1, t+2, …]


Learned Features and Invariances: Gabor modeling of the receptive fields

Each element (receptive field of 15x15) in C is fit with a Gabor function parameterized by center position, spatial orientation, and frequency.


Visualizing Invariances

From states to causes (orientation)

• Connection strength between first layer invariance matrix (B) and the observation matrix (C). Each subplot is one column of B when one column of C is active, and strength is color coded.


Visualizing Invariances: From states to causes (frequencies)

• Connection strength between first layer invariance matrix (B) and the observation matrix (C). Each subplot is one column of B when one column of C is active, and strength is color coded.


Classification Results: Single layer representation (inferred causes)

Dataset     FS      DSC     DSC-I   ConvNN
COIL-100    66.87   71.81   74.63   71.49
Animal      76.09   82.43   85.82   ---

12x12 patches, 2x2 pooling and an SVM classifier (4 labeled frames)


Multi-Layered Architecture

Tree structure with tiling of the scene at the bottom.

Computational model is uniform within and across layers.

Different spatial scales arise due to pooling, which also slows the time scale in upper layers.

Learning is greedy (one layer at a time).

This creates a Markov chain across layers.


Multi-Layered Architecture

Notice that the top layer predictions affect the lower layer, effectively creating constraints due to the topology.


Inference with Top-Down Connections

Top-down Influence

Predictions from the top-layer dynamic model non-linearly enter the bottom-layer inference.


Recognition of Sequences in Noisy Data

Each frame is 32x32 pixels and each sequence is 100 frames long (30,000 long by concatenation).

Layer 1: Divide each frame into 20x20 (states 12x12). Pool 2x2 states into one cause (invariant unit). Dimensions: States – 100; Causes – 40

Layer 2: Consider the causes in layer 1 as inputs. Pool 2x2 states into one cause. Dimensions: States – 60; Causes – 3

[Figure panels: Layer-1 Causes; Layer-2 Causes]


Recognition of Sequences in Noisy Data

• Top-down influence: ut(L+1)=ut-1(L)

• 2-layered network with (states, causes) dimensions (100, 40) in layer 1 and (60, 3) in layer 2

[Figure: inferred causes for Object 1, Object 2 and Object 3 in three panels: Bottom-up (No Noise), Bottom-up (With Noise), Top-down (With Noise). Inputs: Clean Video and Corrupted Video (SNR = -1.2 dB)]


Scalable Architecture with Convolutional Models

• Advantage of Convolutional model:

• Scalable to large images.

• Invariance to translations.

• Efficient implementation using GPUs.

• Main components:

• Convolutional sparse coding in dynamic networks using

Convolutional FISTA [IJCNN, 2013] to infer states.

• Pooling/Unpooling between the states and the causes.

• Convolutional sparse coding to infer causes.


Chalasani R., Principe J., Ramakrishnan N., “A Fast Proximal Method for

Convolutional Sparse Coding”, Proc. IEEE IJCNN, Austin, Tx, 2013


Scalable Architecture with Convolutional Dynamical Models (CDNs)

[Figure: SINGLE LAYER MODEL, with labels RGB, filters, Pooling, unpooling]


Convolutional Dynamical Models: State Space Equations

• Each channel $I_t^m$ is modeled as a linear combination of K matrices convolved with filters $C_{m,k}$:

$$I_t^m = \sum_{k=1}^{K} C_{m,k} * X_t^k + N_t^m, \qquad m \in \{1, 2, \ldots, M\}$$

$$X_t^k(i,j) = \sum_{k'=1}^{K} a_{k,k'}\, X_{t-1}^{k'}(i,j) + V_t^k(i,j)$$

• $a_{k,k'}$ are the lateral connections, and here we make $a_{k,k'} = 1$ for $k = k'$ because of the application (object recognition)
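The convolutional state space equations, I_t^m = sum_k C_{m,k} * X_t^k + N_t^m and X_t^k(i,j) = sum_k' a_{k,k'} X_{t-1}^{k'}(i,j) + V_t^k(i,j), can be sketched without noise as below. The "full" boundary handling of the convolution, the function names and the toy maps are assumptions; the slides do not state which boundary convention is used.

```python
def conv2d_full(X, F):
    """Full 2D convolution of a state map X with a filter F (lists of rows)."""
    h, w, fh, fw = len(X), len(X[0]), len(F), len(F[0])
    out = [[0.0] * (w + fw - 1) for _ in range(h + fh - 1)]
    for i in range(h):
        for j in range(w):
            for a in range(fh):
                for b in range(fw):
                    out[i + a][j + b] += X[i][j] * F[a][b]
    return out

def reconstruct_channel(filters, state_maps):
    """Noise-free part of one channel: sum over k of C_{m,k} * X_t^k."""
    total = None
    for F, X in zip(filters, state_maps):
        term = conv2d_full(X, F)
        if total is None:
            total = term
        else:
            total = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total

def propagate_states(prev_maps, a):
    """Noise-free state transition: X_t^k = sum over k' of a[k][k'] * X_{t-1}^{k'}."""
    K, h, w = len(prev_maps), len(prev_maps[0]), len(prev_maps[0][0])
    return [[[sum(a[k][kp] * prev_maps[kp][i][j] for kp in range(K))
              for j in range(w)] for i in range(h)] for k in range(K)]

# Toy example: two sparse 2x2 state maps and two 2x2 filters for one channel.
X1 = [[0.0, 0.0], [0.0, 1.0]]
X2 = [[1.0, 0.0], [0.0, 0.0]]
F1 = [[1.0, 2.0], [3.0, 4.0]]
F2 = [[1.0, 1.0], [1.0, 1.0]]
image = reconstruct_channel([F1, F2], [X1, X2])
next_maps = propagate_states([X1, X2], a=[[1.0, 0.0], [0.0, 1.0]])
```

Each nonzero entry of a state map stamps a copy of its filter into the image, which is why sparse state maps give a compact, translation-tolerant description of the frame.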


Convolutional Dynamical Models: Optimization

• Energy function for state maps (x is a matrix):

• Energy function for cause maps (x is pooled):


Convolutional Dynamical Models: Implementation

• This convolution dynamical model can be stacked in trees

as before

• The internal connectivity between inputs and states can

be made sparse

• We decrease the model size in the hierarchy by using

max pooling between the state and causes (and

unpooling in the reverse direction)

• Inference is done as before (alternate fixing causes and

states)

• Parameter learning is done as before

• FISTA was extended to convolution networks
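The max pooling and unpooling mentioned in the list above can be illustrated with a small sketch. It pools non-overlapping 2x2 blocks and remembers where each maximum came from, so that unpooling can place the value back at that position. Using recorded positions ("switches") for unpooling, the 2x2 block size and the function names are assumptions; the slides do not specify the unpooling rule.

```python
def max_pool_2x2(X):
    """2x2 max pooling over a map with even height and width.

    Returns the pooled map and, for each pooled entry, the (row, col)
    position of the maximum in the original map.
    """
    pooled, switches = [], []
    for i in range(0, len(X), 2):
        row, srow = [], []
        for j in range(0, len(X[0]), 2):
            block = [(X[a][b], (a, b)) for a in (i, i + 1) for b in (j, j + 1)]
            value, pos = max(block, key=lambda item: item[0])
            row.append(value)
            srow.append(pos)
        pooled.append(row)
        switches.append(srow)
    return pooled, switches

def unpool_2x2(pooled, switches):
    """Place every pooled value back at its recorded position; zeros elsewhere."""
    h, w = 2 * len(pooled), 2 * len(pooled[0])
    out = [[0] * w for _ in range(h)]
    for prow, srow in zip(pooled, switches):
        for value, (a, b) in zip(prow, srow):
            out[a][b] = value
    return out

# Toy 4x4 state map (made-up values).
X = [[1, 5, 2, 0],
     [3, 4, 1, 7],
     [0, 0, 9, 8],
     [6, 2, 3, 3]]
pooled, switches = max_pool_2x2(X)
restored = unpool_2x2(pooled, switches)
```

Pooling shrinks each map by a factor of two per dimension on the way up the hierarchy, and unpooling restores the original size for the top-down pass.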


Convolutional Dynamical Models: Inference in the Hierarchy

• To simplify inference, the state-space model at each layer predicts the most likely cause at the layer below ($U_{l-1,t}$), given only the previous states and the predicted causes from the layer above


Convolutional Dynamical Models: Learning in the Hierarchy

• Learning is done layer by layer starting from the bottom

• To simplify learning, we do not consider any top down connections for inference

• Filters are normalized to unit norm after learning

• The gradients are

$$\nabla_{C_{m,k'}^{l}} E_l = -2\, X_t^{k',l} * \left( I_t^m - \sum_{k=1}^{K} C_{m,k} * X_t^{k,l} \right)$$

$$\nabla_{B_{m,d'}^{l}} E_l = -U_t^{d',l} * \left[ \exp\left\{ -\left( \sum_{d=1}^{D} B_{k',m} * U_t^{d,l} \right) \right\} \cdot \mathrm{down}(X_t^{k',l}) \right]$$


Object Recognition- Training

• Learning on Van Hateren

natural video database

(128x128).

• Architecture:

• Layer 1: 16 states of 7x7

filters and 32 causes of

6x6 filters.

• Layer 2: 64 states of 7x7

filters and 128 causes.

• Pooling: 2 x 2 between

states and causes.

[Figure panels: Layer 2 - Causes; Layer 1 - Causes]


Self-Taught Learning


With the parameters learned from Van Hateren, classify images in

Caltech 101.

Extract features from a single bottom up inference (causes from both

layer 1 and 2 that are concatenated into a feature vector)

30 images for training, and for testing.


Object Recognition with Context

COIL-100 dataset: 72 frames per object.

Top-down inference is run over each sequence.

We assume that the test data is partially available during training (so-called “transductive” learning).

Four frames per object (0°, 90°, 180°, 270°) are used for training a linear SVM.


Object Recognition- Results

Methods                                                            Accuracy (%)
View-tuned network (VTU) [Wersing & Korner, 2003]                  79.10
Convolutional Nets with temporal coherence [Mobahi et al, 2009]    92.25
Stacked ISA with temporal coherence [Zou et al, 2012]              87.00
Our method; without temporal coherence                             79.45
Our method; with temporal coherence                                94.41
Our method; with temporal coherence + Top-down                     98.34


Testing Discriminability in Object Recognition

Honda/UCSD face data set (20 for training, 39 for testing) using the Viola-Jones face finding algorithm (on 20x20 patches). Histogram equalization is done. 2-layer model with (16, 48) in layer 1 and (64, 100) in layer 2, 5x5 filters, causes concatenated as features.


Testing Discriminability in Object Recognition

YouTube Celebrity face data set (the data is partitioned into 10 sets of 9 videos; 3 of each are used for training and 6 for testing) using the Viola-Jones face finding algorithm (30x30 patches). Average results plotted. Model of same size but filters are 7x7. Histogram equalization is done.


Discriminability with Occlusion

[Figure panels: Layer-2 Causes; Layer-1 Causes; Layer-1 States; example video frames [VidTIMIT]]


Extension to Auditory Processing

• We would like to show the “universality” of this type of modeling

approach by addressing auditory processing.

• Temporal theory: information in sound is coded in the temporal firing patterns of the auditory neurons connecting to the cochlea.

• Place theory: perception of sound depends on the location which vibrates in response to the sound along the basilar membrane.

• Consequently, use of dynamical systems would be a great fit. Audio streams can also be modeled as a single source, so they are easier to explain.

• Simplify the modeling approach using Kalman filters for efficiency

• Integrate auditory and visual processing through the causes stored in a content addressable memory (CAM).


Nested Hierarchical Linear Dynamical

System (HLDS)

The linear model consists of one

measurement equation and multiple state

transition equations.

By design the top layer creates point

attractors (Brownian state) to extract

redundancies in the sound time structure.

The nested HLDS is driven bottom-up by the observations and top-down by the states, so it indirectly segments the input into spectrally uniform regions.

Cinar G., Príncipe J., “Clustering of Time Series Using a Hierarchical

Linear Dynamical System”, in Proc. ICASSP 2014, Florence, Italy


Point Attractors for Trumpet Notes

• Train with audio samples from Univ. of Iowa Musical

Instrument notes (2 sec sustained notes) in the range E3-

D6 for the nonvibrato Trumpet.

• The algorithm organizes in an unsupervised fashion the

different time structure of notes into point attractors in the

state space of the highest layer (Hopfield network).



Recognition Through Clustering

• Model system: 60,10,3 states, 36 msec windows, 1024

FFTs. System is real time.

• To assess classification accuracy we do Monte-Carlo runs

through all 35 notes (randomized order).

• Convergence is declared with 4 consecutive decisions to

the same output space location (clustering).

• The performance is tested for three noise levels.

                                                   35 dB SNR   15 dB SNR   -5 dB SNR
Classification Accuracy using Variance Test        96.94%      87.94%      5.43%
Classification Accuracy using all time instances   79.70%      71.95%      5.17%


Performance in Music Clips

• This setup of isolated notes is not ‘practical’. In a real-life scenario it would be almost impossible to find a music piece that consists of notes sustained for 2 seconds.

• We started testing the performance of the algorithm on very short music clips.

• We used the first two verses of Beethoven's Symphony No. 5.

• We created the clip from real trumpet recordings.


Performance in Music Clips

We play this motif under different conditions:

• Three different amplitude levels

• Crescendo

• Decrescendo

• Different tempos

Observe that the same trajectory is followed throughout these dynamic changes.


Yesterday by The Beatles

• We created the Beatles song “Yesterday” using the trumpet notes in the database.

• We apply the same simple convergence criterion. Once unanimous decisions are disrupted, windows are labeled as “undecided”.

• Once the next convergence is declared, we go back in time and fix the undecided windows accordingly.
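The labeling procedure described above can be sketched as follows. It takes one cluster decision per window, declares convergence after four consecutive identical decisions, marks windows as undecided once the agreement is disrupted, and back-fills them when the next convergence is declared. Reading "fix the undecided windows accordingly" as "assign them to the newly converged note" is an interpretation, and the function and constant names are hypothetical.

```python
UNDECIDED = "undecided"


def label_windows(decisions, needed=4):
    """Label each window from a sequence of per-window cluster decisions.

    Assumption: undecided windows are back-filled with the label of the
    next note the sequence converges to.
    """
    labels = [UNDECIDED] * len(decisions)
    current = None               # note we are currently converged on, if any
    run_value, run_len = None, 0
    pending_start = 0            # first window that still needs a label

    for i, d in enumerate(decisions):
        # Track the length of the current run of identical decisions.
        if d == run_value:
            run_len += 1
        else:
            run_value, run_len = d, 1

        if current is not None and d == current:
            labels[i] = current  # still unanimous: keep the converged label
            pending_start = i + 1
            continue

        current = None           # agreement disrupted (or not yet reached)
        if run_len >= needed:
            current = run_value  # next convergence declared
            for j in range(pending_start, i + 1):
                labels[j] = current  # go back in time and fix the gap
            pending_start = i + 1
    return labels


# Note 5 is held, one stray decision (7) appears at the transition, then note 9.
print(label_windows([5, 5, 5, 5, 5, 7, 9, 9, 9, 9, 9]))
```

In this sketch, windows at the end of a clip that never reach a new convergence stay labeled as undecided. A note shorter than `needed` windows can never be declared, so its windows are absorbed into the following note.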


Yesterday by The Beatles

Figure: What the algorithm “hears”

• Each window is played several times and the clustering is averaged (can be parallelized for real time).

• After post-processing, the classification accuracy goes as high as 93% (notes shorter than the convergence time are misclassified).

• Clean notes from the database are concatenated.


CONCLUSIONS

High Level

• We have developed a preliminary version of a computational, adaptive, self-organized, distributed, and hierarchical model for episodic memory in sensory processing.

• The Bayesian framework is general and very flexible in explaining the data. However, the computational complexity is still huge and requires specific hardware architectures.

• The sparseness constraint was critical to disambiguate temporal and spatial features in video, while adaptive pooling was critical to link model parameters to object and video invariant features.

• Preliminary results show that our model is capable of high performance in video processing and is very robust to structured noise.

• The same basic principles were extended to auditory processing (no sparseness), which shows their broad appeal.


CONCLUSIONS

Specific novel aspects in the approach

• Paradigm shift in sensory processing: switched from input-space feature design to discriminative model design to explain the sensory data with sparse and invariant representations.

• The computational framework emphasizes self-organization and distributed feedback between bottom-up and top-down processing. It blends working memory (states) and long-term memory (parameters). It is ready to take advantage of prior knowledge (cognitive memory).

• Illustrates the importance of dynamic modeling to describe spatio-temporal data (video). Continuity over time simplifies processing!


CONCLUSIONS

Specific novel aspects in the modeling

• New methodology to perform sparse coding with dynamic models, and a demonstration of its importance.

• New methodology to learn invariant representations rather than doing max or average pooling, with a demonstrated improvement in classification.

• New methodology to implement top-down connections that can disambiguate the object of interest in the video.

• Validated the role of dynamics in helping to deal with structured noise in a hierarchical model.


Future Directions

• Audio-Video Fusion