How to win big by thinking straight about relatively trivial problems


Page 1

How to win big by thinking straight about relatively trivial problems

Tony Bell, University of California at Berkeley

Page 2

Density Estimation

Make the model like the reality

by minimising the Kullback-Leibler Divergence:

by gradient descent in a parameter of the model:

THIS RESULT IS COMPLETELY GENERAL.
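A minimal sketch of the objective and gradient being described, assuming p(x) is the data density and q_w(x) is the model density with parameter w:

```latex
% Kullback-Leibler divergence between data density p and model density q_w
D(p \,\|\, q_w) = \int p(x)\, \log\frac{p(x)}{q_w(x)}\, dx
% Gradient descent in a model parameter w: the entropy of p does not depend on w,
% so only the cross-entropy term survives
\Delta w \;\propto\; -\frac{\partial}{\partial w} D(p \,\|\, q_w)
        \;=\; \Big\langle \frac{\partial}{\partial w} \log q_w(x) \Big\rangle_{p}
```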

Page 3

The passive case (where the data distribution does not depend on the parameter, ∂p/∂w = 0). For a general model distribution written in the ‘Energy-based’ form:

partition function (or zeroth moment...)

energy

the gradient evaluates in the simple ‘Boltzmann-like’ form:

learn on data while awake
unlearn on data while asleep
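A sketch of the ‘Energy-based’ form and the ‘Boltzmann-like’ gradient that follows from it, using the same notation as above:

```latex
% Energy-based model: E_w is the energy, Z_w the partition function (zeroth moment)
q_w(x) = \frac{e^{-E_w(x)}}{Z_w}, \qquad Z_w = \int e^{-E_w(x)}\, dx
% Substituting into the general gradient gives the Boltzmann-like form:
\Delta w \;\propto\; \Big\langle -\frac{\partial E_w}{\partial w} \Big\rangle_{p}
          \;-\; \Big\langle -\frac{\partial E_w}{\partial w} \Big\rangle_{q_w}
% first term:  learn on data while awake (samples from the data density p)
% second term: unlearn while asleep (samples from the model density q_w)
```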

Page 4

The single-layer case

Learning Rule (Natural Gradient)

The Score Function

Linear Transform

Shaping Density: many problems are solved by modeling in the transformed space

for non-loopy hypergraph

The score function is the important quantity.
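A sketch of the single-layer quantities named above, assuming the standard ICA setup with a linear transform u = Wx and a model density q_u in the transformed space:

```latex
% Linear transform and density shaping in the transformed space
u = Wx, \qquad q_x(x) = |\det W|\; q_u(Wx)
% The score function (the important quantity)
\varphi_i(u) = \frac{\partial}{\partial u_i} \log q_u(u)
% Learning rule (natural gradient)
\Delta W \;\propto\; \big( I + \langle \varphi(u)\, u^{\mathsf T} \rangle \big)\, W
```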

Page 5

Conditional Density Modeling

To model a conditional density, use the rules:

This little-known fact has hardly ever been exploited. It can be used instead of regression everywhere.
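One plausible reading of the rules, assuming the conditional model is formed from a joint and a marginal density model, so that the gradient splits into two density-estimation problems of the kind above:

```latex
q(y \mid x) = \frac{q(x, y)}{q(x)}
\quad\Rightarrow\quad
\frac{\partial}{\partial w} \log q(y \mid x)
  = \frac{\partial}{\partial w} \log q(x, y) \;-\; \frac{\partial}{\partial w} \log q(x)
```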

Page 6

ICA

IVA

ISA

Independent Components, Subspaces and Vectors

DCA (ie: score function hard to get at due to Z)
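To make the ICA special case concrete, here is a minimal, hypothetical NumPy sketch of the natural-gradient update with a sparse (tanh) score function; it illustrates the standard rule, not code from the talk:

```python
import numpy as np

def ica_natural_gradient(X, n_iter=200, lr=0.01, seed=0):
    """Square ICA by natural-gradient ascent of the log-likelihood.

    X : array of shape (n_sources, n_samples), assumed zero-mean.
    Uses the sparse (super-Gaussian) score function phi(u) = -tanh(u).
    """
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # unmixing matrix
    for _ in range(n_iter):
        U = W @ X                      # estimated sources in the transformed space
        phi = -np.tanh(U)              # score function of the sparse model density
        # natural-gradient rule: dW proportional to (I + <phi(u) u^T>) W
        W += lr * (np.eye(n) + (phi @ U.T) / T) @ W
    return W

# usage (hypothetical data): W = ica_natural_gradient(mixtures); sources = W @ mixtures
```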

Page 7

IVA used for audio separation in a real room:

Page 8

Score functions derived from sparse factorial and radial densities:
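The two density families named here can be written down explicitly; a sketch, assuming a Laplacian factorial density for ICA and a spherically symmetric radial density on each complex frequency vector for IVA:

```latex
% Sparse factorial density (ICA): independent Laplacian components
q(u) \propto \prod_i e^{-|u_i|}
  \quad\Rightarrow\quad \varphi_i(u) = -\operatorname{sign}(u_i)
% Sparse radial density (IVA): dependence within a vector only through its norm
q(u) \propto e^{-\|u\|}
  \quad\Rightarrow\quad \varphi_i(u) = -\,\frac{u_i}{\|u\|}
```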

Page 9

Results on real-room source separation:

Page 10

Why does IVA work on this problem?

Because the score function, and thus the learning, is only sensitive to the amplitude of the complex vectors, representing correlations of amplitudes of frequency components associated with a single speaker. Arbitrary dependencies can exist between the phases of this vector. Thus all phase (ie: higher-order statistical structure) is confined within the vector and removed between them.

It’s a simple trick, just relaxing the independence assumptions in a way that fits speech. But we can do much more:

• build conditional models across frequency components
• make models for data that is even more structured:

Video is [time x space x colour]
Many experiments are [time x sensor x task-condition x trial]

Page 11

channel 1-16, time 0-8

Page 12

channel 17-32, time 0-8

Page 13

channel 1-16, time 0-8

Page 14

channel 1-16, time 0-1

Page 15

channel 17-32, time 0-1

Page 16

channel 1-16, time 0-1

Page 17

channel 17-32, time 0-1

Page 18

channel 33-48, time 0-1

Page 19

The big picture.

Behind this effort is an attempt to explore something called “The Levels Hypothesis”, which is the idea that in biology, in the brain, in nature, there is a kind of density estimation taking place across scales.

To explore this idea, we have a twofold strategy:

1. EMPIRICAL/DATA ANALYSIS: Build algorithms that can probe the EEG across scales, ie: across frequencies

2. THEORETICAL: Formalise mathematically the learning process in such systems.

Page 20

LEVEL | UNIT | DYNAMICS | LEARNING
ecology | society | predation, symbiosis | natural selection
society | organism | behaviour | sensory-motor learning
organism | cell | spikes | synaptic plasticity (= STDP)
cell | protein | molecular forces | gene expression, protein recycling
protein | amino acid | direct, voltage, Ca, 2nd messenger | molecular change

A Multi-Level View of Learning

LEARNING at a LEVEL is CHANGE IN INTERACTIONS between its UNITS, implemented by INTERACTIONS at the LEVEL beneath, and by extension resulting in CHANGE IN LEARNING at the LEVEL above.

Increasing Timescale

Separation of timescales allows INTERACTIONS at one LEVEL to be LEARNING at the LEVEL above.

Interactions = fast, Learning = slow

Page 21

1. Infomax between Layers (eg: V1 density-estimates Retina)

• square (in ICA formalism)
• feedforward
• information flows within a level
• predicts independent activity
• only models outside input

2. Infomax between Levels (eg: synapses density-estimate spikes)

• overcomplete
• includes all feedback
• information flows between levels
• arbitrary dependencies
• models input and intrinsic activity

[Diagram: layer-to-layer Infomax maps retina (x) to V1 (y) through synaptic weights; level-to-level Infomax maps all neural spikes y(t) to all synaptic ‘readouts’ through synapses and dendrites, relating the pdf of all spike times to the pdf of all synaptic ‘readouts’.]

If we can make this pdf uniform, then we have a model constructed from all synaptic and dendritic causality.

This SHIFT in looking at the problem alters the question so that, if it is answered, we have an unsupervised theory of ‘whole brain learning’.

Page 22

Formalisation of the problem: p is the ‘data’ distribution, q is the ‘model’ distribution, w is a synaptic weight, and I(y,t) is the spike-synapse mutual information.

IF

THEN if we were doing classical Infomax, we would use the gradient:

BUT if one’s actions can change the data, THEN an extra term appears:

(1)

(2)
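A plausible reading of gradients (1) and (2), assuming the objective is the KL divergence D(p||q) of the earlier slides and that, in the active case, the data density p as well as the model q depends on the weight w:

```latex
% (1) classical (passive) Infomax: only the model q_w depends on w
\Delta w \;\propto\; \Big\langle \frac{\partial}{\partial w} \log q_w \Big\rangle_{p}
% (2) active case: the data distribution p_w also changes with w,
%     so an extra term appears in the negative gradient of D
\Delta w \;\propto\; \Big\langle \frac{\partial}{\partial w} \log q_w \Big\rangle_{p_w}
  \;-\; \int \frac{\partial p_w}{\partial w}\, \log\frac{p_w}{q_w}\, dx
```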

It is easier to live in a world where one can change the world to fit the model, as well as changing one’s model to fit the world; therefore (2) must be easier than (1). This is what we are now researching.