Solving the differential equation · Solving the differential equation: or in the general form:...


Page 1: Solving the differential equation · Solving the differential equation: or in the general form: What is the solution of this type of equation: Try: Objective function formulation

A note about gradient descent:

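Consider the function $f(x) = (x - x_0)^2$.

Its derivative is:

$$\frac{df(x)}{dx} = 2(x - x_0)$$

By gradient descent:

$$\frac{dx}{dt} = -\eta \frac{df(x)}{dx} = -2\eta (x - x_0)$$

(If f(x) is more complex, we usually cannot solve explicitly for the convergence to the fixed points.)

[Figure: f(x) with its minimum at x_0; "+" is marked on one side of x_0 and "−" on the other.]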
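The convergence to the fixed point can be checked numerically. The sketch below Euler-integrates the gradient-descent equation and compares it with the closed-form exponential solution; the values of `x0`, `eta`, `dt`, and the starting point are arbitrary choices made for illustration, not values from the slides.

```python
import numpy as np

def gradient_descent(x_init, x0, eta, dt, n_steps):
    """Euler-integrate dx/dt = -eta * df/dx for f(x) = (x - x0)**2."""
    x = x_init
    trajectory = [x]
    for _ in range(n_steps):
        grad = 2.0 * (x - x0)       # df/dx
        x = x - dt * eta * grad     # one Euler step of the gradient-descent ODE
        trajectory.append(x)
    return np.array(trajectory)

# Illustrative parameter values (assumptions, not from the slides).
x_init, x0, eta, dt, n_steps = -2.0, 1.5, 0.5, 0.01, 2000
traj = gradient_descent(x_init, x0, eta, dt, n_steps)

t = dt * np.arange(n_steps + 1)
exact = x0 + (x_init - x0) * np.exp(-2.0 * eta * t)   # closed-form solution

print(traj[-1], np.max(np.abs(traj - exact)))
```

The trajectory ends very close to `x0`, and the Euler path stays close to the exponential curve for this step size.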

Page 2:

Solving the differential equation:

or in the general form:

What is the solution of this type of equation:

Try:
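The equations on this slide did not survive extraction. A worked version of the step, assuming the equation in question is the gradient-descent equation from the note on gradient descent, might read as follows. The shorthand $a = 2\eta$ and the trial-solution symbols $A$ and $\lambda$ are introduced here for illustration and are not taken from the slide.

```latex
% Gradient-descent equation and an assumed general form with a = 2*eta
\frac{dx}{dt} = -2\eta\,(x - x_0)
\quad\Longleftrightarrow\quad
\frac{dx}{dt} = -a\,(x - x_0), \qquad a = 2\eta .

% Try an exponential: x(t) = x_0 + A e^{\lambda t}
\frac{dx}{dt} = \lambda A e^{\lambda t} = -a A e^{\lambda t}
\;\Rightarrow\; \lambda = -a .

% Fix A from the initial condition x(0)
x(t) = x_0 + \bigl(x(0) - x_0\bigr)\, e^{-2\eta t} .
```

Under these assumptions, x relaxes exponentially to the fixed point $x_0$ with time constant $1/(2\eta)$.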

Page 3:

Objective function formulation

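We can define a function $R_w$ (Intrator 1992):

$$R_w = -\mu \left\{ \frac{1}{3} E\left[(w \cdot x)^3\right] - \frac{1}{4} E^2\left[(w \cdot x)^2\right] \right\}$$

This function is called a risk, objective, energy, Lyapunov, index, or contrast function in different contexts.

The minimization of this function can be obtained by gradient descent (taking $\mu = 1$):

$$\frac{dw_i}{dt} = -\frac{\partial R_w}{\partial w_i} = E\Big[\underbrace{(w \cdot x)^2}_{y^2}\, x_i\Big] - \underbrace{E\left[(w \cdot x)^2\right]}_{\theta_m}\, E\Big[\underbrace{(w \cdot x)}_{y}\, x_i\Big]$$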
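The gradient can be sanity-checked with finite differences. The sketch below replaces expectations with sample means over a synthetic data set and compares the numerical derivative of the objective with the expression $E[y^2 x_i] - \theta_m E[y x_i]$. The data, the weight vector, and the names `risk` and `neg_gradient` are hypothetical, chosen only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical zero-mean inputs with different scales per dimension.
X = rng.normal(size=(5000, 3)) * np.array([1.0, 0.5, 2.0])
mu = 1.0

def risk(w):
    """R_w = -mu * (E[(w.x)^3]/3 - E[(w.x)^2]^2/4), with E as a sample mean."""
    y = X @ w
    return -mu * (np.mean(y**3) / 3.0 - np.mean(y**2) ** 2 / 4.0)

def neg_gradient(w):
    """-dR/dw_i = mu * (E[y^2 x_i] - theta_m * E[y x_i]), theta_m = E[y^2]."""
    y = X @ w
    theta_m = np.mean(y**2)
    hebb = np.mean(y[:, None] ** 2 * X, axis=0)
    decay = theta_m * np.mean(y[:, None] * X, axis=0)
    return mu * (hebb - decay)

w = np.array([0.3, -0.2, 0.1])
eps = 1e-6
# Central finite difference of -R along each coordinate axis.
numeric = np.array([-(risk(w + eps * e) - risk(w - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

print(np.max(np.abs(numeric - neg_gradient(w))))
```

The two vectors agree to within finite-difference error, which supports the form of the learning rule derived from the objective.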

Page 4:

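Therefore

$$\frac{dw_i}{dt} = E\left[y^2 x_i\right] - \theta_m E\left[y x_i\right] = E\left[x_i\, y (y - \theta_m)\right] = E\left[x_i\, \phi(y, \theta_m)\right]$$

And the stochastic analog is:

$$\frac{dw_i}{dt} = x_i\, \phi(y, \theta_m)$$

where $\theta_m = E\left[y^2\right]$.

It can be shown that the stochastic ODE converges to the deterministic ODE.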
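A minimal simulation of the stochastic rule, under simplifying assumptions: the input is one of two fixed patterns drawn with equal probability, and $\theta_m = E[y^2]$ is computed exactly over that two-pattern set rather than estimated online. The patterns, initial weights, and learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical input set: two linearly independent patterns, equal probability.
patterns = np.array([[1.0, 0.2],
                     [0.2, 1.0]])
w = np.array([0.10, 0.15])   # small initial weights (assumption)
eta = 0.02                   # learning rate (assumption)

for _ in range(20000):
    x = patterns[rng.integers(2)]            # draw one input at random
    y = w @ x                                # postsynaptic response y = w.x
    theta_m = np.mean((patterns @ w) ** 2)   # theta_m = E[y^2] over the input set
    w = w + eta * x * y * (y - theta_m)      # dw_i = eta * x_i * phi(y, theta_m)

responses = patterns @ w
print(responses)
```

With two equiprobable patterns, $\phi = 0$ for both patterns at the selective states, where one response is 2 and the other is 0, and the run settles into one of them.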

Page 5:

Using the objective function formulation:

•  Fixed points in various cases have been derived.

•  A connection has been established with the statistical theory of Projection Pursuit.

(See chapter 3 of Theory of Cortical Plasticity, by Cooper, Intrator, Blais, and Shouval.)

Page 6:

One way of looking at PCA is that we move to a basis that diagonalizes the correlation matrix, in which each PC grows independently.

From the new basis of the N principal components $x^k$ we can form the rotation matrix

$$U = \left[x^1, \ldots, x^N\right] = \begin{pmatrix} x^1_1 & \cdots & x^N_1 \\ \vdots & & \vdots \\ x^1_N & \cdots & x^N_N \end{pmatrix}$$

such that we get the correlation matrix in the new rotated space:

so that
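The diagonalization can be illustrated numerically. The sketch below assumes the rotated correlation matrix is $U^T Q U$, with $Q = E[x x^T]$ and the columns of $U$ given by the principal components; that formula is the standard one and is supplied here because the slide's equation was not recovered. The synthetic data and the name `mixing` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated, zero-mean data in three dimensions.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(10000, 3)) @ mixing.T

Q = X.T @ X / len(X)              # correlation matrix Q = E[x x^T]
eigvals, U = np.linalg.eigh(Q)    # columns of U are the principal components

Q_rot = U.T @ Q @ U               # correlation matrix in the rotated basis
off_diag = Q_rot - np.diag(np.diag(Q_rot))

print(np.round(Q_rot, 6))
```

In the rotated basis the off-diagonal entries vanish up to rounding, and the diagonal holds the eigenvalues, that is, the variance along each principal component.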

Page 7:

Graphically:

[Figure: the first PC and the second PC.]

What does PCA do?

•  Dimensionality reduction (hierarchy)

•  Eliminate correlations – by diagonalizing correlation matrix

Another thing PCA does is find the projection (direction) of maximal variance, assuming |w| = 1 and ⟨x⟩ = 0.
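The maximal-variance property can be checked by comparing the variance along the first PC with the variance along a sweep of other unit vectors. This is a sketch on synthetic two-dimensional data; the data and the helper name `projected_variance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated data.
X = rng.normal(size=(5000, 2)) @ np.array([[3.0, 0.0],
                                           [1.0, 0.5]])
X = X - X.mean(axis=0)                  # enforce <x> = 0
Q = X.T @ X / len(X)

eigvals, eigvecs = np.linalg.eigh(Q)    # eigenvalues in ascending order
w_pc = eigvecs[:, -1]                   # first PC, with |w| = 1

def projected_variance(w):
    """E[(w.x)^2], the variance of the projection for zero-mean x."""
    return np.mean((X @ w) ** 2)

# Sweep unit vectors over half a turn and record the best variance found.
angles = np.linspace(0.0, np.pi, 181)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
best_other = max(projected_variance(w) for w in candidates)

print(projected_variance(w_pc), best_other)
```

No direction in the sweep exceeds the variance along the first PC, which equals the largest eigenvalue of the correlation matrix.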

Page 8:

ICA – Independent Component Analysis

Usually:

Definition of Independent components:

In ICA we typically assume this can be done by a linear transformation:

Such that x’ are independent.

The approach described here follows most closely the work of Hyvärinen and Oja (1996, 2000).

Page 9:

Cocktail party effect

[Figure: original signals s (s1, s2) and mixed signals x (x1, x2).]

Task of ICA: estimate s from x, or equivalently estimate the mixing matrix A, or its inverse W, such that:

Or in matrix notation
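The mixing model can be sketched in a few lines, assuming the standard linear form x = As with unmixing s = Wx and W = A⁻¹ (the slide's own equations were not recovered). The two source waveforms and the entries of `A` are hypothetical. In a real ICA problem A is unknown, and estimating W from x alone is the whole task.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000)
# Two hypothetical source signals: a sine wave and a square wave.
s = np.vstack([np.sin(2 * np.pi * 5 * t),
               np.sign(np.sin(2 * np.pi * 3 * t))])

A = np.array([[0.6, 0.4],
              [0.3, 0.7]])        # hypothetical mixing matrix

x = A @ s                         # mixed signals, one row per "microphone"
W = np.linalg.inv(A)              # unmixing matrix, known here only because A is
s_recovered = W @ x

print(np.max(np.abs(s_recovered - s)))
```

When A is known and invertible, the sources are recovered exactly up to rounding error.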

Page 10:

Illustration of ICA:

[Figure: sources s1, s2 and mixtures x1, x2.]

(Note: the ICA approach only makes sense if the data is indeed a superposition of independent sources.)

Page 11:

Definition of independence:

This implies that

And also implies the special case of decorrelation:

But the converse is not true; decorrelation does not imply independence.

Example: pairs of discrete variables (y1, y2) that take each of a set of four points with probability ¼. These variables are uncorrelated, but not independent.

Show at home: 1. that they are uncorrelated; 2. that they are not independent.
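A numerical check of this kind of example is sketched below. The four points are not legible in the source; the code assumes they are (0, 1), (0, −1), (1, 0), (−1, 0), which is the example given by Hyvärinen and Oja (2000). Independence would require $E[y_1^2 y_2^2] = E[y_1^2]\,E[y_2^2]$, so a mismatch there shows dependence.

```python
import numpy as np

# Assumed point set (not legible in the source); each point has probability 1/4.
points = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]], dtype=float)
prob = np.full(4, 0.25)
y1, y2 = points[:, 0], points[:, 1]

def expect(values):
    """Expectation of a function of (y1, y2) under the discrete distribution."""
    return float(np.sum(prob * values))

covariance = expect(y1 * y2) - expect(y1) * expect(y2)
lhs = expect(y1**2 * y2**2)            # E[y1^2 y2^2]
rhs = expect(y1**2) * expect(y2**2)    # E[y1^2] E[y2^2]

print(covariance, lhs, rhs)
```

The covariance is zero, so the variables are uncorrelated, while the two fourth-order quantities differ (0 versus ¼), so they are not independent.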

Page 12:

Non-Gaussian is independent

(Caveat: the signals cannot be Gaussian.)

Central limit theorem: “A sum of many independent random variables approaches a Gaussian distribution as the number of variables increases.”

Consequently: a sum of two independent, non-Gaussian random variables is “more Gaussian” than each of the signals.
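The "more Gaussian" claim can be illustrated by sampling. The sketch below uses the normalized kurtosis, which is zero for a Gaussian, as the measure of non-Gaussianity, and compares a single uniform source with the sum of two independent ones. The choice of uniform sources, the sample size, and the helper name are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_kurtosis(y):
    """E[y^4] / E[y^2]^2 - 3 after centering; zero for a Gaussian."""
    y = y - y.mean()
    return np.mean(y**4) / np.mean(y**2) ** 2 - 3.0

n = 200_000
s1 = rng.uniform(-1.0, 1.0, n)    # non-Gaussian source
s2 = rng.uniform(-1.0, 1.0, n)    # second, independent non-Gaussian source
mix = s1 + s2                     # their sum

k_single = normalized_kurtosis(s1)
k_mix = normalized_kurtosis(mix)
print(k_single, k_mix)
```

For a uniform variable the normalized kurtosis is −1.2, and for the sum of two it is −0.6, so the sum sits closer to the Gaussian value of zero.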

Page 13:

Example

Here the independent components are sub-Gaussian (light tails). An exponential is super-Gaussian.

Page 14:

Contrast functions to measure ‘distance’ from a Gaussian distribution

1.  Kurtosis: a standard, simple-to-understand measure based on the fourth moment.

There are two forms of kurtosis; one is:

$$K(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$$

We typically assume that $E\{y\} = 0$ and $E\{y^2\} = 1$, so:

$$K(y) = E\{y^4\} - 3$$

At home: calculate the kurtosis of a uniform distribution from −1 to 1 and of an exponential exp(−|x|).
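A numerical cross-check for the exercise is sketched below, using the first form $K(y) = E\{y^4\} - 3(E\{y^2\})^2$ without rescaling to unit variance. It assumes the exponential is the normalized density $\tfrac{1}{2}\exp(-|x|)$ and approximates the moments with a simple Riemann sum; the helper names and integration limits are hypothetical.

```python
import numpy as np

def moment(pdf, k, lo, hi, n=200_001):
    """Approximate E[x^k] for the density `pdf` on [lo, hi] with a Riemann sum."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    return np.sum(x**k * pdf(x)) * dx

def kurtosis(pdf, lo, hi):
    """K(y) = E{y^4} - 3 E{y^2}^2 for a zero-mean density."""
    return moment(pdf, 4, lo, hi) - 3.0 * moment(pdf, 2, lo, hi) ** 2

def uniform(x):
    return np.full_like(x, 0.5)          # uniform density on [-1, 1]

def laplace(x):
    return 0.5 * np.exp(-np.abs(x))      # exp(-|x|), normalized (assumption)

k_uniform = kurtosis(uniform, -1.0, 1.0)
k_laplace = kurtosis(laplace, -50.0, 50.0)   # tails beyond 50 are negligible
print(k_uniform, k_laplace)
```

The uniform density gives a negative kurtosis (sub-Gaussian) and the exponential a positive one (super-Gaussian), which can be compared with the values obtained by hand.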

Page 15:

Other options (cost functions):

1.  Negentropy (HO paper 2000)

2.  KL distance between P(x1, …, xn) and P(x1)·…·P(xn) (HO paper 2000)

3. BCM objective function (Theory of … Book 2003- ch 3)

Page 16:

We will use here an approach similar to that used in the objective function formulation of BCM: use a gradient rule to maximize the kurtosis.

$$\frac{dw_i}{dt} = \pm\eta\, \frac{\partial \left[ E\{y^4\} - 3\left(E\{y^2\}\right)^2 \right]}{\partial w_i} = \pm 4\eta\, E\Big[ y \big( y^2 - 3\underbrace{E[y^2]}_{\theta_m} \big) x_i \Big] = \pm\eta\, E\left[ y (y^2 - 3\theta_m) x_i \right]$$

(In the last step the factor of 4 is absorbed into η.)

What does the sign depend on?

However, this rule is not stable against growth of w, and therefore an additional constraint should be used to keep |w|² = 1. This could be done with a trick similar to that of Oja (1982).

For another approach, FastICA, see HO (2000).

Page 17:

Example 1 – cash flow in retail stores

[Figure: two panels, "Original – preprocessed data" and "5 independent components"; both horizontal axes are in weeks.]

Page 18:

Example 2: ICA from natural images

Are these independent components of natural Images?

ICA, BCM and Projection Pursuit

Page 19:

Projection Pursuit – find non Gaussian Projections

Page 20:

Summary

Page 21:

Generating Heterosynaptic and Homosynaptic models

From kurtosis we got:

$$\frac{dw_i}{dt} = \eta\, y (y^2 - 3\theta_m) x_i$$

This alone is not stable against growth of w. We can use the same trick as in the Oja rule to keep w normalized.

For this case we obtain:

$$\frac{dw_i}{dt} = \eta \Big[ y (y^2 - 3\theta_m) x_i - \underbrace{y^2 (y^2 - 3\theta_m) w_i}_{\text{Heterosynaptic term}} \Big]$$

This is a Heterosynaptic rule. Note the different uses of “Heterosynaptic”.
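The normalized kurtosis rule can be tried on a toy separation problem. The sketch below makes several assumptions that are not in the slides: the sources are super-Gaussian (Laplace), so the + sign is used; the mixtures are whitened first; and the per-sample rule is replaced by its batch average over the data set. The mixing matrix, learning rate, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
# Two independent, unit-variance, super-Gaussian sources (assumption).
s = rng.laplace(scale=1 / np.sqrt(2), size=(n, 2))
A = np.array([[0.6, 0.4],
              [0.3, 0.7]])                 # hypothetical mixing matrix
x = s @ A.T

# Whiten the mixtures so that E[x x^T] = I.
x = x - x.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(x.T @ x / n)
x = (x @ eigvecs) / np.sqrt(eigvals)

w = np.array([1.0, 0.0])
eta = 0.1
for _ in range(500):
    y = x @ w
    theta_m = np.mean(y**2)                        # theta_m = E[y^2]
    phi = y * (y**2 - 3.0 * theta_m)
    hebbian = np.mean(phi[:, None] * x, axis=0)    # E[y (y^2 - 3 theta_m) x_i]
    heterosynaptic = np.mean(phi * y) * w          # E[y^2 (y^2 - 3 theta_m)] w_i
    w = w + eta * (hebbian - heterosynaptic)

y = x @ w
corr = [abs(np.corrcoef(y, s[:, k])[0, 1]) for k in range(2)]
print(np.linalg.norm(w), max(corr))
```

The heterosynaptic term holds the weight norm at 1, and the output ends up strongly correlated with one of the two sources, up to sign.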

Page 22:

There is another form for kurtosis:

$$K_1(y) = \frac{E\{y^4\}}{\left(E\{y^2\}\right)^2}$$

Therefore:

$$\frac{dw_i}{dt} = \eta\, \frac{1}{\theta_m^2}\, E\left[ y \left( y^2 - E[y^4]/\theta_m \right) x_i \right]$$

This produces a (more complex) Homosynaptic rule with a sliding threshold.

Page 23:

What are the consequences of these different rules?

General form:

Heterosynaptic term

Page 24:

Monocular Deprivation: Homosynaptic model (BCM)

[Figure: two panels, high noise and low noise.]

Page 25:

Monocular Deprivation: Heterosynaptic model (K2)

[Figure: two panels, high noise and low noise.]

Page 26:

Noise Dependence of MD: two families of synaptic plasticity rules

[Figure: normalized time versus noise std, in two rows of panels. Homosynaptic: QBCM, K1, S1. Heterosynaptic: S2, K2, PCA.]

Blais, Shouval, Cooper. PNAS, 1999

Page 27:

What did we learn until now?