
Independent Component Analysis

PhD Seminar
Jörgen Ungh

Agenda

• Background – a motivator
• Independence
• ICA vs. PCA
• Gaussian data
• ICA theory
• Examples

Background & motivation

• The cocktail party problem

[Figure: cocktail-party setup with three speakers s1, s2, s3 recorded by three microphones x1, x2, x3]

Cocktail party problem
• Let s1(t), s2(t) and s3(t) be the original spoken signals
• Let x1(t), x2(t) and x3(t) be the recorded signals
• The connection between s and x can be written

x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)

Goal: Estimate s1, s2 and s3 from x1, x2 and x3.
Problem: We do not know anything about the right-hand side…
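As a rough illustration (not from the slides; the source signals and the mixing matrix A below are made up), the mixing model can be simulated in a few lines of Python:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 1000)

# Three hypothetical, independent source signals s1(t), s2(t), s3(t)
S = np.column_stack([
    np.sin(2 * t),              # s1: sinusoid
    np.sign(np.sin(3 * t)),     # s2: square wave
    rng.laplace(size=t.size),   # s3: noise-like source
])

# Unknown (here: arbitrary) 3x3 mixing matrix A
A = np.array([[1.0, 0.5, 0.3],
              [0.6, 1.0, 0.4],
              [0.2, 0.7, 1.0]])

# Recorded signals: x_i(t) = sum_j a_ij * s_j(t)
X = S @ A.T   # each row of X holds x1(t), x2(t), x3(t) at one time instant
```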

”Today we celebrate our independence day”

- US President THOMAS J. WHITMORE (Bill Pullman) in Independence Day (1996)

Independence – what is it?

Independence = uncorrelatedness?

Definitions

Covariance: $C_{xy} = E\{(x - m_x)(y - m_y)^T\}$

Correlation: $R_{xy} = E\{x y^T\}$

If $m_x = m_y = 0$, then $C_{xy} = R_{xy}$

Uncorrelated

• Two vectors are uncorrelated if:

$C_{xy} = E\{(x - m_x)(y - m_y)^T\} = 0$

or equivalently

$R_{xy} = E\{x y^T\} = E\{x\}\,E\{y\}^T = m_x m_y^T$

If $m_x = m_y = 0$, then $C_{xy} = R_{xy} = 0$

…from now on we assume zero-mean variables

Independent

• Vectors x, y are independent if:

$p_{x,y}(x, y) = p_x(x)\, p_y(y)$

• Which also gives:

$E\{g_x(x)\, g_y(y)\} = E\{g_x(x)\}\, E\{g_y(y)\}$

• where $g_x$ and $g_y$ are arbitrary functions of x and y

Independent

Independent is stronger than uncorrelated!

Uncorrelated: $E\{x y^T\} = E\{x\}\,E\{y\}^T$

Independent: $E\{g_x(x)\, g_y(y)\} = E\{g_x(x)\}\, E\{g_y(y)\}$

The first condition is the special case of the second where $g_x$ and $g_y$ are linear functions of x and y.

Independent ≠ Uncorrelated

[Figure: two scatter plots of (x, y) samples]

Are x and y uncorrelated? YES and YES.

[Figure: the same two scatter plots of (x, y) samples]

Are x and y independent? NO and YES.
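A small numerical check of this distinction (an illustration with hypothetical variables, not the data behind the plots): take x uniform on [-1, 1] and y = x². They are uncorrelated but clearly dependent, and nonlinear functions reveal it.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2                      # y is a deterministic function of x: not independent

# Uncorrelated: E{xy} - E{x}E{y} is (approximately) zero
print(np.mean(x * y) - np.mean(x) * np.mean(y))        # ~ 0

# Not independent: the product rule fails for nonlinear functions, e.g. g_x(x) = x^2
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))   # clearly different
```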

Relations

Independent ⇒ Uncorrelated

BUT

Uncorrelated ⇏ Independent

ICA vs. PCA

Independent Component Analysis vs. Principal Component Analysis

PCA
• Goal: ”Project data onto an orthonormal basis with maximum variance”

• Data explained by principal components

[Figure: data cloud with principal directions e1 and e2]

PCA

• Uses information up to the second moment, i.e. the mean and variance/covariance

• Reduce dimensions of data

• Orthonormal basis of uncorrelated vectors
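A minimal sketch of what this means in code (illustrative only; PCA as an eigendecomposition of the sample covariance matrix):

```python
import numpy as np

def pca(X, n_components=None):
    """X: (n_samples, n_dims) data matrix. Returns scores, basis, variances."""
    Xc = X - X.mean(axis=0)                    # remove the mean
    C = np.cov(Xc, rowvar=False)               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:               # optional dimension reduction
        eigvals, eigvecs = eigvals[:n_components], eigvecs[:, :n_components]
    scores = Xc @ eigvecs                      # uncorrelated principal components
    return scores, eigvecs, eigvals
```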

ICA
• Goal: ”Find the independent sources”

• Data explained by independent components

[Figure: data cloud in the (x, y) plane with independent component directions e1 and e2]

ICA

• Uses information beyond the second moment, i.e. higher-order statistics such as kurtosis and skewness

• Does not reduce dimensions of data

• A basis of independent vectors

ICA vs. PCA

• Independence is the stronger requirement

• In case of Gaussian data, ICA = PCA

Gaussian data

Gaussian distribution

• Definition:

$f_x(x) = \dfrac{1}{(2\pi)^{N/2}\,|C|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T C^{-1}(x-\mu)\right)$

C = covariance matrix, µ = mean vector

Explained completely by first- and second-order statistics, i.e. the mean and the variances/covariances

Gaussian data

• The rotation of the basis cannot be identified, due to the symmetry of the distribution

Gaussian distribution

• Completely defined by its first and second moments

• Uncorrelated Gaussian data ⇒ Independence

• Why assume gaussian data?

Central limit theorem• Definition:

”A sum of independent random variables will tend to be Gaussian”

• That is the argument for many assumptions on gaussian distributions


What if we put it in another way…?

Central limit theorem

• 2nd definition:

”The mixtures of two or more independent random variables are more Gaussian than the random variables themselves”

[Figure: a single u.d. (uniformly distributed) random variable vs. a mixture of 2 u.d. variables]

Idea!
• The observed mixtures should be more Gaussian than the original components

• The original components should be less gaussian than the mixture

• If we try to maximize the non-gaussianity of the data, we should get closer to the original components…
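A quick sanity check of this idea (illustrative, using uniform variables): the excess kurtosis of a single uniform variable is clearly non-zero, while a sum of several independent uniform variables is already much closer to 0, the Gaussian value.

```python
import numpy as np

def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(2)
u = rng.uniform(-1, 1, size=(100_000, 8))

print(excess_kurtosis(u[:, 0]))        # single uniform variable: about -1.2
print(excess_kurtosis(u.sum(axis=1)))  # sum of 8 independent uniforms: close to 0
```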

ICA theory

• Problem definition
• Solution
• Preprocessing
• Different methods
• Examples

ICA: Definition of the problem
• Let s1(t), s2(t) and s3(t) be the original signals
• Let x1(t), x2(t) and x3(t) be the collected signals
• The connection between s and x can be written

x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)

Goal: Estimate s1, s2 and s3 from x1, x2 and x3.

ICA: Assumption

• Independence

• Non-gaussian

• Square mixing matrix

ICA: Idea

• Maximize non-gaussianity of the data!

• We need a measure of ”Gaussianity” or ”Non-gaussianity”

Measures of Gaussianity

1. Kurtosis

Assuming zero mean variables

$\mathrm{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$

Measures of Gaussianity

1. Kurtosis

Assuming zero mean and unit variance

$\mathrm{kurt}(y) = E\{y^4\} - 3$

Measures of Gaussianity
1. Kurtosis

For Gaussian data we have:

$E\{y^4\} = 3\left(E\{y^2\}\right)^2$

which, inserted into

$\mathrm{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$

gives kurt(y) = 0 for Gaussian data.

For most other distributions, kurt(y) ≠ 0, positive or negative.

Measures of Gaussianity

1. Kurtosis

Maximize |kurt(y)|


Advantages:
- Easy to compute

Drawbacks:
- Sensitive to outliers
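As a sketch of how kurtosis can actually be used (my own minimal illustration, not the exact algorithm in the slides): find one component by gradient ascent on |kurt(wᵀz)| for whitened data z, keeping w on the unit sphere so that y = wᵀz has unit variance.

```python
import numpy as np

def one_unit_kurtosis(Z, n_iter=500, lr=0.05, seed=0):
    """Z: whitened data, shape (n_samples, n_dims). Returns one projection vector w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = Z @ w
        kurt = np.mean(y ** 4) - 3.0                          # excess kurtosis (unit variance)
        grad = 4.0 * np.mean((y ** 3)[:, None] * Z, axis=0)   # gradient of E{y^4} w.r.t. w
        w = w + lr * np.sign(kurt) * grad                     # climb towards larger |kurt|
        w /= np.linalg.norm(w)                                # stay on the unit sphere
    return w
```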

Measures of Gaussianity
2. Negentropy

$J(y) = H(y_{\mathrm{gauss}}) - H(y)$

where H is the (differential) entropy, defined as:

$H(y) = -\int p_y(\eta)\, \log p_y(\eta)\, d\eta$


Gaussian data has the largest entropy of all distributions with the same variance, meaning that it is the ”most random” distribution.


J(y) ≥ 0, and it equals zero if and only if y is Gaussian
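For intuition only, negentropy can be estimated crudely from a histogram (this is not how practical ICA algorithms evaluate it; the bin count and sample sizes below are arbitrary choices):

```python
import numpy as np

def entropy_hist(y, bins=100):
    """Crude differential-entropy estimate from a histogram."""
    counts, edges = np.histogram(y, bins=bins)
    p = counts / counts.sum()
    width = edges[1] - edges[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) + np.log(width)

def negentropy_hist(y, bins=100):
    """J(y) = H(y_gauss) - H(y), with y_gauss a Gaussian of the same variance."""
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * np.var(y))
    return h_gauss - entropy_hist(y, bins)

rng = np.random.default_rng(3)
print(negentropy_hist(rng.standard_normal(100_000)))   # close to 0 for Gaussian data
print(negentropy_hist(rng.laplace(size=100_000)))      # positive for non-Gaussian data
```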

Measures of Gaussianity

2. Negentropy

Maximize J(y)

Advantages:
- Robust

Drawbacks:
- Computationally hard

ICA: Solutions

• Kurtosis
• Negentropy
• Maximum likelihood
• Infomax
• Mutual information
• …


All based on independence and/or non-Gaussianity
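In practice one of these solutions is usually taken off the shelf; assuming scikit-learn is available and X holds the observed mixtures (as in the earlier mixing sketch), FastICA does the whole job:

```python
from sklearn.decomposition import FastICA

# X: (n_samples, n_mixtures) matrix of observed mixtures
ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X)   # estimated independent components
A_est = ica.mixing_            # estimated mixing matrix
```

Note that the estimates come back only up to scaling, sign and ordering of the components, exactly as listed among the restrictions below.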

ICA: Restrictions

• Non-gaussian data*

• Scaling, sign and order of components

• Need to know the No. of components


* If some of the components are Gaussian, the non-Gaussian independent components will still be found, but the Gaussian ones will remain mixed.

ICA: Preprocessing

• No reduction of dimension in ICA

• Need to know the number of components

• But we already have a method for dimension reduction and for estimating the probable number of components


Use PCA as a preprocessing step!
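A minimal sketch of PCA whitening as such a preprocessing step (illustrative; keeping only n_components directions both reduces the dimension and fixes the number of components passed on to ICA):

```python
import numpy as np

def whiten(X, n_components=None, eps=1e-12):
    """Center, rotate onto the principal axes and rescale to unit variance."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:                    # dimension reduction via PCA
        eigvals, eigvecs = eigvals[:n_components], eigvecs[:, :n_components]
    Z = Xc @ eigvecs / np.sqrt(eigvals + eps)       # whitened data: identity covariance
    return Z
```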

ICA: Preprocessing

• Low-pass filtering
  + Reduces noise
  – Reduces independence

• High-pass filtering
  + Increases independence
  – Increases noise

ICA: Overlearning

• Many more mixtures than independent components

• Gives estimated components with a spiky character

Examples

• Cocktail party
• Music separation
• Image analysis
• Separation of recorded signals of brain activity
• Process data
• Noise/signal separation
• Process monitoring

Cocktail party problem


Music separation

[Audio demo: four source signals, their four mixtures and the four ICA estimates (Source 1–4, Mix 1–4, Est 1–4)]

http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi


Image analysis - NLPCA

Brain activity

[Figure: estimated brain-activity components S1–S4]

Process data

[Figures: plots of the mixed signals, the whitened signals, the estimated independent components, and a fourth set of signal plots]

Noise removal

• Different noise sources:
  – Laplacian
  – Gaussian
  – Uniform
  – Exponential

[Figure: example signal plot]

Noise removal - Laplacian

[Figures: mixed signals, whitened signals and estimated independent components for the Laplacian noise case]

Noise removal - Gaussian

[Figures: mixed signals, whitened signals and estimated independent components for the Gaussian noise case]

Noise removal - Uniform

[Figures: mixed signals, whitened signals and estimated independent components for the uniform noise case]

Noise removal - Exponential

[Figures: mixed signals, whitened signals and estimated independent components for the exponential noise case]

Process monitoring

• Often done by PCA

• Example: F1, F2

• One step further, use ICA!

Practical considerations

• Noise reduction (filtering)

• Dimension reduction (PCA?)

• Overlearning

• Algorithm

What about time signals?

• So far, no information about time has been used
• In the original ICA model, x is a random variable
• What if x is a time signal x(t)?

[Figure: example time signal x(t)]

Time signal x(t)

• Extra information, the order is not random:
  – Autocorrelation
  – Cross-correlation

• More information ⇒ relaxed assumptions: Gaussian data is now OK
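One classical way to exploit this time structure (not covered in the slides; an AMUSE-style sketch, assuming the sources have distinct autocorrelations at the chosen lag) is to whiten the data and then diagonalize a time-lagged covariance matrix:

```python
import numpy as np

def amuse(X, lag=1):
    """Separate time signals using one time-lagged covariance (AMUSE-style sketch).

    X: (n_samples, n_signals) observed mixtures. Works even for Gaussian sources,
    provided their autocorrelations at the chosen lag differ."""
    Xc = X - X.mean(axis=0)
    # Whitening
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ E / np.sqrt(d + 1e-12)
    # Symmetrized lagged covariance of the whitened data
    C_tau = (Z[:-lag].T @ Z[lag:]) / (Z.shape[0] - lag)
    C_tau = 0.5 * (C_tau + C_tau.T)
    # Its eigenvectors give the remaining rotation
    _, W = np.linalg.eigh(C_tau)
    return Z @ W          # estimated sources (up to sign/scale/order)
```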

Extensions…

• Non-linear ICA

• Independent subspace analysis

Further information:

Book:
Independent Component Analysis, A. Hyvärinen, J. Karhunen, E. Oja
Covers everything from novice to expert

Homepage:
http://www.cis.hut.fi/projects/ica/
Tutorials, material, contacts, Matlab code, …

Journal of Machine Learning Research:
http://jmlr.csail.mit.edu/papers/special/ica03.html
Papers and publications

Toolboxes, code:
http://mole.imm.dtu.dk/toolbox/ica/index.html
http://www.bsp.brain.riken.jp/ICALAB/
http://www.cis.hut.fi/projects/ica/book/links.html