
A Comparative Study between ICA (Independent Component Analysis) and PCA (Principal Component Analysis)

Md. Sahidul Islam, Roll No. 08054718

Department of Statistics, University of Rajshahi

ripon.ru.statistics@gmail.com

Department of Statistics, University of Rajshahi-6205

Overview

Motivation of the study

Objective

Definition of ICA

FastICA algorithm

Results of the study

Latent structure

Cluster analysis

Outlier detection

Conclusions


Motivation of the study

o In multivariate statistics, PCA is a well-established technique for latent structure detection, cluster analysis, and outlier detection.

o In many cases ICA performs better than PCA.

o Our motivation in this thesis is to perform latent structure detection, cluster analysis, and outlier detection using ICA and compare the results with those of PCA.

Objectives

o Study the algorithms of ICA.

o Apply ICA to latent structure detection, cluster analysis, and outlier detection.

o Compare its performance with that of PCA.

Independent Component Analysis

The simple “Cocktail Party” Problem

Two independent sources s_1, s_2 (the speakers) are recorded as observations x_1, x_2 (the microphones) through an unknown mixing matrix A:

x_1 = a_{11} s_1 + a_{12} s_2
x_2 = a_{21} s_1 + a_{22} s_2

or, in matrix form,

\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}, i.e. x = As.

ICA: recover the independent sources s from x = As.
PCA: compute uncorrelated projections y = W^T x.
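As an illustration of the difference (not taken from the thesis), both decompositions can be run with scikit-learn, assuming it is available; the mixing matrix A below is arbitrary:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Two independent non-Gaussian sources (uniform), 1000 samples each.
s = rng.uniform(-1, 1, size=(1000, 2))

# Mix them with a known 2x2 mixing matrix A: x = As.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
x = s @ A.T

# PCA finds uncorrelated directions of maximal variance; ICA seeks
# statistically independent, non-Gaussian components.
y_pca = PCA(n_components=2).fit_transform(x)
y_ica = FastICA(n_components=2, random_state=0).fit_transform(x)
```

On data like this, the ICA components match the original sources up to sign and order, while the PCA components remain mixtures of them.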


Non-Gaussianity indicates independence

Central limit theorem: the distribution of a sum of independent random variables tends toward a Gaussian distribution.

Observed signal = a_1 s_1 + a_2 s_2 + … + a_n s_n

Each source s_i is non-Gaussian, but their weighted sum is pulled toward the Gaussian.
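This drift toward the Gaussian is easy to check numerically; a minimal sketch, assuming NumPy and SciPy are available, using excess kurtosis as the non-Gaussianity measure:

```python
import numpy as np
from scipy.stats import kurtosis  # excess kurtosis: 0 for a Gaussian

rng = np.random.default_rng(1)

# A single uniform variable is sub-Gaussian: excess kurtosis is -1.2.
u = rng.uniform(-1, 1, size=100_000)

# A sum of 30 independent uniforms is far closer to Gaussian,
# illustrating the central limit theorem.
mix = rng.uniform(-1, 1, size=(100_000, 30)).sum(axis=1)

print(kurtosis(u))    # close to -1.2 (sub-Gaussian)
print(kurtosis(mix))  # close to 0 (near-Gaussian)
```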


Non-Gaussianity estimates the independent components

Consider y = w^T x = w^T A s = z^T s, where z = A^T w.

y is a linear combination of the s_i; by the central limit theorem, z^T s is therefore more Gaussian than any individual s_i.

z^T s is least Gaussian when it equals one of the s_i, i.e. when only one entry of z is nonzero.

In that case w^T x = z^T s is an independent component, so maximizing the non-Gaussianity of w^T x yields one of the independent components.

FastICA algorithm

Iteration procedure for maximizing non-Gaussianity:

Step 1: Choose an initial weight vector w.

Step 2: Let w+ = E[x g(w^T x)] − E[g′(w^T x)] w, where g is the derivative of a non-quadratic contrast function.

Step 3: Let w = w+ / ‖w+‖.

Step 4: If not converged, go back to Step 2.
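The iteration above can be sketched in NumPy, assuming the data are already centered and whitened and taking g = tanh (one common non-quadratic choice); this is an illustrative one-unit version, not the thesis's code:

```python
import numpy as np

def fastica_one_unit(x, max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA: a weight vector w maximizing the
    non-Gaussianity of w^T x.  x is (n_features, n_samples),
    assumed already centered and whitened."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[0])      # Step 1: initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ x                           # projections w^T x
        g = np.tanh(wx)                      # g(w^T x)
        g_prime = 1.0 - g ** 2               # g'(w^T x)
        # Step 2: w+ = E[x g(w^T x)] - E[g'(w^T x)] w
        w_new = (x * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)       # Step 3: normalize
        # Step 4: converged when w and w+ point the same way (up to sign)
        if abs(abs(w_new @ w) - 1.0) < tol:
            return w_new
        w = w_new
    return w
```

Sign and order of the recovered components are indeterminate, so any comparison with the true sources must allow a sign flip.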


Results and Discussions

Latent structure detection


Simulated dataset -1

Figure: Matrix plot of the original sources: 10 uniformly distributed variables.


Simulated dataset -1

Figure: (a) Matrix plot of 10 principal components. (b) Matrix plot of source variables.


Simulated dataset -1

Figure: (a) Matrix plot of 10 independent components. (b) Matrix plot of source variables


Simulated dataset-2

Simulated dataset-2 consists of 5 variables drawn from the Laplace (super-Gaussian), uniform (sub-Gaussian), binomial, multinomial, and normal distributions, each with 10,000 observations.

Figure: Matrix plot of the original sources: 5 variables, each from a different distribution.
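A dataset of this shape can be reconstructed in NumPy; the distribution parameters below are assumptions, since the thesis does not state them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Five independent sources, one per named distribution; the
# parameters are illustrative, not the thesis's exact choices.
S = np.column_stack([
    rng.laplace(0.0, 1.0, n),                  # Laplace (super-Gaussian)
    rng.uniform(-1.0, 1.0, n),                 # uniform (sub-Gaussian)
    rng.binomial(10, 0.5, n),                  # binomial
    rng.choice(4, n, p=[0.1, 0.2, 0.3, 0.4]),  # multinomial (categorical)
    rng.normal(0.0, 1.0, n),                   # normal
])

# Mix with a random square matrix to obtain the observed data X = S A^T.
A = rng.normal(size=(5, 5))
X = S @ A.T
```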


Simulated dataset-2

Figure: (Left) Matrix plot of principal components. (Right) Original sources: 5 variables, each from a different distribution.

Simulated dataset-2

Figure: (Left) Matrix plot of independent components. (Right) Original sources: 5 variables, each from a different distribution.

Cluster Analysis


Australian Crabs dataset

The first real data set used for clustering is the Australian crabs data set: 200 rows and 8 columns, including 5 morphological measurements (frontal lobe size, rear width, carapace length, carapace width, body depth). The data cover two species of the genus Leptograpsus, each with both sexes (male, female); there are 50 specimens of each sex of each species, collected on site at Fremantle, Western Australia (N. A. Campbell et al., 1974).

Fisher Iris dataset

The second real data set is Fisher's famous Iris data set, which reports four characteristics (sepal length, sepal width, petal length, petal width) of three species (setosa, versicolor, virginica) of Iris flower.
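One way the comparison could be run on this data set, sketched with scikit-learn (the thesis's exact procedure is not reproduced here):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FastICA
from sklearn.metrics import adjusted_rand_score

# Fisher's iris data: 150 flowers, 4 measurements, 3 species.
X, species = load_iris(return_X_y=True)

# Reduce to 2 components with each method, cluster with k-means,
# and compare the clusters against the true species labels.
results = {}
for name, model in [("PCA", PCA(n_components=2)),
                    ("ICA", FastICA(n_components=2, random_state=0))]:
    z = model.fit_transform(X)
    pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(z)
    results[name] = adjusted_rand_score(species, pred)

print(results)  # agreement with the true species, per method
```

The adjusted Rand index is 1 for perfect agreement with the species labels and near 0 for random cluster assignments.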

Outlier detection

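The thesis's exact scoring rule is not reproduced here; one common ICA-based approach flags observations whose independent-component scores are extreme. A sketch on synthetic data with a planted outlier, assuming scikit-learn:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Synthetic data: 200 observations on 3 variables built from
# heavy-tailed sources, with one gross outlier planted at row 0.
S = rng.laplace(size=(200, 3))
X = S @ rng.normal(size=(3, 3))
X[0] += 25.0                          # the planted outlier

# Score each observation by its largest absolute independent-
# component value; extreme scores mark candidate outliers.
ica = FastICA(n_components=3, random_state=0)
Z = ica.fit_transform(X)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
score = np.abs(Z).max(axis=1)
outliers = np.where(score > 4.0)[0]
```

The cutoff of 4 standardized units is an arbitrary illustrative choice; a robust rule would use median/MAD-based standardization instead.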

Scottish hill racing dataset

The data give the record winning times for 35 hill races in Scotland (Atkinson, 1986). The purpose of that study was to investigate how the record time relates to the characteristics of the 35 races.

Epilepsy dataset

Thall and Vail reported data from a clinical trial of 59 patients with epilepsy, 31 of whom were randomized to receive the anti-epilepsy drug Progabide and 28 to receive a placebo.

Stackloss data

This data set covers 21 days of operation of a plant oxidizing ammonia as a stage in the production of nitric acid. The response, called stack loss, is the percentage of ammonia that escapes unabsorbed from the plant. There are three explanatory variables and one response variable in the dataset.

Education expenditure dataset

These data are used by Chatterjee, Hadi, and Price as an example of heteroscedasticity. The data give the education expenditures of U.S. states as projected in 1975.

Conclusions

If the subject domain supports the assumption of independent, non-Gaussian source variables, we recommend using ICA in place of PCA for latent structure detection, clustering, and outlier detection.

Future Research

The following are areas for future study:

o Use kernel ICA for shape study, clustering, and outlier detection.

o Separation of nonlinear mixtures.

o Data mining (sometimes called data or knowledge discovery) is the most recent multivariate technique for extracting information from a data set and transforming it into an understandable structure for further use. Text data mining or medical data mining using ICA would be future research.

Thank you
