Tensor decomposition based unsupervised feature extraction applied to matrix products for multiview...

Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing. Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan. PLoS ONE 12(8): e0183933. DOI: 10.1371/journal.pone.0183933


Page 1: Tensor decomposition based unsupervised feature extraction applied to matrix products  for multiview data processing

Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing

Y-h. Taguchi

Department of Physics, Chuo University, Tokyo, Japan.

PLoS ONE 12(8): e0183933. DOI: 10.1371/journal.pone.0183933

Page 2

What's typical in Bioinformatics?

Few samples (a handful) but a huge number of variables (= genes, ~10^4) → a typical "large p, small n" problem

Usual statistical analyses are difficult to apply

e.g. small samples → deep learning ×; "large p, small n" → variable selection by sparse modeling (lasso) ×

Approaches specific to bioinformatics are required

Page 3

Purpose: multiview data analysis

[Figure: two views sharing the persons mode: a persons × features matrix and a persons × shoppings matrix (example labels: features A, B, D, M; persons β, δ, μ; shoppings 1, 3, 4).]

Page 4

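Matrix → tensor: the matrices x_ij and x_il are combined into the tensor x_ijl = x_ij × x_il, to which tensor decomposition is applied:

$$x_{ijl} = x_{ij}\, x_{il} \simeq \sum_{k_1,k_2,k_3} G(k_1,k_2,k_3)\, x_{ik_1}\, x_{jk_2}\, x_{lk_3}$$

i: persons, j: features, l: shoppings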
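The construction and decomposition above can be sketched in Python with NumPy. This is a minimal sketch, not the author's code: it assumes the two views are dense NumPy arrays sharing the person mode i, and it uses higher-order singular value decomposition (HOSVD), one common way to compute a decomposition of this form. The function names `matrix_product_tensor`, `unfold`, `hosvd` and `reconstruct` are hypothetical.

```python
import numpy as np


def matrix_product_tensor(x_ij, x_il):
    """Form x_ijl = x_ij * x_il for two views that share the first mode i."""
    return np.einsum("ij,il->ijl", x_ij, x_il)


def unfold(t, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the others."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def hosvd(t):
    """HOSVD: left singular vectors of each unfolding, then the core tensor G."""
    factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0]
               for m in range(t.ndim)]
    core = t
    for m, u in enumerate(factors):
        # Project mode m onto its singular vectors: core <- core x_m U^T
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, m)), 0, m)
    return core, factors


def reconstruct(core, factors):
    """x_ijl ~ sum over k1, k2, k3 of G(k1,k2,k3) x_ik1 x_jk2 x_lk3."""
    u1, u2, u3 = factors
    return np.einsum("abc,ia,jb,lc->ijl", core, u1, u2, u3)


rng = np.random.default_rng(0)
x_ij = rng.normal(size=(5, 4))   # persons x features (toy data)
x_il = rng.normal(size=(5, 3))   # persons x shoppings (toy data)
x_ijl = matrix_product_tensor(x_ij, x_il)
G, (x_ik1, x_jk2, x_lk3) = hosvd(x_ijl)
print(x_ijl.shape, G.shape)      # (5, 4, 3) (5, 4, 3)
print(np.allclose(reconstruct(G, [x_ik1, x_jk2, x_lk3]), x_ijl))  # True
```

With all singular vectors kept the reconstruction is exact; keeping only the first few columns of each factor (and the matching block of G) gives the approximate (≃) version.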

Page 5

Demonstration using synthetic data set

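[Figure: two matrices, each with 50 rows and 1000 columns; panel labels read "+20% noise", "100% noise" and "No correlations".]

Their product forms a 50 × 1000 × 1000 tensor, to which tensor decomposition is applied.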

Page 6

[Figure: vectors obtained by the decomposition: x_ik1 (persons, 1 ≤ i ≤ 50) for k1 = 1, 2, 3; x_jk2 (features, 1 ≤ j ≤ 1000) for k2 = 1, 2; x_lk3 (shoppings, 1 ≤ l ≤ 1000) for k3 = 1, 2.]

Page 7

Advantages as multi-view data analysis tools

・No weights are required to integrate multiple views.
・Completely unsupervised learning (no model building based on prior knowledge).
・Smaller computational resources, because the method is linear.

Disadvantages

・Tends to require more memory. Solution: summing over i, Σ_i x_ij x_il, yields a j × l matrix that can be converted back (explanation omitted).
・If the views share no feature or sample, the product becomes a four-mode tensor.
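The memory saving named under the disadvantages comes from never forming the three-mode tensor: summing x_ij x_il over i is simply the matrix product of the transposed first view with the second view. A minimal NumPy sketch, assuming dense float64 arrays; `summed_product` is a hypothetical helper, and the step that converts the j × l matrix back is not reproduced here. For the 50 × 1000 × 1000 synthetic tensor this means 5 × 10^7 values (about 400 MB in float64) versus 10^6 values (about 8 MB) for the j × l matrix.

```python
import numpy as np


def summed_product(x_ij, x_il):
    """Return x_jl = sum_i x_ij * x_il without building the i x j x l tensor."""
    return x_ij.T @ x_il


rng = np.random.default_rng(1)
x_ij = rng.normal(size=(50, 100))   # samples x features, view 1 (toy data)
x_il = rng.normal(size=(50, 80))    # samples x features, view 2 (toy data)

x_jl = summed_product(x_ij, x_il)
# Same result as building the full tensor and summing over i (feasible only when small)
full = np.einsum("ij,il->ijl", x_ij, x_il)
print(x_jl.shape, np.allclose(x_jl, full.sum(axis=0)))  # (100, 80) True
print(full.nbytes, x_jl.nbytes)                         # 3200000 64000
```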

Page 8

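Feature extraction: real data are not separated so clearly.

・Assume x_ik follows a Gaussian distribution.
・Detect outliers: compute P-values with the χ² distribution,

$$P_i = P_{\chi^2}\left[ > \sum_k \left(\frac{x_{ik}}{\sigma_k}\right)^2 \right]$$

where P_{χ²}[> x] is the probability that a χ²-distributed variable exceeds x and σ_k is the standard deviation of x_ik, then select i with Benjamini-Hochberg corrected P < 0.01.

[Figure: distribution P(p) of the P-values p on [0, 1] (axis labels: P(p), 1 − p).]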
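The outlier detection above can be sketched as follows. This is a hedged illustration, not the author's implementation: it assumes the statistic for item i sums (x_ik/σ_k)² over the chosen vectors k, takes σ_k as the sample standard deviation of each vector, and uses one degree of freedom per vector. The χ² tail probability uses closed forms valid for integer degrees of freedom, so only the standard library and NumPy are needed; `chi2_sf`, `benjamini_hochberg` and `select_outliers` are hypothetical names.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """P[X > x] for X ~ chi-squared with integer df, using closed forms."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{n=0}^{df/2-1} (x/2)^n / n!
        term = total = 1.0
        for n in range(1, df // 2):
            term *= h / n
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_n x^(n-1) / (1*3*...*(2n-1))
    term, total = 1.0, 0.0
    for n in range(1, (df - 1) // 2 + 1):
        if n > 1:
            term *= x / (2 * n - 1)
        total += term
    return math.erfc(math.sqrt(h)) + math.sqrt(2 * x / math.pi) * math.exp(-h) * total


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Adjusted values must be non-decreasing in rank: running minimum from the top
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def select_outliers(u, threshold=0.01):
    """u: (n_items, n_vectors) vectors x_ik; returns selected indices and adjusted P."""
    sigma = u.std(axis=0)                     # sigma_k, one per vector (assumption)
    stat = ((u / sigma) ** 2).sum(axis=1)     # sum_k (x_ik / sigma_k)^2
    p = np.array([chi2_sf(s, u.shape[1]) for s in stat])
    q = benjamini_hochberg(p)
    return np.flatnonzero(q < threshold), q


rng = np.random.default_rng(2)
u = rng.normal(size=(1000, 2))
u[:5] = 10.0          # five planted outliers (toy data)
selected, q = select_outliers(u)
print(selected)       # the planted rows 0-4 should be among the selected indices
```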

Page 9

Applications: multi-omics data

[Figure: mRNA and miRNA expression in samples 1–5, divided into group A and group B; labels "active", "expression", "interaction".]

x_ij × x_il, with i: 161 samples, j: 13,393 mRNAs, l: 755 miRNAs (8 groups)
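A quick back-of-the-envelope check shows why the memory issue listed among the disadvantages matters at this scale. The sketch assumes dense storage at 8 bytes per value (float64) and compares the full tensor with the j × l matrix obtained by summing over i; the byte counts are plain arithmetic, not measurements from the study.

```python
BYTES_PER_VALUE = 8  # assumption: dense float64 storage

n_samples, n_mrna, n_mirna = 161, 13393, 755

full_tensor = n_samples * n_mrna * n_mirna * BYTES_PER_VALUE  # x_ijl
summed = n_mrna * n_mirna * BYTES_PER_VALUE                   # sum over i: j x l matrix

print(f"x_ijl: {full_tensor / 1e9:.1f} GB, j x l matrix: {summed / 1e6:.1f} MB")
# x_ijl: 13.0 GB, j x l matrix: 80.9 MB
```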

Page 10

Selection of x_ik1 distinct between symptoms

[Figure: x_ik1 for k1 = 1, 2, 3, 4, 5, with P-values.]

x_ik1 with 1 ≤ k1 ≤ 5 are symptom dependent.

Page 11

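[Figure: core tensor G(k1,k2,k3) for 1 ≤ k1, k2, k3 ≤ 5 (k1: sample, k2: mRNA, k3: miRNA), shaded from larger to smaller G.]

x_jk2 with 1 ≤ k2 ≤ 5 and x_lk3 with 1 ≤ k3 ≤ 2 are selected based on G, then:

・Assume a Gaussian distribution.
・Detect outliers: P-values by the χ² distribution, Benjamini-Hochberg corrected P < 0.01.

Result: 7 of 755 miRNAs and 427 of 13,393 mRNAs selected (biological validation omitted).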
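One way to read the core tensor is to rank its entries G(k1,k2,k3) by absolute value among the symptom-dependent k1, which points to the mRNA (k2) and miRNA (k3) vectors most strongly linked to them. This is a hypothetical helper for illustration (`top_core_entries`, 0-based indices, toy core values); the slides do not state the exact rule used to choose 1 ≤ k2 ≤ 5 and 1 ≤ k3 ≤ 2.

```python
import numpy as np


def top_core_entries(core, k1_set, n_top=5):
    """Rank core entries G(k1, k2, k3) with k1 in k1_set by |G| (0-based indices)."""
    sub = np.abs(core[list(k1_set)])                   # restrict to the chosen k1
    flat = np.argsort(sub, axis=None)[::-1][:n_top]    # largest |G| first
    rows, k2s, k3s = np.unravel_index(flat, sub.shape)
    return [(int(k1_set[a]), int(b), int(c), float(sub[a, b, c]))
            for a, b, c in zip(rows, k2s, k3s)]


core = np.zeros((5, 5, 5))
core[1, 3, 0] = 5.0
core[0, 2, 1] = -7.0
core[4, 0, 0] = 9.0   # large, but k1 = 4 is not in the chosen set below
print(top_core_entries(core, [0, 1], n_top=2))
# [(0, 2, 1, 7.0), (1, 3, 0, 5.0)]
```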

Page 12

Summary

・For feature selection in multi-view data, I propose applying tensor decomposition to a tensor generated by a product of matrices, then selecting features whose BH-corrected P-values, computed by assuming a χ² distribution for a mode, are < 0.01.

・For the synthetic data set, apparently uncorrelated variables embedded in noise were identified as correlated and decomposed into the original orthogonal vectors.

・For the multi-omics data set, a few (a few %) inter-correlated and biologically reasonable miRNAs and mRNAs were identified among a huge number of mRNAs and miRNAs.

Page 13

My presentation at GIW2017: GIW 7 - RNA Bioinformatics, 2nd Nov., morning (ca. 10 AM)

at Adonis (1F)

Tensor decomposition-based unsupervised feature extraction identified the universal nature of sequence-non-specific off-target regulation of mRNA mediated by microRNA transfection