
Multivariate Variable Selection: Beamforming-based Approach

Jian Zhang, University of Kent, Canterbury, UK

14/02/2018

High-Dimensional Statistics and Complex Network Workshop, IMS, NUS, 5-16/02/2018, Singapore

High Dimensional Data and Complex Networks Multivariate Variable Selection: Beamforming-based Approach


Acknowledgements

This talk is partially based on joint work with

- Chao Liu, Elaheh Oftadeh and Hui Ding at the University of Kent

- Gary Green at the York Neuroimaging Centre

- Li Su at the Cambridge Clinical Medical School


Outline

- Sparse multivariate regression

- Multivariate variable selection

- Applications

- Principal variable analysis

- Estimation of the covariance matrix

- Anti-cancer drug data

- Inverse imaging

- Face-perception data

- Simulated data

- Theory

- Extension to multivariate additive models

- Anti-cancer drug data revisited

- Conclusion

- References


Sparse multivariate regression

Sparse multivariate regression with random effects:

Responses Y_j, 1 ≤ j ≤ J, are regressed on the same X = (x_1, ..., x_p) of p covariates with

Y_j − Ȳ = Σ_{k=1}^p x_k β_{kj} + ε_j, 1 ≤ j ≤ J,

where Y_j − Ȳ, each x_k β_{kj} and ε_j are n × 1, and

- the β_{kj}'s are random,

- p ≫ n and J,

- most covariates k are non-active, in the sense that var(β_{kj}) = 0 for all j.

Zhang and Liu (2014, 2015); Zhang (2015); Zhang and Su (2015); Zhang and Oftadeh (2016); Ding, Zhang and Zhang (2018).
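As a concrete illustration, the random-effects model above can be simulated directly. This is a minimal sketch; the sizes n, p, J, the active set and the variance scales are assumed for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): far more covariates than observations, many responses.
n, p, J = 40, 500, 100
active = [3, 17, 42]                       # the (unknown) active set: var(beta_kj) > 0 only here

X = rng.standard_normal((n, p))            # common design, columns x_k
B = np.zeros((p, J))                       # coefficient matrix; row k holds beta_k1, ..., beta_kJ
for k in active:
    B[k] = rng.normal(scale=2.0, size=J)   # random coefficients on the active covariates
E = rng.normal(scale=0.5, size=(n, J))     # noise columns eps_j

Y = X @ B + E                              # column j: Y_j = sum_k x_k beta_kj + eps_j
Yc = Y - Y.mean(axis=1, keepdims=True)     # centre by the row means (the Y-bar term)
```

Only 3 of the 500 coefficient rows are nonzero, which is the sparsity assumption var(β_{kj}) = 0 for all j on the non-active covariates.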


Multivariate variable selection

Write the model for (Y, X) in matrix form:

Y − Ȳ1_J^T = XB + E,

where B = (β_1, ..., β_J) is an unknown p × J coefficient matrix for the p covariates and E = (ε_1, ..., ε_J).

Take advantage of the covariance structure below. Assume that, given X, the random coefficient matrix B is uncorrelated with ε_j. Then the covariance is

C = cov(Y_j) = X cov(β_j) X^T + cov(ε_j)
  = X_{ν_0} cov(β_{ν_0 j}) X_{ν_0}^T + cov(ε_j).

The aim is to identify ν_0, estimate β_{ν_0 j} and predict Y_j.
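This covariance decomposition can be checked numerically. The sketch below (sizes, active set and variance scales all assumed for illustration) compares the sample covariance of the columns of Y against X_{ν_0} cov(β_{ν_0 j}) X_{ν_0}^T + cov(ε_j):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed sizes: J large so the sample covariance over the columns settles.
n, p, J = 30, 200, 5000
nu0 = [2, 7, 11]                          # active set nu_0
sigma_b, sigma_e = 1.5, 0.5               # sd of beta_kj and of the noise (assumed)

X = rng.standard_normal((n, p))
X_a = X[:, nu0]                           # X_{nu_0}
B_a = rng.normal(scale=sigma_b, size=(len(nu0), J))
Y = X_a @ B_a + rng.normal(scale=sigma_e, size=(n, J))

C_hat = np.cov(Y)                         # n x n sample covariance of the columns Y_j
C_pop = sigma_b**2 * (X_a @ X_a.T) + sigma_e**2 * np.eye(n)   # X cov(beta) X^T + cov(eps)

rel_err = np.linalg.norm(C_hat - C_pop) / np.linalg.norm(C_pop)
```

With J = 5000 columns, the relative Frobenius error between the two matrices is small, reflecting that C depends on the covariates only through the active set.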


Applications

Anti-cancer drug study: Cancer starts when gene changes make some cells begin to grow and multiply too much. Anti-cancer drugs attempt to reduce abnormal cell growth. The IC50 value is the concentration of a drug that reduces a certain biochemical activity, such as cell multiplication, to 50 percent of its value in the absence of the inhibitor (drug).

Figure: https://www.kcl.ac.uk/


Applications

Anti-cancer drug study: Consider various cancer cell lines and a list of drugs. Suppose that both genome-wide gene expression data and drug sensitivity (IC50) data are available.

One wants to find a set of genes (biomarkers) that account for the variability of drug sensitivities in curing cancers.


Applications

Inverse neuroimaging: Conduct non-invasive neuroimaging with n sensors (or channels) on a human brain.

One wants to reconstruct the neuronal activities associated with some stimuli. The problem can be converted to the estimation of a sparse multivariate regression model by using Maxwell's equations.

Figure: http://en.wikipedia.org/wiki


Existing methods

- min_B (1/(2n)) ‖Y − XB‖_F² + λ[ (1 − α)‖B‖_F² + α Σ_{k=1}^p √(Σ_{j=1}^J β_{kj}²) ].

  — Multivariate group LASSO (MGL), multivariate elastic-net (MENET).

- min_B (1/(2n)) ‖Y − XB‖_F² + λ[ (1 − α)|B| + α Σ_{k=1}^p √(Σ_{j=1}^J β_{kj}²) ].

  — Multivariate LASSO (ML), multivariate group sparse LASSO (MGSL), multivariate regression with covariance estimation (MRCE).

- See Peng et al. (2010), Friedman et al. (2017, R-package glmnet), Rothman et al. (2010), and Li et al. (2015).

- All these methods break down when J tends to infinity.
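For reference, the first objective above translates directly into code. A NumPy sketch (the function name is ours); setting α = 1 recovers the pure group penalty of MGL, and replacing the Frobenius term by the entrywise ℓ1 norm of B gives the second objective. scikit-learn's MultiTaskLasso and MultiTaskElasticNet solve penalized problems of this form.

```python
import numpy as np

def menet_objective(Y, X, B, lam, alpha):
    """Multivariate elastic-net / group-LASSO objective from the slide:
    (1/2n)||Y - XB||_F^2 + lam*[(1-alpha)||B||_F^2 + alpha*sum_k sqrt(sum_j beta_kj^2)]."""
    n = Y.shape[0]
    fit = np.linalg.norm(Y - X @ B, "fro") ** 2 / (2 * n)
    ridge = (1 - alpha) * np.linalg.norm(B, "fro") ** 2
    group = alpha * np.sqrt((B ** 2).sum(axis=1)).sum()   # sum_k sqrt(sum_j beta_kj^2)
    return fit + lam * (ridge + group)
```

The group term couples the J responses: a covariate is either in or out for all responses at once, which is exactly the structure the random-effects formulation exploits.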


Principal variable analysis: Idea

PCA involves a series of filters, tailored to each orthogonal subspace, which aim to find a low-dimensional representation of a high-dimensional dataset, resulting in an orthogonal decomposition of the sample covariance matrix.

- Find a_1 = argmax_a var(a^T Y).

- Find a_k = argmax_{a^T a_i = 0, 1 ≤ i ≤ k−1} var(a^T Y).

- Select the tuning constant k following a rule.

Beamforming is a covariate-assisted PCA, where the subspaces (not necessarily orthogonal) are specified by covariates: one looks for a set of covariates ν with |ν| < n such that

Ĉ = ĉov(Y_j) ≈ Σ_{k∈ν} γ_k x_k x_k^T + σ² I_n,

where γ_k is determined by the so-called power of the k-th covariate. See van Veen et al. (1996).


Principal variable analysis: Power of a covariate

The power of a covariate is defined via minimising the interference from the other covariates and from noise. For this purpose, assume that C = cov(Y_j) is independent of j and is estimated by Ĉ. Note that, under the constraint w^T x_k = 1,

cov(w^T Y_j) = σ_k + w^T cov( Σ_{i≠k} x_i β_{ij} + ε_j ) w + 2 cov( β_{kj}, w^T ( Σ_{i≠k} x_i β_{ij} + ε_j ) ),

where σ_k = var(β_{kj}).


Principal variable analysis: Power of a covariate

Therefore, the power of the k-th covariate is

γ_k = min{ var(w^T Y_j) : w^T x_k = 1 } = (x_k^T C^{-1} x_k)^{-1},

which is estimated by

γ̂_k = (x_k^T Ĉ^{-1} x_k)^{-1}.

The signal-to-noise ratio (SNR) is

γ̂_k / (σ² ŵ^T ŵ), with ŵ = Ĉ^{-1} x_k / (x_k^T Ĉ^{-1} x_k).


Principal variable analysis: Nulled-power

Covariates can be correlated. Consequently, when we investigate a covariate, the other covariates may interfere with our analysis. To address this problem, we null the significant covariates identified in previous steps by adding constraints to each linear filter. Let ϖ and ν be two non-overlapping subsets of the covariates, with sizes m_1 and m respectively. The nulled-power matrix γ(ν|ϖ) can be shown to be

γ_{ν|ϖ} = ŵ^T Ĉ ŵ = e_{ν∪ϖ}^T ( x_{ν∪ϖ}^T Ĉ^{-1} x_{ν∪ϖ} )^{-1} e_{ν∪ϖ},   (1)

where e_{ν∪ϖ}^T = (1^T, 0^T), with 1 and 0 being the m-vector of 1's and the m_1-vector of 0's respectively, and

ŵ = Ĉ^{-1} x_{ν∪ϖ} ( x_{ν∪ϖ}^T Ĉ^{-1} x_{ν∪ϖ} )^{-1} e_{ν∪ϖ}.
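For a single candidate covariate k nulled against a set ϖ, equation (1) reduces to the (1,1) entry of the inverse of the Gram-type matrix. A sketch (the function name is ours):

```python
import numpy as np

def nulled_power(X, C_hat, k, nulled):
    """Equation (1) with nu = {k}: unit gain on x_k, zero gain on the covariates
    in `nulled`. Returns e^T (x^T C^-1 x)^-1 e with e = (1, 0, ..., 0)."""
    Xs = X[:, [k] + list(nulled)]              # columns x_{nu U varpi}, k first
    G = Xs.T @ np.linalg.solve(C_hat, Xs)      # x_{nu U varpi}^T C^-1 x_{nu U varpi}
    return float(np.linalg.inv(G)[0, 0])       # e^T G^-1 e, e the first basis vector
```

Nulling can only increase the attainable minimum variance, since it adds constraints to the filter; this is why nulled powers are compared afresh at every forward step.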


Principal variable analysis: Forward Selection

To find the principal covariates, we null the previously found covariates at each step, as follows.

Step 1 (initialisation): Find k_1 at which the SNR attains its maximum. Set ω = {k_1}.

Step 2 (forward nulling): In iteration m, m ≥ 2, let ω_{m−1} denote the set of covariates identified in the first m − 1 iterations. For any covariate k not in ω_{m−1}, using formula (1), we calculate the nulled predictive power γ_{{k}|ω_{m−1}} as well as an optimal projection direction ŵ. This gives a nulled SNR, SNR({k}|ω_{m−1}), the ratio of the nulled power to the white-noise gain. We then find k_m ∉ ω_{m−1} at which SNR({k}|ω_{m−1}) attains its maximum. If the criteria shown later are satisfied, we add k_m to the previously found covariate set, updating ω_{m−1} and x_{ω_{m−1}} by letting ω_m = {k_m} ∪ ω_{m−1} and x_{ω_m} = (x_{k_m}, x_{ω_{m−1}}).
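The two steps can be sketched as a loop (NumPy; all names are ours). For brevity, stopping is a fixed cap max_steps here; in the actual procedure the early-stopping criterion shown later would replace it:

```python
import numpy as np

def pva_forward(X, C_hat, sigma2=1.0, max_steps=2):
    """Forward nulling sketch: at each step pick the covariate with the largest
    nulled SNR, nulling the covariates already selected."""
    p = X.shape[1]
    selected = []
    for _ in range(max_steps):
        best_k, best_snr = None, -np.inf
        for k in range(p):
            if k in selected:
                continue
            Xs = X[:, [k] + selected]
            G = Xs.T @ np.linalg.solve(C_hat, Xs)       # x^T C^-1 x over {k} U omega
            Ginv = np.linalg.inv(G)
            gamma = Ginv[0, 0]                          # nulled power of k given `selected`
            w = np.linalg.solve(C_hat, Xs) @ Ginv[:, 0] # optimal projection direction
            snr = gamma / (sigma2 * float(w @ w))       # nulled SNR
            if snr > best_snr:
                best_k, best_snr = k, snr
        selected.append(best_k)
    return selected
```

On data whose covariance is driven by two covariates, the loop recovers exactly those two in its first two steps.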


Principal variable analysis: Early stopping rule

Early stopping criterion: After a number of iterations, the nulled SNR values start levelling off, which indicates that the remaining covariates have no predictive power for the response. The hypothesis of no predictive power is accepted if the maximum nulled SNR value, SNR_max, of the upper set falls into the confidence interval

|SNR_max − μ_l| ≤ c_0 σ_l,

where c_0 is a tuning constant. The iteration is terminated when the upper subset is uninformative. Otherwise, we add the covariate that attains the maximum nulled SNR value to the current set of selected covariates ω, and the iteration continues. We set the default value c_0 = 3 for the above tuning constant, corresponding to a confidence level of 99.7%.


Estimation of covariance matrix

Inspired by Ledoit and Wolf (2004), we propose the following estimator of C:

Ĉ_hs = (b_n² / d_n²) μ_n I_n + ((d_n² − b_n²) / d_n²) Ĉ_h,

where

μ_n = ⟨Ĉ_h, I_n⟩,  d_n² = ⟨Ĉ_h − μ_n I_n, Ĉ_h − μ_n I_n⟩,

b̄_n² = (1/J²) Σ_{k=1}^J (1/n) Σ_{i=1}^n Σ_{j=1}^n (y_{ik} y_{jk} − ĉ_{ij})² I(|ĉ_{ij}| > h τ_{nJ}),

b_n² = min{b̄_n², d_n²},

with τ_{nJ} = c_0 √(log(n)/J).
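A NumPy sketch of this estimator (names are ours; we take Ĉ_h to be the plain sample covariance and read ⟨A, B⟩ as the normalized trace inner product tr(AB^T)/n, as in Ledoit and Wolf (2004); both are assumptions):

```python
import numpy as np

def hybrid_shrinkage_cov(Y, c0=3.0, h=1.0):
    """Hybrid shrinkage estimator from the slide:
    C_hs = (b2/d2)*mu*I + ((d2 - b2)/d2)*C_h, with b2 a thresholded estimate of
    the sampling noise in C_h, capped at d2. Y is n x J (columns = replicates)."""
    n, J = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_h = Yc @ Yc.T / J                        # sample covariance, entries c_ij
    mu = np.trace(C_h) / n                     # <C_h, I_n>
    d2 = np.sum((C_h - mu * np.eye(n)) ** 2) / n
    tau = c0 * np.sqrt(np.log(n) / J)          # thresholding level tau_nJ
    mask = np.abs(C_h) > h * tau
    # b2_bar: average squared fluctuation of y_ik*y_jk around c_ij on kept entries
    diffs = (Yc[:, None, :] * Yc[None, :, :] - C_h[:, :, None]) ** 2
    b2_bar = diffs.sum(axis=2)[mask].sum() / (J ** 2 * n)
    b2 = min(b2_bar, d2)
    return (b2 / d2) * mu * np.eye(n) + ((d2 - b2) / d2) * C_h
```

The estimator shrinks the sample covariance toward a scaled identity; the cap b_n² ≤ d_n² keeps the shrinkage weight in [0, 1], and the trace is preserved by construction.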


Anti-cancer drug data

We assessed the performance of PVA on a dataset which was discussed in detail by Garnett et al. (Nature, 2012). The data contain 13321 gene expressions and fifty-percent inhibitory concentration (IC50) values of 131 drugs across 42 cell lines. See Zhang and Oftadeh (2016).


Anti-cancer drug data

[Two correlation heatmaps, panels (a) and (b), colour scale from −1 to 1, over the selected genes: IARS, CLASP1, STAMBPL1, GSTM3, EML1, TRIM34.TRIM6.TRI, DECR1, EP400, TADA2L, RPL39L, FAIM3, C18ORF24, CD1A, CIDEB, TP53, QKI, SNTB1, SEMA4C, NUDT2, RFX2, GPSN2, C21ORF45, COL5A1, RP1.153G14.3, MKL1, FKSG44, KIAA1856, HDGF2, CROCC, WDR76, RPS14, MAP3K6, LY6E, SLCO2B1, NR1D2, RHBDD3, STX7.]

Figure: (a) Gene expression correlation coefficients of selected genes. (b) Response-correlation coefficients of selected genes.


Anti-cancer drug data: Response-network of selected genes

Figure: Network of estimated coefficients of the selected genes.


Anti-cancer drug data

Figure 4 presents a network of the estimated coefficients of the selected genes. The adjacency matrix was based on the correlations between rows of the estimated coefficient matrix. Fisher's z-transformation was used to normalize the pairwise correlations. Then the z-matrix was thresholded at the 1% significance level.
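This construction can be sketched in a few lines (NumPy; the function name, the normal-approximation standard error 1/√(J−3) for Fisher's z, and the hard-coded critical value are our assumptions, 2.576 being the two-sided 1% normal critical value):

```python
import numpy as np

def coefficient_network(B_hat, crit=2.576):
    """Adjacency matrix from correlations between rows of the estimated p x J
    coefficient matrix: Fisher z-transform the pairwise correlations, then
    threshold |z| at `crit` (two-sided 1% level under a normal approximation)."""
    p, J = B_hat.shape
    R = np.corrcoef(B_hat)                       # p x p row correlations
    np.fill_diagonal(R, 0.0)                     # no self-loops
    Z = np.arctanh(np.clip(R, -0.999, 0.999)) * np.sqrt(J - 3)   # Fisher z-scores
    return (np.abs(Z) > crit).astype(int)        # 0/1 adjacency matrix
```

Two genes are connected exactly when their coefficient profiles across the J responses are significantly correlated, which is the "functional" notion of the network described on the next slide.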


Anti-cancer drug data

The network is a functional network in the sense that, while these genes were uncorrelated (i.e. the corresponding columns of the design matrix are uncorrelated), their contributions to the IC50 values were highly correlated.


Anti-cancer drug data

To validate the results against existing facts in cancer biology, we looked at The Human Protein Atlas portal for the 20 most common cancers in the World Cancer Report 2014. For each gene selected, its protein expression/staining level has been calculated for each of these cancers. The staining scores are divided into four categories: high (3), medium (2), low (1) and not detected (0). The mean score of each gene for the different cancers is presented.


Among the 37 selected genes, those with scores larger than 1.5 (the background score) are particularly interesting.
Breast cancer: stambpl1, decr1, faim3, ska1, nudt2, znf391, fksg44, kiaa1856, rps14.
Liver cancer: iars, decr1, nudt2, znf391.
Lung cancer: ska1, znf391, fksg44, rps14, nr1d2.
Prostate cancer: clasp1, decr1, ska1, nudt2, c21orf45, znf391, fksg44, kiaa1856, rhbdd3.
And so on.


Inverse imaging

Conduct a non-invasive imaging with n sensors (or channels) on an object. The imaging process can be approximately described by a spatio-temporal model:

Y(t_j) − Ȳ = ∑_{k=1}^p x_k β_k(t_j) + ε(t_j), 1 ≤ j ≤ J,

where {Y(t_j) : 1 ≤ j ≤ J} are the n-dimensional time series recorded by the sensors, {β_k(t_j) : 1 ≤ j ≤ J}, 1 ≤ k ≤ p, are the latent source magnitudes of interest, x_k, 1 ≤ k ≤ p, are known vectors, called unit-input-gain vectors, and {ε(t_j) : 1 ≤ j ≤ J} are unobserved n-dimensional noise time series in the sensors.
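A small sketch of generating data from this forward model (the dimensions, gain vectors and sinusoidal source waveforms are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, J = 102, 3, 200          # sensors, sources, time points (illustrative)

X = rng.standard_normal((n, p))        # columns are the gain vectors x_k
X /= np.linalg.norm(X, axis=0)         # normalize to unit input gain
t = np.linspace(0.0, 1.0, J)
# p x J latent source magnitudes beta_k(t_j): toy sinusoids
beta = np.vstack([np.sin(2 * np.pi * (k + 1) * t) for k in range(p)])
noise = 0.1 * rng.standard_normal((n, J))
Y = X @ beta + noise                   # n x J sensor time series
print(Y.shape)  # (102, 200)
```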


Face-perception data: single subject with multiple trials (Henson et al., 2010)

I The experiment includes six sessions. Here, we take the first session as an example, which includes 96 trials labeled as Face and 50 labeled as Scrambled Face.

I The MEG data were collected with 102 magnetometers and sampled at a rate of 1100 Hz.


Face-perception data

Figure: Orthogonal plots for the global peak at (−5, 5, 5) cm and a local peak at (−4, −4, 8) cm.

Zhang et al. (2014)


Simulated data

We simulated 50 datasets from Y = XB + ε for various cases of B, which included:

I Strong and weak correlations within and between the rows of B.

I Various values of (n, p, J, a).

I The rows of X were sampled from N_p(0, Σ), where Σ was obtained from the gene expression matrix in the IC50 data.


I Two kinds of covariance matrix estimators C were considered: thresholding, C_h = (c_ij I(|c_ij| ≥ h τ_{nJ})), with h = 0.01, 0.005, 0.001, and shrinkage, C_hs.

I Performance was assessed by:
Sensitivity (survival rate of the true active covariates).
Specificity (survival rate of the true non-active covariates).
Here, a covariate is called active if its regression coefficient on the response has positive variance.

I We compared our method to MGL, MENET, MRCE, ML and MSGL.
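The two performance measures can be computed as follows (a sketch; the index sets `selected` and `active` are illustrative):

```python
def sens_spe(selected, active, p):
    """Sensitivity: fraction of true active covariates selected.
    Specificity: fraction of true non-active covariates not selected."""
    selected, active = set(selected), set(active)
    sens = len(selected & active) / len(active)
    inactive = set(range(p)) - active
    spe = len(inactive - selected) / len(inactive)
    return 100 * sens, 100 * spe

sens, spe = sens_spe(selected=[0, 1, 2, 5], active=[0, 1, 2, 3], p=10)
print(sens, spe)  # 75.0 83.33...
```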


Figure: Simulated data with zero correlations between the rows of B. [Eight panels: (a)-(d) coefficients oscillating around 0, (e)-(h) coefficients separated from 0; each panel plots sensitivity (sen) and specificity (spe), on a 0-100 scale, for the methods sh_o, hs1, hs2, hs3, mgl, menet, mrce, ml and msgl.]


Figure: Simulated data with correlations between the rows of B. [Eight panels: (a)-(d) coefficients oscillating around 0, (e)-(h) coefficients separated from 0; each panel plots sensitivity (sen) and specificity (spe), on a 0-100 scale, for the methods sh_o, hs1, hs2, hs3, mgl, menet, mrce, ml and msgl.]


Figure: Simulated data with low correlations within the rows of B. [Eight panels, one per configuration (n, J, a) ∈ {88, 150} × {20, 34} × {50, 70}; each panel plots sensitivity (sen) and specificity (spe), on a 0-100 scale, for the methods sh_o, hs1, hs2, hs3, mgl, menet, ml and msgl.]


Figure: Simulated data with high correlations within the rows of B. [Eight panels, one per configuration (n, J, a) ∈ {88, 150} × {20, 34} × {50, 70}; each panel plots sensitivity (sen) and specificity (spe), on a 0-100 scale, for the methods sh_o, hs1, hs2, hs3, mgl, menet, ml and msgl.]


Theory

Suppose that for ν₀ = {k₁, ..., k_{p₀}} and for any ν ⊆ ν₀, we can find j[1:m] = {j₁, ..., j_m} ⊆ {1, ..., p₀} such that ν = {k_j : j ∈ j[1:m]}. Let e_{ν/ν₀} be a |ν₀| × |ν| indicator matrix with the (j_l, l)th entry equal to 1, 1 ≤ l ≤ |ν|, and all other entries equal to zero. Using e_{ν/ν₀}, we select sub-columns from x_{ν₀} to form x_ν, namely x_ν = x_{ν₀} e_{ν/ν₀}.

Let A_{ν₀} = C − x_{ν₀} e_{ν₀}^T Σ e_{ν₀} x_{ν₀}^T, the remainder of C after subtracting the term x_{ν₀} e_{ν₀}^T Σ e_{ν₀} x_{ν₀}^T. For any subsets ν and ν₀, define the coherence (i.e., collinearity) matrices between x_ν and x_{ν₀}:

R_{νν} = x_ν^T A_{ν₀}^{−1} x_ν / n,  R_{νν₀} = x_ν^T A_{ν₀}^{−1} x_{ν₀} / n,  R_{ν₀ν₀} = x_{ν₀}^T A_{ν₀}^{−1} x_{ν₀} / n.


(C0). There exists a permutation of y_j, 1 ≤ j ≤ J, such that the resulting sequence has marginal covariance matrix C. The error term ε_j and the p-dimensional regression coefficient β_j are independent of each other.

(C1). There are a constant 0 < r ≤ 1 and a set of active covariates ν₀ of size |ν₀| ≤ rn such that x_{ν₀} is of full column rank and that e_{ν₀}^T Σ e_{ν₀} and A_{ν₀} are invertible.

(C2). For ν₀ and r in Condition (C1), as n tends to infinity, there is a constant 0 ≤ α₀ < 1 such that, uniformly for any set ν ⊆ [1:p] with |ν| ≤ rn,

n^{−α₀} ≤ λ_min(R_{νν}) ≤ λ_max(R_{νν}) = O(n^{α₀}).


(C3). (Irrepresentability) For ν₀ and r in Condition (C1), as n tends to infinity, uniformly for any ν ⊆ [1:p] \ ν₀ with |ν| ≤ rn, (R_{νν} − R_{νν₀} R_{ν₀ν₀}^{−1} R_{ν₀ν})^{−1} = O(n^{α₀}).

(C4). For ν₀ and r in Condition (C1), as n tends to infinity, uniformly for any ν ⊆ [1:p] \ ν₀ with |ν| ≤ rn, x_{ν₀}^T A_{ν₀}^{−2} x_ν = ζ₀ x_{ν₀}^T A_{ν₀}^{−1} x_ν + O(1), where ζ₀ and the O(1) term are independent of ν.

(C5). There exist positive constants κ₁ and τ₁ such that for any u > 0 and 1 ≤ j ≤ J,

max_{1≤i≤n} P(|y_ij| > u) ≤ exp(1 − τ₁ u^{κ₁}),

and max_{1≤i≤n} E|y_{i1}|^{4η₀} < +∞, where η₀ > 1 is a constant.


We assume that there exists a permutation π on {1, ..., J} such that y_{π(j)}, 1 ≤ j ≤ J, are strong mixing. Let F_0^{k₀} and F_k^∞ denote the σ-algebras generated by {y_{π(j)} : 0 ≤ j ≤ k₀} and {y_{π(j)} : j ≥ k}, respectively. Define the mixing coefficient

α(k) = sup_{A ∈ F_0^{k₀}, B ∈ F_k^∞} |P(A)P(B) − P(AB)|.

The mixing coefficient α(k) quantifies the degree of dependence of the process {y_{π(j)}} at lag k. We assume that α(k) decreases exponentially fast as the lag k increases, i.e.,

(C6). There exist positive constants κ₂ and τ₂ such that α(k) ≤ exp(−τ₂ k^{κ₂}).


Theorem 1. Suppose that there exist constants 0 ≤ α₁ ≤ (1 − 3α₀)/2 and c₂ > 0 such that c₂ n^{−α₁} ≤ λ_min(e_{ν₀}^T Σ e_{ν₀}) ≤ λ_max(e_{ν₀}^T Σ e_{ν₀}) = O(1). Let Σ_{ν/ν₀} = (e_{ν/ν₀}^T (e_{ν₀}^T Σ e_{ν₀})^{−1} e_{ν/ν₀})^{−1}, a partial covariance matrix of ν with respect to ν₀. Then, under Conditions (C0)–(C6), as n tends to infinity, we have:

(i) Uniformly for any ν ⊆ ν₀ with |ν| ≤ rn, γ_ν = Σ_{ν/ν₀} + O_p(n^{−1+α₀+2α₁} + n² τ_{nJ}).

(ii) Uniformly for any ν ⊆ [1:p] \ ν₀ with |ν| ≤ rn, γ_ν = O_p(n^{−1+α₀} + n² τ_{nJ}).


Theorem 2. Suppose that Conditions (C0)–(C6) hold and that τ_{nJ} n² = o(1) as both n and J tend to infinity. Then, we have:

(i) Uniformly for a ∈ [1:p] \ ν₀ with a ∉ ν₁ ∪ ν₂ and |ν₁ ∪ ν₂| < rn, the (ν₁ ∪ ν₂)-nulled predictive power of a admits the form

SNR̂_{a|ν₁∪ν₂} = 1/(ζ₀ σ²) + O_p(n^{−2+4α₀+2α₁} + n² τ_{nJ}).

(ii) Uniformly for a ∈ ν₀ \ ν₁ with |ν₁ ∪ ν₂| < rn, the (ν₁ ∪ ν₂)-nulled SNR of covariate a admits the form

SNR̂_{a|ν₁∪ν₂} = [n e_{a/ν₀}^T Σ_{ν₀\ν₁}^{−1} e_{a/ν₀}] / [σ² η₀ e_{a/ν₀}^T Σ_{ν₀\ν₁}^{−1} Φ Σ_{ν₀\ν₁}^{−1} e_{a/ν₀}] (1 + o(1)) + O_p(n² τ_{nJ}),

where Σ_{ν₀\ν₁}^{−1}, Ψ and Φ are defined in the paper.


Let ω_m denote the set of covariates derived from the (SNR-based) PVA. We have the following selection consistency for ω_m.

Theorem 3. Under the conditions of Theorem 2, as both n and J tend to infinity, we have selection consistency in the sense that P(ω_m = ν₀) → 1.


Extension to multivariate additive models

Letting F_k(x_k) = (f_{k1}(x_k), ..., f_{kJ}(x_k)), we consider the following model in matrix form:

Y = μ 1_J^T + F₁(x₁) + · · · + F_p(x_p) + E.  (2)

We assume the following condition on the additive components f_{kj}(·):

(C7) The additive component functions have a bounded support [a, b] and satisfy the Lipschitz inequality

|f_{kj}^{(r)}(z + δ) − f_{kj}^{(r)}(z)| ≤ c₀ |δ|^α

for all z and z + δ ∈ [a, b], where r is a non-negative integer and 0 < α ≤ 1.


Replacing the kth component function by its B-spline approximation, we have

Y = μ 1_J^T + Ψ(x_k) B_k + E*_k,

where E*_k = E + Δ_k + ∑_{t≠k} F_t(x_t) and Δ_k = F_k(x_k) − Ψ(x_k) B_k.


Anti-cancer drug data revisit

Figure: Estimated nonparametric sensitivity functions of the selected genes PEX5, NRXN2, HS2ST1, EIF4G1, CUL4A, PTPN22 and ACN9 to the anti-cancer drug KIN001-135. [Panels (a)-(g); each panel plots the KIN001-135 IC50 against the standardized expression of one gene.]


Conclusion

I We have generalised PCA to the setting of multivariate variable selection.

I We have developed the related theory.

I We have evaluated the proposed method on two real data sets.

I We have conducted a wide range of simulation studies to show the superior performance of our proposal over the existing methods MGL, MENET, MRCE, ML and MSGL.

I We have applied the method to cancer data, identifying a number of biomarkers that have been validated by existing facts in cancer biology. We have applied the method to face-perception imaging data, finding some neuronal sources that explain the sensor data.

I We have generalised our proposal to multivariate additive models.


References

Ding, H., Zhang, J. and Zhang, R. (2018). Nonparametric variable screening for multivariate additive models. Kent Academic Repository (KAR), University of Kent.
Zhang, J. (2015). On nonparametric feature filters in electromagnetic imaging. Journal of Statistical Planning and Inference, 164, 39-53.
Zhang, J. and Liu, C. (2015). On linearly constrained minimum variance beamforming. Journal of Machine Learning Research, 16, 2099-2145.
Zhang, J., Liu, C. and Green, G. (2014). Source localization with neuroimaging data. Biometrics, 70, 121-131.
Zhang, J. and Oftadeh, E. (2016). Multivariate variable selection by means of null-beamforming. Kent Academic Repository (KAR), University of Kent.
Zhang, J. and Su, L. (2015). Temporal autocorrelation-based beamforming with MEG neuroimaging data. Journal of the American Statistical Association, 110, 1375-1388.
