A Selective Review of Sufficient Dimension Reduction
Lexin Li
Department of Statistics, North Carolina State University
Outline
1 General Framework: Sufficient Dimension Reduction; Key Concepts; Estimation Approaches
2 Basic Dimension Reduction Approaches: Sliced Inverse Regression; Other Dimension Reduction Approaches
3 Some Potentially Useful Extensions: Classification; Variable Selection; Complex Responses; Complex Predictors; Nonlinear Sufficient Dimension Reduction
4 Conclusion and Discussion
General Framework
Sufficient dimension reduction
Basic regression (supervised learning) setup:
study the conditional distribution of Y ∈ ℝ^r given X ∈ ℝ^p
find a p × d matrix γ = (γ1, …, γd), d ≤ p, such that
Y ⫫ X | γᵀX ⇔ Y|X = Y|γᵀX ⇔ X|(γᵀX, Y) = X|γᵀX
replace X with γᵀX = (γ1ᵀX, …, γdᵀX) without losing any regression information on Y|X
(γ1ᵀX, …, γdᵀX) are called the sufficient predictors
γ is not unique: only its column span Span(γ) is identified
Key concepts
Central subspace:
Y|X = Y|γᵀX ⇒ Span(γ) is a dimension reduction subspace (DRS); the central subspace S_{Y|X} = ∩ DRS, the intersection of all DRSs (itself a DRS under mild conditions)
Examples:
Y = f(γ1ᵀX) + σε
Y = f1(γ1ᵀX) + f2(γ2ᵀX) × ε
logit{P(Y = 1|X)} = γ1ᵀX, where logit(p) = log{p / (1 − p)}
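For concreteness, the central subspaces implied by these three models (a standard fact, assuming the link functions are non-degenerate and ε ⫫ X) are:

```latex
\begin{align*}
Y = f(\gamma_1^T X) + \sigma\varepsilon
  &\;\Rightarrow\; \mathcal{S}_{Y|X} = \operatorname{Span}(\gamma_1),\\
Y = f_1(\gamma_1^T X) + f_2(\gamma_2^T X)\,\varepsilon
  &\;\Rightarrow\; \mathcal{S}_{Y|X} = \operatorname{Span}(\gamma_1, \gamma_2),\\
\operatorname{logit}\{P(Y = 1 \mid X)\} = \gamma_1^T X
  &\;\Rightarrow\; \mathcal{S}_{Y|X} = \operatorname{Span}(\gamma_1).
\end{align*}
```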
Key concepts (continued)
Central mean subspace:
E(Y|X) = E(Y|γᵀX) ⇒ Span(γ) is a mean DRS; the central mean subspace S_{E(Y|X)} is the intersection of all mean DRSs
For many models, S_{Y|X} = S_{E(Y|X)}
Examples:
Y = f1(γ1ᵀX) + … + fd(γdᵀX) + ε
Y = f1(γ1ᵀX) + f2(γ2ᵀX) × ε
in the second example, assuming E(ε) = 0 and ε ⫫ X, S_{E(Y|X)} = Span(γ1) while S_{Y|X} = Span(γ1, γ2), so the two subspaces can differ
Estimation approaches
Inverse moment based:
sliced inverse regression (Li, 1991) and many variants: E(X|Y)
sliced average variance estimation (Cook and Weisberg, 1991): Cov(X|Y)
directional regression (Li and Wang, 2007)
Kernel smoothing based:
minimum average variance estimation (Xia et al., 2002): estimation of the derivative of E(Y|X)
variants: Xia (2007), Wang and Xia (2008)
Others (not a complete list):
ordinary least squares (Li and Duan, 1991)
reproducing kernel Hilbert space (Fukumizu, Bach and Jordan, 2004, 2009)
contour based (Li, Zha and Chiaromonte, 2005; Li, Artemiou and Li, 2010)
Estimation approaches (continued)
Comparison:
Inverse moment based:
very easy and fast to compute
requires a relatively large sample size
requires a condition on the distribution of X (the linearity condition)
Kernel smoothing based:
works well for small sample sizes
requires no condition on X
requires kernel smoothing
relatively slow to compute
Basic Dimension Reduction Approaches
Sliced inverse regression
Foundation: under the linearity condition,
Σx⁻¹ E{X − E(X) | Y} ∈ S_{Y|X}
Spectral decomposition formulation:
Σx|y γj = λj Σx γj, j = 1, …, p,
where Σx|y = Cov[E{X − E(X) | Y}] and Σx = Cov(X).
obtain the first d eigenvectors (γ1, …, γd) corresponding to the largest d positive eigenvalues λ1 ≥ … ≥ λd > 0; then Span(γ1, …, γd) ⊆ S_{Y|X}
assumes Y is categorical, or slices a continuous Y, to estimate E(X|Y)
asymptotic test / permutation test / BIC to determine d
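To make the algorithm concrete, here is a minimal Python sketch of SIR as formulated above; the function name, the equal-count slicing of the sorted response, and the simulated single-index example are illustrative choices, not code from Li (1991).

```python
import numpy as np

def sir(X, y, d, n_slices=10):
    """Sliced inverse regression: a p x d basis estimate of S_{Y|X}."""
    n, p = X.shape
    # standardize X: Z = Sigma_x^{-1/2} {X - E(X)}
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # slice the sorted response and form Cov[E(Z | Y in slice)]
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)                # slice mean of Z
        M += (len(idx) / n) * np.outer(m, m)   # weighted by slice size
    # leading eigenvectors span the (standardized) central subspace
    w, v = np.linalg.eigh(M)
    return inv_sqrt @ v[:, ::-1][:, :d]        # back to the original scale

# toy single-index model: S_{Y|X} = Span(gamma)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
gamma = np.array([1.0, -1.0, 0, 0, 0, 0]) / np.sqrt(2)
y = np.sin(X @ gamma) + 0.1 * rng.normal(size=500)
print(sir(X, y, d=1).ravel())   # approx. proportional to gamma, up to sign
```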
Sliced inverse regression (continued)
The linearity condition:
E(X | γᵀX) is a linear function of γᵀX, for γ a basis of S_{Y|X}
holds when X is elliptically symmetric, in particular when X is normally distributed
approximately true as p → ∞ with a fixed d
involves no assumption on Y or Y|X, so the approach stays nonparametric / model-free
Some important variants:
canonical correlation analysis: maximize Corr²{h(Y), bᵀX} over h(·) and b
letting β ≡ E[h(Y) Σx⁻¹ E{X − E(X) | Y}] = Σx⁻¹ Cov{h(Y), X}, then β ∈ S_{Y|X}
Beyond SIR:
sliced average variance estimation: uses the 2nd inverse moment; exhaustive (can recover all of S_{Y|X})
directional regression: uses the 1st and 2nd inverse moments; also exhaustive
Other dimension reduction approaches
Principal components analysis:
spectral decomposition of Σx
unsupervised; linear combinations of X
Partial least squares:
at the population level, PLS = OLS; under the linearity condition, PLS estimates S_{E(Y|X)} (Li, Cook and Tsai, 2007)
supervised; linear combinations of X
Multidimensional scaling and nonlinear dimension reduction:
unsupervised; nonlinear combinations of X
Independent components analysis:
unsupervised; linear combinations of X
Some Potentially Useful Extensions
Classification
Discriminant analysis:
directly applicable to categorical Y
at the population level, SIR ⇔ LDA ⇔ Fisher's discriminant analysis; SAVE ⇔ QDA
SIR/SAVE produce sufficient predictors rather than a classification rule; LDA/QDA produce an estimate of P(Y = g | X) and a classification rule
SIR/SAVE require the linearity condition (normality) on X; LDA/QDA require the normality assumption on X|Y
Why useful:
of course ...
Variable selection
Basic ideas:
rewrite the SDR estimation as a least squares problem, then apply an L1-type penalty (adaptive group lasso, SCAD); see the sketch after this list
foundation: Y ⫫ X_A | X_I ⇔ the rows of γ corresponding to the predictors in A equal 0
differs from most model-based variable selection approaches in that no parametric model on Y|X is imposed
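A minimal illustrative sketch of the penalized idea, not the cited estimators: it uses the simplest least squares SDR direction, the OLS slope, which falls in S_{Y|X} under the linearity condition (Li and Duan, 1991), with a plain lasso standing in for the adaptive group lasso or SCAD.

```python
# Sparse estimation of one central-subspace direction, with no
# parametric model for Y|X; alpha is an illustrative tuning choice.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))                # normal X: linearity holds
gamma = np.zeros(10)
gamma[:2] = [1.0, -1.0]                       # only X_1, X_2 are relevant
y = (X @ gamma) ** 3 + rng.normal(size=400)   # nonlinear link, same span

fit = Lasso(alpha=1.0).fit(X, y)
print(np.round(fit.coef_, 2))   # nonzero entries select the active predictors
```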
Consistency in selection:
fixed p, n → ∞ (Ni, Cook and Tsai, 2005; Bondell and Li, 2009)
diverging p → ∞, n → ∞, p < n (Wu and Li, 2010)
p = o(a^n) for any fixed a > 1 (Zhu, Li, Li and Zhu, 2010)
Why useful:
help interpretation, e.g., identifying brain regions that are relevant to a phenotype
Multivariate and complex responses
Basic ideas:
dimension reduction is still on X instead of Y
key observations:
S_{E(Y|X)} = S_{E(Y1|X)} ⊕ … ⊕ S_{E(Yr|X)}
S_{Y|X} ⊇ S_{Y1|X} ⊕ … ⊕ S_{Yr|X}
multivariate reduced rank model (Cook and Setodji, 2003): one response at a time
projective sampling (Li, Wen and Zhu, 2008): sample a on a unit ball O(n) times, and regress aᵀY on X (see the sketch below)
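A rough sketch of the projective sampling idea, with OLS (Li and Duan, 1991) standing in for the scalar regression of aᵀY on X; under the linearity condition each OLS slope falls in S_{aᵀY|X} ⊆ S_{Y|X}. The function name, the number of draws, and the pooling step are simplifications, not the estimator of Li, Wen and Zhu (2008).

```python
import numpy as np

def projective_sampling(X, Y, d, n_draws=200):
    """Pool central-subspace directions over random projections of Y."""
    n, p = X.shape
    rng = np.random.default_rng(0)
    Xc = X - X.mean(axis=0)
    M = np.zeros((p, p))
    for _ in range(n_draws):
        a = rng.normal(size=Y.shape[1])
        a /= np.linalg.norm(a)               # a drawn on the unit sphere
        ya = Y @ a                           # scalar response a'Y
        b, *_ = np.linalg.lstsq(Xc, ya - ya.mean(), rcond=None)
        M += np.outer(b, b)                  # b lies in S_{a'Y|X}
    w, v = np.linalg.eigh(M)
    return v[:, ::-1][:, :d]                 # leading pooled directions
```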
Why useful:
e.g., voxel-wise imaging genetics (which ignores the spatial information)
what if Y has structure, such as spatial information in MRI, or positive definiteness in DTI? — open question
Predictors with structures
Basic ideas:
predictors have a group structure, and dimension reduction (linear combinations) should stay within groups (Li, 2009; Li, Li and Zhu, 2010)
direct sum structure: γ1 ⊕ … ⊕ γg
partial dimension reduction, e.g., genetic / imaging information plus clinical / demographic information
Why useful:
fusion of different data modalities
what if X has, e.g., network structures? — open question
Matrix or array valued predictors
Basic ideas:
predictor is a matrix or an array instead of a vector, and dimension reduction aims to preserve that structure and its interpretation
dimension folding (Li, Kim and Altman, 2010), reducing rows and columns separately (a numerical check of the identity follows this list):
γᵀXη = (η ⊗ γ)ᵀ vec(X)
tensor PCA / tensor ICA
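A quick numerical check of the dimension folding identity above; the dimensions and random matrices are arbitrary, and vec(·) is taken to stack columns (Fortran order).

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, d, e = 5, 4, 2, 3
X = rng.normal(size=(p, q))           # matrix-valued predictor
gamma = rng.normal(size=(p, d))       # reduces the rows of X
eta = rng.normal(size=(q, e))         # reduces the columns of X

lhs = gamma.T @ X @ eta                               # d x e reduction
rhs = np.kron(eta, gamma).T @ X.reshape(-1, order="F")
print(np.allclose(lhs.reshape(-1, order="F"), rhs))   # True
```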
Why useful:
MRI or fMRI
Functional predictors
Basic ideas:
predictor is a functional curve (dense / sparse)
sliced inverse regression in functional spaces (Ferre and Yao, 2003, 2005; Hsing and Ren, 2009; Li and Hsing, 2010)
functional PCA
Why useful:
common nowadays in genetics and imaging data
Nonlinear sufficient dimension reduction
Basic ideas:
map X to φ(X), then do linear SDR in the φ(X) space (Wu, Liang and Mukherjee, 2008; Zhu and Li, 2010); see the sketch after this list
the optimal separating hyperplane (Li, Artemiou and Li, 2010)
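A minimal sketch of the mapping idea, assuming random Fourier features (scikit-learn's RBFSampler) as the map φ and reusing the sir() sketch from the sliced inverse regression slide; the feature map, its parameters, and the toy model are illustrative, not the constructions of the cited papers.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
# assumes the sir() function sketched on the sliced inverse regression slide

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = np.sign(X[:, 0]) * np.log1p(X[:, 1] ** 2) + 0.1 * rng.normal(size=500)

phi = RBFSampler(gamma=0.5, n_components=25, random_state=0)
F = phi.fit_transform(X)     # phi(X): random Fourier feature space
B = sir(F, y, d=2)           # linear SDR carried out in the phi(X) space
predictors = F @ B           # sufficient predictors, nonlinear in X
```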
Why useful:
categorical predictors
n < p
predictors with complex structures
how to do variable selection in this setup? — open question
Conclusion and discussion
existing SDR solutions can be directly applied to imaging data
imaging data in turn motivate new methodology development for dimension reduction
Thank You!