Variable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification
Tong Tong Wu
Department of Epidemiology and Biostatistics, School of Public Health
University of Maryland, College Park
October 31, 2009
Outline
1 Overview and Introduction
2 Variable Selection for High-Dimensional Data with Spatial-Temporal Effects
Models for Time-Course Data
Penalties for Structured Predictor Spaces
Coordinate Descent Algorithms
3 Extensions to Multicategory Classification and Multitask Regression
4 Discussion
Overview and Introduction
Figure 1: Overview of the proposed research
Objectives: NSA Mathematical Sciences Program
1 Development of a new variable selection method for high-dimensional data with spatial-temporal effects in regression
1 Models for time-course data (temporal effects): AR(q) and kernel smoothed regression models
2 New penalty functions for structured predictor spaces (spatial effects): linear chain, undirected graph
3 Fast and stable algorithms for high-dimensional spatial-temporal data
4 Asymptotic properties
5 Finite sample properties by numerical experiments
2 Extensions to multicategory classification and multitask regression
3 Applications in biology: pancreatic cancer pathways, cancer subtype classification
Variable Selection
Design of effective and efficient tools to extract scientific insights from massive and noisy high-throughput data
Two approaches to dimension reduction: feature selection and feature extraction
Selection of a best possible subset of the original variables to build robust learning models
Goals of variable selection
Alleviating the effect of the curse of dimensionality
Acquiring better understanding about data
Enhancing generalization capability and prediction
Improving stability
Avoiding bias in hypothesis tests during or after variable selection
Retaining scientific interpretability
Speeding up the learning process
Lasso Penalized Regression
Lasso (least absolute shrinkage and selection operator) penalized regression (Tibshirani 1996) can be phrased as

    min f(θ) = g(θ) + λ ∑_{j=1}^p |β_j|

g(θ): loss function
    g(θ) = ∑_{i=1}^n |y_i − μ − ∑_{j=1}^p x_ij β_j| for ℓ1 regression
    g(θ) = (1/2) ∑_{i=1}^n (y_i − μ − ∑_{j=1}^p x_ij β_j)² for ℓ2 regression
θ = (μ, β_1, . . . , β_p)^t: parameter vector
λ ∑_{j=1}^p |β_j|: lasso penalty
λ: positive tuning constant that determines the strength of the lasso penalty
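As a concrete illustration, here is a minimal numpy sketch of this objective; the function name and calling convention are mine, not part of the talk, and θ is packed as (μ, β_1, . . . , β_p).

```python
import numpy as np

def lasso_objective(theta, X, y, lam, loss="l2"):
    """f(theta) = g(theta) + lam * sum_j |beta_j|, with theta = (mu, beta)."""
    mu, beta = theta[0], theta[1:]
    r = y - mu - X @ beta                  # residuals y_i - mu - sum_j x_ij beta_j
    if loss == "l1":
        g = np.sum(np.abs(r))              # l1 regression loss
    else:
        g = 0.5 * np.sum(r ** 2)           # l2 regression loss
    return g + lam * np.sum(np.abs(beta))  # add the lasso penalty
```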
Lasso Penalty λ ∑_{j=1}^p |β_j|

Shrinking each β_j toward the origin
Discouraging models with large numbers of marginally relevant predictors
More effective in deleting irrelevant predictors than a ridge penalty λ ∑_{j=1}^p β_j², because |b| > b² for small b (for example, b = 0.1 gives |b| = 0.1 but b² = 0.01, so the lasso penalty remains substantial near zero)
Lasso vs. Ridge
In ℓ2 regression:

    ∑_{i=1}^n (y_i − μ − ∑_{j=1}^p x_ij β_j)² = (β − β_0)^T X^T X (β − β_0) + c
Variable Selection for High-Dimensional Data with Spatial-Temporal Effects
Spatial-Temporal Models
Idea: Minimize a regularized objective function
    min f(θ) = g(θ) + λ P(β)

g(θ): loss function for the time-course model (temporal effects)
P(β): penalty function for selecting structured predictors (spatial effects)
Regression Models for Time-Course Data
Regression model for subject i at time t_s, s = 1, . . . , T:

    y_i(t_s) = μ(t_s) + x_i^t β(t_s) + ε_i(t_s)

where
y_i(t): response at time t
x_i = (x_i1, . . . , x_ip)^t: fixed p predictors
β(t) = (β_1(t), . . . , β_p(t))^t: time-varying coefficients
μ(t): time-varying intercept
Regression Models for Time-Course Data
Regression model for subject i:

    [ y_i(t_1) ]   [ μ(t_1) ]   [ β_1(t_1) . . . β_p(t_1) ] [ x_i1 ]   [ ε_i(t_1) ]
    [   ...    ] = [   ...  ] + [   ...             ...   ] [  ... ] + [   ...    ]
    [ y_i(t_T) ]   [ μ(t_T) ]   [ β_1(t_T) . . . β_p(t_T) ] [ x_ip ]   [ ε_i(t_T) ]

that is, y_i ≡ μ + B x_i + ε_i, where B is a T × p coefficient matrix whose sth row is β(t_s)^t

Linear system for n observations:

    [ y_1^t ]           [ x_1^t ] [ β_1(t_1) . . . β_p(t_1) ]^t   [ ε_1^t ]
    [  ...  ] = 1 μ^t + [  ...  ] [   ...             ...   ]   + [  ...  ]
    [ y_n^t ]           [ x_n^t ] [ β_1(t_T) . . . β_p(t_T) ]     [ ε_n^t ]
Regression Models for Time-Course Data
Stationary process
    E[ε(t)] = 0 and Cov(ε_t, ε_{t+s}) = Cov(ε_t, ε_{t−s}) = σ² ρ_s

where ρ_s is the error autocorrelation at lag s

Error covariance matrix Σ for each subject:

    Σ = σ² [ 1        ρ_1      ρ_2      . . .  ρ_{T−1} ]
           [ ρ_1      1        ρ_1      . . .  ρ_{T−2} ]
           [ ρ_2      ρ_1      1        . . .  ρ_{T−3} ] = σ² V
           [ ...                        . . .   ...    ]
           [ ρ_{T−1}  ρ_{T−2}  ρ_{T−3}  . . .  1       ]

Loglikelihood:

    −(n/2) log(det Σ) − (1/2) ∑_{i=1}^n (y_i − μ − B x_i)^t Σ^{−1} (y_i − μ − B x_i)
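The two displays above translate directly into code. Here is a small numpy/scipy sketch, assuming the autocorrelations ρ_0 = 1, ρ_1, . . . , ρ_{T−1} are supplied as a vector; the helper names are illustrative only.

```python
import numpy as np
from scipy.linalg import toeplitz

def error_covariance(rho, sigma2):
    """Sigma = sigma^2 * V, V Toeplitz in rho = (1, rho_1, ..., rho_{T-1})."""
    return sigma2 * toeplitz(rho)

def loglikelihood(Y, X, mu, B, Sigma):
    """Gaussian loglikelihood (up to a constant) for y_i = mu + B x_i + eps_i.

    Y: n x T responses, X: n x p predictors, mu: length-T intercepts, B: T x p.
    """
    R = Y - mu - X @ B.T                           # row i holds (y_i - mu - B x_i)^t
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.sum((R @ np.linalg.inv(Sigma)) * R)  # sum_i r_i^t Sigma^{-1} r_i
    return -0.5 * Y.shape[0] * logdet - 0.5 * quad
```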
AR(q) Model
Autocorrelated regression errors ε_t:

    ε_t = φ_1 ε_{t−1} + φ_2 ε_{t−2} + · · · + φ_q ε_{t−q} + v_t,

where the “random shocks” v_t ∼ N(0, σ_v²)

Given that the first q observations for each subject are fixed, the conditional loglikelihood of the remaining T − q observations (y(t_{q+1}), . . . , y(t_T)) is

    −(n(T − q)/2) log σ_v² − (1/(2σ_v²)) ∑_{i=1}^n ∑_{s=q+1}^T { y_i(t_s) − μ(t_s) − x_i^t β(t_s) − ∑_{j=1}^q φ_j [ y_i(t_{s−j}) − μ(t_{s−j}) − x_i^t β(t_{s−j}) ] }²

Maximizing this conditional loglikelihood is equivalent to minimizing the least-squares type loss function

    g_1(θ) = ∑_{i=1}^n ∑_{s=q+1}^T { y_i(t_s) − μ(t_s) − x_i^t β(t_s) − ∑_{j=1}^q φ_j [ y_i(t_{s−j}) − μ(t_{s−j}) − x_i^t β(t_{s−j}) ] }²
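A minimal numpy sketch of g_1, assuming the data are stored as an n × T response matrix and an n × p predictor matrix; the function name and interface are mine, not the talk's.

```python
import numpy as np

def ar_q_loss(Y, X, mu, B, phi):
    """Least-squares loss g1 for the AR(q) error model (a sketch).

    Y: n x T responses, X: n x p predictors, mu: length-T intercepts,
    B: T x p time-varying coefficients, phi: length-q AR coefficients.
    """
    q = len(phi)
    R = Y - mu - X @ B.T            # raw residuals eps_i(t_s), shape n x T
    W = R[:, q:].copy()             # innovations for s = q+1, ..., T
    for j, ph in enumerate(phi, start=1):
        W -= ph * R[:, q - j:R.shape[1] - j]   # subtract phi_j * eps_i(t_{s-j})
    return np.sum(W ** 2)
```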
Kernel Smoothed Regression Model
Minimizing the kernel smoothed loss function at time t_s for θ(t_s):

    θ(t_s) = arg min ∑_{i=1}^n ∑_{r=1}^T w(t_s, t_r) { y_i(t_r) − μ(t_s) − x_i^t β(t_s) }²

where ∑_{r=1}^T w(t_s, t_r) = 1

Kernel smoothed response:

    ỹ_i(t_s) = ∑_{r=1}^T w(t_s, t_r) y_i(t_r)

Since the weights sum to one, this is equivalent to minimizing the least-squares type loss function on the smoothed responses:

    g_2(θ) = ∑_{i=1}^n ∑_{s=1}^T { ỹ_i(t_s) − μ(t_s) − x_i^t β(t_s) }²
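A small numpy sketch of this construction. The Gaussian kernel and the function names are illustrative assumptions, not the talk's specification; any row-normalized weights would do.

```python
import numpy as np

def kernel_weights(times, bandwidth):
    """Row-normalized Gaussian kernel weights w(t_s, t_r)."""
    D = np.subtract.outer(times, times)        # t_s - t_r
    W = np.exp(-0.5 * (D / bandwidth) ** 2)
    return W / W.sum(axis=1, keepdims=True)    # each row sums to 1

def smoothed_loss(Y, X, mu, B, W):
    """g2: least-squares loss on kernel smoothed responses."""
    Y_tilde = Y @ W.T            # y~_i(t_s) = sum_r w(t_s, t_r) y_i(t_r)
    R = Y_tilde - mu - X @ B.T
    return np.sum(R ** 2)
```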
Structured Predictor Spaces
Grouped predictors (Wu and Lange 2008; Yuan and Lin 2006; Zhao et al. 2009; Zou and Hastie 2005)
Example: in gene microarray experiments, genes can sometimes be grouped into biochemical pathways subject to genetic coregulation, and their expression levels are expected to be highly correlated
Order restrictions (Efron et al. 2004; Turlach et al. 2005; Wu et al. 2009; Yuan et al. 2007)
Example: a higher order term should be selected only when the corresponding lower order terms are present in the model
Ordered predictors (Tibshirani et al. 2005; Li and Zhang 2008)
Example: genes or SNPs are located on chromosomes as a sequence
Structured Predictor Spaces
Important to incorporate the structural information into the model building process when the covariate space is highly structured
Two structures of predictor spaces:
Linear chain: sequencing and mapping information of genomes
Undirected graph: networking information
Linear Chain Structure
A genome map describes the order of genes or other markers and the spacing between them
Existing work incorporating linear chain structures considers only the ordering of markers (e.g., fused lasso (Tibshirani et al. 2005), Bayesian variable selection (Li and Zhang 2008; Tai and Pan 2009))
The distance between two markers, however, has not been taken into consideration, even though it provides information on recombination between the two markers and on their joint participation in relevant biological processes
The sequencing and mapping information is widely available for public use on the internet
Penalty Function for Linear Chain Structure
Adaptive fused lasso penalty:

    P_1(β) = γ ∑_{j=2}^p (1/|m_j − m_{j−1}|) |β_j − β_{j−1}| ≡ ∑_{j=2}^p γ_j |β_j − β_{j−1}|,

where m_j is the map distance of marker j, and γ_j = γ/|m_j − m_{j−1}| is the adaptive weight
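A short numpy sketch of P_1, assuming the map positions m_1, . . . , m_p are given as a vector; the function name is illustrative.

```python
import numpy as np

def fused_lasso_penalty(beta, m, gamma):
    """Adaptive fused lasso penalty P1 (a sketch).

    beta: length-p coefficients; m: length-p map positions of the markers.
    Weights gamma_j = gamma / |m_j - m_{j-1}| downweight distant neighbors.
    """
    gaps = np.abs(np.diff(m))                # |m_j - m_{j-1}|, length p-1
    return np.sum((gamma / gaps) * np.abs(np.diff(beta)))
```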
Undirected Graphical Structure
Example: nodes in the graph represent genes, and edges represent interactions between genes
Bayesian approach: Li and Zhang (2008); Tai and Pan (2009)
Frequentist approach: Li and Li (2008); Pan et al. (2009)
Here a different penalty function is employed to incorporate the mapping information, for a more straightforward biological rationale
Figure 2: Gene regulatory networks. Each node is a single gene, and any two genes are connected if they are in the same pathway (reproduced with permission from Sinauer Associates, Inc.; Gifford et al. 2006)
Penalty Function for Undirected Graphical Structure
Consider a weighted graph G = (V, E, W), where
V: the set of nodes corresponding to the p predictors
E = {u ∼ v}: the set of edges indicating whether predictors u and v are linked in the network
W: the weights of the edges, with w(u, v) denoting the weight of edge e = (u ∼ v)

In this study, the edge weights are derived from the map distances between nodes

Penalty for graphical structure:

    P_2(β) = γ ∑_i ∑_{j∼i} w(i, j) |β_i − β_j| ≡ ∑_i ∑_{j∼i} γ_ij |β_i − β_j|,

where γ_ij = γ w(i, j) = γ/|m_i − m_j|
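A corresponding sketch of P_2, assuming each undirected edge is listed once as a pair (i, j); the interface is hypothetical.

```python
import numpy as np

def graph_penalty(beta, edges, m, gamma):
    """Graph-structured fusion penalty P2 (a sketch).

    edges: iterable of (i, j) index pairs for linked predictors;
    m: map positions, giving weights w(i, j) = 1 / |m_i - m_j|.
    """
    return gamma * sum(abs(beta[i] - beta[j]) / abs(m[i] - m[j])
                       for i, j in edges)
```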
Spatial-Temporal Models
The spatial-temporal model therefore can be implemented by minimizing the objective function (sketched in code below)

    f(θ) = g_k(θ) + λ ∑_{j=1}^p |β_j| + γ_l P_l(β),   k, l = 1, 2    (1)

The two ℓ1 penalties encourage sparsity in the coefficients and in their differences, respectively
The objective function is convex but nondifferentiable, with a large number of predictors
Use the cyclic coordinate descent algorithm highlighted in Wu and Lange (2008) and Friedman et al. (2007)
Allows the response to be related to a different subset of predictors at each time point
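A sketch of objective (1) for k = l = 1, reusing the hypothetical helpers ar_q_loss and fused_lasso_penalty from the earlier sketches; here the lasso and fused penalties are applied to every row β(t_s) of B, which is one plausible reading of the slide, not the talk's definitive formulation.

```python
import numpy as np

def spatial_temporal_objective(Y, X, mu, B, phi, m, lam, gamma):
    """f = g1 + lam * sum|beta| + fused penalty on each beta(t_s) (a sketch)."""
    g = ar_q_loss(Y, X, mu, B, phi)                   # temporal loss g1
    lasso = lam * np.sum(np.abs(B))                   # sparsity in coefficients
    fused = sum(fused_lasso_penalty(B[s], m, gamma)   # spatial fusion per time point
                for s in range(B.shape[0]))
    return g + lasso + fused
```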
Literature Review: Computational Algorithms
The inefficiency of the original lasso limited its impact on statistical practice (Madigan and Ridgeway 2004), even with more recent algorithms (Osborne et al. 2000), due to their complexity
Efron et al. (2004): LARS (Least Angle Regression, Lasso, and Forward Stagewise)
Fu (1998) and Daubechies et al. (2004): coordinate descent for lasso penalized ℓ2 regression, but no follow-up for underdetermined problems
Wang et al. (2006): solution path for penalized ℓ1 regression using linear programming
Park and Hastie (2006): solution path for penalized ℓ2 regression and generalized linear models
Wu and Lange (2008) and Friedman et al. (2007): independent and concurrent work on coordinate descent!
Coordinate Descent Algorithms
The difficulties of minimizing the objective function (1) lie in three facts:
1 Nondifferentiability due to the ℓ1 penalties
2 p ≫ n
3 Impossibility of matrix operations for huge p

Coordinate descent: simplicity, speed, and stability (Friedman et al. 2007; Wu and Lange 2008; Wu et al. 2009)
Decisive advantages in dealing with data sets in our setting
Standard version of coordinate descent: cycling through the parameters and updating each in turn
Coordinate Descent Algorithms
Directional derivatives along forward or backward directions; e.g., e_k is the coordinate direction along which β_k varies

Forward directional derivative of the objective function:

    d_{e_k} f(θ) = lim_{τ↓0} [f(θ + τ e_k) − f(θ)] / τ = d_{e_k} g(θ) + { λ,  β_k ≥ 0;  −λ,  β_k < 0 }

Backward directional derivative of the objective function:

    d_{−e_k} f(θ) = lim_{τ↓0} [f(θ − τ e_k) − f(θ)] / τ = d_{−e_k} g(θ) + { −λ,  β_k > 0;  λ,  β_k ≤ 0 }
Procedures
1 Start all parameters at the origin
2 Evaluate both d_{e_k} f(θ) and d_{−e_k} f(θ) for β_k
If both are nonnegative ⇒ skip the update for β_k
If either directional derivative is negative ⇒ solve for the minimum in that direction
Both directional derivatives cannot be negative, because this would contradict the convexity of f(θ)
3 After the direction is chosen, Newton's method can be used for updating the parameter (see the sketch after this list)
4 Check if the objective function is driven downhill
5 Halve the step size if the descent property fails
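For lasso penalized ℓ2 regression the one-dimensional minimum has a closed form (a soft-thresholded least-squares update), so steps 3 to 5 simplify. Below is a minimal numpy sketch of the resulting cyclic coordinate descent; the names are illustrative, and the intercept is handled crudely by fixing μ at the mean rather than updating it each sweep.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for lasso penalized l2 regression (a sketch).

    Mirrors the procedure above: parameters start at the origin, and an
    update is skipped (beta_k stays 0) whenever both one-sided directional
    derivatives are nonnegative, i.e. |x_k' r| <= lam.
    """
    n, p = X.shape
    beta = np.zeros(p)
    mu = y.mean()                     # unpenalized intercept (held fixed here)
    r = y - mu                        # residuals y - mu - X @ beta
    ss = (X ** 2).sum(axis=0)         # precomputed x_k' x_k
    for _ in range(n_sweeps):
        for k in range(p):
            z = X[:, k] @ r + ss[k] * beta[k]    # x_k' (r + x_k beta_k)
            new = soft_threshold(z, lam) / ss[k]
            if new != beta[k]:
                r -= X[:, k] * (new - beta[k])   # keep residuals current
                beta[k] = new
    return mu, beta
```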
Underdetermined Problems
p ≫ n with just a few relevant predictors
Fast speed of cyclic coordinate descent:
Most updates are skipped, and the corresponding parameters never budge from their starting values of 0
Complete absence of matrix operations
Numerical stability from the descent property of each update
Extensions to Multicategory Classification and Multitask Regression
Overview of Vertex Discriminant Analysis (VDA)
A new method of supervised learning
Based on linear discrimination among the vertices of a regular simplex in Euclidean space
Each vertex represents a different category
A regression problem involving ε-insensitive residuals and penalties on the coefficients of the linear predictors:
Ridge penalty (VDAR)
Lasso penalty (VDAL)
Euclidean penalty (VDAE, VDALE)
Minimizing the objective function by a primal MM algorithm or a coordinate descent algorithm
Competitive in statistical accuracy and computational speed
Regularized ε-insensitive Loss Function for VDA
VDA (Vertex Discriminant Analysis) discriminates among k categories by minimizing ε-insensitive loss plus penalties:

    f(θ) = ∑_{i=1}^n g(y_i − B x_i − μ) + λ_L ∑_{j=1}^{k−1} ∑_{l=1}^p |β_jl| + λ_E ∑_{l=1}^p ‖β^(l)‖₂

where
y_i ∈ R^{k−1} is the vertex assignment for case i
β^(l) is the lth column of the (k − 1) × p matrix B of regression coefficients
μ is a (k − 1) × 1 column vector of intercepts
g(v): modified ε-insensitive loss

    g(v) = ‖v‖₂ − ε                                   if ‖v‖₂ ≥ ε + δ
           (‖v‖₂ − ε + δ)³ (3δ − ‖v‖₂ + ε) / (16δ³)   if ‖v‖₂ ∈ (ε − δ, ε + δ)
           0                                          if ‖v‖₂ ≤ ε − δ
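A direct numpy transcription of g(v); the function name is mine, and ε and δ are passed in as parameters.

```python
import numpy as np

def eps_insensitive_loss(v, eps, delta):
    """Modified epsilon-insensitive loss g(v) for VDA (a sketch).

    Zero inside the eps-ball, linear outside it, with a smooth cubic
    transition of half-width delta in between.
    """
    r = np.linalg.norm(v)
    if r >= eps + delta:
        return r - eps
    if r <= eps - delta:
        return 0.0
    return (r - eps + delta) ** 3 * (3 * delta - r + eps) / (16 * delta ** 3)
```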
[Figure: scatter plot "Indicator Vertices for 3 Classes" on axes z1 and z2]

Figure 3: Plot of the indicator vertices for 3 classes, with an ε-radius circle around each vertex
VDA with Spatial Effects
Spatial information can be incorporated into classification by adding the network-constraint penalty to the regularized objective function:

    f(θ) = ∑_{i=1}^n g(y_i − B x_i − μ) + λ_L ∑_{j=1}^{k−1} ∑_{l=1}^p |β_jl| + λ_E ∑_{l=1}^p ‖β^(l)‖₂ + γ_l P_l(B),   l = 1, 2

Multiresponse regression problem: y_i ∈ R^{k−1} for k categories
Cyclic coordinate descent algorithm
Cyclic coordinate descent algorithm
Multitask Regression with Spatial Effects
Learning multiple related tasks from data simultaneously can be advantageous compared to learning these tasks independently (Bakker and Heskes 2003; Caruana 1997; Evgeniou and Pontil 2004)
The main task is trained together with other related problems at the same time, using a shared representation
Example: 24 soil samples with measurements of 14 quantities at 770 different wavelengths (CMIS/CSIRO)
Multitask Regression with Spatial Effects
Assume we have k responses that share the same set of covariates
For subject i, y_i = (y_i1, . . . , y_ik)^t and x_i = (x_i1, . . . , x_ip)^t
Further assume each task comes from a different regression model of the x's

Multiresponse regression model:

    [ y_1^t ]           [ x_1^t ] [ β_11 . . . β_1p ]^t
    [  ...  ] = 1 μ^t + [  ...  ] [  ...        ... ]
    [ y_n^t ]           [ x_n^t ] [ β_k1 . . . β_kp ]
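To make the setup concrete, here is a small numpy sketch that simulates from this multiresponse model, with each task given its own sparse coefficient row; all names and the sparsity scheme are illustrative assumptions, not part of the talk.

```python
import numpy as np

def simulate_multitask(n, p, k, sparsity=0.1, seed=0):
    """Simulate Y = 1 mu^t + X B^t + E for k tasks sharing covariates X."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    B = rng.standard_normal((k, p)) * (rng.random((k, p)) < sparsity)  # sparse rows
    mu = rng.standard_normal(k)
    Y = mu + X @ B.T + 0.1 * rng.standard_normal((n, k))               # add noise
    return X, Y, B, mu
```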
Discussion
Connections to MCAI 2.0
Applications in biology
1 Pancreatic cancer pathways
2 Cancer subtype classification
Model verification/checking by MCAI 2.0
Thank you very much!
Bakker, B. and Heskes, T. (2003), “Task clustering and gating for Bayesian multitask learning,” Journal of Machine Learning Research, 4, 83–99.
Caruana, R. (1997), “Multitask Learning,” Machine Learning, 28, 41–75.
Daubechies, I., Defrise, M., and De Mol, C. (2004), “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics, 57, 1413–1457.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), “Least angle regression,” Annals of Statistics, 32, 407–499.
Evgeniou, T. and Pontil, M. (2004), “Regularized multitask learning,” Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 109–117.
Friedman, J., Hastie, T., Höfling, H., and Tibshirani, R. (2007), “Pathwise coordinate optimization,” Annals of Applied Statistics.
Fu, W. J. (1998), “Penalized regressions: the bridge versus the lasso,” Journal of Computational and Graphical Statistics, 7, 397–416.
Gifford, M. L., Gutierrez, R. A., and Coruzzi, G. M. (2006), “Essay 12.2: Modeling the Virtual Plant: A Systems Approach to Nitrogen-Regulatory Gene Networks. Plant Physiology, Fourth Edition Online,” online.
Li, C. and Li, H. (2008), “Network-constrained regularization and variable selection for analysis of genomic data,” Bioinformatics, 24, 1175–1182.
Li, F. and Zhang, N. R. (2008), “Bayesian Variable Selection in Structured High-Dimensional Covariate Spaces with Applications in Genomics,” manuscript.
Madigan, D. and Ridgeway, G. (2004), “Discussion of ‘Least angle regression’ by Efron et al.,” Annals of Statistics, 32, 465–469.
Osborne, M. R., Presnell, B., and Turlach, B. A. (2000), “A new approach to variable selection in least squares problems,” IMA Journal of Numerical Analysis, 20, 389–403.
Pan, W., Xie, B., and Shen, X. (2009), “Incorporating Predictor Network in Penalized Regression with Application to Microarray Data,” Biometrics, online publication.
Park, M. Y. and Hastie, T. (2006), “Penalized logistic regression for detecting gene interactions,” Tech. Rep. 2006-15, Department of Statistics, Stanford University.
Tai, F. and Pan, W. (2009), “Bayesian Variable Selection in Regression with Networked Predictors,” manuscript.
Tibshirani, R. (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005), “Sparsity and smoothness via the fused lasso,” Journal of the Royal Statistical Society, Series B, 67, 91–108.
Turlach, B. A., Venables, W. N., and Wright, S. J. (2005), “Simultaneous variable selection,” Technometrics, 47, 349–363.
Wang, L., Gordon, M. D., and Zhu, J. (2006), “Regularized least absolute deviations regression and an efficient algorithm for parameter tuning,” Proceedings of the Sixth International Conference on Data Mining (ICDM’06), IEEE Computer Society, 690–700.
Wu, T. T., Chen, Y. F., Hastie, T., Sobel, E., and Lange, K. (2009), “Genomewide association analysis by lasso penalized logistic regression,” Bioinformatics, 25, 714–721.
Wu, T. T. and Lange, K. (2008), “Coordinate descent algorithms for lasso penalized regression,” Annals of Applied Statistics, 2, 224–244.
Yuan, M., Joseph, R., and Lin, Y. (2007), “An efficient variable selection approach for analyzing designed experiments,” Technometrics, 49, 430–439.
Yuan, M. and Lin, Y. (2006), “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society, Series B, 68, 49–67.
Zhao, P., Rocha, G., and Yu, B. (2009), “The composite absolute penalties family for grouped and hierarchical variable selection,” Annals of Statistics, 37, 3468–3497.
Zou, H. and Hastie, T. (2005), “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society, Series B, 67, 301–320.