Center for Evolutionary Medicine and Informatics
Multi-Task Learning: Theory, Algorithms, and Applications
Jiayu Zhou (1,2), Jianhui Chen (3), Jieping Ye (1,2)
1 Computer Science and Engineering, Arizona State University, AZ
2 Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, AZ
3 GE Global Research, NY
SDM 2012 Tutorial
Tutorial Goals
• Understand the basic concepts in multi-task learning
• Understand different approaches to model task relatedness
• Get familiar with different types of multi-task learning techniques
• Introduce the multi-task learning package: MALSAR
Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
– Disease Progression
– Dealing with Missing Values in Multiple Sources
– Drosophila Image Analysis
• Part IV: MALSAR: Multi-task Learning via Structural Regularization Package
• Future directions
Multiple Tasks
School 1 - Alverno High School
Student id | Birth year | Previous score | … | School ranking | … | Exam score
72981 | 1985 | 95 | … | 83% | … | ?
School 138 - Jefferson Intermediate School
Student id | Birth year | Previous score | … | School ranking | … | Exam score
31256 | 1986 | 87 | … | 72% | … | ?
School 139 - Rosemead High School
Student id | Birth year | Previous score | … | School ranking | … | Exam score
12381 | 1986 | 83 | … | 77% | … | ?
Learning Multiple Tasks
Student id | Birth year | Previous score | School ranking | … | Exam Score
72981 | 1985 | 95 | 83% | … | ?
31256 | 1986 | 87 | 72% | … | ?
12381 | 1987 | 83 | 77% | … | ?
… | … | … | … | … | …
21901 | 1986 | 87 | 72% | … | ?
Students with the same features but different exam scores (e.g. 31256 and 21901)
Learning Multiple Tasks
[Figure: one table per school (School 1 - Alverno High School, School 138 - Jefferson Intermediate School, School 139 - Rosemead High School), each with columns Student id, Birth year, Previous score, School ranking, … and an unknown Exam Score; example rows 72981 | 1985 | 95 | 83%, 31256 | 1986 | 87 | 72%, 12381 | 1986 | 83 | 77%. Each school is labeled "Excellent".]
Learning Multiple Tasks
[Figure: one table per school (School 1 - Alverno High School, School 138 - Jefferson Intermediate School, School 139 - Rosemead High School), each with columns Student id, Birth year, Previous score, School ranking, … and an unknown Exam Score; example rows 72981 | 1985 | 95 | 83%, 31256 | 1986 | 87 | 72%, 12381 | 1986 | 83 | 77%.]
Multi-Task Learning
• Multi-task Learning is different from single task learning in the training (induction) process.
• Inductions of multiple tasks are performed simultaneously to capture intrinsic relatedness.
Learning Methods
Web Pages Categorization
• Classify documents into categories
• The classification of each category is a task
• The tasks of predicting different categories may be latently related [Chen et al., ICML 09]
[Figure: category classifiers for Health, Travel, World, Politics, US, ...; the classifiers' parameters (models) of different categories are latently related.]
Collaborative Ordinal Regression
• The preference prediction of each user can be modeled using ordinal regression
• Some users have similar tastes and their predictions may also have similarities
• Perform the predictions for multiple users simultaneously to exploit this similarity [Yu et al., NIPS 06]
MTL for HIV Therapy Screening
• Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms
• The sample available for each combination is limited
• For a patient, the prediction of using one combination is a task
• Use the similarity information by simultaneously inferring multiple tasks [Bickel et al., ICML 08]
[Figure: drugs ETV, ABC, AZT, D4T, DDI, FTC, ...; a patient treatment record listing the drugs AZT, ETV, FTC with an unknown outcome (?).]
How to capture shared structures?
• Assumption: All tasks are related
• Assumption: The relationship is not symmetric
• Assumption: Tasks have group structures
• Assumption: There are outlier tasks
How to capture shared structures?
Assumption: All tasks are related
Methods
• Mean-regularized MTL
• Joint feature learning
• Low rank regularized MTL
• Alternating structural optimization (ASO)
• Shared Parameter Gaussian Process
How to capture shared structures?
Assumption: Tasks have group structures
Assumption: Tasks have tree structures
Assumption: Tasks have graph/network structures
Methods
• Clustered MTL
• Tree MTL
• Network MTL
[Figure: task models a, b, c arranged in a tree.]
How to capture shared structures?
Assumption: There are outlier tasks
Methods
• Robust MTL
How to capture shared structures?
Assumption: The relationship is not symmetric
Methods
• Asymmetric MTL
All tasks are related
Assumption: All tasks are related
• Shared Hidden Node in Neural Network
• Shared Parameter Gaussian Process
• Regularization-based MTL
– Mean-regularized MTL
– Joint feature learning
– Low rank regularized MTL
– Alternating structural optimization
Sharing Hidden Nodes in Neural Network
[Figure: a neural network whose inputs feed a layer of shared hidden nodes, with separate outputs for Task 1, Task 2, Task 3, and Task 4.]
• Neural networks have been well studied for learning multiple related tasks for improved generalization performance.
• A set of hidden units is shared among multiple tasks for improved generalization (Caruana, ML 97).
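The hidden-node sharing described above can be sketched as a forward pass in NumPy. This is only an illustration under assumed sizes and random weights (`n_inputs`, `n_hidden`, `W_shared`, and `W_heads` are hypothetical names, not from the tutorial), and it leaves out training entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_tasks = 6, 4, 4   # sizes chosen only for illustration

# One hidden layer shared by every task, plus one output unit per task.
W_shared = rng.normal(size=(n_inputs, n_hidden))
W_heads = rng.normal(size=(n_hidden, n_tasks))

def forward(X):
    """Return one prediction per task for each row of X."""
    hidden = np.tanh(X @ W_shared)   # representation shared across tasks
    return hidden @ W_heads          # column t is the output for task t

X = rng.normal(size=(5, n_inputs))
predictions = forward(X)             # shape: (5 samples, 4 tasks)
```

Because every task's output is computed from the same `hidden` activations, a gradient step for any one task would also update the representation used by the others.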
Mortality Rank
• Future lab results are used as extra outputs to bias learning for the main risk prediction task
Shared Parameter Gaussian Process
• In (Lawrence and Platt, ICML 04) an efficient method is proposed to learn the parameters (of a shared covariance function) for the Gaussian process.
• The method adopts the multi-task informative vector machine (IVM) to greedily select the most informative examples from the separate tasks and hence alleviate the computation cost.
Common Latent Representation in Nonparametric Bayesian Models
• Multi-Task Infinite Latent Support Vector Machines (Zhu, J. et al NIPS 11)
Regularization-based Multi-task Learning
• All tasks are related
– regularized MTL, joint feature learning, low rank MTL, ASO
• Tasks form groups
– clustered MTL, Network/Tree MTL
• Learning with outlier tasks: robust MTL
• Asymmetric MTL
Regularized Multi-Task Learning
• Assume all tasks are related in that the models of all tasks come from a particular distribution (Evgeniou & Pontil, KDD 04)
[Figure: the models of the individual tasks scattered around their mean.]
Regularized Multi-Task Learning
• Assumption: task parameter vectors of all tasks are close to each other.
– Advantage: smooth objective, easy to optimize
– Disadvantage: may not hold in real applications.
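The slides state the closeness assumption without giving an objective. A minimal sketch, assuming a squared loss and the penalty (lam/2) * sum_t ||w_t - mean(w)||^2 (one common way to write mean regularization, not necessarily the exact formulation of Evgeniou & Pontil), shows why the objective is smooth and can be minimized by plain gradient descent. The function names, step size, and synthetic data are all illustrative.

```python
import numpy as np

def mean_regularized_mtl(Xs, ys, lam=1.0, lr=0.05, n_iter=500):
    """Gradient descent on
    sum_t ||X_t w_t - y_t||^2 / (2 n_t) + (lam / 2) * sum_t ||w_t - mean(w)||^2."""
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))                          # column t is the model of task t
    for _ in range(n_iter):
        w_bar = W.mean(axis=1, keepdims=True)     # mean model across tasks
        G = np.empty_like(W)
        for t, (X, y) in enumerate(zip(Xs, ys)):
            G[:, t] = X.T @ (X @ W[:, t] - y) / len(y)   # gradient of the loss
        G += lam * (W - w_bar)                    # gradient of the mean penalty
        W -= lr * G
    return W

def spread(W):
    """Frobenius distance of the task columns from their mean."""
    return np.linalg.norm(W - W.mean(axis=1, keepdims=True))

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
d, T, n = 5, 3, 50
true_W = rng.normal(size=(d, T))
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [X @ true_W[:, t] + 0.1 * rng.normal(size=n) for t, X in enumerate(Xs)]

W_independent = mean_regularized_mtl(Xs, ys, lam=0.0)   # no coupling between tasks
W_coupled = mean_regularized_mtl(Xs, ys, lam=10.0)      # models pulled to the mean
```

Comparing `spread(W_coupled)` with `spread(W_independent)` shows the effect of the assumption: a larger `lam` pulls the task models toward their mean, which helps only if the true models really are close.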
Multi-Task Learning with Joint Feature Learning
• One way to capture the task relatedness from multiple related tasks is to constrain all models to share a common set of features.
• For example, in school data, the scores from different schools may be determined by a similar set of features.
[Figure: model matrix with rows Feature 1 to Feature 10 and columns Task 1 to Task 5.]
Multi-Task Learning with Joint Feature Learning
• Using group sparsity: ℓ1/ℓ2-norm regularization
[Figure: Y (output, n×k) = X (input, n×p) × W (model, p×k) + Z (noise, n×k); the rows of X and Y are Sample 1, ..., Sample n and the columns of W and Y are Task 1, ..., Task k.]
$$\min_{W} \frac{1}{2}\|XW - Y\|_F^2 + \lambda \sum_{i=1}^{p} \|\mathbf{w}^i\|_2$$
where $\mathbf{w}^i$ is the i-th row of $W$.
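One way to minimize an objective of this form is proximal gradient descent, where the ℓ1/ℓ2 penalty is handled by row-wise soft-thresholding. The sketch below is a generic illustration, not the solver used in the tutorial or in MALSAR; `prox_l21` and `joint_feature_learning` are hypothetical names.

```python
import numpy as np

def prox_l21(W, threshold):
    """Row-wise soft-thresholding: proximal operator of threshold * sum_i ||w^i||_2."""
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(row_norms, 1e-12))
    return scale * W                  # rows with norm <= threshold become zero

def joint_feature_learning(X, Y, lam, n_iter=500):
    """Proximal gradient descent on 0.5 * ||XW - Y||_F^2 + lam * sum_i ||w^i||_2."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)               # gradient of the squared loss
        W = prox_l21(W - step * grad, step * lam)
    return W

# A row with norm 5 shrinks by a factor 0.8; a row with norm 0.5 is removed.
example = prox_l21(np.array([[3.0, 4.0], [0.3, 0.4]]), threshold=1.0)
```

A whole row of W is zeroed at once, so a feature is either kept for all tasks or dropped for all tasks, which is the group-sparsity effect the penalty is meant to produce.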
Joint Feature Selection in Disease Progression
[Figure: X (patient samples n × feature space d; features include baseline MRI volume, baseline MRI area, baseline MRI surface, baseline lab test, baseline cognitive test, and other features) × W (feature size d × time points t) ≈ Y (patient samples n × time points t: 06, 12, 24, and 36 months). Several rows of W are labeled "Removed Feature".]
• The progression of disease is assumed to involve the same set of features at different time points [Zhou et.al. KDD 11].
Joint Feature Selection in Disease Progression
• In predicting different cognitive scores, there may be shared features from different data sources.
• Multi-modal multi-task learning [Zhang, D. et.al. NeuroImage 12]
Multi-Task Learning with Joint Feature Learning – L1Lq
• More general ℓ1/ℓ𝑞-norm regularization:
$$\min_{W} \frac{1}{2}\|XW - Y\|_F^2 + \lambda \sum_{i=1}^{p} \|\mathbf{w}^i\|_q$$
where $\mathbf{w}^i$ is the i-th row of $W$.
[Figure: Y (output, n×k) = X (input, n×p) × W (model, p×k) + Z (noise, n×k); the rows of X and Y are Sample 1, ..., Sample n and the columns of W and Y are Task 1, ..., Task k.]
Multi-Task Learning with Joint Feature Learning – L1Lq
• The selection of q may depend on the distribution of the model:
$$W \sim N(\text{Mean}, \text{Variance})$$
Trace-Norm Regularized MTL
• Capture Task Relatedness via a Shared Low-Rank Structure
[Figure: the training data of task 1, task 2, ..., task n are each used to train a model; the trained models share a low-rank structure, and each trained model is then used for generalization.]
Low-Rank Structure for MTL
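[Figure: for each of Task 1, Task 2, and Task 3, training data × weight vector ≈ target. Each weight vector is a linear combination of the same two basis vectors, with coefficients α1, α2 for Task 1, β1, β2 for Task 2, and γ1, γ2 for Task 3.]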
Low-Rank Structure for MTL
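[Figure: the weight vectors of all tasks (a low rank structure) = basis vectors × coefficients, where the coefficients α1, α2, β1, β2, γ1, γ2, ... are collected in the matrix below.]
$$\begin{pmatrix} \tau_{11} & \cdots & \tau_{1m} \\ \vdots & \ddots & \vdots \\ \tau_{p1} & \cdots & \tau_{pm} \end{pmatrix}$$
m tasks, p basis vectors, $m > p$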
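A short numerical sketch of this factorization, with made-up sizes and random values (`B` for the basis vectors and `T` for the coefficients are names introduced here only for illustration), confirms that the resulting weight matrix has rank at most p.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, m = 20, 2, 10             # feature dimension, basis vectors, tasks (m > p)

B = rng.normal(size=(d, p))     # basis vectors, one per column
T = rng.normal(size=(p, m))     # coefficients tau, one column per task
W = B @ T                       # weight vectors of all m tasks

rank = np.linalg.matrix_rank(W)
```

Although W stores d × m numbers, it is described by only d × p + p × m of them, which is the sense in which the tasks share structure.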
Low-Rank Structure for MTL
• Rank minimization formulation
– $\min_{W} \text{Loss}(W) + \lambda \times \text{Rank}(W)$
– Rank minimization is NP-Hard
• Convex relaxation: trace norm minimization
– $\min_{W} \text{Loss}(W) + \lambda \times \|W\|_*$
– The trace norm is the convex envelope of the rank function on the set of matrices with spectral norm at most one (Fazel et al., 2001).
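Trace-norm problems of this kind are often solved with proximal methods, whose key step is singular value thresholding. The sketch below shows only that step; it is a generic illustration rather than the tutorial's own algorithm, and `prox_trace_norm` is a hypothetical name.

```python
import numpy as np

def prox_trace_norm(W, threshold):
    """Singular value thresholding: proximal operator of threshold * ||W||_*."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)   # soft-threshold the singular values
    return (U * s_shrunk) @ Vt                  # rebuild the matrix with smaller rank

# Singular values 3 and 1 with threshold 2 become 1 and 0, so the rank drops to 1.
shrunk = prox_trace_norm(np.diag([3.0, 1.0]), threshold=2.0)
```

Shrinking the singular values is the matrix analogue of the soft-thresholding used for the Lasso, which is why the trace norm tends to produce low-rank solutions.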
Low-Rank Structure for MTL
• Evaluation on the School data¹:
– Predict exam scores for 15362 students from 139 schools
– Describe each student by 27 attributes
– Compare Ridge Regression, Lasso, and Trace Norm (for inducing a low-rank structure)
• Performance measure: N-MSE = mean squared error / variance (target)
[Figure: N-MSE (y-axis, 0.7 to 1.05) against the index of training ratio (x-axis, 1 to 8) for Ridge Regression, Lasso, and Trace Norm.]
The Low-Rank Structure (induced via Trace Norm) leads to the smallest N-MSE.
¹ http://ttic.uchicago.edu/~argyriou/code/
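The performance measure can be written in a few lines. This sketch assumes the variance is taken over the same targets that are being predicted, which the slide does not spell out; `nmse` is a name chosen here.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

scores = np.array([1.0, 2.0, 3.0, 4.0])
baseline = nmse(scores, np.full(4, scores.mean()))   # always predict the mean
perfect = nmse(scores, scores)                       # exact predictions
```

Under this definition a predictor that always outputs the mean target scores exactly 1, so values below 1 indicate that a model explains some of the variance.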
Low-Rank Structure for MTL
• Evaluation on the Face data¹:
– Trace Norm (low-rank structure)
[Figure: plot of the 1st weight vector, showing the rough shape of the faces.]
A shared Low-Rank Structure for MTL
• Learning from the i-th task (Ando et al. '05, Chen et al. '09)
[Figure: for the i-th task, training data × weight vector ≈ target.]
$$u_i = \Theta v_i + w_i, \qquad \Theta^T \Theta = I_p$$
where $u_i$ is the weight vector, $\Theta v_i$ is the part shared among tasks, and $w_i$ is specific for each task.
A shared Low-Rank Structure for MTL
• Learning from multiple tasks
$$[u_1, u_2, \dots, u_m] = \Theta\,[v_1, v_2, \dots, v_m] + [w_1, w_2, \dots, w_m]$$
where $\Theta$ is the transformation matrix, $\Theta\,[v_1, v_2, \dots, v_m]$ is a shared low rank structure, and $[w_1, w_2, \dots, w_m]$ is a task-specific structure.
Empirical Loss
• Learning from the i-th task
[Figure: $X_i \times u_i \approx y_i$, with $u_i = \Theta v_i + w_i$.]
• Empirical loss on the i-th task, for example,
$$\mathcal{L}_i\big(X_i(\Theta v_i + w_i), y_i\big) = \|X_i(\Theta v_i + w_i) - y_i\|^2$$
iASO Formulation
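• iASO formulation
$$\min_{\Theta, \{v_i, w_i\}} \sum_{i=1}^{m} \left( \mathcal{L}_i\big(X_i(\Theta v_i + w_i), y_i\big) + \alpha \|\Theta v_i + w_i\|^2 + \beta \|w_i\|^2 \right) \quad \text{subject to } \Theta^T \Theta = I$$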
• control both model complexity and task relatedness
• subsume ASO (Ando et al.’05) and SVM as special cases
• naturally lead to a convex relaxation (Chen et al., 10)
• iASO and cASO are equivalent under certain conditions
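To make the roles of the terms concrete, the sketch below only evaluates an iASO-style objective for given Θ, V, and W, using the squared loss from the previous slide. It is not an optimizer, and the function name, shapes, and random data are assumptions made for illustration.

```python
import numpy as np

def iaso_objective(Xs, ys, Theta, V, W, alpha, beta):
    """Sum over tasks of ||X_i u_i - y_i||^2 + alpha * ||u_i||^2 + beta * ||w_i||^2,
    where u_i = Theta v_i + w_i and Theta has orthonormal columns."""
    p = Theta.shape[1]
    assert np.allclose(Theta.T @ Theta, np.eye(p)), "Theta^T Theta must equal I"
    U = Theta @ V + W                       # column i is the weight vector u_i
    total = 0.0
    for i, (X, y) in enumerate(zip(Xs, ys)):
        u, w = U[:, i], W[:, i]
        total += np.sum((X @ u - y) ** 2) + alpha * (u @ u) + beta * (w @ w)
    return total

# Illustrative shapes: d features, p shared dimensions, m tasks, n samples per task.
rng = np.random.default_rng(0)
d, p, m, n = 5, 2, 3, 8
Xs = [rng.normal(size=(n, d)) for _ in range(m)]
ys = [rng.normal(size=n) for _ in range(m)]
Theta, _ = np.linalg.qr(rng.normal(size=(d, p)))   # orthonormal columns
V = rng.normal(size=(p, m))
W = rng.normal(size=(d, m))
value = iaso_objective(Xs, ys, Theta, V, W, alpha=0.1, beta=1.0)
```

In this form, `alpha` penalizes the size of the full weight vector (model complexity) while `beta` penalizes the task-specific part, so a large `beta` pushes each model into the shared subspace spanned by Θ.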
Multi-Task Learning with Clustered Structure
• Most MTL techniques assume all tasks are related
• Not true in many applications
• Clustered multi-task learning assumes the tasks have group structures: the models of tasks from the same group are closer to each other than those from a different group
Assumption: Tasks have group structures
e.g. tasks in the yellow group are predictions of heart related diseases and tasks in the blue group are predictions of brain related diseases.
Task Clustering in Neural Network
• Bakker and Heskes JMLR 2003
Clustered Multi-Task Learning
• Use regularization to capture clustered structures.
[Figure: training data X ≈ clustered models, with the task models grouped into Cluster 1, Cluster 2, ..., Cluster k-1, Cluster k.]
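Clustered Multi-Task Learning
• Capture structures by minimizing sum-of-square error (SSE) in K-means clustering:
$$\min_{I} \sum_{j=1}^{k} \sum_{v \in I_j} \|w_v - \bar{w}_j\|_2^2$$
where $I_j$ is the index set of the j-th cluster and $\bar{w}_j$ is the mean of the models in that cluster.
• Equivalently, in matrix form:
$$\min_{F} \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F)$$
where $F$ is an m×k orthogonal cluster indicator matrix with $F_{i,j} = 1/\sqrt{n_j}$ if $i \in I_j$ and 0 otherwise.
[Figure: the models of the m tasks grouped into Cluster 1, Cluster 2, ..., Cluster k-1, Cluster k.]
cluster number k < task number m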
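The agreement between the K-means SSE and its trace form can be checked numerically. The sketch assumes W stores one task model per column, uses a made-up cluster assignment, and takes the entries of F to be one over the square root of the cluster size so that F has orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 4, 6, 2                        # features, tasks, clusters (k < m)
W = rng.normal(size=(d, m))              # column v is the model w_v of task v
labels = np.array([0, 0, 0, 1, 1, 1])    # hypothetical cluster assignment

sse = 0.0
F = np.zeros((m, k))
for j in range(k):
    members = np.flatnonzero(labels == j)               # index set I_j
    centroid = W[:, members].mean(axis=1, keepdims=True)
    sse += np.sum((W[:, members] - centroid) ** 2)      # within-cluster SSE
    F[members, j] = 1.0 / np.sqrt(len(members))         # F_ij = 1 / sqrt(n_j)

trace_form = np.trace(W.T @ W) - np.trace(F.T @ W.T @ W @ F)
```

Here `sse` and `trace_form` agree up to rounding, and `F.T @ F` is the k×k identity, which is the orthogonality property the relaxation relies on.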
Clustered Multi-Task Learning
• Directly minimizing SSE is hard because of the non-linear constraint on F:
$$\min_{F} \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F)$$
where $F$ is an m×k orthogonal cluster indicator matrix with $F_{i,j} = 1/\sqrt{n_j}$ if $i \in I_j$ and 0 otherwise.
• Relaxation (Zha et al., NIPS 2001): keep only the orthogonality constraint on $F$:
$$\min_{F:\, F^T F = I_k} \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F)$$
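Clustered Multi-Task Learning
• Clustered multi-task learning (CMTL) formulation [Zhou et al., NIPS 2011]
$$\min_{W, F:\, F^T F = I_k} \text{Loss}(W) + \alpha \left( \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F) \right) + \beta \operatorname{tr}(W^T W)$$
– The $\alpha$ term captures cluster structures
– The $\beta$ term improves generalization performance
[Figure: task models grouped into Cluster 1, Cluster 2, ..., Cluster k-1, Cluster k.]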
Convex Clustered Multi-Task Learning
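• CMTL formulation:
$$\min_{W, F:\, F^T F = I_k} \text{Loss}(W) + \alpha \left( \operatorname{tr}(W^T W) - \operatorname{tr}(F^T W^T W F) \right) + \beta \operatorname{tr}(W^T W)$$
• Equivalent form:
$$\min_{W, F:\, F^T F = I_k} \text{Loss}(W) + \alpha \eta (1 + \eta) \operatorname{tr}\left( W (\eta I + F F^T)^{-1} W^T \right)$$
• Convex relaxation:
$$\min_{W, M} \text{Loss}(W) + \alpha \eta (1 + \eta) \operatorname{tr}\left( W (\eta I + M)^{-1} W^T \right) \quad \text{subject to: } \operatorname{tr}(M) = k,\ M \preceq I,\ M \in S_+^m$$
(Chen et al., KDD 2009; Jacob et al., NIPS 2009; Zhou et al., NIPS 2010)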
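The step from the first penalty to the second can be checked numerically. The sketch assumes η = β/α, which is not stated on the slide but follows from expanding the inverse when F has orthonormal columns; all sizes and values are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, k = 5, 6, 2
alpha, beta = 2.0, 0.5
eta = beta / alpha                             # assumed relation between the parameters

W = rng.normal(size=(d, m))                    # one task model per column
F, _ = np.linalg.qr(rng.normal(size=(m, k)))   # any F with F^T F = I_k

# Penalty of the CMTL formulation.
penalty_cmtl = (alpha * (np.trace(W.T @ W) - np.trace(F.T @ W.T @ W @ F))
                + beta * np.trace(W.T @ W))

# Same penalty written with the inverse, using M = F F^T.
M = F @ F.T
penalty_inverse_form = alpha * eta * (1 + eta) * np.trace(
    W @ np.linalg.inv(eta * np.eye(m) + M) @ W.T)
```

The two values agree up to rounding. The matrix `M` built this way also satisfies tr(M) = k with eigenvalues in [0, 1], so it is feasible for the relaxed problem, which simply allows any such positive semidefinite matrix.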
Convex Clustered Multi-Task Learning
• Synthetic Study [Zhou et al., NIPS 2011]
[Figure: learned model matrices in five panels: Ground Truth, Single Task Learning, Mean Regularized MTL, Trace Norm Regularized MTL, and Convex Relaxed CMTL. Annotations: noise introduced by relaxations; low rank can also capture cluster structure well.]
Multi-Task Learning with Tree Structures
• In some scenarios, the tasks may be equipped with tree structures:
– Tasks belonging to the same node are similar to each other
– The similarity between two nodes is structured and relates to the depth of the 'common' tree node
Assumption: Tasks have tree structures
[Figure: a tree over the task models a, b, c. Task a is more similar to b than to c.]
![Page 51: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/51.jpg)
Multi-Task Learning with Tree Structures
• Tree-Guided Group Lasso (Kim and Xing 2010 ICML)
[Figure: coefficient matrix with input features as rows and outputs (tasks) β1, β2, β3 as columns, next to the task tree]
Structure:
– Gv5 = {β1, β2, β3}
– Gv4 = {β1, β2}
– Gv1 = {β1}, Gv2 = {β2}, Gv3 = {β3}
$\min_{\beta} \mathrm{Loss}(\beta) + \lambda \sum_{j} \sum_{v \in V} w_v \|\beta^{j}_{G_v}\|_2$
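The tree-guided penalty above can be evaluated with a few lines of NumPy. This is only a minimal sketch: the function name, the dictionary layout for the tree, and the unit node weights are illustrative assumptions, not part of the slides or of any package.

```python
import numpy as np

def tree_group_lasso_penalty(B, groups, weights):
    """Sum over features j and tree nodes v of w_v * ||B[j, G_v]||_2.

    B: (n_features, n_tasks) coefficient matrix.
    groups: dict mapping a node name to the task (column) indices in G_v.
    weights: dict mapping a node name to its nonnegative weight w_v.
    """
    total = 0.0
    for v, cols in groups.items():
        # l2 norm of each feature row, restricted to the tasks under node v
        row_norms = np.linalg.norm(B[:, cols], axis=1)
        total += weights[v] * row_norms.sum()
    return total

# Tree from the slide: leaves v1..v3, internal node v4, root v5 (0-based task indices).
groups = {"v1": [0], "v2": [1], "v3": [2], "v4": [0, 1], "v5": [0, 1, 2]}
weights = {v: 1.0 for v in groups}  # hypothetical unit weights

B = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0]])
penalty = tree_group_lasso_penalty(B, groups, weights)
```

The second feature row is entirely zero, so it contributes nothing at any node; this is the behaviour that lets the penalty drop a feature jointly for all tasks under a node.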
![Page 52: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/52.jpg)
Multi-Task Learning with Graph Structures
• In real applications, the tasks involved in MTL may have a graph structure
– Two tasks are related if they are connected in the graph, i.e. connected tasks are similar
– The similarity of two related tasks can be represented by the weight of the connecting edge.
Assumption: Tasks have graph/network structures
![Page 53: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/53.jpg)
Multi-Task Learning with Graph Structures
• Graph-guided Fused Lasso (Chen et al. UAI 2011)
[Figure: two panels, Lasso and Graph-Guided Fused Lasso, each mapping an input SNP sequence (ACGTTTTACTGTACAATTTAC) to output phenotypes]
$\min_{W} \mathrm{Loss}(W) + \lambda\|W\|_1 + \Omega(W)$
Graph-guided Fusion Penalty:
$\Omega(W) = \gamma \sum_{e=(m,l)\in E} \tau(r_{ml}) \sum_{j=1}^{J} \left|w_{jm} - \mathrm{sign}(r_{ml})\,w_{jl}\right|$
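As a rough illustration of the fusion penalty above, the following sketch evaluates Ω(W) for a small coefficient matrix. The edge-list layout and the choice τ(r) = |r| are assumptions made for the example; the slide leaves τ unspecified.

```python
import numpy as np

def graph_fusion_penalty(W, edges, gamma=1.0, tau=abs):
    """gamma * sum over edges (m, l) of tau(r_ml) * sum_j |w_jm - sign(r_ml) * w_jl|.

    W: (J, K) matrix with J features (rows) and K tasks (columns).
    edges: list of (m, l, r_ml) with task indices m, l and correlation r_ml.
    tau: edge-weight function; tau(r) = |r| is an assumed choice.
    """
    total = 0.0
    for m, l, r in edges:
        # Positively correlated tasks are pulled together,
        # negatively correlated tasks are pulled towards opposite signs.
        diff = W[:, m] - np.sign(r) * W[:, l]
        total += tau(r) * np.abs(diff).sum()
    return gamma * total

W = np.array([[1.0, 1.0, -1.0],
              [0.0, 2.0, -2.0]])
edges = [(0, 1, 0.5), (1, 2, -0.8)]  # hypothetical task graph
omega = graph_fusion_penalty(W, edges, gamma=2.0)
```

Tasks 1 and 2 have exactly opposite coefficients and a negative correlation, so that edge contributes zero; only the first edge is penalized.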
![Page 54: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/54.jpg)
Quantitative Trait Network
• Edge: the two linked traits are highly correlated.
• Edge thickness: strength of the correlation.
• Goal: identify SNPs that are associated with a subnetwork of clinical traits (Kim and Xing 2009).
![Page 55: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/55.jpg)
Graph-Weighted Fused Lasso
• Lasso: each phenotype, represented as a circle, is independently mapped to SNPs for association.
• Graph-constrained fused Lasso: consider a QTN to search for an association between a SNP and a subnetwork of traits.
• Graph-weighted fused Lasso: consider a QTN with edge weights.
![Page 56: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/56.jpg)
Robust Multi-Task Learning
o Most Existing MTL Approaches
o Robust MTL Approaches
[Figure labels: relevant tasks; irrelevant tasks; equally weighted]
![Page 57: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/57.jpg)
Incoherent Low-Rank and Sparse Structures
o Learning from the i-th task: training data × weight vector ≈ target
o The weight vectors are the sum of two structures:
– Low-rank structure: capture task relatedness
– Entry-wise sparse structure: select discriminative features for each task
![Page 58: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/58.jpg)
Incoherent Low-Rank and Sparse Structures
o Low-rank structure: $P = [p_1, \dots, p_i, \dots, p_m]$, with trace norm $\|P\|_*$ (sum of singular values) constrained by $\|P\|_* \le \eta$
o Entry-wise sparse structure: $Q = [q_1, \dots, q_i, \dots, q_m]$, with $\|Q\|_1$ (sum of the absolute values of all entries) penalized by $\lambda\|Q\|_1$
![Page 59: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/59.jpg)
ISLR Formulation
o Empirical loss on the i-th task, e.g.,
$\mathcal{L}_i(X_i(p_i + q_i), y_i) = \|X_i(p_i + q_i) - y_i\|^2$
o Incoherent Sparse Low-Rank (ISLR) formulation
$\min_{P,Q} \sum_{i=1}^{m} \mathcal{L}_i(X_i(p_i + q_i), y_i) + \lambda\|Q\|_1 \quad \text{subject to } \|P\|_* \le \eta$
• Convex formulation
• Decomposed sparse and low-rank structures
![Page 60: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/60.jpg)
Low-Rank and Group Sparsity in MTL
o Learning from the i-th task: training data × weight vector ≈ target
o The weight vectors are the sum of two structures:
– Low-rank structure: capture task relatedness
– Group sparse structure: identify irrelevant tasks via non-zero vectors
![Page 61: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/61.jpg)
Low-Rank and Group Sparsity in MTL
o Low-rank structure: $L = [l_1, \dots, l_i, \dots, l_m]$, with trace norm $\|L\|_*$ (sum of singular values in $L$)
o Group sparse structure: $S = [s_1, \dots, s_i, \dots, s_m]$, with $\|S\|_{1,2} = \|s_1\|_2 + \dots + \|s_i\|_2 + \dots + \|s_m\|_2$
![Page 62: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/62.jpg)
Robust MTL Formulation
o Empirical loss on the i-th task, e.g.,
$\mathcal{L}_i(X_i(l_i + s_i), y_i) = \|X_i(l_i + s_i) - y_i\|^2$
o Robust MTL Formulation
$\min_{L,S} \sum_{i=1}^{m} \mathcal{L}_i(X_i(l_i + s_i), y_i) + \alpha\|L\|_* + \beta\|S\|_{1,2}$
• Capture task relatedness via a low-rank structure
• Identify irrelevant tasks via a group-sparse structure
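Both penalties in the formulation above are non-smooth, but each has a closed-form proximal operator. A minimal sketch follows, assuming the columns of S correspond to tasks as in the preceding slide; the function names are illustrative and are not MALSAR's API.

```python
import numpy as np

def prox_trace_norm(V, t):
    """argmin_L 0.5*||L - V||_F^2 + t*||L||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def prox_column_group(V, t):
    """argmin_S 0.5*||S - V||_F^2 + t*sum_i ||s_i||_2: shrink each column (task)."""
    norms = np.linalg.norm(V, axis=0)
    # Columns with norm below t are set to zero; the rest are scaled down.
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return V * scale

V = np.array([[3.0, 0.1],
              [4.0, 0.0]])
S = prox_column_group(V, t=1.0)                  # second column is zeroed out
L = prox_trace_norm(np.diag([3.0, 0.5]), t=1.0)  # rank drops from 2 to 1
```

A column of S that survives the shrinkage marks a task whose weights are not explained by the shared low-rank part, which is how the group-sparse term flags irrelevant tasks.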
![Page 63: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/63.jpg)
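Performance Bound
o Assumption on the existence of $\kappa_1(s)$ and $\kappa_2(q)$
• Training data
• Geometric structure of the coefficient matrices
o Performance Bound
$\frac{1}{T}\sum_{i=1}^{m} \left\|X_i^T(\hat{l}_i + \hat{s}_i) - f_i\right\|_2^2 \le (1+\varepsilon) \inf_{l_i, s_i} \frac{1}{T}\sum_{i=1}^{m} \left\|X_i^T(l_i + s_i) - f_i\right\|_2^2 + \Phi(\varepsilon)\left(\frac{\alpha^2}{\kappa_1^2(2r)} + \frac{\beta^2}{\kappa_2^2(c)}\right)$
with probability at least $1 - m\exp\left(-\frac{1}{2}\left(t - d\log\left(1+\frac{t}{d}\right)\right)\right)$.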
![Page 64: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/64.jpg)
Robust Multi-Task Feature Learning
• Simultaneously captures a common set of features among relevant tasks and identifies outlier tasks:
![Page 65: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/65.jpg)
Robust Multi-Task Feature Learning
• Formulation:
• Algorithm:
– Accelerated Gradient Method
– Proximal Operator problems:
![Page 66: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/66.jpg)
Robust Multi-Task Feature Learning
• Theoretical Guarantees
– With probability of at least
– With probability of
![Page 67: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/67.jpg)
Optimization Algorithms
Objective: $\min_x f(x) = \mathrm{loss}(x) + \lambda \times \Omega(x)$
• Loss Function $\mathrm{loss}(x)$
– Least Squares Loss
– Logistic Loss
• Convex and Smooth Penalty $\Omega(x)$
– Regularized MTL
• Convex but Non-Smooth Penalty $\Omega(x)$
– ℓ2,1-Norm
– Dirty MTL
– Trace Norm
• Non-Convex Penalty $\Omega(x)$
– Convex Relaxation
– CMTL
– ASO
![Page 68: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/68.jpg)
Optimization Algorithms
Objective: $\min_x f(x) = \mathrm{loss}(x) + \lambda \times \Omega(x)$
• Gradient Descent (GD)
• Accelerated Gradient Method (AGM)
– Solving Proximal Operator
![Page 69: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/69.jpg)
Gradient Descent
• Gradient descent is an algorithm for solving smooth optimization problems $\min_x f(x)$:
– Repeat $x_{i+1} = x_i - \gamma_i f'(x_i)$ until the convergence criterion is met.
– If $f(x)$ is convex and continuously differentiable with an $L$-Lipschitz continuous gradient, then any step size $\gamma_i \le 1/L$ gives a convergence rate of $O(1/N)$ after $N$ iterations.
• Most optimization problems in MTL are non-smooth.
• Can we apply gradient descent to non-smooth problems?
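For the smooth case, the update rule and the step-size condition can be sketched on a small least-squares problem. The data and the stopping tolerance below are made up for illustration; the step size 1/L uses the largest eigenvalue of AᵀA as the Lipschitz constant of the gradient.

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iter=500, tol=1e-10):
    """Repeat x <- x - step * grad(x) until the update is smaller than tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x_new = x - step * grad(x)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Smooth objective f(x) = 0.5 * ||A x - b||^2 with gradient A^T (A x - b).
A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([2.0, 1.0, 2.0])
grad = lambda x: A.T @ (A @ x - b)

L = np.linalg.eigvalsh(A.T @ A).max()        # Lipschitz constant of the gradient
x_gd = gradient_descent(grad, np.zeros(2), step=1.0 / L)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]  # reference least-squares solution
```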
![Page 70: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/70.jpg)
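Gradient Descent
Smooth Objective: $\min_x f(x)$
Gradient descent update:
Repeat
$x_{i+1} = x_i - \gamma_i f'(x_i)$
until convergence
Equivalent model-minimization form:
Repeat
$x_{i+1} = \arg\min_x M(x_i, \gamma_i)$
until convergence
Model:
$M(x_i, \gamma_i) = f(x_i) + \langle f'(x_i), x - x_i \rangle + \frac{1}{2\gamma_i}\|x - x_i\|_2^2$
– $f(x_i) + \langle f'(x_i), x - x_i \rangle$: 1st order Taylor expansion
– $\frac{1}{2\gamma_i}\|x - x_i\|_2^2$: regularization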
![Page 71: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/71.jpg)
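Gradient Descent
• Gradient descent with a composite model can be used to solve non-smooth optimization problems.
• Convergence Rate $O(1/N)$
Objective: $\min_x f(x) = \mathrm{loss}(x) + \lambda \times \Omega(x)$
Composite Model:
$M(x_i, \gamma_i) = \mathrm{loss}(x_i) + \langle \mathrm{loss}'(x_i), x - x_i \rangle + \frac{1}{2\gamma_i}\|x - x_i\|_2^2 + \lambda \times \Omega(x)$
– $\mathrm{loss}(x_i) + \langle \mathrm{loss}'(x_i), x - x_i \rangle$: 1st order Taylor expansion of the smooth loss
– $\frac{1}{2\gamma_i}\|x - x_i\|_2^2$: regularization
– $\lambda \times \Omega(x)$: non-smooth penalty, kept as is
Repeat
$x_{i+1} = \arg\min_x M(x_i, \gamma_i)$
until convergence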
![Page 72: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/72.jpg)
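Gradient Descent
Composite Model:
$M(x_i, \gamma_i) = \mathrm{loss}(x_i) + \langle \mathrm{loss}'(x_i), x - x_i \rangle + \frac{1}{2\gamma_i}\|x - x_i\|_2^2 + \lambda \times \Omega(x)$
Repeat
$x_{i+1} = \arg\min_x M(x_i, \gamma_i)$
until convergence
Dropping the terms that do not depend on $x$ and multiplying by $\gamma_i$, each step reduces to the Proximal Operator (Moreau, 1965):
$x_{i+1} = \arg\min_x \frac{1}{2}\|x - v\|_2^2 + \rho \times \Omega(x)$
$v = x_i - \gamma_i\,\mathrm{loss}'(x_i)$
$\rho = \gamma_i \lambda$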
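As one concrete instance of the proximal step, take Ω(x) = ‖x‖₁, whose proximal operator is element-wise soft-thresholding. The sketch below assumes a least-squares loss and a fixed step γ = 1/L; both choices and the function names are for illustration only.

```python
import numpy as np

def soft_threshold(v, rho):
    """Proximal operator of rho * ||x||_1: shrink every entry towards zero by rho."""
    return np.sign(v) * np.maximum(np.abs(v) - rho, 0.0)

def proximal_gradient(A, b, lam, n_iter=1000):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with a fixed step 1/L."""
    gamma = 1.0 / np.linalg.eigvalsh(A.T @ A).max()
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = x - gamma * (A.T @ (A @ x - b))   # gradient step on the smooth loss
        x = soft_threshold(v, gamma * lam)    # proximal step with rho = gamma * lam
    return x

A = np.eye(3)
b = np.array([3.0, 0.5, -2.0])
x_hat = proximal_gradient(A, b, lam=1.0)
```

With an identity design matrix the solution is simply `soft_threshold(b, 1.0)`, which makes the effect of the penalty easy to read off: the small entry is set exactly to zero.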
![Page 73: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/73.jpg)
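Accelerated Gradient Method (AGM)
• A faster extension of gradient descent (Nesterov, 1983; Nemirovski, 1994; Nesterov, 2004)
Gradient descent:
Repeat
$x_{i+1} = x_i - \gamma_i f'(x_i)$
until convergence
Accelerated gradient method (the gradient step is taken from the search point $s_i$):
Repeat
$s_i = x_i + \alpha_i (x_i - x_{i-1})$
$x_{i+1} = s_i - \gamma_i f'(s_i)$
until convergence
[Figure: gradient descent moves from $x_i$ to $x_{i+1}$; AGM moves from $x_i$ via the search point $s_i$ to $x_{i+1}$]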
![Page 74: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/74.jpg)
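Accelerated Gradient Method (AGM)
Composite Model:
$M(x_i, \gamma_i) = \mathrm{loss}(x_i) + \langle \mathrm{loss}'(x_i), x - x_i \rangle + \frac{1}{2\gamma_i}\|x - x_i\|_2^2 + \lambda \times \Omega(x)$
Gradient descent with the composite model:
Repeat
$x_{i+1} = \arg\min_x M(x_i, \gamma_i)$
until convergence
AGM with the composite model (the model is built at the search point $s_i$):
Repeat
$s_i = x_i + \alpha_i (x_i - x_{i-1})$
$x_{i+1} = \arg\min_x M(s_i, \gamma_i)$
until convergence
[Figure: gradient descent moves from $x_i$ to $x_{i+1}$; AGM moves from $x_i$ via the search point $s_i$ to $x_{i+1}$]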
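The accelerated scheme can be sketched for the same ℓ1-penalized least-squares setting. The slides do not say how α_i is chosen; the sketch assumes the common FISTA-style sequence α_i = (t_{i-1} − 1)/t_i, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(v, rho):
    """Proximal operator of rho * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - rho, 0.0)

def accelerated_proximal_gradient(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 using a search point s_i."""
    gamma = 1.0 / np.linalg.eigvalsh(A.T @ A).max()
    x_prev = x = np.zeros(A.shape[1])
    t_prev = t = 1.0
    for _ in range(n_iter):
        alpha = (t_prev - 1.0) / t             # assumed momentum schedule
        s = x + alpha * (x - x_prev)           # search point s_i
        v = s - gamma * (A.T @ (A @ s - b))    # gradient step taken from s_i
        x_prev, x = x, soft_threshold(v, gamma * lam)
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return x

A = np.diag([2.0, 1.0])
b = np.array([4.0, 0.5])
x_acc = accelerated_proximal_gradient(A, b, lam=1.0)
```

The only difference from plain proximal gradient descent is that the model is built at `s` rather than at `x`; the proximal operator itself is unchanged.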
![Page 75: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/75.jpg)
Optimization with Non-Convex Objectives
• In multi-task learning, the optimization objectives involved may be non-convex (e.g. clustered multi-task learning).
• Directly applying convex optimization techniques may yield suboptimal solutions.
• Convex Relaxation
– General non-convex problem: find the convex envelope
  • Rank minimization → Trace-norm minimization
– Difference of convex (DC) problem: Convex-Concave Procedure (CCCP) [Yuille and Rangarajan NIPS 2001]
  • ℓ1/ℓ0.5-regularization → Reweighted group Lasso
![Page 76: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/76.jpg)
Difference of Convex (DC) Programming
• The objective can be written in the form:
– $\min_x f(x) - g(x)$
– $f(x)$ and $g(x)$ are convex functions.
• We linearize $g(x)$ using the 1st order Taylor expansion at $x'$; because $g$ is convex, this gives an upper bound on the objective:
– $f(x) - g(x) \le f(x) - g(x') - \langle \nabla g(x'), x - x' \rangle$
• In every iteration of CCCP, we minimize the upper bound:
– $x_{k+1} = \arg\min_x f(x) - \langle \nabla g(x_k), x \rangle$
• The objective function is guaranteed not to increase from one iteration to the next
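A toy one-dimensional example may make the CCCP iteration concrete. The functions f(x) = x⁴ and g(x) = 2x² are chosen here only because the inner minimization has a closed form; they do not come from the slides.

```python
import numpy as np

f = lambda x: x ** 4           # convex part
g = lambda x: 2.0 * x ** 2     # convex part that is subtracted
grad_g = lambda x: 4.0 * x

def cccp(x0, n_iter=30):
    """Minimize f(x) - g(x) by repeatedly linearizing g at the current iterate."""
    x, history = float(x0), []
    for _ in range(n_iter):
        history.append(f(x) - g(x))
        # argmin_x x^4 - grad_g(x_k) * x is the solution of 4 x^3 = grad_g(x_k)
        x = float(np.cbrt(grad_g(x) / 4.0))
    history.append(f(x) - g(x))
    return x, history

x_star, history = cccp(0.5)
```

Starting from 0.5, the iterates approach the local minimizer x = 1 of x⁴ − 2x², and the recorded objective values never increase, in line with the descent guarantee.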
![Page 77: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/77.jpg)
Case Study: Disease Progression
• Alzheimer’s Disease (AD) is
– the most common type of dementia;
– a severe neurodegenerative disorder;
– definitively diagnosed only through brain biopsy or autopsy;
– clinically diagnosed using clinical/cognitive measures, including MMSE and ADAS-Cog.
![Page 78: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/78.jpg)
Modeling Disease Progression
• The prediction of cognitive scores at each time point can be modeled as a regression task.
• Motivation for using multi-task learning: the ability to explore inherent relationships among related tasks and to enforce such knowledge through proper regularization.
![Page 79: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/79.jpg)
Temporal Smoothness
• Prior knowledge: a patient's cognitive scores should change only slightly between consecutive time points; the scores should not fluctuate.
• To enforce this prior knowledge, we use a regularization term that penalizes the difference.
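The slide's own formula did not survive in this transcript. One common way to write such a term, assumed here purely for illustration, is the sum of squared differences between the weight vectors of consecutive time points, which equals ‖WH‖_F² for a first-difference matrix H.

```python
import numpy as np

def temporal_smoothness_penalty(W):
    """Sum of squared differences between weight vectors at consecutive time points.

    W: (n_features, n_time_points) matrix, one column per time point (task).
    """
    diffs = W[:, 1:] - W[:, :-1]
    return float(np.sum(diffs ** 2))

def difference_matrix(T):
    """(T, T-1) matrix H whose column t encodes w_t - w_{t+1}."""
    H = np.zeros((T, T - 1))
    idx = np.arange(T - 1)
    H[idx, idx] = 1.0
    H[idx + 1, idx] = -1.0
    return H

W = np.array([[1.0, 2.0, 4.0],
              [0.0, 0.0, 1.0]])
penalty = temporal_smoothness_penalty(W)
penalty_matrix_form = float(np.linalg.norm(W @ difference_matrix(3), "fro") ** 2)
```

Both expressions give the same value; the matrix form is convenient because the penalty stays smooth and quadratic in W.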
![Page 80: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/80.jpg)
Temporal Group Lasso
• Assumption: only a small subset of features is related to disease progression, and this subset is shared among tasks.
• Achieve this using group sparsity:
![Page 81: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/81.jpg)
Fused Sparse Group Lasso
• Goal: find temporal patterns of the biomarkers in the disease progression.
• Simultaneous feature selection via Sparse Group Lasso:
– a common set of biomarkers for multiple time points
– specific sets of biomarkers for different time points
• Incorporate the temporal smoothness via Fused Lasso.
![Page 82: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/82.jpg)
Fused Sparse Group Lasso
• The convex formulation:
• Non-convex formulations:
– Reduce shrinkage bias
– Closer to the optimal ℓ0-norm
– Fewer tuning parameters
![Page 83: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/83.jpg)
Performance
• MTL outperforms STL
• Fused sparse group Lasso formulations achieve better performance than temporal group Lasso
![Page 84: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/84.jpg)
Performance
![Page 85: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/85.jpg)
Longitudinal Stability Selection on ADAS-Cog
• Using FSGL
• From the distribution of stability scores, we can observe temporal patterns of MRI biomarkers.
![Page 86: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/86.jpg)
Longitudinal Stability Selection on MMSE
• From the distribution of stability scores, we can observe temporal patterns of MRI biomarkers.
![Page 87: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/87.jpg)
Case Study: Missing Data in Multi-Source Learning
• In many applications, multiple data sources may suffer from a considerable amount of missing data.
• In ADNI, over half of the subjects lack CSF measurements; an independent half of the subjects do not have FDG-PET.
[Figure: subject-by-feature data matrix. Rows: Subject1 … Subject245. Columns: PET (P1 … P116), MRI (M1 … M305), CSF (C1 … C5)]
![Page 88: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/88.jpg)
Challenges
• Simply removing samples with missing values dramatically reduces the number of samples in the analysis.
• In addition, the resources and time devoted to the subjects with incomplete data are wasted.
• Estimating an entire chunk of missing values is very challenging.
![Page 89: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/89.jpg)
Incomplete Multi-Source Feature Learning (iMSF)
• A “row-wise” strategy
– We first partition the samples into multiple blocks, one for each combination of available data sources
– We then build a separate model for each block of data
– Using multi-task techniques, all models involving a specific source are constrained to select a common set of features for that particular source
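The first step of the row-wise strategy, partitioning samples by the combination of sources they have, might look like the sketch below. The availability flags and the function name are hypothetical and only mirror the MRI/PET/CSF setting of the case study.

```python
import numpy as np
from collections import defaultdict

def partition_by_available_sources(availability):
    """Group sample indices by which data sources are present.

    availability: dict mapping a source name to a boolean array
                  (True where the sample has that source).
    Returns a dict mapping a tuple of available source names to sample indices.
    """
    sources = sorted(availability)
    n_samples = len(next(iter(availability.values())))
    blocks = defaultdict(list)
    for i in range(n_samples):
        key = tuple(s for s in sources if availability[s][i])
        blocks[key].append(i)
    return dict(blocks)

# Hypothetical availability pattern for six subjects.
availability = {
    "MRI": np.array([True, True, True, True, True, True]),
    "PET": np.array([True, True, False, False, True, False]),
    "CSF": np.array([True, False, True, False, True, True]),
}
blocks = partition_by_available_sources(availability)
```

Each resulting block becomes one task with its own model; the multi-task coupling then ties together, per source, the feature selection of every model that uses that source.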
![Page 90: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/90.jpg)
Overview of iMSF
[Figure: samples partitioned by available data sources (PET, MRI, CSF) into Task I, Task II, Task III, and Task IV, with one model per task (Model I to Model IV)]
![Page 91: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/91.jpg)
iMSF: the Formulation
• Suppose the data set is divided into $m$ tasks: $T_i = \{x_j^i, y_j^i\}$, $i = 1 \dots m$, $j = 1 \dots N_i$, where $N_i$ is the number of subjects in the $i$-th task
• Denote $\beta^i$ as the weight vector for the $i$-th task
• $\beta_{I(s,k)}$ denotes all the model parameters corresponding to the $k$-th feature in the $s$-th data source
• We solve:
$\min_{\beta} \frac{1}{m}\sum_{i=1}^{m}\frac{1}{N_i}\sum_{j=1}^{N_i} L(x_j^i, y_j^i, \beta^i) + \lambda\sum_{s=1}^{S}\sum_{k=1}^{p_s}\|\beta_{I(s,k)}\|_2$
![Page 92: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/92.jpg)
iMSF: Performance
[Figure: three bar charts (accuracy, sensitivity, specificity) comparing iMSF, Zero, EM, KNN, and SVD at 50.0%, 66.7%, and 75.0%]
![Page 93: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/93.jpg)
Case Study: Drosophila Gene Expression Image Analysis
[Megason and Fraser (2007) Cell]
![Page 94: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/94.jpg)
Life cycle of fruit fly Drosophila melanogaster
[Wolpert et al. (2006) Principles of Development]
Computer Science & Engineering
“We are much more like flies in our development than you might think.” L. Wolpert
embryogenesis
![Page 95: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/95.jpg)
Drosophila gene expression pattern images
• Berkeley Drosophila Genome Project (BDGP) http://www.fruitfly.org/
– Stage 1-3, Stage 4-6, Stage 7-8, Stage 9-10, Stage 11-12, Stage 13-
• Fly-FISH http://fly-fish.ccbr.utoronto.ca/
– Stage 1-3, Stage 4-5, Stage 6-7, Stage 8-9, Stage 10-
[Figure: expression images across stages for the genes bgm, en, hb, runt]
[Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell]
![Page 96: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/96.jpg)
Comparative image analysis
Genes: Twist, heartless, stumps
stage 4-6:
anterior endoderm AISN trunk mesoderm AISN subset cellular blastoderm mesoderm AISN
dorsal ectoderm AISN procephalic ectoderm AISN subset cellular blastoderm mesoderm AISN
anterior endoderm AISN trunk mesoderm AISN head mesoderm AISN
stage 7-8:
trunk mesoderm PR head mesoderm PR
trunk mesoderm PR head mesoderm PR anterior endoderm anlage
yolk nuclei trunk mesoderm PR head mesoderm PR anterior endoderm anlage
[Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]
We need the spatial and temporal annotations of expressions
![Page 97: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/97.jpg)
Challenges of manual annotation
[Figure: cumulative number of in situ images by year, 1992 to 2010; y-axis: number of images, 0 to 180,000]
![Page 98: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/98.jpg)
Spatial keywords annotation
• Prior approaches assume all keywords are associated with all images – Zhou and Peng (2007) Bioinformatics
[Figure: images of Actn, stage 11-12, with keywords: brain primordium; visceral muscle primordium; nerve cord primordium]
• Multiple keywords are associated with multiple images
• Exact correspondences among keywords and images are NOT given
![Page 99: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/99.jpg)
What are the challenges?
“We used human annotation, rather than automated approaches based on pattern recognition algorithms, because of the overwhelming complexity of annotation. Variation in morphology and incomplete knowledge of the shape and position of various embryonic structures make computational approaches impracticable at present.” P. Tomancak et al. (2002) Genome Biology
• Local invariant features
• High-level representations
• Multi-task learning
![Page 100: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/100.jpg)
Method outline
Images
• Feature extraction
– Kernel-based approach
– Bag-of-words
– Sparse coding
• Model construction
– SVM
– Low-rank multi-task
– Graph-based multi-task
[Ji et al. (2008) Bioinformatics; Ji et al. (2009) BMC Bioinformatics; Ji et al. (2009) NIPS]
![Page 101: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/101.jpg)
From bag-of-words to sparse coding
Codebook $A$; feature vector $u$
• Bag-of-words, e.g. $x = (0, 0, 1, 0)$:
$\min_x \frac{1}{2}\|Ax - u\|^2 \quad \text{s.t. } x_i \in \{0, 1\},\ e^T x = 1$
• Sparse coding, e.g. $x = (0, 0.2, 0.6, 0.3)$:
$\min_x \frac{1}{2}\|Ax - u\|^2 + \lambda\|x\|_1 \quad \text{s.t. } x \ge 0$
[Ji et al. (2009) SIGKDD]
Both can be improved by incorporating the proximity information of local patches
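The two encodings above can be contrasted in a few lines. Under the one-hot constraint the bag-of-words problem reduces to picking the nearest codeword, while the nonnegative sparse code is computed here with a simple projected proximal-gradient loop; that solver, the toy codebook, and the value of λ are assumptions of this sketch rather than the method used in the cited work.

```python
import numpy as np

def bag_of_words_code(A, u):
    """One-hot x minimizing 0.5 * ||A x - u||^2: select the nearest codeword."""
    x = np.zeros(A.shape[1])
    x[np.argmin(np.linalg.norm(A - u[:, None], axis=0))] = 1.0
    return x

def sparse_code(A, u, lam, n_iter=2000):
    """Minimize 0.5 * ||A x - u||^2 + lam * ||x||_1 subject to x >= 0."""
    gamma = 1.0 / np.linalg.eigvalsh(A.T @ A).max()
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = x - gamma * (A.T @ (A @ x - u))
        # Proximal step for the l1 penalty combined with the constraint x >= 0.
        x = np.maximum(v - gamma * lam, 0.0)
    return x

A = np.eye(3)                    # toy codebook: columns are codewords
u = np.array([0.2, 0.9, 0.4])    # feature vector to encode
x_bow = bag_of_words_code(A, u)
x_sc = sparse_code(A, u, lam=0.1)
```

The bag-of-words code keeps only the single best codeword, whereas the sparse code spreads weight over several codewords and shrinks small contributions.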
![Page 102: Multi-Task Learning: Theory, Algorithms, and …Center for Evolutionary Medicine and Informatics Multi-Task Learning: Theory, Algorithms, and Applications Jiayu Zhou1,2, Jianhui Chen3,](https://reader034.fdocuments.us/reader034/viewer/2022042219/5ec59ccfea876b048f024009/html5/thumbnails/102.jpg)
Center for Evolutionary Medicine and Informatics
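Low rank multi-task learning model
[Argyriou et al. (2008) Machine Learning]
$\min_W \sum_{i=1}^{k}\sum_{j=1}^{n} L(w_i^T x_j, Y_{ij}) + \lambda\|W\|_*$
• Loss term: $\sum_{i=1}^{k}\sum_{j=1}^{n} L(w_i^T x_j, Y_{ij})$, where the $x_j$ are sparse coding features
• Low rank: the trace norm $\|W\|_*$ = sum of singular values
[Figure: schematic of the matrices X, Y, and W]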
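To make the trace norm concrete, here is a minimal sketch, assuming NumPy, that computes it as the sum of singular values and applies singular value thresholding, the standard proximal step for a trace-norm penalty. The matrix sizes, the threshold `tau`, and the function names are hypothetical.

```python
import numpy as np


def trace_norm(W):
    """Trace (nuclear) norm: the sum of the singular values of W."""
    return np.linalg.svd(W, compute_uv=False).sum()


def singular_value_threshold(W, tau):
    """Proximal operator of tau * ||W||_*: shrink every singular value by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * s_shrunk) @ Vt


rng = np.random.default_rng(0)
# Hypothetical weight matrix: 6 features x 5 tasks, rank 2 plus small noise.
W = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(6, 5))

W_low = singular_value_threshold(W, tau=0.1)
print("trace norm before/after:", trace_norm(W), trace_norm(W_low))
print("rank before/after:      ", np.linalg.matrix_rank(W), np.linalg.matrix_rank(W_low))
```

Shrinking singular values toward zero is what lets the penalty favor a weight matrix whose task columns share a low-dimensional subspace.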
Graph-based multi-task learning model
$\min_W \sum_{i=1}^{k}\sum_{j=1}^{n} L(w_i^T x_j, Y_{ij}) + \lambda_1\|W\|_F^2 + \lambda_2 \sum_{(p,q)\in G} g(C_{pq})\,\|w_p - \operatorname{sgn}(C_{pq})\,w_q\|^2$
• With the 2-norm loss, the problem has a closed-form solution
[Figure: graph over the annotation terms sensory system head, ventral sensory complex, dorsal/lateral sensory complexes, embryonic maxillary sensory complex, embryonic antennal sense organ, and sensory nervous system; edges carry weights between 0.31 and 0.79]
[Ji et al. (2009) SIGKDD]
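One way a closed-form solution can be computed is sketched below, assuming the squared loss and the objective ‖XW − Y‖_F² + λ₁‖W‖_F² + λ₂ Σ g‖w_p − s·w_q‖² over graph edges (p, q). Setting the gradient to zero gives a Sylvester equation, solved here with two eigendecompositions. The edge list with precomputed weights g and signs s, the synthetic data, and the function name are assumptions; the derivation is a standard one and is not quoted from the slides.

```python
import numpy as np


def graph_mtl_closed_form(X, Y, edges, lam1=1.0, lam2=1.0):
    """Minimize ||X W - Y||_F^2 + lam1 * ||W||_F^2
    + lam2 * sum over edges of g * ||w_p - s * w_q||^2  (squared loss only).

    X: (n, d) features; Y: (n, k) labels, one column per task;
    edges: list of (p, q, g, s) with weight g >= 0 and sign s in {-1, +1}.
    Column i of the returned (d, k) matrix W is the model for task i.
    """
    k = Y.shape[1]
    # The graph penalty equals trace(W M W^T), with M accumulated edge by edge.
    M = np.zeros((k, k))
    for p, q, g, s in edges:
        v = np.zeros(k)
        v[p], v[q] = 1.0, -s
        M += g * np.outer(v, v)
    # Zero gradient gives the Sylvester equation
    #   (X^T X) W + W (lam1 * I + lam2 * M) = X^T Y.
    a, U = np.linalg.eigh(X.T @ X)
    b, V = np.linalg.eigh(lam1 * np.eye(k) + lam2 * M)
    Q = U.T @ (X.T @ Y) @ V
    # In the two eigenbases the equation decouples entrywise.
    return U @ (Q / (a[:, None] + b[None, :])) @ V.T


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
w = rng.normal(size=5)
W_true = np.column_stack([w, w, -w])           # task 1 agrees with task 0, task 2 opposes it
Y = X @ W_true + 0.1 * rng.normal(size=(40, 3))
edges = [(0, 1, 0.8, 1.0), (0, 2, 0.6, -1.0)]  # hypothetical (p, q, g(C_pq), sgn(C_pq))

W_hat = graph_mtl_closed_form(X, Y, edges, lam1=0.1, lam2=1.0)
print(np.round(W_hat, 2))
```

A negative sign on an edge pulls the two task models toward opposite weight vectors, which is how negatively related terms can still share information.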
Spatial annotation performance
• 50% of the data are used for training and 50% for testing; 30 random trials are generated
• Sparse coding with low rank multi-task learning achieves the best performance
[Figure: bar chart of AUC (axis 0 to 90) for Stages 4-6, 7-8, 9-10, 11-12, and 13-16, comparing SC+LR, SC+Graph, BoW+Graph, and Kernel+SVM]
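The evaluation protocol (random 50/50 splits, 30 trials, AUC) could be scripted roughly as follows. The synthetic data, the function names, and the ridge scorer are placeholders chosen for illustration, not the features or models compared on this slide.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def mean_auc_over_random_splits(X, y, fit_predict, n_trials=30, seed=0):
    """Mean and standard deviation of test AUC over random 50/50 splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    results = []
    for _ in range(n_trials):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        scores = fit_predict(X[train], y[train], X[test])
        results.append(auc(scores, y[test]))
    return float(np.mean(results)), float(np.std(results))


def ridge_scores(X_train, y_train, X_test, lam=1.0):
    """Stand-in scorer (ridge regression on 0/1 labels), not a method from the slides."""
    d = X_train.shape[1]
    w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)
    return X_test @ w


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                                  # synthetic features
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)     # synthetic binary labels

mean_auc, std_auc = mean_auc_over_random_splits(X, y, ridge_scores)
print(f"AUC over 30 random splits: {mean_auc:.3f} +/- {std_auc:.3f}")
```

In a multi-label setting such as term annotation, the same loop would be run once per term and the AUC values averaged.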
MALSAR Package
• Jiayu Zhou, Jianhui Chen, Jieping Ye
• http://www.public.asu.edu/~jzhou29/Software/MALSAR/index.html
Multi-TAsk Learning via StructurAl Regularization
Functions in MALSAR Package
• Regularized Multi-Task Learning
• Joint Feature Learning
• Trace Norm Minimization
• Alternating Structure Optimization (ASO)
• Clustered Multi-Task Learning
• Network Multi-Task Learning
• Robust Multi-Task Learning
Trends in Multi-Task Learning
• Develop efficient algorithms for large-scale multi-task learning. In many areas where multi-task learning is applied, such as bioinformatics, the dimensionality of data can be huge.
• Learn task structures automatically in MTL
• Most multi-task learning techniques address supervised learning problems. There is high demand for new methods for semi-supervised and unsupervised learning.