The HPN-DREAM breast cancer network inference challenge: Scoring and results
Steven Hill, The Netherlands Cancer Institute
RECOMB/ISCB Conference on Regulatory and Systems Genomics, with DREAM Challenges
8th November 2013
SC1A: Network inference from experimental data
SC1A: Scoring
• No definitive “gold standard” causal networks
• Use a novel held-out validation approach, emphasizing the causal aspect of the challenge

[Figure: All data (N treatments) is split into training data (4 treatments: DMSO, AKTi, FGFR1/3i, AKTi+MEKi) and test data (N−4 treatments: Test1, Test2, …, Test(N−4)). Participants infer 32 networks using the training data; the inferred networks are assessed using the test data.]
SC1A: Scoring metric
Assessment: How well do inferred causal networks agree with effects observed under inhibition in the test data?

Step 1: Identify a “gold standard” with a paired t-test comparing DMSO and each test inhibitor, for each phosphoprotein and cell line/stimulus regime.

[Figure: e.g. UACC812/Serum, Test1. Time courses of Phospho1 and Phospho2 abundance (a.u.) under DMSO vs Test1: Phospho1 differs significantly (p-value = 3.2×10⁻⁵), Phospho2 does not (p-value = 0.45). Collecting these calls across phosphoproteins gives a binary “gold standard” vector for Test1, e.g. 0 1 1 0 1 0 0 1 0 0.]
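The Step 1 construction above can be sketched with SciPy's paired t-test. This is an illustrative sketch only, not the official scoring code: the function name, the `alpha=0.05` significance threshold, and the data layout (rows = phosphoproteins, columns = paired time points) are assumptions.

```python
import numpy as np
from scipy.stats import ttest_rel

def gold_standard(dmso, inhib, alpha=0.05):
    """Paired t-test per phosphoprotein between matched DMSO and
    test-inhibitor time courses; rows are phosphoproteins, columns are
    paired time points. Returns the binary effect vector (1 = significant
    effect under inhibition) and the raw p-values.
    Note: alpha and the absence of multiple-testing correction are
    assumptions for illustration."""
    pvals = np.array([ttest_rel(d, t).pvalue for d, t in zip(dmso, inhib)])
    return (pvals < alpha).astype(int), pvals
```

Thresholding the p-values yields the binary vector shown in the slide.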
SC1A: Scoring metric

Step 2: Score submissions.
• Start from the matrix of predicted edge scores for a single cell line/stimulus regime
• Apply a threshold τ to obtain a binary network
• Obtain the protein descendants downstream of the test inhibitor target
• Compare the descendants to the “gold standard” list of observed effects in the held-out data, giving #TP(τ) and #FP(τ)
• Vary the threshold τ to trace out an ROC curve and compute an AUROC score

[Figure: an edge-score matrix (e.g. 0.67 … 0.43; 0.58 … 0.87) is thresholded to a binary adjacency matrix (e.g. 1 … 0; 0 … 1); the descendants of the Test1 inhibitor target are compared with the Test1 gold-standard vector (e.g. 1 0 1 0 1 0 1 1 0 0), marking TPs and FPs; plotting #TP against #FP as τ varies gives the ROC curve and the AUROC.]
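The Step 2 procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the official implementation: the function names, the BFS-based descendant computation, and the trapezoidal ROC integration are choices made here, and the gold-standard vector is assumed to contain both effects and non-effects.

```python
import numpy as np

def descendants(adj, target):
    """Proteins reachable from the inhibitor target in a binary directed
    network (adjacency matrix adj[u, v] = edge u -> v)."""
    n = adj.shape[0]
    seen, stack = set(), [target]
    while stack:
        u = stack.pop()
        for v in range(n):
            if adj[u, v] and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def auroc_score(edge_scores, target, gold):
    """Sweep threshold tau over edge scores; at each tau, compare the
    descendants of the target with the gold-standard effect vector to get
    TP and FP counts; integrate the resulting ROC curve."""
    gold = np.asarray(gold)
    n_pos, n_neg = gold.sum(), len(gold) - gold.sum()  # assumes both > 0
    tprs, fprs = [0.0], [0.0]
    for tau in np.unique(edge_scores)[::-1]:  # thresholds, high to low
        desc = descendants(edge_scores >= tau, target)
        pred = np.array([i in desc for i in range(len(gold))])
        tprs.append((pred & (gold == 1)).sum() / n_pos)
        fprs.append((pred & (gold == 0)).sum() / n_neg)
    tprs.append(1.0)
    fprs.append(1.0)
    # trapezoidal area under the ROC curve
    return sum((fprs[k] - fprs[k - 1]) * (tprs[k] + tprs[k - 1]) / 2
               for k in range(1, len(fprs)))
```

Lowering τ only adds edges, so the descendant set grows monotonically and the (FPR, TPR) points trace a valid ROC curve.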
SC1A: AUROC scores & nulls
• 74 final submissions
• Each submission has 32 AUROC scores (one for each cell line/stimulus regime)

[Figure: AUROC scores plotted against null distributions; legend: non-significant AUROC, significant AUROC, best performer. p-values shown: 3.58×10⁻⁶, 8.98×10⁻⁶, 4.18×10⁻⁶, 9.19×10⁻⁴.]
SC1A: Final ranking

Scoring procedure:
1. For each submission and each cell line/stimulus pair, compute the AUROC score
2. Rank submissions for each cell line/stimulus pair
3. Calculate the mean rank across cell line/stimulus pairs for each submission
4. Rank submissions according to mean rank

[Figure: a submissions × 32 cell line/stimulus pairs matrix of AUROC scores (e.g. 0.5, 0.7, 0.9, …) is converted to a matrix of per-pair AUROC ranks, then to a mean rank per submission (e.g. 3, 2, 1.33, 3.66), and finally to a final rank (3, 2, 1, 4).]
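The four-step ranking procedure above can be sketched in a few lines of NumPy. This is a simplified illustration: ties between submissions are broken arbitrarily by sort order rather than by average rank, which is an assumption, not something the slides specify.

```python
import numpy as np

def final_ranking(auroc):
    """auroc: (n_submissions, n_regimes) array of AUROC scores.
    Rank submissions per regime (1 = best, i.e. highest AUROC), average
    the ranks across regimes, then rank the mean ranks."""
    n_sub, n_reg = auroc.shape
    ranks = np.empty((n_sub, n_reg), dtype=int)
    for j in range(n_reg):
        order = np.argsort(-auroc[:, j])  # descending: higher AUROC is better
        ranks[order, j] = np.arange(1, n_sub + 1)
    mean_rank = ranks.mean(axis=1)
    final = np.empty(n_sub, dtype=int)
    final[np.argsort(mean_rank)] = np.arange(1, n_sub + 1)
    return mean_rank, final
```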
SC1A: Robustness analysis
• Verify that the final ranking is robust

Procedure:
1. Mask 50% of phosphoproteins in each AUROC calculation
2. Re-calculate the final ranking
3. Repeat (1) and (2) 100 times

[Figure: distribution of ranks over the 100 iterations for the top 10 teams; p-value shown: 5.40×10⁻¹⁰.]
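A generic version of this masking procedure might look as follows. This is an illustrative sketch: the function name, the `score_fn(keep_mask)` interface, and the fixed random seed are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (an assumption)

def robustness_ranks(score_fn, n_items, n_iter=100, frac=0.5):
    """Repeat n_iter times: mask a random fraction of the items entering the
    score (phosphoproteins, edges, or data points), re-score all submissions,
    and re-rank them (1 = best). score_fn(keep_mask) must return one score
    per submission, higher = better."""
    all_ranks = []
    for _ in range(n_iter):
        keep = rng.permutation(n_items) < int(round(frac * n_items))
        scores = np.asarray(score_fn(keep), dtype=float)
        ranks = np.empty(len(scores), dtype=int)
        ranks[np.argsort(-scores)] = np.arange(1, len(scores) + 1)
        all_ranks.append(ranks)
    return np.array(all_ranks)  # shape (n_iter, n_submissions)
```

The per-iteration ranks give, for each team, a rank distribution like the ones plotted for the top 10 teams.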
SC1B: Network inference from in silico data
SC1B: Scoring, AUROCs, null & robustness
• Gold standard available: the data-generating causal network
• Participants submitted a single set of edge scores
• Edge scores compared against the gold standard to give an AUROC score
• Participants ranked based on AUROC score

Robustness analysis:
1. Mask 50% of edges in the AUROC calculation
2. Re-calculate the final ranking
3. Repeat (1) and (2) 100 times

[Figure: AUROC scores against the null: 51 non-significant, 14 significant; best performer p-value 3.11×10⁻¹¹. Rank distributions over the 100 iterations for the top 10 teams; p-value shown: 3.90×10⁻¹⁴.]
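With a true gold-standard network available, the AUROC of the submitted edge scores against the gold-standard edges can be computed directly via the Mann–Whitney identity. A sketch under stated assumptions, not the official implementation; it assumes the gold standard contains both edges and non-edges.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney identity: the probability that a randomly
    chosen true edge is scored above a randomly chosen non-edge
    (ties count half)."""
    s = np.asarray(scores, dtype=float).ravel()
    y = np.asarray(labels).ravel()
    pos, neg = s[y == 1], s[y == 0]  # assumes both classes are present
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))
```

Both the score matrix and the gold-standard adjacency matrix are flattened, so the same function applies to any shape of edge-score submission.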
Combined score for SC1A and SC1B
• 59 teams participated in both SC1A and SC1B
• Rewards consistently good performance across both parts of SC1
• Combined score: average of SC1A rank and SC1B rank
• Top team ranked robustly first
SC2A: Timecourse prediction from experimental data
SC2A: Scoring

[Figure: as in SC1A, all data (N treatments) is split into training data (4 treatments: DMSO, AKTi, FGFR1/3i, AKTi+MEKi) and test data (N−4 treatments: Test1, Test2, …, Test(N−4)).]

Participants build dynamical models using the training data and make predictions for phosphoprotein trajectories under inhibitions not in the training data. Predictions are assessed using the test data.
SC2A: Scoring metric
• Participants made predictions for all phosphoproteins for each cell line/stimulus pair, under inhibition of each of 5 test inhibitors
• Assessment: How well do predicted trajectories agree with the corresponding trajectories in the test data?
• Scoring metric: root-mean-squared error (RMSE), calculated for each cell line/phosphoprotein/test inhibitor combination (e.g. UACC812, Phospho1, Test1):

$$\mathrm{RMSE}_{p,c,i} = \sqrt{\frac{1}{TS}\sum_{t=1}^{T}\sum_{s=1}^{S}\left(x_{p,c,i,s,t}-\hat{x}_{p,c,i,s,t}\right)^{2}}$$

where $x_{p,c,i,s,t}$ and $\hat{x}_{p,c,i,s,t}$ are the observed and predicted values for phosphoprotein $p$, cell line $c$ and test inhibitor $i$ under stimulus $s$ at time point $t$.
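The RMSE above reduces to a one-liner per combination; a minimal sketch (the function name and the array layout, stimuli × time points, are assumptions):

```python
import numpy as np

def rmse(observed, predicted):
    """RMSE over the stimulus x time-point grid of measurements for one
    phosphoprotein / cell line / test inhibitor combination."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))
```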
SC2A: RMSE scores, nulls & final ranking
• 14 final submissions

[Figure: RMSE scores plotted against null distributions; legend: non-significant RMSE, significant RMSE, best performer. p-values shown: 1.35×10⁻⁴, 3.70×10⁻⁸, 1.49×10⁻⁵, 1.21×10⁻⁶.]

Final ranking: analogously to SC1A, submissions are ranked for each regime and the mean rank calculated.
SC2A: Robustness analysis
• Verify that the final ranking is robust

Procedure:
1. Mask 50% of data points in each RMSE calculation
2. Re-calculate the final ranking
3. Repeat (1) and (2) 100 times

[Figure: rank distributions over the 100 iterations for the top 10 teams, highlighting the 2 best performers and one incomplete submission; p-values shown: 6.97×10⁻⁵, 3.04×10⁻¹⁸, 0.99.]
SC2B: Timecourse prediction from in silico data
SC2B: Scoring metric, nulls & robustness
• Participants made predictions for all phosphoproteins for each stimulus regime, under inhibition of each phosphoprotein in turn
• Scoring metric is RMSE; the procedure follows that of SC2A:

$$\mathrm{RMSE}_{p,i} = \sqrt{\frac{1}{TS}\sum_{t=1}^{T}\sum_{s=1}^{S}\left(x_{p,i,s,t}-\hat{x}_{p,i,s,t}\right)^{2}}$$

Robustness analysis:
1. Mask 50% of data points in each RMSE calculation
2. Re-calculate the final ranking
3. Repeat (1) and (2) 100 times

[Figure: RMSE scores against null distributions; legend: non-significant RMSE, significant RMSE, best performer; p-values shown: 1.68×10⁻¹⁴, 1.0, 2.89×10⁻⁷, 0.015. Rank distributions over the 100 iterations for the top 10 teams, including one incomplete submission; p-values shown: 7.71×10⁻¹⁹, 0.99.]
Combined score for SC2A and SC2B
• 10 teams participated in both SC2A and SC2B
• Rewards consistently good performance across both parts of SC2
• Combined score: average of SC2A rank and SC2B rank
• Top team ranked robustly first
SC3: Visualization
SC3: Scoring and Results
• 14 submissions
• 36 HPN-DREAM participants voted, assigning ranks 1 to 3
• Final score = mean rank (unranked submissions assigned rank 4)
Conclusions and Observations
• Submissions rigorously assessed using held-out test data
• SC1A: novel procedure used to assess network inference performance in a setting with no true “gold standard”
• Many statistically significant predictions submitted

For further investigation:
• Explore why some regimes (e.g. cell line/stimulus pairs) are easier to predict than others
• Determine why different teams performed well in the experimental and in silico challenges
• Identify the methods/approaches that yield the best predictions
• Wisdom of crowds: does aggregating submissions improve performance and lead to the discovery of biological insights?