Statistical analysis with MSstats
Meena Choi
Department of Statistics
2013.07.18
Agenda
SRM course 2013 : Statistical analysis with MSstats
• What is MSstats?
• How to analyze using MSstats in Skyline
• Study of poor-quality peaks
• The R-based platform offers more options and is easy to modify.
R and MSstats
R is a freely available language and environment for statistical computing and graphics, in which it is easy to develop tools. R packages provide specialized statistical techniques and graphics, with functions specific to an area of study.
R Package
What we can do :
1. Test proteins for differential abundance
2. Quantify proteins in biological samples
3. Design of future experiment
Label-free & label-based LC-MS & SRM
Label-free & label-based SRM
Label-free shotgun MS
Download & website
Website
• http://www.stat.purdue.edu/~ovitek/Software.html
• http://msstats.org
– Workflows for different analyses
– Example datasets with R scripts
Download
MSstats with Skyline
In Skyline
• Use as an external tool
• Automatically runs the functions for
– Data processing : preprocessing the data, drawing the profile plots, quality control plots, and condition plots
– Group Comparison : comparing between groups, drawing the plots of the results
– Design Sample Size : calculating the sample size
• For beginners in R or other statistical tools, statistical analysis with default options can be done easily through Skyline.
Set up MSstats as external tool
In Skyline
• Need ‘MSstatsExternal.zip’ (can be downloaded from the Skyline website or the MSstats website)
• Tools -> External tools
• Add… -> select MSstatsExternal.zip
• Skyline then starts to download and install R and the required packages
1. Data processing and Quality control plots
In Skyline
Visualization of the data
– Show potential sources of variation : Run, Subject, Transition, Condition
– Find any problematic observations : outliers, missingness, poor quality runs
– Show how the normalization works
Profile Plot : individual observations
QC Plot : distribution per run
Data processing : the input is the report from Skyline
– Log 2 or log 10 transformation
– Constant normalization (same median across runs in reference)
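The transformation and constant normalization steps can be sketched in base R. This is a minimal illustration on made-up numbers, not the MSstats implementation: the data frame `toy`, its column names, and the choice to align every run to the overall median of the run medians are assumptions.

```r
# Toy data: 3 runs, each with reference and endogenous intensities.
# All values and column names are made up for illustration.
toy <- data.frame(
  Run       = rep(1:3, each = 4),
  Label     = rep(c("Reference", "Reference", "Endogenous", "Endogenous"), times = 3),
  Intensity = c(1000, 4000, 500, 800,
                2000, 8000, 900, 1700,
                 500, 2000, 260, 390)
)

# Step 1: log2 transformation.
toy$Log2Int <- log2(toy$Intensity)

# Step 2: constant normalization. Compute the median of the reference
# log2-intensities in each run, then shift each run by a constant so that
# all reference medians equal the overall median of the run medians.
ref <- toy[toy$Label == "Reference", ]
run_median <- sapply(split(ref$Log2Int, ref$Run), median)
shift <- median(run_median) - run_median
toy$NormLog2Int <- toy$Log2Int + unname(shift[as.character(toy$Run)])

# After normalization the reference medians agree across runs.
ref_after <- toy[toy$Label == "Reference", ]
med_after <- sapply(split(ref_after$NormLog2Int, ref_after$Run), median)
print(med_after)
```

The same shift is applied to the endogenous intensities of a run, so differences within a run are preserved while run-to-run offsets are removed.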
[Figure: profile plot for protein Rv0079_Rv0079, with Reference and Endogenous panels, conditions T1-0h, T2-6h, and T3-48h, MS runs on the x-axis, log2 intensities on the y-axis, and one line per transition of the peptides AQAATAGIDDLRPALIR, EFPDVAVFSGGR, and LTTPAEALVTR.]
[Figure: QC plot for all proteins, with the same panels, conditions, and axes.]
2. Test for differential abundance at the protein level
In Skyline
• Hypothesis : Is there a difference in abundance between condition1 and condition2?
H0 : log fold change = 0 vs. Ha : log fold change ≠ 0
• Automatically detects the properties of the experimental design
– Case-control (matching : before-after) study
– Time-course study
• Can choose the model with the desired scope of conclusion
– Scope of biological replication : fixed (“restricted”) / random (“expanded”)
– Scope of technical MS run replication : fixed (“restricted”) / random (“expanded”)
– Interference : if the data contain interfering transitions, the model needs an additional interaction term
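The hypothesis test above can be illustrated with a small base R sketch. It assumes simulated data and a plain fixed-effects linear model with a feature term and a condition term; the models fitted by MSstats also contain subject and run terms, so this is only a stand-in for the idea of testing H0 : log fold change = 0.

```r
set.seed(1)
# Simulated log2-intensities for one protein:
# 4 features x 3 subjects x 2 conditions. The true log2 fold change is 1.
toy <- expand.grid(Feature = paste0("F", 1:4),
                   Subject = 1:3,
                   Condition = c("condition1", "condition2"))
toy$Log2Int <- 20 + as.numeric(toy$Feature) +
  ifelse(toy$Condition == "condition2", 1, 0) +
  rnorm(nrow(toy), sd = 0.2)

# Fixed-effects linear model: feature effect plus condition effect.
fit <- lm(Log2Int ~ Feature + Condition, data = toy)

# The coefficient of condition2 estimates log2(condition2 / condition1);
# its t-test addresses H0 : log fold change = 0.
est <- summary(fit)$coefficients["Conditioncondition2", ]
print(est)
```

A small p-value in the last column leads to rejecting H0, that is, to calling the protein differentially abundant between the two conditions.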
2. Visualization for testing results : Volcano plot
In Skyline
[Figure: volcano plot for the comparison T2-T1. x-axis: log2 fold change (practical significance); y-axis: -log2(adjusted p-value) (statistical significance; higher values are more significant). Points are the proteins Rv0079_Rv0079, Rv1738_Rv1738, Rv1812c_Rv1812c, Rv1996_Rv1996, Rv2027c_dosT, Rv2031c_hspX, Rv2623_TB31.7, Rv2626c_hrp1, Rv3132c_devS, and Rv3133c_devR, colored as no regulation, down-regulated, or up-regulated. A line marks the adjusted p-value cutoff (0.05).]
Volcano plot :
• Per comparison
• All proteins
• Adjusted p-value and log fold change
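How the points of a volcano plot are derived can be sketched in base R. The testing results below are made up, and the Benjamini-Hochberg adjustment is an assumption chosen for the example; the sketch only shows the coordinates and the coloring rule.

```r
# Hypothetical testing results for five proteins (numbers are made up).
res <- data.frame(
  Protein = paste0("P", 1:5),
  log2FC  = c(2.1, -1.8, 0.2, 0.9, -0.1),
  pvalue  = c(0.0001, 0.002, 0.60, 0.04, 0.90)
)

# Adjust the p-values for multiple testing (Benjamini-Hochberg).
res$adj.pvalue <- p.adjust(res$pvalue, method = "BH")

# Classify each protein using the adjusted p-value cutoff.
cutoff <- 0.05
res$Status <- ifelse(res$adj.pvalue >= cutoff, "No regulation",
                     ifelse(res$log2FC > 0, "Up-regulated", "Down-regulated"))

# Volcano plot coordinates: x = log2 fold change, y = -log2(adjusted p-value).
res$y <- -log2(res$adj.pvalue)
print(res)
```

Calling `plot(res$log2FC, res$y)` and `abline(h = -log2(cutoff))` on this table would draw the plot itself.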
2. Visualization for testing results
In Skyline
Heatmap:
• With all comparisons
• All proteins
• Adjusted p-value and cut-off log fold change
Comparison plot:
• With all comparisons
• Per protein
• Log fold change and confidence interval (CI)
[Figure: heatmap of all proteins (rows) by comparison (columns T2-T1 and T3-T1), with a color key for the signed adjusted p-value.]
[Figure: comparison plot for protein Rv2031c_hspX, showing the log2 fold change for the comparisons T2-T1 and T3-T1.]
3. Design of future experiment
In Skyline
• Use the current dataset for variance estimation : with fixed Subject or random Subject
• Also calculate
– The number of peptides per protein
– The number of transitions per peptide
[Figure: minimal number of biological replicates (y-axis) versus desired fold change (x-axis, 1.1 to 1.5; secondary axis: coefficient of variation, CV, 0.001 to 0.018). Number of peptides is 3; number of transitions is 4; FDR is 0.05; statistical power is 0.8.]
Practice 1. Set up ‘Annotations’
In Skyline
• Settings -> Annotations.. -> Edit List -> Add
– Type ‘BioReplicate’ in Name -> select ‘Replicates’ in Applies to
– Type ‘Run’ in Name -> select ‘Replicates’ in Applies to
– Type ‘Condition’ in Name -> select ‘Value List’ in Type -> type the values (Condition values) -> select ‘Replicates’ in Applies to
– Click OK
• Check ‘BioReplicate’, ‘Run’, ‘Condition’ in Annotation Settings, then click OK.
Practice 2. Fill in ‘Results Grid’
In Skyline
• View -> Results Grid
– Type the values in the columns ‘BioReplicate’ and ‘Run’
– Choose one of the values in the column ‘Condition’ for each replicate
Practice 3. MSstats as an external tool
In Skyline
– Tools -> MSstats
• Data processing
• Group Comparison
• Design Sample Size
– All results and plots will be saved under the current directory.
Let’s try!
Data : Rat plasma for risk of heart disease
Rat Plasma : label-free SRM
Each protein :
            High salt (Disease)         Low salt (Healthy)
            Sub1      …  Sub7           Sub8      …  Sub14
            T1 T2 T3  …  T1 T2 T3       T1 T2 T3  …  T1 T2 T3
Pep*Tran1   X  X  X   …  X  X  X        X  X  X   …  X  X  X
Pep*Tran2   X  X  X   …  X  X  X        X  X  X   …  X  X  X
Pep*Tran3   X  X  X   …  X  X  X        X  X  X   …  X  X  X
• Label-free SRM experiment
• High salt (7) vs. Low salt (7)
• 3 technical replicates
• Total 42 injections (runs)
• 48 proteins
• Comparison : High Salt - Low Salt (Disease - Healthy)
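The layout of this experiment can be written down as an annotation table in base R. The column names follow the annotation names used in this course (Condition, BioReplicate, Run); the helper columns and the numbering of subjects and runs are assumptions made for the sketch.

```r
# 2 conditions x 7 subjects per condition x 3 technical replicates.
# expand.grid varies the first argument fastest, so the three technical
# replicates of a subject are adjacent rows.
design <- expand.grid(TechRep = 1:3,
                      SubjectInGroup = 1:7,
                      Condition = c("Disease", "Healthy"))

# Number the subjects 1..14 across both conditions and the runs 1..42.
design$BioReplicate <- (as.integer(design$Condition) - 1L) * 7L + design$SubjectInGroup
design$Run <- seq_len(nrow(design))

# 21 runs per condition, 42 runs in total.
print(table(design$Condition))
```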
Examples of inconsistent (poor quality?) peptides
Rat Plasma : label-free SRM
The profile plot shows problematic peptides or transitions. We need to check what happened with this peptide.
[Figure: profile plot for protein NP_001007697 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities; 2 peptides: CSSLLWAGAAWLR_2 and NLGVVVAPHALR_2).]
NLGVVVAPHALR
[Figure: profile plot for protein NP_001007697 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities; 2 peptides: CSSLLWAGAAWLR_2 and NLGVVVAPHALR_2), on the slide labeled with peptide NLGVVVAPHALR.]
CSSLLWAGAAWLR
[Figure: profile plot for protein NP_001007697 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities; 2 peptides: CSSLLWAGAAWLR_2 and NLGVVVAPHALR_2), on the slide labeled with peptide CSSLLWAGAAWLR.]
Log2 FC and variation differ before and after removing peptides
Rat Plasma : label-free SRM
                 All features                   Only NLGV (red lines)         Only CSSL (black lines)
                 log2FC   SE      Adj p-value   log2FC  SE      Adj p-value   log2FC   SE      Adj p-value
Fixed Subject    -2.6721  0.1439  <0.0001       0.8750  0.0260  <0.0001       -6.2272  0.2868  <0.0001
Random Subject   -2.6701  0.2214  <0.0001       0.8750  0.2399  0.0066        -6.2187  0.4152  <0.0001
[Figure: three profile plots for protein NP_001007697 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities): with all features (2 peptides: CSSLLWAGAAWLR_2 and NLGVVVAPHALR_2), with only CSSLLWAGAAWLR_2, and with only NLGVVVAPHALR_2.]
Examples of inconsistent peptides
Rat Plasma : label-free SRM
The profile plot shows an inconsistent pattern across peptides. We need to check whether there is a measurement problem.
[Figure: profile plot for protein NP_036620 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities; 3 peptides: AIAYLNTGYQR_2, DLTGFPQGADQR_2, and TVEHPFSVEEFVLPK_2).]
DLTGFPQGADQR
[Figure: profile plot for protein NP_036620 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities; 3 peptides: AIAYLNTGYQR_2, DLTGFPQGADQR_2, and TVEHPFSVEEFVLPK_2), on the slide labeled with peptide DLTGFPQGADQR.]
AIAYLNTGYQR
[Figure: profile plot for protein NP_036620 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities; 3 peptides: AIAYLNTGYQR_2, DLTGFPQGADQR_2, and TVEHPFSVEEFVLPK_2), on the slide labeled with peptide AIAYLNTGYQR.]
Log2 FC and variation are quite different depending on the peptides.
Rat Plasma : label-free SRM
                 All features                  Only DLTG and TVEH            Only AIAY
                 log2FC  SE      Adj p-value   log2FC  SE      Adj p-value   log2FC  SE      Adj p-value
Fixed Subject    2.0642  0.0951  <0.0001       0.6167  0.0414  <0.0001       5.0812  0.0591  <0.0001
Random Subject   2.0642  0.2966  <0.0001       0.6167  0.1137  0.0005        5.0812  0.7390  <0.0001
[Figure: three profile plots for protein NP_036620 (endogenous; Disease vs. Healthy; x-axis: MS runs; y-axis: log2 intensities): with only AIAYLNTGYQR_2, with all features (3 peptides: AIAYLNTGYQR_2, DLTGFPQGADQR_2, and TVEHPFSVEEFVLPK_2), and with only DLTGFPQGADQR_2 and TVEHPFSVEEFVLPK_2.]
Summary of poor quality peptides
Rat Plasma : label-free SRM
• The profile plot shows an inconsistent pattern across peptides. We need to check whether there is a measurement problem.
• There is less certainty that we are looking at the correct peptide,
– for various reasons, such as phosphorylation or other modifications at the peptide level.
– Suggestion : re-measure with a label-based method.
• We need to investigate further the subset of peptides that we find interesting for some reason.
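One simple way to screen for such inconsistent peptides is to compute the log2 fold change per peptide and compare the directions. The base R sketch below uses made-up log2-intensities and hypothetical peptide names (`pepA`, `pepB`); it is a screening aid under these assumptions, not a function of MSstats.

```r
# Made-up log2-intensities for one protein with two peptides.
toy <- data.frame(
  Peptide   = rep(c("pepA", "pepB"), each = 6),
  Condition = rep(rep(c("Disease", "Healthy"), each = 3), times = 2),
  Log2Int   = c(15.9, 16.1, 16.0, 15.0, 15.2, 15.1,   # pepA: higher in Disease
                 9.0,  9.4,  9.2, 15.3, 15.5, 15.4)   # pepB: much lower in Disease
)

# Mean log2-intensity per peptide and condition.
means <- tapply(toy$Log2Int, list(toy$Peptide, toy$Condition), mean)

# Per-peptide log2 fold change, Disease - Healthy.
log2fc <- means[, "Disease"] - means[, "Healthy"]
print(round(log2fc, 2))

# Flag the protein when its peptides disagree in direction.
inconsistent <- length(unique(sign(log2fc))) > 1
print(inconsistent)
```

A flagged protein is a candidate for the follow-up checks listed above; the flag alone does not say which peptide is the problematic one.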
MSstats in R
In R
• Use the R-based platform if you want the detailed options for all functions, such as
– Customized normalization
– Detailed options for all plots
– Quantification per sample
• With the R-based platform, we can take advantage of the options and modify the data easily.
1. Set up ‘Annotations’
In Skyline
• Settings -> Annotations.. -> Edit List -> Add
– Type ‘BioReplicate’ in Name -> select ‘Replicates’ in Applies to
– Type ‘Run’ in Name -> select ‘Replicates’ in Applies to
– Type ‘Condition’ in Name -> select ‘Value List’ in Type -> type the values (Condition values) -> select ‘Replicates’ in Applies to
– Click OK
• Check ‘BioReplicate’, ‘Run’, ‘Condition’ in Annotation Settings, then click OK.
2. Fill in ‘Results Grid’
In Skyline
• View -> Results Grid
– Type the values in the columns ‘BioReplicate’ and ‘Run’
– Choose one of the values in the column ‘Condition’ for each replicate
3. Generate report from Skyline
In Skyline
• Export the report using the MSstats report format (MSstats_report.skyr)
– File -> Export -> Report.. -> Import..
– Select MSstats_report.skyr
– Select MSstats in Export Report -> Export
Let’s try!
How to start MSstats
In R
• Download and install R 3.0.1 (http://cran.r-project.org/)
• Required packages
– gplots, lme4, ggplot2, limma, marray
– The required packages need to be installed only once. After that, they are loaded automatically with MSstats.
• Installation
– Select ‘Packages’ in the toolbar and then ‘Install package(s)’ in the drop-down menu.
– Or use the ‘install.packages’ function. (see R script example)
[Figure: screenshots of the R screen and the RStudio screen.]
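Before installing, it can help to see which of the required packages are already present. The sketch below only checks and reports; it does not install anything. The split into CRAN and Bioconductor packages reflects where these packages are hosted, and the commented install commands are left for the reader to adapt to their R version.

```r
# Required packages named on this slide.
required <- c("gplots", "lme4", "ggplot2", "limma", "marray")

# TRUE for every package that can be loaded from the local library.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # gplots, lme4, and ggplot2 are on CRAN:
  #   install.packages(c("gplots", "lme4", "ggplot2"))
  # limma and marray are distributed through Bioconductor and need its
  # installer instead of install.packages().
} else {
  message("All required packages are installed.")
}
```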
4. Data processing
In R : use R script
• Input : report from Skyline, or an Excel spreadsheet, or output from a signal processing tool
Then, we need
– Log 2 or log 10 transformation
– Constant normalization
– Show the summary of the data structure
> quantData <- dataProcess(raw, logTrans=2, normalization=TRUE)
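Before calling `dataProcess`, it can be useful to check that the input table has the expected columns and to look at its structure. The sketch below is in base R; the ten column names are taken from the MSstats input format as the author of this sketch understands it and should be checked against the documentation of the installed MSstats version. The helper `check_input` and the tiny table `raw` are hypothetical.

```r
# Column names assumed for the MSstats input format (verify for your version).
required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
                   "FragmentIon", "ProductCharge", "IsotopeLabelType",
                   "Condition", "BioReplicate", "Run", "Intensity")

# Hypothetical helper: stop early if columns are missing, otherwise
# return a short summary of the data structure.
check_input <- function(raw) {
  missing_cols <- setdiff(required_cols, names(raw))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  c(proteins   = length(unique(raw$ProteinName)),
    peptides   = length(unique(raw$PeptideSequence)),
    conditions = length(unique(raw$Condition)),
    runs       = length(unique(raw$Run)))
}

# Tiny made-up report: one protein, one peptide, two runs.
raw <- data.frame(ProteinName = "P1", PeptideSequence = "PEPTIDER",
                  PrecursorCharge = 2, FragmentIon = "y5", ProductCharge = 1,
                  IsotopeLabelType = "L", Condition = c("T1", "T2"),
                  BioReplicate = 1:2, Run = 1:2, Intensity = c(1200, 2500))
print(check_input(raw))
```

In a real analysis `raw` would be read from the report exported from Skyline, for example with `read.csv`.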
5. Data processing : Quality control
• Show potential sources of variation : Run, Subject, Transition, Condition
• Find any problematic observations : outliers, missingness, poor quality runs
• Show how the normalization works
• Input : processed data
Profile Plot : individual observations
QC Plot : distribution per run
Let’s try!
> dataProcessPlots(data=quantData, type="ProfilePlot")
[Figure: profile plot for protein Rv2027c_dosT, with Reference and Endogenous panels, conditions T1-0h, T2-6h, and T3-48h, MS runs on the x-axis, log2 intensities on the y-axis, and one line per transition of the peptides AIVHTAAELVDAR, GVLGALIEEPKPIR, and SAIFDLHAGPSR. An annotation marks missingness.]
[Figure: two QC plots for all proteins, with the same panels, conditions, and axes.]
In R : use R script
6. Test for differential abundance at the protein level
• Input
– Processed data
– Assigned comparison matrix
How to assign the comparison matrix?
                Condition1  Condition2  Condition3  …  Condition J  Example
T2-T1           -1          1           0           …  0            Disease-Control
T1-T2           1           -1          0           …  0            Control-Disease
T3-T1           -1          0           1           …  0
T3-T2           0           -1          1           …  0
(T2+T3)/2-T1    -1          0.5         0.5         …  0            (Cancer+Benign)/2-Control
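A comparison matrix like the one in the table can be built in base R with `rbind`. The sketch assumes three conditions named T1, T2, and T3, and assumes that the column order matches the order of the condition levels in the processed data; both should be checked against the actual dataset.

```r
# Three conditions, in the order used for the columns of the table.
conditions <- c("T1", "T2", "T3")

# One row per comparison. The coefficients of each row sum to zero.
comparison <- rbind(
  "T2-T1"        = c(-1,   1,   0),
  "T3-T1"        = c(-1,   0,   1),
  "T3-T2"        = c( 0,  -1,   1),
  "(T2+T3)/2-T1" = c(-1, 0.5, 0.5)
)
colnames(comparison) <- conditions
print(comparison)
```

The resulting `comparison` object has the shape expected for the `contrast.matrix` argument used in this course: one named row per comparison and one column per condition.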
> resultMultiComparisons <- groupComparison(contrast.matrix=comparison, data=quantData, labeled=TRUE, scopeOfBioReplication="restricted", scopeOfTechReplication="expanded", interference=TRUE)
In R : use R script
Comparison matrix
        T1-0h  T2-6h  T3-48h
T2-T1   -1     1      0
> groupComparison(contrast.matrix=comparison, data=quantData)
        T1-0h  T2-6h  T3-48h
T1-T2   1      -1     0
> groupComparison(contrast.matrix=comparison, data=quantData)
The sign of log2FC changes. However, everything else, including the conclusion, stays the same.
In R : use R script
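The sign change can be checked by hand: the estimated log2 fold change is the contrast applied to the condition means, so reversing the contrast reverses the sign. The condition means below are made up for the illustration.

```r
# Made-up mean log2 abundances of one protein in the three conditions.
cond_means <- c("T1-0h" = 20.0, "T2-6h" = 21.5, "T3-48h" = 22.0)

contrast_T2_T1 <- c(-1,  1, 0)
contrast_T1_T2 <- c( 1, -1, 0)

# The estimated log2 fold change is the contrast applied to the means.
log2fc_T2_T1 <- sum(contrast_T2_T1 * cond_means)
log2fc_T1_T2 <- sum(contrast_T1_T2 * cond_means)
print(c(log2fc_T2_T1, log2fc_T1_T2))
```

Because a two-sided test depends only on the absolute value of the estimate relative to its standard error, the p-value is the same for both directions.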
Residual plot
In R : use R script
• Perform model-based quality control
• Check the equal variance assumption among features
[Figure: residual plot for protein Rv1812c_Rv1812c, with Reference and Endogenous panels (x-axis: predicted abundance; y-axis: residuals; points colored by feature, where the features are transitions of the peptides ADLLAAAAPR, VIGVPAMFAAGDVAAAR, and VTTSTGASYSYDR). Expected pattern: residuals close to zero, with no patterns.]
[Figure: normal Q-Q plot for Rv1812c_Rv1812c (x-axis: theoretical quantiles; y-axis: sample quantiles). Expected pattern: points close to the diagonal line.]
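The two diagnostics can be reproduced in base R for any fitted linear model. The sketch below uses simulated data and a plain fixed-effects model as a stand-in for the model fitted by MSstats; the numbers it prints are numerical counterparts of what the two plots show.

```r
set.seed(2)
# Simulated log2-intensities for one protein:
# 6 features x 2 conditions x 3 replicates.
toy <- expand.grid(Feature = paste0("F", 1:6),
                   Condition = c("T1", "T2"),
                   Rep = 1:3)
toy$Log2Int <- 12 + as.numeric(toy$Feature) +
  ifelse(toy$Condition == "T2", 1.5, 0) + rnorm(nrow(toy), sd = 0.3)

# A simple fixed-effects model stands in for the MSstats model here.
fit <- lm(Log2Int ~ Feature + Condition, data = toy)
res <- residuals(fit)
pred <- fitted(fit)

# Residual plot counterpart: spread of the residuals per feature.
# Similar values support the equal variance assumption.
spread_by_feature <- tapply(res, toy$Feature, sd)
print(round(spread_by_feature, 2))

# Q-Q plot counterpart: correlation between theoretical and sample
# quantiles. Values near 1 mean points close to the diagonal line.
qq <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)
print(round(qq_cor, 3))
```

To draw the plots themselves, `plot(pred, res)` with `abline(h = 0)` gives the residual plot, and `qqnorm(res)` with `qqline(res)` gives the Q-Q plot.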
7. Visualization for testing results
• Input : result of group comparison
> groupComparisonPlots(data=resultMultiComparisons, type="VolcanoPlot", FCcutoff=1.5)
Let’s try!
[Figure: volcano plot for the comparison T2-T1 (x-axis: log2 fold change; y-axis: -log2(adjusted p-value); points colored as no regulation, down-regulated, or up-regulated), with the adjusted p-value cutoff (0.05) and the fold change cutoff (1.5).]
[Figure: heatmap of all proteins (rows) by comparison (columns T2-T1 and T3-T1), with a color key for the signed adjusted p-value.]
In R : use R script
Different scope of conclusion
Models compared, on a scale from less sensitive / more specific to more sensitive / less specific :
– Fixed Run, Random Subject
– Random Run, Random Subject
– Fixed Run, Fixed Subject
– Random Run, Fixed Subject
• The choice of the model should depend on the desired scope of biological conclusions, and not on the sensitivity/specificity.
[Figure: three volcano plots for the comparison T3-T1 under different model scopes (x-axis: log2 fold change; y-axis: -log2(adjusted p-value); points colored as no regulation, down-regulated, or up-regulated), each with the adjusted p-value cutoff (0.05) and the fold change cutoff (1.5). The y-axis extends to 30 in the first plot and to 15 in the other two.]
In R
8. Design of future experiment
In R : use R script
• Input : processed data
> designSampleSize(data=quantData, numSample=TRUE, numPep=3, numTran=3, desiredFC=c(1.1,1.5), FDR=0.05, power=0.8)
[Figure: minimal number of biological replicates (y-axis) versus desired fold change (x-axis, 1.1 to 1.5; secondary axis: coefficient of variation, CV, 0.001 to 0.018). Number of peptides is 3; number of transitions is 4; FDR is 0.05; statistical power is 0.8.]
• Can consider
– Scope of subject : restricted or expanded
– Interference or not
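The relation between desired fold change and required sample size can be illustrated with `power.t.test` from base R. This is a swapped-in, simpler calculation (a two-sample t-test on the log2 scale at a fixed significance level), not the calculation done by `designSampleSize`: it ignores the FDR and the numbers of peptides and transitions, and the standard deviation is a made-up value.

```r
# Assumed standard deviation of protein-level log2 abundance between
# subjects (made up; in practice estimated from the current dataset).
sd_log2 <- 0.2
desired_fc <- seq(1.1, 1.5, by = 0.05)

# Subjects per group needed to detect each fold change with 80% power
# in a two-sample t-test at a 5% significance level.
n_per_group <- sapply(desired_fc, function(fc) {
  ceiling(power.t.test(delta = log2(fc), sd = sd_log2,
                       sig.level = 0.05, power = 0.8)$n)
})
print(data.frame(desired_fc, n_per_group))
```

As in the figure, smaller fold changes require more biological replicates.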
Design of future experiment : Power
Let’s try!
[Figure: power (y-axis, 0 to 1) versus desired fold change (x-axis, 1.1 to 1.7). Number of replicates is 3; number of peptides is 3; number of transitions is 3; FDR is 0.05.]
• Input : processed data
> designSampleSize(data=quantData, numSample=3, numPep=3, numTran=3, desiredFC=c(1.1,1.7), FDR=0.05, power=TRUE)
In R : use R script
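The reverse question, how much power a fixed number of replicates gives, can be illustrated the same way with `power.t.test` from base R. Again this is a simpler two-sample t-test calculation with a made-up standard deviation, not the method used by `designSampleSize`.

```r
sd_log2 <- 0.2        # assumed SD on the log2 scale (made up)
n_replicates <- 3     # biological replicates per group
desired_fc <- seq(1.1, 1.7, by = 0.1)

# Power of a two-sample t-test at a 5% significance level for each
# desired fold change, with the number of replicates held fixed.
achieved_power <- sapply(desired_fc, function(fc) {
  power.t.test(n = n_replicates, delta = log2(fc), sd = sd_log2,
               sig.level = 0.05)$power
})
print(round(data.frame(desired_fc, achieved_power), 3))
```

The power increases with the desired fold change, which is the shape of the curve in the figure.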
Different sample size calculation for different conclusions and experiments
In R
[Figure: minimal number of biological replicates (y-axis, 10 to 70) versus desired fold change (x-axis, 1.1 to 1.5), with one curve for labeled SRM with fixed Subject and one curve for labeled SRM with random Subject. Number of peptides is 3; number of transitions is 3; FDR is 0.05; statistical power is 0.8.]
The model with random Subject needs more biological replicates.
Future plan
MSstats
• Several statistical analyses in the prototype stage will be merged as different workflows in MSstats
– sparseStats : with a subset of reference peptides
– Feature selection : select a subset of informative fragments
– Biomarker study : classification study, finding the best biomarker candidates, clustering, measuring the performance
• Improve the UI and add more options for MSstats in Skyline