Proteomics Informatics –
description
Transcript of Proteomics Informatics –
Proteomics Informatics – Protein characterization: post-translational
modifications and protein-protein interactions (Week 10)
Top down / bottom up
Top down
Bottom up
mass/charge
inte
nsity
Top down Bottom up
Charge distribution
mass/chargein
tens
itymass/charge
inte
nsity
1+
2+
3+
4+
27+
31+
Top down Bottom upm = 1035 Da m = 1878 Da m = 2234 Da
Isotope distribution
mass/chargein
tens
itymass/charge
inte
nsity
Fragmentation
Top down Bottom up
Fragmentation
Correlations between modifications
Top down
Bottom up
Alternative Splicing
Top down
Bottom up
Exon 1 2 3
Top down
Kellie et al., Molecular BioSystems 2010
Proteinmass
spectraFragment
mass spectra
Protein Complexes
AB
AC
D
Digestion
Mass spectrometry
Sowa et al., Cell 2009
Protein Complexes – specific/non-specific binding
Protein Complexes – specific/non-specific binding
Choi et al., Nature Methods 2010
Tackett et al. JPR 2005
Protein Complexes – specific/non-specific binding
Analysis of Non-Covalent Protein Complexes
Taverner et al., Acc Chem Res 2008
Non-Covalent Protein Complexes
Schreiber et al., Nature 2011
More / better quality interactions
Affinity Capture Optimization Screen
+
Cell extraction
Lysate clearance/Batch Binding
Binding/Washing/Eluting
SDS-PAGE
Filtration
LaCava, Hakhverdyan, Domanski, Rout
Over 20 different extraction and washing conditions ~ 10 years or art.(41 pullouts are shown)
Molecular Architecture of the NPC
Actual model Alber F. et al. Nature (450) 683-694. 2007 Alber F. et al. Nature (450) 695-700. 2007
Cloning nanobodies for GFP pullouts
• Atypical heavy chain-only IgG antibody produced in camelid family – retain high affinity for antigen without light chain
• Aimed to clone individual single-domain VHH antibodies against GFP – only ~15 kDa, can be recombinantly expressed, used as bait for pullouts, etc.
• To identify full repertoire, will identify GFP binders through combination of high-throughput DNA sequencing and mass spectrometry
VHH clone for recombinant expression
Cloning llamabodies for GFP pullouts
Llama GFP immunization
Lymphocytetotal RNA
Crude serum
VHH amplicon
454 DNA sequencing
RT / Nested PCRIgG fractionation &
GFP affinity purification
VHH DNA sequence library
GFP-specific VHH fraction
LC-MS/MS
GFP-specific VHH clones
Bone marrow aspiration Serum bleed
500400300
1000 bp
0
100,000
200,000
300,000
400,000
500,000
No.
of R
eads
Read length (bp)
VH
VHH
Fridy, Li, Keegan, Chait, Rout
CDR3: 100.0% (14/14); combined CDR: 100.0% (33/33); DNA count: 10MAQVQLVESGGGLVQAGGSLRLSCVASGRTFSGYAMGWFRQTPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33; DNA count: 1MADVQLVESGGGLVQSGGSRTLSCAASGRVLATYHLGWFRQSPGREREAVAAITWSAHSTYYSDSVKGRFTISIDNARNTGYLQMNSLKPEDTAVYYCTVRHGTWFTVSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33); DNA count: 1MAQVQLVESGGALVQAGASLSVSCAASGGTISKYNMAWFRRAPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1MAQVQLEESGGGLVQAGDSLTLSCSASGRTFTNYAMAWSRQAPGKERELLAAIDAAGGATYYSDSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1MAQVQLVESGGGRVQAGGSLTLSCVGSEGIFWNHVMGWFRQSPGKDREFVARISKIGGTTNYADSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR1 CDR2 CDR3
Underlined regions are covered by MS
Rank sequences according to:CDR3 coverage; Overall coverage;Combined CDR coverage; DNA counts;
Identifying full-length sequences from peptides
Sequence diversity of 26 verified anti-GFP nanobodies
• Of ~200 positive sequence hits, 44 high confidence clones were synthesized and tested for expression and GFP binding: 26 were confirmed GFP binders.
• Sequences have characteristic conserved VHH residues, but significant diversity in CDR regions.
FR1 CDR1 FR2 CDR2 CDR3FR3 FR4
HIV-1
gp120Lipid Bilayer gp41
MA
CA
NC
PRIN
RT
RNA
Particle
Genome
env
rev
vpu
tat
nef
3’ LTR5’ LTR
vif gagpol
vpr
CAMA NC p6
PR RT IN
gp41gp120
9,200 nucleotides
Genetic-Proteomic Approach
Tagged Viral Protein
Tag
Protein ComplexSDS-PAGE
*
Mass Spectrometry
I-Dirt for Specific Interaction
3xFLAG Tagged HIV-1 WT HIV-1
Infection
Light Heavy (13C labeled Lys, Arg)
1:1 Mix
Immunoisolation
MS
I-DIRT = Isotopic Differentiation of Interactions as Random or Targeted
Lys Arg(+6 daltons)(+6 daltons)
Modified from Tackett AJ et al., J Proteome Res. (2005) 4, 1752-6.
IDIRT and Reverse IDIRT
0.40
0.50
0.60
0.70
0.80
0.90
1.00
0.40 0.50 0.60 0.70 0.80 0.90 1.00
Spec
ifici
ty, R
erve
rse
Specificity, Forward
gp160 IDIRT: Forward-Reverse Ratio Comparison
Env-3xFLAG Vif-3xFLAG
Luo, Jacobs, Greco, Cristae, Muesing, Chait, Rout
Protein Exchange
Vif-3F
Heavy labeled Vif-3F lysate
IP in heavy labeled Vif-3F lysate
Vif-3F
Light labeled wt lysate
Incubation with light labeled wt lysate
Vif-3F
15min
Vif-3F
5min
Stable Interactor
Vif-3F
Interactor with fast exchange
60min
Env Time Course SILAC
• Differentially labeled infection harvested at early or late stage of infection
• Distinguish proteins that interact with Env at early or late stage during infection
Early during infection Late during infection
LightHeavy (13C labeled Lys, Arg)
1:1 Mix
Immunoisolation
MS
Early interactor Late interactor
M/Z
PeptidesFragments
Fragmentation
ProteolyticPeptides
Enzymatic Digestion
ProteinComplex
Chemical Cross-Linking
MS
MS/MS
Isolation
Cross-LinkedProtein Complex
Interaction Partners by Chemical Cross-Linking
M/Z
PeptidesFragments
Fragmentation
ProteolyticPeptides
Enzymatic Digestion
ProteinComplex
Chemical Cross-Linking
MS
MS/MS
Isolation
Cross-LinkedProtein Complex
Interaction Sites by Chemical Cross-Linking
Cross-linking
protein
n peptides with reactive groups
(n-1)n/2 potential ways to cross-link peptides pairwise+ many additional uninformative forms
Protein A + IgG heavy chain 990 possible peptide pairs
Yeast NPC ˜106 possible peptide pairs
Protein Crosslinking by Formaldehyde
~1% w/v Fal20 – 60 min
~0.3% w/v Fal5 – 20 min1/100 the volume
LaCava
Protein Crosslinking by Formaldehyde
RED: triplicate experiments, FAl treated grindateBLACK: duplicated experiments, FAl treated cells (then ground)
SCORE: Log Ion Current / Log protein abundance Akgöl, LaCava, Rout
Cross-linkingMass spectrometers have a limited dynamic range and it therefore important to limit the number of possible reactions not to dilute the cross-linked peptides.
For identification of a cross-linked peptide pair, both peptides have to be sufficiently long and required to give informative fragmentation.
High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from two peptides.
Because the cross-linked peptides are often large, CAD is not ideal, but instead ETD is recommended.
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25Number of fragment ions
Pro
babi
lity
of L
ocal
izat
ion
Phosphopeptide identification
mprecursor = 2000 DaDmprecursor = 1 DaDmfragment = 0.5 DaPhosphorylation
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Prob
abili
ty o
f Loc
aliz
atio
n
Number of fragment ions
ID3
Localization (dmin=3)
mprecursor = 2000 DaDmprecursor = 1 DaDmfragment = 0.5 DaPhosphorylation
dmin>=3 for 47% of human tryptic peptides
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Prob
abili
ty o
f Loc
aliz
atio
n
Number of fragment ions
ID32
Localization (dmin=2)
mprecursor = 2000 DaDmprecursor = 1 DaDmfragment = 0.5 DaPhosphorylation
dmin=2 for 33% of human tryptic peptides
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Prob
abili
ty o
f Loc
aliz
atio
n
Number of fragment ions
ID321
Localization (dmin=1)
mprecursor = 2000 DaDmprecursor = 1 DaDmfragment = 0.5 DaPhosphorylation
dmin=1 for 20% of human tryptic peptides
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Prob
abili
ty o
f Loc
aliz
atio
n
Number of fragment ions
ID3211*
Localization(d=1*)
mprecursor = 2000 DaDmprecursor = 1 DaDmfragment = 0.5 DaPhosphorylation
Localization of modifications
Peptide with two possible modification sites
Localization of modifications
Peptide with two possible modification sites
MS/MS spectrum
m/z
Inte
nsity
Localization of modifications
Peptide with two possible modification sites
MS/MS spectrum
m/z
Inte
nsity
Matching
Localization of modifications
Peptide with two possible modification sites
MS/MS spectrum
m/z
Inte
nsity
Matching
Which assignment doesthe data support?
1, 1 or 2, or 1 and 2?
Localization of modifications
AAYYQK
Visualization of evidence for localization
AAYYQK
Visualization of evidence for localization
AAYYQK
AAYYQK
Visualization of evidence for localization
3
2
1
3
2
1
Estimation of global false localization rate using decoy sites
By counting how many times the phosphorylation is localized to amino acids that can not be phosphorylated we can estimate the false localization rate as a function of amino acid frequency.
0
0.005
0.01
0.015
0.02
0 0.05 0.1 0.15
0
0.005
0.01
0.015
0.02
0 0.05 0.1 0.15
Amino acid frequency
Fals
e lo
caliz
atio
n fr
eque
ncy
Y
S21
Sm1
How much can we trust a single localization assignment?
If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.
SS mm21
0
2
1
21
2
0
2
1
21
2
2
1
1
dSSFdSSFp
S m
)(
)(
1.
2.
Is it a mixture or not?If we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.
S12
Sm2
SS mm21
0
12
12
1
0
12
12
11
2)(
)(2
dSSF
dSSFp
Sm
1.
2.
ppppthth
and1
2
2
1 1 and 2 pppp
ththand
1
2
2
1 1 pppp
ththand
1
2
2
1
ppppthth
and1
2
2
1 1 or 2Ø )( ppSS mm
1
2
2
121
Peptide with two possible modification sites
MS/MS spectrum
m/zIn
tens
ity
Matching
Which assignment doesthe data support?
1, 1 or 2, or 1 and 2?
Localization of modifications
Proteomics Informatics – Protein characterization: post-translational
modifications and protein-protein interactions (Week 10)