Proteomics Informatics –

Post on 24-Feb-2016


Proteomics Informatics – Protein characterization: post-translational modifications and protein-protein interactions (Week 10)

Top down / bottom up

[Figure: top-down and bottom-up spectra; axes: mass/charge vs. intensity]

Charge distribution

[Figure: top-down vs. bottom-up spectra; axes: mass/charge vs. intensity; charge-state labels: 1+, 2+, 3+, 4+, 27+, 31+]
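The charge-state labels in this figure can be related to peak positions with the usual relation m/z = (M + z·m_p)/z for positive-mode protonated ions. The sketch below is only an illustration under that assumption: the function names are hypothetical, the 1035 Da mass is borrowed from the isotope-distribution slide, and the charge states 1+ to 4+ are the ones labeled in the figure.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (assumed charge carrier)

def mz(neutral_mass: float, charge: int) -> float:
    """Return m/z of an ion with `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def charge_series(neutral_mass: float, charges) -> dict:
    """Map each charge state to its m/z for one neutral mass."""
    return {z: mz(neutral_mass, z) for z in charges}

# Peptide-sized mass taken from the slides, charge states 1+ to 4+.
SERIES = charge_series(1035.0, range(1, 5))
for z, value in SERIES.items():
    print(f"{z}+  m/z = {value:.3f}")
```

Higher charge states move the same mass to lower m/z, which is why a highly charged intact protein and a low-charge peptide can appear in a similar m/z window.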

Isotope distribution

[Figure: top-down vs. bottom-up isotope distributions; axes: mass/charge vs. intensity; panels labeled m = 1035 Da, m = 1878 Da, m = 2234 Da]

Fragmentation

[Figure: fragmentation, top down vs. bottom up]

Correlations between modifications

[Figure: top down vs. bottom up]

Alternative Splicing

[Figure: top down vs. bottom up; exons 1, 2, 3]

Top down

[Figure: protein mass spectra and fragment mass spectra]

Kellie et al., Molecular BioSystems 2010

Protein Complexes

[Figure: complexes labeled AB, AC and D; digestion followed by mass spectrometry]

Sowa et al., Cell 2009

Protein Complexes – specific/non-specific binding

Choi et al., Nature Methods 2010

Tackett et al., JPR 2005

Analysis of Non-Covalent Protein Complexes

Taverner et al., Acc Chem Res 2008

Non-Covalent Protein Complexes

Schreiber et al., Nature 2011

More / better quality interactions

Affinity Capture Optimization Screen

[Figure: workflow; labels: Cell extraction, Lysate clearance/Batch Binding, Binding/Washing/Eluting, SDS-PAGE, Filtration]

LaCava, Hakhverdyan, Domanski, Rout

Over 20 different extraction and washing conditions, ~10 years of art (41 pullouts are shown)

Molecular Architecture of the NPC

[Figure: actual model] Alber F. et al., Nature 450, 683-694 (2007); Alber F. et al., Nature 450, 695-700 (2007)

Cloning nanobodies for GFP pullouts

• Camelids produce atypical heavy chain-only IgG antibodies that retain high affinity for antigen without a light chain

• Aim: clone individual single-domain VHH antibodies against GFP – only ~15 kDa, can be recombinantly expressed, used as bait for pullouts, etc.

• To capture the full repertoire, GFP binders are identified by combining high-throughput DNA sequencing with mass spectrometry

Cloning llamabodies for GFP pullouts

• Llama GFP immunization
• Bone marrow aspiration → lymphocyte total RNA → RT / nested PCR → VHH amplicon → 454 DNA sequencing → VHH DNA sequence library
• Serum bleed → crude serum → IgG fractionation & GFP affinity purification → GFP-specific VHH fraction → LC-MS/MS
• VHH DNA sequence library + LC-MS/MS → GFP-specific VHH clones → VHH clone for recombinant expression

[Figure: gel with size markers at 300, 400, 500 and 1000 bp; histogram of No. of reads (0 to 500,000) vs. read length (bp); labels: VH, VHH]

Fridy, Li, Keegan, Chait, Rout

CDR3: 100.0% (14/14); combined CDR: 100.0% (33/33); DNA count: 10
MAQVQLVESGGGLVQAGGSLRLSCVASGRTFSGYAMGWFRQTPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33); DNA count: 1
MADVQLVESGGGLVQSGGSRTLSCAASGRVLATYHLGWFRQSPGREREAVAAITWSAHSTYYSDSVKGRFTISIDNARNTGYLQMNSLKPEDTAVYYCTVRHGTWFTVSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33); DNA count: 1
MAQVQLVESGGALVQAGASLSVSCAASGGTISKYNMAWFRRAPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1
MAQVQLEESGGGLVQAGDSLTLSCSASGRTFTNYAMAWSRQAPGKERELLAAIDAAGGATYYSDSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1
MAQVQLVESGGGRVQAGGSLTLSCVGSEGIFWNHVMGWFRQSPGKDREFVARISKIGGTTNYADSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR1 CDR2 CDR3

Underlined regions are covered by MS

Rank sequences according to: CDR3 coverage; overall coverage; combined CDR coverage; DNA counts.
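One way to read the ranking criteria is as a lexicographic sort, which the sketch below illustrates. The priority order, the `Candidate` type and the `seq1` to `seq5` names are assumptions made for illustration. The coverage fractions and DNA counts are the ones listed for the five sequences above. Overall coverage is left out because the transcript does not give its values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str                      # hypothetical label, not from the slides
    cdr3_coverage: float           # fraction of CDR3 residues covered by MS
    combined_cdr_coverage: float   # fraction of all CDR residues covered by MS
    dna_count: int                 # number of matching DNA reads

def rank_candidates(candidates):
    """Sort best-first: CDR3 coverage, then combined CDR coverage, then DNA count."""
    return sorted(
        candidates,
        key=lambda c: (c.cdr3_coverage, c.combined_cdr_coverage, c.dna_count),
        reverse=True,
    )

# Values copied from the five example sequences, deliberately shuffled.
CANDIDATES = [
    Candidate("seq5", 14 / 14, 14 / 33, 1),
    Candidate("seq3", 14 / 14, 24 / 33, 1),
    Candidate("seq1", 14 / 14, 33 / 33, 10),
    Candidate("seq4", 14 / 14, 14 / 33, 1),
    Candidate("seq2", 14 / 14, 24 / 33, 1),
]

RANKED = rank_candidates(CANDIDATES)
for c in RANKED:
    print(c.name, f"{c.combined_cdr_coverage:.1%}", c.dna_count)
```

Ties (for example the two sequences with 24/33 combined CDR coverage) keep their input order, so an additional criterion such as overall coverage would be needed to separate them.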

Identifying full-length sequences from peptides

Sequence diversity of 26 verified anti-GFP nanobodies

• Of ~200 positive sequence hits, 44 high confidence clones were synthesized and tested for expression and GFP binding: 26 were confirmed GFP binders.

• Sequences have characteristic conserved VHH residues, but significant diversity in CDR regions.

[Figure: sequence alignment; region labels: FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4]

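HIV-1

[Figure: HIV-1 particle; labels: gp120, gp41, lipid bilayer, MA, CA, NC, PR, IN, RT, RNA]

[Figure: HIV-1 genome, 9,200 nucleotides; labels: 5’ LTR, gag, pol, vif, vpr, tat, rev, vpu, env, nef, 3’ LTR; protein products: MA, CA, NC, p6; PR, RT, IN; gp120, gp41]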

Genetic-Proteomic Approach

[Figure: workflow; labels: Tagged Viral Protein, Tag, Protein Complex, SDS-PAGE, Mass Spectrometry]

I-DIRT for Specific Interaction

I-DIRT = Isotopic Differentiation of Interactions as Random or Targeted

• Infection: 3xFLAG-tagged HIV-1 (light) and WT HIV-1 (heavy: 13C labeled Lys (+6 daltons) and Arg (+6 daltons))
• 1:1 Mix
• Immunoisolation
• MS

Modified from Tackett AJ et al., J Proteome Res. (2005) 4, 1752-6.
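The light/heavy readout of an I-DIRT experiment can be summarized per protein as a light fraction. This is a minimal sketch under stated assumptions: specificity is taken to be light / (light + heavy), the tagged sample is assumed to be the light one, and the 0.8 cutoff is a hypothetical value chosen only for illustration. With a 1:1 mix, a protein that binds after mixing would be expected near 0.5, while one bound before mixing would be expected near 1.0.

```python
def specificity(light: float, heavy: float) -> float:
    """Light fraction of the summed light and heavy intensities."""
    total = light + heavy
    if total <= 0:
        raise ValueError("need a positive summed intensity")
    return light / total

def classify(light: float, heavy: float, threshold: float = 0.8) -> str:
    """Label a protein by comparing its light fraction to a cutoff (assumed value)."""
    return "specific" if specificity(light, heavy) >= threshold else "non-specific"

# Made-up intensities for two proteins.
print(specificity(9.0, 1.0), classify(9.0, 1.0))
print(specificity(1.0, 1.0), classify(1.0, 1.0))
```

In a reverse experiment the labels are swapped, so the same function would be applied with the heavy and light intensities exchanged.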

IDIRT and Reverse IDIRT

[Figure: scatter plot; x-axis: Specificity, Forward (0.40 to 1.00); y-axis: Specificity, Reverse (0.40 to 1.00)]

gp160 IDIRT: Forward-Reverse Ratio Comparison

Env-3xFLAG Vif-3xFLAG

Luo, Jacobs, Greco, Cristea, Muesing, Chait, Rout

Protein Exchange

• IP of Vif-3F in heavy labeled Vif-3F lysate
• Incubation with light labeled wt lysate for 5 min, 15 min and 60 min
• Readout: stable interactor vs. interactor with fast exchange

Env Time Course SILAC

• Differentially labeled infections are harvested at an early or a late stage of infection

• This distinguishes proteins that interact with Env early during infection from those that interact late

[Figure: workflow; early during infection (light) and late during infection (heavy, 13C labeled Lys, Arg) → 1:1 mix → immunoisolation → MS; example spectra for an early interactor and a late interactor]

Interaction Partners by Chemical Cross-Linking

[Figure: workflow; labels: Protein Complex, Chemical Cross-Linking, Cross-Linked Protein Complex, Enzymatic Digestion, Proteolytic Peptides, Isolation, MS, Peptides, Fragmentation, Fragments, MS/MS; spectra plotted against M/Z]

Interaction Sites by Chemical Cross-Linking

[Figure: workflow; labels: Protein Complex, Chemical Cross-Linking, Cross-Linked Protein Complex, Enzymatic Digestion, Proteolytic Peptides, Isolation, MS, Peptides, Fragmentation, Fragments, MS/MS; spectra plotted against M/Z]

Cross-linking

A protein with n peptides with reactive groups has (n-1)n/2 potential ways to cross-link peptides pairwise, plus many additional uninformative forms.

• Protein A + IgG heavy chain: 990 possible peptide pairs
• Yeast NPC: ~10^6 possible peptide pairs
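The pair count grows quadratically with the number of reactive peptides, which a few lines of arithmetic make concrete. The function names below are hypothetical. The only inputs taken from the slide are the formula (n-1)n/2 and the two pair counts. Under that formula, 990 pairs corresponds to n = 45 reactive peptides.

```python
import math

def peptide_pairs(n: int) -> int:
    """Number of distinct pairs among n peptides: (n-1)n/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2

def peptides_for_pairs(pairs: int) -> int:
    """Smallest n whose pair count reaches at least `pairs`."""
    n = (1 + math.isqrt(1 + 8 * pairs)) // 2
    while peptide_pairs(n) < pairs:
        n += 1
    return n

print(peptide_pairs(45))            # pair count for 45 reactive peptides
print(peptides_for_pairs(990))      # peptides implied by 990 pairs
print(peptides_for_pairs(10**6))    # peptides implied by about 10^6 pairs
```

Going from 990 to roughly 10^6 pairs needs only about a 30-fold increase in reactive peptides, which is why the search space for a large assembly such as the NPC becomes so much harder to cover.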

Protein Crosslinking by Formaldehyde

• ~1% w/v FAl, 20 – 60 min
• ~0.3% w/v FAl, 5 – 20 min, 1/100 the volume

LaCava

Protein Crosslinking by Formaldehyde

RED: triplicate experiments, FAl treated grindate

BLACK: duplicate experiments, FAl treated cells (then ground)

SCORE: Log Ion Current / Log protein abundance

Akgöl, LaCava, Rout

Cross-linking

Mass spectrometers have a limited dynamic range, so it is important to limit the number of possible reactions in order not to dilute the cross-linked peptides.

For identification of a cross-linked peptide pair, both peptides have to be sufficiently long and both must give informative fragmentation.

High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from two peptides.

Because cross-linked peptides are often large, CAD is not ideal; ETD is recommended instead.

Phosphopeptide identification

[Figure: probability of localization (0 to 1.2) vs. number of fragment ions (0 to 25); m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da; modification: phosphorylation]

Localization of modifications

[Figure: probability of localization (0 to 1.2) vs. number of fragment ions (0 to 25); curves: ID, Localization (dmin=3); m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da; modification: phosphorylation]

dmin>=3 for 47% of human tryptic peptides

Localization of modifications

[Figure: probability of localization (0 to 1.2) vs. number of fragment ions (0 to 25); curves: ID, 3, Localization (dmin=2); m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da; modification: phosphorylation]

dmin=2 for 33% of human tryptic peptides

Localization of modifications

[Figure: probability of localization (0 to 1.2) vs. number of fragment ions (0 to 25); curves: ID, 3, 2, Localization (dmin=1); m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da; modification: phosphorylation]

dmin=1 for 20% of human tryptic peptides

Localization of modifications

[Figure: probability of localization (0 to 1.2) vs. number of fragment ions (0 to 25); curves: ID, 3, 2, 1, Localization (d=1*); m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da; modification: phosphorylation]

Localization of modifications

[Figure: a peptide with two possible modification sites is matched against the MS/MS spectrum (intensity vs. m/z)]

Which assignment does the data support?

1, 1 or 2, or 1 and 2?

Visualization of evidence for localization

[Figure: peptide AAYYQK shown for each candidate assignment, with fragment positions labeled 3, 2, 1 in both directions]
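For a peptide such as AAYYQK, only the fragment ions that separate the two candidate sites carry evidence for localization. The sketch below is a hedged illustration rather than the method used in the slides. It assumes phosphorylation (+79.96633 Da) on Y3 or Y4, singly charged b and y ions, standard monoisotopic residue masses, and a 0.5 Da fragment tolerance borrowed from the simulation parameters. All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE_MASS = {"A": 71.03711, "Y": 163.06333, "Q": 128.05858, "K": 128.09496}
WATER, PROTON, PHOSPHO = 18.01056, 1.00728, 79.96633

def fragment_ions(peptide: str, site: int) -> dict:
    """Singly charged b and y ion m/z values with a phosphate on residue `site` (1-based)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    masses[site - 1] += PHOSPHO
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON               # N-terminal fragment
        ions[f"y{i}"] = sum(masses[n - i:]) + WATER + PROTON   # C-terminal fragment
    return ions

def site_determining_ions(peptide: str, site_a: int, site_b: int, tol: float = 0.5) -> list:
    """Names of ions whose m/z differs by more than `tol` between the two assignments."""
    ions_a = fragment_ions(peptide, site_a)
    ions_b = fragment_ions(peptide, site_b)
    return sorted(name for name in ions_a if abs(ions_a[name] - ions_b[name]) > tol)

SITE_IONS = site_determining_ions("AAYYQK", 3, 4)
print(SITE_IONS)
```

With adjacent candidate sites, only the single b ion and the single y ion that cleave between the two tyrosines distinguish the assignments, so localization rests on very few peaks.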

Estimation of global false localization rate using decoy sites

By counting how many times the phosphorylation is localized to amino acids that cannot be phosphorylated, we can estimate the false localization rate as a function of amino acid frequency.

[Figure: two panels of false localization frequency (0 to 0.02) vs. amino acid frequency (0 to 0.15); point label: Y]
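The decoy-site idea can be sketched as a simple count. This is an illustration under stated assumptions, not the analysis behind the plot: S, T and Y are taken as the only phosphorylatable residues, localization is assumed to have been run with every residue allowed as a candidate site, and the straight-line fit through the origin is a choice made here for simplicity. The localization calls and data points are made up.

```python
from collections import Counter

# Residues treated as real acceptor sites; every other residue is a decoy site.
ACCEPTORS = frozenset("STY")

def decoy_localization_frequencies(assigned_residues) -> dict:
    """Fraction of all localizations that landed on each non-phosphorylatable residue."""
    counts = Counter(assigned_residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no localizations given")
    return {aa: n / total for aa, n in counts.items() if aa not in ACCEPTORS}

def slope_through_origin(points) -> float:
    """Least-squares slope k of y = k * x for (amino acid frequency, false localization frequency)."""
    denominator = sum(x * x for x, _ in points)
    if denominator == 0:
        raise ValueError("need at least one point with non-zero frequency")
    return sum(x * y for x, y in points) / denominator

# Made-up calls: the residue each phosphate was localized to.
CALLS = ["S"] * 90 + ["T"] * 6 + ["A"] * 2 + ["L"] * 2
DECOY_FREQ = decoy_localization_frequencies(CALLS)
print(DECOY_FREQ)

# Made-up (amino acid frequency, false localization frequency) points.
print(slope_through_origin([(0.08, 0.02), (0.10, 0.025)]))
```

Multiplying the fitted slope by the frequency of a real acceptor residue would then give a rough estimate of how often that residue is picked by chance.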

How much can we trust a single localization assignment?

If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

[Figure: score distribution; labels: S21, Sm1]

[Equation: p is given as a ratio of two integrals of the form ∫ F(S) dS over the score distribution, evaluated with the measured score S^m; the limits and indices are not recoverable from the transcript]
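If a null distribution of scores is available, the probability of reaching the measured score by chance can be estimated empirically. The sketch below assumes that higher scores are better and that the null scores were produced by some simulation the transcript does not specify. The function name and the add-one correction are choices made here, not taken from the slides.

```python
def empirical_p_value(measured_score: float, null_scores) -> float:
    """Estimated probability that a null score is at least as high as the measured one."""
    null_scores = list(null_scores)
    if not null_scores:
        raise ValueError("need at least one null score")
    at_least_as_high = sum(1 for s in null_scores if s >= measured_score)
    # Add-one correction keeps the estimate above zero for finite samples.
    return (at_least_as_high + 1) / (len(null_scores) + 1)

# Made-up null scores for the wrong assignment, and two measured scores.
NULL = [1.0, 2.0, 3.0, 4.0]
print(empirical_p_value(5.0, NULL))
print(empirical_p_value(0.0, NULL))
```

A small value means the measured score for that assignment would rarely arise if the other site were the true one.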

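Is it a mixture or not?

If we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

[Figure: score distribution; labels: S12, Sm2]

[Equation: p is given as a ratio of two integrals of the form ∫ F(S) dS over the score distribution, evaluated with the measured score S^m; the limits and indices are not recoverable from the transcript]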

[Table: the probabilities for the two assignments are each compared with a threshold p_th; the listed outcomes are "1 and 2", "1", "1 or 2" and "Ø"; the exact conditions are not recoverable from the transcript]

