Goals in Proteomics 1.Identify and quantify proteins in complex mixtures/complexes 2.Identify global...

Page 1

Goals in Proteomics

1. Identify and quantify proteins in complex mixtures/complexes

2. Identify global protein-protein interactions

3. Define protein localizations within cells

4. Measure and characterize post-translational modifications

5. Measure and characterize activity (e.g. substrate specificity, etc.)

Page 2

Goals in Proteomics

1. Identify and quantify proteins in complex mixtures/complexes: MS and MS/MS

2. Identify global protein-protein interactions: MS and MS/MS, Y2H

3. Define protein localizations within cells: high-throughput microscopy, organelle pull down

4. Measure and characterize post-translational modifications: MS techniques

5. Measure and characterize activity (e.g. substrate specificity, etc.): protein arrays

Page 3

Coon et al. 2005

Basic overview of tandem mass spectrometry (MS/MS)

Page 4

Intro to Mass Spec (MS)

Separate and identify peptide fragments by their mass and charge (m/z ratio)

[Figure: mass spectrometer schematic: ion source, mass analyzer, detector, MS spectrum]

Basic principles:

1. Ionize (i.e. charge) peptide fragments

2. Separate ions by mass/charge (m/z) ratio

3. Detect ions of different m/z ratio

4. Compare to database of predicted m/z fragments for each genome
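To make the m/z idea concrete, here is a minimal sketch of how the m/z of an unmodified peptide could be computed in positive mode. The function names (`peptide_mass`, `mz`) are hypothetical; the residue masses are standard monoisotopic values rounded to five decimals, and the sketch assumes each charge is carried by one added proton.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O accounts for the free N- and C-termini
PROTON = 1.00728   # mass of one proton (assumed charge carrier)


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    # The same peptide appears at a different m/z for each charge state.
    for z in (1, 2, 3):
        print(f"PEPTIDE z={z}: m/z = {mz('PEPTIDE', z):.4f}")
```

The same peptide shows up at different positions in the spectrum depending on its charge state, which is why the instrument reports m/z rather than mass alone.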

Page 5

Intro to Mass Spec (MS)

Separate and identify peptide fragments by their mass and charge (m/z ratio)

1. Ionization

Goal: ionize (i.e. charge) peptide fragments without destroying the molecule

http://www.colorado.edu/chemistry/chem5181/MS_ESI_Gilman_Mashburn.pdf

Positive ionization (protonates amine groups): especially useful for trypsinized proteins (cleaved after R and K)

vs. negative ionization (deprotonates carboxylic acids and alcohols)

Page 6

Liquid chromatography + Electrospray ionization

[Figure: electrospray ionization in an electric field]

* Commonly used with liquid solutions, more sensitive to contaminants, used for complex mixtures

Page 7

Liquid chromatography + Electrospray ionization

[Figure: electrospray ionization in an electric field]

* Commonly used with liquid solutions, more sensitive to contaminants, used for complex mixtures

MALDI (matrix-assisted laser desorption/ionization)

* Less sensitive to contaminants, more common for less complex mixtures

http://www.astbury.leeds.ac.uk/facil/MStut/mstutorial.htm

Page 8

Intro to Mass Spec (MS)

Separate and identify peptide fragments by their mass and charge (m/z ratio)

2. Separation of ions based on m/z ratio (mass m over charge z)

Multiple flavors of mass analyzers use different technology

* TOF (‘time of flight’): separates based on velocity (ions given the same accelerating voltage fly faster when their m/z is lower)

* Triple quadrupole: separates based on oscillating electric fields
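The time-of-flight idea follows from basic physics: an ion of mass m and charge z accelerated through a voltage V gains kinetic energy zeV, so its drift time over a tube of length L is t = L·sqrt(m / (2zeV)). The sketch below works this out; the function name, the 20 kV accelerating voltage, and the 1 m tube length are illustrative assumptions, not values from the slides.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms


def flight_time(mass_da, charge, accel_voltage=20_000.0, tube_length=1.0):
    """Seconds for an ion to cross a field-free drift tube.

    accel_voltage (volts) and tube_length (metres) are assumed,
    illustrative instrument settings.
    """
    mass_kg = mass_da * DALTON
    # Energy balance: charge * e * V = 1/2 * m * v**2
    velocity = math.sqrt(2 * charge * E_CHARGE * accel_voltage / mass_kg)
    return tube_length / velocity


if __name__ == "__main__":
    for mass, z in [(1000.0, 1), (2000.0, 1), (2000.0, 2), (4000.0, 1)]:
        t_us = flight_time(mass, z) * 1e6
        print(f"m={mass:.0f} Da, z={z}: {t_us:.2f} microseconds")
```

Flight time scales with the square root of m/z, so two ions with the same m/z (for example 1000 Da at z=1 and 2000 Da at z=2) arrive together and cannot be told apart by this analyzer alone.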

Page 9

Multiple flavors of mass analyzers

Single MS (peptide fingerprinting):
- Identifies m/z of peptide only
- Peptide identified by comparison to a database of predicted m/z of trypsinized proteins

Tandem MS/MS (peptide sequencing):
- Pulls each peptide from the first MS
- Breaks up peptide bonds (in the collision cell)
- Identifies each fragment based on m/z

Page 10

Multiple flavors of mass analyzers … can be hooked together in multiple configurations, e.g. Orbitrap

Page 11

Multiple flavors of mass analyzers

Now multiple types of collision cells:
- CID: collision-induced dissociation
- ETD: electron transfer dissociation
- HCD: higher-energy collisional dissociation

Page 12

Fragmentation happens in a fairly defined way along the peptide backbone

Peptide can fragment along 3 possible bonds … charge stays on either the ‘left’ (N-terminal: a, b, or c) or ‘right’ (C-terminal: x, y, or z) side of the cleavage

Cleavage along the CO-NH bond is most common, generating ‘b’ and ‘y’ ions
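As a rough illustration of the b/y series, the sketch below lists the singly charged b and y ions expected when an unmodified peptide breaks at each CO-NH bond. The function name `fragment_ions` is hypothetical, and the sketch assumes standard monoisotopic masses and one proton per fragment.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(sequence):
    """Return (b_ions, y_ions): singly charged m/z values, shortest first.

    b_ions[i-1] holds the first i residues (N-terminal piece);
    y_ions[i-1] holds the last i residues plus water (C-terminal piece).
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:9.4f}    y{i} = {y_mz:9.4f}")
```

A handy consistency check: each b ion and its complementary y ion (the two halves of one cleavage) sum to the neutral peptide mass plus two protons.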

Page 13

MS spectrum (i.e. peptide ions)

Each peak is a different peptide, separated based on m/z

A single peptide is selected by the instrument for the second MS

Each peak is often surrounded by smaller peaks of similar m/z

Sensitivity of instrument determines resolution

Mann, Nat. Rev. Mol. Cell Biol. 5:699-711

Page 14

Second MS identifies y (or b) ions to read out amino-acid sequence
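A simplified sketch of the read-out: with a complete, singly charged y-ion ladder, the spacing between neighbouring y ions equals the mass of one residue. The names `match_residue` and `read_sequence` and the 0.01 Da tolerance are assumptions for illustration; real spectra have missing and extra peaks that this sketch does not handle. Leucine and isoleucine have identical masses, so both are reported as `L` here.

```python
# Monoisotopic residue masses in daltons; I is omitted because it
# cannot be distinguished from L by mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def match_residue(step, tolerance):
    """Residue whose mass matches the ladder step, or '?' if none does."""
    for aa, mass in RESIDUE_MASS.items():
        if abs(mass - step) <= tolerance:
            return aa
    return "?"


def read_sequence(y_ions, tolerance=0.01):
    """Translate a y-ion ladder (y1, y2, ...) into residues.

    The result is written N- to C-terminus and covers only the
    residues spanned by the supplied ions.
    """
    ladder = sorted(y_ions)
    # y1 minus water and proton is the C-terminal residue itself.
    steps = [ladder[0] - WATER - PROTON]
    steps += [high - low for low, high in zip(ladder, ladder[1:])]
    residues = [match_residue(step, tolerance) for step in steps]
    # y ions grow from the C-terminus, so reverse to get the usual order.
    return "".join(reversed(residues))


if __name__ == "__main__":
    print(read_sequence([147.1128, 234.14483, 305.18194]))
```

A step that matches no single residue (for example because one ion is missing from the ladder) comes back as `?` rather than a guess.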

Mann, Nat. Rev. Mol. Cell Biol. 5:699-711

Page 15

Trypsin is often used to digest proteins (cleaves after Arg and Lys). WHY?
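One practical consequence of the fixed cleavage rule is that the digest can be predicted from sequence alone, and every peptide except possibly the last ends in a basic residue (K or R), which suits positive ionization. Below is a minimal sketch of such an in silico digest. The function name `tryptic_digest` and the `missed_cleavages` parameter are hypothetical, and the sketch uses only the "after K or R" rule from the slide; it ignores refinements such as reduced cleavage before proline.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Peptides produced by cutting after every K or R.

    missed_cleavages lets up to that many cut sites be skipped, so
    neighbouring pieces are also reported joined together.
    """
    # First cut at every site.
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal remainder

    # Then join runs of adjacent pieces to model skipped sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKTAYRGLK"))
    print(tryptic_digest("MKTAYRGLK", missed_cleavages=1))
```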

Because of challenges distinguishing spectra, simplified mixtures are typically injected into the MS, either:

- excised proteins
- purified complexes
- fractionated pools of complex mixtures

Page 16

2-dimensional gel separation (largely outdated)

The first dimension (separation by isoelectric focusing)
- gel with an immobilised pH gradient
- electric current causes charged proteins to move until they reach their isoelectric point (the pH at which their net charge is 0)

The second dimension (separation by mass)
- pH gel strip is loaded onto an SDS gel
- SDS denatures the protein (to make movement solely dependent on mass, not shape) and masks its intrinsic charge with a uniform negative charge

Ahna Skop

Page 17

2D-SDS PAGE gel

Ahna Skop

Page 18

TAP-tag: Tandem Affinity Purification (for IP’ing individual proteins and proteins bound to them)

Page 19

Ion exchange chromatography

Anion exchange: column is positively charged (can bind negatively charged proteins).

Cation exchange: column is negatively charged (can bind positively charged proteins).

Exploit the isoelectric point of a protein to separate it from other macromolecules.

Ahna Skop

Page 20

Size exclusion chromatography

Porous beads are made in different but controlled sizes.

Smaller proteins go in and out of the beads and will be retained in the resin.

Large proteins will only go into large beads and will be retained less.

Very large proteins will not go into any of the beads (exclusion limit).

Can be used as a preparative method or to determine the molecular weight of a protein in solution.

Ahna Skop

Page 21

Affinity chromatography

A ligand with high affinity to the protein is attached to a matrix.

Protein of interest binds to the ligand and is retained by the resin. Everything else flows through.

Can use excess of the soluble ligand to elute the protein.

Ahna Skop

Page 22

How does each spectrum translate to amino acid sequence?

Mann, Nat. Rev. Mol. Cell Biol. 5:699-711

Page 23

How does each spectrum translate to amino acid sequence?

1. De novo sequencing: very difficult and not widely used (but being developed) for large-scale datasets

2. Matching observed spectra to a database of theoretical spectra

Page 24

Theoretical spectra:
- in silico digestion of a known protein database
- limited set of theoretical spectra based on enzyme, instrument sensitivity, others
    - this reduces search space
    - can miss some peptides
- comparisons based on several different scores (e.g. correlation between observed and theoretical profiles)
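To illustrate what comparing an observed spectrum with theoretical ones can look like, here is a minimal sketch that bins peaks by m/z and scores each candidate with a normalized dot product (cosine similarity). This is one simple stand-in for the family of scores mentioned above, not the scoring function of any particular search engine. The names `binned`, `cosine_score`, and `best_match`, the 1 Da bin width, and the toy peak lists are all assumptions.

```python
import math


def binned(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    bins = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine_score(observed, theoretical, bin_width=1.0):
    """Normalized dot product of two binned spectra, from 0 to 1."""
    a = binned(observed, bin_width)
    b = binned(theoretical, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(observed, candidates, bin_width=1.0):
    """Candidate peptide whose theoretical spectrum scores highest.

    candidates maps a peptide name to its list of (m/z, intensity).
    """
    return max(
        candidates,
        key=lambda pep: cosine_score(observed, candidates[pep], bin_width),
    )


if __name__ == "__main__":
    observed = [(148.06, 1.0), (263.09, 0.5)]
    candidates = {
        "candidate_A": [(148.06, 1.0), (263.09, 1.0)],
        "candidate_B": [(500.20, 1.0), (600.30, 1.0)],
    }
    for name, peaks in candidates.items():
        print(name, round(cosine_score(observed, peaks), 3))
    print("best:", best_match(observed, candidates))
```

The best-scoring candidate is only a candidate: a spectrum with no true match in the database still gets a "best" hit, which is why the statistical treatment of matches matters.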

Mann, Nat. Rev. Mol. Cell Biol. 5:699-711

Page 25

How does each spectrum translate to amino acid sequence?

1. De novo sequencing: very difficult and not widely used (but being developed) for large-scale datasets

2. Matching observed spectra to a database of theoretical spectra

3. Matching observed spectra to a spectral database of previously seen spectra

Page 26

Spectral matching is supposedly more accurate, but it is limited to the peptides whose spectra have been observed before.

With either approach, observed spectra are processed to: group redundant spectra, remove bad spectra, recognize co-fragmentation, improve z estimates.

Many good spectra will not match a known sequence due to: absence of the target in the DB, a PTM that modifies the spectrum, a constrained DB search, or an incorrect m or z estimate.

Nesvizhskii (2010) J. Proteomics, 73:2092-2123.

Page 27

Result: peptide-spectrum match (PSM)

A major problem in proteomics is bad PSM calls … therefore statistical measures are critical

Methods of estimating significance of PSMs:

p- (or E-) value: compare score S of the best PSM against the distribution of all S for all spectra to all theoretical peptides

FDR correction methods:

1. Benjamini-Hochberg (B&H) FDR
2. Estimate the null distribution of RANDOM PSMs:
    - match all spectra to the real (‘target’) DB and to a fake (‘decoy’) DB
    - often the decoy DB is the same peptides in the library but with reversed sequence
    - one measure of FDR: 2*(# decoy hits) / (# decoy hits + # target hits)
3. Use #2 above to calculate posterior probabilities for EACH PSM
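The target-decoy estimate in item 2 can be sketched in a few lines. The reasoning behind the factor of 2 is that, when targets and decoys are searched together, each decoy hit above the threshold is taken to stand for one equally likely false hit among the targets. The names `reversed_decoys`, `decoy_fdr`, and `score_cutoff`, and the toy scores, are hypothetical.

```python
def reversed_decoys(peptides):
    """Build a decoy list by reversing each target sequence."""
    return [pep[::-1] for pep in peptides]


def decoy_fdr(psms, threshold):
    """FDR estimate 2*D / (D + T) among PSMs scoring >= threshold.

    psms is a list of (score, is_decoy) pairs from a search against
    the combined target + decoy database.
    """
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    total = decoys + targets
    return 2 * decoys / total if total else 0.0


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose FDR estimate is <= max_fdr, else None."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


if __name__ == "__main__":
    # Toy data: 95 target and 5 decoy hits score high, 50 decoys score low.
    psms = [(5.0, False)] * 95 + [(5.0, True)] * 5 + [(1.0, True)] * 50
    print(reversed_decoys(["PEPTIDEK"]))
    print(decoy_fdr(psms, threshold=3.0))
    print(score_cutoff(psms, max_fdr=0.2))
```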

Page 28

3. Use #2 above to calculate posterior probabilities for EACH PSM

- mixture model approach: take the distribution of ALL scores S
    - this is a mixture of ‘correct’ PSMs and ‘incorrect’ PSMs
    - but we don’t know which are correct or incorrect
- scores from the decoy comparison are included, which can provide some idea of the distribution of ‘incorrect’ scores
- EM or Bayesian approaches can then estimate the proportion of correct vs. incorrect PSMs … based on each PSM score, a posterior probability is calculated
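A minimal sketch of the mixture-model idea, assuming (purely for illustration) that both the ‘correct’ and ‘incorrect’ score distributions are Gaussian and that higher scores are better. It runs plain EM on the pooled scores and then converts any score into a posterior probability of being correct. The function names and toy scores are hypothetical, and the sketch does not use decoy scores, which in practice could be used to pin down the ‘incorrect’ component.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_correct(score, weight, bad, good):
    """P(correct | score); bad and good are (mean, sd) pairs."""
    g = weight * normal_pdf(score, *good)
    b = (1 - weight) * normal_pdf(score, *bad)
    return g / (g + b) if g + b > 0 else 0.5


def fit_two_gaussians(scores, iterations=100, min_sd=1e-3):
    """EM for a two-component Gaussian mixture over PSM scores.

    Returns (weight_correct, (mean_bad, sd_bad), (mean_good, sd_good)).
    """
    n = len(scores)
    # Start the components at the extremes of the data.
    bad = (min(scores), (max(scores) - min(scores)) / 4 or 1.0)
    good = (max(scores), bad[1])
    weight = 0.5
    for _ in range(iterations):
        # E-step: responsibility of the 'correct' component per score.
        resp = [posterior_correct(s, weight, bad, good) for s in scores]
        n_good = sum(resp)
        n_bad = n - n_good
        if n_good < 1e-9 or n_bad < 1e-9:
            break  # one component has emptied; keep the last estimates
        # M-step: re-estimate weight, means and standard deviations.
        weight = n_good / n
        mean_good = sum(r * s for r, s in zip(resp, scores)) / n_good
        mean_bad = sum((1 - r) * s for r, s in zip(resp, scores)) / n_bad
        var_good = sum(r * (s - mean_good) ** 2
                       for r, s in zip(resp, scores)) / n_good
        var_bad = sum((1 - r) * (s - mean_bad) ** 2
                      for r, s in zip(resp, scores)) / n_bad
        good = (mean_good, max(min_sd, math.sqrt(var_good)))
        bad = (mean_bad, max(min_sd, math.sqrt(var_bad)))
    return weight, bad, good


if __name__ == "__main__":
    scores = [0.8, 1.0, 1.1, 0.9, 1.2, 1.0, 0.7, 1.3,
              4.8, 5.0, 5.2, 5.1, 4.9, 5.3]
    weight, bad, good = fit_two_gaussians(scores)
    print(f"estimated fraction correct: {weight:.2f}")
    for s in (1.0, 3.0, 5.0):
        print(s, round(posterior_correct(s, weight, bad, good), 4))
```

The useful property is that every PSM gets its own probability, rather than a single pass/fail threshold for the whole dataset.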

FDR can be done at the level of PSM identification … but is often done at the level of protein identification

Page 29

Error in PSM identification can amplify FDR in Protein identification

Often focus on proteins identified by at least 2 different PSMs (or proteins with single PSMs of very high posterior probability)
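A small worked example of why PSM errors amplify at the protein level: correct PSMs pile up on the proteins that are really present, while incorrect PSMs tend to scatter, each landing on a different wrong protein. The counts below (990 correct PSMs over 200 proteins, 10 incorrect PSMs on 10 other proteins) are invented for illustration, and the `is_correct` flag is ground truth that only a simulation can know. The function name `protein_fdr` is hypothetical.

```python
from collections import defaultdict


def protein_fdr(psms, min_psms=1):
    """Fraction of reported proteins supported by no correct PSM.

    psms is a list of (protein_id, is_correct) pairs; proteins with
    fewer than min_psms PSMs are not reported at all.
    """
    by_protein = defaultdict(list)
    for protein, is_correct in psms:
        by_protein[protein].append(is_correct)
    reported = [flags for flags in by_protein.values()
                if len(flags) >= min_psms]
    if not reported:
        return 0.0
    false_proteins = sum(1 for flags in reported if not any(flags))
    return false_proteins / len(reported)


# Hypothetical experiment: 1% of PSMs are wrong.
psms = [(f"P{i % 200}", True) for i in range(990)]   # real proteins
psms += [(f"X{i}", False) for i in range(10)]        # scattered errors

if __name__ == "__main__":
    psm_fdr = sum(1 for _, ok in psms if not ok) / len(psms)
    print(f"PSM-level FDR:              {psm_fdr:.3f}")
    print(f"protein FDR, >= 1 PSM:      {protein_fdr(psms, 1):.3f}")
    print(f"protein FDR, >= 2 PSMs:     {protein_fdr(psms, 2):.3f}")
```

In this toy setup, requiring two PSMs removes every false protein because no wrong protein was hit twice; real data will be less clean, but the direction of the effect is the reason for the two-PSM rule of thumb.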

Nesvizhskii (2010) J. Proteomics, 73:2092-2123.

Some methods combine PSM FDR to get a protein FDR

Page 30

Some practical guidelines for analyzing proteomics results

1. Know that abundant proteins are much easier to identify

2. # of peptides per protein is an important consideration
    - proteins ID’d with >1 peptide are more reliable
    - proteins ID’d with 1 peptide observed repeatedly are more reliable
    - note that longer proteins are more likely to have false PSMs

3. Think carefully about the p-value/FDR and know how it was calculated

4. Know that proteomics is nowhere near saturating … many proteins will be missed
