DNAGenomics RNAGenomics/Transcriptomics ProteinProteomics MetabolitesMetabolomics

Post on 25-Feb-2016


DNA Genomics

RNA Genomics/Transcriptomics

Protein Proteomics

Metabolites Metabolomics

The Central Dogma-omics

Protein Machines

Key Concept: Biochemical functions are carried out by multi-protein machines

The Polyadenylation Machinery

The Proteasome

Key Concept: A protein’s function can be inferred from its binding partners

Key Concept: Knowledge of a machine’s components is required to understand how it works and how it is regulated

Key Concept: Highly Clustered areas typically serve the same biological function.

Protein Machines Interact with Each Other in Higher-Order Networks

Key Concept: Complex phenotypes can be understood in a network context

Understanding the Network May Give Insights into Emergent Behaviors

-Homeostasis

-Robustness

-Periodicity

-Morphogenesis

-Tumorigenesis

Proteins Are Organized in a “Small World” Network

Key Concept: The proteome is HIGHLY Networked

The Small World Hypothesis: Six Degrees of Separation

Stanley Milgram study in 1967

-put ads in newspapers in Nebraska and Kansas asking for volunteers for an experiment. The volunteers were asked to contact a divinity student in Boston by going through people that they knew on a first-name basis, who would then contact their friends, and so on.

-the number of people (degrees) between the volunteers and the target ranged between 2 and 10, with the mean being 6.

Properties of Small World Networks:

-highly clustered: “my friends are also friends”

-most nodes are not connected: “most people are strangers”

-presence of hubs (nodes with a lot of connections): “Facebook Whales”

-can find a short path between any two nodes: “Two strangers meet and realize they know some of the same people.” The length of this path is often referred to as the degree of separation

-network should be resistant to perturbation: “Life goes on”
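Two of these properties, clustering and degree of separation, can be measured directly on a graph. The following is a minimal sketch, assuming the network is stored as a dictionary mapping each node to the set of its neighbors; the toy network and the function names are hypothetical and are not taken from the lecture.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of links from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked_pairs = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return linked_pairs / (k * (k - 1) / 2)


# Hypothetical toy network: two tight groups of friends joined through "H".
toy = {
    "A": {"B", "C", "H"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "H": {"A", "X"},
    "X": {"H", "Y", "Z"},
    "Y": {"X", "Z"},
    "Z": {"X", "Y"},
}

# Degree of separation between two "strangers" in different groups.
print(shortest_path_lengths(toy, "B")["Z"])
# "My friends are also friends": all of B's neighbors know each other.
print(clustering_coefficient(toy, "B"))
```

Averaging the path lengths from one node to all others gives a per-node "centrality" of the kind tabulated for the yeast proteins, although the exact measure used on the slides is not stated.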

Clustered vs Non-Clustered

[Figure: number of nodes with k links (y-axis) plotted against number of links, k (x-axis).]

Distribution of Connections

80/20 Law

Ten Best Centers
1. CLU1 1.843
2. CDC33 1.867
3. TIF2 1.875
4. MDH1 1.898
5. SRP1 1.912
6. YBL004W 1.914
7. RPT3 1.914
8. HAS1 1.914
9. YGR090W 1.917
10. PFK1 1.918

Ten Worst Centers
CAC2 3.803
PSR1 3.838
RAM2 3.840
RAM1 3.840
ORC2 3.863
UBA3 3.902
MAK10 3.975
YNL056W 4.003
YNR046W 4.089
VPS4 4.433

Median Degree of Separation: 2.38

Shortest and longest Pathways

Is S. cerevisiae Robust???

-Environmentally Robust
  -Robust to temperature (4-40 °C)
  -Robust to nutrient sources
  -Robust to starvation
  -Robust to osmolarity (0-1 M NaCl)

-Is it Robust to Genetic Perturbation (mutation)???
  -The S. cerevisiae Genome Deletion Project has deleted 95% of all S. cerevisiae genes
  -18.7% of genes are essential
  -in a typical small world network you can lose ~20% of all nodes before the network crashes.
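One way to probe this kind of robustness on any graph is to delete nodes and measure how much of the network stays connected. The following is a minimal sketch, assuming an adjacency-dictionary graph; the toy network and the function name are hypothetical, and the example only illustrates the measurement, not the ~20% figure itself.

```python
def largest_component_size(graph, removed=frozenset()):
    """Size of the largest connected group of nodes once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        # Depth-first walk over everything still reachable from `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


# Hypothetical toy network: one hub with five partners, two of which also bind each other.
star = {
    "hub": {"a", "b", "c", "d", "e"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub"},
    "e": {"hub"},
}

print(largest_component_size(star))                   # intact network
print(largest_component_size(star, removed={"c"}))    # lose a poorly connected node
print(largest_component_size(star, removed={"hub"}))  # lose the hub
```

In this toy case, deleting a peripheral node barely changes the connected core, while deleting the hub fragments it.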


Is there any biology behind the network hypothesis?

Essential ORF deletions are only available as heterozygous diploids, while non-essential ORF deletions are available as haploids, homozygous diploids and heterozygous diploids.


Key Concept: Connectivity and essentiality are correlated.


Evolutionary Effects of Connectedness

-Connected genes are non-randomly distributed in the genome
-Connected genes are less likely to undergo duplication
-Connected genes are less likely to have close homologs
-Connected genes are less likely to have introns

Is Cancer a Robust Network?

-Environmentally Robust
-It lives under a constant state of genomic stress

Summary

-Proteins are organized in functional units (machines)
  -these machines do virtually all the work in the cell
  -understanding the components of a machine is critical for functionally annotating the genome
  -understanding the components of a machine is critical for determining how a machine is regulated
  -the effects of mutation are great at this level

-Protein Machines are organized into higher-order Networks
  -the Network architecture has left its imprint on evolution
  -the Network is likely to be rewired under pathological conditions
    -especially in the case of cancer
  -understanding the Network is important for understanding the complex behavior of the system

Key Concept: High Throughput mapping of protein:protein interactions will provide important insights into human biology

Understanding the Network Requires a lot of Information

-Direction of Information

-Sign

-Magnitude

-Timing


Approaches for Mapping Protein:Protein Interactions

-Mapping by Inference:
  -if two proteins interact in one organism, then they interact in other organisms.
    -can be extended to domains/motifs as well
  -if two proteins are coregulated on microarrays, they are likely to interact

-Direct Mapping:
  -In vitro binding experiment
  -Genetic Screen/Trap
    -Yeast 2-hybrid assay
  -Affinity Co-purifications
    -IP:Western blot
    -IP:Mass Spectrometry

Interactomics by Genetic Screens

Key Concept: Genetic Complementation allows the identification of direct (binary) interactions.

Uetz et al 2001

Key Concept: No matter how good something is…there are always problems.

Advantages of Genetic Complementation:
-can do genome scale screening
-quick
-cheap
-adaptable
-works best when the screen is based on selection

Problems of Genetic Complementation:
-sensitive to dynamic range
-protein interaction may be incompatible with the complementation scheme
-can not perturb the system
-more false positives than true positives

Affinity Governs the formation of Protein Complexes

Affinity is determined by the shapes of the proteins and how well they fit together.

-hydrophobic interactions
-ionic interactions
-hydrogen bonding

Affinity is usually expressed as Kd. For two partners A and B forming a complex AB:

$$K_d = \frac{[A]\,[B]}{[AB]}$$

Kd is the concentration of free B at which the concentrations of free A and of the complex AB are equal. Implicitly, there is usually a mixture of free and complexed components, and this ratio is concentration dependent.

A weak interaction may only form if the concentration is high enough.
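The concentration dependence can be made concrete. Writing the two free partners as A and B and their complex as AB (labels assumed here), the definition Kd = [A][B]/[AB] rearranges to a bound fraction of A equal to [B] / (Kd + [B]). The following is a minimal sketch of that arithmetic; the function name and the example concentrations are hypothetical.

```python
def fraction_bound(free_partner_conc, kd):
    """Fraction of protein A found in the AB complex at a given free [B].

    Follows from Kd = [A][B]/[AB]:  [AB] / ([A] + [AB]) = [B] / (Kd + [B]).
    Both arguments must use the same concentration unit (for example, molar).
    """
    return free_partner_conc / (kd + free_partner_conc)


# Hypothetical weak interaction with Kd = 1 micromolar.
kd = 1e-6
for conc in (1e-8, 1e-6, 1e-4):
    print(f"[B] = {conc:.0e} M -> {fraction_bound(conc, kd):.3f} of A is bound")
```

At [B] equal to Kd, half of A is in the complex; well below Kd, almost none is, which is why a weak interaction only appears at high concentration.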

Interactomics by Co-purification

Key Concept: Interacting proteins will co-purify

Tap Tagging: Rigaut et al. Nat. Biotech. 1999.

Advantages of Co-purification:
-proteins isolated from their native source
-the system can be perturbed

Problems of Co-purification:
-sensitive to dynamic range
-real interactions may be lost during purification
-can be difficult to purify the target protein
-no “amplification”
-need a way to identify the co-purifying proteins

[Figure: immunoprecipitation (IP) on antibody beads, followed by trypsin digest directly from the beads.]

Key Concept: Cutting out steps is one of the hallmarks of high-throughput approaches. This increases the throughput and usually also increases the sensitivity.

Affinity Purification - coIP

[Figure: bar chart of # matched peptides (axis 0-8) for the antibody heavy chain and light chain.]

Native Antibody is Resistant to Trypsin

[Figure: bar chart of # matched peptides (axis 0-50) by additive, for the antibody heavy chain and light chain.]

Reduced/Denatured Antibody is Sensitive to Trypsin

Key Concept: Complex mixtures can not be manually interpreted. The average protein generates ~50 proteolytic fragments, so you will have 1000s and 1000s to interpret.

Sources of “Non-Specific Binding”

-Not enough washing.
  -Biofluids have a high dynamic range, so you must wash away the super-abundant stuff to see the less concentrated proteins

-Proteins that stick to the beads

-Proteins that stick to the antibodies on the beads

-Proteins that stick to the wall of the tube

-Proteins that stick to your complex of interest

-Proteins that are real binders but are biologically irrelevant

“Nonspecific Binding” is Reproducible

Are All Protein Complexes Biologically Relevant?

An interaction will be selected for if it is beneficial.

An interaction will be selected against if it is detrimental.

What happens if the interaction is neither beneficial nor detrimental?

What would be the cost of allowing only beneficial interactions?

[Figure: base peak chromatogram of sample 082405OFRep78 (relative abundance against time, 0-90 min; NL 2.16E7).]

[Figure: full MS scan #6779 of 082405OFRep78 at RT 36.02 (ITMS + p ESI, m/z 400-1700; NL 7.56E6).]

[Figure: MS/MS scan #6781 of 082405OFRep78 at RT 36.03, precursor m/z 942.58 (ITMS + c ESI d Full ms2, m/z 245-2000; NL 8.81E4).]

Protein ID by Mass Spectrometry

MultiDimensional Chromatography (MuDPIT)

[Figure: base peak chromatograms (relative abundance against time, 0-90 min) for samples 082905MPMINSULINSN, 090605OF293Tctl, 090705OFrep78, 082705MPMcntrlsn, 082405OFRep78 and 082405OF293Tctl.]

-10-100 proteins (6 hours)
-100-300 proteins (2 hours)
-1000-6000 proteins (10 hours)

Comparison of Three Analysis Techniques on Lysates

[Figure: base peak chromatograms (relative abundance against time, 0-90 min) for samples 082905MPMINSULINSN, 090605OF293Tctl, 090705OFrep78, 082705MPMcntrlsn, 082405OFRep78 and 082405OF293Tctl.]

-53 proteins (6 hours)
-76 proteins (2 hours)
-82 proteins (10 hours)

Comparison of Three Analysis Techniques on IPs


Protein ID by Mass Spectrometry

~10,000 MS/MS per hour

Key Concept: LC-MS/MS workflows can not be manually interpreted

[Figure: an acquired spectrum is compared with a theoretical spectrum (y/b ions); the matched peaks (y/b ions) are retained. Intensity axes run from 0% to 100%.]

Compute a Correlation Score

$$\mathrm{Score}_{y/b} = \sum_{i=0}^{n} I_i \, P_i$$

where I_i are the spectrum intensities and P_i records whether peak i was predicted (1) or not (0).
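The score on this slide is a sum of observed intensities over the peaks that the theoretical spectrum predicts. The following is a minimal sketch of that sum; the function names, the 0.5 m/z matching tolerance, and the peak lists are all hypothetical, and real search engines add further terms that are not shown here.

```python
def predicted_flags(observed_mz, theoretical_mz, tolerance=0.5):
    """P_i: 1 if an observed peak lies within `tolerance` of a predicted y/b ion, else 0."""
    return [
        1 if any(abs(mz - t) <= tolerance for t in theoretical_mz) else 0
        for mz in observed_mz
    ]


def correlation_score(intensities, flags):
    """Sum of I_i * P_i over all observed peaks."""
    return sum(i * p for i, p in zip(intensities, flags))


# Hypothetical acquired spectrum (m/z values and relative intensities).
observed_mz = [397.25, 545.38, 600.00, 767.43]
intensities = [20, 60, 15, 45]
# Hypothetical predicted y/b ions for one candidate peptide.
theoretical_mz = [397.2, 545.4, 767.4, 924.6]

flags = predicted_flags(observed_mz, theoretical_mz)
print(flags)
print(correlation_score(intensities, flags))
```

Because every candidate peptide gets some score, the number on its own says little; it has to be compared against the scores of the other candidates.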

The Truth about Spectral Matching

-Spectral matching produces an “answer” for every spectrum, even those that are artifacts.
-Experimental spectra always deviate from theoretical spectra.
-A high correlation score is not a guarantee that it is correct.
-Peptide must be in the database in order to be found.

Peptide ID by Mass Spectrometry

Peptide IDs can be clustered into Protein IDs

Mapping Peptides to Proteins is NOT easy!

Single Proteins to Protein Lists


How do you know which matches to trust???

-An E-value is easily calculated using the ~11,000 “incorrect” peptides.
-The false discovery rate is easily calculated using a “decoy” database.

Key Concept: Statistics are required for the proper interpretation of MS/MS data.

[Figure: histogram of # results (y-axis, 0-60) against hyperscore (x-axis, 0-100), with a region labeled “incorrect” IDs.]

Histogram of Correlation Scores

Highest scoring “match” is assumed to be “correct”

[Figure: the hyperscore histogram (# results) shown with a plot of log(# results) against hyperscore; a region is marked “significant”.]

Significant scores

[Figure: the hyperscore histogram (# results) shown with a plot of log(# results) against hyperscore (x-axis 0-100, y-axis -10 to 6), annotated E-value=e-8.2.]

Estimating E-values
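The plot of log(# results) against hyperscore suggests the estimate is read off a straight line fitted to the falling tail of the “incorrect” scores and extended out to the top-scoring match. The following is a minimal sketch under that assumption; the base-10 logarithm, the function names, and the histogram counts are hypothetical choices made for illustration, not the lecture's actual procedure or data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_e_value(tail_scores, tail_counts, top_score):
    """Extrapolate log10(# results) from the histogram tail to the top score."""
    logs = [math.log10(count) for count in tail_counts]
    slope, intercept = fit_line(tail_scores, logs)
    return 10 ** (slope * top_score + intercept)


# Hypothetical tail of the "incorrect" score histogram.
tail_scores = [20, 30, 40, 50]
tail_counts = [10000, 1000, 100, 10]

# Expected number of random matches scoring as well as a top hit of 100.
print(estimate_e_value(tail_scores, tail_counts, top_score=100))
```

With these invented counts the line drops one log unit per ten score units, so a top score of 100 extrapolates to roughly one random match in ten thousand.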

[Figure: log(# results) against hyperscore (x-axis 0-100, y-axis -10 to 6), annotated E-value = $10^{-8.2}$.]

Interpreting E-values

The E-value is the number of matches you “expect” to find at random, given the search parameters, or the chance of getting a match this good from this spectrum by random chance. So this would be a chance of $10^{-8.2}$, or 1 in ~150,000,000.

[Figure: log(# results) against hyperscore (x-axis 0-100, y-axis -10 to 6), annotated E-value = $10^{-8.2}$ and E-value = $10^{-3.9}$.]

Changing the search parameters will change the statistics: allowing many post-translational modifications and/or amino acid substitutions, using a very large database, allowing large mass errors, etc.

A “decoy” or false database can be used to verify the statistical models being used.

Use a Decoy Database to Determine False Discovery Rate

A good decoy database:
-does not contain any “correct” hits
-is the same size as the Query database
-has the same distribution of amino acids
-has the same size distribution of proteolytic fragments (peptides)
-can be reproduced by other labs

Reversed databases solve most of these constraints
-so the “protein” RSAMPLER digested with trypsin gives:
  -SAMPLER (forward)
  -RELPMASR → ELPMASR (reverse)
-using the reverse typically only gives problems with palindromic sequences, which thankfully are rare*.

*except in viruses!
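The RSAMPLER example can be reproduced in a few lines. The following is a minimal sketch, assuming a simplified trypsin rule (cleave after every K or R, ignoring the proline exception and missed cleavages); the function names are hypothetical.

```python
def trypsin_digest(protein):
    """Split a sequence after every K or R (simplified trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal piece when the protein does not end in K or R.
        peptides.append(protein[start:])
    return peptides


def reversed_decoy(protein):
    """Decoy entry made by reading the protein sequence backwards."""
    return protein[::-1]


forward = "RSAMPLER"
decoy = reversed_decoy(forward)

print(trypsin_digest(forward))  # forward peptides
print(decoy)                    # reversed "protein"
print(trypsin_digest(decoy))    # decoy peptides
```

The decoy keeps the amino acid composition and a similar peptide length distribution, and anyone can regenerate it from the forward database. A palindromic peptide would reverse onto itself, which is the problem case noted on the slide.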

Use a Decoy Database to Determine False Discovery Rate

By definition: Everything from the Reversed database is incorrect!
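Since every reversed-database hit is known to be wrong, counting them at a given score cutoff estimates how many forward hits are wrong as well. The following is a minimal sketch, assuming the FDR is taken as reverse hits divided by forward hits among all matches at or below a log10 E-value cutoff (other conventions exist); the function name and the hit list are hypothetical.

```python
def false_discovery_rate(hits, max_log_e):
    """Estimate the FDR among hits with log10(E-value) <= max_log_e.

    `hits` is a list of (log10 E-value, is_decoy) pairs, where is_decoy is True
    for matches to the reversed database.
    """
    accepted = [is_decoy for log_e, is_decoy in hits if log_e <= max_log_e]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical search results: (log10 E-value, matched the reversed database?)
hits = [
    (-8.2, False), (-6.0, False), (-5.1, False), (-4.0, False),
    (-3.9, True), (-3.0, False), (-1.0, True), (-0.5, False),
]

for cutoff in (-4.0, -3.0, 0.0):
    print(cutoff, false_discovery_rate(hits, cutoff))
```

Tightening the cutoff lowers the estimated FDR at the cost of accepting fewer forward hits.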

Use a Decoy Database to Determine Global False Discovery Rates

[Figure: histogram of the number of spectra in each bin for “Forward” and “Reverse” database hits (bins from -3.9 to 7.3; counts 0-200), annotated with global false discovery rates below 0.5%, 1%, 1.5%, 2.0%, 2.3% and 2.6%.]

Calculating a “local” False Discovery Rate

Predicting phosphorylation sites

[Figure: site assignments on a scale of 0, -1, -1.5, -3.0, -4.0, running from low confidence to high confidence. Incorrect n = 51; S118 n = 6; S153 n = 54.]

Using FDR to choose the “best” Database

Amino Acid Substitutions can be modeled using Statistics

Allowing for Amino Acid Substitutions

117 vs 178 proteins

So now what do I do?

Interpreting Protein Lists

Comparison based on Gene Ontology (GO)

Control Experiment


Functional Clustering of Protein Lists

Domain Enrichment

Enrichment of Components from known Pathways

Build Networks

http://string-db.org/

Other Resources

DAVID: http://david.abcc.ncifcrf.gov/
GO-based enrichment. Clustering of redundant GO terms. Mapping onto KEGG pathways. Mapping onto disease pathways, etc.

BioGRID: thebiogrid.org
An online interaction repository with data compiled through comprehensive curation efforts.

MIPS: http://mips.helmholtz-muenchen.de/proj/ppi/
Manually curated protein-protein interaction database

Summary

Proteomics:
-large scale analysis of proteins (really peptides)
-statistical analysis is required for interpretation
-can be used to address a wide range of biological problems
-best used to answer discrete questions
  -things that can not be answered by genomic techniques
    -protein complexes
    -protein modification
    -other post-translational events
      -changes in subcellular localization
-the question will help determine which hits are chosen for validation

Feel free to email me questions: myers@icgeb.org

Ubiquitin:
• Short protein that is covalently attached to other proteins
• 7 lysine residues
• all can form poly-Ub chains
• K48 chains involved in proteasomal degradation
• K63 chains involved in signaling
• K11 chains ????
• K6 chains (DNA damage)
• K29 chains ????

K6-Ub Pulldown

[Figure: HA-tagged K6-Ub chains with unknown binding partners marked “?”.]

• Tagged ubiquitin with only one available lysine.

• Pull down K6-linked poly-Ub chain.

• Identify proteins.

K6 chains are assembled by BRCA1

K6-Ubiquitin Pulldown

[Figure: workflow. K6 Ub-IP → On Bead Digest → LC-MS/MS → Data Analysis → Top Candidate: WHIP1 (domains labeled AAA+ ATPase, RFC, ZF, Rad18).]

Hit Criteria
• Not in the control sample
• Not a commonly known contaminant
• Good score (more than one peptide)
  -Expressed as a False Discovery Rate
• Seen in repeat experiments
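These criteria translate directly into a filter over the identified proteins. The following is a minimal sketch; the record layout, the thresholds of two peptides and two replicates, and the example entries with their counts are hypothetical, and the false discovery rate criterion is assumed to have been applied upstream. The contaminant keywords follow the classes the lecture lists as excluded (heat shock, hnRNP, ribosomal, keratin, histones).

```python
CONTAMINANT_KEYWORDS = ("heat shock", "hnrnp", "ribosomal", "keratin", "histone")


def is_hit(candidate, control_names, min_peptides=2, min_replicates=2):
    """Apply the hit criteria to one identified protein."""
    description = candidate["description"].lower()
    if candidate["name"] in control_names:
        return False  # also found in the control IP
    if any(keyword in description for keyword in CONTAMINANT_KEYWORDS):
        return False  # commonly known contaminant class
    if candidate["peptides"] < min_peptides:
        return False  # needs more than one peptide
    return candidate["replicates"] >= min_replicates  # seen in repeat experiments


# Hypothetical identifications from one pulldown (counts are invented).
candidates = [
    {"name": "WHIP1", "description": "Werner helicase interacting protein 1",
     "peptides": 8, "replicates": 3},
    {"name": "KRT1", "description": "Keratin, type II cytoskeletal 1",
     "peptides": 12, "replicates": 3},
    {"name": "HSPA8", "description": "Heat shock cognate 71 kDa protein",
     "peptides": 9, "replicates": 3},
    {"name": "RPL4", "description": "60S ribosomal protein L4",
     "peptides": 5, "replicates": 2},
    {"name": "PROTX", "description": "Uncharacterized protein",
     "peptides": 1, "replicates": 1},
]
control_names = {"RPL4"}

hit_names = [c["name"] for c in candidates if is_hit(c, control_names)]
print(hit_names)
```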

4959

40 79

Potential Hits Ub and Ub-binding proteins

Excluded:
• heat shock
• hnRNP
• ribosomal
• keratin
• histones

Proteins also found in the control IP

Results from a representative non-denaturing K6-ubiquitin IP

Ubiquitin-binding proteins

“potential hits”

good scores high confidence

poor scores low confidence

2022 233

Overlap Between K6 and K63 Pulldowns

UB-K6 UB-K63

Werner’s helicase interacting protein 1 (WHIP1)

good scores high confidence

poor scores low confidence

Rad18-like Zn+ finger

AAA+ ATPase

WHIP domain architecture

Does not contain any recognizable Ubiquitin binding domain

Why does WHIP co-IP with ubiquitin?

• WHIP is ubiquitinated

• WHIP is a ubiquitin-binding protein

[Figure: two models: WHIP carrying a Ub chain through a covalent bond, and WHIP bound to a Ub chain through a non-covalent interaction.]

[Figure: Western blot with Ub1-Ub6 ladders; lanes grouped as input, FLAG-BAP and FLAG-WHIP, each with mono Ub, K48 and K63.]

IP from bacterial lysate with anti-FLAG beads (Sigma). W.B. anti-ubiquitin (6C1) 1:1000, secondary = anti-mouse TrueBlot, exposure = 10 sec.

Co-IP of WHIP with various poly-Ub chains in vivo

IP = α-HA (ubiquitin), WB = α-FLAG (WHIP). IPs from doubly-transfected 293 cells.

[Figure: Western blot with markers at 250, 150, 100, 75, 50 and 37 kD; a second panel shows the α-FLAG IP. Lanes:]

FLAG-Whip   +   -   +    +    +    +    +    +
HA-Ub       -   +   K6   K11  K29  K48  K63  -

WHIP is Ubiquitinated

[Figure: Western blot with markers at 250, 150, 100 and 75 kD; lanes: WHIP-FLAG-MAT - and +.]

Ni-NTA pulldown in 8 M urea from 293T cells, W.B. = anti-FLAG (M2) 1:5000

WHIP Ubiquitinylation
• Mass spectrometry

PEPTIDE (aa)   SEQUENCE                 MODIFICATION   E-VALUE
254-274        SLLETNEIPSLILWGPPGCGK    274K(114.1)    6.8e-007
292-310        FVTLSATNAKTNDVRDVIK      301K(114.1)    3.7e-004
292-306        FVTLSATNAKTNDVR          301K(114.1)    1.4e-004
302-316        TNDVRDVIKQAQNEK          310K(114.1)    3.7e-005
311-321        QAQNEKSFFKR              316K(114.1)    2.6e-003
322-332        KTILFIDEIHR              322K(114.1)    7.0e-007
333-346        FNKSQQVNAALLSR           335K(114.1)    5.5e-012
449-462        VLITENDVKEGLQR           457K(114.1)    2.7e-010

Legend: ubiquitinylated residue; SUMOylated residue
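A quick consistency check on a table like this is to confirm that every reported modification site really falls on a lysine within its peptide. The following is a minimal sketch using the rows as listed; the function name is hypothetical. The +114.1 mass shift on lysine is the diglycine remnant that trypsin leaves behind from an attached ubiquitin.

```python
# (peptide start, peptide end, sequence, modified position), as listed in the table.
ROWS = [
    (254, 274, "SLLETNEIPSLILWGPPGCGK", 274),
    (292, 310, "FVTLSATNAKTNDVRDVIK", 301),
    (292, 306, "FVTLSATNAKTNDVR", 301),
    (302, 316, "TNDVRDVIKQAQNEK", 310),
    (311, 321, "QAQNEKSFFKR", 316),
    (322, 332, "KTILFIDEIHR", 322),
    (333, 346, "FNKSQQVNAALLSR", 335),
    (449, 462, "VLITENDVKEGLQR", 457),
]


def modified_residue(start, sequence, position):
    """Residue found at a protein-numbered position inside a peptide."""
    return sequence[position - start]


for start, end, sequence, position in ROWS:
    # The peptide length must agree with its start and end coordinates.
    assert len(sequence) == end - start + 1
    print(f"{start}-{end}: residue {position} is {modified_residue(start, sequence, position)}")
```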


WHIP’s Zinc finger domain is necessary for ubiquitin binding

[Figure, in vivo: lanes -, WT, D37A, T294A. Panels: lysate, blot = anti-FLAG; lysate, blot = anti-actin; IP = anti-FLAG, blot = anti-Ub (mono-Ub marked). Markers at 200, 100, 75, 50, 33, 25 and 15.]

[Figure, in vitro: Ub2-Ub7 ladder with markers at 10, 15, 20, 25, 37 and 50. Lanes: Input, Beads, WHIP UBZ, RAD18 UBA.]

Rad18_ZF = UBZ ubiquitin binding domain

UBZ Domain-Containing Proteins

Summary of UBZ Domain Binding

Domain    monoUb   Ub K48   Ub K63   SUMO
WHIP      -        +        +        -
Rad18     -        +        +        -
PolK      -        +        +        -
Pol H     ?        ?        ?        ?
UBZ1      -        +        +        -
MTMR15    -        -        -        -


1511 135

Overlap Between Pulldowns using Different UBZ Domains

UBZ-WHIP UBZ- UBZ1

[Figure: panels labeled WHIP-EGFP, WHIPD37A-EGFP, EGFP, UBZ1-EGFP and UBZ1D473A-EGFP.]

UBZ Domain Regulates Subcellular Localization

[Figure: HA-UBZ1 lanes: -, WT, D473A.]

UBZ Domain Regulates Coupled Ubiquitination

- 1 12

WHIP

UBZ-1 and WHIP are differentially Regulated by UV damage


UBZ-1 but not WHIP Interacts with PCNA

Interaction between UBZ1 and PCNA is increased following UV treatment

31

Overlap Between UBZ1 and WHIP

UBZ1 WHIP

1645

[Figure: proteins labeled in the UBZ1/WHIP overlap diagram: WHIP1, ERCC1, ERCC4, DDB1, DDB2, Large T dnaJ, RuvBL1, RuvBL2, UBZ1, BLAP75, WRN, BLM, Ku70, Ku80, DNA-PK, PCNA, Topo3A, RPA1, Ub.]

Summary

-Proteins are part of a highly integrated network

-The UBZ domain mediates: ubiquitin binding, coupled ubiquitination, subcellular localization, protein:protein interactions

-Functional UBZ domains are found only in proteins involved in DNA replication and/or repair

-UBZ domains are frequently found in concert with PIP boxes

-The UBZ domain acts in concert with other domains to regulate the formation of ubiquitin-dependent complexes

-New “UBZ domains” are being found every day

Protein Networks

Michael P Myers

Acknowledgements:
Fabio Rossi
Martina Colombin
Rebecca Bish
Antonella Piccini
Giulia DeSabbata
Sandor Pongor

Providing Reagents:
Bruce Stillman
Masashi Narita/Scott Lowe
Tomohiko Ohta
Toshiki Tsurimoto


Major Types of Proteomics

Survey Proteomics:
Qualitative or quantitative analysis of the protein component
-whole organism, tissue, cell type, or subcellular compartment
-2D gel electrophoresis -> MS
  -typically a few 100 proteins
-Multidimensional LC -> MS/MS
  -typically 1000-5000 proteins
Identification of Biomarkers

Interactomics:
Mapping Protein:Protein Interactions
-Yeast 2-hybrid techniques
-high throughput protein identification by Mass Spectrometry

Mapping Post-Translational Modifications
-High Content Mass Spectrometry

Key Concept: Proteomics is the large scale identification of proteins or peptides