
Genome Biology and Biotechnology

10. The proteome

Prof. M. Zabeau
Department of Plant Systems Biology
Flanders Interuniversity Institute for Biotechnology (VIB)
University of Gent

International course 2005

Summary

¤ Protein interactome
  – Yeast two-hybrid protein interaction mapping
¤ Proteome
  – Isolation of protein complexes
¤ Multilevel functional genomics
  – Combination of
    • phenome analysis
    • protein interaction mapping

Functional Maps or “-omes”

[Figure: functional maps, or “-omes”, each recording one type of information for genes or proteins 1, 2, 3, 4, 5 ... n across “conditions”: ORFeome (genes); Phenome (mutational phenotypes); Transcriptome (expression profiles); Interactome (protein interactions); DNA interactome (protein-DNA interactions); Localizome (cellular, tissue location); Proteome (proteins)]

After: Vidal M., Cell, 104, 333 (2001)

Basic Concept of the Yeast Two-hybrid System

¤ Eukaryotic transcription factors
  – activate RNA polymerase II at promoters by binding to upstream activating DNA sequences (UAS)
¤ Basic structure of eukaryotic transcription factors
  – The DNA-binding and the activating functions are located in physically separable domains
    • The DNA-binding domain (DB)
    • The activation domain (AD)
  – The connection between DB and AD is structurally flexible
¤ Protein-protein interactions can reconstitute a functional transcription factor
  – by bringing the DB domain and the AD domain into close physical proximity

Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)

Yeast two-hybrid system

¤ ‘Architectural blueprint’ for a functional transcription factor
  – DB-X/AD-Y, where X and Y could be essentially any proteins from any organism

[Figure: the bait (X fused to the Gal4 DNA-binding domain, DB) binds the upstream activating sequence (UAS); the prey (Y fused to the Gal4 transcription-activation domain, AD) interacts with the bait and drives expression of a selectable marker gene]

Yeast two-hybrid system

¤ The yeast two-hybrid system allows
  – Genetic selection of genes encoding potential interacting proteins without the need for protein purification
    • The system is used to isolate genes encoding proteins that potentially interact with DB-X (referred to as the ‘bait’) from complex AD-Y libraries (referred to as the ‘prey’)
  – Limitations of the system include
    • False positives: clones with no biological relevance
    • False negatives: failure to identify known interactions
  – Stringent criteria must be used to evaluate both the specificity and the sensitivity of the assay

Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)
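The sensitivity and specificity criteria can be made concrete by comparing the output of a screen with a reference set of known interactions. The following is a minimal Python sketch, not a method taken from the cited review: the function names and the protein names are hypothetical. Detected pairs that are absent from the reference set are reported only as "unconfirmed", because they may be either false positives or genuinely novel interactions.

```python
def normalize(pairs):
    """Treat X/Y and Y/X as the same interaction."""
    return {tuple(sorted(pair)) for pair in pairs}


def assess_screen(detected, reference):
    """Compare detected two-hybrid pairs with a reference set of known pairs."""
    detected, reference = normalize(detected), normalize(reference)
    confirmed = detected & reference      # known interactions that were found
    missed = reference - detected         # false negatives
    unconfirmed = detected - reference    # false positives OR novel interactions
    sensitivity = len(confirmed) / len(reference) if reference else 0.0
    return {
        "confirmed": confirmed,
        "missed": missed,
        "unconfirmed": unconfirmed,
        "sensitivity": sensitivity,
    }


# Hypothetical toy data: four known interactions, three detected pairs.
reference = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
detected = [("B", "A"), ("C", "D"), ("A", "E")]
report = assess_screen(detected, reference)
print(report["sensitivity"])
```

Sorting each pair before comparison is a design choice that ignores the bait/prey orientation; a stricter analysis could keep DB-X/AD-Y and DB-Y/AD-X as separate observations.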

Protein Interaction Mapping in C. elegans Using Proteins Involved in Vulval Development

¤ Landmark paper presents
  – First demonstration of large-scale two-hybrid analysis for protein interaction mapping in C. elegans
    • starting with 27 proteins involved in vulval development in C. elegans

Walhout et al., Science 287: 116 (2000)

Experimental Approach

¤ Start from known genes in vulval development
  – Used recombinational cloning to introduce the ORFs of 29 known genes involved in vulval development into two-hybrid vectors
¤ Matrix two-hybrid experiment with 29 ORFs
  – Each DB-vORF/AD-vORF pairwise combination was
    • tested for protein-protein interactions by scoring two-hybrid phenotypes
¤ Exhaustive two-hybrid screen
  – using 27 vORF-DB fusion proteins as baits to select interactors from an AD-Y cDNA library
    • sequenced the selected clones: interaction sequence tag (IST)

Reprinted from: Walhout et al., Science 287: 116 (2000)
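The size of the matrix experiment follows directly from the number of ORFs. The sketch below enumerates the combinations in Python, assuming that every DB fusion is tested against every AD fusion, self-pairs included; the ORF names are placeholders, not the genes used in the paper.

```python
from itertools import product


def matrix_pairs(orfs):
    """Return every ordered (bait, prey) combination, self-pairs included."""
    return [(f"DB-{x}", f"AD-{y}") for x, y in product(orfs, repeat=2)]


# Placeholder names standing in for the 29 vulval-development ORFs.
vorfs = [f"vORF{i}" for i in range(1, 30)]
pairs = matrix_pairs(vorfs)
print(len(pairs))  # 29 * 29 combinations
```

The pairs are ordered because DB-X/AD-Y and DB-Y/AD-X are distinct experiments: an interaction may score in one orientation only.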

Construction of DB and AD Fusions by Recombinational Cloning

[Figure: ORFs are transferred into the DNA-binding domain and activation domain vectors by the phage lambda excision reaction (Integrase, IHF and Excisionase), giving DB-ORF fusions and AD-ORF fusions]

Reprinted from: Walhout et al., Science 287: 116 (2000)

Matrix of Two-hybrid Interactions Between the vORFs

Reprinted from: Walhout et al., Science 287: 116 (2000)

Interaction Sequence Tag (IST) screening

Reprinted from: Walhout et al., Science 287: 116 (2000)

Results

¤ Matrix two-hybrid experiment with 29 ORFs
  – ~50% (6 of 11) of the previously reported interactions were detected
    • Two novel potential interactions were identified
  – Typically the yeast two-hybrid system will detect ~50% of the naturally occurring interactions
¤ Two-hybrid screen
  – Identified 992 AD-Y encoding sequences
  – ISTs corresponded to a total of 124 different interacting proteins
    • 15 previously known
  – Provides a functional annotation for 109 predicted genes

Reprinted from: Walhout et al., Science 287: 116 (2000)

Validation of Potential Interactions

¤ Conservation of interactions in other organisms
  – If X' and Y' are orthologs of X and Y, respectively, and X' interacts with Y' just as X interacts with Y
    • the conserved X/Y and X'/Y' interactions are referred to as "interologs"

Reprinted from: Walhout et al., Science 287: 116 (2000)
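The interolog idea can be turned into a simple prediction rule: map both partners of a known interaction onto their orthologs in a second organism. The Python sketch below illustrates this under stated assumptions; the function name, the one-to-one ortholog table and the protein identifiers are all hypothetical, and a predicted pair is only a candidate that still needs experimental confirmation.

```python
def predict_interologs(interactions, orthologs):
    """Map interacting pairs onto ortholog pairs in a second organism.

    interactions: iterable of (X, Y) pairs observed in organism 1.
    orthologs: dict mapping an organism-1 protein to its organism-2 ortholog.
    Pairs with a partner lacking an ortholog are skipped.
    """
    predicted = set()
    for x, y in interactions:
        if x in orthologs and y in orthologs:
            predicted.add(tuple(sorted((orthologs[x], orthologs[y]))))
    return predicted


# Hypothetical identifiers: wZ has no ortholog, so wY/wZ yields no prediction.
worm_interactions = [("wX", "wY"), ("wY", "wZ")]
worm_to_yeast = {"wX": "yX", "wY": "yY"}
candidates = predict_interologs(worm_interactions, worm_to_yeast)
print(candidates)
```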

Validation of Potential Interactions

¤ Systematic clustering analysis
  – closed-loop connections between vORF-encoded proteins
    • X interacts with Y, Y interacts with Z, Z interacts with W, and so on (X/Y/Z/W/...)

Reprinted from: Walhout et al., Science 287: 116 (2000)

[Figure label: mutations with similar phenotypes]
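One way to look for closed-loop connections in an interaction list is to ask, for each interaction, whether its two partners remain connected when that interaction is ignored. The Python sketch below is an illustration of that idea rather than the clustering procedure of the paper; the function name and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def interactions_in_closed_loops(interactions):
    """Return the interactions that lie on at least one closed loop."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)

    def connected_without_direct_edge(a, b):
        # Breadth-first search from a to b that never uses the a-b edge itself.
        seen, queue = {a}, deque([a])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if {node, neighbour} == {a, b}:
                    continue
                if neighbour == b:
                    return True
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return False

    return {
        tuple(sorted((a, b)))
        for a, b in interactions
        if a != b and connected_without_direct_edge(a, b)
    }


# Toy data: X/Y/Z/W form a closed loop, V hangs off W.
toy = [("X", "Y"), ("Y", "Z"), ("Z", "W"), ("W", "X"), ("W", "V")]
loops = interactions_in_closed_loops(toy)
print(sorted(loops))
```

Interactions that sit on a loop are supported by an alternative path between the same two proteins, which is the intuition behind using loops as a validation criterion.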

Conclusions

¤ Demonstrated the feasibility of generating genome-wide protein interaction maps
  – Two-hybrid screens are
    • simple
    • sensitive
    • amenable to high throughput
  – Feasible using the C. elegans ORFeome
¤ Y2H detects approximately 50% of the interactions
  – provides a useful coverage of biologically important interactions

Reprinted from: Walhout et al., Science 287: 116 (2000)

A Comprehensive Analysis of Protein–protein Interactions in Saccharomyces cerevisiae

¤ Landmark paper presents
  – The first large-scale, high-throughput mapping of protein-protein interactions between ORFs predicted in S. cerevisiae using
  – Two complementary yeast two-hybrid screening strategies
    • Two-hybrid array of 6,000 hybrid proteins
    • High-throughput library screen

Uetz et al., Nature 403: 623 (2000)

The two-hybrid array screening

¤ The two-hybrid array of 6,000 hybrid proteins comprises
  – Haploid yeast colonies derived from ~6,000 yeast ORFs fused to the Gal4 activation domain (AD)
  – 16 plates of 384 colonies each
¤ Matrix screen for interactions
  – 192 different Gal4 DB-ORF hybrids were mated to the two-hybrid array
  – The 192 two-hybrid array screens were performed in duplicate
    • Each yielded 1–30 positives
    • But only ~20% were reproduced in the duplicate screen
¤ Putative interacting partners identified
  – 87 of the 192 DB hybrids yielded putative protein–protein interactions
  – Identified 281 interacting protein pairs

Reprinted from: Uetz et al., Nature 403: 623 (2000)
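The array figures can be checked with a few lines of arithmetic. This Python sketch uses only the numbers quoted on the slide; the variable names are illustrative.

```python
plates = 16
colonies_per_plate = 384
array_capacity = plates * colonies_per_plate  # positions available for ~6,000 ORFs

db_hybrids_screened = 192
db_hybrids_with_interactions = 87
fraction_with_interactions = db_hybrids_with_interactions / db_hybrids_screened

print(array_capacity)
print(f"{fraction_with_interactions:.0%}")
```

The 16 plates provide 6,144 positions, enough for the ~6,000 AD fusions, and 87 of 192 baits corresponds to the roughly 45% quoted for the array screens.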

The two-hybrid array screening

Reprinted from: Uetz et al., Nature 403: 623 (2000)

[Figure: positive control showing the 6,000 haploid yeast Gal4 activation domain-ORF fusions on 16 microassay plates, and the two-hybrid positives from a mating with a Gal4 DNA-binding domain-ORF fusion]

High-Throughput Library Screen

¤ Used a library made by pooling ORF-AD fusions
  – Each ORF was fused separately to the Gal4 activation domain
  – ORF-AD fusions were pooled to form an activation-domain library
    • Advantage over traditional cDNA libraries is the uniform representation of each ORF
¤ Protein interactions were screened by
  – mating the 6,000 DNA-binding domain hybrids in duplicate to the activation domain library
  – 817 yeast ORFs (15%) yielded protein–protein interactions
  – Identified 692 interacting protein pairs
    • 68% of the interactions were identified multiple times

Reprinted from: Uetz et al., Nature 403: 623 (2000)

Results of the Systematic Two-Hybrid Screens

¤ The matrix array screens
  – gave more interactors
    • 45% of the 192 proteins in the array screens yielded interactions
  – are much more labour- and material-intensive
    • this limits the number of screens that can be performed
    • a full matrix would require testing 6,000 × 6,000 = 36,000,000 interactions!
¤ The library screens gave
  – fewer interactors
    • 8% of the proteins tested in the library screens yielded interactions
  – a much higher throughput

Reprinted from: Uetz et al., Nature 403: 623 (2000)
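The cost of the array approach is easiest to see by counting pairwise tests. The Python sketch below compares the full matrix with the 192 array screens that were actually performed, using the round figure of 6,000 ORFs from the slides; the function name is illustrative.

```python
def pairwise_tests(n_baits, n_preys):
    """Number of bait/prey combinations in a matrix screen."""
    return n_baits * n_preys


full_matrix = pairwise_tests(6_000, 6_000)
array_screens = pairwise_tests(192, 6_000)
fraction_covered = array_screens / full_matrix

print(f"{full_matrix:,}")
print(f"{array_screens:,}")
print(f"{fraction_covered:.1%}")
```

With 192 baits the array screens sample only a few percent of the full matrix, which is why the pooled library screen is needed for genome-wide coverage.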

Analysis of the protein-protein interactions

¤ The analysis reveals
  – Interactions that place unknown proteins into a biological context
  – Novel interactions between proteins involved in the same biological function
  – Novel interactions that connect biological functions into larger cellular processes

Interactions involving unknown proteins

Reprinted from: Uetz et al., Nature 403: 623 (2000)

Interactions Between Proteins in the RNA Splicing Complex

Reprinted from: Uetz et al., Nature 403: 623 (2000)

Interactions are consistent with the crystallographic data

Interaction Connecting Two Different Complexes

Reprinted from: Uetz et al., Nature 403: 623 (2000)

[Figure labels: spindle checkpoint complex; microtubule checkpoint complex]

Analysis of Interologs

Reprinted from: Uetz et al., Nature 403: 623 (2000)

[Figure labels: Yeast; Human]

Conclusions

¤ The two-hybrid array approach is feasible
  – for systematic genome-wide analysis of protein interactions
¤ The large-scale mapping of protein-protein interactions reveals
  – many new interactions between proteins
  – that protein interactions should be viewed as potential interactions that must be confirmed independently
  – This conclusion is supported by the fact that the results of different screens only partially overlap

Reprinted from: Uetz et al., Nature 403: 623 (2000)

A Map of the Interactome Network of the Metazoan C. elegans

¤ Paper presents
  – Large-scale mapping of protein-protein interactions in C. elegans using yeast two-hybrid screens with a subset of metazoan-specific proteins
    • identified >4,000 interactions
  – Together with already described Y2H interactions and interologs predicted in silico,
    • the current version of the Worm Interactome map contains 5,500 interactions

Li et al., Science, 303, 540-543 (2004)

Worm Interactome map

Reprinted from: Li et al., Science, 303, 540-543 (2004)

[Figure legend: phylogenetic classes: Eukaryotic; Multicellular; Worm]

A Protein Interaction Map of Drosophila melanogaster

¤ Paper presents
  – a two-hybrid–based protein-interaction map of the fly proteome by screening 10,623 ORFs against cDNA libraries to produce
    • a draft map of 7,048 proteins and 20,405 interactions
  – Computational rating of interaction confidence produced a high-confidence interaction network of 4,679 proteins and 4,780 interactions showing two levels of organization
    • a short-range organization, presumably corresponding to multiprotein complexes
    • a more global organization, presumably corresponding to intercomplex connections

Giot et al., Science, 302, 1727-1736 (2003)

The fly protein-interaction map: Protein family/human disease orthologs

Reprinted from: Giot et al., Science, 302, 1727-1736 (2003)

The fly protein-interaction map: Subcellular localization

Reprinted from: Giot et al., Science, 302, 1727-1736 (2003)

Towards a proteome-scale map of the human protein–protein interaction network

¤ Paper presents
  – First step towards a systematic and comprehensive analysis of the human interactome using
    • a stringent, high-throughput yeast two-hybrid system to test pairwise interactions among the products of 8,100 currently available Gateway-cloned open reading frames

Rual et al., Nature 437: 1173-1178 (2005)

Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)

High-throughput yeast two-hybrid pipeline

¤ Stringent test
  – Second test using GAL1::HIS3 and GAL1::lacZ
  – Reduces the number of false positives
¤ Detected 2,800 interactions

Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)

Overlap of CCSB-HI1 with literature data

¤ Compared the overlap between
  – Observed interactions
  – Interactions reported in the literature
¤ Conclude that the CCSB-HI1 data set contains about 1% of the human interactome
  – The human interactome is estimated at 200,000 to 300,000 interactions

Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)
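A back-of-envelope check shows that the ~1% figure is consistent with the 2,800 interactions detected by the pipeline. The Python sketch below is simple arithmetic on the numbers quoted in the slides; it is not the estimation procedure used in the paper, and the function name is illustrative.

```python
def coverage(detected, interactome_size):
    """Fraction of an assumed interactome covered by the detected interactions."""
    return detected / interactome_size


detected = 2_800
smallest_estimate, largest_estimate = 200_000, 300_000

coverage_upper = coverage(detected, smallest_estimate)
coverage_lower = coverage(detected, largest_estimate)

print(f"{coverage_lower:.2%} to {coverage_upper:.2%}")
```

Both bounds fall close to 1%, so the data set is a first step rather than a near-complete map.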

Interaction network of disease-associated CCSB-HI1 proteins

¤ The human interactome will further
  – the understanding of human health and disease
¤ Illustrated by
  – The network of disease-associated proteins (green nodes)
    • EWS protein

Functional Maps or “-omes”

[Figure: functional maps, or “-omes”, each recording one type of information for genes or proteins 1, 2, 3, 4, 5 ... n across “conditions”: ORFeome (genes); Phenome (mutational phenotypes); Transcriptome (expression profiles); Interactome (protein interactions); DNA interactome (protein-DNA interactions); Localizome (cellular, tissue location); Proteome (proteins)]

After: Vidal M., Cell, 104, 333 (2001)

Proteome Analysis

¤ Large-scale and comprehensive analysis of the proteome has so far not been feasible
  – Lack of suitable and sensitive protein fractionation methods
    • 2-D gels are limited to a few thousand proteins, and only the most abundant ones
  – Protein characterization is slow and laborious
    • Despite enormous improvements in mass spectrometry, the characterization of individual proteins remains the bottleneck
  – Level of proteome characterization to date is in the order of a few thousand proteins at best
    • Represents 5% to 25% of the proteome
¤ Tandem affinity purification (TAP) technology constitutes an important breakthrough
  – Fast and reliable method of protein purification

A generic protein purification method for protein complex characterization

¤ Paper presents
  – a generic procedure to purify protein complexes under native conditions using
    • the tandem affinity purification (TAP) tag procedure
  – Using a combination of high-affinity tags for purification

Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

Reprinted from: Kumar A. and Snyder M., Nature 415, 123 (2002)

Tag-based Characterization of protein complexes

High-affinity Tags

¤ High-affinity protein tags
  – Must allow efficient recovery of proteins present at low concentrations
    • ProtA tag: two IgG-binding units of protein A of S. aureus
      – released from matrix-bound IgG under denaturing conditions
    • CBP tag: calmodulin-binding peptide
      – released from the affinity column under mild conditions
¤ Tandem affinity purification (TAP) tag
  – A fusion cassette encoding both the ProtA tag and the CBP tag
    • Separated by a specific TEV protease recognition sequence, which allows proteolytic release of the bound material under native conditions

Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

Tandem affinity purification (TAP) tag

Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

[Figure labels: ProtA; CBP]

The TAP Purification Procedure

Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

[Figure labels: ProtA affinity purification step; TEV protease cleavage step; CBP affinity purification step]

Advantage of the Two-step Procedure

¤ Purification of U1 snRNP
  – Single-step affinity purification yields a high level of contaminating proteins
  – Two-step affinity purification yields highly specific purification with very low background

Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

Functional organization of the yeast proteome by systematic analysis of protein complexes

¤ Landmark paper presents
  – Large-scale application of the TAP technology for a systematic analysis of multiprotein complexes from yeast
    • Generated gene-specific TAP tag cassettes by PCR
    • Inserted TAP cassettes by homologous recombination at the 3' end of the genes to generate fusion proteins in their native location
    • Purified protein assemblies from cellular lysates by TAP
  – Separated purified assemblies by denaturing gel electrophoresis
  – Digested individual bands with trypsin
    • Analyzed peptides by MALDI–TOF MS to identify the proteins using database search algorithms

Gavin et al., Nature 415, 141 (2002)

Reprinted from: Gavin et al., Nature 415, 141 (2002)

The Gene Targeting Procedure

[Figure label: TAP tag gene-specific cassette]

Large-scale Analysis of Protein Complexes

¤ Experimental outline
  – Started with a selection of 1,739 genes
    • 1,143 genes representing eukaryotic orthologues
    • 596 genes in a nonorthologous set
  – Generated 1,167 strains expressing tagged proteins to detectable levels
  – Analyzed 589 protein complexes
    • Comprising 418 different orthologues
  – Generated 20,946 samples for mass spectrometry
    • Identified 16,830 proteins
  – Characterized a total of 232 protein complexes
    • Comprising 1,440 distinct proteins (~25% of the ORFs in the genome)

Reprinted from: Gavin et al., Nature 415, 141 (2002)
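The figures in the experimental outline can be cross-checked with simple arithmetic. The Python sketch below uses the numbers from the slide; the genome size of about 6,000 ORFs is an assumption taken from the yeast two-hybrid slides earlier in this lecture, not a value stated here.

```python
orthologue_genes = 1_143
non_orthologue_genes = 596
selected_genes = orthologue_genes + non_orthologue_genes

tagged_strains = 1_167
distinct_proteins = 1_440
ASSUMED_YEAST_ORFS = 6_000  # assumption: round figure used elsewhere in the lecture

tagging_success = tagged_strains / selected_genes
proteome_fraction = distinct_proteins / ASSUMED_YEAST_ORFS

print(selected_genes)
print(f"{tagging_success:.0%}")
print(f"{proteome_fraction:.0%}")
```

The two gene sets add up to the 1,739 selected genes, and 1,440 proteins out of roughly 6,000 ORFs is in line with the ~25% quoted.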

Purification and Identification of TAP Complexes

Reprinted from: Gavin et al., Nature 415, 141 (2002)

Sensitivity and Specificity of the Approach

¤ Very efficient large-scale purification and identification of protein complexes
  – 78% of the 589 purified complexes have associated proteins
  – The remaining 22% show no interacting proteins
    • They may not form stable or soluble complexes
    • The TAP tag may interfere with complex assembly or function
¤ Complexes are stable and show the same composition when purified with different entry points
  – Example: the polyadenylation machinery, responsible for eukaryotic messenger RNA cleavage and polyadenylation
    • Identified 12 of the 13 known components
    • Identified 7 new components

Reprinted from: Gavin et al., Nature 415, 141 (2002)

The Polyadenylation Protein Complex

[Figure label: new components of the polyadenylation complex]

Composition of the Polyadenylation Complex

Reprinted from: Gavin et al., Nature 415, 141 (2002)

[Figure legend: protein tagged for affinity purification]

Reliability of the TAP Method

¤ High sensitivity
  – Identifies proteins present at as few as 15 copies per cell
¤ High reproducibility
  – 70% of the proteins are detected in independent purifications
¤ Low background
  – The background comprises highly expressed proteins
    • Identified 17 contaminant proteins (heat-shock and ribosomal proteins)
¤ Limitations
  – 18% of the tagged essential genes gave no viable strains
    • The carboxy-terminal tagging can impair protein function

Reprinted from: Gavin et al., Nature 415, 141 (2002)

Organization of the purified assemblies into complexes

¤ 589 purified assemblies characterized
  – 245 purifications corresponded to 98 known multiprotein complexes in yeast
  – 242 purifications corresponded to 134 new complexes
  – 102 proteins showed no detectable association with other proteins
¤ In total 232 annotated TAP complexes are identified
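The numbers on this slide are internally consistent, which a short Python check makes explicit. The variable names are illustrative; the values are those quoted on the slide.

```python
purifications_known = 245   # purifications matching known complexes
purifications_new = 242     # purifications defining new complexes
no_association = 102        # tagged proteins with no detectable partner

known_complexes = 98
new_complexes = 134

total_purifications = purifications_known + purifications_new + no_association
total_complexes = known_complexes + new_complexes

print(total_purifications)
print(total_complexes)
```

The first sum recovers the 589 purified assemblies and the second the 232 annotated TAP complexes; several purifications map onto the same complex because one complex can be reached from more than one tagged entry point.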

Number Of Proteins Per Complex

Reprinted from: Gavin et al., Nature 415, 141 (2002)

Average of 12 proteins per complex

Functional Classification Of The Complexes

Reprinted from: Gavin et al., Nature 415, 141 (2002)

Wide functional distribution of complexes

Protein Complexes are Dynamic

¤ Complexes are not necessarily of invariable composition
  – Using distinct tagged proteins as entry points to purify a complex
    • Core components can be identified as invariably present
    • Regulatory components may be present differentially
¤ Dynamic complexes: e.g. signalling complexes
  – The interactions of a signalling enzyme may be sufficiently strong to allow the detection of distinct cellular complexes
    • They may be diagnostic for the role of these enzymes in different cellular activities

Reprinted from: Gavin et al., Nature 415, 141 (2002)
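The distinction between core and differentially present components maps naturally onto set operations. The Python sketch below is illustrative only: the function name, the entry points and the protein names are hypothetical, and real data would also need to account for contaminants and missed identifications.

```python
def core_and_variable(purifications):
    """Split components into those found from every entry point and the rest.

    purifications: dict mapping a tagged entry-point protein to the set of
    proteins identified in that purification. Must not be empty.
    """
    protein_sets = [set(proteins) for proteins in purifications.values()]
    core = set.intersection(*protein_sets)
    variable = set.union(*protein_sets) - core
    return core, variable


# Hypothetical purifications of one complex from three entry points.
purifications = {
    "TagA": {"A", "B", "C", "R1"},
    "TagB": {"A", "B", "C"},
    "TagC": {"A", "B", "C", "R2"},
}
core, variable = core_and_variable(purifications)
print(sorted(core), sorted(variable))
```

Requiring presence in every purification is a strict definition of the core; a threshold such as presence in most purifications would be a softer alternative.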

Higher-order Organization of The Proteome Map

¤ Most complexes are linked together
  – Complexes belonging to the same functional class often share components
    • mRNA metabolism, cell cycle, protein synthesis and turnover, intermediate and energy metabolism
¤ Shared components link complexes into a network
  – The network connections reflect physical interaction of complexes
    • common architecture, localization or regulation
  – Relationships between complexes suggest integration and coordination of cellular functions
  – The more connected a complex, the more central its position in the network

Reprinted from: Gavin et al., Nature 415, 141 (2002)
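A network of complexes linked by shared components can be derived from the complex membership lists alone. The Python sketch below shows one way to do this and to count how connected each complex is; the function names, complex names and protein names are hypothetical, and the construction is an illustration rather than the procedure used in the paper.

```python
from itertools import combinations


def link_complexes(complexes):
    """Link every pair of complexes that shares at least one protein."""
    links = {}
    for (name_a, members_a), (name_b, members_b) in combinations(complexes.items(), 2):
        shared = set(members_a) & set(members_b)
        if shared:
            links[(name_a, name_b)] = shared
    return links


def connectivity(links):
    """Count the number of links per complex."""
    degree = {}
    for name_a, name_b in links:
        degree[name_a] = degree.get(name_a, 0) + 1
        degree[name_b] = degree.get(name_b, 0) + 1
    return degree


# Hypothetical complexes: C2 shares P3 with C1 and P4 with C3; C4 is isolated.
complexes = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P3", "P4"},
    "C3": {"P4", "P5"},
    "C4": {"P6"},
}
links = link_complexes(complexes)
degree = connectivity(links)
print(links)
print(degree)
```

In this toy example C2 has the highest link count, so it would sit at the centre of the network, while C4 shares no component and remains unconnected.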

The Yeast Protein Complex Network

[Figure: network of complexes labelled by functional class: cell cycle; signalling; transcription, DNA maintenance, chromatin structure; RNA metabolism; protein synthesis and turnover; cell polarity and structure; intermediate and energy metabolism; membrane biogenesis and traffic; protein and RNA transport]

Reprinted from: Gavin et al., Nature 415, 141 (2002)

Protein Complexes Have a Similar Composition in Yeast and Human

Reprinted from: Gavin et al., Nature 415, 141 (2002)

Conclusions

¤ The paper clearly demonstrates the merits of the TAP technology for
  – characterizing protein complexes from different compartments, including low-abundance and large complexes
  – TAP data and yeast two-hybrid assay data show only a very small overlap
    • The two methodologies address different aspects of protein interaction and are complementary
¤ The TAP analysis provides an outline of the eukaryotic proteome as a network of protein complexes
  – The human–yeast orthologous proteome represents core functions for the eukaryotic cell
    • Orthologous proteins are often responsible for essential functions

Recommended reading

¤ Yeast two-hybrid interaction mapping
  – The yeast two-hybrid system
    • Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)
  – Protein Interaction Mapping in C. elegans Using Proteins Involved in Vulval Development
    • Walhout et al., Science 287: 116 (2000)
¤ Purification of protein complexes
  – Gavin et al., Nature 415, 141 (2002)

Further reading

¤ Protein Interaction Mapping
  – Interaction map of yeast
    • Uetz et al., Nature 403: 623 (2000)
  – Interaction map of C. elegans
    • Li et al., Science, 303, 540-543 (2004)
  – Interaction map of Drosophila
    • Giot et al., Science, 302, 1727-1736 (2003)
¤ Purification of protein complexes
  – Tandem affinity purification (TAP) tag method
    • Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)