“The new world”


Page 1: “The new world”

As presented, global interaction-detection methods have been invented in the last few years:
1. Yeast two-hybrid arrays.
2. Mass spectrometry.
3. Correlated mRNA expression profiles.
4. Genetic lethal mutations.
5. In silico predictions.
6. And more…

Having understood these methods, our goals are now:
1. Compare the outputs of these methods.
2. Use these outputs to extract biological information.

Page 2: “The new world”

Method evaluation

Vast amounts of interaction data have emerged: for each method, a PPI database was created.

Our first goal is to compare these databases in terms of:
1. Accuracy
2. Biases
3. Overlaps
4. Complementarities

We’ll present this based on the following article:
Comparative assessment of large-scale data sets of protein-protein interactions. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P.

Page 3: “The new world”

Method evaluation

Comparing interaction data is difficult. However, as the saying goes, only bread is hard.
To overcome these difficulties, a few decisions are made:
A. The common unit of analysis for this study is the binary interaction.
B. We will focus on the yeast proteome.
C. The reference sets are manually curated catalogues of known protein complexes:
   1. YPD
   2. MIPS

Page 4: “The new world”

Overlaps and complementarities

About 80,000 yeast PPIs are currently available from all the latest databases combined.
Surprisingly, only about 2,400 (~3%) are supported by more than one method.
Possible explanations:
1. The methods have not reached saturation.
2. A significant number of false positives.
3. Complementarities: each method has its own strengths and weaknesses.

To illustrate this, see the graph on the next slide.
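The overlap figure above is simple arithmetic: 2,400 / 80,000 = 0.03, i.e. about 3%. As a minimal sketch of how such a count could be obtained, the snippet below pools the interaction lists of several methods and counts how many binary interactions are reported by more than one of them. The method names, protein names and the function `support_counts` are hypothetical and are not taken from the article.

```python
from collections import Counter


def normalize(pair):
    """Treat an interaction as unordered: (A, B) and (B, A) are the same PPI."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def support_counts(datasets):
    """Map each binary interaction to the number of methods that report it."""
    counts = Counter()
    for pairs in datasets.values():
        # Using a set means each method votes at most once per interaction.
        counts.update({normalize(p) for p in pairs})
    return counts


# Toy data with made-up protein names (an assumption, not real database content).
datasets = {
    "yeast_two_hybrid": [("A", "B"), ("B", "C"), ("C", "D")],
    "mass_spectrometry": [("B", "A"), ("D", "E")],
    "in_silico": [("E", "F"), ("C", "B")],
}

counts = support_counts(datasets)
total = len(counts)
multi = sum(1 for n in counts.values() if n > 1)
print(f"{multi} of {total} interactions are supported by more than one method")
```

Normalizing the pair order matters here: without it, the same interaction reported as (A, B) by one method and (B, A) by another would be counted as two unsupported interactions.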

Page 5: “The new world”

Interaction data by each method

Page 6: “The new world”

Quality evaluation

Quality of the methods consists of:
1. Coverage
2. Accuracy

Comparing the data with a reference set allows evaluation of these methods.
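A minimal sketch of this comparison is given below. It assumes that accuracy means the fraction of a method's interactions that are confirmed by the reference set, that coverage means the fraction of reference interactions the method recovers, and that the reference complexes have already been expanded into protein pairs. These definitions and the function name `accuracy_and_coverage` are assumptions made for illustration.

```python
def accuracy_and_coverage(predicted, reference):
    """Compare a method's binary interactions with a reference set.

    Both arguments are iterables of protein pairs; pair order is ignored.
    """
    predicted = {frozenset(p) for p in predicted}
    reference = {frozenset(p) for p in reference}
    confirmed = predicted & reference

    # Assumed definitions: accuracy = confirmed / predicted,
    # coverage = confirmed / reference.
    accuracy = len(confirmed) / len(predicted) if predicted else 0.0
    coverage = len(confirmed) / len(reference) if reference else 0.0
    return accuracy, coverage


# Toy data with made-up protein names.
predicted = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
reference = [("B", "A"), ("C", "B"), ("D", "E"), ("E", "F"), ("F", "G")]

accuracy, coverage = accuracy_and_coverage(predicted, reference)
print(f"accuracy = {accuracy:.2f}, coverage = {coverage:.2f}")
```

The two numbers pull in different directions: a method can raise its coverage by reporting more pairs, but usually at the cost of accuracy.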

Page 7: “The new world”

Accuracy vs. Coverage

Page 8: “The new world”

Quality evaluation

An independent measure of quality:
To what degree do the methods describe PPIs between proteins within the same functional group?

This is well shown in the first graph:

Page 9: “The new world”

Interaction data by each method

Page 10: “The new world”

Biases in interaction coverage

None of the methods covers more than 60% of the proteins in the yeast genome.
Are there common biases as to which proteins are covered?
Yes! There are areas in the databases where biases are found:
1. “Democracy”: common, abundant proteins are “preferred”.
2. “Oligarchy”: proteins from specific cellular locations are “preferred”.
3. “Monarchy”: ancient, conserved proteins are “preferred” over proteins that emerged later in evolution.

Page 11: “The new world”

Bias towards various cellular locations

Page 12: “The new world”

Protein-protein interaction networks

Having evaluated our methods, our next goal is to use their outputs: PPI databases.

How can we organize this data in order to extract valuable information from it?

Networks!
There are 2 general kinds of networks:
1. Simple PPI network.
2. Category-divided PPI network.

Page 13: “The new world”

Protein-protein interaction networks

Why networks?
1. Simple networks visualize the amount and type of interactions that occur for each protein.
2. Category-divided networks reveal a lot more: to what extent do proteins of different cell locations or different functions interact?
3. Networks allow characterizing proteins according to the proteins they interact with.

Now that we’re convinced, we’ll consult:
A network of protein-protein interactions in yeast. Schwikowski B, Uetz P, Fields S.

Page 14: “The new world”

Protein-protein interaction networks

2,709 PPIs were analyzed, involving 2,039 yeast proteins.

A surprising result was discovered:

Number of networks    Number of proteins in network
1                     1,548
1                     19
9                     5-11
193                   1-4
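A tally like the one above can be sketched as a connected-components computation, assuming that a "network" here means a set of proteins linked to each other, directly or indirectly, by interactions. The function `component_sizes` and the protein names are hypothetical.

```python
from collections import defaultdict, deque


def component_sizes(interactions):
    """Return the sizes of the separate networks, largest first."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    sizes = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects every protein reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Toy data with made-up protein names; ("G", "G") is a self-interaction,
# which in this sketch forms a network of a single protein.
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("G", "G")]
print(component_sizes(interactions))
```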

Page 15: “The new world”

Creating the network

Proteins have been assigned 42 cellular roles, for example cell structure, mitosis, etc.

1,485 proteins have been categorized, 39% of them with more than one role.

“Cluster”: any 3 or more proteins of the same function, separated by no more than 2 other proteins.

For example, 89% of chromatin proteins are within clusters.

Page 16: “The new world”

PPI network

Page 17: “The new world”

Assessing the quality of the data

In order to assess the quality of the network, we use the following algorithm:

For each characterized protein with at least one characterized partner:
1. A list of the functions of its neighbors is made.
2. If the function of the protein is among the 3 most common functions in the list, we say it is a correct classification.
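The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a protein with several roles is counted as correct if any one of them is among the top 3, ties among equally common functions are broken arbitrarily, and self-interactions are ignored. The names `assess_network`, `roles` and the toy proteins are hypothetical.

```python
from collections import Counter, defaultdict


def assess_network(interactions, roles, top_n=3):
    """Return (correct, assessed) for the neighbor-based check.

    `roles` maps each characterized protein to its set of functions;
    proteins missing from `roles` are treated as uncharacterized.
    """
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:  # assumption: a self-interaction is not a "partner"
            neighbors[a].add(b)
            neighbors[b].add(a)

    correct = assessed = 0
    for protein, own_roles in roles.items():
        # Step 1: list the functions of the characterized neighbors.
        tally = Counter(
            role
            for partner in neighbors.get(protein, ())
            for role in roles.get(partner, ())
        )
        if not tally:
            continue  # no characterized partner, so the protein is skipped
        assessed += 1
        # Step 2: is any of the protein's own functions among the top 3?
        top_roles = {role for role, _ in tally.most_common(top_n)}
        if top_roles & set(own_roles):
            correct += 1
    return correct, assessed


# Toy data with made-up proteins; "U1" is uncharacterized.
roles = {
    "P1": {"mitosis"},
    "P2": {"mitosis", "cell structure"},
    "P3": {"mitosis"},
    "P4": {"transport"},
    "P5": {"chromatin"},
}
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P1"), ("P5", "U1")]

correct, assessed = assess_network(interactions, roles)
print(f"{correct} of {assessed} assessed proteins were classified correctly")
```

In the toy data P4 is marked incorrect because its only characterized partner has a different role, and P5 is not assessed at all because its only partner is uncharacterized.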

Page 18: “The new world”

Example of assessment

Page 19: “The new world”

Results of assessment

72% were marked correct.
On random links, only 12% were marked correct, so the network seems valid.
The remaining 28% might be due to:
1. False positives.
2. Incomplete annotations.
3. Cross-talk.
4. Unknown biological connections.

Page 20: “The new world”

Crosstalk between and within functional groups

Relationships between functional groups might be biologically meaningful.

65% of the interactions occur between proteins with a common function.

But it is the minority which is interesting…
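A percentage of this kind could be computed along the following lines. The sketch assumes that only interactions where both proteins are annotated enter the denominator, and that two proteins have "a common function" when their sets of roles overlap. Both choices, and the name `shared_fraction`, are assumptions made for illustration.

```python
def shared_fraction(interactions, annotation):
    """Fraction of annotated interactions whose two proteins share a label.

    `annotation` maps a protein to a non-empty set of labels (e.g. roles).
    """
    annotated = [
        (a, b) for a, b in interactions
        if annotation.get(a) and annotation.get(b)
    ]
    if not annotated:
        return 0.0
    shared = sum(1 for a, b in annotated if annotation[a] & annotation[b])
    return shared / len(annotated)


# Toy data with made-up proteins; "U1" has no annotation.
roles = {
    "P1": {"mitosis"},
    "P2": {"mitosis", "cell structure"},
    "P3": {"transport"},
    "P4": {"transport"},
}
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "U1")]

fraction = shared_fraction(interactions, roles)
print(f"{fraction:.0%} of annotated interactions link proteins with a common function")
```

The interactions that fall outside this fraction (here P2 with P3) are the cross-talk candidates.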

Page 21: “The new world”

Crosstalk between functional groups

Page 22: “The new world”

Crosstalk between and within subcellular compartments

Proteins from the same cellular area are likely to interact (as with proteins of the same function).

78% of the PPIs involving proteins with known localization occur between proteins of the same cellular compartment.

Interactions between groups from different areas are meaningful here as well:

Page 23: “The new world”

Interactions and localizations
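A breakdown of interactions by pair of compartments could be tabulated roughly as below. For simplicity the sketch assumes each protein has a single known compartment, which may not hold in the real data. The function `crosstalk_counts`, the compartment labels and the proteins are hypothetical.

```python
from collections import Counter


def crosstalk_counts(interactions, compartment):
    """Count interactions per unordered pair of compartments.

    Interactions involving a protein of unknown localization are skipped.
    """
    counts = Counter()
    for a, b in interactions:
        if a in compartment and b in compartment:
            # Sorting makes (nucleus, cytoplasm) and (cytoplasm, nucleus) one key.
            key = tuple(sorted((compartment[a], compartment[b])))
            counts[key] += 1
    return counts


# Toy data with made-up proteins; "U1" has no known localization.
compartment = {
    "P1": "nucleus",
    "P2": "nucleus",
    "P3": "cytoplasm",
    "P4": "mitochondrion",
}
interactions = [("P1", "P2"), ("P1", "P3"), ("P3", "P2"), ("P3", "P4"), ("P4", "U1")]

counts = crosstalk_counts(interactions, compartment)
same = sum(n for (x, y), n in counts.items() if x == y)
total = sum(counts.values())
print(f"{same} of {total} localized interactions stay within one compartment")
for pair, n in sorted(counts.items()):
    print(pair, n)
```

The diagonal entries (same compartment on both sides) give the within-compartment share, and the off-diagonal entries show which pairs of compartments talk to each other most.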

Page 24: “The new world”

Prediction of function

Of the 2,039 proteins in the data set, 554 have no annotation for “functional role”.

We would like to predict their role. How?
Obvious method: interacting partners. But…

Page 25: “The new world”

Prediction of function

Solution: use the benefits of the network, i.e. second-degree neighbors, and so on.
For example, if:

- uncharacterized
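The slide's own example did not survive in this transcript, so what follows is only a hypothetical sketch of the idea: look at the direct partners first, and if none of them is characterized, move out to second-degree neighbors. The voting rule (most common roles in the nearest annotated shell), the depth limit and the names `build_neighbors` and `predict_roles` are all assumptions.

```python
from collections import Counter, defaultdict


def build_neighbors(interactions):
    """Adjacency sets for an undirected interaction list."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def predict_roles(protein, neighbors, roles, max_depth=2, top_n=3):
    """Return (depth, roles) from the nearest shell with annotated proteins.

    Depth 1 is the direct partners, depth 2 their partners, and so on.
    Returns (None, []) if no annotated protein is found within max_depth.
    """
    seen = {protein}
    shell = {protein}
    for depth in range(1, max_depth + 1):
        # Move one step outwards, excluding proteins already visited.
        shell = {n for p in shell for n in neighbors.get(p, ())} - seen
        seen |= shell
        tally = Counter(role for p in shell for role in roles.get(p, ()))
        if tally:
            return depth, [role for role, _ in tally.most_common(top_n)]
    return None, []


# Toy data with made-up proteins; U1..U4 are uncharacterized.
roles = {"P1": {"mitosis"}, "P2": {"mitosis"}, "P3": {"transport"}}
interactions = [
    ("U1", "U2"), ("U2", "P1"), ("U2", "P2"),
    ("U1", "U3"), ("U3", "P3"), ("U4", "P1"),
]
neighbors = build_neighbors(interactions)

print(predict_roles("U4", neighbors, roles))  # a direct partner is annotated
print(predict_roles("U1", neighbors, roles))  # needs second-degree neighbors
```

In the toy data U1 interacts only with uncharacterized proteins, so looking at direct partners alone yields nothing; the second shell supplies the vote.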

Page 26: “The new world”

Prediction of function: example

Page 27: “The new world”

Summary

Evaluating PPI detection methods reveals unique accuracy, coverage and biases for each method.

There are typical overlaps and complementarities between methods.

PPI networks reveal important information about interaction between protein groups.

PPI networks assist in predicting protein functions.