Proteome Network Evolution by Gene Duplication S. Cenk Şahinalp Simon Fraser University.

Date posted: 19-Dec-2015

Page 1:

Proteome Network Evolution by Gene Duplication

S. Cenk Şahinalp

Simon Fraser University

Page 2:

Acknowledgements

Colin Cooper, KCL
Joe Nadeau, CWRU
Petra Berenbrink, SFU
Gurkan Bebek, CWRU

NSF, NSERC, CRC Programme, Charles Wang Foundation

Page 3:

Networks are found in biological systems of varying scales:

(time units: millions of years)
1. the evolutionary tree of life
2. ecological networks
3. the genetic control networks of organisms
4. the protein interaction network in cells
5. the metabolic network in cells
(time units: millionths of a second)

Page 4:

Proteins in a cell

There are thousands of different active proteins in a cell, acting as:
• enzymes (catalysts of the chemical reactions of the metabolism)
• components of cellular machinery (e.g. ribosomes)
• regulators of gene expression

Certain proteins play specific roles in special cellular compartments. Others move from one compartment to another as "signals".

Page 5:

Protein Interactions

Proteins are produced and degraded all of the time. The rates at which these processes occur depend on which proteins are already present, how they interact with one another directly, and how they interact with genes (at the DNA or mRNA level).

Proteins that bind to DNA or RNA have a direct effect on the production or degradation of other proteins.

One protein can speed up or slow down the rate of production of another by binding to the corresponding DNA or mRNA.

Page 6:

What is a proteome network?

Represents interactions between pairs of proteins as a binary relationship.

Forms a network in which:
  Vertex = protein
  Link = interaction

Establishes an ordinary graph of all proteins in an organism and all possible interactions between them.
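The vertex/link definition above can be sketched as an undirected graph stored in adjacency sets. This is a minimal illustration only: the function name `build_network` and the protein labels `P1`..`P4` are hypothetical and do not come from any of the databases mentioned in the talk.

```python
from collections import defaultdict

def build_network(interactions):
    """Build an undirected graph: vertex = protein, link = interaction."""
    adj = defaultdict(set)
    for a, b in interactions:
        # An interaction is a binary, symmetric relationship.
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

# Hypothetical interaction pairs, for illustration only.
pairs = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
network = build_network(pairs)
```

Adjacency sets make the degree of a protein simply `len(network[protein])`, which is the quantity the later slides analyze.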

Page 7:

Page 8:

MIPS Proteome Network Topology

Page 9:

PPI Database Sources

DIP (Database of Interacting Proteins, UCLA)

BIND (also includes interactions with other molecules)

MIPS (Munich Information Center for Protein Sequences)

Others, including:

PROTEOME

PRONET

CURAGEN

PIM

See http://www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html

Page 10:

Complete Yeast Proteome Network

Page 11:

The yeast proteome network seems to reveal two basic graph-theoretic properties:

The frequency of proteins having interactions with exactly d other proteins follows a power law:

$f(d) \sim C \cdot d^{-\alpha}$

where $\alpha > 0$ is the power-law exponent.

The network exhibits the small world phenomenon: a small degree of separation between individuals.
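To check a network against the first property, one would start from the empirical degree frequencies f(d). A minimal sketch, assuming the network is stored as a dictionary of adjacency sets; the name `degree_frequency` and the toy star graph are illustrative assumptions, not data from the talk.

```python
from collections import Counter

def degree_frequency(adj):
    """Return {d: fraction of nodes having exactly d neighbors}."""
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {d: c / n for d, c in sorted(counts.items())}

# Toy star graph: one hub linked to three leaves (illustration only).
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
freq = degree_frequency(star)
```

On real data, plotting log f(d) against log d would show an approximately straight line if the power law holds.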

Page 12:

Degree distribution of the yeast PPI network [Wagner], [Jeong et al.]

Page 13:

Small world phenomena and power-law degree distributions are also observed in:
• communication networks
• web graphs
• research citation networks
• social networks

[Albert, Barabasi & Jeong], [Broder et al.], [Faloutsos³]

Classical Erdős-Rényi type random graphs do not exhibit these properties:
• Links between pairs of a fixed set of nodes are picked uniformly
• Maximum degree is logarithmic in the network size
• No hubs to make short connections between nodes

Page 14:

Preferential Attachment Model [Yule], [Simon]

[Figure: graph G and a new node]

Power-law graphs can be generated by an iterative process:
• Add one new node at a time
• Connect the new node to existing ones independently: the probability that a node is connected to the new node is proportional to its degree [Bollobás et al.]

Such graphs also exhibit small world phenomena [Barabasi & Albert], [Bollobás & Riordan]
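The iterative process can be sketched as a small simulation. Several details here are assumptions rather than part of the slide: the seed graph (two linked nodes), the parameter `m` (expected number of links per new node), and the fallback that gives every new node at least one link so that it can attract links later.

```python
import random

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph one node at a time; link probability is proportional to degree."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: two linked nodes
    for new in range(2, n):
        total_degree = sum(len(nb) for nb in adj.values())
        # Independent trial per existing node, probability proportional to its degree.
        targets = [v for v, nb in adj.items()
                   if rng.random() < min(1.0, m * len(nb) / total_degree)]
        if not targets:
            # Assumed fallback: pick one node with degree-proportional weight.
            nodes = list(adj)
            targets = rng.choices(nodes, weights=[len(adj[v]) for v in nodes], k=1)
        adj[new] = set()
        for v in targets:
            adj[new].add(v)
            adj[v].add(new)
    return adj

graph = preferential_attachment(300)
```

Feeding `graph` into a degree histogram should show a few high-degree hubs and many low-degree nodes, which is the qualitative behavior the slide describes.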

Page 15:

Proteome network modeling

The model should capture the underlying mechanisms that generate the network while satisfying known mathematical properties:
• Ohno's model of genome growth by duplication
• Duplication-based graphs [Kleinberg et al.], [Kumar et al.], [Pastor-Satorras et al.], [Chung et al.]:
  each iteration duplicates a randomly chosen vertex with all its links;
  it then independently deletes existing edges and inserts new ones.
• Analysis of the incoming degree distribution in directed graphs reveals a power law.
• Simulations on undirected networks exhibit power-law-like degree distributions.

Page 16:

Duplication Model

[Figure: graph G]

At each iteration t (= total number of nodes):

1. An existing vertex is chosen uniformly at random and is "duplicated" with all its links.

2. Mutations are emulated by:
   a. deleting each link of the new vertex with probability q = 1 - p
   b. inserting an edge between the new node and every other node with probability r/t
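The two steps above translate fairly directly into a simulation. The sketch below is one possible reading: the seed graph (two linked nodes), the integer node labels, and the default parameter values are assumptions chosen for illustration, not values fitted in the talk.

```python
import random

def duplication_model(n, p=0.5, r=0.1, seed=0):
    """Grow a graph by node duplication with link deletion (q = 1 - p) and insertion (r/t)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: two linked nodes
    while len(adj) < n:
        t = len(adj)                 # iteration index = current number of nodes
        new = t                      # nodes are labeled 0 .. t-1, so t is a fresh label
        source = rng.randrange(t)    # step 1: pick the vertex to duplicate uniformly
        # Step 2a: each inherited link survives with probability p (deleted with q = 1 - p).
        kept = {v for v in adj[source] if rng.random() < p}
        # Step 2b: a link to every existing node is inserted with probability r/t.
        extra = {v for v in adj if rng.random() < r / t}
        adj[new] = set()
        for v in kept | extra:
            adj[new].add(v)
            adj[v].add(new)
    return adj

graph = duplication_model(500)
singletons = sum(1 for nb in graph.values() if not nb)
```

Counting `singletons` for different values of p and r gives a quick feel for how many isolated nodes the model produces.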

Page 17:

Degree distribution of the best fitting duplication model

Page 18:

Expected degree distribution

The iterative process gives difference equations based on both degree and time; for r = 0:

$F_{t+1}(d) = F_t(d)\,(1 - pd/t) + F_t(d-1)\,p(d-1)/t + \frac{1}{t} \sum_{j > d-1} F_t(j)\, \binom{j}{d}\, p^{d} q^{j-d}$

[Pastor-Satorras et al.]: approximate the difference equations by differential equations to come up with a power law with exponential cutoff:

$F(d)/t = f(d) \sim C\, d^{-\alpha}\, \beta^{d}$, with $0 < \beta < 1$ giving the exponential cutoff

Underlying assumption: Pr[iteration t+1 generates a degree-d node] depends on $f_t(d+1)$ and $f_t(d)$ only

[Chung et al.]: verify whether a power-law degree distribution is satisfied by the difference equations:

$f_t(d) \sim C\, d^{-\alpha}$ for sufficiently large t, d

Underlying assumption: $f_t(d)$ is independent of t for all d

Page 19:

Counter evidence

$f_t(d)$ with exponential cutoff will result in a maximum degree of $O(\log t)$, as in Erdős-Rényi graphs.

A power-law degree distribution implies a maximum degree on the order of $t^p$.

$f_t(d)$ cannot be independent of t: for r = 0 and p = 0.5, the fraction of singletons approaches 1 with growing t.
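The growth of the singleton fraction can be explored numerically by iterating the expected-count recurrence for r = 0, read here as F_{t+1}(d) = F_t(d)(1 - pd/t) + F_t(d-1) p(d-1)/t + (1/t) Σ_{j≥d} F_t(j) C(j,d) p^d q^(j-d). The sketch below is an illustration under assumptions: the seed state (two linked nodes at t = 2) and the function name are not from the talk.

```python
from math import comb

def expected_degree_counts(T, p):
    """Iterate the r = 0 recurrence; returns F_T as a list indexed by degree d."""
    q = 1.0 - p
    F = [0.0, 2.0]  # assumed start at t = 2: two nodes of degree 1
    for t in range(2, T):
        D = len(F)
        nxt = []
        for d in range(D + 1):
            # Degree-d nodes that do not gain a link from the new node.
            stay = F[d] * (1 - p * d / t) if d < D else 0.0
            # Degree-(d-1) nodes that gain a link from the new node.
            grow = F[d - 1] * p * (d - 1) / t if d >= 1 else 0.0
            # The new node itself keeps exactly d of the j links it inherited.
            born = sum(F[j] * comb(j, d) * p**d * q**(j - d)
                       for j in range(d, D)) / t
            nxt.append(stay + grow + born)
        F = nxt
    return F

early = expected_degree_counts(20, 0.5)
late = expected_degree_counts(60, 0.5)
frac_early = early[0] / sum(early)
frac_late = late[0] / sum(late)
```

Under this recurrence the expected singleton fraction never decreases from one iteration to the next, so `frac_late` exceeds `frac_early`; how fast it approaches 1 is best judged by running longer horizons.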

Page 20:

What to do with singletons

• Allow them to exist and duplicate (and let them dominate)
• Allow them to exist but not duplicate (they will have a fixed fraction, which does not agree with the yeast network)
• Do not allow them to exist:
  Either delete one as soon as it is created
  Or always have one default connection to one of the existing nodes, adding a fourth term to the difference equation:

$F_{t+1}(k) = F_t(k)\,(1 - pk/t) + F_t(k-1)\,p(k-1)/t + \frac{1}{t} \sum_{j > k-1} F_t(j)\, \binom{j}{k}\, p^{k} q^{j-k} + \big(F_t(k-1) - F_t(k)\big)\, \frac{1}{t^{2}} \sum_{j > 0} F_t(j)\, q^{j}$

which gives a power law with exponent $\alpha$ if $1 = p\alpha - p + p^{\alpha - 1}$
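Reading the condition as 1 = pα - p + p^(α-1), with α the power-law exponent (the symbol and the exact form are an interpretation of the garbled slide text), the exponent can be found numerically for a given p. Since α = 1 always satisfies the equation trivially, the sketch below searches for a root strictly above 1 by bisection; the bracket and tolerance are arbitrary choices.

```python
def power_law_exponent(p, lo=1.0 + 1e-6, hi=100.0, tol=1e-10):
    """Solve 1 = p*a - p + p**(a - 1) for the root a > 1 by bisection."""
    def g(a):
        return p * a - p + p ** (a - 1) - 1

    # g is convex with g(1) = 0, so a root above 1 needs g < 0 just right of 1.
    if g(lo) >= 0 or g(hi) <= 0:
        raise ValueError("no root above 1 inside the bracket for this p")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = power_law_exponent(0.5)
```

For p = 0.5 the equation is satisfied exactly by α = 2 (0.5·2 - 0.5 + 0.5¹ = 1), which the bisection recovers; for larger p the function raises because no root above 1 exists under this reading.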

Page 21:

Other properties: k-reachability

r_k(n): number of nodes that are at most k hops away from n.

r_1(n_1) = 5

r_2(n_1) = 9

r_3(n_1) = 10
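The quantity r_k(n) can be computed with a breadth-first search truncated at depth k. In this sketch the start node itself is not counted, which is an assumption about the definition; the function name and the toy path graph are illustrative only.

```python
from collections import deque

def k_reachability(adj, start, k):
    """Count nodes at most k hops away from start (start itself excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist) - 1

# Toy path graph 1 - 2 - 3 - 4 (illustration only).
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
r1 = k_reachability(path, 1, 1)
r2 = k_reachability(path, 1, 2)
```

Running this for every node and sorting the nodes by degree would give the kind of per-node k-reachability profile plotted on the following slides.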

Page 22:

k-reachability of individual nodes (nodes sorted by degree)

Page 23:

Average k-reachability of nodes with fixed initial degree

Average degree distribution of the neighbors is independent of the degree of the original node

However the variance is high

Page 24:

Sequence Homology

Two proteins are sequence-wise homologous if their pairwise cDNA alignment yields a similarity of 50% or above.

Dual-phase distribution of the total number of protein pairs as a function of percentage similarity

cDNA sequence source: ftp://genome-ftp.stanford.edu/pub/yeast

Page 25:

Sequence homology vs interactions

If p_1 and p_2 are similar and p_2 and p_3 interact, then p_1 and p_3 interact with probability 0.03.

If p_1 and p_2 are similar and p_2 and p_3 are similar, then p_1 and p_3 are similar with probability 0.64.

If two genes physically interact with each other, it is very likely that they are not similar (excluding self interactions).

Page 26:

Sequence Homology Enhanced Duplication Model

Given duplicate node i:

• Each interaction edge (i,j) is deleted with probability q.
  For each similarity edge (j,k), with 0.03 probability, the interaction edge (i,k) is deleted.

• Each similarity edge (i,j) is deleted with probability q'.
  For each similarity edge (j,k), with 0.64 probability, the similarity edge (i,k) is deleted.
  For each interaction edge (j,k), with 0.03 probability, the interaction edge (i,k) is deleted.

• For each j, a new interaction edge (i,j) is added with probability r/t.
  For each similarity edge (j,k), with 0.03 probability, the interaction edge (i,k) is added.

• A new similarity edge (i,j) is added with probability r'/t.
  For each similarity edge (j,k), with 0.64 probability, the similarity edge (i,k) is added.
  For each interaction edge (j,k), with 0.03 probability, the interaction edge (i,k) is added.

Page 27:

Degree Distribution of the Enhanced Model

Page 28:

k-reachability of individual nodes (nodes sorted by degree)