Origins and impact of constraints in evolution of gene families. Boris E. Shakhnovich and Eugene V. Koonin. Genome Research 2006, October 19. Stella Veretnik, Journal Club, November 14, 2006


Origins and impact of constraints in evolution of gene families

Boris E. Shakhnovich and Eugene V. Koonin

Genome Research 2006, October 19

Stella Veretnik

Journal Club

November 14, 2006

evolution through paralogy

paralogous families with essential genes: E-families

paralogous families without essential genes: N-families

tolerance to mutations -> extent of evolution within the family
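The E-family / N-family distinction above can be written as a one-line rule. This is a minimal sketch, assuming a family is given as a list of gene identifiers and essentiality as a set; the function name and the gene names are hypothetical.

```python
def classify_family(members, essential_genes):
    """Label a paralogous family.

    'E' if at least one member is an essential gene, otherwise 'N'.
    """
    return "E" if any(gene in essential_genes for gene in members) else "N"


# Hypothetical gene names, for illustration only.
essential = {"geneA"}
print(classify_family(["geneA", "geneB"], essential))  # E
print(classify_family(["geneC", "geneD"], essential))  # N
```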

Definition of essential genes: genes that, when mutated, can result in a lethal phenotype.

Essential genes and their families:

diverge more slowly than non-essential genes

diverge to a greater extent than non-essential genes

Why does this happen? What parameters are responsible? - unanswered

Type of selection acting on evolving genes: purifying selection.

What is purifying selection?

The ratio Ka/Ks < 1

Ka is the number of nonsynonymous substitutions per nonsynonymous site

Ks is the number of synonymous substitutions per synonymous site
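The Ka/Ks criterion can be illustrated with a few lines of code. This is a rough sketch, assuming raw substitution and site counts are already available; real estimators also correct for multiple substitutions at the same site, which is skipped here. The function names and the counts are hypothetical.

```python
def ka_ks_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return Ka/Ks from raw counts (no multiple-hit correction)."""
    ka = nonsyn_subs / nonsyn_sites  # nonsynonymous substitutions per site
    ks = syn_subs / syn_sites        # synonymous substitutions per site
    if ks == 0:
        raise ValueError("Ks is zero, so Ka/Ks is undefined")
    return ka / ks


def selection_regime(ratio, tol=1e-9):
    """Interpret a Ka/Ks ratio."""
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


# Hypothetical counts for one pair of orthologs.
ratio = ka_ks_ratio(nonsyn_subs=6, nonsyn_sites=300, syn_subs=10, syn_sites=100)
print(ratio, selection_regime(ratio))
```

A lower ratio means that a larger share of amino-acid-changing mutations has been removed, that is, stronger purifying selection.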

Figure: fraction of essential genes that are not singletons (9.2%, 18.4%, 3.5%)

Most essential genes do not have paralogs - why?

Is there something special about those which do have paralogs?

Figure: ratio of non-essential to essential genes in E-families (1.91.313.7)

No answer in this paper…

How can a gene have paralogs and still be essential? - All the paralogs together cannot replace all the functions of the essential gene. Once they can, the gene becomes non-essential.

Significantly fewer edges between paralogs in E-families

Edges represent homology relationships

Divergence and diffusion graph.

How were the families assembled?

Construction of paralogous families.

1. Do an all-vs.-all BLAST comparison of the sequences of all translated ORFs within the organism

2. Measure amino acid identity level between nodes

Each ORF is a node on a graph.

3. Translate amino acids to nucleotides and calculate Ks (synonymous substitutions per site) and Ka (nonsynonymous substitutions per site)

The result is 3 weighted graphs (as defined by 1, 2, and 3). Paralogous families are the strongly connected components of the graph.

Cutoffs of Ks = 5 and E-value 1e-15 are used in this work. In general there is a near-linear dependency of the cutoff on Ks.
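The construction steps above can be sketched in code. This is an illustration only, not the authors' pipeline: it assumes the BLAST hits have already been reduced to (ORF, ORF, E-value, Ks) tuples, treats homology edges as undirected (so strongly connected components reduce to ordinary connected components), and assumes an edge is kept when the E-value and Ks are both at or below their cutoffs. All names and the toy hits are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-15  # cutoff quoted on the slide
KS_CUTOFF = 5.0         # cutoff quoted on the slide


def build_families(orfs, hits):
    """Group ORFs into families via connected components.

    orfs: list of ORF identifiers (the nodes).
    hits: iterable of (orf_a, orf_b, e_value, ks) tuples.
    """
    adjacency = defaultdict(set)
    for a, b, e_value, ks in hits:
        # Assumption: keep an edge only if it passes both cutoffs.
        if a != b and e_value <= E_VALUE_CUTOFF and ks <= KS_CUTOFF:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    families = []
    for orf in orfs:
        if orf in seen:
            continue
        # Depth-first traversal collects one component.
        component = set()
        stack = [orf]
        seen.add(orf)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        families.append(component)
    return families


# Toy input: the c-d hit fails the E-value cutoff, so d stays a singleton.
orfs = ["a", "b", "c", "d"]
hits = [("a", "b", 1e-30, 1.2), ("b", "c", 1e-20, 4.0), ("c", "d", 1e-5, 0.5)]
for family in build_families(orfs, hits):
    print(sorted(family))
```

Components of size one correspond to singletons, meaning genes without paralogs.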

Largest families

What is the typical size of an E-family and of an N-family?

Are N-families typically larger? Are there more N-families than E-families? Both?

How paralogous families evolve:

After duplication and divergence the following may happen:

a. Nonfunctionalization: a duplicate turns into a pseudogene

b. Subfunctionalization: multiple functions of the ancestral gene are divided between the paralogs

c. Neofunctionalization: one of the paralogs evolves a new function, the other keeps the old function(s)

A more typical scenario for N-families

More common for E-families

Do non-essential members always evolve from essential members of the family?

Can a duplicate of a non-essential paralog become essential?

Purifying selection is about 2 times stronger in E-families: the Ka/Ks ratio is lower in E-families

How this is done:

1. For single feature polymorphism (SFP): check within Saccharomyces cerevisiae

2. For the Ka/Ks ratio: compare orthologs between closely related species (S. cerevisiae/S. paradoxus for yeast; E. coli K12/CFT073 orthologs)

Implication: N-families diverge faster…

The rate of conversion to pseudogenes is substantially higher in N-families

6.8 fold difference

Paralogs are fixed more often in N-families (does this explain the larger size of N-families?)

An equal rate of duplication in E-families and in N-families is assumed.

What happens to the paralogs that do not go to fixation?

Do they become pseudogenes, or something else?

Ks is higher in E-families than in N-families

Implication: paralogs in E-families stick around for a longer time than in N-families (3 times longer)

Sequence divergence is higher in E-families

nonsynonymous substitutions among paralogs within the family

sequence identity among paralogs within the family

It is possible to identify E- and N-families using only sequence divergence information.

Figure: ROC plot (axis labels: true negatives, true positives)
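To make the ROC idea concrete, here is a small sketch of how a single divergence score per family could be turned into ROC points and an area under the curve. It assumes a higher score means more divergence and therefore "more E-like", and it uses the standard false-positive-rate versus true-positive-rate axes, which may differ from the axes in the paper's figure. The scores and function names are hypothetical, not data from the paper.

```python
def roc_points(e_scores, n_scores):
    """Return (false positive rate, true positive rate) per threshold.

    A family is predicted to be an E-family if its score >= threshold.
    """
    thresholds = sorted(set(e_scores) | set(n_scores), reverse=True)
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in e_scores) / len(e_scores)
        fpr = sum(s >= t for s in n_scores) / len(n_scores)
        points.append((fpr, tpr))
    return points


def roc_auc(e_scores, n_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random E-family scores higher
    than a random N-family (ties count as one half).
    """
    wins = 0.0
    for e in e_scores:
        for n in n_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(e_scores) * len(n_scores))


# Hypothetical divergence scores (for example, 1 - mean sequence identity).
e_family_scores = [0.8, 0.6, 0.7]
n_family_scores = [0.3, 0.65, 0.2]
print(roc_points(e_family_scores, n_family_scores))
print(roc_auc(e_family_scores, n_family_scores))
```

An area of 0.5 would mean divergence carries no information about family type; values close to 1 mean the two types separate well.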

The clustering coefficient measures how well connected the neighbors of a given node in a graph are.
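As a worked illustration, the local clustering coefficient of a node with k neighbors is the number of edges among those neighbors divided by the maximum possible, k(k-1)/2. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the toy graph and names are hypothetical.

```python
def clustering_coefficient(adjacency, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adjacency: dict mapping each node to the set of its neighbors.
    Returns 0.0 for nodes with fewer than two neighbors.
    """
    neighbors = list(adjacency[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = 0
    # Count edges that connect two neighbors of `node`.
    for i in range(k):
        for j in range(i + 1, k):
            if neighbors[j] in adjacency[neighbors[i]]:
                links += 1
    return 2.0 * links / (k * (k - 1))


# Toy family graph: a is linked to b, c, d; only b and c are linked to each other.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(graph, "a"))  # one of three possible neighbor pairs is linked
print(clustering_coefficient(graph, "b"))  # 1.0
```

In this picture, fewer edges between paralogs in E-families would show up as lower clustering coefficients.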

Transcriptional regulation of paralogs changes more in E-families: paralogs rarely share transcription factors

ChIP-chip experiments

Summary:

Two types of paralogous families exist: E-families and N-families

The two types of families have dramatically different dynamics of molecular evolution:

E-families diverge slowly but persist for long periods of time, thus diverging further than the paralogs in N-families

N-families undergo a more dynamic evolution: many duplicates are fixed, many others become pseudogenes. The level of sequence divergence is significantly lower.

Duplicates in E-families typically assume part of the functions of the original gene and/or evolve a new function.

This is less so with duplicates in N-families (no data shown for this…)

My musings:

N-families gradually evolve from E-families, when the essential gene(s) in the family are no longer essential. This happens when enough duplicates exist to ensure that all functions of the original essential gene are covered.

In a minimalistic organism every gene would be an essential gene.

A gene becomes non-essential when its functions are assumed by another gene or split between several genes.

Every non-essential gene will go through the stage of being in an E-family in which there is one essential gene.

In this scenario, the E-families are the transitional link for essential genes on their way to becoming non-essential. (You could argue that a more robust organism has fewer essential genes…)

Essential genes (singleton)

Non-essential genes (N-families)

Transition to non-essentiality (E-families)

Different selection pressures in each category? – Yes.

But… how does the behavior of the family change once it crosses from E-family to N-family?

very careful creeping forward

careless evolution

careful evolution