Using blast to study gene evolution – an example. Introduction to bioinformatics, lesson 3b.

(Figure: NCBI diagram)

Orthologs

Homologous sequences are orthologous if they were separated by a speciation event:

If a gene exists in a species, and that species diverges into two species, then the copies of this gene in the resulting species are orthologous.

Orthologs

• Orthologs typically retain the same or a similar function over the course of evolution.

• Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

Orthologs

(Figure: an ancestral gene is separated by a speciation event into orthologous copies in two descendant species.)

Paralogs

Homologous sequences are paralogous if they were separated by a gene duplication event:

If a gene in an organism is duplicated, then the two copies are paralogous.

Paralogs

• Orthologs typically have the same or a similar function.

• This is not always true for paralogs: because one copy of the duplicated gene is released from the original selective pressure, that copy is free to mutate and acquire new functions.

Paralogs

(Figure: a gene duplication event producing two paralogous copies.)

Orthologs and Paralogs

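(Figure: a gene duplication followed by a speciation event into species a and species b. Copies separated by the duplication are paralogs; copies separated by the speciation are orthologs.)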

(Figure: NCBI diagram)

What is conservation?

Functionally or structurally important sites are conserved:

• Conserved sites: “slow”-evolving sites.

• Variable sites: “fast”-evolving sites.

Functionally or structurally important sites are subject to stronger evolutionary pressure, i.e. purifying selection.

Finding conservation regions from an alignment

S1  KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN
S2  MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG
S3  MPFERCELARTLKRMMDADIRGVSLANWVCLAKWFWDGG

From the MSA and the tree, one can determine how conserved a gene is.
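A minimal Python sketch of reading conservation off the alignment above: each column is scored by the fraction of sequences that share its most common residue, and fully conserved columns are starred. This majority-identity score is only an illustrative choice, not a method from the lesson; it ignores both the tree and amino acid similarity.

```python
from collections import Counter

# The three aligned sequences from the slide (39 residues each, no gaps).
ALIGNMENT = {
    "S1": "KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN",
    "S2": "MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG",
    "S3": "MPFERCELARTLKRMMDADIRGVSLANWVCLAKWFWDGG",
}

def column_conservation(seqs):
    """For each alignment column, return the fraction of sequences carrying
    the most common residue in that column (1.0 = fully conserved)."""
    seqs = list(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [s[i] for s in seqs]
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(seqs))
    return scores

scores = column_conservation(ALIGNMENT.values())
# Star fully conserved columns, the way many MSA viewers do.
for name, seq in ALIGNMENT.items():
    print(f"{name}  {seq}")
print("    " + "".join("*" if s == 1.0 else " " for s in scores))
print(f"{sum(s == 1.0 for s in scores)} of {len(scores)} columns fully conserved")
```

The fully conserved columns fall into two blocks, CELART and GVSLANWVCLAKW, which are the kind of slow-evolving regions described on the previous slide.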

Mol. Biol. Evol. (2005) 22:598-606

Protocol

Step 1 - BLAST

Search for Human-mouse orthologous protein pairs

Step 1 - BLAST

• The orthologs are defined as pairs of reciprocal BLAST hits.

• Eliminate genes with more than one potential orthologous sequence.

• Select only genes whose human protein was functionally annotated.
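The reciprocal-hit rule can be sketched as follows, assuming the BLAST output has already been parsed into a table of (subject, bit score) hits per query in each direction. The gene identifiers and scores are hypothetical; treating a tie for the top hit as "more than one potential orthologous sequence" is only one possible reading of the second bullet, and the annotation filter is left out.

```python
def best_hits(hits):
    """hits: {query: [(subject, bit_score), ...]}.
    Return {query: subject} for queries with a single top-scoring subject.
    Queries whose top score is shared by several subjects are dropped
    (assumed reading of 'more than one potential ortholog')."""
    best = {}
    for query, subject_scores in hits.items():
        if not subject_scores:
            continue
        top = max(score for _, score in subject_scores)
        top_subjects = {s for s, score in subject_scores if score == top}
        if len(top_subjects) == 1:
            best[query] = top_subjects.pop()
    return best

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_gene, b_gene) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical parsed BLAST results (higher bit score = better hit).
human_vs_mouse = {
    "hsA": [("mmA", 950.0), ("mmB", 120.0)],
    "hsB": [("mmB", 400.0)],
    "hsC": [("mmC1", 300.0), ("mmC2", 300.0)],  # tie: two candidates, dropped
}
mouse_vs_human = {
    "mmA": [("hsA", 948.0)],
    "mmB": [("hsA", 410.0), ("hsB", 398.0)],  # best hit is hsA, not hsB
    "mmC1": [("hsC", 299.0)],
    "mmC2": [("hsC", 301.0)],
}
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
```

Only hsA and mmA pick each other as best hits, so they form the single orthologous pair; hsB is rejected because its best mouse hit, mmB, points back to a different human gene.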

Step 2 – Evolutionary Rates

For each orthologous pair:

• Align the two proteins at the amino acid level.

• Measure their conservation.

The data set contained 6,776 human-mouse gene pairs.
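The slide does not say which conservation measure was used. As one simple, assumed proxy, the sketch below computes the p-distance (the proportion of differing residues at ungapped aligned positions) and its Poisson correction d = -ln(1 - p), which estimates replacements per site while allowing for multiple hits at the same site. S1 and S2 are taken from the alignment shown earlier in the lesson.

```python
import math

def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing residues among aligned, ungapped positions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared

def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p): estimated replacements
    per site, correcting for multiple replacements at the same site."""
    if p >= 1.0:
        return float("inf")
    return -math.log(1.0 - p)

S1 = "KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN"
S2 = "MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG"
p = p_distance(S1, S2)
print(f"p = {p:.3f}, Poisson-corrected d = {poisson_distance(p):.3f}")
```

A lower distance means a more conserved pair; ranking the orthologous pairs by such a distance is one way to obtain the "evolutionary rate" used in the next step.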

Step 3 – Assignment of Temporal Categories

Use BLAST to find homologous genes in six different eukaryotic genomes.

Caenorhabditis elegans

Schizosaccharomyces pombe

Takifugu rubripes

Drosophila melanogaster

Arabidopsis thaliana

Saccharomyces cerevisiae

What is Old?

• Presence of a homolog in all six genomes.

What counts as presence?

• A BLAST hit passing an e-value cutoff of 10^-4.

(Figure: the age categories OLD, METAZOANS, DEUTEROSTOMES and TETRAPODS, assigned according to the homologs detected in Caenorhabditis elegans, Drosophila melanogaster and Takifugu rubripes.)
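Step 3 can be sketched as a small classifier over best-hit e-values. Only the OLD rule (a homolog in all six genomes, where presence means a hit passing the e-value cutoff) comes from the slides; the order in which the other categories are tested is an assumption read off the figure labels, and the e-values in the example are hypothetical.

```python
# The six genomes searched in Step 3.
GENOMES = (
    "Saccharomyces cerevisiae",
    "Schizosaccharomyces pombe",
    "Arabidopsis thaliana",
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Takifugu rubripes",
)
INVERTEBRATES = {"Caenorhabditis elegans", "Drosophila melanogaster"}
FISH = "Takifugu rubripes"

def has_homolog(evalue, cutoff):
    """'Present' means BLAST reported a hit at or below the e-value cutoff."""
    return evalue is not None and evalue <= cutoff

def age_category(best_evalues, cutoff=1e-4):
    """best_evalues maps genome name -> e-value of the best hit (None if no hit).
    OLD follows the slide; the METAZOANS/DEUTEROSTOMES/TETRAPODS order is an
    assumption based on the figure labels."""
    found = {g for g in GENOMES if has_homolog(best_evalues.get(g), cutoff)}
    if len(found) == len(GENOMES):
        return "OLD"
    if found & INVERTEBRATES:
        return "METAZOANS"
    if FISH in found:
        return "DEUTEROSTOMES"
    return "TETRAPODS"

# Hypothetical best-hit e-values for one human protein.
example = {
    "Saccharomyces cerevisiae": 1e-5,
    "Schizosaccharomyces pombe": 1e-6,
    "Arabidopsis thaliana": 1e-8,
    "Caenorhabditis elegans": 1e-20,
    "Drosophila melanogaster": 1e-30,
    "Takifugu rubripes": 1e-90,
}
print(age_category(example))                # default cutoff of 1e-4
print(age_category(example, cutoff=1e-10))  # stricter cutoff used as a control
```

In this made-up example the gene drops from OLD to METAZOANS under the stricter 10^-10 cutoff. Individual genes can move like this even though, according to the control slide, the overall result was not significantly affected.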

• METAZOANS - Animals whose bodies consist of many cells, as distinct from the unicellular Protozoa; this includes all organisms commonly recognized as animals.

• DEUTEROSTOMES - The second of the two main groups of bilaterally symmetrical animals. The name derives from 'deutero' (second) and 'stome' (mouth), referring to the origin of the definitive mouth as an opening independent of the embryo's blastopore.

• TETRAPODS - Four-limbed vertebrates and their descendants, including mammals, birds, reptiles and amphibians.

Results

Negative correlation between the “age” of genes and their rate of evolution: older genes are more conserved.

(Figure: conservation plotted for each age category.)

Control: changing the sensitivity of the BLAST detection to a more conservative e-value cutoff of 10^-10 did not significantly affect the result.

Explanations

Functional constraint remained constant throughout the evolutionary history of each gene, but newer genes are less constrained than older genes.

Functional constraints are not constant; rather, they are weak at the time of origin of a gene and become progressively more stringent with age.

Eran Elhaik, Niv Sabath, and Dan Graur

Mol. Biol. Evol. 23(1):1–3. 2006

Goal

• To show that these results are an artifact caused by our inability to detect similarity when genetic distances are large.

Simulation

The evolutionary process

(Figure, built up over several slides: a phylogenetic tree of rat, mouse, cat, dog and fly, next to a table of replacement probabilities between amino acids such as Ala, Arg and Val. Starting from the residue V, one position is evolved down the tree branch by branch according to the replacement probabilities, ending with L (rat), L (mouse), I (cat), M (dog) and V (fly) at the tips.)

The evolutionary process

Rat    L M T G S H M G N F I I
Mouse  L M T G S G M A N H V I
Cat    I M T G S H I G Y A M F
Dog    M M T G S G I G L T R A
Fly    V M T G S W R G R M Y A
...

Repeat the process for all positions (assuming each position evolves independently).
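The simulation procedure can be sketched in Python. This is a toy under stated assumptions: the tree topology and branch lengths are hypothetical, and instead of a full replacement-probability matrix, each branch replaces the residue with probability 1 - exp(-rate * length) by a uniformly chosen different amino acid. As on the slide, every position evolves independently.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical tree as (name, branch_length, children); leaves have no children.
TREE = ("root", 0.0, [
    ("mammals", 0.3, [
        ("rodents", 0.2, [("Rat", 0.1, []), ("Mouse", 0.1, [])]),
        ("carnivores", 0.2, [("Cat", 0.1, []), ("Dog", 0.1, [])]),
    ]),
    ("Fly", 1.5, []),
])

def evolve_site(residue, rate, length, rng):
    """Evolve one residue along one branch. Toy model: with probability
    1 - exp(-rate * length) it is replaced by a different, uniformly chosen residue."""
    if rng.random() < 1.0 - math.exp(-rate * length):
        return rng.choice([aa for aa in AMINO_ACIDS if aa != residue])
    return residue

def evolve_tree(node, residue, rate, rng, leaves):
    """Pass `residue` (the state at `node`) down to every child branch."""
    name, _, children = node
    if not children:
        leaves[name] = residue
        return
    for child in children:
        child_residue = evolve_site(residue, rate, child[1], rng)
        evolve_tree(child, child_residue, rate, rng, leaves)

def simulate_alignment(length, rate, seed=0):
    """Simulate `length` independent positions; return {species: sequence}."""
    rng = random.Random(seed)
    columns = []
    for _ in range(length):  # each position evolves independently
        leaves = {}
        evolve_tree(TREE, rng.choice(AMINO_ACIDS), rate, rng, leaves)
        columns.append(leaves)
    return {name: "".join(col[name] for col in columns) for name in columns[0]}

for name, seq in simulate_alignment(12, rate=1.0, seed=1).items():
    print(f"{name:<6}{seq}")
```

Raising `rate` makes the tip sequences diverge faster; with `rate=0.0` all five sequences come out identical.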

Generate terminal sequences with the following phylogenetic relationships:

(Figure: a tree with terminal sequences A, B, C, D and E.)

• All the genes originated in the common ancestor of A, B, C, D and E and are, thus, of equal age.

• A and B are similar to the human and mouse orthologous genes.

• C, D and E are remote homologous genes from increasingly more distant taxa.

Simulation

• They simulated genes with 101 different rates.

• A higher rate means a higher probability of an amino acid replacement on each branch.
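The link between rate and replacement probability can be made explicit under an assumed model in which replacements occur as a Poisson process with rate r along a branch of length t (the slides do not state the exact model used in the simulation):

```latex
\begin{aligned}
P(\text{no replacement on the branch}) &= e^{-rt} \\
P(\text{at least one replacement}) &= 1 - e^{-rt} \\
\frac{\partial}{\partial r}\bigl(1 - e^{-rt}\bigr) &= t\,e^{-rt} > 0 \quad (t > 0)
\end{aligned}
```

So, for any branch of positive length, raising r raises the chance of a replacement on that branch.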

Simulation

Use BLAST, in the same way that Alba and Castresana used it, to detect homology between gene A and genes C, D and E.

Only one difference: the group names.

OLD → SENIORS

METAZOANS → ADULTS

DEUTEROSTOMES → TEENAGERS

TETRAPODS → TODDLERS

Results

Same as Alba and Castresana

But all the simulated genes are at the same “age”.

What is the problem???

We can only count genes that are identified as homologous by the protocol.

Alba and Castresana may, thus, have failed to spot the vast majority of homologs among the fastest-evolving genes.

The vast majority of the fastest evolving genes are undetectable even when the cutoffs are extremely permissive.

Conclusion

The inverse relationship between evolutionary rate and gene age is an artifact caused by our inability to detect similarity when genetic distances are large.

• Since genetic distance increases with both the time of divergence and the rate of evolution, it is difficult to identify homologs of fast-evolving genes in distantly related taxa.

• Thus, fast evolving genes may be misclassified as “new”.
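The argument can be reproduced with a deterministic toy, sketched below under loudly labeled assumptions: BLAST is replaced by a hypothetical rule that a homolog counts as detected when its expected identity to gene A is at least 25%, expected identity follows an equal-rates 20-state (Jukes-Cantor-like) model, and the distances from A to C, D and E are made up. All 101 genes have the same true age, yet the faster ones land in the younger groups.

```python
import math

# Hypothetical path lengths (time units) separating gene A from C, D and E.
DISTANCE_TO = {"C": 1.0, "D": 2.0, "E": 3.0}
# Hypothetical stand-in for a BLAST e-value cutoff.
IDENTITY_THRESHOLD = 0.25

def expected_identity(rate, time, states=20):
    """Expected fraction of identical sites after rate * time replacements per
    site under an equal-rates model with `states` amino acids."""
    d = rate * time
    return 1 / states + (1 - 1 / states) * math.exp(-states * d / (states - 1))

def age_class(rate):
    """Assign a group by the most distant homolog that is still 'detected'."""
    detected = {g for g, t in DISTANCE_TO.items()
                if expected_identity(rate, t) >= IDENTITY_THRESHOLD}
    if "E" in detected:
        return "SENIORS"
    if "D" in detected:
        return "ADULTS"
    if "C" in detected:
        return "TEENAGERS"
    return "TODDLERS"

# 101 rates, as in the simulation; every gene has the same true age.
rates = [i / 50 for i in range(101)]
by_class = {}
for r in rates:
    by_class.setdefault(age_class(r), []).append(r)

for label in ["SENIORS", "ADULTS", "TEENAGERS", "TODDLERS"]:
    members = by_class.get(label, [])
    mean = sum(members) / len(members) if members else float("nan")
    print(f"{label:<10} n={len(members):3d} mean rate={mean:.2f}")
```

With these settings the mean rate rises steadily from SENIORS to TODDLERS, producing an apparent inverse relationship between rate and age even though no gene is younger than any other.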

So, the only conclusion that can be drawn from Alba and Castresana’s study is that slowly evolving genes evolve slowly

!!!