Alexis Dereeper
Homology analysis and molecular phylogeny
CIBA courses – Brasil 2011
4 steps for a phylogenetic analysis
1- Data selection
2- Sequence alignment
3- Method selection, then calculate or estimate the tree that best fits the data
• Distance methods: calculate distance
• Parsimony
• Probabilistic methods (maximum likelihood, Bayesian): model? optimization
4- Test the reliability of the obtained tree
Phylogeny.fr
“The Phylogeny.fr platform transparently chains programs to automatically perform phylogenetic analysis tasks”
Homology analysis
What is sequence homology?
• Not a quantitative concept (unlike similarity or identity, e.g. 28% identity): genes are either homologous or not
• Homologs: genes descending from a common ancestor
• Paralogs: homologs arising from a duplication event
• Orthologs: homologs arising from a speciation event
• Homology and function: homology does not systematically imply the same function. Close orthologs may have the same function, but more distant orthologs rarely show the same phenotypic role (although they may keep the same role in a specific metabolic pathway). Paralogs, on the other hand, rapidly acquire different functions.
How are homologous sequences similar?
• From 100% identity to a few nt/aa in common
• No rule, no limit. The estimation is based on the probability that 2 sequences are similar by chance (e-value):
DNA: e-value < 10^-6 and identity > 70%
Protein: e-value < 10^-3 and identity > 25%
• Sequences without noticeable resemblance can be homologous (similarity found at the 3D structure level).
• Conversely, a strong resemblance is generally interpreted as homology rather than as convergent evolution
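The thresholds above can be applied as a simple filter on Blast hits. This is a minimal sketch, not a rule: the function name, the hit tuples and their values are hypothetical, and only the cut-offs come from the slide.

```python
# Thresholds quoted on the slide; they are rules of thumb, not hard limits.
THRESHOLDS = {
    "dna": {"max_evalue": 1e-6, "min_identity": 70.0},
    "protein": {"max_evalue": 1e-3, "min_identity": 25.0},
}


def is_probable_homolog(evalue, identity_pct, seq_type):
    """Return True when a hit passes both thresholds for its sequence type."""
    limits = THRESHOLDS[seq_type]
    return evalue < limits["max_evalue"] and identity_pct > limits["min_identity"]


# Hypothetical hits: (subject name, e-value, percent identity)
hits = [("hitA", 1e-30, 85.0), ("hitB", 1e-4, 40.0), ("hitC", 1e-10, 22.0)]
kept = [name for name, evalue, identity in hits
        if is_probable_homolog(evalue, identity, "protein")]
print(kept)
```

Hits rejected by such a filter can still be homologous, as noted above for sequences whose similarity is only visible at the 3D structure level.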
How to detect homology?
By sequence comparison = sequence alignment
1- Local alignment (ex: Blast)
Designed to search for similar regions
Alignment of a particular sequence against a bank of sequences
(Smith & Waterman)
2- Global alignment (ex: ClustalW)
Designed to compare homologous sequences over their full length
(Needleman & Wunsch)
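The difference between the two alignment modes can be sketched with one dynamic-programming routine. This is an illustrative toy with an assumed scoring scheme (match +1, mismatch -1, linear gap -2); it returns only the best score, and it is not how Blast or ClustalW are implemented.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best pairwise alignment score by dynamic programming.

    local=False: global alignment (Needleman & Wunsch), end to end.
    local=True:  local alignment (Smith & Waterman), best pair of sub-regions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # In global mode, leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # In local mode, an alignment may restart anywhere (floor at 0).
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))                      # global mode
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))  # local mode
```

The local mode finds the shared "ACG" region even though the flanking regions do not match, which is why it suits database searches.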
Classical Blast output
Different Blast programs:
● BlastN (Query: DNA / Subject: DNA)
● BlastP (Query: protein / Subject: protein)
● BlastX (Query: DNA / Subject: protein)
● TBlastN (Query: protein / Subject: DNA)
● TBlastX (Query: translated DNA / Subject: translated DNA)
Score
E-value = indicates how reliable the score is (the lower the e-value, the less likely the hit is due to chance)
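The choice of program listed above depends only on the type of the query and of the subject bank. A small sketch of that lookup follows; the function and its `translate_both` parameter are hypothetical helpers, not part of any Blast distribution.

```python
def choose_blast_program(query_type, subject_type, translate_both=False):
    """Pick a Blast program name from the query and subject sequence types.

    Types are "dna" or "protein". For DNA against DNA, translate_both=True
    selects the comparison of translated sequences (TBlastX).
    """
    if query_type == "dna" and subject_type == "dna":
        return "tblastx" if translate_both else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("dna", "protein"): "blastx",
        ("protein", "dna"): "tblastn",
    }
    return table[(query_type, subject_type)]


print(choose_blast_program("dna", "protein"))
```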
Blast Explorer
• Enables an assisted selection of homologous sequences using various criteria
• Post-processing of Blast results:
Guide tree (similarity tree), with possible selection on branches and leaves
Score / e-value distribution
Taxonomic tree of hits
BBMH method (Best Blast Mutual Hits) or RBH (Reciprocal Best Hit)
Ortholog databases/banks:
● Inparanoid (eukaryotes)
● HomoloGene (eukaryotes)
● OrthoMCL DB
● COG (Clusters of Orthologous Groups of proteins) (prokaryotes and eukaryotes)
● GreenPhyl (plants)
[Figure: proteome of species 1 / proteome of species 2]
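The reciprocal best hit idea can be sketched as follows: a pair is kept only when each protein is the best hit of the other in the two opposite searches. The input layout (query, subject, score) and the example identifiers are assumptions made for illustration; real Blast output would first need to be parsed into this shape.

```python
def best_hits(hits):
    """Map each query to its best-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_1_vs_2, hits_2_vs_1):
    """Return sorted (protein of species 1, protein of species 2) pairs
    that are each other's best hit."""
    forward = best_hits(hits_1_vs_2)
    backward = best_hits(hits_2_vs_1)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical hits between two small proteomes.
sp1_vs_sp2 = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90)]
sp2_vs_sp1 = [("b1", "a1", 198), ("b2", "a1", 140), ("b2", "a2", 80)]
print(reciprocal_best_hits(sp1_vs_sp2, sp2_vs_sp1))
```

Here a2 points to b2, but b2 prefers a1, so only the a1/b1 pair is reported as a candidate ortholog pair.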
Phylogenetic analysis
Step 1 : Multiple alignment (global alignment)
• Alignment software: ClustalW, Muscle, T-Coffee, 3DCoffee (optimizes the alignment with 3D structure), Mafft
• Alignment formats: Fasta, Clustal, Phylip, Nexus
• Alignment visualization/editing software: SeaView, Jalview, BioEdit
[Figure: speed scale, from fast to slow]
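As an illustration of two of the alignment formats listed above, the sketch below reads an aligned Fasta text and writes a simple sequential Phylip-like layout. The function names are hypothetical, and the output is a simplified layout (names padded to 10 characters), not a full implementation of the Phylip specification.

```python
def parse_fasta(text):
    """Parse Fasta text into a dict {name: sequence}, keeping gap characters."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def to_phylip(records):
    """Write aligned sequences in a simple sequential Phylip-like layout."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (same length)")
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records.items():
        # Names are truncated and padded to 10 characters.
        lines.append(f"{name[:10]:<10} {seq}")
    return "\n".join(lines)


records = parse_fasta(">s1\nAC-GT\n>s2\nACGGT\n")
print(to_phylip(records))
```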
Step 2 : Alignment cleaning
• Removal of divergent regions showing a low phylogenetic signal (not very informative). These regions may not be homologous, or may have been saturated by substitutions (ex: synonymous sites in coding regions)
=> The cleaned alignment is more suitable for a phylogenetic analysis
• Alignment curation software: Gblocks
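To give an idea of what alignment cleaning does, here is a deliberately simplified stand-in that drops columns containing too many gaps. It is not the Gblocks algorithm, which uses its own conservation and block-length criteria; the function name and the 0.5 threshold are assumptions.

```python
def remove_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds the threshold.

    alignment: dict {name: aligned sequence}, all sequences of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


aln = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACTGT"}
cleaned = remove_gappy_columns(aln)
print(cleaned)
```

In this toy example the third column (two gaps out of three sequences) is removed, while the second column (one gap out of three) is kept.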
Step 3 : Phylogenetic reconstruction
Step 3a: Choose a method for phylogenetic reconstruction
• 4 main methods/algorithms:
Pairwise distance methods (UPGMA, Neighbor Joining)
o FastDist, BIONJ, Neighbor
Maximum parsimony
o DNAPars, TNT
Maximum likelihood
o PhyML, PAML
Bayesian inference
o MrBayes, Beast
• Output format : distance matrix, Newick format
Choose the right compromise between speed and performance
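Distance methods start from a matrix of pairwise distances computed on the alignment. The sketch below computes the simplest one, the uncorrected p-distance (proportion of differing sites, ignoring gapped positions). Function names and the example alignment are illustrative; programs such as FastDist or BIONJ apply their own models and algorithms.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return (names, matrix) with matrix[i][j] the p-distance between
    sequences i and j."""
    names = list(alignment)
    matrix = [[p_distance(alignment[r], alignment[c]) for c in names] for r in names]
    return names, matrix


aln = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTTCGA"}
names, matrix = distance_matrix(aln)
for name, row in zip(names, matrix):
    print(name, row)
```

A clustering algorithm such as UPGMA or Neighbor Joining would then take this matrix as input and return a tree, typically in Newick format.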
Step 3b: Choose parameters and evolution models
• Different evolution models describing the substitution rates for aa or nt:
DNA
o Jukes-Cantor, Kimura, F81, HKY85, GTR
Protein
o JTT, WAG, Dayhoff
• Model-testing software: tests and selects the substitution model (and parameters) best adapted to the dataset (the one giving the maximum likelihood)
ProtTest, ModelTest (based on PhyML)
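To show what an evolution model changes in practice, the sketch below applies the two simplest DNA models named above as distance corrections: Jukes-Cantor, d = -3/4 ln(1 - 4p/3), and Kimura 2-parameter, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where p is the proportion of differing sites, P the proportion of transitions and Q the proportion of transversions. The function names are illustrative.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(transitions, transversions):
    """Kimura 2-parameter distance from the proportions of transitions (P)
    and transversions (Q)."""
    a = 1 - 2 * transitions - transversions
    b = 1 - 2 * transversions
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the correction")
    return -0.5 * math.log(a * math.sqrt(b))


print(jukes_cantor(0.3))
print(kimura_2p(0.2, 0.1))
```

Both corrected distances are larger than the observed proportion of differences (0.3 in both calls), because the models account for multiple substitutions at the same site.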
Step 3c: Estimate the branch robustness
• Bootstrap procedure
1- Re-sampling of the alignment by columns: create a pseudo-alignment by drawing sites at random (with replacement), then compute the tree again.
2- Repeat the process N times.
3- For each branch of the initial tree, count the number of bootstrap trees in which it is observed. The higher this number, the better supported the branch.
• aLRT test (approximate Likelihood Ratio Test) (Anisimova & Gascuel, Syst Biol, 2006)
Integrated in PhyML
Much faster (PhyML is launched only once)
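The bootstrap procedure can be sketched as below. Tree building is abstracted as a callable `build_clades` that returns the set of clades (groups of sequence names) of a tree; this callable and the toy `closest_pair_clade` used in the example are assumptions for illustration, standing in for a real program such as PhyML.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Pseudo-alignment: columns drawn at random with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def bootstrap_support(alignment, build_clades, replicates=100, seed=0):
    """Fraction of bootstrap trees in which each clade of the initial tree appears."""
    rng = random.Random(seed)
    reference = build_clades(alignment)
    counts = {clade: 0 for clade in reference}
    for _ in range(replicates):
        replicate = build_clades(bootstrap_alignment(alignment, rng))
        for clade in reference:
            if clade in replicate:
                counts[clade] += 1
    return {clade: counts[clade] / replicates for clade in reference}


def closest_pair_clade(alignment):
    """Toy stand-in for tree building: the clade made of the two most similar sequences."""
    names = sorted(alignment)
    pairs = [
        (sum(x != y for x, y in zip(alignment[a], alignment[b])), a, b)
        for i, a in enumerate(names) for b in names[i + 1:]
    ]
    _, a, b = min(pairs)
    return {frozenset((a, b))}


aln = {"s1": "ACGTACGTAC", "s2": "ACGTACGAAC", "s3": "ACCTTCGAAG"}
print(bootstrap_support(aln, closest_pair_clade, replicates=100, seed=1))
```

With a real tree builder, each replicate costs one full tree computation, which is why the aLRT test, needing a single run, is much faster.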
Step 4 : Visualization and editing of the phylogenetic tree
• Graphical tools available to display trees from Newick format: TreeDyn, DrawGram, DrawTree, ATV, NJPlot
• Graphical output formats : PNG, SVG, PDF…
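Since all these tools read the Newick format, a minimal reader helps to see how a tree is encoded: nested parentheses for clades, optional labels (leaf names or support values) and optional branch lengths after a colon. The parser below is a simplified sketch that assumes a well-formed string without quoted names, comments or whitespace; the dictionary layout is an arbitrary choice.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts
    {"name": str, "length": float or None, "children": list}."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # skip ":"
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def leaf_names(tree):
    """List the leaf names of a parsed tree, left to right."""
    if not tree["children"]:
        return [tree["name"]]
    return [name for child in tree["children"] for name in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2)0.95:0.3,C:0.4);")
print(leaf_names(tree))
```

In the example string, the label 0.95 attached to the (A, B) clade is the kind of branch support value produced by a bootstrap or aLRT analysis.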
Step 5 : Interpretation of the tree