BiDiBlast Tool Presentation
Uploaded by joao-feio-de-almeida
Transcript of BiDiBlast Tool Presentation
COMPARATIVE GENOMICS
- Tool Development -
Driving Cause
•Problem
•How many genes are absent?
Saccharomyces cerevisiae Saccharomyces kudriavzevii
•Synteny analysis?
•Homolog ORF detection to assess differences.
Coverage gap
S. kudriavzevii contigs
Homology
•Types of homology
•Origin versus time
•Orthology versus Paralogy versus Speciation
•A complex picture
•Many available detection strategies – none is perfect
1. Merkeev I., P. Novichkov, and A. Mironov. 2006. PHOG: a database of supergenomes built from proteome complements. BMC Evolutionary Biology 6:52.
out-paralogy
ohnology
in-paralogy
Homology
•Detecting Homolog Sequences
•Increasing levels of stringency approach
•Sequence similarity
•Similarity search
•Best reciprocal hit (BRH)
Bi-Directional BLAST
•Similar product function
•Similar domain architecture
Regular expressions (e.g. PROSITE patterns)
PSSM (e.g. NCBI CDD)
HMM (e.g. Pfam)
•Common (protein) family
•Similar syntenic neighbourhood
•No easy solution
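The best reciprocal hit criterion listed above can be illustrated with a minimal Java sketch. It assumes the first hit of every query has already been extracted from the BLAST output in both search directions. The class name ReciprocalHits, the method classify and the map layout are hypothetical and are not taken from the BiDiBlast source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReciprocalHits {

    /** Whether a first hit was confirmed by the reverse search. */
    public enum HitType { BI_DIRECTIONAL, UNI_DIRECTIONAL }

    /**
     * Classifies each forward first hit (genome A query -> genome B subject).
     * A hit is bi-directional when the subject's own first hit in the
     * reverse search (B -> A) points back to the original query.
     * Both maps are assumed inputs: sequence ID -> ID of its first hit.
     */
    public static Map<String, HitType> classify(Map<String, String> forwardBest,
                                                Map<String, String> reverseBest) {
        Map<String, HitType> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> hit : forwardBest.entrySet()) {
            String query = hit.getKey();
            String subject = hit.getValue();
            // reverseBest.get returns null when the subject had no hit at all
            boolean reciprocal = query.equals(reverseBest.get(subject));
            result.put(query, reciprocal ? HitType.BI_DIRECTIONAL : HitType.UNI_DIRECTIONAL);
        }
        return result;
    }
}
```

With forward hits A1 -> B1 and A2 -> B1 and a reverse hit B1 -> A1, only A1 is classified as bi-directional; A2 stays uni-directional.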
Homology
•Detecting Homolog Sequences
•Bi-Directional BLAST
•No Windows tool/server readily available
•Adapt existing Perl script
Refactor from UNIX to Windows environment
Lack of experience => effort needed?
•Migrate to UNIX environment
Same problems
•Develop simple Java app
Existing experience => smoother path
New useful tricks to learn
Interface command line applications
Multithreading = multitasking
In the end, unanticipated problems emerged
Coding problems
Insufficient library documentation
GUI development
Homology
•Detecting Homolog Sequences
•Bi-Directional BLAST
•Implementation as data pipeline
•Several thousand lines of code
•Collection of 15 Java classes – 3 packages
General routines - bidiblastsup
Data structures – bidiblastsup.objects
User interface – bidiblastsup.ui
•Uses 3 third-party libraries
BioJava 1.4 – mainly translation tasks
DB4O 5.0 – data management and …
NeoBio – scoring schemes including ambiguity codes
•Integrates 4 command line tools
NCBI BLAST (blastall -p blastn)
align0 (FASTA) – ORF alignment
stretcher (EMBOSS) – protein alignment
yn00 (PAML) – dN/dS calculation
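Interfacing a command line tool from Java, as the pipeline above does for its four external programs, can be sketched with ProcessBuilder. This is a minimal illustration, not the BiDiBlast code: the class ToolRunner and its methods are hypothetical. Only "blastall -p blastn" appears in the slides; the remaining flags (-d database, -i query file, -e expectation cutoff, -m 8 tabular output) are assumed from the legacy NCBI blastall command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ToolRunner {

    /** Builds a legacy blastall command line as a list of arguments. */
    public static List<String> blastallCommand(String database, String queryFile, String eValue) {
        List<String> cmd = new ArrayList<>();
        cmd.add("blastall");
        cmd.add("-p"); cmd.add("blastn");   // program, as named in the slides
        cmd.add("-d"); cmd.add(database);   // assumed: formatted BLAST database
        cmd.add("-i"); cmd.add(queryFile);  // assumed: FASTA query file
        cmd.add("-e"); cmd.add(eValue);     // assumed: expectation value cutoff
        cmd.add("-m"); cmd.add("8");        // assumed: tabular output
        return cmd;
    }

    /** Runs a command, returns its output lines, and fails on a non-zero exit code. */
    public static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so the pipe cannot stall
        Process process = builder.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode + ": " + command);
        }
        return lines;
    }
}
```

Passing the arguments as a list rather than one string avoids quoting problems with file paths that contain spaces, which matters on Windows.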
Homology
•Detecting Homolog Sequences
•Bi-Directional BLAST
•Implementation as data pipeline
•Swing graphical user interface (GUI)
Control over the program run
Parameter entry
BLAST database building
Result dumping
Homology
•Detecting Homolog Sequences
•Bi-Directional BLAST
•IS NOT an orthologous gene finding tool!
•Performs the BRH detection between pools of DNA sequences
Customised BLAST / TBLASTX parameters
Store indicator values about the results
•Stores every first BLAST hit
Bi-directional – putative orthologs
Uni-directional – putative paralogs
•Aligns the resulting hit sequences by careful global alignment
Measures the real length of the aligned regions
Proper sequence similarity
•Translates and aligns the ORF products
Global alignment using a given substitution matrix
A codon-wise global alignment of the ORFs as a by-product
Several statistics stored
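One common way to obtain a codon-wise alignment as a by-product of a protein alignment is to thread each ORF back onto its gapped protein sequence. The sketch below assumes that approach; the slides do not say how BiDiBlast derives it. The class CodonAlignment and the method threadCodons are hypothetical names.

```java
public class CodonAlignment {

    /**
     * Threads an ungapped ORF onto its gapped protein sequence:
     * every residue consumes the next three nucleotides and every
     * gap character '-' becomes a three-nucleotide gap.
     */
    public static String threadCodons(String alignedProtein, String orf) {
        StringBuilder aligned = new StringBuilder();
        int position = 0; // index of the next unused nucleotide in the ORF
        for (int i = 0; i < alignedProtein.length(); i++) {
            if (alignedProtein.charAt(i) == '-') {
                aligned.append("---");
            } else {
                if (position + 3 > orf.length()) {
                    throw new IllegalArgumentException("ORF is shorter than the aligned protein");
                }
                aligned.append(orf, position, position + 3);
                position += 3;
            }
        }
        return aligned.toString();
    }
}
```

For example, threading the ORF ATGAAA onto the aligned protein M-K gives ATG---AAA. Applying this to both sequences of a pair yields two gapped DNA strings of equal length whose columns respect codon boundaries.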
Homology
•Detecting Homolog Sequences
•Bi-Directional BLAST
•IS NOT an orthologous gene finding tool!
•Calculates evolution rates for every hit (pair of sequences)
Based on the codon-wise global alignment of the ORFs
•Dumps the results in delimited text files
Follow-on processing and analysis
Results should be imported into a relational database
Spreadsheets accepted but not favoured
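Dumping results as delimited text can be sketched as follows. The class ResultDump, the method toRow and the example fields are hypothetical; the actual column layout of the BiDiBlast output files is not given in the slides.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ResultDump {

    /**
     * Joins the fields of one result record into a delimited line.
     * Delimiter and newline characters inside a field are replaced by
     * spaces so that every record stays on one line with a fixed column count.
     */
    public static String toRow(List<String> fields, char delimiter) {
        return fields.stream()
                .map(field -> field == null ? "" : field.replace(delimiter, ' ').replace('\n', ' '))
                .collect(Collectors.joining(String.valueOf(delimiter)));
    }
}
```

A fixed column count per line is what makes a bulk import into a relational database table straightforward.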
Homology
•Detecting Homolog Sequences
•Bi-Directional BLAST
•IS NOT an orthologous gene finding tool!
•Result filtering is left to the end user
Sequence length mismatches (e.g. 80% to 120%)
Similarity threshold
Intervening STOP codons as ORF quality control
•Usage scope
Comparative genomics
Annotation of ORFs from newly sequenced genomes
Estimation of evolution rates for sets of sequences
etc…
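The filtering criteria listed above (length mismatch, similarity threshold, intervening STOP codons) are left to the user, so the following is only a sketch of how such a filter might look. The class HitFilter and its methods are hypothetical. The 80% to 120% window comes from the slide, while the similarity cutoff is a free parameter chosen by the caller.

```java
public class HitFilter {

    /** True when subjectLength / queryLength lies within [minRatio, maxRatio]. */
    public static boolean lengthRatioOk(int queryLength, int subjectLength,
                                        double minRatio, double maxRatio) {
        if (queryLength <= 0 || subjectLength <= 0) {
            return false;
        }
        double ratio = (double) subjectLength / queryLength;
        return ratio >= minRatio && ratio <= maxRatio;
    }

    /** True when a translated ORF contains '*' anywhere before its last residue. */
    public static boolean hasInternalStop(String protein) {
        for (int i = 0; i < protein.length() - 1; i++) {
            if (protein.charAt(i) == '*') {
                return true;
            }
        }
        return false;
    }

    /** Combines the three checks; minSimilarity is a user-chosen fraction between 0 and 1. */
    public static boolean accept(int queryLength, int subjectLength, double similarity,
                                 String queryProtein, String subjectProtein,
                                 double minSimilarity) {
        return lengthRatioOk(queryLength, subjectLength, 0.8, 1.2)
                && similarity >= minSimilarity
                && !hasInternalStop(queryProtein)
                && !hasInternalStop(subjectProtein);
    }
}
```

In practice the same conditions could equally be written as a WHERE clause once the result files are loaded into a relational database.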
Homology
•Detecting Homolog Sequences
•Bi-Directional BLAST
•Future developments
•Domain architecture detection in products
Integration problems
What kind of formalism?
Downstream or upstream?
•Other assorted improvements
User interface
Result management inside the application
Synteny integration
Matching against whole genome / chromosome / contig