Bonnal bosc2010 bio_ruby
BioRuby
Project Update
Raoul J.P. Bonnal
Life Science Informatics
Integrative Biology Program
Fondazione INGM
Italy
11th Annual Bioinformatics Open Source Conference (BOSC) 2010
Boston, Massachusetts, USA
co-authors:
Toshiaki Katayama
Pjotr Prins
Mitsuteru Nakao
Christian M Zmasek
Naohisa Goto
Introduction
BioRuby - a bioinformatics library for the Ruby language
• Ruby is an object-oriented scripting language, functional and reflective
• became popular through "Ruby on Rails"
• created by Matz in 1993 in Japan
BioRuby & Platforms
Ruby Interpreter
• Performance: Ruby, RubyEE
• Portability: JRuby, Java libraries, Operating Systems
• BioLib
• Cytoscape
gem install bio
History
[Timeline figure, 2008 to 2010. Themes: Web Services, Workflows, Semantic Web. Releases: 1.3.0, 1.3.1, 1.4.0. Events: Code fest, BOSC, git. GSoC projects: phyloXML; Ruby 1.9.2; NeXML I/O, RDF triples; infer gene duplications.]
GitHub:
http://github.com/bioruby/bioruby
GSoC references:
• Ruby 1.9.2 support of BioRuby (OBF)
• Develop an API for NeXML I/O, and, RDF triples for BioRuby (NESCent)
• Implementation of algorithm to infer gene duplications in BioRuby (OBF)
• Implementing phyloXML support in BioRuby (NESCent)
BioRuby Features
Category: Modules
Object: Sequence, pathway, tree, bibliography reference
Sequence Manipulation: translation, alignment, location, mapping, feature table, molecular weight, design siRNA, restriction enzyme
Format: GenBank, EMBL, UniProt, KEGG, PDB, MEDLINE, REBASE, FASTQ, GFF, MSF, ABIF, SCF, GCG, Lasergene, GEO SOFT, Gene Ontology
Tool: BLAST, FASTA, EMBOSS, HMMER, InterProScan, GenScan, BLAT, Sim4, Spidey, MEME, ClustalW, MUSCLE, MAFFT, T-Coffee, ProbCons
Phylogeny: PHYLIP, PAML, phyloXML, NEXUS, Newick
Web Service: NCBI, EBI, DDBJ, KEGG, TogoWS, PSORT, TargetP, PTS1, SOSUI, TMHMM
ODBA: BioSQL, BioFetch, indexed flat files
Shell: Interactive environment for rapid bioinformatics analyses
Relevant New Features 1
Bio::SQL: interoperable storage of sequences (Raoul Bonnal)
require 'bio'
# Also needs: active_record (ORM)
# and your database adapter (MySQL, PostgreSQL, JDBC)
connection = Bio::SQL.establish_connection(
  { 'development' => { 'hostname' => your_host_name,
                       'database' => 'CoolBioSeqDB',
                       'adapter'  => 'jdbcmysql',
                       'username' => 'Raoul',
                       'password' => 'SmartPassword' } },
  'development')
# Read a GenBank file and store each entry:
my_storage = Bio::SQL::Biodatabase.find(:first)
genbank = Bio::GenBank.open('dbvrl1.gb')
genbank.each_entry do |gb|
  Bio::SQL::Sequence.new(:biosequence => gb.to_biosequence, :biodatabase => my_storage)
end
# Fetching an accession is easy:
Bio::SQL.fetch_accession(your_accession).to_biosequence.output(:embl)
Relevant New Features 2
Bio::PhyloXML r/w (Diana Jaunzeikare, Christian M Zmasek)
require 'bio' # needs libxml-ruby
# Create a parser
phyloxml = Bio::PhyloXML::Parser.new('example.xml')
# Consume the trees
phyloxml.each do |tree|
  puts tree.name
end
# Writing
writer = Bio::PhyloXML::Writer.new('my_tree.xml')
writer.write(tree2)
# Extract information
phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
phyloxml.each do |tree|
  tree.each_node do |node|
    print 'Scientific name: ', node.taxonomies[0].scientific_name, "\n"
  end
end
Han, M. V. and Zmasek, C. M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
Relevant New Features 3
Bio::FASTQ r/w Next Generation Sequencing FASTQ (Naohisa Goto)
require 'bio'
# Pair a FASTA file with its matching quality file
ff_fasta = Bio::FlatFile.open('filename.fasta')
ff_qual = Bio::FlatFile.open('filename.qual')
while entry_fasta = ff_fasta.next_entry
  seq = entry_fasta.to_biosequence
  seq.quality_score_type = :phred
  seq.quality_scores = ff_qual.next_entry.data
  puts seq.output(:fastq, :title => entry_fasta.definition)
end
● Formats supported: Solexa, Illumina
Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
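The FASTQ variants named on this slide differ mainly in how quality scores are encoded as ASCII characters. The following is a minimal plain-Ruby sketch of that difference, following the encodings described in Cock et al. (2010); it does not use or reproduce BioRuby's own implementation, and the helper names (`phred_scores`, `solexa_to_phred`, `to_sanger`) are hypothetical.

```ruby
# ASCII offsets of the three FASTQ variants (Cock et al., 2010).
# Sanger and Illumina 1.3+ store Phred scores; Solexa stores Solexa scores.
OFFSETS = { sanger: 33, solexa: 64, illumina: 64 }.freeze

# Convert a Solexa score to the nearest integer Phred score:
# Q_phred = 10 * log10(10^(Q_solexa / 10) + 1)
def solexa_to_phred(q)
  (10 * Math.log10(10**(q / 10.0) + 1)).round
end

# Decode a quality string into Phred scores for the given variant.
def phred_scores(quality_string, variant)
  offset = OFFSETS.fetch(variant)
  quality_string.each_byte.map do |byte|
    score = byte - offset
    variant == :solexa ? solexa_to_phred(score) : score
  end
end

# Re-encode a quality string from any variant as Sanger FASTQ.
def to_sanger(quality_string, variant)
  phred_scores(quality_string, variant).map { |q| (q + OFFSETS[:sanger]).chr }.join
end

puts phred_scores('II', :sanger).inspect # prints [40, 40]
puts to_sanger('hh', :illumina)          # prints II
```

Raising a KeyError on an unknown variant (via `fetch`) is a deliberate choice here: silently guessing the offset would corrupt every score in the file.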
Relevant New Features 4
Bio::NCBI::REST example
require 'bio'
ncbi = Bio::NCBI::REST::ESearch.new
ncbi.search("nucleotide", "tardigrada")
ncbi.count("nucleotide", "tardigrada")
ncbi.nucleotide("tardigrada")
ncbi.taxonomy("tardigrada")
ncbi.pubmed("tardigrada", "reldate" => 365)
ncbi.pubmed("mammoth mitochondrial genome")
Bio::TogoWS: entry point for PDBj, NCBI, DDBJ, EBI, KEGG
require 'bio'
t = Bio::TogoWS::REST.new
puts t.entry('genbank', 'AF237819')
puts t.search('uniprot', 'lung cancer')
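To illustrate what a REST entry point means for the two TogoWS calls above, here is a rough plain-Ruby sketch that only builds the request URLs, without contacting any server. It assumes the TogoWS path layout `/entry/DATABASE/ID` and `/search/DATABASE/QUERY`; the host name and the helper names are assumptions made for illustration and are not taken from the slides or from BioRuby's code.

```ruby
require 'erb'
require 'uri'

# Assumed host name; check the TogoWS documentation for the current one.
TOGOWS_HOST = 'togows.dbcls.jp'

# Build the URL for retrieving one entry (assumed layout: /entry/DATABASE/ID).
def togows_entry_uri(database, entry_id)
  URI::HTTP.build(host: TOGOWS_HOST,
                  path: "/entry/#{database}/#{ERB::Util.url_encode(entry_id)}")
end

# Build the URL for a keyword search (assumed layout: /search/DATABASE/QUERY).
# The query is percent-encoded so that spaces survive inside the path.
def togows_search_uri(database, query)
  URI::HTTP.build(host: TOGOWS_HOST,
                  path: "/search/#{database}/#{ERB::Util.url_encode(query)}")
end

puts togows_entry_uri('genbank', 'AF237819')
puts togows_search_uri('uniprot', 'lung cancer')
```

Keeping the database and the identifier in the path is what lets a single client address several providers (PDBj, NCBI, DDBJ, EBI, KEGG) through one uniform scheme.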
BioRuby is Agile
● OpenBio* developers are the stakeholders
● Speed-up in the iteration process
● Frequent meetings (mail, Skype/voice chat, IRC)
● Test everything (required for new features)
– Improve quality and maintainability, and guarantee portability
– Ruby Unit Testing Framework, RSpec
● GitHub
● Low barriers for new developers
● 32 forks and 100 people watching us
Agile Manifesto
Moving to Agile Programming
[Chart: number of tests and tutorial lines per BioRuby release, 1.0.0 to 1.4.0]
Refactoring
[Chart: number of files, classes, modules and methods per BioRuby release, 1.0.0 to 1.4.0]
Ongoing Work
● Semantic Web (started @ BioHackathon 2010)
● Expose data in RDF
● Consuming SPARQL endpoints efficiently
● Ruby 1.9.2 support of BioRuby (GSoC & OBF)
● Improved performance
● Develop an API for NeXML I/O, and, RDF triples for BioRuby (GSoC & NESCent)
● Implementation of algorithm to infer gene duplications in BioRuby (GSoC & OBF)
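As a rough idea of what exposing data in RDF could look like, the sketch below serializes a few fields of a sequence record as N-Triples in plain Ruby. It is only an illustration: the base URI, the vocabulary URI, the property names and the helper names are hypothetical placeholders, and nothing here reflects the API that the BioRuby developers were building.

```ruby
# Hypothetical URIs, used only for illustration.
BASE  = 'http://example.org/sequence/'
VOCAB = 'http://example.org/vocab#'

# Quote a value as an N-Triples literal, escaping backslashes,
# double quotes and newlines.
def literal(value)
  escaped = value.to_s
                 .gsub('\\') { '\\\\' }
                 .gsub('"') { '\\"' }
                 .gsub("\n") { '\\n' }
  %("#{escaped}")
end

# Describe one sequence record as subject-predicate-object lines.
def sequence_triples(accession, definition, residues)
  subject = "<#{BASE}#{accession}>"
  [
    [subject, "<#{VOCAB}accession>",  literal(accession)],
    [subject, "<#{VOCAB}definition>", literal(definition)],
    [subject, "<#{VOCAB}length>",     literal(residues.length)]
  ].map { |s, p, o| "#{s} #{p} #{o} ." }
end

puts sequence_triples('AF237819', 'example record', 'atgcatgc')
```

Each record becomes a set of independent statements about one subject URI, which is the form that a SPARQL endpoint can then query.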
PlugIn system
● We want a BioRuby core that is stable on every OS
● But… we want to use experimental code ASAP
● BioRuby + BioRuby Plugin + Rails: we can have multiple applications with a single core and specific features
– User or Application
● Suggest guidelines for plugin namespaces
● On GitHub you can find our plugins by searching for bioruby-plugin-NAME
PlugIn system
The plugin system will be delivered with the next BioRuby release
BioGraphics (Jan Aerts)
For biologists:
bioruby --plugin install graphics
For geeks:
bioruby --plugin install git://github.com/user/repo.git
It's very experimental
What We Need
● Better integration with R
● Better support for data visualization (interpretation)
● Detailed Roadmap
Publications
Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama.
BioRuby: Bioinformatics software for the Ruby programming language (submitted)
Toshiaki Katayama et al.
The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows (accepted)
Toshiaki Katayama, Mitsuteru Nakao and Toshihisa Takagi (2010).
TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Research, 38(suppl_2), W706-W711, doi:10.1093/nar/gkq386 (Web Server Issue 2010)
Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010).
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
Over 24 articles use BioRuby in their analyses; check the up-to-date list:
http://bioruby.open-bio.org/wiki/Research_using_BioRuby
Acknowledgments
● BioRuby Team
● Toshiaki Katayama*
● Naohisa Goto*
● Pjotr Prins*
● Mitsuteru Nakao*
● Jan Aerts*
● Christian M Zmasek*
● All GSoC students
Open Bioinformatics Foundation
Google Summer of Code
Database Center for Life Science
NESCent (National Evolutionary Synthesis Center)
* co-author