E Talevich - Biopython project-update
Project Update
Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues,
and Biopython contributors
Bioinformatics Open Source Conference (BOSC), July 14, 2012
Long Beach, California, USA
Hello, BOSC
Biopython is a freely available Python library for biological computation, and a long-running, distributed collaboration to produce and maintain it [1].
● Supported by the Open Bioinformatics Foundation (OBF)
● "This is Python's Bio* library. There are several Bio* libraries like it, but this one is ours."
● http://biopython.org/
_____
[1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163
Bio.Graphics (Biopython 1.59, February 2012)
New features in...
BasicChromosome:
● Draw simple sub-features on chromosome segments
● Show the position of genes, SNPs or other loci
GenomeDiagram [2]:
● Cross-links between tracks
● Track-specific start/end positions for showing regions
_____
[2] Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2006) GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 22(5) 616-7. doi:10.1093/bioinformatics/btk021
BasicChromosome: Potato NB-LRRs
Jupe et al. (2012) BMC Genomics
GenomeDiagram: A tale of three phages
Swanson et al. (2012) PLoS One (to appear)
GenomeDiagram imitates Artemis Comparison Tool (ACT)
SeqIO and AlignIO (Biopython 1.58, August 2011)
● SeqXML format [3]
● Read support for ABI chromatogram files (Wibowo A.)
● "phylip-relaxed" format (Connor McCoy, Brandon I.)
  ○ Relaxes the 10-character limit on taxon names
  ○ Space-delimited instead
  ○ Used in RAxML, PhyML, PAML, etc.
_____
[3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and orthology information. Briefings in Bioinformatics 12(5): 485-488. doi:10.1093/bib/bbr025
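As a rough illustration of the "phylip-relaxed" bullet above, the sketch below reads a small alignment whose taxon names are longer than 10 characters. The alignment text and the taxon names are invented for this example; only the `AlignIO.read` and `AlignIO.write` calls and the `"phylip-relaxed"` format name come from Biopython itself.

```python
from io import StringIO

from Bio import AlignIO

# Toy alignment (made up for illustration): 2 sequences, 12 columns.
# Each name is longer than the strict PHYLIP limit of 10 characters and is
# separated from its sequence by whitespace.
relaxed_phylip = """\
 2 12
Homo_sapiens_isolate_A ACGTACGTACGT
Pan_troglodytes_isolate_B ACGTACGAACGT
"""

alignment = AlignIO.read(StringIO(relaxed_phylip), "phylip-relaxed")
names = [record.id for record in alignment]
print(names)
print(alignment.get_alignment_length())

# Writing back out in the same format keeps the full-length names.
out = StringIO()
AlignIO.write(alignment, out, "phylip-relaxed")
print(out.getvalue())
```

The same handle passed to the strict `"phylip"` parser would instead treat the first 10 characters of each line as the name, which is the limitation the relaxed variant is meant to avoid.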
Bio.Phylo & pypaml
● PAML interop: wrappers, I/O, glue
  ○ Merged Brandon Invergo’s pypaml as Bio.Phylo.PAML (Biopython 1.58, August 2011)
● Phylo.draw improvements
● RAxML wrapper (Biopython 1.60, June 2012)
● Paper in review [4]
_____
[4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo: a unified toolkit for processing, analysis and visualization of phylogenetic data in Biopython. BMC Bioinformatics, in review
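For readers new to Bio.Phylo, a minimal sketch of loading and inspecting a tree might look like this. The Newick string is a toy tree invented for the example, and `Phylo.draw_ascii` is used in place of `Phylo.draw` so that the sketch does not need matplotlib.

```python
from io import StringIO

from Bio import Phylo

# Toy tree (made up for illustration) in Newick format.
newick = "((A:1.0,B:2.0):0.5,C:3.0);"
tree = Phylo.read(StringIO(newick), "newick")

# Text-only rendering; Phylo.draw(tree) would plot the same tree with
# matplotlib if it is installed.
Phylo.draw_ascii(tree)

leaf_names = [leaf.name for leaf in tree.get_terminals()]
# Path length between two tips, summed over the branches that join them.
distance_a_b = tree.distance("A", "B")
print(leaf_names, distance_a_b)
```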
Phylo.draw and matplotlib
Bio.bgzf (Blocked GNU Zip Format)
● BGZF is a GZIP variant that compresses blocks of a fixed, known size
● Used in Next Generation Sequencing for efficient random access to compressed files
  ○ SAM + BGZF = BAM
Bio.SeqIO can now index BGZF compressed sequence files. (Biopython 1.60, June 2012)
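The random-access idea can be sketched with the standard library alone. This is not Bio.bgzf's implementation: plain concatenated gzip members stand in for real BGZF blocks (which also record their own size in a gzip extra field), and the two helper functions are written here for illustration, although Bio.bgzf offers helpers with similar names. The sketch assumes the usual BGZF convention of a 64-bit "virtual offset" whose upper 48 bits locate the compressed block in the file and whose lower 16 bits locate a byte inside that block's decompressed data.

```python
import gzip


def make_virtual_offset(block_start, within_block):
    """Pack a block's file offset and an offset inside its decompressed data."""
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start must fit in 48 bits")
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Inverse of make_virtual_offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Two independently compressed blocks, concatenated into one "file".
block_a = b">seq1\nACGTACGT\n"
block_b = b">seq2\nTTGGCCAA\n"
member_a = gzip.compress(block_a)
member_b = gzip.compress(block_b)
compressed = member_a + member_b

# A normal gzip reader still sees one continuous stream.
whole = gzip.decompress(compressed)

# Random access: jump to the second block and skip its 6-byte header line
# without decompressing the first block at all.
voffset = make_virtual_offset(len(member_a), 6)
block_start, within_block = split_virtual_offset(voffset)
record = gzip.decompress(compressed[block_start:])[within_block:]
print(record)
```

An index that stores one virtual offset per record is enough to seek straight to any record, which is the kind of lookup that indexing BGZF-compressed sequence files relies on.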
TogoWS (Biopython 1.59, February 2012)
● TogoWS is an integrated web resource for bioinformatics databases and services
● Provided by the Database Center for Life Science in Japan
● Usage is similar to NCBI Entrez
_____
http://togows.dbcls.jp/
PyPy and Python 3
Biopython:
● works well on PyPy 1.9 (excluding NumPy & C extensions)
● works on Python 3 (excluding some C extensions), but concerns remain about performance in default unicode mode.
  ○ Currently 'beta' level support.
Bio.PDB
● mmCIF parser restored (Biopython 1.60, June 2012)
  ○ Lenna Peterson fixed a 4-year-old lex/yacc-related compilation issue
  ○ That was awesome
  ○ Now she's a GSoC student
  ○ Py3/PyPy/Jython compatibility in progress
● Merging GSoC results incrementally
  ○ Atom element names & weights (João Rodrigues, GSoC 2010)
  ○ Lots of feature branches remaining...
Bio.PDB feature branches
[Figure: timeline of GSoC feature branches over '10, '11, '12, ... Branch labels: mmCIF Parser, Bio.Struct, InterfaceAnalysis, Mocapy++, Generic Features, PDBParser]
Google Summer of Code (GSoC)
In 2011, Biopython had three projects funded via the OBF:
● Mikael Trellet (Bio.PDB)
● Michele Silva (Bio.PDB, Mocapy++)
● Justinas Daugmaudis (Mocapy++)
In 2012, we have two projects via the OBF:
● Wibowo Arindrarto (SearchIO)
● Lenna Peterson (Variants)
_____
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.google-melange.com/
GSoC 2011: Mikael Trellet
Biomolecular interfaces in Bio.PDB
Mentor: João Rodrigues
● Representation of protein-protein interfaces: SM(I)CRA
● Determining interfaces from PDB coordinates
● Analyses of these objects
_____
http://biopython.org/wiki/GSoC2011_mtrellet
GSoC 2011: Michele Silva
Python/Biopython bindings for Mocapy++
Mentor: Thomas Hamelryck
Michele Silva wrote a Python bridge for Mocapy++ and linked it to Bio.PDB to enable statistical analysis of protein structures.
More-or-less ready to merge after the next Mocapy++ release.
_____
http://biopython.org/wiki/GSOC2011_Mocapy
GSoC 2011: Justinas Daugmaudis
Mocapy extensions in Python
Mentor: Thomas Hamelryck
Complements the Mocapy++ bindings with a plugin system that lets users easily write new nodes (probability distribution functions) in Python.
He's finishing this as part of his master's thesis project with Thomas Hamelryck.
_____
http://biopython.org/wiki/GSOC2011_MocapyExt
GSoC 2012: Lenna Peterson
Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
Mentors: Brad Chapman, James Casbon
● I/O for VCF, GVF formats
● Internal schema for variant data
_____
http://arklenna.tumblr.com/tagged/gsoc2012
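To make "I/O plus an internal schema" concrete, here is a purely hypothetical sketch using only the standard library. The `Variant` class and `parse_vcf_line` function are invented for this illustration and are not the project's API. The sketch relies only on the fixed leading columns of a VCF data line (CHROM, POS, ID, REF, ALT) and ignores everything after them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """Hypothetical in-memory record for one variant (not a Biopython class)."""
    chrom: str
    pos: int          # 1-based position, as in VCF
    id: str
    ref: str
    alts: List[str]   # one entry per alternate allele


def parse_vcf_line(line):
    """Parse the first five tab-separated columns of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    return Variant(chrom, int(pos), variant_id, ref, alt.split(","))


# Example data line, written by hand for this sketch.
line = "20\t1110696\trs0000001\tA\tG,T\t67\tPASS\tNS=2\n"
variant = parse_vcf_line(line)
print(variant)
```

A shared record type like this is what would let readers for different formats (VCF, GVF) feed the same downstream code.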
GSoC 2012: Wibowo Arindrarto
SearchIO implementation in Biopython
Mentor: Peter Cock
Unified, BioPerl-like API for search results from BLAST, HMMer, FASTA, etc.
_____
http://biopython.org/wiki/SearchIO
http://bow.web.id/blog/tag/gsoc/
Thanks
● OBF
● BOSC organizers
● Biopython contributors
● Scientists like you
Check us out:
● Website: http://biopython.org
● Code: https://github.com/biopython/biopython