E Talevich - Biopython project-update

Project Update Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues, and Biopython contributors Bioinformatics Open Source Conference (BOSC) July 14, 2012 Long Beach, California, USA


Page 1: E Talevich - Biopython project-update

Project Update

Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues,

and Biopython contributors

Bioinformatics Open Source Conference (BOSC)

July 14, 2012

Long Beach, California, USA

Page 2: E Talevich - Biopython project-update

Hello, BOSC

Biopython is a freely available Python library for biological computation, and a long-running, distributed collaboration to produce and maintain it [1].

● Supported by the Open Bioinformatics Foundation (OBF)
● "This is Python's Bio* library. There are several Bio* libraries like it, but this one is ours."
● http://biopython.org/

_____
[1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163

Page 3: E Talevich - Biopython project-update

Bio.Graphics (Biopython 1.59, February 2012)

New features in...

BasicChromosome:

● Draw simple sub-features on chromosome segments
● Show the position of genes, SNPs or other loci

GenomeDiagram [2]:

● Cross-links between tracks
● Track-specific start/end positions for showing regions

_____
[2] Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2006) GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 22(5) 616-7. doi:10.1093/bioinformatics/btk021

Page 4: E Talevich - Biopython project-update

BasicChromosome: Potato NB-LRRs

Jupe et al. (2012) BMC Genomics

Page 6: E Talevich - Biopython project-update

GenomeDiagram imitates Artemis Comparison Tool (ACT)

Page 7: E Talevich - Biopython project-update

SeqIO and AlignIO (Biopython 1.58, August 2011)

● SeqXML format [3]

● Read support for ABI chromatogram files (Wibowo A.)

● "phylip-relaxed" format (Connor McCoy, Brandon I.)
  ○ Relaxes the 10-character limit on taxon names
  ○ Space-delimited instead
  ○ Used in RAxML, PhyML, PAML, etc.

_____
[3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and orthology information. Briefings in Bioinformatics 12(5): 485-488. doi:10.1093/bib/bbr025
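
To make the difference between strict and relaxed PHYLIP concrete, here is a minimal standard-library sketch. It is not Biopython's writer: the function `write_phylip` and the sample taxon names are made up for illustration, and it only handles the simple sequential layout. It shows why the 10-column name field is a problem: two distinct names collapse to the same truncated label, while the space-delimited variant keeps them apart.

```python
def write_phylip(records, relaxed=False):
    """Return a sequential PHYLIP alignment as a string (illustrative only).

    records: list of (name, sequence) pairs with equal sequence lengths.
    """
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences must have the same length")
    lines = [" %d %d" % (len(records), length)]
    for name, seq in records:
        if relaxed:
            # Relaxed: full name, then a space separates name from sequence,
            # so the name itself must not contain whitespace.
            if any(ch.isspace() for ch in name):
                raise ValueError("relaxed PHYLIP names cannot contain whitespace")
            lines.append("%s %s" % (name, seq))
        else:
            # Strict: the name occupies exactly 10 columns (truncated or padded).
            lines.append("%-10s%s" % (name[:10], seq))
    return "\n".join(lines) + "\n"


# Hypothetical taxon names that differ only after the 10th character.
records = [
    ("Homo_sapiens_GRCh37", "ACGTACGT"),
    ("Homo_sapiens_hg18", "ACGTTCGT"),
]
strict = write_phylip(records)
relaxed = write_phylip(records, relaxed=True)
print(strict)
print(relaxed)
```

In the strict output both rows start with the same 10-character label, so a downstream tool cannot tell the taxa apart; the relaxed output preserves the full names.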

Page 8: E Talevich - Biopython project-update

Bio.Phylo & pypaml

● PAML interop: wrappers, I/O, glue
  ○ Merged Brandon Invergo's pypaml as Bio.Phylo.PAML (Biopython 1.58, August 2011)

● Phylo.draw improvements

● RAxML wrapper (Biopython 1.60, June 2012)

● Paper in review [4]

_____
[4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo: a unified toolkit for processing, analysis and visualization of phylogenetic data in Biopython. BMC Bioinformatics, in review

Page 9: E Talevich - Biopython project-update

Phylo.draw and matplotlib

Page 10: E Talevich - Biopython project-update

Bio.bgzf (Blocked GNU Zip Format)

● BGZF is a GZIP variant that compresses data in independent blocks of bounded size (at most 64 KiB each)

● Used in Next Generation Sequencing for efficient random access to compressed files
  ○ SAM + BGZF = BAM

Bio.SeqIO can now index BGZF compressed sequence files. (Biopython 1.60, June 2012)
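
The random-access idea behind BGZF can be sketched with the standard library alone. This is not Bio.bgzf itself: the helpers `make_block`, `read_at` and `virtual_offset_of`, the 4000-byte chunk size and the toy records are assumptions made for illustration. Each block is an ordinary gzip member whose extra field records the block size, and a "virtual offset" packs the block's start in the file together with a position inside the decompressed block.

```python
import gzip
import struct
import zlib


def make_block(data):
    """Compress a small chunk of bytes into one BGZF block (a gzip member)."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    deflated = compressor.compress(data) + compressor.flush()
    bsize = len(deflated) + 25  # 18-byte header + 8-byte trailer, minus 1
    if bsize > 0xFFFF:
        raise ValueError("chunk too large for a single BGZF block")
    # gzip header with FEXTRA set and a 'BC' subfield holding BSIZE.
    header = struct.pack("<4BI2BH2BHH",
                         31, 139, 8, 4, 0, 0, 255, 6, 66, 67, 2, bsize)
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + deflated + trailer


def read_at(bgzf_bytes, virtual_offset, size):
    """Decompress only the block named by virtual_offset; return size bytes.

    Simplification: the requested bytes must lie within that single block.
    """
    block_start = virtual_offset >> 16   # offset of the block in the file
    within = virtual_offset & 0xFFFF     # offset inside the decompressed block
    bsize = struct.unpack_from("<H", bgzf_bytes, block_start + 16)[0]
    deflated = bgzf_bytes[block_start + 18:block_start + bsize - 7]
    return zlib.decompress(deflated, -15)[within:within + size]


# Toy data: 1000 fixed-width records of 20 bytes each.
text = b"".join(b"read%04d\tACGTACGTAC\n" % i for i in range(1000))
CHUNK = 4000  # a multiple of 20, so no record straddles two blocks

block_starts, blocks, pos = [], [], 0
for i in range(0, len(text), CHUNK):
    block = make_block(text[i:i + CHUNK])
    block_starts.append(pos)
    blocks.append(block)
    pos += len(block)
blob = b"".join(blocks)


def virtual_offset_of(uncompressed_offset):
    """Map a position in the uncompressed data to a BGZF virtual offset."""
    return (block_starts[uncompressed_offset // CHUNK] << 16) | (
        uncompressed_offset % CHUNK)


record = read_at(blob, virtual_offset_of(20 * 612), 20)
print(record)
```

Because every block is a valid gzip member, the whole file still decompresses with ordinary gzip tools, while an index of virtual offsets lets a reader jump straight to one record without decompressing what comes before it.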

Page 11: E Talevich - Biopython project-update

TogoWS (Biopython 1.59, February 2012)

● TogoWS is an integrated web resource for bioinformatics databases and services

● Provided by the Database Center for Life Science in Japan

● Usage is similar to NCBI Entrez

_____
http://togows.dbcls.jp/
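
As a rough illustration of what "similar to NCBI Entrez" means in practice, the sketch below only builds TogoWS-style REST URLs and does not contact the service. The helper `entry_url` is hypothetical, the URL layout (`/entry/<database>/<ids>[/<field>][.<format>]`) is stated here as an assumption about the service's REST interface, and the database names and identifiers are examples only.

```python
from urllib.parse import quote

BASE_URL = "http://togows.dbcls.jp"  # host taken from the slide


def entry_url(db, ids, field=None, fmt=None):
    """Build an entry-retrieval URL (hypothetical helper, no network access).

    Assumed layout: BASE_URL/entry/<db>/<id>[,<id>...][/<field>][.<format>]
    """
    if isinstance(ids, str):
        ids = [ids]
    url = "%s/entry/%s/%s" % (BASE_URL, quote(db), quote(",".join(ids), safe=","))
    if field:
        url += "/" + quote(field)
    if fmt:
        url += "." + quote(fmt)
    return url


# Example database names and identifiers, chosen for illustration.
authors_url = entry_url("pubmed", "16381885", field="authors")
fasta_url = entry_url("uniprot", ["A1AG1_HUMAN", "A1AG1_MOUSE"], fmt="fasta")
print(authors_url)
print(fasta_url)
```

The shape mirrors an Entrez fetch: name a database, give one or more identifiers, and optionally narrow the result to a field or output format.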

Page 12: E Talevich - Biopython project-update

PyPy and Python 3

Biopython:

● works well on PyPy 1.9 (excluding NumPy & C extensions)
● works on Python 3 (excluding some C extensions), but concerns remain about performance in default unicode mode.
  ○ Currently 'beta' level support.

Page 13: E Talevich - Biopython project-update

Bio.PDB

● mmCIF parser restored (Biopython 1.60, June 2012)
  ○ Lenna Peterson fixed a 4-year-old lex/yacc-related compilation issue
  ○ That was awesome
  ○ Now she's a GSoC student
  ○ Py3/PyPy/Jython compatibility in progress

● Merging GSoC results incrementally
  ○ Atom element names & weights (João Rodrigues, GSoC 2010)
  ○ Lots of feature branches remaining...

Page 14: E Talevich - Biopython project-update

Bio.PDB feature branches

[Figure: timeline of Bio.PDB feature branches across '10, '11, '12 and beyond. Labels: GSoC, mmCIF Parser, Bio.Struct, InterfaceAnalysis, Mocapy++, Generic Features, PDBParser]

Page 15: E Talevich - Biopython project-update

Google Summer of Code (GSoC)

In 2011, Biopython had three projects funded via the OBF:

● Mikael Trellet (Bio.PDB)
● Michele Silva (Bio.PDB, Mocapy++)
● Justinas Daugmaudis (Mocapy++)

In 2012, we have two projects via the OBF:

● Wibowo Arindrarto (SearchIO)
● Lenna Peterson (Variants)

_____
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.google-melange.com/

Page 16: E Talevich - Biopython project-update

GSoC 2011: Mikael Trellet

Biomolecular interfaces in Bio.PDB

Mentor: João Rodrigues

● Representation of protein-protein interfaces: SM(I)CRA

● Determining interfaces from PDB coordinates

● Analyses of these objects

_____
http://biopython.org/wiki/GSoC2011_mtrellet
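
One common way to determine an interface from coordinates is a distance cutoff between atoms of two chains. The sketch below illustrates that general idea only; it is not the GSoC project's code, the function names and the 5.0 Å cutoff are assumptions, and the toy coordinates are invented. Chains are plain dictionaries rather than Bio.PDB objects so the example runs without any dependencies.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two (x, y, z) points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def interface_contacts(chain_a, chain_b, cutoff=5.0):
    """Return residue pairs with any two atoms within cutoff (in angstroms).

    chain_a, chain_b: dicts mapping a residue id to a list of atom coordinates.
    The cutoff value is an assumed default, not taken from the slides.
    """
    cutoff_sq = cutoff ** 2
    contacts = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(squared_distance(p, q) <= cutoff_sq
                   for p in atoms_a for q in atoms_b):
                contacts.add((res_a, res_b))
    return contacts


# Invented toy coordinates for two short chains.
chain_a = {
    ("A", 10): [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    ("A", 11): [(0.0, 12.0, 0.0)],
}
chain_b = {
    ("B", 55): [(4.0, 0.0, 0.0)],
    ("B", 56): [(30.0, 30.0, 30.0)],
}
contacts = interface_contacts(chain_a, chain_b)
print(contacts)
```

The all-against-all loop is quadratic in the number of atoms; for real structures a spatial index such as a k-d tree would be the usual replacement.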

Page 17: E Talevich - Biopython project-update

GSoC 2011: Michele Silva

Python/Biopython bindings for Mocapy++

Mentor: Thomas Hamelryck

Michele Silva wrote a Python bridge for Mocapy++ and linked it to Bio.PDB to enable statistical analysis of protein structures.

More-or-less ready to merge after the next Mocapy++ release.

_____
http://biopython.org/wiki/GSOC2011_Mocapy

Page 18: E Talevich - Biopython project-update

GSoC 2011: Justinas Daugmaudis

Mocapy extensions in Python

Mentor: Thomas Hamelryck

Enhance Mocapy++ in a complementary way, developing a plugin system for Mocapy++ allowing users to easily write new nodes (probability distribution functions) in Python.

He's finishing this as part of his master's thesis project with Thomas Hamelryck.

_____
http://biopython.org/wiki/GSOC2011_MocapyExt

Page 19: E Talevich - Biopython project-update

GSoC 2012: Lenna Peterson

Diff My DNA: Development of a Genomic Variant Toolkit for Biopython

Mentors: Brad Chapman, James Casbon

● I/O for VCF, GVF formats
● internal schema for variant data

_____
http://arklenna.tumblr.com/tagged/gsoc2012
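
To give a feel for what VCF I/O plus an internal variant schema involves, here is a minimal standard-library sketch. It is not the GSoC project's design: the `Variant` tuple and both parser functions are hypothetical, only the eight fixed VCF columns are handled, and per-sample genotype columns are ignored.

```python
from collections import namedtuple

# Hypothetical in-memory schema for one variant record.
Variant = namedtuple("Variant", "chrom pos id ref alts qual filters info")


def parse_vcf_line(line):
    """Parse the eight fixed, tab-separated VCF columns of one data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Keys without '=' are flags; values are kept as strings.
            info_dict[key] = value if sep else True
    if flt == ".":
        filters = None          # filters were not applied
    elif flt == "PASS":
        filters = []            # applied, and none failed
    else:
        filters = flt.split(";")
    return Variant(
        chrom=chrom,
        pos=int(pos),
        id=None if vid == "." else vid,
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=filters,
        info=info_dict,
    )


def parse_vcf(lines):
    """Yield Variant records, skipping header ('#') and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n",
    "20\t17330\t.\tT\tA,C\t3\tq10\tNS=3;DP=11\n",
]
variants = list(parse_vcf(sample))
print(variants[0])
```

A real toolkit would also read the `##INFO` header lines to convert INFO values to their declared types; this sketch leaves them as strings.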

Page 20: E Talevich - Biopython project-update

GSoC 2012: Wibowo Arindrarto

SearchIO implementation in Biopython

Mentor: Peter Cock

Unified, BioPerl-like API for search results from BLAST, HMMER, FASTA, etc.

_____
http://biopython.org/wiki/SearchIO
http://bow.web.id/blog/tag/gsoc/

Page 21: E Talevich - Biopython project-update

Thanks

● OBF
● BOSC organizers
● Biopython contributors
● Scientists like you

Check us out:

● Website: http://biopython.org
● Code: https://github.com/biopython/biopython