Cloud BioLinux: open source, fully-customizable
bioinformatics computing on the cloud for the
genomics community and beyond
BOSC 2011 - Vienna, Austria
Ntino Krampis, PhD
Asst. Professor
J. Craig Venter Institute (JCVI)
The community is what makes an open source project
Brad Chapman, Tim Booth, Mesude Bicak, Dawn Field, Dan Pass – core development and planning
Enis Afgan, Pjotr Prins, Stephen Möller - and all other members of the Cloud BioLinux community who move it forward
J. Craig Venter Inst. - time allowed to work on an open-source project
Expensive sequencing and large organizations
Commodity sequencing and small labs
● large sequencing center, multi-million, broad-impact sequencing projects
● dedicated bioinformatics department, compute clusters
● small form-factor, bench-top sequencers available: GS Junior by 454
● sequencing as a standard technique in basic biology and genetics research
● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
Will small labs become the long tail of sequencing?
[Figure: long-tail plot; axes: amount of sequencing, number of labs. Credit: Wikimedia Commons]
“Bioinformatics nation is a land of city-states” Lincoln Stein
● small labs building small-scale bioinformatics infrastructures
● duplication of effort in compiling and installing software tools
● some groups have no hardware, expertise, or time to install and run software
● NEBC BioLinux ( tinyurl.com/BioLinux-NEBC ) 100+ pre-configured tools
● example: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
How about large-scale sequence datasets?
Cloud BioLinux
pre-configured and on-demand bioinformatics computing on the cloud
cloudbiolinux.org
● JCVI cloud computing research
● NEBC BioLinux software repository
● community effort – Hackathon / BOSC 2010 - 11
● Virtual Machine (VM) on Amazon cloud
● large-scale computing independently of institutional or geographic boundaries
● only need a desktop computer with internet access
http://tinyurl.com/cloud-biolinux-tutorial
sign up at aws.amazon.com
simple for end-users
Amazon EC2 → Linux desktop via remote desktop client
What if I want to share my alignments with a collaborator?
save your data as a new VM
$0.10 / GB / month
at 15 GB, it costs $1.50 / month
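The storage arithmetic above can be checked with a few lines of Python. This is only a sketch: the rate is the one quoted on the slide, and the function name `monthly_storage_cost` is made up for illustration. Current Amazon pricing may differ.

```python
from decimal import Decimal

# Rate quoted on the slide: 0.10 USD per GB per month (assumed, may be outdated).
RATE_USD_PER_GB_MONTH = Decimal("0.10")


def monthly_storage_cost(size_gb, rate=RATE_USD_PER_GB_MONTH):
    """Return the monthly cost in USD of keeping a saved VM of size_gb gigabytes."""
    # Decimal avoids binary floating-point rounding noise in currency values.
    return Decimal(str(size_gb)) * rate


cost_15gb = monthly_storage_cost(15)
print(f"15 GB saved VM: ${cost_15gb:.2f} / month")
```

The same function gives a quick estimate for larger saved VMs, for example `monthly_storage_cost(100)` for a 100 GB dataset.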
“whole system snapshot exchange” (Dudley and Butte 2010)
capture the state of the computing system and data
software execution parameters and “massaged” input datasets
● customize Cloud BioLinux based on community requirements
● mix and match software from NEBC or other sources (Debian Med, Scientific Linux, etc.)
● share customized VMs with collaborators, avoiding effort duplication
● deploy Cloud BioLinux on private and local clouds
Cloud BioLinux developer's framework
create cloud VM / images with standardized software configurations
software domains in bioinformatics: nextgen sequencing, de novo assembly, annotation, phylogeny,
molecular structures, gene expression analysis
github.com/chapmanb/cloudbiolinux
● based on python-fabric auto-deployment tool
● software components listed in plain text files
● collaborators use files to share descriptions of cloud VM / images
● start with a bare-bones VM / image
● fabric downloads and installs specified software
tinyurl.com/python-fabric
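The idea of listing software components in plain text files and letting a deployment tool install them can be sketched roughly as follows. This is an illustration, not the project's actual code: the file layout (one package per line, `#` comments), the helper names `parse_package_list` and `build_install_command`, and the use of `apt-get` are all assumptions. In the real framework, Fabric would run the resulting command on the remote VM; here the command is only built and printed.

```python
def parse_package_list(text):
    """Return package names from a plain-text list, skipping blanks and # comments."""
    packages = []
    for line in text.splitlines():
        # Drop anything after a '#' and surrounding whitespace.
        name = line.split("#", 1)[0].strip()
        # Keep the first occurrence only, preserving the order in the file.
        if name and name not in packages:
            packages.append(name)
    return packages


def build_install_command(packages):
    """Build one non-interactive apt-get command, or None if nothing to install."""
    if not packages:
        return None
    return "apt-get install -y " + " ".join(packages)


# Hypothetical package list; the names are tools mentioned earlier in the talk.
EXAMPLE_LIST = """\
# sequence analysis
hmmer
clustalw
emboss   # sequence analysis suite

phylip
"""

packages = parse_package_list(EXAMPLE_LIST)
command = build_install_command(packages)
print(command)
```

Because the description is just a text file, a collaborator can reproduce or customize an image by editing the list and re-running the deployment against a bare-bones VM.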
The future
● groups.google.com/cloudbiolinux and cloudbiolinux.org
● expand community, receive feedback, add more software to the VM
● scalable computing: SGE (Galaxy Cloudman), Hadoop (cloudgene.uibk.ac.at)
● add next-gen sequencing pipelines, NIH funding - adds effort in development
● We just had a 2-day codefest at the MetaLab, http://metalab.at/
And before I finish this talk...
Thank you!