Globus Genomics at GSI, Boston University
Dinanath Sulakhe, Alex Rodriguez
July 2014
www.globus.org/genomics
Agenda
1. Introduction to Globus Genomics
- Key features of Globus Genomics
- How to use Globus Transfer
2. Introduce the new BU production instance and how to access it
- Data upload into the platform using Globus Transfer
- Walk through the best practices pipelines
3. Getting started with the best practices pipelines
- Create, modify, and run a workflow
- How to download the final results (using Globus Transfer)
4. Conclusion
- Key points to take from the workshop
- Questions
Who We Are
• Globus Genomics is developed, operated, and supported by researchers, developers, and bioinformaticians at the Computation Institute (CI), University of Chicago / Argonne National Lab
• The CI is a non-profit organization established nearly 15 years ago that is home to roughly 100 researchers and staff
• Our goal is to support the advancement of science by bringing together our strengths in computation and informatics to help meet the unique needs of researchers in fields such as biosciences and biomedicine
Challenges in Sequencing Analysis
[Figure: "Data Movement and Access Challenges" and "Manual Data Analysis" across sequencing centers, public data, storage, local cluster/cloud, seq center, and research lab]
• Data is distributed in different locations
• Research labs need access to the data for analysis
• Researchers must be able to share data with other researchers/collaborators
• Inefficient ways of data movement
• Data needs to be available on the local and distributed compute resources
  • Local clusters, cloud, grid
How do we analyze this sequence data?
Once we have the sequence data:
[Figure: manual analysis with Fastq and Ref Genome inputs, Alignment, Picard, GATK, and Variant Calling, driven by an Install / Modify / (Re)Run Script cycle]
• Manually move the data to the compute node
• Install all the tools required for the analysis
  • BWA, Picard, GATK, filtering scripts, etc.
• Write shell scripts to execute the tools sequentially
• Manually modify the scripts for any change
• Error-prone, difficult to keep track of, messy
• Difficult to maintain and to transfer the knowledge
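To make the manual approach concrete, here is a minimal Python sketch of the kind of command sequence such a hand-written script strings together. The file names (`ref.fa`, `reads.fq`, `aln.sam`, and so on) are hypothetical, the Picard and GATK invocations follow the `picard.jar` KEY=VALUE style and the GATK 3 `-T` style (both differ between versions), and steps a real run needs (read groups, reference `.fai`/`.dict` files, BAM indexing) are omitted. It only prints the commands; it does not run them.

```python
# Hypothetical inputs; a real run also needs read groups, reference
# .fai/.dict files and a BAM index, all omitted from this sketch.
REF = "ref.fa"
READS = "reads.fq"


def manual_pipeline(ref: str, reads: str) -> list[str]:
    """Return, in order, the commands a hand-written shell script would run."""
    return [
        # Alignment with BWA-MEM
        f"bwa index {ref}",
        f"bwa mem {ref} {reads} > aln.sam",
        # Post-processing with Picard (legacy KEY=VALUE argument style)
        "java -jar picard.jar SortSam I=aln.sam O=sorted.bam SORT_ORDER=coordinate",
        "java -jar picard.jar MarkDuplicates I=sorted.bam O=dedup.bam M=dup_metrics.txt",
        # Variant calling with GATK 3-style syntax
        f"java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R {ref} -I dedup.bam -o raw.vcf",
    ]


if __name__ == "__main__":
    # Dry run: print the script instead of executing anything.
    print("#!/bin/sh\nset -e")
    for cmd in manual_pipeline(REF, READS):
        print(cmd)
```

Each command hard-codes the output file of the previous one, so swapping a tool or adding a filtering script means editing the script by hand. This is the maintenance problem described in the bullets above.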
Globus Genomics
[Figure: "Data Management" panel (sequencing centers, public data, storage, local cluster/cloud, seq center, research lab) and "Data Analysis" panel (Fastq, Ref Genome, Alignment, Picard, GATK, Variant Calling), labeled "Galaxy-based workflow management system"]
• Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints
Galaxy Data Libraries
• Globus integrated within Galaxy
• Web-based UI
• Drag-and-drop workflow creation
• Easily modify workflows with new tools
Globus Genomics on Amazon EC2
• Analytical tools are automatically run on scalable compute resources when possible
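A Galaxy workflow connects tool outputs to tool inputs, which makes it a directed acyclic graph rather than a fixed script. As a rough sketch (not Galaxy's own workflow format or API), the pipeline from the figure can be written as a dependency map and ordered with Python's standard `graphlib` module (Python 3.9+). The step names and the `add_step` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each step maps to the set of steps whose
# outputs it consumes, like connections drawn in a workflow editor.
WORKFLOW: dict[str, set[str]] = {
    "fastq": set(),
    "ref_genome": set(),
    "alignment": {"fastq", "ref_genome"},
    "picard": {"alignment"},
    "gatk_variant_calling": {"picard", "ref_genome"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every step comes after all of its inputs.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


def add_step(graph: dict[str, set[str]], name: str,
             inputs: set[str]) -> dict[str, set[str]]:
    """Return a copy of the workflow with a new tool wired to `inputs`."""
    new_graph = {step: set(deps) for step, deps in graph.items()}
    new_graph[name] = set(inputs)
    return new_graph


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
    # Modify the workflow with a new tool: append a filtering step.
    extended = add_step(WORKFLOW, "variant_filtering", {"gatk_variant_calling"})
    print(execution_order(extended))
```

Adding a tool changes one entry in the graph, and the execution order is recomputed rather than rewritten by hand. Independent steps (here `fastq` and `ref_genome`) have no ordering constraint between them, which is what lets a workflow engine run such steps in parallel.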
Globus Genomics
• Workflows can be easily defined and automated with integrated Galaxy platform capabilities
• Data movement is streamlined with integrated Globus file-transfer functionality
• Resources can be provisioned on demand with Amazon Web Services cloud-based infrastructure
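Globus Transfer runs as a managed service, so users do not write transfer logic themselves. The sketch below only illustrates, on a local filesystem, the general idea behind a "fault-tolerant" transfer: verify each copy against a checksum and retry on I/O errors or mismatches. It is not how Globus is implemented; the function names, the use of MD5, and the retry and backoff parameters are assumptions for illustration.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute an MD5 digest by streaming the file in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_retry(src: Path, dst: Path, max_attempts: int = 3,
                        backoff_s: float = 0.5) -> int:
    """Copy src to dst and verify the copy by checksum, retrying on failure.

    Returns the attempt number that succeeded; raises RuntimeError if
    every attempt fails with an I/O error or a checksum mismatch.
    """
    expected = md5sum(src)
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(src, dst)
            if md5sum(dst) == expected:
                return attempt
        except OSError:
            pass  # treat an I/O error like a mismatch and try again
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise RuntimeError(f"transfer of {src} failed after {max_attempts} attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.fastq"
        src.write_bytes(b"@read1\nACGT\n+\nIIII\n")
        attempts = transfer_with_retry(src, Path(tmp) / "copy.fastq")
        print(f"verified copy after {attempts} attempt(s)")
```

Verifying against a checksum computed at the source, instead of trusting that the copy call returned without error, is what catches silent corruption as well as outright failures.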
Additional Capabilities
• Professionally managed and supported platform
• Best practice pipelines
  – Whole Genome, Exome, RNA-Seq, ChIP-Seq, …
• Enhanced workbench with a breadth of analytic tools
• Technical support and bioinformatics consulting
• Access to pre-integrated endpoints for reliable and high-performance data transfer (e.g. Broad Institute, Perkin Elmer, university sequencing centers, etc.)
Demo
1. Access the GSI Globus Genomics instance (Galaxy)
   – Register with Globus.org
   – Join the “GSI-BU – Globus Genomics” group
   – Go to http://bu.globusgenomics.org
2. Globus Transfer
   – Set up a Globus Connect endpoint
   – Transfer data into Globus Genomics (Galaxy)
3. Best practices workflows (Exome-seq, RNA-seq, ChIP-seq)
   – Reuse or modify the best practices pipelines
   – Create a new workflow
4. Run a workflow
Take-away points
1. “Globus Transfer”
   – Set up endpoints: www.globus.org/globus-connect
2. Join the “Globus Group”
   – https://www.globus.org/groups
   – “GSI-BU – Globus Genomics”
3. Access “Globus Genomics”
   – http://bu.globusgenomics.org
4. Best practices pipelines
   – Use sample data at endpoint “sulakhe#SequencingCenter”
Questions?