Globus Genomics at GSI Boston University
Dinanath Sulakhe, Alex Rodriguez
July 2014

www.globus.org/genomics

Agenda

1. Introduction to Globus Genomics
   - Key features of Globus Genomics
   - How to use Globus Transfer

2. Introduce the new BU production instance and how to access it
   - Data upload into the platform using Globus Transfer
   - Walk through the best-practices pipelines

3. Getting started with a best-practices pipeline
   - Create, modify, and run a workflow
   - How to download the final results (using Globus Transfer)

4. Conclusion
   - Key points to take from the workshop
   - Any questions

Who We Are

•  Globus Genomics is developed, operated, and supported by researchers, developers, and bioinformaticians at the Computation Institute (CI) – University of Chicago / Argonne National Lab

•  The CI is a non-profit organization established nearly 15 years ago that is home to roughly 100 researchers and staff

•  Our goal is to support the advancement of science by bringing together our strengths in computation and informatics to help meet the unique needs of researchers in fields such as biosciences and biomedicine

Challenges in Sequencing Analysis

Data Movement and Access Challenges

[Figure: data locations shown as separate sites: Sequencing Centers, Public Data, Storage, Local Cluster/Cloud, Seq Center, Research Lab]

•  Data is distributed in different locations
•  Research labs need access to the data for analysis
•  Be able to share data with other researchers/collaborators
•  Inefficient ways of data movement
•  Data needs to be available on the local and distributed compute resources
   –  Local clusters, cloud, grid

Manual Data Analysis

How do we analyze this sequence data once we have it?

[Figure: analysis pipeline with the labels Fastq, Ref Genome, Alignment, Picard, GATK, Variant Calling, and a script cycle labeled Install, Modify, (Re)Run Script]

•  Manually move the data to the compute node
•  Install all the tools required for the analysis
   –  BWA, Picard, GATK, filtering scripts, etc.
•  Shell scripts to sequentially execute the tools
•  Manually modify the scripts for any change
•  Error prone, difficult to keep track, messy
•  Difficult to maintain and transfer the knowledge
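
As an illustration of the hand-maintained approach described on this slide, the sketch below lists the commands such a script would typically chain together. The `Step` type, `build_steps`, and all file names are hypothetical. The flags follow common BWA, legacy Picard, and GATK 3 usage and are an assumption, not something the slides state; indexing, read groups, and filtering are omitted. Nothing is executed; the commands are only printed.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One command in the hand-written pipeline (hypothetical helper type)."""
    name: str
    argv: List[str]
    stdout: Optional[str] = None  # file that captures stdout, if any


def build_steps(sample: str, ref: str = "ref.fa") -> List[Step]:
    """Return the ordered commands for one sample; file names are placeholders."""
    fastq, sam = f"{sample}.fastq", f"{sample}.sam"
    bam, vcf = f"{sample}.sorted.bam", f"{sample}.vcf"
    return [
        # Alignment: BWA-MEM writes SAM to stdout, so it must be redirected.
        Step("align", ["bwa", "mem", ref, fastq], stdout=sam),
        # Sorting with Picard (legacy KEY=VALUE argument style).
        Step("sort", ["java", "-jar", "picard.jar", "SortSam",
                      f"I={sam}", f"O={bam}", "SORT_ORDER=coordinate"]),
        # Variant calling with GATK 3-style arguments.
        Step("call", ["java", "-jar", "GenomeAnalysisTK.jar",
                      "-T", "HaplotypeCaller",
                      "-R", ref, "-I", bam, "-o", vcf]),
    ]


if __name__ == "__main__":
    for step in build_steps("sampleA"):
        redirect = f" > {step.stdout}" if step.stdout else ""
        print(f"[{step.name}] {' '.join(step.argv)}{redirect}")
```

Every intermediate file name appears in two places (as one step's output and the next step's input), which is why editing such scripts by hand for each change is error prone.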

Globus Genomics

[Figure: Globus Genomics overview. Data sites (Sequencing Centers, Public Data, Storage, Local Cluster/Cloud, Seq Center, Research Lab) connected through Globus to Galaxy Data Libraries; analysis pipeline labels: Fastq, Ref Genome, Alignment, Picard, GATK, Variant Calling]

Data Management

•  Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints

Data Analysis

Galaxy-based workflow management system

•  Globus integrated within Galaxy
•  Web-based UI
•  Drag-and-drop workflow creation
•  Easily modify workflows with new tools

Globus Genomics on Amazon EC2

•  Analytical tools are automatically run on the scalable compute resources when possible

Globus Genomics

•  Workflows can be easily defined and automated with integrated Galaxy Platform capabilities

•  Data movement is streamlined with integrated Globus file-transfer functionality

•  Resources can be provisioned on-demand with Amazon Web Services cloud-based infrastructure
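
One way to picture what defining a workflow buys: once each step declares which outputs it consumes, a valid run order can be derived instead of hand-coded. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The step names and the `execution_order` helper are hypothetical, and this is only an illustration of the idea, not Galaxy's actual scheduler.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps (or inputs) whose outputs it consumes.
# Names are hypothetical and loosely follow the pipeline shown in the slides.
workflow = {
    "alignment": {"fastq", "ref_genome"},
    "picard_processing": {"alignment"},
    "variant_calling": {"picard_processing", "ref_genome"},
}


def execution_order(steps):
    """Return one valid order in which every step runs after its inputs."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(workflow)
print(order)
```

Adding a new tool then means adding one entry with its dependencies; the order of everything else is recomputed rather than edited by hand.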

Additional Capabilities

•  Professionally managed and supported platform
•  Best practice pipelines
   –  Whole Genome, Exome, RNA-Seq, ChIP-Seq, …
•  Enhanced workbench with breadth of analytic tools
•  Technical support and bioinformatics consulting
•  Access to pre-integrated end-points for reliable and high-performance data transfer (e.g. Broad Institute, Perkin Elmer, university sequencing centers)

Demo

1.  Access to GSI - Globus Genomics instance (Galaxy)
    –  Register with Globus.org
    –  Join the “GSI-BU – Globus Genomics” group
    –  Go to http://bu.globusgenomics.org

2.  Globus Transfer
    –  Set up a Globus Connect endpoint
    –  Transfer data into Globus Genomics (Galaxy)

3.  Best-practices workflows (Exome-seq, RNA-seq, ChIP-seq)
    –  Reuse or modify the best-practices pipelines
    –  Create a new workflow

4.  Run a workflow

Take-away points

1.  “Globus Transfer”: set up endpoints
    –  www.globus.org/globus-connect

2.  Join “Globus Group”
    –  https://www.globus.org/groups
    –  “GSI-BU – Globus Genomics”

3.  Access “Globus Genomics”
    –  http://bu.globusgenomics.org

4.  Best-practices pipelines
    –  Use sample data at endpoint: “sulakhe#SequencingCenter”

Questions?