Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy
Next-Gen Sequencing Analysis by GigaGalaxy
Tin-Lap Lee
School of Biomedical Sciences
CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong
CUHK-BGI Innovation Institute of Trans-omics (CBIIT)
• Jointly established between The Chinese University of Hong Kong (CUHK) and BGI in July 2011.
• “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computational biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”
Galaxy
http://galaxyproject.org/
www.gigasciencejournal.com
Journal, data-platform and database for large-scale data
Editor-in-Chief: Laurie Goodman
Executive Editor: Scott Edmunds
Commissioning Editor: Nicole Nogoy
Lead Curator: Chris Hunter
Data Platform: Peter Li
in conjunction with
GigaDB
GigaGalaxy: Collaboration between GigaScience and CBIIT
A publicly accessible Galaxy server
Share some of the workload of the main Galaxy server
Host data and workflows published in GigaScience, particularly involving NGS data analysis
SOAP package: advantages from GigaGalaxy
Application Instance: SOAPdenovo2 tool
http://www.cuhk.edu.hk/cbiit/galaxy.html
Galaxy/CUHK-BGI
Import data from GigaDB to GigaGalaxy
GigaSolution: deconstructing the paper
www.gigadb.org
www.gigasciencejournal.com
galaxy.cbiit.cuhk.edu.hk
Combines and integrates:
Open-access journal
Data Publishing Platform
Data Analysis Platform
Data (doi:10.5524/100038) + Methods (doi:10.5524/100044) = Analysis (doi:10.1186/2047-217X-1-18)
Data: Wang J et al. (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. http://dx.doi.org/10.5524/100038
Methods: Luo R et al. (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. http://dx.doi.org/10.5524/100044
Analysis: Luo R et al. (2012): SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1:18 (28th December 2012). http://dx.doi.org/10.1186/2047-217X-1-18
Example
CBIIT GigaGalaxy Structure
• Tool development
• Publishing
• Biomedical and bioinformatics research
What is SOAP?
• SOAP: a tool package from BGI that provides a full solution for NGS data analysis.
http://soap.genomics.org.cn/
SOAPdenovo2 tool
An assembly tool for short reads generated by NGS technology
Four modules:
• Pregraph: constructs the de Bruijn graph
• Contig: identifies contigs from overlapping sequence reads
• Map: maps reads onto contigs
• Scaff: generates final assembly results
Generates 1. Contig and 2. Scaffold files
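To illustrate what the Pregraph step builds, here is a minimal sketch of de Bruijn graph construction from short reads. It is not SOAPdenovo2's implementation: the function name build_de_bruijn, the toy reads and the choice k=3 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a de Bruijn graph as {(k-1)-mer: [successor (k-1)-mers]}.

    Every k-mer in a read contributes one edge from its (k-1)-length
    prefix to its (k-1)-length suffix. Repeated edges are kept, so the
    list length reflects how often an edge was observed.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy overlapping reads (hypothetical data, not from the S. aureus set).
reads = ["ATGGC", "TGGCA", "GGCAT"]
graph = build_de_bruijn(reads, k=3)

for node in sorted(graph):
    print(node, "->", sorted(set(graph[node])))
```

Walking unbranched paths through such a graph is the general idea behind turning overlapping reads into contigs; a real assembler also handles reverse complements, sequencing errors and memory limits, which this sketch ignores.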
SOAPdenovo2 in GigaGalaxy
Integrate BGI SOAP tools into GigaGalaxy
Assembly Supporting Tools
• SOAPfilter: removes reads with artifacts
• KmerFreq HA: a k-mer frequency counter
• Corrector HA: corrects sequencing errors in short reads
• GapCloser: closes gaps in scaffolds
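As a rough picture of what a k-mer frequency counter produces, the sketch below counts k-mers across reads and flags rare ones. The function names, the toy reads and the min_count threshold are assumptions for illustration. Treating low-frequency k-mers as likely sequencing errors is a general principle, not a description of how KmerFreq HA or Corrector HA actually work.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (possible errors)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


# Toy reads (hypothetical): the third read differs by one base.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
suspect = low_frequency_kmers(counts, min_count=2)

print(counts.most_common())
print(sorted(suspect))
```

In this toy input the two k-mers that span the differing base occur only once, so they are the ones flagged.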
Put them together
Sequencing data → SOAPfilter → KmerFreq HA → Corrector HA → SOAPdenovo2 → GAGE evaluation
SOAPdenovo2 Workflow
S. aureus Dataset
GAGE
Visualization Tool: CONTIGuator2
CONTIGuator2 output
Visualization
NC_010079.pdf
gi_161510924_ref_NC_010063.1_.pdf
Help Center: Shared Data
• Several datasets are available from the shared data menu for test-running the tools.
• Data Libraries
• Published Workflows
• Published Pages
What is in the shared data menu?
SOAPdenovo2 tutorial
How is GigaScience supporting data reproducibility?
Data sets: linked to DOI
Analyses: linked to DOI
Open-Paper
• DOI:10.1186/2047-217X-1-18
• ~10000 accesses
Open-Review
• 8 reviewers tested data on the FTP server; named reports published
Open-Code
• DOI:10.5524/100044
• Code on SourceForge under GPLv3: http://soapdenovo2.sourceforge.net/
• ~5000 downloads
• Enabled the code to be picked apart by bloggers in a wiki: http://homolog.us/wiki/index.php?title=SOAPdenovo2
Open-Data
• DOI:10.5524/100038
• 78 GB of CC0 data
Open-Pipelines / Open-Workflows
• SOAPdenovo2 workflows implemented in galaxy.cbiit.cuhk.edu.hk
Implemented the entire workflow on the GigaGalaxy server, including:
• 3 pre-processing steps
• 4 SOAPdenovo modules
• 1 post-processing step
• Evaluation and visualization tools
Will be available to >25K Galaxy users in the Galaxy Tool Shed
Acknowledgements
• CUHK
  • Huayuan Gao
• BGI-HK and GigaScience
  • Peter Li
  • Scott Edmunds
• Galaxy team members