Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
![Page 1: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/1.jpg)
GDSAP- A Galaxy-based platform for large-scale genomics analysis
Tin-Lap LEE
School of Biomedical Sciences,
CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong,
Hong Kong SAR, China.
![Page 2: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/2.jpg)
CBIIT
• Jointly established between The Chinese University of Hong Kong (CUHK) and BGI.
• “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computation biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”
![Page 3: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/3.jpg)
Genomic Data Submission and Analytical Platform (GDSAP)
Objectives:
• Provides enhanced functionality in addition to the original Galaxy functions:
– Customized public instances.
– Seamless integration with the SBS-UCSC genome database mirror and the MyExperiment workflow environment.
– Exchange and publication of data through the GigaScience journal portal.
Outcomes:
• Simplifies complicated bioinformatics tasks, accelerates data processing, and allows flexible analysis.
• Significantly reduces software and hardware costs and encourages research collaboration.
![Page 4: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/4.jpg)
GDSAP Structure
• Tool Development
• Publishing
• Biomedical and bioinformatics research
![Page 5: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/5.jpg)
http://www.cuhk.edu.hk/cbiit/galaxy.html
Galaxy/CUHK-BGI
![Page 6: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/6.jpg)
GDSAP Structure
• Tool Development
• Publishing
• Biomedical and bioinformatics research
![Page 7: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/7.jpg)
What is SOAP?
• SOAP is a tool package from BGI that provides a full solution for NGS data analysis.
![Page 8: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/8.jpg)
Why SOAP?
• Galaxy has been using SAMtools for consensus sequence calling, but the recent upgrade has left this part out, which is limiting for some biologists.
• SOAPsnp is the only other method besides SAMtools that can call full consensus sequences.
• The main Galaxy site supports none of the SOAP tools, including SOAPsnp.
![Page 9: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/9.jpg)
Galaxy Tool Shed
• Enables sharing of Galaxy tools across Galaxy servers around the world.
• SOAP package tools configured for use in Galaxy.
– SOAPsnp/SOAPdenovo
![Page 10: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/10.jpg)
Implement: SOAPsnp
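The slide itself only shows a screenshot, so as a rough illustration of what wrapping a command-line tool for Galaxy can involve, here is a minimal sketch that builds a Galaxy tool definition in Python. The tool id, parameter names, labels, data formats, and the exact SOAPsnp command line (`-i`, `-d`, `-o`) are assumptions made for this example; this is not the wrapper deployed on GDSAP.

```python
import xml.etree.ElementTree as ET


def build_soapsnp_tool_xml(tool_id="soapsnp_sketch", version="0.1.0"):
    """Return a minimal Galaxy tool definition as an XML string.

    Everything specific here (tool id, parameter names, formats,
    command line) is a hypothetical placeholder for illustration.
    """
    tool = ET.Element("tool", id=tool_id, name="SOAPsnp (sketch)", version=version)
    ET.SubElement(tool, "description").text = "consensus calling (illustrative)"

    # Assumed invocation: -i sorted alignment, -d reference FASTA, -o output file.
    command = ET.SubElement(tool, "command")
    command.text = "soapsnp -i '$alignment' -d '$reference' -o '$consensus'"

    # Each <param> becomes a field on the tool form in the Galaxy interface.
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="alignment", type="data",
                  format="txt", label="Sorted SOAP alignment")
    ET.SubElement(inputs, "param", name="reference", type="data",
                  format="fasta", label="Reference sequence")

    # Each <data> element becomes a new dataset in the user's history.
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="consensus", format="tabular")

    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_soapsnp_tool_xml())
```

A real wrapper would also declare requirements, help text, and tests before being published to a Tool Shed.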
![Page 11: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/11.jpg)
Implement: SOAPdenovo configuration file
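For readers unfamiliar with the file this slide refers to, the sketch below writes a small SOAPdenovo-style configuration for paired-end libraries. The helper name `soapdenovo_config`, the file paths, and the default values are assumptions chosen for illustration. The keys follow the commonly documented SOAPdenovo layout (`max_rd_len`, then one `[LIB]` section per library), but the authoritative list of options is the SOAPdenovo manual.

```python
def soapdenovo_config(libraries, max_rd_len=100):
    """Build SOAPdenovo configuration text for paired-end FASTQ libraries.

    `libraries` is a list of dicts with keys 'avg_ins', 'q1' and 'q2'.
    Optional keys 'reverse_seq' and 'asm_flags' override the defaults.
    All defaults below are illustrative, not recommendations.
    """
    lines = [f"max_rd_len={max_rd_len}"]
    for rank, lib in enumerate(libraries, start=1):
        lines += [
            "[LIB]",
            f"avg_ins={lib['avg_ins']}",                  # average insert size
            f"reverse_seq={lib.get('reverse_seq', 0)}",   # 0: forward-reverse pairs
            f"asm_flags={lib.get('asm_flags', 3)}",       # 3: contigs and scaffolds
            f"rank={rank}",                               # order used in scaffolding
            f"q1={lib['q1']}",                            # first-read FASTQ
            f"q2={lib['q2']}",                            # second-read FASTQ
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [{"avg_ins": 200, "q1": "reads_1.fq", "q2": "reads_2.fq"}]
    print(soapdenovo_config(example), end="")
```

In a Galaxy wrapper, a file like this would typically be generated from the form fields rather than written by hand.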
![Page 12: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/12.jpg)
Implement: SOAPdenovo
![Page 13: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/13.jpg)
GDSAP structure
• Bioinformatics Development
• Publishing
• Biomedical and bioinformatics research
![Page 14: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/14.jpg)
How does it work?
• MyExperiment works as a repository for workflows.
• Taverna workflows.
• New: Galaxy workflows.
• GDSAP integration
![Page 15: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/15.jpg)
Taverna workflow
![Page 16: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/16.jpg)
![Page 17: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/17.jpg)
Galaxy workflow
![Page 18: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/18.jpg)
Import (1)
![Page 19: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/19.jpg)
Import (2)
![Page 20: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/20.jpg)
Export (1)
![Page 21: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/21.jpg)
Export (2)
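The import and export slides are screenshots. To give a sense of what is actually exchanged, the sketch below reads an exported Galaxy workflow, a JSON document usually saved with a `.ga` extension, and lists its steps. The function name and the sample workflow (including the tool id `soapsnp_sketch`) are hypothetical. The fields it reads (`a_galaxy_workflow`, `name`, `steps`, `tool_id`, `type`) are those commonly found in such exports, which is an assumption about the files used here.

```python
import json


def summarize_galaxy_workflow(ga_text):
    """Return the name and ordered steps of an exported Galaxy workflow.

    Each step is reported as (step id, tool id); input steps have no
    tool id, so their step type is reported instead.
    """
    wf = json.loads(ga_text)
    if wf.get("a_galaxy_workflow") != "true":
        raise ValueError("not a Galaxy workflow export")
    steps = sorted(wf.get("steps", {}).values(), key=lambda s: s["id"])
    return {
        "name": wf.get("name"),
        "steps": [(s["id"], s.get("tool_id") or s.get("type")) for s in steps],
    }


if __name__ == "__main__":
    # Hypothetical two-step workflow: one input dataset feeding one tool.
    sample = json.dumps({
        "a_galaxy_workflow": "true",
        "name": "demo",
        "steps": {
            "0": {"id": 0, "type": "data_input", "tool_id": None},
            "1": {"id": 1, "type": "tool", "tool_id": "soapsnp_sketch"},
        },
    })
    print(summarize_galaxy_workflow(sample))
```

A check like this can help confirm that a workflow downloaded from a repository refers only to tools installed on the target Galaxy instance before it is imported.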
![Page 22: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/22.jpg)
GDSAP structure
• Bioinformatics Development
• Publishing
• Biomedical and bioinformatics research
![Page 23: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/23.jpg)
www.gigasciencejournal.com
Large-Scale Data Journal/Database
Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
In conjunction with:
Now taking submissions…
![Page 24: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/24.jpg)
GigaScience is go…
![Page 25: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/25.jpg)
www.gigaDB.org
Data Publishing
![Page 26: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/26.jpg)
37 Datasets with DOI®s
• Plants
– Chinese cabbage
– Cucumber
– Foxtail millet
– Pigeonpea
– Potato
– Sorghum
• Microbes
– E. coli O104:H4 TY-2482
• Cell-Line
– Chinese Hamster Ovary
– Mouse Methylomes
• Human
– Asian individual (YH) v1+v2: DNA Methylome, Genome Assembly, Transcriptome
– Cancer (14TB)
– Hep B infected exomes
– Single Cell Bladder Cancer
– Ancient DNA: Saqqaq Eskimo, Aboriginal Australian
• Vertebrates
– Giant panda
– Macaque: Chinese rhesus, Crab-eating
– Mini-Pig
– Naked mole rat
– Penguin: Emperor penguin, Adelie penguin
– Pigeon, domestic
– Polar bear
– Sheep
– Tibetan antelope
• Invertebrate
– Ant: Florida carpenter ant, Jerdon’s jumping ant, Leaf-cutter ant
– Roundworm
– Schistosoma
– Silkworm
Legend: Released pre-publication; Non-BGI; Paper in GigaScience
Coming soon…
– Microbiome data
– Parrot
![Page 27: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/27.jpg)
Genomic Data Submission and Analytical platform
GDSAP:
GigaDB v2 export to GDSAP
![Page 28: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/28.jpg)
Data Modeling
Pipeline design
Validation
Applications
Genomic Data Submission and Analytical platform
Big data from the
“Sequencing Coal Face”
GDSAP:
Data, Data, Data…
Tin-Lap Lee, CUHK
![Page 29: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/29.jpg)
Acknowledgements
• Lee Lab (CUHK)
– Huayan Gao
• GigaScience
– Scott Edmunds
– Peter Li
– Tam Sneddon
• BGI-Hong Kong
– Dennis Chan
– Edmond Leung
• Galaxy team
– Nate Coraor
• myExperiment
– Finn Bacall
– Dave De Roure
• NBIC
– Kostas Karasavvas
![Page 30: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis](https://reader035.fdocuments.us/reader035/viewer/2022070304/54c8ae3c4a79594b1c8b45ca/html5/thumbnails/30.jpg)
Thank you