CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services
Semiconductor DNA Sequencing
Ion Proton; Ion Torrent
“Sequencing on a Chip”
Semiconductor Sequencing in a Nutshell
“It’s a computational pH meter”
Metagenomics
• Environmental samples of communities of organisms
• Water and soil samples
• Human and animal microbiomes
• Mine tailings, oil spills
• Deep sea, polar ice
• etc.
Metagenomics Pipeline
[Pipeline diagram labels: Ion Torrent/Proton sequencers; NCBI nucleotide databases; CSU Cray supercomputer; Oak Ridge Titan supercomputer; MEGAN]
Metagenomics Tools
Ion Proton Sequencer
• In: Sample DNA
• Out: 50M DNA fragments
NCBI nucleotide database
• DNA fragments
• 15M+ records
Do the math:
• 50M × 15M = 7.5 × 10^14 (on the order of 10^14) query-to-record comparisons
mpiBLAST
• Highly parallelized BLAST algorithm
• NGS sample DNA
• Query NCBI DB
CSU Cray XT6m
• 2,016 CPU cores
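The "do the math" step can be checked with a few lines of Python. This is a minimal sketch using only the figures on the slide (50M fragments, 15M records, 2,016 cores); the function names are illustrative, and the product is a naive upper bound, since BLAST uses seed-based heuristics rather than an exhaustive comparison of every pair.

```python
# Figures from the slide; function names are illustrative only.
FRAGMENTS = 50_000_000    # DNA fragments from one Ion Proton run
DB_RECORDS = 15_000_000   # records in the NCBI nucleotide database (15M+)
CORES = 2_016             # CPU cores on the CSU Cray XT6m


def total_comparisons(fragments: int, records: int) -> int:
    """Naive count of fragment-to-record comparisons (an upper bound)."""
    return fragments * records


def comparisons_per_core(total: int, cores: int) -> float:
    """Share of the naive workload per core, assuming a perfectly even split."""
    return total / cores


total = total_comparisons(FRAGMENTS, DB_RECORDS)
per_core = comparisons_per_core(total, CORES)

print(f"total comparisons: {total:.2e}")
print(f"per core on {CORES} cores: {per_core:.2e}")
```

Even split perfectly across every core of the XT6m, each core would face several hundred billion comparisons, which is why the pipeline relies on a parallel BLAST.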
Metagenomics
• Dr. Toni Piaggio, National Wildlife Research Center, Fort Collins
• Florida Everglades water samples (4)
• “What species are in the water?”
• CSU NextGen Sequencing Core: Ion Proton; 2 weeks
• CSU Cray: 1,000 cores, 24 hours, 4 runs; 1 week
• Results
Metagenomics
• Rarefaction curves
• Estimate species richness
• Asymptotic?
• Find rare species
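A rarefaction curve plots the expected number of species seen against the number of reads sampled; if it flattens out (goes asymptotic), further sequencing is unlikely to reveal many new species. The sketch below uses the standard analytic rarefaction formula, E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)). The slides do not say how the curves were actually computed, and the per-species read counts here are made-up toy values.

```python
from math import comb


def expected_richness(counts: list[int], n: int) -> float:
    """Expected number of species in a random subsample of n reads.

    counts[i] is the number of reads assigned to species i.
    """
    total_reads = sum(counts)
    if not 0 <= n <= total_reads:
        raise ValueError("n must be between 0 and the total read count")
    all_subsamples = comb(total_reads, n)
    # For each species: 1 minus the probability that none of its reads is drawn.
    return sum(1 - comb(total_reads - c, n) / all_subsamples for c in counts)


# Hypothetical toy community: one common, one moderate, one rare species.
toy_counts = [6, 3, 1]
curve = [expected_richness(toy_counts, n) for n in range(sum(toy_counts) + 1)]

for n, richness in enumerate(curve):
    print(n, round(richness, 3))
```

The rare species is the last to show up reliably, which is why the curve keeps creeping upward at large subsample sizes.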
Computational Resources
Oak Ridge Titan Cray XK7 Supercomputer
• 300K CPU cores; 50M GPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 100% of sample DNA
CSU Cray XT6m Supercomputer
• 2,016 CPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 1% of sample DNA
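Querying only 1% of the sample on the smaller machine implies some way of choosing which reads to submit. The slides do not say how that subset was selected; the sketch below assumes a uniform random subsample, and the function name, seed, and read identifiers are hypothetical.

```python
import random


def subsample_reads(reads: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Return a uniform random subsample of reads (assumed selection method)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one read when any exist, never more than are available.
    k = min(len(reads), max(1, round(len(reads) * fraction)))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(reads, k)


# Hypothetical read identifiers standing in for sequencer output.
reads = [f"read_{i}" for i in range(10_000)]
subset = subsample_reads(reads, 0.01)
print(len(subset), "of", len(reads), "reads selected")
```

A fixed seed keeps the subset reproducible, so a later full run on a larger machine can be compared against the same 1%.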
Strong scaling
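Strong scaling asks how run time falls as cores are added while the problem size stays fixed. A minimal sketch of the usual speedup and parallel-efficiency calculation follows; the timings are invented for illustration and are not measurements from the CSU Cray or Titan.

```python
def strong_scaling(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Speedup and efficiency relative to the smallest core count.

    timings maps core count -> wall-clock time for the same fixed workload.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    results = {}
    for cores, elapsed in sorted(timings.items()):
        speedup = base_time / elapsed              # how much faster than baseline
        efficiency = speedup / (cores / base_cores)  # 1.0 means ideal scaling
        results[cores] = (speedup, efficiency)
    return results


# Hypothetical wall-clock hours for one fixed mpiBLAST-style workload.
example_timings = {250: 96.0, 500: 50.0, 1000: 27.0}
scaling = strong_scaling(example_timings)

for cores, (speedup, efficiency) in scaling.items():
    print(f"{cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Efficiency close to 1.0 at every core count would indicate near-ideal strong scaling; a steady decline points to communication or I/O overhead.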
Summary
Big Data Issues
• Semiconductor sequencer data
• Large-scale database queries
• High-performance computing