CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Transcript of CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Page 1: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

CSU IDRC Next Generation Sequencing Core
Genomic Sequencing Services

Page 2: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Semiconductor DNA Sequencing

Ion Proton; Ion Torrent

“Sequencing on a Chip”

Page 3: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Semiconductor Sequencing in a Nutshell

“It’s a computational pH meter”
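The "pH meter" idea can be illustrated with a toy base caller. This is only a sketch under simplifying assumptions: nucleotides are flowed over the chip one at a time in a repeating order, and the measured signal for each flow is taken to be roughly proportional to the number of bases incorporated. The flow order `"TACG"`, the function name `call_bases`, and the signal values are illustrative assumptions, not taken from the slides; real base calling also has to correct for noise and phasing.

```python
def call_bases(flow_signals, flow_order="TACG"):
    """Turn per-flow signal values into a base sequence (toy model).

    flow_signals: one normalized signal value per nucleotide flow.
    flow_order:   assumed repeating order in which nucleotides are flowed.
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Assumption: signal ~ number of identical bases incorporated in
        # this flow, so rounding gives the homopolymer length.
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical signals for six consecutive flows: T, A, C, G, T, A
example_signals = [1.03, 0.02, 2.10, 0.97, 0.05, 1.08]
print(call_bases(example_signals))  # prints "TCCGA"
```

A flow with a signal near zero means the flowed nucleotide did not match the next template base, so nothing is appended for that flow.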

Page 4: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Metagenomics

• Environmental samples of communities of organisms
• water, soil samples
• human & animal microbiomes
• mine tailings, oil spills
• deep sea, polar ice
• etc.

Page 5: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Metagenomics Pipeline

CSU Cray supercomputer; Oak Ridge Titan supercomputer

Torrent/Proton sequencers; MEGAN

NCBI nucleotide databases

Page 6: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Metagenomics Tools

Ion Proton Sequencer
• In: Sample DNA
• Out: 50M DNA fragments

NCBI nucleotide database
• DNA fragments
• 15M+ records

Do the math:
• 50M * 15M = 7.5 × 10^14 queries (more than 10^14)

mpiBLAST
• Highly parallelized BLAST algorithm
• NGS sample DNA
• Query NCBI DB

CSU Cray XT6m
• 2,016 CPU cores
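The "do the math" step on this slide can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it counts every fragment against every database record, which is a naive upper bound rather than a description of how BLAST actually searches. The function names are illustrative; the input figures (50M fragments, 15M records, 2,016 cores) come from the slide.

```python
def pairwise_comparisons(num_reads, num_records):
    """Naive all-against-all count of read/record pairs."""
    return num_reads * num_records


def pairs_per_core(num_reads, num_records, num_cores):
    """Share of the naive pair count handled by each core,
    assuming a perfectly even split (an idealization)."""
    return pairwise_comparisons(num_reads, num_records) / num_cores


reads = 50_000_000      # Ion Proton output, from the slide
records = 15_000_000    # NCBI nucleotide records, from the slide
cores = 2016            # CSU Cray XT6m, from the slide

print(f"{pairwise_comparisons(reads, records):.2e}")  # prints 7.50e+14
print(f"{pairs_per_core(reads, records, cores):.2e} pairs per core")
```

Even with an ideal split, each core is left with several hundred billion read/record pairs, which is why the workload is handed to a parallel tool such as mpiBLAST.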

Page 7: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Metagenomics

• Dr. Toni Piaggio, National Wildlife Research Center, Fort Collins
• Florida Everglades water samples (4)
• “What species are in the water?”

• CSU NextGen Sequencing Core: Ion Proton; 2 weeks
• CSU Cray: 1,000 cores, 24 hours, 4 runs; 1 week
• Results

Page 8: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Metagenomics

• Rarefaction curves
• Estimate species richness
• Asymptotic?
• Find rare species
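A rarefaction curve can be sketched by repeatedly subsampling the per-read species assignments at increasing depths and counting how many distinct species appear. The code below is a minimal illustration, not the analysis used in the talk: the function name `rarefaction_curve`, the trial count, and the four-species community are all hypothetical.

```python
import random


def rarefaction_curve(assignments, depths, trials=20, seed=0):
    """Mean number of distinct species seen at each subsampling depth.

    assignments: list with one species label per read.
    depths:      subsample sizes, each no larger than len(assignments).
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        total = 0
        for _ in range(trials):
            # Draw reads without replacement and count distinct species.
            subsample = rng.sample(assignments, depth)
            total += len(set(subsample))
        curve.append((depth, total / trials))
    return curve


# Hypothetical community: two common species, one uncommon, one rare.
community = (["species_A"] * 700 + ["species_B"] * 250
             + ["species_C"] * 45 + ["species_D"] * 5)

for depth, richness in rarefaction_curve(community, [10, 100, 500, 1000]):
    print(depth, round(richness, 2))
```

If the curve flattens as depth grows, further sequencing is unlikely to reveal many new species; a curve that is still climbing suggests rare species remain undetected.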

Page 9: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Computational Resources

Oak Ridge Titan Cray XK7 Supercomputer
• 300K CPU cores; 50M GPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 100% of sample DNA

CSU Cray XT6m Supercomputer
• 2,016 CPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 1% of sample DNA

Strong scaling
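Strong scaling asks how run time changes when more cores are applied to a fixed problem size. The sketch below computes speedup and parallel efficiency from two timings, plus the Amdahl's-law bound for a given parallel fraction. The function names and all numbers are hypothetical illustrations; the slides do not report measured timings.

```python
def strong_scaling(time_one_core, time_n_cores, n_cores):
    """Speedup and parallel efficiency for a fixed problem size."""
    speedup = time_one_core / time_n_cores
    efficiency = speedup / n_cores
    return speedup, efficiency


def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical timings: 1,000 hours on one core, 2 hours on 1,000 cores.
speedup, efficiency = strong_scaling(1000.0, 2.0, 1000)
print(speedup, efficiency)  # prints 500.0 0.5

# Assumed 99.9% parallel workload on 2,016 and 300,000 cores.
for cores in (2016, 300_000):
    print(cores, round(amdahl_speedup(0.999, cores), 1))
```

Under these assumptions, even a small serial fraction caps the benefit of very large core counts, which is the main thing a strong-scaling study checks before moving a workload to a larger machine.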

Page 10: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Summary

Big Data Issues

• Semiconductor sequencer data

• Large-scale database queries

• High-performance computing