Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing...
Aaron Liston, Oregon State University
Botany 2012 Intro to Next Generation Sequencing Workshop
Growth in Next-Gen Sequencing Capacity
[Figure: Output (bp), axis from 0.0E+00 to 3.5E+11, by platform over 2002-2010: ABI 3730xl, 454 GS20, Solexa 1G, ABI SOLiD, Illumina GAII, Illumina GAIIx, Illumina HiSeq. Adapted from Mardis, 2011, Nature.]
Slide courtesy of Rich Cronn
NGS Library Types
Fragment Library
Paired-end Library: separation of 200-500 bp
Mate-paired Library: original separation of 2-5 kb
http://www.appliedbiosystems.com
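The relationship between a fragment and its paired-end reads can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor's pipeline: the function names, the example fragment, and the read length are all hypothetical. It models only the paired-end case, where read 1 comes from one end of the fragment and read 2 is the reverse complement of the other end.

```python
from typing import Tuple

# Complement lookup for the four bases plus the ambiguity code N.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_end_reads(fragment: str, read_len: int) -> Tuple[str, str]:
    """Simulate error-free paired-end reads from one fragment.

    Read 1 is the first read_len bases of the fragment; read 2 is the
    reverse complement of the last read_len bases.
    """
    if read_len > len(fragment):
        raise ValueError("read length exceeds fragment length")
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


# Hypothetical 25 bp fragment and 5 bp reads, chosen only to keep the
# example readable; real fragments are 200-500 bp.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGC"
read1, read2 = paired_end_reads(fragment, 5)
print(read1, read2)
```

The known separation between the two reads is what makes paired data useful for assembly; mate-paired libraries extend the same idea to 2-5 kb separations but are prepared differently and are not modeled here.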
454 (2005)
Template Type: Clonally amplified by emulsion PCR
Sequencing Method: Sequencing by synthesis using single nucleotide addition
Imaging Method: Bioluminescence with charge coupled device (CCD) camera
Ansorge, W. 2009. New Biotechnology 25:195-203.
http://www.rps.psu.edu/indepth/graphics/sequencing_small.jpg
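Single nucleotide addition can be illustrated with an idealized flowgram: one nucleotide is flowed at a time, and the signal is proportional to how many bases are incorporated in that flow. The sketch below is a simplification under stated assumptions: the flow order "TACG", the function names, and the noise-free signals are illustrative choices, not details taken from the slides.

```python
def ideal_flowgram(read_seq, flow_order="TACG", n_cycles=1):
    """Return (nucleotide, signal) pairs for an idealized run.

    read_seq is the strand being synthesized. Each flow incorporates
    the whole run of matching bases at the current position, so the
    signal equals the homopolymer length (0 if nothing matches).
    """
    signals = []
    pos = 0
    for _ in range(n_cycles):
        for nucleotide in flow_order:
            run_length = 0
            while pos < len(read_seq) and read_seq[pos] == nucleotide:
                run_length += 1
                pos += 1
            signals.append((nucleotide, run_length))
    return signals


def decode_flowgram(signals):
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(nucleotide * count for nucleotide, count in signals)


flows = ideal_flowgram("TTACGGG")
print(flows)
print(decode_flowgram(flows))
```

In a real instrument the signal for a run of three G's is not exactly three times the signal for one, so the integer counts above would have to be estimated from noisy intensities. That is one plausible reason homopolymer runs show up as insertions or deletions in this style of sequencing.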
454
Instrument | Run time | Millions of Reads/run | Bases/read | Yield (MB/run)
454 GS Jr. Titanium | 10 hrs | 0.1 | 400 | 50
Ion Torrent – 314 chip | 2.5 hrs | 0.25 | 200 | 50
454 FLX Titanium | 10 hrs | 1 | 400 | 400
454 FLX+ | 20 hrs | 1 | 650 | 650
2012 NGS Field Guide. www.molecularecologist.com
Illumina (Solexa) 2007
Template Type: Clonally amplified by solid phase amplification
Sequencing Method: Sequencing by synthesis with cyclic reversible termination
Imaging Method: Four color imaging of single events using fluorescence
http://www.illumina.com/systems/hiseq_2000.ilmn
Clonally Amplified Templates
Solid-phase Amplification
Metzker, M. 2010. Nature Reviews Genetics 11:31-46.
Cyclic reversible termination (CRT)
Metzker, M. 2010. Nature Reviews Genetics 11:31-46.
Illumina
Instrument | Run time | Millions of Reads/run | Bases/read | Yield (MB/run)
Illumina MiSeq | 26 hrs | 4 | 150+150 | 1200
Illumina GAIIx | 14 days | 300 | 150+150 | 96,000
Illumina HiSeq 1000 | 8.5 days | ≤1500 | 100+100 | ≤300,000
Illumina HiSeq 2000 | 11.5 days | ≤3000 | 100+100 | ≤600,000
2012 NGS Field Guide. www.molecularecologist.com
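The yield column in these tables is, to a first approximation, the number of reads multiplied by the bases per read. The sketch below checks that arithmetic against a few listed rows; the function names are hypothetical, and reading "150+150" as two paired reads whose lengths are summed is an assumption about the table notation.

```python
def bases_per_read(read_spec):
    """Total bases per read (or read pair), e.g. '400' or '150+150'."""
    return sum(int(part) for part in read_spec.split("+"))


def yield_mb(millions_of_reads, read_spec):
    """Approximate yield in megabases: millions of reads x bases per read."""
    return millions_of_reads * bases_per_read(read_spec)


# (instrument, millions of reads/run, bases/read, listed yield in MB/run)
rows = [
    ("454 FLX Titanium", 1, "400", 400),
    ("454 FLX+", 1, "650", 650),
    ("Illumina MiSeq", 4, "150+150", 1200),
    ("Illumina HiSeq 2000", 3000, "100+100", 600000),
]

for name, reads, spec, listed in rows:
    print(f"{name}: computed {yield_mb(reads, spec)} MB, listed {listed} MB")
```

Not every row reproduces exactly: for the 454 GS Jr. the product is 40 MB against a listed 50, and for the GAIIx it is 90,000 against a listed 96,000, so the listed yields presumably reflect rounding or factors beyond this simple product.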
SOLiD (2008)
Template Type: Clonally amplified by emulsion PCR
Sequencing Method: Sequencing by ligation
Imaging Method: Four color imaging of single events by CCD camera
SOLiD
Instrument | Run time | Millions of Reads/run | Bases/read | Yield (MB/run)
SOLiD – 5500xl | 8 days | >1,410 | 75+35 | 155,100
2012 NGS Field Guide. www.molecularecologist.com
Ion Torrent (2010)
Semiconductor Sequencing
http://www.iontorrent.com/the-simplest-sequencing-chemistry/
Ion Torrent
Instrument | Run time | Millions of Reads/run | Bases/read | Yield (MB/run)
Ion Torrent – 314 chip | 2.5 hrs | 0.25 | 200 | 50
Ion Torrent – 316 chip | 3 hrs | 1.6 | 200 | 320
Ion Torrent – 318 chip | 4.5 hrs | 4 | 200 | 800
2012 NGS Field Guide. www.molecularecologist.com
Ion Torrent Proton (2012)
Real Time Sequencing by Synthesis
Pacific Biosciences (2010)
circular consensus
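The idea behind circular consensus is that the same molecule is read several times and the passes are combined, so that errors made in one pass are outvoted by the others. The sketch below shows only that voting step, under strong simplifying assumptions: the passes are hypothetical, already aligned, of equal length, and contain substitutions only. Real single-molecule reads are dominated by insertions and deletions, so an actual consensus would need an alignment step that is not shown here.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote over equal-length, pre-aligned passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must have equal length (pre-aligned)")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same 6-base insert; three of them
# carry a single substitution error each, at different positions.
passes = ["ACGTAC", "ACCTAC", "ACGTAC", "AGGTAC", "ACGTAA"]
print(majority_consensus(passes))
```

With an odd number of passes and independent errors, a position is wrong in the consensus only if most passes are wrong there, which is why a high single-pass error rate can still give a lower final error rate.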
Instrument | Run time | Millions of Reads/run | Bases/read | Yield (MB/run)
3730xl (capillary) | 2 hrs | 0.000096 | 650 | 0.06
PacBio RS | 2 hrs | 0.01 | 860 – 1,500 | 5-10
454 GS Jr. Titanium | 10 hrs | 0.1 | 400 | 50
Ion Torrent – 314 chip | 2.5 hrs | 0.25 | 200 | 50
454 FLX Titanium | 10 hrs | 1 | 400 | 400
454 FLX+ | 20 hrs | 1 | 650 | 650
Ion Torrent – 316 chip | 3 hrs | 1.6 | 200 | 320
Illumina MiSeq | 26 hrs | 4 | 150+150 | 1200
Ion Torrent – 318 chip | 4.5 hrs | 4 | 200 | 800
Illumina GAIIx | 14 days | 300 | 150+150 | 96,000
SOLiD – 5500xl | 8 days | >1,410 | 75+35 | 155,100
Illumina HiSeq 1000 | 8.5 days | ≤1500 | 100+100 | ≤300,000
Illumina HiSeq 2000 | 11.5 days | ≤3000 | 100+100 | ≤600,000
Run time, Reads and Yield for Current NGS Instruments
2012 NGS Field Guide. www.molecularecologist.com
Oxford Nanopore (2012)
Strand Sequencing
64 triplet signals
Exonuclease Sequencing
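The "64 triplet signals" noted for strand sequencing presumably corresponds to the number of distinct three-base combinations, 4^3 = 64, with each combination producing its own signal level. A short sketch enumerates them; the interpretation that one signal maps to one triplet is an assumption based on the slide label, not a documented specification.

```python
from itertools import product

# Every ordered combination of three bases drawn from A, C, G, T.
triplets = ["".join(bases) for bases in product("ACGT", repeat=3)]

print(len(triplets))
print(triplets[:4], "...", triplets[-1])
```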
GridION
MinION
Oxford Nanopore (2012)
Platform | Primary Errors | Single-pass Error Rate (%) | Final Error Rate (%)
3730xl (capillary) | Substitution | 0.1-1 | 0.1-1
454 | Indel | 1 | 1
Illumina | Substitution | ~0.1 (85% of reads) | ~0.1 (85% of reads)
SOLiD | A-T bias | ~5 | ≤0.1
Ion Torrent | Indel | ~1 | ~1
PacBio RS | CG deletions | ~15 | ≤15
Oxford Nanopore | Deletions | ≥4 | 4
NGS Error Rates
2012 NGS Field Guide. www.molecularecologist.com
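A per-base error rate becomes easier to interpret when combined with read length. The sketch below estimates the expected number of errors in a read and the chance that a read is entirely error-free. It assumes errors are independent and uniform along the read, which is a simplification, and the function names are hypothetical. The example pairs the listed 454 rate (1%) with a 400-base read and the listed Illumina rate (~0.1%) with a 100-base read.

```python
def expected_errors(error_rate_percent, read_length):
    """Expected number of erroneous bases in one read."""
    return error_rate_percent / 100.0 * read_length


def prob_error_free(error_rate_percent, read_length):
    """Probability that every base in the read is correct,
    assuming independent errors at a constant per-base rate."""
    return (1.0 - error_rate_percent / 100.0) ** read_length


examples = [
    ("454, 400-base read", 1.0, 400),
    ("Illumina, 100-base read", 0.1, 100),
]

for label, rate, length in examples:
    print(
        f"{label}: {expected_errors(rate, length):.2f} expected errors, "
        f"{prob_error_free(rate, length):.3f} chance of an error-free read"
    )
```

Under these assumptions a long read at 1% carries several errors on average, while a short read at 0.1% is usually clean, which is part of why error type and rate matter as much as read length when choosing a platform.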
Instrument | Run time | Millions of Reads/run | Bases/read | Yield (MB/run)
ABI 3730xl (capillary) | 2 hrs | 0.000096 | 650 | 0.06
PacBio RS | 2 hrs | 0.01 | 860 – 1,500 | 5-10
454 GS Jr. Titanium | 10 hrs | 0.1 | 400 | 50
Oxford Nanopore MinION (2012) | 6 hrs or less | [0.1] | [9,000] | [1000]
Ion Torrent – 314 chip | 2.5 hrs | 0.25 | 200 | 50
454 FLX Titanium | 10 hrs | 1 | 400 | 400
454 FLX+ | 20 hrs | 1 | 650 | 650
Ion Torrent – 316 chip | 3 hrs | 1.6 | 200 | 320
Illumina MiSeq | 26 hrs | 4 | 150+150 | 1200
Ion Torrent – 318 chip | 4.5 hrs | 4 | 200 | 800
Oxford Nanopore GridION 2000 (2012) | [6 hrs or less] | [4] | [10,000] | [40,000]
Oxford Nanopore GridION 8000 (2013) | [6 hrs or less] | [10] | [10,000] | [100,000]
Illumina MiSeq upgrade (2012) | [36 hrs] | 15 | 250+250 | 7000
Ion Torrent – Proton I (2012) | 4 hrs | [50] | [200] | [40,000]
Ion Torrent – Proton II (2013) | 4 hrs | [250] | [400] | [100,000]
Illumina GAIIx | 14 days | 300 | 150+150 | 96,000
Illumina HiSeq 2500 mini-cell (2012) | 42 hrs | 600 | 150+150 | 180,000
SOLiD – 5500xl | 8 days | >1,410 | 75+35 | 155,100
Illumina HiSeq 1000 | 8.5 days | ≤1500 | 100+100 | ≤300,000
Illumina HiSeq 2000 | 11.5 days | ≤3000 | 100+100 | ≤600,000
Grey = based on company sources. Brackets = speculation.
Run time, Reads & Yield for Current and Announced NGS Instruments
2012 NGS Field Guide. www.molecularecologist.com
Instrument | Reagent Cost/run (a) | Reagent Cost/MB | Minimum Unit Cost (% run)
ABI 3730xl (capillary) | $144 | $2308 | $6 (1%)
PacBio RS | $300-1700 (c) | $7-38 | $500 (100%)
454 GS Jr. Titanium | $1100 | $22 | $1500 (100%)
454 FLX Titanium | $6,200 | $12 | $2000 (12%)
454 FLX+ (d) | $6,200 | $7 | $2000 (12%)
Ion Torrent – 314 chip | $350 | $7 | ~$750 (100%)
Ion Torrent – 316 chip | $550 | $2 | ~$1000 (100%)
Oxford Nanopore minION (2012) | ≤$900 | $1 | ~$1100 (10%)
Illumina MiSeq | $1160 | $1 | ~$1400 (100%)
Ion Torrent – 318 chip | $750 | $1 | ~$1200 (100%)
Illumina GAIIx | $17,575 | $0.19 | $3000 (14%)
Illumina iScanSQ | $12,750 | $0.09 | $3000 (14%)
Ion Torrent – Proton I (2012) | $1000 | $0.09 | ? (100%)
SOLiD – 5500xl | $10,503 | <$0.07 | $2000 (12%)
Illumina HiSeq 1000 | $10,220 | $0.04 | $3000 (12%)
Illumina HiSeq 2000 | $23,470 (d) | ≥$0.04 | $3000 (6%)
Illumina HiSeq 2500 or MiSeq upgrades (2012) | ? | ? | ?
Oxford Nanopore GridION 2000 (2012) | varies | $0.03-0.04 | ? (≤1%)
Oxford Nanopore GridION 8000 (2013) | varies | $0.01-0.02 | ? (≤1%)
Ion Torrent – Proton II (2013) | [$1000] | [$0.01] | ? (100%)
How much will it cost?
(a) Includes all stages of sample prep for a single sample (i.e., library prep through sequencing; capillary = sequencing only).
2012 NGS Field Guide. www.molecularecologist.com
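The reagent cost per MB appears to follow from dividing the reagent cost per run by the yield per run. The sketch below checks that against a few rows, pairing the costs listed here with the MB/run yields from the run time and yield tables; the function name is hypothetical.

```python
def reagent_cost_per_mb(cost_per_run, yield_mb):
    """Reagent cost per megabase: cost of one run divided by its yield."""
    return cost_per_run / yield_mb


# (instrument, reagent cost/run in $, yield in MB/run, listed cost/MB)
rows = [
    ("454 GS Jr. Titanium", 1100, 50, "$22"),
    ("Ion Torrent 314 chip", 350, 50, "$7"),
    ("Illumina MiSeq", 1160, 1200, "$1"),
    ("Illumina HiSeq 2000", 23470, 600000, ">=$0.04"),
]

for name, cost, mb, listed in rows:
    print(f"{name}: ${reagent_cost_per_mb(cost, mb):.2f}/MB (listed {listed})")
```

The same division explains why the instruments with the most expensive runs still have the lowest cost per MB: their yield grows much faster than their reagent cost. Some listed values differ slightly from this simple quotient, presumably because of rounding.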
Platform | Year | Sequencing Method | Amplification | Detection | Features
454 | 2005 | Pyrosequencing | Emulsion PCR | Light | First NGS
Illumina | 2007 | Synthesis | Bridge PCR | Light | 90% of Market
SOLiD | 2008 | Ligation | Emulsion PCR | Light | Lowest Error Rate
Ion Torrent | 2010 | Synthesis | Emulsion PCR | Hydrogen Ion | Semiconductor Chip
Pacific Biosciences | 2010 | Synthesis | None = Single Molecule | Light | Anchored Polymerases
Oxford Nanopore | 2012 | Nanopore | None = Single Molecule | Electrical Conductivity | “Run Until” Sequencing
NGS Technology Summary
Modified from Travis C. Glenn. 2011. Field guide to next-generation DNA sequencers. Molecular Ecology Resources 11: 759-769
Instrument | Purchase Cost | Additional Instruments | Service Contract
ABI 3730xl (capillary) | $376,000 | - | $19,800
454 GS Jr. Titanium | $108,000 | $16,000 | $12,600
454 FLX to FLX+ upgrade | $29,500 | - | -
454 FLX+ | $450,000 | $30,000 | $50,000
PacBio RS | $695,000 | - | $85,000
Ion Torrent – (314/316/318 chips) | $49,000 | $16,000* | $9,900*
Ion Torrent – Proton (2012) | $149,000 | $16,000* | $32,000*
SOLiD – 5500xl | $251,000 | $54,000 | $44,400
Illumina MiSeq | $125,000 | - | $12,500
Illumina MiSeq upgrade (2012) | $0 | - | -
Illumina HiScanSQ | $405,000 | $55,000 | $41,500
Illumina GAIIx | $250,000 | $100,000 | $44,500
Illumina HiSeq 1000 | $560,000 | $55,000 | $62,000
Illumina HiSeq 1000 to 2000 upgrade | $175,000 | - | -
Illumina HiSeq 2000 | $690,000 | $55,000 | $75,900
Illumina HiSeq 2000 to 2500 upgrade (2012) | $50,000 | - | -
Illumina HiSeq 2500 (2012) | $690,000 | $55,000 | $75,900
Oxford Nanopore minION (2012) | $0 | $0 | $0
Oxford Nanopore GridION 2000 (2012) | [$30,000]?? | ? | ?
Oxford Nanopore GridION 8000 (2013) | [$30,000]?? | ? | ?
Instrument purchase, additional instrument and service agreement costs.
2012 NGS Field Guide. www.molecularecologist.com
*Includes optional OneTouch template preparation instrument.
Instrument | Computational Resources | Data File Sizes (GB)
3730xl (capillary) | $2,000 desktop | 0.03
454 GS Jr. Titanium | $5,000 desktop | <3 images, <1 sff
454 FLX Titanium | $5,000 desktop | 20 images, 4 sff
454 FLX+ | $5,000 desktop | 40 images, 8 sff
PacBio RS | $65,000 cluster | 20 pulsed, 2 Fastq
Ion Torrent – 314 chip | $16,500 desktop server | 0.1 Fastq
Ion Torrent – 316 chip | $16,500 desktop server | 0.6 Fastq
Ion Torrent – 318 chip | $16,500 desktop server | [small]
Ion Torrent – Proton I (2012) | $75,000 cluster | [big]
Ion Torrent – Proton II (2013) | $75,000 cluster | [big]
SOLiD – 5500xl | $35,000 cluster | 148
Illumina MiSeq | $5,000 desktop or BaseSpace cloud | 1
Illumina HiScanSQ | $222,000 cluster (or DIY for less) | 50
Illumina GAIIx | $222,000 cluster (or DIY for less) | 600
Illumina HiSeq 1000 | $222,000 cluster (or DIY for less) | 300
Illumina HiSeq 2000 | $222,000 cluster (or DIY for less) | 600
Illumina HiSeq 2500 (2012) | $222,000 cluster (or DIY for less) | [big]
Oxford Nanopore minION (2012) | laptop | [small]
Oxford Nanopore GridION 2000 (2012) | ? | [small to big]
Oxford Nanopore GridION 8000 (2013) | ? | [small to big]
Desktops assume higher-end models with multiple processors, ≥8 GB RAM and ≥1 TB HD.
Required Computational Resources
2012 NGS Field Guide. www.molecularecologist.com
Instrument: 3730xl (capillary)
Primary Advantages: Low cost for very small studies.
Primary Disadvantages: Very high cost for large amounts of data.

Instrument: 454 GS Jr. Titanium
Primary Advantages: Long read length. Low capital cost. Low cost per experiment.
Primary Disadvantages: High cost per Mb.

Instrument: 454 FLX+
Primary Advantages: Double the maximum read length of Titanium.
Primary Disadvantages: High capital cost. High cost per Mb. Reagent issues. Upgrade issues.

Instrument: PacBio
Primary Advantages: Single molecule real-time sequencing. Longest available read length. Short instrument run time. Low cost per sample.
Primary Disadvantages: High error rates. Low total number of reads per run. High cost per Mb. High capital cost. Many methods still in development. Weak company performance.

Instrument: Ion Torrent – 314/316/318
Primary Advantages: Low cost per sample for small studies. Fast runs. Semiconductor chips. Instrument with few moving parts.
Primary Disadvantages: Higher error rate than Illumina. Higher cost per Mb. Long sample prep.

Instrument: SOLiD – 5500xl
Primary Advantages: Each lane of Flow-Chip can be run independently. High accuracy. Ability to rescue failed sequencing cycles. 96 validated barcodes per lane. High throughput.
Primary Disadvantages: Not likely to be sold very long after the Ion Torrent Proton comes to market. Relatively short reads. More gaps in assemblies than Illumina data. Less even data distribution than Illumina. High capital cost.

Instrument: Illumina MiSeq
Primary Advantages: Low cost instrument and runs. Low cost/Mb for a small platform. Fastest Illumina run times and longest Illumina read lengths.
Primary Disadvantages: Relatively few reads and higher cost/Mb compared to other Illumina platforms.

Instrument: Illumina HiSeq
Primary Advantages: One or two independent flow cells. Most reads, Gb per day and Gb per run. Lowest cost per Mb.
Primary Disadvantages: High capital cost. High computation needs.
Primary Advantages and Disadvantages – Current Platforms
2012 NGS Field Guide. www.molecularecologist.com
Instrument: Ion Torrent – Proton
Primary Advantages: Moderately low-cost instrument for high throughput applications. Cost/Mb approaching HiSeq.
Primary Disadvantages: Error rate likely higher than Illumina. Higher cost/Mb than HiSeq.

Instrument: Illumina MiSeq Upgrade
Primary Advantages: Same as MiSeq, but 3X more reads and 250X250 paired ends. Free upgrade.
Primary Disadvantages: Reagent costs not announced yet, but likely to be higher than current MiSeq.

Instrument: Illumina HiSeq 2500
Primary Advantages: Same as HiSeq 2000, but can also run two 2-lane mini flow cells to achieve much faster run times and longer read lengths.
Primary Disadvantages: Mini flow cell will have a higher cost per read than the standard flow cell. Can’t run mini and standard flow cells together.

Instrument: Oxford Nanopore minION
Primary Advantages: No instrument. USB powered. No sample processing required. Could be used in the field.
Primary Disadvantages: No data publicly available. High cost per Mb relative to other Nanopore sequencers.

Instrument: Oxford Nanopore GridION
Primary Advantages: Extremely long reads are feasible. Low-cost instrument (node). Error rate doesn’t increase along the length of the read. Real time analysis allows “run until” sequencing.
Primary Disadvantages: No data publicly available. Announced 4% error rates. Single use cartridges may require serial sequencing for efficiency.
2012 NGS Field Guide. www.molecularecologist.com
Primary Advantages and Disadvantages – New Platforms
Application: de novo assemblies (grade by platform and instrument)

454 – GS Jr.
BACs, plastids, & microbial genomes: B – good but expensive
Transcriptome: C – need multiple runs, expensive
Plant & animal genome: D – cost prohibitive

454 – FLX+
BACs, plastids, & microbial genomes: A – good, need to multiplex to be economical
Transcriptome: A/B – good but expensive, not best for short RNAs
Plant & animal genome: B/C – good as part of a mixed platform strategy, expensive to use alone

MiSeq
BACs, plastids, & microbial genomes: B – good, assembly more challenging than 454
Transcriptome: B/A – may need multiple runs, assembly more challenging than 454, longer reads may make it the best
Plant & animal genome: C – expensive, use to validate libraries for HiSeq

HiSeq 2000
BACs, plastids, & microbial genomes: B/C – more data than needed unless highly indexed; assembly more challenging than 454
Transcriptome: A/B – good, assembly more challenging than 454 but much more data available for analyses
Plant & animal genome: A – primary data type in many current projects; requires mate-pair libraries

Ion Torrent – 314
BACs, plastids, & microbial genomes: C – reads are shorter than Illumina & as expensive as 454
Transcriptome: C – reads are shorter than Illumina & as expensive as 454
Plant & animal genome: D – cost prohibitive, reads shorter than alternatives

Ion Torrent – 318
BACs, plastids, & microbial genomes: B – good, data more challenging to assemble than 454 or Illumina
Transcriptome: B/C – good, data more challenging to assemble than 454 or Illumina
Plant & animal genome: C – high cost, data more challenging to assemble than 454 or Illumina
Utility (according to Travis Glenn, Univ. Georgia) of currently available DNA sequencing platforms for de novo assembly
2012 NGS Field Guide. www.molecularecologist.com
Utility grades combine data characteristics (amount, quality, length), cost of data, and ease of assembling the data into the final desired product.
Application: Resequencing (grade by platform and instrument)

454 – GS Jr.
Targeted loci: B – good but expensive, need to limit loci
Transcript counting: D – cost prohibitive
Genome resequencing: D – cost prohibitive for large genomes

454 – FLX+
Targeted loci: B – good but expensive, should limit loci
Transcript counting: D – cost prohibitive
Genome resequencing: C/D – cost prohibitive for large genomes

MiSeq
Targeted loci: A/B – good, fewer and higher cost reads than HiSeq
Transcript counting: B – more expensive than HiSeq or SOLiD
Genome resequencing: C – expensive for large genomes

HiSeq 2000
Targeted loci: A – primary data type in many current projects; best for many loci
Transcript counting: A – primary data type in many current projects
Genome resequencing: A – primary data type in many current projects

Ion Torrent – 314
Targeted loci: C – OK but expensive, need to limit loci
Transcript counting: D – cost prohibitive
Genome resequencing: D – cost prohibitive

Ion Torrent – 318
Targeted loci: B – good, slightly less data per run than MiSeq
Transcript counting: B/C – more expensive than HiSeq or SOLiD; new informatics pipelines needed; new error profile
Genome resequencing: C – expensive for large genomes
Utility (according to Travis Glenn, Univ. Georgia) of currently available DNA sequencing platforms for resequencing
2012 NGS Field Guide. www.molecularecologist.com
Utility grades combine data characteristics (amount, quality, length), cost of data, and ease of assembling the data into the final desired product.
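The "cost prohibitive for large genomes" grades can be made concrete with a rough depth-of-coverage estimate: average coverage is total bases sequenced divided by genome size. The sketch below is illustrative only. The 1,000 Mb genome and the 30x target depth are hypothetical values chosen for the example, the function names are invented, and the per-run yields are the MB/run figures listed in the yield tables.

```python
import math


def coverage_per_run(yield_mb, genome_size_mb):
    """Average depth of coverage from one run (yield / genome size)."""
    return yield_mb / genome_size_mb


def runs_needed(target_coverage, yield_mb, genome_size_mb):
    """Whole runs required to reach the target average coverage."""
    return math.ceil(target_coverage * genome_size_mb / yield_mb)


GENOME_SIZE_MB = 1000   # hypothetical genome size
TARGET_COVERAGE = 30    # hypothetical target depth

# Listed yields in MB/run.
instruments = [
    ("454 GS Jr. Titanium", 50),
    ("Illumina MiSeq", 1200),
    ("Illumina HiSeq 2000", 600000),
]

for name, yield_mb in instruments:
    runs = runs_needed(TARGET_COVERAGE, yield_mb, GENOME_SIZE_MB)
    depth = coverage_per_run(yield_mb, GENOME_SIZE_MB)
    print(f"{name}: {depth:g}x per run, {runs} run(s) for {TARGET_COVERAGE}x")
```

Multiplying the run count by the reagent cost per run gives a first estimate of project cost. This ignores read length, error profile, and uneven coverage, all of which the utility grades also take into account.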