NGS: the basics
Human genome sequence
June 26th 2000: official announcement of the completion of the draft of the human genome sequence (truly finished in 2004)
Costs:
HGP (Francis Collins): $3 billion, 15 years
Celera (Craig Venter): $200 million, 2 years
2004: two NIH Requests for Applications (RFAs)
Current technologies are able to produce the sequence of a mammalian-sized genome of the desired data quality for $10 to $50 million; the goal of this initiative is to reduce costs by at least two orders of magnitude. It is anticipated that emerging technologies are sufficiently advanced that, with additional investment, it may be possible to achieve proof of principle or even early stage commercialization for genome-scale sequencing within five years.
A parallel RFA solicits grant applications to develop technologies to meet the longer-term goal of achieving four-orders of magnitude cost reduction in about ten years, so that a mammalian-sized genome could be sequenced for approximately $1000.
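The two RFA targets can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above ($10 to $50 million per genome, two and four orders of magnitude); the function and variable names are made up for illustration.

```python
# Current cost range quoted in the RFA, in dollars per mammalian-sized genome.
current_cost_range = (10e6, 50e6)


def reduced(cost, orders_of_magnitude):
    """Divide a cost by 10 ** orders_of_magnitude."""
    return cost / 10 ** orders_of_magnitude


# First RFA: at least two orders of magnitude, within about five years.
five_year_goal = tuple(reduced(c, 2) for c in current_cost_range)
# Parallel RFA: four orders of magnitude, in about ten years.
ten_year_goal = tuple(reduced(c, 4) for c in current_cost_range)

print(five_year_goal)  # (100000.0, 500000.0)
print(ten_year_goal)   # (1000.0, 5000.0)
```

The lower bound of the ten-year goal is the "$1000 genome".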
Increased efficiency: decreased costs
Exponential cost decrease
Efficient integration of every individual step to slash costs
Massively parallel sequencing
Next-generation sequencing
Key: direct sequencing of DNA without the bacterial cloning step
From colonies to polonies
454
Roche GS FLX
454: Library preparation
Clonal amplification of single molecules
Emulsion PCR
454: Sequencing by pyrosequencing
GS FLX throughput (2011-2013)
Up to a million sequences, 700 bp long (up to 1 kb), in 23 hours
454: Game over!
Jonathan Rothberg: “In the sequencing business, one needs to innovate or die. At 454 we were always first; first non-bacterial cloning, first commercialization, first next-gen individual human genome, Neanderthal, mammoth, deep sequencing, cancer sequencing, drug response studies, HIV, metagenomics, first drug target by whole genome sequencing, and many more firsts. Always innovating, always first."
In 2007, Roche acquired 454 for $155 million in cash and stock. Rothberg said that when Roche bought 454, the company was "two years ahead of everyone else," but after the purchase, "they lost that lead, no more firsts, no more innovation."
Rothberg strikes back!
Rothberg: "When I woke up and found Roche had bought 454 without me, I had to restart. It cost three years. We had to invent a new scalable way to sequence — ion semiconductor sequencing — and establish a clear path towards both truly low-cost and mobile sequencing." He went on to found Ion Torrent, which was bought by Life Technologies in 2010 for $375 million in cash and stock, and another $350 million based on milestones.
Ion Torrent
Simple Natural Chemistry
Fast Direct Detection
Nucleotides flow sequentially over the Ion semiconductor chip
Direct detection of natural DNA extension
A few seconds per incorporation
[Figure: cross-section of one sensor well. Labels: dNTP, H+, ∆pH, sensing layer, ∆Q, sensor plate, ∆V, silicon substrate (drain, source, bulk), to column receiver.]
Rothberg J.M. et al Nature doi:10.1038/nature10242
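Flow-based sequencing (454 pyrosequencing and Ion Torrent alike) presents one nucleotide at a time, so a run of identical bases is incorporated in a single flow. The sketch below is an idealized model of that idea, not the vendor's base caller: real signals are analog and noisy, the flow order "TACG" is only an assumption here, and the sequence is treated directly as the strand being synthesized.

```python
def flowgram(read, flow_order="TACG", n_flows=8):
    """Return (nucleotide, incorporations) for each flow.

    `read` is the strand being synthesized; each flow extends it by as
    many consecutive matching bases as possible (homopolymers included).
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        count = 0
        while pos < len(read) and read[pos] == nuc:
            count += 1
            pos += 1
        signals.append((nuc, count))
    return signals


def call_bases(signals):
    """Rebuild the sequence from ideal integer flow signals."""
    return "".join(nuc * count for nuc, count in signals)


signals = flowgram("TTAGGC")
print(signals)
print(call_bases(signals))  # TTAGGC
```

In this model a flow with count 2 stands for a signal twice as strong as a single incorporation, which is why long homopolymers are the hard case for flow-based chemistries.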
Scalable Semiconductor Technology
[Figure panels: wafer (semiconductor manufacturing), chip (semiconductor packaging), chip cross section (semiconductor design)]
The Chip is the Machine™
Scalability, simplicity, speed
Two machines, 5 chips
PGM: 314, 316, 318
Proton: P1, P2?
Ion Torrent Specs
314 chip: 0.4 to 0.5 million reads, 30 to 100 Mb of data
316 chip: 2 to 3 million reads, 300 to 1000 Mb of data
318 chip: 4 to 5.5 million reads, 0.6 to 2 Gb of data
PGM reads: 200 bp or 400 bp, 2 to 7 hours per run
Proton P1: 60-80 million reads, up to 10 Gb of data, 200 bp reads, 2-4 hours per run
Proton P2: "l’arlésienne" (always announced, never seen)!
With Ion Torrent, the barcode is read just before the insert
[Figure: library molecule with barcoded adapter, insert and biotin adapter; positions of the barcode and of the sequencing primer]
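Because the barcode is read just before the insert, every read starts with its barcode, and samples can be separated by looking at the read prefix. The sketch below assumes exact matching and made-up 4-base barcodes and sample names; real barcode sets are longer and usually tolerate sequencing errors.

```python
def demultiplex(reads, barcodes):
    """Group reads by barcode prefix and trim the barcode off the insert.

    `barcodes` maps barcode sequence -> sample name (hypothetical values).
    Reads without a known barcode prefix go to "unassigned" untrimmed.
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    by_sample["unassigned"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)
    return by_sample


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # illustrative only
reads = ["ACGTGGGTTT", "TGCAAACCC", "GGGGACGT"]
result = demultiplex(reads, barcodes)
print(result)
```

The third read carries a barcode-like sequence at its end only, so it stays unassigned: position matters, not just presence.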
Ion Torrent paired-end sequencing
Illumina Genome Analyzer, HiSeq, MiSeq
(formerly Solexa)
Solexa amplification step
Amplification and sequencing on a solid support
Sequencing by synthesis
CRT: cyclic reversible termination
Illumina: Primary data analysis
120 tiles per lane
480 images per lane and cycle:
36 nt run = 138,240 images = 945 Gb
2x50 nt run = 384,000 images = 1.3 Tb
2x100 nt run = 768,000 images = 5.3 Tb
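The image counts can be reproduced from the tile count. The sketch assumes 4 images per tile and cycle (one per base, which gives the 480 images per lane and cycle) and 8 lanes per flow cell; neither number is stated on the slide, but together they match the three totals quoted.

```python
TILES_PER_LANE = 120
IMAGES_PER_TILE = 4  # assumption: one image per base (A, C, G, T)
LANES = 8            # assumption: 8 lanes per flow cell


def images_per_run(cycles):
    """Total number of images acquired over a run of `cycles` cycles."""
    images_per_lane_and_cycle = TILES_PER_LANE * IMAGES_PER_TILE  # 480
    return images_per_lane_and_cycle * LANES * cycles


for name, cycles in [("36 nt", 36), ("2x50 nt", 100), ("2x100 nt", 200)]:
    print(name, images_per_run(cycles))
# 36 nt 138240
# 2x50 nt 384000
# 2x100 nt 768000
```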
Image analysis (Illumina)
Image registration:
Get image coordinates congruent
Register images between cycles
Cluster identification
Template of cluster positions created from the first five cycles
[Figure: cluster images for the four bases A, C, T, G]
If neighboring clusters have identical sequences during the first 5 cycles: one cluster
If neighboring clusters have different sequences during the first 5 cycles: two clusters
As a consequence: barcodes should not be placed in the first bases. Neighboring clusters carrying the same barcode would look identical during the first cycles, so the probability of fusing two different clusters would be too high
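The merging rule can be illustrated with a toy model. This is a greedy simplification written for this purpose, not Illumina's actual image-analysis algorithm; the distance threshold (in pixels) and the coordinates are hypothetical.

```python
from math import hypot


def merge_clusters(clusters, max_distance=1.5, template_cycles=5):
    """Keep one cluster per group of close neighbors with the same first bases.

    `clusters` is a list of (x, y, sequence). A cluster is dropped (fused)
    when a cluster already kept lies within `max_distance` and shows the
    same bases over the first `template_cycles` cycles.
    """
    kept = []
    for x, y, seq in clusters:
        prefix = seq[:template_cycles]
        for kx, ky, kseq in kept:
            close = hypot(x - kx, y - ky) <= max_distance
            if close and kseq[:template_cycles] == prefix:
                break  # indistinguishable from a kept neighbor: fused
        else:
            kept.append((x, y, seq))
    return kept


clusters = [
    (10.0, 10.0, "ACGTAGGTT"),  # starts with "ACGTA"
    (10.8, 10.3, "ACGTACCAA"),  # different molecule, same first 5 bases
    (11.0, 9.5, "TTGCAGGTT"),   # close neighbor, different first bases
    (50.0, 50.0, "ACGTAGGTT"),  # same first bases, but far away
]
kept = merge_clusters(clusters)
print(len(kept))  # 3
```

The second cluster is a distinct molecule but is lost, which is exactly what happens when a shared barcode occupies the first cycles.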
Illumina paired-end sequencing
Barcoding with a single index (Illumina)
Barcoding with dual indexing (Illumina)
Illumina-Solexa throughput (end of 2013)
Up to 3 billion sequences, up to 2x100 bp long, in 11 days (HiSeq 2000)
Or 0.6 billion, 2x150 bp, in 40 hours (HiSeq 2500)
Or 12-15 million, 2x250 bp, in 39 hours (MiSeq V2)
Or 22-25 million, 2x300 bp, in 65 hours (MiSeq V3)
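These figures translate into total yield per run and per hour. The sketch assumes that each "sequence" is a fragment read from both ends (so 2x100 means 200 bases per sequence) and uses the upper bound of each range; the dictionary layout and names are illustrative.

```python
# name: (sequences per run, bases per sequence, run time in hours)
runs = {
    "HiSeq 2000": (3e9, 2 * 100, 11 * 24),
    "HiSeq 2500": (0.6e9, 2 * 150, 40),
    "MiSeq V3": (25e6, 2 * 300, 65),
}


def yield_gb(n_sequences, bases_per_sequence):
    """Total bases per run, in gigabases."""
    return n_sequences * bases_per_sequence / 1e9


for name, (n, bases, hours) in runs.items():
    total = yield_gb(n, bases)
    print(f"{name}: {total:.0f} Gb per run, {total / hours:.2f} Gb per hour")
```

Under these assumptions the HiSeq 2000 gives the largest run (600 Gb) while the HiSeq 2500 delivers more bases per hour, and the MiSeq trades yield for read length.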
SOLiD sequencing
Applied Biosystems
SOLiD: library preparation
Emulsion PCR
SOLiD: sequencing
SOLiD throughput (early 2009)
Up to 0.2 billion sequences, up to 2x60 bp long, in 7 days
Complete Genomics
A human genome for $5,000
Step 1: Fragment tagging
Step 2: Clonal DNA amplification
Step 3: Distribution over a patterned substrate, 1 billion spots per slide
Step 4: Sequencing by ligation
Step 5: Assembly
Cost slashing: small volumes, "simple" equipment
Third-generation sequencing
Single-molecule sequencing
No PCR amplification
Helicos BioSciences
Single-molecule fluorescent sequencing on a flow cell
Cyclic reversible termination: a single DNA molecule is extended one base at a time, the blocking fluorescent label is removed and washed away, and the cycle is repeated
Improved cyclic reversible termination and single DNA molecule detection
Helicos throughput
Up to 1 billion sequences, on average 32 bp long, in 7 days
Pacific Biosciences
Long single-molecule sequencing
The label is on the phosphate, not on the base, and it is captured transiently by a DNA polymerase tethered at the bottom of a nanoscale well, the zero-mode waveguide (ZMW)
Thousands of ZMWs concentrate light
The ZMW nanostructure provides excitation confinement in the zeptoliter (10⁻²¹ liter) regime
Real-time detection of the incorporation of each base on thousands of molecules
Pacific Biosciences throughput
Each ZMW: 10 bases/sec
Claim: in 2013, a high-quality human genome in 15 minutes
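A quick estimate shows what the 15-minute claim implies at 10 bases/sec per detection site. The genome size (3 Gb) and the 30x coverage are assumptions made for this sketch, and every site is assumed to sequence continuously for the whole run.

```python
GENOME_SIZE = 3e9        # bases; assumption: mammalian-sized genome
BASES_PER_SECOND = 10    # per detection site, from the slide
RUN_SECONDS = 15 * 60    # the claimed 15 minutes


def sites_needed(coverage):
    """Number of sites that must sequence in parallel for the whole run."""
    bases_per_site = BASES_PER_SECOND * RUN_SECONDS  # 9,000 bases
    return GENOME_SIZE * coverage / bases_per_site


print(round(sites_needed(1)))   # 333333
print(round(sites_needed(30)))  # 10000000
```

So "thousands of molecules" in parallel is far from enough: under these assumptions the claim needs on the order of ten million sites running at once.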
Third- or fourth-generation sequencing
Single molecules, no fluorophore
Oxford Nanopore Technologies
Nanopore array chip
Pore across a lipid bilayer
Exonuclease
Bases passing through the pore change the electrical conductance across the membrane, which can be measured electrically. A, T, G, C and MeC (methylated C) can be distinguished.
Several more technologies are in the pipeline:
BioNanomatrix
VisiGen
Dover Systems
Intelligent Bio-Systems
ZS Genetics
Reveo
LightSpeed Genomics