High Throughput Sequencing Methods and Concepts Cedric Notredame adapted from S.M Brown.

DNA Sequencing

• The final essential tool in the molecular biology toolkit is the ability to read the base sequence of DNA molecules

• Fred Sanger developed an elegant method to sequence DNA using the DNA polymerase enzyme, for which he was awarded the Nobel Prize in 1980.

• The Sanger method copies a piece of cloned DNA, but some of the copies are halted at each base along the sequence.

Sanger Method

• DNA polymerase adds free nucleotides to a primer that is complementary to the DNA template.

• Sanger used modified nucleotides (dideoxynucleotides) that stop the replication process when they are incorporated into the growing DNA chain (terminators).

• This produces a set of partial DNA copies of the original template sequence, each one stopping at a different base.

• Sanger used 4 different reactions that each contained only terminators for one of the bases.

• When the partial copies are sorted by size using electrophoresis, all fragments of a given size are terminated with the same base.
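The logic of the four reactions can be sketched in a few lines of Python. This is a toy model, not a simulation of the chemistry: it assumes every position of the copied strand yields exactly one terminated fragment, and the function names are illustrative.

```python
def sanger_fragments(copied_strand: str) -> dict[str, list[int]]:
    """For each of the 4 terminator reactions, list the fragment lengths produced."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(copied_strand, start=1):
        # A copy halted by a terminator for `base` ends exactly at this position.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Sort all fragments by size and read off the terminating base of each."""
    by_size = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_size)


lanes = sanger_fragments("GATTACA")
print(lanes["A"])       # fragment lengths in the A reaction
print(read_gel(lanes))  # sequence recovered from the size-sorted fragments
```

Reading the fragments from shortest to longest recovers the sequence because each fragment length occurs in exactly one of the four reactions.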

Automated Sequencing

• Sequencing technology was improved in the late 1980s by Leroy Hood, who developed fluorescent color labels for the 4 terminator nucleotides.

• This allowed all 4 bases to be sequenced in a single reaction and sorted in a single gel lane.

• Hood also pioneered direct data collection by computer.

• Incremental improvements in this technology now enable the sequencing of billion-base genomes in a year or less.

Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors, so they can read all 4 bases at once.

DNA Sequencing capability has grown exponentially

Doubling time = 18 months

DNA sequences in GenBank
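An 18-month doubling time translates into a growth factor of 2^(t/18) after t months. A small sketch of that arithmetic (the function name is illustrative, and steady exponential growth is assumed):

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Fold increase after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


print(growth_factor(18))                 # one doubling
print(round(growth_factor(120), 1))      # fold increase over ten years
```

Under this assumption, ten years of growth corresponds to roughly a hundredfold increase.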

Next Generation Sequencing

• 454 Life Sciences/Roche
– Genome Sequencer FLX: currently produces 400-600 million bases per day per machine
– Published 1 million bases of Neanderthal DNA in 2006
– May 2007: published complete genome of James Watson (3.2 billion bases, ~20x coverage)

• Solexa/Illumina
– 10 GB per machine/week
– May 2008: published complete genomes for 3 HapMap subjects (14x coverage)

• ABI SOLID
– 20 GB per machine/week
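Fold coverage is the total number of sequenced bases divided by the genome size. The sketch below works through that arithmetic with the slide's figures, assuming that "10 GB" means 10 billion bases and that every sequenced base is usable; the function names are illustrative.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def bases_needed(genome_size: int, coverage: int) -> int:
    """Total bases required to reach a target fold coverage."""
    return genome_size * coverage


GENOME = 3_200_000_000               # 3.2 billion bases, as quoted on the slide
needed = bases_needed(GENOME, 14)    # 14x coverage
print(needed)                        # total bases to sequence
print(needed / 10_000_000_000)       # machine-weeks at 10 billion bases per week
```

On these assumptions, one 14x human genome takes a single machine between four and five weeks.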

“Paradigm Shift”

• Standard ABI “Sanger” sequencing
– 96 samples/day
– Read length ~650 bp
– Total = 450,000 bases of sequence data

• 454 was the game changer!
– ~400,000 different templates (reads)/day
– Read length ~250 bp
– Total = 100,000,000 bases of sequence data!!!
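The 454 total is simply reads times read length. A short sketch of that calculation, together with the ratio of the two daily totals as stated on the slide (the function name is illustrative):

```python
def total_bases(reads: int, read_length_bp: int) -> int:
    """Sequence yield as number of reads times read length."""
    return reads * read_length_bp


per_day_454 = total_bases(400_000, 250)
print(per_day_454)

# Ratio of the slide's stated daily totals (454 vs. Sanger).
print(round(per_day_454 / 450_000))
```

Going by the stated totals, the jump is a bit more than two hundredfold per machine per day.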

Solexa ups the Game

• Solexa (Illumina GA)
– 60,000,000 different sequence templates (yes, that is an insane 60 million reads)
– 36 bp read length
– 4 billion bases of DNA per run (3 days)

Nanotechnology

• Each system works differently, but they are all based on similar principles:
– Shear target DNA into small pieces
– Bind individual DNA molecules to a solid surface
– Amplify each molecule into a cluster
– Copy one base at a time and detect different signals for A, C, T, & G bases
– Requires very precise high-resolution imaging of tiny features
• (Solexa has 800 images @ 4 megapixels each)
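To get a feel for what this imaging implies, the sketch below estimates the raw image volume of one run. Only the 800 images at 4 megapixels come from the slide; the 16-bit pixels, one image per base channel, and 36 cycles are assumptions made for illustration, and the function name is hypothetical.

```python
def raw_image_bytes(
    tiles: int,
    megapixels_per_image: int,
    cycles: int,
    channels: int = 4,         # assumption: one image per base (A, C, G, T)
    bytes_per_pixel: int = 2,  # assumption: 16-bit grayscale pixels
) -> int:
    """Rough size of all raw images collected during one run."""
    pixels_per_cycle = tiles * megapixels_per_image * 1_000_000
    return pixels_per_cycle * bytes_per_pixel * channels * cycles


size = raw_image_bytes(tiles=800, megapixels_per_image=4, cycles=36)
print(size / 1e12)  # terabytes (decimal)
```

With these assumptions a single 36-cycle run produces on the order of a terabyte of images.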

One (of 800) tiles on Solexa Sequencer

Huge Amount of Image Data

• The raw image data is truly huge: 1-2 TB for the Solexa, more for ABI-SOLID, less for 454

• The images are immediately processed into intensity data (spots w/ location and brightness)

• Intensity data is then processed into basecalls (A, C, T, or G plus a quality score for each)

• Basecall data is on the order of 5-10 GB per run (or a week of runs for 454).
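The step from intensity data to basecalls can be illustrated with a toy caller that picks the brightest of four channels. The quality score uses the standard Phred scale, Q = -10 log10(p), but treating "share of signal outside the brightest channel" as the error probability p is a simplifying assumption, not how any vendor's base caller works.

```python
import math


def call_base(intensities: dict[str, float]) -> tuple[str, int]:
    """Call the brightest channel and attach a Phred-style quality score."""
    base = max(intensities, key=intensities.get)
    total = sum(intensities.values())
    purity = intensities[base] / total if total > 0 else 0.25
    # Toy error model: signal outside the called channel counts as error.
    # The floor avoids log10(0) when all signal is in one channel.
    p_error = max(1.0 - purity, 1e-4)
    quality = round(-10 * math.log10(p_error))
    return base, quality


print(call_base({"A": 900, "C": 50, "G": 30, "T": 20}))  # clean spot
print(call_base({"A": 400, "C": 350, "G": 150, "T": 100}))  # mixed spot
```

A spot with mixed signal still gets a base call, but with a much lower quality score, which is the information downstream tools use to weigh each base.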

454

• First high-throughput DNA sequencer, commercially available in 2004
• Now (10/08) produces ~500 MB of sequence in reads of 500 bp
• Run of 8 samples in 10 hours, so can do multiple runs/week
• Uses pyrosequencing, beads, and a microtiter plate
• Low error rate, but insert/delete problems with homopolymers (stretches of a single base)
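The homopolymer problem follows from how pyrosequencing reads a run of identical bases: one nucleotide is flowed at a time, and the signal is proportional to how many bases are incorporated. A minimal sketch, assuming an idealized signal and a cyclic flow order of T, A, C, G (the flow order and function names are illustrative):

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order


def flowgram(sequence: str, flow_order: str = FLOW_ORDER) -> list[tuple[str, int]]:
    """Ideal signal per flow: the number of bases incorporated in that flow."""
    if any(base not in flow_order for base in sequence):
        raise ValueError("sequence contains a base missing from the flow order")
    flows, pos, i = [], 0, 0
    while pos < len(sequence):
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append((nucleotide, run))
        i += 1
    return flows


def decode(signals: list[tuple[str, float]]) -> str:
    """Turn (nucleotide, signal) pairs back into bases by rounding each signal."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)


print(flowgram("TTTTAC"))
# A noisy signal of 4.6 for the T run is rounded to 5: one inserted base.
print(decode([("T", 4.6), ("A", 1.1), ("C", 0.9)]))
```

The same amount of noise that is harmless for a single base (1.1 still rounds to 1) is enough to miscount a long run, which is why errors show up as insertions and deletions rather than substitutions.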

Illumina Genome Analyzer

• Originally developed by Solexa, now a subsidiary of Illumina.
• Commercially available in 2006
• Now produces 8-12 million reads per sample of 36 bp length = 10 GB/week.
• Run takes 3 days for 7 samples.
• Low error rate, mostly base changes, few indels

Illumina sequencing technology in 12 steps

Source: http://www.illumina.com/downloads/SS_DNAsequencing.pdf

1. Prepare genomic DNA: Randomly fragment genomic DNA and ligate adapters to both ends of the fragments.

2. Attach DNA to surface: Bind single-stranded fragments randomly to the inside surface of the flow cell channels, which carry a dense lawn of primers.

3. Bridge amplification: Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.

4. Fragments become double stranded: The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. (Figure labels: attached terminus, free terminus.)

5. Denature the double-stranded molecules: Denaturation leaves single-stranded templates anchored to the substrate.

6. Complete amplification: Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell.

7. Determine first base: The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase.

8. Image first base: After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.

9. Determine second base: The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase.

10. Image second chemistry cycle: After laser excitation the image is captured as before, and the identity of the second base is recorded.

11. Sequencing over multiple chemistry cycles: The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.

12. Align data: The data are aligned and compared to a reference sequence, and sequencing differences are identified. (Figure labels: known SNP called; unknown variant identified and called.)
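The final step, aligning reads to a reference and reporting differences, can be sketched with a brute-force ungapped search. This is only an illustration of the idea: real aligners use indexed data structures and handle gaps, and the sequences and function names here are made up for the example.

```python
def best_alignment(read: str, reference: str) -> tuple[int, int]:
    """Return (offset, mismatches) for the ungapped placement with fewest mismatches."""
    best = (0, len(read) + 1)
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best[1]:
            best = (offset, mismatches)
    return best


def differences(read: str, reference: str, offset: int) -> list[tuple[int, str, str]]:
    """List (reference position, reference base, read base) for each mismatch."""
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if base != reference[offset + i]
    ]


reference = "ACGTTGCATGCCGATAGGCT"
read = "GCATGACGAT"  # matches reference[5:15] except for one base

offset, mismatches = best_alignment(read, reference)
print(offset, mismatches)
print(differences(read, reference, offset))
```

A single read with a mismatch could be a sequencing error or a true variant; calling a SNP in practice relies on many overlapping reads agreeing at the same position.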

Illumina Genome Analyzer

Richard K. Wilson

Paired-End Sequencing

Nature Methods 5, May 2008

Sequencing

1. Denaturation and hybridization
2. Sequencing first read
3. Denaturation and de-protection
4. Resynthesis of P5 strand (15 cycles)
5. P7 linearization
6. Block with ddNTPs
7. Denaturation and hybridization
8. Sequencing second read
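The outcome of this workflow is two reads per fragment, one from each end. A minimal sketch of what such a pair looks like, assuming the second read is taken from the opposite strand at the far end of the fragment; the fragment and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """First read from one end; second read from the other end, opposite strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ACGTTGCATGCCGATAGGCT"
print(paired_end_reads(fragment, 5))
```

Because both reads come from the same fragment, their expected distance and orientation are known, which helps place reads that would be ambiguous on their own.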

ABI-SOLID

• First commercially available in late 2007
• Currently capable of producing 20 GB of data per run (week)
• Most users generate 6 GB/run
• Reads ~30 bp long
• Uses unique sequence-by-ligation method
• “color-space” data
• Very low error rate
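In color-space data each color encodes a pair of adjacent bases rather than a single base. One common way to express the two-base encoding is as an XOR of 2-bit base codes, sketched below; the function names are illustrative, and the first base must be known to decode.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_colors(seq: str) -> tuple[str, list[int]]:
    """Return the first base plus one color (0-3) per adjacent base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Rebuild the base sequence from the first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


first, colors = encode_colors("ATGGC")
print(first, colors)
print(decode_colors(first, colors))

# One wrong color call changes every base after it when decoded naively.
print(decode_colors("A", [3, 2, 0, 3]))
```

Since a true single-base variant changes two adjacent colors while a measurement error changes only one, the two can be told apart after alignment in color space.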

Short Reads

• Short reads from Next-Gen machines are a challenge (Solexa = 36 bp)
– Very hard to assemble whole genomes
– Difficult to get any information on repeat regions
• Requires many-fold coverage
• New algorithms needed for many traditional bioinformatics operations
• Reads are getting longer – another moving target
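Why repeats are hard for short reads can be shown by counting how many read-length substrings of a sequence occur only once. The sketch below uses a tiny made-up sequence with a tandem repeat; the function name is illustrative.

```python
from collections import Counter


def unique_fraction(genome: str, read_length: int) -> float:
    """Fraction of start positions whose read-length substring occurs only once."""
    kmers = [
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    ]
    counts = Counter(kmers)
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


genome = "ACGT" * 3 + "GGA"  # a short tandem repeat followed by unique sequence
print(unique_fraction(genome, 3))
print(unique_fraction(genome, 8))
```

Reads shorter than the repeat unit mostly cannot be placed uniquely, while reads long enough to span into flanking sequence can, which is the reason longer reads ease both assembly and mapping.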

PacBio

• High throughput Single Molecule Real Time (SMRT) Sequencing


www.pacificbiosciences.com/

Applications

• “If you build it, they will come.”
• An explosion of scientific innovation!
• Every new technology enables new applications, which are not directly foreseen by the original developers of the tech.
• Cheap access to high-volume sequencing becomes a data collection method for many different types of experimental applications

“When All You Have is a Hammer, All Problems Look Like Nails”
Mark Twain
