Introduction to DNA Sequencing Technology. Dideoxy Sequencing (Sanger Sequencing, Chain Terminator...

Post on 22-Dec-2015

Introduction to DNA Sequencing Technology

Dideoxy Sequencing (Sanger Sequencing, Chain Terminator method).

• Clone the fragments to be sequenced into the virus M13.

• Why M13?

• The clones that are isolated are single-stranded DNA.

M13

. . . . . TGATGTCGAGCGAGTCGTACGGT-----^^^

Primer

Fragment to be deciphered

DNA sequencing reaction:

1) DNA fragment to be sequenced, cloned into the vector M13

2) DNA polymerase

3) “Universal” primer

4) All 4 DNA building blocks (dNTPs)

5) One ddNTP tagged with a radioactive tracer

The most popular technique is based on the dideoxynucleotide.

Purine

• Pyrimidine

Set up 4 separate reactions. Each reaction contains one of the 4 ddNTPs. Each ddNTP is tagged with a radioactive tracer.

A reaction (with ddA) 21, 26, 29, . . . .

T reaction (with ddT) 25, 31, 35, . . . .

C reaction (with ddC) 22, 23, 27, . . . .

G reaction (with ddG) ??

M13

. . . . . TGATGTCGAGCGAGTCGTACGGT-----^^^

Primer (20 nt.)(3’ end of primer)

• Each reaction generates a set of unique fragment lengths.

• All fragment lengths are represented (from 21 - > 1,000 nucleotides).

• None of the fragments are present in more than one reaction.

• DNA sequencing requires a gel electrophoresis system able to separate DNA fragments that differ in length by a single nucleotide.
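The fragment lengths listed for the four reactions can be reproduced with a short simulation. This is only a sketch: it assumes the template on the slide is written 5'->3' with the 20-nt primer annealed immediately to its right, and the function names (`fragment_lengths`, `read_gel`) are made up for illustration.

```python
# Sketch only: assumes the template is written 5'->3' as on the slide and
# that the 20-nt primer anneals immediately to the right of it.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def fragment_lengths(template, primer_length=20):
    """Return {ddNTP base: [fragment lengths]} for the four reactions."""
    lengths = {base: [] for base in "ACGT"}
    # The polymerase extends the primer by reading the template right to left.
    for added, template_base in enumerate(reversed(template), start=1):
        new_base = COMPLEMENT[template_base]
        # A chain terminated at this position has length primer + added bases.
        lengths[new_base].append(primer_length + added)
    return lengths

def read_gel(lengths):
    """Read the new strand off the 'gel', shortest fragment first."""
    band_to_base = {n: base for base, bands in lengths.items() for n in bands}
    return "".join(band_to_base[n] for n in sorted(band_to_base))

template = "TGATGTCGAGCGAGTCGTACGGT"
reactions = fragment_lengths(template)
for base in "ATCG":
    print(f"{base} reaction (with dd{base}):", reactions[base])
print("New strand read from the gel:", read_gel(reactions))
```

The first three lengths printed for the A, T, and C reactions match the values on the slide, and no length appears in more than one reaction, which is why reading the bands from shortest to longest spells out the newly synthesized strand.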

DNA sequencing as performed manually in the 1980s was slow and labor intensive.

• NCBI HomePage

~1988- First big change in DNA sequencing technology:

• Introduction of ‘automated DNA sequencing’:

• This technique uses 4 fluorescent labels (red, yellow, blue, green) rather than one radioactive tag.

• The bases are read by a laser/detector rather than by humans.

• York University

? Questions ?

Newest Innovations in DNA Sequencing Technology

• 1) Capillary Electrophoresis

• 2) Robotics

Capillary Gel Electrophoresis:

“The capillaries we typically use in CE are inexpensive and commercially available. We use capillaries that range about 30 to 50 centimeters in length, 0.150 to 0.375 millimeters in outer diameter, and a 0.010 to 0.075 millimeter diameter channel down the center.”

DNA sequencing with CE

# of capillary tubes/machine:

Initially- one (introduced ~1998)

State of the Art- 2000: 96 tube CE (cost $300k)

Today- 384 tube CE (cost of one unit- $500k)

• DOE Joint Genome Institute

HUMAN GENOME PROJECT (HGP)

• The ultimate goal of the HGP is to decipher the 3.3 billion b.p. of the human genome.

• When the project was initiated, it was technologically infeasible.

Genomic Sequencing

Organisms sequenced

• Year: # genomes sequenced

• 1994: 0

• 1995: 2

• 1996: 4

• 1997: 8 (est.)

• 1998: 30 (est.)

• 2001: ~75

Genomics Research Funding (selected programs; $ millions)

PROGRAM 1998 2000

NHGRI (U.S.) 211 326

WELLCOME TRUST (U.K.) 61 121

STA (JAPAN) 39 115

ENERGY (U.S.) 85 89

GHGP 19 79

SWEDEN 5 35

Why such a sudden increase in funding??

• It became apparent that if the public agencies didn’t get their act together, an upstart organization might sequence the HG before they did (despite their ~ 8 year head start).

Sequencing the human genome suddenly had become a race.

• The competitors:

• Publicly funded genome centers, scattered throughout the U.S., Europe, and Japan.

• Celera, the private company directed by J. Craig Venter.

The story of how J. Craig Venter brought about a paradigm shift in genomic sequencing has now entered the mythology of science.

Craig Venter: Scientist of the Year

• from Time Magazine: What was perhaps the most important scientific event of the past century occurred this year when scientists announced the cracking of the human genetic code. And what everyone, including his numerous critics, acknowledges is that the brash and impatient Venter is the man who made it happen years before it would have otherwise by throwing computing power at the traditional, laborious process of manually examining every bit of human DNA to find the genes within.

Why did Craig Venter and his new company Celera threaten the established genome sequencers?

• Venter’s new company had 300 state-of-the-art sequencing machines ($300k each) and an $80 million supercomputer.

• Venter suggested Celera could sequence the genome in just 3 years at a cost of $300 million.

Venter’s first company, TIGR, pioneered the ‘shotgun sequencing’ approach to sequencing a genome:

• 1) Shear the DNA into thousands of random pieces.

• 2) Sequence the DNA of each fragment.

• 3) Use a computer to align the overlapping fragments to produce a single, contiguous DNA sequence of the entire organism.
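Step 3 can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix/prefix overlap. This is a minimal sketch under loud assumptions: error-free reads, all from the same strand, no repeats, and a made-up 20-base "genome". It is not the software TIGR or Celera used, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

# Made-up example: four overlapping 8-base reads, given in scrambled order.
genome = "ATGCCGTAAGCTTGGACTCA"
reads = ["AGCTTGGA", "ATGCCGTA", "TGGACTCA", "CGTAAGCT"]
print(greedy_assemble(reads))
```

With these reads the sketch returns a single contig equal to `genome`. If an overlap were missing, the loop would stop with more than one contig, which is the toy version of a gap that needs ‘finishing’.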

Advantages/Disadvantages of the ‘shotgun approach’:

Disadvantages-

• Requires significant over-sequencing

• Requires powerful alignment software

• There may be problems ‘finishing’ certain regions

Advantages-

• Eliminates the need for mapping

Sequencing of Archaeoglobus fulgidus:

• 29,000 sequencing reactions

• Average ‘read’ length: 500 bp

• 14,500,000 bases aligned into a 2,178,400 bp genome

• 6.7-fold sequence coverage

(14,500,000 / 2,178,400 = 6.7)
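The coverage arithmetic can be checked in a few lines. The numbers are the ones on the slide; the helper name `fold_coverage` is just an illustrative choice.

```python
def fold_coverage(n_reads, avg_read_bp, genome_bp):
    """Total bases sequenced divided by the genome size."""
    return n_reads * avg_read_bp / genome_bp

n_reads = 29_000         # sequencing reactions
avg_read_bp = 500        # average read length
genome_bp = 2_178_400    # assembled A. fulgidus genome

total_bases = n_reads * avg_read_bp
coverage = fold_coverage(n_reads, avg_read_bp, genome_bp)
print(total_bases)         # 14500000
print(round(coverage, 1))  # 6.7
```

In other words, each base of the genome was sequenced between six and seven times on average, which is the ‘over-sequencing’ cost of the shotgun approach.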

Even with remarkable success sequencing bacterial genomes, skeptics doubted a whole genome random sequencing approach would work with a eukaryotic genome. Why?

2 Reasons-

• Eukaryotic genomes are much larger.

• Eukaryotic genomes carry significant amounts of repetitive DNA.

Who won the race?

• With much fanfare, the race was ‘declared’ a draw. Both Celera and the various public agencies shared credit for the rough draft of the human genome (announced June 2000; published Feb. 2001).

Insert Video (10’)

What is meant by the term mapping?

• Mapping to a geneticist means the same as it does to a non-scientist:

• A drawing showing the spatial relationship between a series of points.

Traditional map: Western U.S.

Seattle-

Portland-

S.F.-

L.A.-

Gene map: Human Chromosome #11

Hemoglobin-

Insulin-

Albinism-

Parathyroid Hormone-

Mouse Clickable Cytogenetic Map (Chromosome X is selected)

Restriction Enzyme Map

HindIII EcoRI HindIII HindIII

• ____|__________|________|_________|_

Construction of various maps has been a major goal of genetic research. Why?

• Maps serve as navigational tools. They are useful in finding genes or other genetic features and ordering fragments of DNA.

• There is a direct correlation between the usefulness of a map, and the number of points on the map. Analogy??

The STS map:

• STS = sequence-tagged site.

• STS are short, unique fragments of DNA generated by PCR.

• Verification of a human STS: PCR amplification of the human genome generates one small fragment, i.e. a unique landmark.

Usefulness of STSs

• STSs are used to find overlaps between fragments of genomic DNA.

• Finding overlaps allows the fragments to be ordered (see handout).
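The idea of using shared STSs to order fragments can be sketched as follows. Everything here is hypothetical: the clone names, the STS names, and the assumption that the clones form one simple chain with no branches or gaps. Two clones are treated as overlapping if PCR detects at least one STS in both.

```python
from itertools import combinations

# Hypothetical PCR results: which STS markers each genomic clone contains.
clone_sts = {
    "cloneC": {"sts3", "sts4"},
    "cloneA": {"sts1", "sts2"},
    "cloneD": {"sts4", "sts5"},
    "cloneB": {"sts2", "sts3"},
}

def find_overlaps(clone_sts):
    """Two clones overlap if they share at least one STS."""
    neighbors = {name: set() for name in clone_sts}
    for a, b in combinations(clone_sts, 2):
        if clone_sts[a] & clone_sts[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def order_clones(clone_sts):
    """Walk from one end clone to the other through shared STSs.

    Assumes the clones form a single simple chain (no branches, no gaps).
    """
    neighbors = find_overlaps(clone_sts)
    # End clones overlap only one other clone; start from the first by name.
    ends = sorted(name for name, nbrs in neighbors.items() if len(nbrs) == 1)
    path, previous = [ends[0]], None
    while True:
        next_clones = [n for n in neighbors[path[-1]] if n != previous]
        if not next_clones:
            return path
        previous = path[-1]
        path.append(next_clones[0])

print(order_clones(clone_sts))
```

For the made-up data above the walk returns the clones in the order A, B, C, D, even though they were listed in scrambled order. The more STSs on the map, the more overlaps can be detected.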

Expressed Sequence Tags (ESTs)

• As of June 2000, the 4.6 million EST records comprised 62% of the sequences in GenBank. Although the original ESTs were of human origin, NCBI’s EST database (dbEST) now contains ESTs from over 250 organisms.

What is an EST?

Short DNA sequence representing a gene expressed in a particular tissue. A given EST often represents a fraction of the gene.

ESTs are often produced by sequencing the ends of a cDNA (complementary DNA).

What is the value of ESTs?

• Rapid identification of genes.

Feb. 1992- Craig Venter and 14 co-workers published the partial DNA sequences of 2,375 genes expressed in the human brain. This represented about half of the total human genes known at the time.

How to sequence a genome???

• 1) Quickly- focus on the genes and their regulatory regions and human polymorphisms.

• 2) Thoroughly and completely- every nucleotide with 99.99% accuracy.

Extra Slides

• Does completion of the HGP mean identification of all disease genes?

• A Timeline of The Human Genome

• YEAR: # human genes mapped to a definite chromosome location; # years it would take to sequence the human genome

• 1967: none; sequencing not possible yet

• 1977: 3 genes mapped; 4,000,000 years to finish at 1977 rate

• 1987: 12 genes mapped; 1000 years to finish at 1987 rate

• 1997: 30,000 genes mapped; 50 years to finish at present rate

First Sequenced Genome:

• In May 1995, TIGR researchers led by Robert Fleischmann closed the last gaps in the Haemophilus influenzae genome. In total, 26,708 sequences had been assembled to span the 1,830,137 base pair genome of the bacterium. The genome was published in July. (Fleischmann et al., Science 269: 496-512, 1995).

• DNALC: Cycle Sequencing

In the February 16, 2001 issue of Science, Venter et al. announce the sequencing of the euchromatic portion of the human genome by a whole-genome shotgun sequencing approach. The sequencing was accomplished by Celera Genomics in nine months in a factory-scale project involving 300 automatic sequencing machines producing 175,000 sequence reads per day. The company generated 14.8 gigabases (Gb) of DNA sequence and combined its data with the public GenBank database to generate a 2.91 Gb consensus sequence (94% coverage) representing over eight-fold coverage of the genome.