Gene Sequencing methods (word document)

Introduction:

Since the 1950s, when the chemical nature of DNA was established (it is written in a simple four-letter code of nucleotides and is the hereditary material in all living organisms), sequencing, or "reading" the genetic code, has become of increasing interest to scientists. RNA sequencing was one of the earliest forms of nucleotide sequencing; its major landmark was the sequence of the first complete gene and then the complete genome of bacteriophage MS2, determined and published by Walter Fiers. Prior to the mid-1970s no method existed by which DNA could be directly sequenced. Knowledge about gene and genome organization was based on studies of prokaryotic organisms, and the primary means of obtaining DNA sequence was so-called reverse genetics, in which the amino acid sequence of the gene product of interest is back-translated into a nucleotide sequence based on the appropriate codons. Because the genetic code is degenerate (most amino acids are encoded by more than one codon), this back-translation is ambiguous at best. In the mid-1970s two methods were developed for directly sequencing DNA: the Maxam-Gilbert chemical cleavage method and the Sanger chain-termination method.

Before these rapid DNA sequencing methods were developed by Frederick Sanger in England and by Walter Gilbert and Allan Maxam at Harvard, a number of laborious methods were used. For instance, in 1973 Gilbert and Maxam reported the sequence of 24 base pairs using a method known as wandering-spot analysis. The chain-termination method, published by Sanger and coworkers in 1977, soon became the method of choice owing to its relative ease and reliability. Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye. The high demand for low-cost sequencing has since driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. These technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods.

Need for gene sequencing:

• Understanding a particular DNA sequence can shed light on a genetic condition and offer hope for the eventual development of treatment.

• An alteration in a DNA sequence can lead to an altered or non-functional protein, and hence to a harmful effect in a plant or animal.

• Simple point mutations can cause altered protein shape and function.

Terminology related to sequencing:

DNA

A nucleic acid that carries the genetic information in the body’s cells. It is made up of four similar chemicals, called bases and abbreviated A, T, C and G, which are repeated over and over and pair with each other (A with T, C with G) across the two strands of the double helix.

DNA sequencing

Determination of the order of the nucleotide bases (adenine, guanine, cytosine and thymine) in a molecule of DNA.

Gene

A gene is a distinct portion of a cell’s DNA that codes for a type of protein or for an RNA chain.

Gene sequencing

Gene sequencing is a process in which the individual base nucleotides in an organism's DNA are identified.

Genome

The complete set of an organism's chromosomal and extrachromosomal genetic instructions.

Genome sequencing:

Breaking the whole genome into small pieces, sequencing the pieces and then reassembling them in proper order to arrive at the sequence of the whole genome.

Genomics:

The study of genomes: sequencing them, determining the complete set of proteins encoded by an organism, and working out how its genes and metabolic pathways function.

Historical facts in DNA sequencing:

1953 Discovery of the structure of the DNA double helix.

1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.

1975 The first complete DNA genome to be sequenced is that of bacteriophage φX174

1977 Allan Maxam and Walter Gilbert publish DNA sequencing by chemical degradation. Frederick Sanger, independently, publishes DNA sequencing by enzymatic synthesis (the chain-termination method).

1980 Frederick Sanger and Walter Gilbert receive the Nobel Prize in Chemistry.

1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.

1986 Lloyd Smith and colleagues in Leroy E. Hood's laboratory at the California Institute of Technology announce the first semi-automated DNA sequencing machine.

1987 Applied Biosystems markets the first automated sequencing machine, the model ABI 370.

1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at US$0.75 per base).

1995 Craig Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) publish the first complete genome of a free-living organism, the bacterium Haemophilus influenzae, sequenced by the whole-genome shotgun method.

1995 Richard Mathies et al. publish fluorescence energy transfer dye-based sequencing.

1996 Pal Nyren and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing.

1998 Phil Green and Brent Ewing of the University of Washington publish "phred" for sequencer data analysis.

1999 Completion of the sequence of human chromosome 22.

2000 Completion of the rough draft of the human genome.

Different sequencing methods:

Chemical cleavage method:

In 1976–1977, Allan Maxam and Walter Gilbert developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases, in a two-step process. It uses piperidine together with two chemicals that selectively attack purines and pyrimidines: purines react with dimethyl sulfate and pyrimidines react with hydrazine, in such a way that the glycosidic bond between the deoxyribose sugar and the base is broken and the base is displaced (step 1). Piperidine then cleaves the phosphodiester backbone at the site where the base has been displaced (step 2).

To apply these selective reactions to DNA sequencing, a single-stranded DNA substrate carrying a radioactive label on its 5' end is prepared. This labeled substrate is subjected to four separate cleavage reactions, each run under limiting conditions so that each molecule is cleaved, on average, only about once; each reaction therefore creates a population of labeled cleavage products ending at known nucleotides. The cleavage chemicals are 1) dimethyl sulfate, which breaks DNA at G; 2) acid (pH 2.0), which breaks it at A and G; 3) hydrazine, at T and C; and 4) hydrazine in salt, which breaks DNA at C only. The four reactions are loaded side by side on a high-percentage polyacrylamide gel. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabelled DNA fragment, from which the sequence may be inferred.

Because electrophoresis, whether in an acrylamide or an agarose matrix, resolves nucleic acid fragments in inverse order of length (smaller fragments run faster through the gel matrix than larger ones), the dark autoradiographic bands on the film represent the 5'-3' DNA sequence when read from bottom to top.

Base calling: the banding pattern is interpreted relative to the four chemical reactions. For example, a band present in both the C-only and the C + T lanes is called a C, whereas a band present in the C + T lane but not in the C-only lane is called a T. The same decision process applies to the G-only and G + A lanes: a band in both is called a G, and a band in the G + A lane alone is called an A.
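The base-calling rules above can be expressed as a short Python sketch. It assumes, for illustration only, that each lane is recorded as the set of base positions (counted from the labelled 5' end) at which that reaction gave a band; real cleavage fragments end one nucleotide before the cleaved base, an offset the sketch ignores. The function names and the lane keys are hypothetical.

```python
def simulate_lanes(seq):
    """Idealized Maxam-Gilbert lanes for a known sequence (1-based positions)."""
    lanes = {"G": set(), "G+A": set(), "C+T": set(), "C": set()}
    for pos, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G"].add(pos)
            lanes["G+A"].add(pos)
        elif base == "A":
            lanes["G+A"].add(pos)
        elif base == "C":
            lanes["C+T"].add(pos)
            lanes["C"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)
    return lanes


def call_bases(lanes):
    """Apply the base-calling rules, reading from the shortest band (bottom) upward."""
    positions = sorted(set().union(*lanes.values()))
    calls = []
    for pos in positions:
        in_c, in_ct = pos in lanes["C"], pos in lanes["C+T"]
        in_g, in_ga = pos in lanes["G"], pos in lanes["G+A"]
        if in_c and in_ct:
            calls.append("C")
        elif in_ct:
            calls.append("T")  # band in the C + T lane only
        elif in_g and in_ga:
            calls.append("G")
        elif in_ga:
            calls.append("A")  # band in the G + A lane only
        else:
            calls.append("N")  # pattern that fits none of the rules
    return "".join(calls)


print(call_bases(simulate_lanes("GATCCTAG")))  # GATCCTAG
```

Simulating the lanes from a known sequence and calling them back is a quick way to check that the four rules are mutually consistent.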

Chain termination method:

At about the same time as Maxam-Gilbert DNA sequencing was being developed, Fred Sanger was developing an alternative method. Rather than using chemical cleavage reactions, Sanger opted for a method based on dideoxynucleotides: chain-terminating nucleotides whose dideoxyribose sugar lacks the 3'-OH group required to form a phosphodiester bond with the next nucleotide. Incorporation of a dideoxynucleotide therefore terminates DNA strand extension, resulting in DNA fragments of varying length. The method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labeled nucleotides, and the modified nucleotides that terminate DNA strand elongation.

The DNA sample is divided into four separate sequencing reactions, each containing the primer, all four standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. Only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP or ddTTP) is added to each reaction. Provided the ratio of the dideoxynucleotide to the corresponding deoxynucleotide is set properly, each reaction produces a population of fragments of different lengths that all end in the same dideoxynucleotide.
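Why the dideoxy-to-deoxy ratio matters can be illustrated with a simple probability model. The sketch below assumes that the polymerase incorporates a dideoxynucleotide and the corresponding deoxynucleotide with equal efficiency (real polymerases discriminate between them, so this is a simplification). With a ratio r of ddNTP to dNTP, a growing chain then stops at each matching site with probability p = r / (1 + r). The function name and its inputs are hypothetical.

```python
def termination_profile(sites, ratio):
    """Fraction of chains terminating at each site in a single-ddNTP reaction.

    sites: positions in the new strand where this reaction's base is incorporated.
    ratio: [ddNTP] / [dNTP] for that base; equal incorporation efficiency is assumed.
    Returns (list of (site, fraction), fraction of chains running past the last site).
    """
    p = ratio / (1.0 + ratio)  # chance that the dideoxy form is added at a site
    surviving = 1.0            # fraction of chains not yet terminated
    profile = []
    for site in sorted(sites):
        profile.append((site, surviving * p))
        surviving *= 1.0 - p
    return profile, surviving


# 100 hypothetical sites, one every 4 bases.
high, _ = termination_profile(range(4, 401, 4), ratio=1.0)
low, _ = termination_profile(range(4, 401, 4), ratio=0.01)
print("last-site fraction, high ratio:", high[-1][1])
print("last-site fraction, low ratio:", low[-1][1])
```

Under this model a high ratio puts nearly all the signal into the shortest fragments, while a very low ratio spreads it over long fragments but makes each band faint, which is why the ratio has to be set properly.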

The newly synthesized, labeled DNA fragments are heat-denatured and separated by size (with a resolution of a single nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image. A dark band in a lane indicates a DNA fragment that results from chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP or ddTTP). The relative positions of the bands across the four lanes are then used to read the DNA sequence from bottom to top.
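The reading step can be sketched in Python under a few illustrative assumptions: the template region to be copied is given 5'-3', every possible termination product appears as a band, and band length is counted from the 3' end of the primer. All names are hypothetical. Reading the bands from shortest to longest spells out the newly synthesized strand, and its reverse complement gives the template.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_lanes(template):
    """Idealized band lengths in the ddA, ddT, ddG and ddC lanes.

    template: region downstream of the primer site, written 5'->3'.
    The new strand is synthesized 5'->3' as its reverse complement.
    """
    new_strand = reverse_complement(template)
    lanes = {base: set() for base in "ATGC"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].add(length)  # chain stopped by the dideoxy form of `base`
    return lanes


def read_gel(lanes):
    """Read bands bottom (shortest) to top; return (new strand, template)."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    new_strand = "".join(base for _, base in bands)
    return new_strand, reverse_complement(new_strand)


lanes = sanger_lanes("AACGT")
print(lanes)            # {'A': {1}, 'T': {4, 5}, 'G': {3}, 'C': {2}}
print(read_gel(lanes))  # ('ACGTT', 'AACGT')
```

Note that the sequence read straight off the gel (`ACGTT`) is the complement of the template, not the template itself.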

Differences between Maxam-Gilbert and Sanger method:

1) Unlike the Maxam-Gilbert method, in which the G + A and C + T lanes each contain bands for two bases, every lane in Sanger's method is specific for a single base.

2) Detection by autoradiography is the same, but base calling is easier, since each band can be read directly from its lane.

3) In Sanger's method, the fragments on the gel are complementary to the template, not copies of it.

4) A major improvement ushered in by Sanger sequencing was the elimination of some dangerous chemicals, such as hydrazine.

5) Sanger's method is more efficient than the chemical cleavage method, because enzymatic processes acting on nucleic acids are more efficient than chemical ones.

Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation.

[Figure: sequencing gel with lanes A, G, T and C; sequence read: ATAGCGTAGCGTAGCGTAGCTAGCGATTAATTA]

Cycle sequencing

Cycle sequencing is a modification of the traditional Sanger sequencing method. The principles are the same as in Sanger sequencing: dideoxynucleotides are used in a polymerization reaction to create a nested set of DNA fragments, each with a dideoxynucleotide at its 3' terminus. The key difference is that cycle sequencing employs a thermostable DNA polymerase, which can be heated to 95 °C and still retain activity. With such a polymerase, the sequencing reaction can be repeated over and over in the same tube by heating the mixture to denature the DNA and then cooling it to anneal the primer and polymerize new strands. As a result, less template DNA is needed than for conventional sequencing reactions, and the repeated heating and cooling can be done in a DNA thermal cycler.
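Because cycle sequencing is normally run with a single primer, terminated products accumulate linearly with the number of cycles rather than exponentially as in PCR, where new strands also serve as templates. The arithmetic below is a sketch assuming a constant per-cycle extension efficiency; the function names and the template and cycle numbers are hypothetical.

```python
def cycle_sequencing_products(templates, cycles, efficiency=1.0):
    """Terminated fragments after linear (single-primer) cycling.

    Each cycle copies a fraction `efficiency` of the original templates only;
    the products are not copied again, because there is no second primer.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return templates * efficiency * cycles


def pcr_products(templates, cycles, efficiency=1.0):
    """For contrast: two-primer PCR, where new strands also act as templates."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return templates * (1.0 + efficiency) ** cycles


print(cycle_sequencing_products(templates=1_000, cycles=30))  # 30000.0
print(pcr_products(templates=1_000, cycles=30))
```

Even linear accumulation gives many times more signal per template molecule than a single round of extension, which is consistent with the smaller template requirement.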

Advantages:

• Works with both ssDNA and dsDNA templates, and thus eliminates the need for M13 phage (used to produce single-stranded templates)

• Requires only small amounts of template

• Can be set up in microtitre plates or microtubes

• Can use internal labeling with [α-32P], [α-33P] or [35S], or a 5'-end-labeled primer

• Can be adapted for rapid screening

High-throughput sequencing

The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, is now being used for the vast majority of sequencing projects.

Dye-terminator sequencing

Dye-terminator sequencing is a semi-automated system that labels the chain-terminating ddNTPs, which permits sequencing in a single reaction rather than the four reactions needed in the labelled-primer method. Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each emitting light at a different wavelength. Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay of automated sequencing.
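Because each terminator carries its own dye, every fragment in the single reaction can be assigned a base by asking which emission channel is brightest. The sketch below assumes the detector reports one four-channel intensity tuple per fragment, shortest fragment first; the channel order and the minimum-signal threshold are hypothetical, and real base callers also correct for spectral overlap between the dyes and for dye-dependent mobility shifts.

```python
# Hypothetical detector channel order: one fluorescence channel per dye-labelled ddNTP.
CHANNEL_BASES = ("A", "C", "G", "T")


def call_dye_terminator(peaks, min_signal=100.0):
    """Call one base per peak from four-channel fluorescence intensities.

    peaks: sequence of 4-tuples of intensities, ordered shortest fragment first.
    min_signal: hypothetical threshold below which a peak is reported as 'N'.
    """
    calls = []
    for intensities in peaks:
        if len(intensities) != len(CHANNEL_BASES):
            raise ValueError("expected one intensity per dye channel")
        brightest = max(range(len(CHANNEL_BASES)), key=lambda i: intensities[i])
        if intensities[brightest] < min_signal:
            calls.append("N")  # too weak to call confidently
        else:
            calls.append(CHANNEL_BASES[brightest])
    return "".join(calls)


peaks = [(900, 20, 10, 5), (15, 30, 40, 800), (20, 700, 35, 10), (30, 25, 40, 35)]
print(call_dye_terminator(peaks))  # ATCN
```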

Automation and sample preparation:

The most dramatic advance in sequencing, and the one that carried DNA sequencing into a high-throughput environment, was the introduction of automated sequencing using fluorescent labels. In 1986, Leroy Hood and colleagues reported a DNA sequencing method in which radioactive labels, autoradiography and manual base calling were all replaced by fluorescent labels, laser-induced fluorescence detection and computerized base calling. In their method, the primer was labeled with one of four different fluorescent dyes, and each labeled primer was placed in a separate sequencing reaction with one of the four dideoxynucleotides plus all four deoxynucleotides. Once the reactions were complete, the four reactions were pooled and run together in one lane of a polyacrylamide sequencing gel. A four-color laser-induced fluorescence detector scanned the gel as the reaction fragments migrated past, and the fluorescence signature of each fragment was sent to a computer, where software performed the base calling. This method was commercialized in 1987 by Applied Biosystems.

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run), in up to 24 runs a day. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks, which are generally located at the ends of the sequence. Widely used programs for estimating base-calling error rates are PHRED for slab-gel-based sequencing and LifeTrace for capillary sequencing.
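Phred reports each base call as a quality value Q, related to the estimated error probability P by Q = -10 log10(P), so Q = 20 corresponds to an estimated 1-in-100 chance that the call is wrong. The sketch below shows that conversion and a deliberately simple end-trimming rule. The trimming function and its threshold are hypothetical illustrations, not the algorithm used by PHRED, LifeTrace or any particular trimming package.

```python
import math


def phred_to_error(q):
    """Quality value Q -> estimated probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Error probability P -> quality value Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def trim_low_quality_ends(bases, quals, min_q=20):
    """Hypothetical simple trimmer: drop bases from each end until one reaches min_q."""
    if len(bases) != len(quals):
        raise ValueError("one quality value is needed per base")
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


print(phred_to_error(20))
print(trim_low_quality_ends("NNACGTN", [5, 10, 30, 35, 40, 25, 8]))
# ('ACGT', [30, 35, 40, 25])
```

Trimming only from the ends reflects the observation that low-quality peaks tend to cluster at the start and end of a trace.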

[Figure: chromatogram]

Capillary electrophoresis:

In the early 1990s, Harold Swerdlow and colleagues reported the use of capillaries to obtain DNA sequences. Capillaries are narrow (about 50 μm inner diameter), and they dissipate heat very efficiently because of their high surface-area-to-volume ratio. This means that a capillary-based system can be run at much higher voltages, dramatically lowering run times. Most importantly, capillary systems can be fully automated, which was a major limitation of gel-based systems (gel-based dye-terminator sequencing is only semi-automated, with automation limited mainly to base calling). Capillaries can be flushed out after a run and refilled for the next run without having to be handled (Gupta P K 2009).

DNA sequencing reactions can be carried out in a single reaction tube and prepared for loading once the reaction reagents have been filtered out. The sequencing reaction is loaded into the capillary and a constant electrical current is applied through it; the resolved fragments migrate past an optical window, where a laser excites the dye terminators, a detector collects the fluorescence emission wavelengths, and software interprets the emission wavelengths as nucleotides.
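The heat-dissipation argument can be checked with a little geometry. For a cylinder of radius r and length L, the ratio of wall area to volume is 2πrL / (πr²L) = 2/r; for a thin slab cooled through both faces it is about 2/t for thickness t. The comparison below uses the 50 μm capillary from the text and an assumed 0.4 mm slab-gel thickness, which is an illustrative value rather than a figure from the source.

```python
def capillary_sa_to_v(inner_diameter_m):
    """Wall area / volume for a cylindrical capillary: 2*pi*r*L / (pi*r**2*L) = 2/r."""
    return 2.0 / (inner_diameter_m / 2.0)


def slab_sa_to_v(thickness_m):
    """Face area / volume for a thin slab cooled on both faces (edges ignored): 2/t."""
    return 2.0 / thickness_m


capillary = capillary_sa_to_v(50e-6)  # 50 um inner diameter, from the text
slab = slab_sa_to_v(0.4e-3)           # assumed 0.4 mm slab gel, illustrative only
print(f"capillary: {capillary:.0f} per m, slab: {slab:.0f} per m, ratio: {capillary / slab:.0f}x")
# capillary: 80000 per m, slab: 5000 per m, ratio: 16x
```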

[Figure: schematic of a capillary sequencer (capillaries, detectors, output signal) and inside view of a sequencer (capillary tubes, sample tray, reagents)]

Alternative sequencing methods: (Primrose and Twyman 2003)

Pyrosequencing:

Pyrosequencing is a method of DNA sequencing based on the "sequencing by synthesis" principle. It differs from Sanger sequencing in that it relies on detecting the pyrophosphate released on nucleotide incorporation, rather than on chain termination with dideoxynucleotides. "Sequencing by synthesis" involves taking a single strand of the DNA to be sequenced and then synthesizing its complementary strand enzymatically. The template DNA is immobilized, and solutions of the A, C, G and T nucleotides are added sequentially, one at a time, and removed after each reaction. Inorganic pyrophosphate (PPi) is released as a result of nucleotide incorporation by the polymerase. The released PPi is subsequently converted to ATP by ATP sulfurylase, and the ATP provides the energy for luciferase to oxidize luciferin and generate light. Light is produced only when the added nucleotide complements the first unpaired base of the template. Because the identity of the added nucleotide is known, the sequence of the template can be determined.
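The flow-and-detect cycle can be simulated in a few lines. The sketch assumes an ideal, error-free run in which the light from each nucleotide addition is proportional to the number of identical bases incorporated (so a run of three Ts gives about three times the signal of a single T). The flow order "TACG" and the function names are hypothetical. It works on the sequence of the strand being synthesized, from which the template follows by complementarity.

```python
def pyrogram(new_strand, flow_order="TACG"):
    """Ideal signal for each nucleotide flow: (nucleotide, number incorporated)."""
    if set(new_strand) - set(flow_order):
        raise ValueError("strand contains bases that are never flowed")
    signals = []
    i = 0
    while i < len(new_strand):
        for base in flow_order:  # nucleotides are added one at a time, in a fixed order
            count = 0
            while i < len(new_strand) and new_strand[i] == base:
                count += 1  # a run of identical bases is extended in a single flow
                i += 1
            signals.append((base, count))  # count 0 means no light for this flow
    return signals


def decode(signals):
    """Recover the synthesized strand from the flow signals."""
    return "".join(base * count for base, count in signals)


signals = pyrogram("TTTAGC")
print(signals)
print(decode(signals))  # TTTAGC
```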

Sequencing by ligation

DNA ligase is an enzyme that joins together the ends of DNA molecules. Although it is commonly represented as joining two pairs of ends at once, as in the ligation of restriction-enzyme fragments, ligase can also join the ends on only one of the two strands. Sequencing by ligation relies on the sensitivity of DNA ligase to base-pairing mismatches. The target molecule to be sequenced is a single strand of unknown DNA sequence, flanked on at least one end by a known sequence. A short "anchor" strand is brought in to bind the known sequence. A mixed pool of probe oligonucleotides, eight or nine bases long, is then added; the probes are labeled (typically with fluorescent dyes) according to the base at the position to be sequenced. These molecules hybridize to the target DNA next to the anchor sequence, and DNA ligase preferentially joins a probe to the anchor when its bases match the unknown DNA sequence. From the fluorescence of the ligated probe, one can infer the identity of the nucleotide at this position in the unknown sequence.
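The logic of reading one position per ligation round can be sketched as an idealized model under stated assumptions: each probe carries a defined base only at the interrogated position (all other positions are treated as degenerate and pair with anything), the ligase joins only a probe that pairs correctly at that position, and strand orientation is ignored. The dye names and function names are hypothetical; commercial systems such as SOLiD use a different, two-base colour-coding scheme.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye assigned to probes carrying each base at the interrogated position.
DYE_FOR_PROBE_BASE = {"A": "dye-1", "C": "dye-2", "G": "dye-3", "T": "dye-4"}


def ligation_round(target, query_pos, probe_len=9):
    """Return the dye of the probe that ligates when interrogating `query_pos`.

    target: unknown bases immediately adjacent to the anchor, aligned with the probe.
    Probes are 'N' (degenerate) everywhere except query_pos.
    """
    for probe_base, dye in DYE_FOR_PROBE_BASE.items():
        probe = ["N"] * probe_len
        probe[query_pos] = probe_base
        # Idealized ligase: joins only if every defined probe base pairs with the target.
        if all(p == "N" or PAIRS[p] == t for p, t in zip(probe, target)):
            return dye
    return None  # no probe ligated


def sequence_by_ligation(target, probe_len=9):
    """Read up to probe_len bases next to the anchor, one position per round."""
    base_for_dye = {dye: PAIRS[b] for b, dye in DYE_FOR_PROBE_BASE.items()}
    reads = []
    for pos in range(min(probe_len, len(target))):
        dye = ligation_round(target, pos, probe_len)
        reads.append(base_for_dye[dye] if dye else "N")
    return "".join(reads)


print(sequence_by_ligation("GATTACAGT"))  # GATTACAGT
```

The dye identifies the probe's base, and complementarity converts it back into the target base at that position.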

Sequencing by hybridization

Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. A strong hybridization signal from a given spot on the array indicates that the spot's known sequence is present in the DNA being sequenced (Lizardi 2008).
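If the array carries every oligonucleotide of a given length k, the set of positive spots (the sequence's k-mer "spectrum") can in principle be assembled back into the sequence by chaining spots that overlap by k - 1 bases. The sketch below is a simplified, greedy version of that idea, not the method of the cited work: it assumes error-free signals and that no (k - 1)-base overlap occurs twice in the sequence, and it raises an error when the spectrum is ambiguous under those assumptions. The function names are hypothetical.

```python
def spectrum(seq, k):
    """Set of all k-mers in seq (the spots that would light up on an ideal array)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Greedily chain k-mers that overlap by k-1 bases into a single sequence."""
    kmers = set(kmers)
    if not kmers:
        raise ValueError("empty spectrum")
    k = len(next(iter(kmers)))
    if k < 2 or any(len(km) != k for km in kmers):
        raise ValueError("all k-mers must share one length k >= 2")
    suffixes = {km[1:] for km in kmers}
    starts = [km for km in kmers if km[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting k-mer")
    seq, used = starts[0], {starts[0]}
    while True:
        overlap = seq[-(k - 1):]
        nexts = [km for km in kmers - used if km[:-1] == overlap]
        if not nexts:
            break
        if len(nexts) > 1:
            raise ValueError("ambiguous extension after " + overlap)
        seq += nexts[0][-1]
        used.add(nexts[0])
    if used != kmers:
        raise ValueError("not all k-mers could be placed")
    return seq


print(reconstruct(spectrum("GATTACA", 3)))  # GATTACA
```

Repeated (k - 1)-base overlaps are exactly where this simple chaining breaks down, which is one reason longer probes are needed for longer targets.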

Next-generation sequencing methods (Hardiman 2008)

• Mass Spectrometric Sequencing

• Direct Visualization of Single DNA Molecules by Atomic Force Microscopy (AFM)

• Single Molecule Real Time (SMRT) Sequencing Techniques

• Readout of Cellular Gene Expression

• Use of DNA chips or microarrays

• Nanopore sequencing

Nanopore sequencing is based on the electrical perturbations generated by a single strand of DNA as it passes through a pore more than a thousand times smaller than the diameter of a human hair. Physicists have used mathematical calculations and computer modeling of the motions and electrical fluctuations of DNA molecules to determine how to distinguish each of the four bases (A, G, C, T) that make up a strand of DNA. They based their calculations on a pore about a nanometer in diameter made in silicon nitride and surrounded by two pairs of tiny gold electrodes. The electrodes would record the electrical current perpendicular to the DNA strand as the DNA passed through the pore. Because each DNA base is structurally and chemically different, each base creates its own distinct electronic signature (Lagerquist 2010).
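The idea that each base produces its own electronic signature suggests a simple way to call bases: compare each current reading with a reference level for each base and pick the closest. The reference values below are invented placeholders in arbitrary units, not measurements from the cited work, and the nearest-reference rule is a stand-in for whatever classifier a real device would use.

```python
# Hypothetical reference transverse currents (arbitrary units), one per base.
REFERENCE_CURRENT = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def call_bases_from_current(readings, reference=REFERENCE_CURRENT):
    """Assign each reading to the base whose reference current is closest."""
    calls = []
    for current in readings:
        nearest = min(reference, key=lambda base: abs(reference[base] - current))
        calls.append(nearest)
    return "".join(calls)


print(call_bases_from_current([1.1, 3.9, 2.2, 2.9]))  # ATCG
```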

Some commercial sequencers

• Roche 454 FLX pyrosequencer – pyrosequencing

• Illumina Genome Analyzer – sequencing by synthesis

• Applied Biosystems SOLiD sequencer – sequencing by ligation

• Helicos HeliScope – single-molecule sequencing by synthesis

• Pacific Biosciences SMRT – zero-mode waveguide principle

References:

Gupta P K 2009. Cell and Molecular Biology, 3rd edition. Rastogi Publications.

Hardiman G 2008. Ultra-high-throughput sequencing, microarray-based genomic selection and pharmacogenomics. Pharmacogenomics 9 (1): 5-9.

http://www.integratedDNAtechnologies.com

http://www.appliedbiosystems.com

http://www.biostudio.com

http://www.biologyanimations.com

Primrose S B and Twyman 2003. Principles of Genome Analysis and Genomics, 3rd edition. Blackwell Publishing.

Lagerquist J 2010. Nanopore based sequence specific detection of duplex DNA for genomic profiling. Nano Letters, April 2010 (online journal).

Lizardi P M 2008. A new hybridization based technique offers advantage in sequencing genomes. Nature Biotechnology 26: 648-650.