DNA sequencing - Caldwell-West Caldwell Public … 1980 Nobel Prize: Fred Sanger and Walter Gilbert...


DNA sequencing


Cloning Wolffia a. cDNA fragments into pTriplEX2

Determine the size of the insert by PCR and digests


Sequencing DNA:

• Rapid DNA sequencing methods were first developed in the mid-1970s.
• DNA sequencing has developed rapidly; many genomes are completely sequenced.

• 1995 bacterium H. influenzae 1.8 x 10^6 bp ~1,700 genes
• 1996 yeast Saccharomyces cerevisiae 12 x 10^6 bp ~6,000 genes
• 1998 nematode Caenorhabditis elegans 97 x 10^6 bp ~20,000 genes
• 2003 Human genome! 3 x 10^9 bp ~25,000 genes

Handling the Explosion of Sequence Data


GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

Dec, 2008

Currently >1.0 x 10^12 bases in 1.08 x 10^6 sequence records in the traditional GenBank divisions.

1.5 x 10^12 bases in 4.8 x 10^7 sequence records in the Whole Genome Sequencing (WGS) division.


Why are we sequencing these genomes?

• The information generated from these projects will serve as a blueprint for investigating the structure, function, and expression patterns of genes that are involved in various cellular processes (this was controversial at the time).

• A goal of this project is to make you familiar with genes and with searching nucleotide and protein databases.

Once a gene is sequenced, a lot of information can be determined about the gene:

• The number and location of all restriction sites, without restriction mapping.
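Locating restriction sites in a known sequence is a plain string search. The sketch below is only an illustration: the three enzymes and their recognition sites (EcoRI, BamHI, HindIII) and the short test sequence are choices made here, not taken from the slides.

```python
# Recognition sites for three common enzymes (chosen here for illustration).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(sequence, site):
    """Return the 1-based start position of every occurrence of site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        # Step forward by one base so overlapping occurrences are also found.
        start = sequence.find(site, start + 1)
    return positions


def restriction_map(sequence, enzymes=ENZYMES):
    """Map each enzyme name to the positions where its site occurs."""
    return {name: find_sites(sequence, site) for name, site in enzymes.items()}


# Made-up example sequence with one EcoRI site and one BamHI site.
example_map = restriction_map("AAGAATTCTTGGATCCAA")
print(example_map)
```

All three sites are palindromic, so searching one strand is enough for these enzymes.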


Once a gene is sequenced, a lot of information can be determined about the gene:

• After conceptual translation of the DNA sequence into protein sequence, possible similarities to other proteins.

AATTCGAGTTTGTG

Frame 1: ASN-SER-SER-LEU
Frame 2: ILE-ARG-VAL-CYS
Frame 3: PHE-GLU-PHE-VAL

• Structure predictions of the encoded protein based on the protein sequence.
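Conceptual translation in the three forward reading frames can be sketched as follows. This assumes the standard genetic code and reports one-letter amino acid codes; the function names are made up for this illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop."""
    dna = dna.upper()
    # Stop before any trailing partial codon.
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


for number, protein in enumerate(three_frames("AATTCGAGTTTGTG"), start=1):
    print(f"Frame {number}: {protein}")
```

A full search would also translate the three frames of the complementary strand, giving six frames in total.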


1980 Nobel Prize: Fred Sanger and Walter Gilbert each developed methods for DNA sequencing in the 1970s.

Gilbert (Chemical Method)

Sanger (Enzymatic Method)

Almost everyone uses Sanger's method (or variants thereof) today. New methods are being developed.

How does it work?

The fundamental idea behind both methods is the same.

One needs a known or fixed starting point on one end of the DNA to be sequenced.

DNA fragments are then generated that are random in length but end with a defined type of base: either A, G, C or T.

The random populations of DNA fragments are then separated using high-resolution gels or chromatography.

This gel system can separate fragments that differ by as little as one base in length.
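The logic of reading a sequence from such fragments can be sketched in a few lines. This is an idealized model, assuming every possible fragment is present exactly once and that fragments resolve perfectly by length; the function names are made up for the illustration.

```python
def fragment_ladder(sequence):
    """For each base, list the lengths of the fragments that end in that base.

    All fragments share the same fixed starting point, so a fragment is
    fully described by its length and its final base.
    """
    ladder = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence.upper(), start=1):
        ladder[base].append(length)
    return ladder


def read_ladder(ladder):
    """Recover the sequence by reading fragments from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in ladder.items() for length in lengths
    )
    return "".join(base for _, base in bands)


ladder = fragment_ladder("GATTACA")
print(ladder)               # fragment lengths grouped by terminal base
print(read_ladder(ladder))  # sequence read back from the ladder
```

Because fragments that differ by a single base can be told apart, sorting by length is enough to put the terminal bases in order.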


No one really knows why DNA synthesis can't start fresh (without adding onto something that's already there).

We will take advantage of this requirement.

Synthesis of the new strand goes in the opposite direction to the template strand!

[Figure: the new strand and the template strand, each labeled 5' to 3', running in opposite directions]
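Because the two strands run in opposite directions, writing the new strand 5' to 3' means complementing the template and then reversing it. A minimal sketch (the function name is chosen here for illustration):

```python
# Translation table that swaps each base for its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand):
    """Return the complementary strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


template = "ATGC"  # read 5' to 3'
print(reverse_complement(template))
```

Applying the function twice returns the original strand, which is a quick sanity check on any implementation.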



• There are A, C, G, and T deoxyribonucleotides (dA, dC, dG, dT).
• dNTP = deoxynucleotide triphosphate.
• The phosphates provide the energy necessary for DNA synthesis.

• A name for an elephant without a tail is ddNTP = dideoxynucleotide triphosphate (ddA, ddC, ddG, ddT).
• This nucleotide is missing a 3' -OH.
• Once incorporated into a DNA molecule, DNA synthesis stops.

p. 4-3


Dideoxy (Sanger) Sequencing

• DNA polymerase is used to synthesize a complementary single-stranded DNA from a template.

• Elongation occurs at the 3' end of a primer DNA that is annealed to "template" DNA.

• Overall chain growth is in the 5'-3' direction.

• dNTPs are added to the growing DNA chain until a ddNTP is added.

• When a ddNTP is incorporated at the 3' end of the growing primer chain, chain elongation is terminated at G, A, T, or C because the primer chain now lacks a 3' hydroxyl group.

Dideoxy (Sanger) Sequencing

• Necessary reagents for sequencing:
  – DNA polymerase
  – Primers
  – dNTPs
  – Buffer
  – Labeling reagent
  – Dideoxy nucleotides (ddNTPs)
  – Template DNA
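Chain termination can be illustrated with a small simulation. This is a toy model, not a description of real reaction kinetics: it assumes that at every position the incorporated nucleotide is a ddNTP with a fixed probability (`dd_fraction`, a parameter invented here), and that the primer sits immediately before the first template base.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5, dd_fraction, rng):
    """Grow a new strand 5'->3' along a template given in 3'->5' order.

    Returns (new_strand, terminated). A ddNTP lacks the 3' -OH, so once
    one is incorporated the chain cannot be extended any further.
    """
    new_strand = []
    for base in template_3to5:
        new_strand.append(COMPLEMENT[base])
        if rng.random() < dd_fraction:  # this nucleotide was a ddNTP
            return "".join(new_strand), True
    return "".join(new_strand), False


def run_reaction(template_3to5, n_strands=1000, dd_fraction=0.1, seed=0):
    """Simulate many independent extensions in one reaction tube."""
    rng = random.Random(seed)
    return [extend_primer(template_3to5, dd_fraction, rng) for _ in range(n_strands)]


fragments = run_reaction("TACGGATC", n_strands=200, dd_fraction=0.2, seed=1)
lengths = sorted({len(strand) for strand, terminated in fragments if terminated})
print(lengths)  # fragment lengths at which termination was observed
```

With enough strands, terminated fragments of every length tend to appear, which is what makes it possible to read each position.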


Dideoxy (Sanger) Sequencing

• Many individual strands will be replicated in each reaction tube using a thermal cycler.

• Across the population of strands, a ddNTP should be incorporated at every possible site within the newly synthesized strand.

• Newly synthesized strands will then be separated and analyzed by size using gel electrophoresis.


Protocol

• PCR reaction to amplify the DNA.
• DNA purification
  – Removes ddNTPs and other impurities
• Sequencing

(not in chapter)

Reading sequence the old way…


Fluorescent dye terminators and automated DNA sequencing

The figure on the right shows the action spectra of the four dyes that are normally linked to ddNTPs for automated DNA sequencing. Each dye fluoresces a different color when illuminated by a laser beam.

BASE  DYE     WAVELENGTH (nm)  ddNTP
A     dR6G    570              ddATP
G     dROX    620              ddTTP
C     dR110   540              ddCTP
T     dTAMRA  600              ddGTP

p. 4-3

• Since four different dyes are used, all the reactions can be done in a single tube, thus increasing throughput.

• Some of the new sequencing machines use a small column (capillary), which can be reused.

• Sensitive lasers are used to determine the 3' nucleotide of each successive fragment that migrates off the column.


Dideoxy (Sanger) Sequencing

• Sequencing machines analyze fluorescently labeled ddNTPs.
  – Fluorescently labeled (red, green, blue, yellow)
  – All reactions can be done in a single tube.
• A computer program analyzes and interprets the results.
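In spirit, the interpretation step assigns to each fragment the base whose dye gives the strongest signal. The sketch below is a heavily simplified stand-in for real base-calling software: it assumes the peaks have already been detected, that each detector channel has already been mapped to a base, and the intensity values are invented for the example.

```python
def call_bases(peaks, min_signal=0.0):
    """Call one base per peak by picking the strongest dye channel.

    peaks is a list of dicts mapping base -> fluorescence intensity,
    ordered by migration time (shortest fragment first). A peak whose
    strongest signal does not exceed min_signal is reported as 'N'.
    """
    calls = []
    for peak in peaks:
        base, signal = max(peak.items(), key=lambda item: item[1])
        calls.append(base if signal > min_signal else "N")
    return "".join(calls)


# Hypothetical intensities for four successive fragments.
trace = [
    {"A": 910, "C": 35, "G": 20, "T": 41},
    {"A": 28, "C": 30, "G": 15, "T": 870},
    {"A": 12, "C": 18, "G": 845, "T": 22},
    {"A": 40, "C": 52, "G": 47, "T": 49},  # weak, ambiguous peak
]
print(call_bases(trace, min_signal=100))
```

Real base callers also handle peak spacing, dye mobility shifts, and quality scores, none of which is modeled here.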


atttaccgtg ttggattgaa attatcttgc atgagccagc
tgatgagtat gatacagttt tccgtattaa taacgaacgg
ccggaaatag gatcccgatc atgattgctt caatattttc
acttcaatga ttggttctaa gcattcgaat gcgtacccgt
ttgattaata tttccatttc tgtcccagtt tttaattttc
atttcttttg gttaaaaaat tcccagtctc ttgaatgctt
ttctaaaatc tttaattcaa ttatttatta gaatcttctg
ttttgagaac attatcttgc atgagccagc tgatgagtat
gatacagttt

GenBank DNA sequence report

LOCUS       AB231879                1383 bp    mRNA    linear   INV 07-JUN-2006
DEFINITION  Artemia franciscana mRNA for zinc finger protein Af-Zic, complete cds.
ACCESSION   AB231879
VERSION     AB231879.1  GI:94966317
KEYWORDS    .
SOURCE      Artemia franciscana
  ORGANISM  Artemia franciscana
            Eukaryota; Metazoa; Arthropoda; Crustacea; Branchiopoda;
            Anostraca; Artemiidae; Artemia.
REFERENCE   1
  AUTHORS   Aruga,J., Kamiya,A., Takahashi,H., Fujimi,T.J., Shimizu,Y.,
            Ohkawa,K., Yazawa,S., Umesono,Y., Noguchi,H., Shimizu,T.,
            Saitou,N., Mikoshiba,K., Sakaki,Y., Agata,K. and Toyoda,A.
  TITLE     A wide-range phylogenetic analysis of Zic proteins: Implications
            for correlations between protein structure conservation and body
            plan complexity
  JOURNAL   Genomics 87 (6), 783-792 (2006)
   PUBMED   16574373
REFERENCE   2  (bases 1 to 1383)
  AUTHORS   Aruga,J. and Toyoda,A.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-AUG-2005) Jun Aruga, RIKEN Brain Science Institute,
            Laboratory for Comparative Neurogenesis; 2-1 Hirosawa, Wako-shi,
            Saitama 351-0198, Japan (E-mail:[email protected],
            URL:http://www.brain.riken.go.jp/labs/lcn/, Tel:81-48-467-9791,
            Fax:81-48-467-9792)
FEATURES             Location/Qualifiers
     source          1..1383
                     /organism="Artemia franciscana"
                     /mol_type="mRNA"
                     /db_xref="taxon:6661"
     gene            1..1383
                     /gene="Af-Zic"
     CDS             1..1383
                     /gene="Af-Zic"
                     /codon_start=1
                     /product="zinc finger protein Af-Zic"
                     /protein_id="BAE94140.1"
                     /db_xref="GI:94966318"
                     /translation="MTASLSASVMNPSFIKRESPASATALFVPNQFSAVPNFGFHHVP
                     SACATEQSSEMLNPFVDNHLRLNDQSNFQGYHHPHHGQIQQHHLGSYAARDFLFRRDM
                     GLGMGLEAHHTHAAQHHHMFDPSHAAAAAHHAMFTGFDHNTMRLPTEMYTRDASGYAA
                     QQFHQMGSMAPMAHPASAGAFLRYMRTPIKQELHCLWVDPEQPSPKKTCGKTFGSMHE
                     GKVFARSENLKIHKRTHTGEKPFKCEFEGCDRRFANSSDRKKHSHVHTSDKPYNCKVR
                     GCDKSYTHPSSLRKHMKVHGKSPPPASSGCDSDENESIADTNSDSAASPSPSSHDSSQ
                     VQVNHNRPPNHHNLGLGFTNPGHIGDWYVHQSAPDMPVPPATEHSPIGPPMHHPPNSL
                     NYFKTELVQN"
ORIGIN
        1 atgactgcta gtttaagtgc aagcgtgatg aatccaagtt ttataaagag ggaaagtcct
       61 gcatcggcta cagccctgtt cgtaccaaac caatttagtg cagtgcctaa ttttggattt
      121 caccatgttc ctagtgcttg tgcaactgag caaagtagtg aaatgctgaa cccttttgtg
(Note: the rest of the DNA sequence was deleted to save space)
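A report in this layout can be read by a program as well as by eye. The following is a deliberately minimal sketch, not a full GenBank parser: it assumes the usual flat-file layout (top-level keywords starting in the first column, sequence lines after ORIGIN, `//` ending the record) and ignores continuation lines and the feature table. The function name and the shortened sample record are made up for the example.

```python
def parse_genbank(text):
    """Return (fields, sequence) from the text of one GenBank flat-file record.

    fields maps each top-level keyword (LOCUS, ACCESSION, ...) to the list
    of values on its keyword lines; sequence is the ORIGIN block with the
    position numbers and spaces removed.
    """
    fields = {}
    sequence = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if in_origin:
            # Keep only the letters, dropping position numbers and spaces.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
        elif line[:1].isalpha():
            # Top-level keywords start in column one; indented lines are skipped.
            keyword, _, value = line.partition(" ")
            fields.setdefault(keyword, []).append(value.strip())
    return fields, "".join(sequence)


record = """LOCUS       AB231879                1383 bp    mRNA    linear   INV 07-JUN-2006
ACCESSION   AB231879
VERSION     AB231879.1  GI:94966317
ORIGIN
        1 atgactgcta gtttaagtgc aagcgtgatg
//
"""
fields, sequence = parse_genbank(record)
print(fields["ACCESSION"], len(sequence))
```

For real work an established parser is the safer choice, since the format has many details this sketch skips.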

Clone and contact information: the LOCUS through REFERENCE fields of the GenBank report for accession AB231879 (definition, source organism, authors, journal, and submitter address).

Annotations: the FEATURES table of the same report (source, gene, and CDS entries, including the protein translation).


General Databases:

NCBI       DNA and protein sequences (USA database)
EMBL       DNA sequences (European Molecular Biology Laboratory)
GenEMBL    GenBank and EMBL sequences combined
DDBJ       DNA sequences (Japan's equivalent of GenBank)
PIR        Protein Information Resource (protein sequences)
SwissProt  Protein sequences (Switzerland and EMBL)
Genpept    Translations of DNA based on authors' information
PDB        Coordinates for protein 3D structure (now maintained at Rutgers)

Organism Specific Databases:

Sanger     Worm sequence and genomic database
SGD        Saccharomyces Genome Database
YPD        Yeast Protein Database
WPD        Worm Protein Database
WormBase   C. elegans
FlyBase    Drosophila sequence and genetic database
Human      Many

DNA search programs

BLAST--basic local alignment search tool

BLASTn--you provide a nucleotide sequence; the program compares it and reports nucleotide similarity alignments.

BLASTp--you provide a protein sequence; the program compares it and reports protein similarity alignments.

BLASTx--you provide a nucleotide sequence; the program translates it in all six reading frames, then compares and reports protein similarity alignments.

All three of these programs will be used in this project.
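To give a feel for what "local alignment" means, here is a small scoring sketch. It is not how BLAST works internally: BLAST uses a fast heuristic, whereas this uses the exhaustive Smith-Waterman dynamic program, swapped in because it fits in a few lines. The scoring values (match, mismatch, gap) are arbitrary choices for the illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman).

    Scores never drop below zero, so an alignment may start and end
    anywhere in either sequence; that is what makes it local.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# The shared stretch ACGT is found even though the flanking bases differ.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

The same idea applies to protein sequences, where the match and mismatch values are replaced by a substitution matrix.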


Next Generation DNA Sequencing

• Traditional Sanger Sequencing
  – 700-1000 bp
  – 96 samples/run

• Roche 454
  – 200-400 bp
  – 1 million/run

• NextGen: SOLiD/Illumina short read sequencing
  – 25-50 bp
  – >300 million/run

[Figure: SOLiD System Overview, showing a genomic scaffold. © 2008 Applied Biosystems]
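The figures above can be turned into a rough bases-per-run comparison by multiplying read length by reads per run. This is back-of-the-envelope arithmetic only: the short-read count is stated as a lower bound (">300 million"), so 300 million is used here as a floor, and the dictionary layout is an assumption of this sketch.

```python
# Read length range (bp) and reads per run, as listed on the slide.
PLATFORMS = {
    "Sanger": {"read_bp": (700, 1000), "reads_per_run": 96},
    "Roche 454": {"read_bp": (200, 400), "reads_per_run": 1_000_000},
    "SOLiD/Illumina": {"read_bp": (25, 50), "reads_per_run": 300_000_000},
}


def bases_per_run(platform):
    """Return the (low, high) estimate of bases sequenced in one run."""
    low_bp, high_bp = platform["read_bp"]
    reads = platform["reads_per_run"]
    return low_bp * reads, high_bp * reads


for name, platform in PLATFORMS.items():
    low, high = bases_per_run(platform)
    print(f"{name}: {low:,} to {high:,} bases per run")
```

Shorter reads are more than offset by the far larger number of reads, which is why the short-read platforms produce the most sequence per run.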