Overview of Genome Sequencing Progress
![Page 1: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/1.jpg)
3/17/2003 Bioinformatics: Merging Biological and Computational Skills Eric Rouchka, D.Sc. University of Louisville
Overview of Genome Sequencing Progress
Eric C. Rouchka, D.Sc.
Bioinformatics Journal Club
October 1, 2003
![Page 2: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/2.jpg)
DNA Sequences
• DNA: double stranded helix
• Composed of 4 bases: A,C,G,T
• Genome: linear chain of bases
– Humans: 22 autosome pairs, 2 sex chromosomes, 3.2 billion bases
(Image source: http://www.ebi.ac.uk/)
![Page 3: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/3.jpg)
Double Helix
• Two complementary DNA strands form a stable DNA double helix
• A, T are complements; G, C are complements
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
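The base-pairing rule on this slide (A with T, G with C) can be sketched in a few lines of Python. The function name `reverse_complement` is an illustrative choice, not something from the slides, and the sketch assumes the input contains only the four unambiguous bases.

```python
# Pairing rule from the slide: A <-> T, G <-> C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, read in its own 5'->3' direction."""
    # The two strands run antiparallel, so the complement is read in reverse.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

print(reverse_complement("ATGC"))
```

Reversing the sequence reflects the antiparallel orientation of the two strands; dropping `reversed` would give the plain base-by-base complement instead.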
![Page 4: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/4.jpg)
Brief History of Sequencing
• Discovery of Complementary Bases
– Erwin Chargaff, 1950
• Discovery of DNA Double Helix
– 1953 (only 50 years ago)
– James Watson
– Francis Crick
– Rosalind Franklin
Image: www.simr.org.uk/pages/biotechnology/biotechnology_2.html
![Page 5: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/5.jpg)
Central Dogma
DNA → RNA → PROTEIN
![Page 6: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/6.jpg)
RNA
• Ribonucleic Acid
• Similar to DNA
• Thymine (T) is replaced by uracil (U)
• RNA can be:
– Single stranded
– Double stranded
– Hybridized with DNA
![Page 7: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/7.jpg)
RNA
• RNA is generally single stranded
• Forms secondary or tertiary structures
• Important in a variety of ways, including protein synthesis
![Page 8: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/8.jpg)
How Central Dogma Works
• DNA “transcribed” into single-stranded (SS) mRNA
• mRNA “translated” into protein using tRNA
– Triplet bases (codons) used to code amino acids
– 3 mRNA bases code one amino acid
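A minimal Python sketch of the two steps above, under some simplifying assumptions: the input is the coding (sense) DNA strand, translation starts at the first AUG, and it stops at the first in-frame stop codon. The names `transcribe`, `translate`, and `CODON_TABLE` are illustrative; the table is the standard genetic code written in the usual U, C, A, G ordering.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(coding_strand):
    """DNA coding strand -> mRNA: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("CCATGAAATGGTGA")
print(mrna, translate(mrna))
```

In the example, the mRNA is read as AUG AAA UGG UGA, giving methionine, lysine, and tryptophan before the stop codon.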
![Page 9: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/9.jpg)
History Of Genetic Code
• Genetic Code Completely uncovered (1965)
– Marshall Nirenberg
![Page 10: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/10.jpg)
Genetic Code
• 4 possible bases (A, C, G, U)
• 4 * 4 * 4 = 64 possible codon sequences
• Start codon: AUG
• Stop codons: UAA, UAG, UGA
• 61 codons code for amino acids (including AUG, which also codes for methionine)
• 20 amino acids – redundancy in genetic code
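The counts on this slide can be checked by enumerating every triplet. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
from itertools import product

# All triplets over the four RNA bases: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

STOP_CODONS = {"UAA", "UAG", "UGA"}
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))             # total codons
print(len(sense_codons))       # codons that specify an amino acid
print("AUG" in sense_codons)   # the start codon is also a sense codon
print(len(sense_codons) / 20)  # average codons per amino acid
```

With 61 sense codons shared among 20 amino acids, each amino acid has about three codons on average, which is the redundancy the last bullet refers to.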
![Page 11: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/11.jpg)
20 Amino Acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)
• START: AUG
• STOP: UAA, UAG, UGA
![Page 12: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/12.jpg)
tRNA structure
http://www.tulane.edu/~biochem/nolan/lectures/rna/frames/trnabtx2.htm
![Page 13: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/13.jpg)
Protein Structure
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
![Page 14: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/14.jpg)
Brief History of Sequencing
• First Protein Sequence
– ~1955 Bovine Insulin (Fred Sanger)
• First Nucleic Acid Sequence
– ~1965 yeast alanine tRNA (77 bases)
• Development of DNA sequencing
– Maxam-Gilbert and Sanger Methods (1977)
![Page 15: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/15.jpg)
Sanger Sequencing Method
• (Quicktime Movie)
• SOURCE: Molecular Cell Biology
![Page 16: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/16.jpg)
Improving Sanger’s Method
• Dideoxynucleotides fluorescently labeled (1986)
– Reaction cut by ¼
• Sequencing Automated by machine (1986)
• Laser detects fluorescence
![Page 17: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/17.jpg)
Image Source: plantbio.berkeley.edu/~bruns/tour3.html
![Page 18: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/18.jpg)
![Page 19: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/19.jpg)
Genetic Mapping
• Sex-linked genes studied since early 1900s
• Gene mapping takes off in late 1970s
– David Botstein (RFLPs 1978)
• 1979: 579 Genes Mapped
• 2003: ~30,000 Genes Mapped
• Mapping of Huntington’s Disease (first disease gene mapped)
– Triplet repeat
– 1983
– Nancy Wexler
![Page 20: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/20.jpg)
Mapping of Markers
• Sequence Tagged Sites (STS)
– Sequences occurring only once in the human genome
– Help to map locations
– 52,000 STS in Humans
– ~1 every 62,000 bases
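The marker density quoted above follows from dividing the genome size by the number of markers. A quick check in Python, assuming the 3.2 billion base genome size given on the earlier DNA Sequences slide and evenly spaced markers:

```python
GENOME_BASES = 3.2e9   # human genome size from the "DNA Sequences" slide
STS_COUNT = 52_000     # number of human STS markers from this slide

# Average distance between markers if they were spread evenly.
average_spacing = GENOME_BASES / STS_COUNT
print(f"about one STS every {average_spacing:,.0f} bases")
```

Real markers are not evenly spaced, so this is an average rather than a guaranteed resolution.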
![Page 21: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/21.jpg)
Cloning Techniques
• Plasmid Cloning Introduced (1973)
– Region of Interest duplicated by inclusion
• YACs (yeast artificial chromosomes) described (1987)
• BACs (bacterial artificial chromosomes) introduced (1992)
• 30,000 to 100,000 bases can be cloned
(Figure: plasmid map of pACYC177, 3941 bp, showing the KN(R) and AP r markers, the P15A origin, the TN3 and TN903 inverted repeats, and restriction sites for Apa LI, Bam HI, Cla I, Hin dIII, Pst I, Sma I, Xma I, and Ava I)
![Page 22: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/22.jpg)
Hierarchical (Clone-based) Approach
• Know location of 30,000 – 100,000 bp region
• Break into 500-700 bp fragments
• Sequence Fragments
• Assemble based on similarity
• ~8-10x coverage
• Current Price: $0.09 / base
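To make the coverage figure concrete, the sketch below estimates how many reads a single clone needs. Coverage is taken here as (number of reads × read length) / clone length. The second function uses the idealized Lander-Waterman approximation (reads placed uniformly at random), which is an assumption not stated on the slide. Function names are illustrative.

```python
import math

def reads_needed(clone_length, read_length, coverage):
    """Reads required so that total sequenced bases = coverage x clone length."""
    return math.ceil(coverage * clone_length / read_length)

def expected_uncovered_fraction(coverage):
    """Idealized model: fraction of bases hit by no read is about e^(-coverage)."""
    return math.exp(-coverage)

# A 100,000 bp clone, 600 bp reads, 10x coverage (values within the slide's ranges).
print(reads_needed(100_000, 600, 10))
print(expected_uncovered_fraction(8))
```

Under this model, 8x coverage already leaves well under 0.1% of bases unsampled, which is one way to see why 8-10x is a common target.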
![Page 23: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/23.jpg)
Hierarchical (clone-based) approach
• generate overlapping set of clones
• select a minimum tiling path
• shotgun sequence each clone
![Page 24: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/24.jpg)
Hierarchical (clone-based) approach
• MINUS
– map generation requires resources, time and money
– Some regions not cloned
• PLUS
– easier to assemble smaller pieces
– less chance for assembly error
![Page 25: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/25.jpg)
Shotgun Sequencing Approach
• Developed 1991 (TIGR)
– Craig Venter, Hamilton Smith
• Break genome into millions of pieces
– Sequence each piece
– Reassemble into full genomes
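The "reassemble" step can be illustrated with a toy greedy overlap merger: repeatedly join the two reads with the longest suffix-to-prefix overlap. This is a teaching sketch only, not the method used by any particular assembler; it assumes error-free reads from one strand, no repeats, and no read fully contained in another. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlap of min_len or more remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # remaining reads form separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]))
```

Comparing every pair of reads on every round is far too slow for millions of reads, and a repeated sequence would let the greedy choice join the wrong pieces, which is where the computational cost and assembly errors of shotgun approaches come from.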
![Page 26: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/26.jpg)
Whole Genome Shotgun Approach
• reads generated directly from a whole-genome library
• assemble the genome all at once
![Page 27: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/27.jpg)
Whole Genome Shotgun Approach
• MINUS
– more prone to assembly error
– computationally intensive
– cannot effectively handle repeats
• PLUS
– Less overhead time up front
![Page 28: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/28.jpg)
Base calling and Assembly Software
• PHRED and PHRAP Developed (1998)
– PHRED: Base calling software
– PHRAP: Assists in assembly of sequenced data
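Base callers in the PHRED tradition attach a quality score to each called base, defined as Q = -10 · log10(P), where P is the estimated probability that the call is wrong. The slide does not spell this out, so the sketch below is added context; the function names are illustrative.

```python
import math

def phred_quality(p_error):
    """Quality score for an estimated base-call error probability."""
    return -10 * math.log10(p_error)

def error_probability(quality):
    """Inverse mapping: error probability implied by a quality score."""
    return 10 ** (-quality / 10)

print(phred_quality(0.001))   # a 1-in-1000 error chance
print(error_probability(20))  # a Q20 base
```

Each 10-point increase in Q corresponds to a tenfold drop in the chance of a miscalled base.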
![Page 29: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/29.jpg)
Available Assemblers
• SEQAID (Peltola et al., 1984)
• CAP (Huang, 1992)
• PHRAP (Green, 1994)
• TIGR Assembler (Sutton et al., 1995)
• AMASS (Kim et al., 1999)
• CAP3 (Huang and Madan, 1999)
• Celera Assembler (Myers et al., 2000)
• EULER (Pevzner et al., 2001)
• ARACHNE (Batzoglou et al., 2002)
![Page 30: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/30.jpg)
History of Genome Projects
• First Genome Sequence
– ΦX174 Phage 5,386 bp; 9 proteins (1977)
• Haemophilus influenzae Sequenced
– First non-viral genome (1.8 Mb) (1995)
![Page 31: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/31.jpg)
History of Genome Projects
• Saccharomyces cerevisiae sequenced
– First eukaryotic genome (12.1 Mb) (1996)
• Caenorhabditis elegans sequence released
– First animal genome, ~100 Mb (1998)
![Page 32: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/32.jpg)
History of Genome Projects
• Arabidopsis thaliana sequence released
– First publicly available plant genome (1999)
• Rough Draft of Human Genome Reported (2001)
– “Finished” 2003
![Page 33: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/33.jpg)
Human Genome Project
• Began in 1990 (US DOE – 15 years)
– Identify all genes in human DNA
– Determine sequence of human genome
– Develop faster sequencing technologies
– Develop tools for data analysis
– ELSI (ethical, legal, and social issues)
![Page 34: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/34.jpg)
Microbial Genomes
• 122 Complete Genomes in CMR
– http://www.tigr.org/tigr-scripts/CMR2/CMR_Content.spl
![Page 35: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/35.jpg)
Genomes
– Fruit Fly
– Mouse
– Rat
– Rice
– Zebra fish
– Puffer fish
– Chicken
– Dog
– Frog
![Page 36: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/36.jpg)
Growth of GenBank
• 1982: 600,000 Bases
• 2002: 28.5 Billion Bases
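The two data points above imply a doubling time, which can be worked out under the simplifying assumption of steady exponential growth between 1982 and 2002 (real growth was not perfectly uniform):

```python
import math

BASES_1982 = 600_000
BASES_2002 = 28.5e9
years = 2002 - 1982

# Number of times the database doubled, and the average time per doubling.
doublings = math.log2(BASES_2002 / BASES_1982)
doubling_time_years = years / doublings

print(f"{doublings:.1f} doublings, one every {doubling_time_years:.2f} years")
```

That is roughly 15 to 16 doublings in 20 years, or one doubling about every 15 to 16 months.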
![Page 37: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/37.jpg)
Other Notables
• Dayhoff ATLAS Database of Proteins (1960s)
• Sequence Comparison Algorithms
– 1970, Needleman-Wunsch (global alignment)
• Protein Databank
– Brookhaven PDB (1973)
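For reference, the Needleman-Wunsch global alignment mentioned above is a dynamic program over prefixes of the two sequences. The sketch below computes only the optimal score, keeping two rows of the table at a time. The scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions, not parameters from the slides.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(needleman_wunsch_score("ACGT", "AGT"))
```

Because the alignment is global, both sequences must be aligned end to end, so the missing C in the example costs one gap against three matches.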
![Page 38: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/38.jpg)
Other Notables
• NMR for protein structure identification (1980)
• IntelliGenetics Founded
– DNA and Protein sequence analysis (1980)
![Page 39: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/39.jpg)
Other Notables
• Smith-Waterman algorithm
– Local sequence alignment (1981)
• GenBank Database created (1982)
• Genetics Computer Group Founded
– GCG suite (1982)
• PCR First Described (1985)
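The Smith-Waterman local alignment listed on this slide uses the same kind of dynamic program as global alignment, with two changes: cell scores are never allowed to drop below zero, and the answer is the best cell anywhere in the table. The sketch below returns only the best local score; the scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between any substring of a and any substring of b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

In the example the shared ACG region is found even though the flanking sequence does not match, which is what makes local alignment suitable for database searching.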
![Page 40: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/40.jpg)
Other Notables
• FASTP Algorithm
– Protein database searching (1985)
• SWISS-PROT
– Protein Database (1986)
![Page 41: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/41.jpg)
Other Notables
• PERL Programming Language
– Allows for sequence manipulation (1987)
• NCBI Established (1988)
• Human Genome Initiative (1988)
![Page 42: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/42.jpg)
Other Notables
• FASTA Program released (1988)
– DNA and Protein sequence database searches
• BLAST Program released (1990)
– Allows for quick database searches
• Informax Founded (1990)
• Human Genome Project Begins (1990)
![Page 43: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/43.jpg)
Other Notables
• Creation and Use of ESTs Described (1991)
• Incyte Pharmaceuticals Founded (1991)
• TIGR Established (1992)
– Shotgun sequencing methods
![Page 44: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/44.jpg)
Other Notables
• Affymetrix founded (1993)
• PRINTS protein motif database (1994)
![Page 45: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/45.jpg)
Other Notables
• First Commercial Microarray chips produced (1996)
• Dolly Cloned (1997)
• Capillary Sequencing machines introduced (1997)
![Page 46: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/46.jpg)
Other Notables
• Celera Genomics Formed (1998)
![Page 47: Overview of Genome Sequencing Progress](https://reader036.fdocuments.us/reader036/viewer/2022062423/56814e4d550346895dbbd80d/html5/thumbnails/47.jpg)
More Detailed Histories
http://www.netsci.org/Science/Bioinform/feature06.html
http://www.dhgp.de/intro/history/history.html