L14 human genome

22
Human Genome Project Determine the entire sequence of the human genome. 3 billion base pairs Problem: It’s really big!

Transcript of L14 human genome

Page 1: L14 human genome

Human Genome Project

Determine the entire sequence of the human genome.

3 billion base pairs

Problem:

It’s really big!

Page 2: L14 human genome

Genome Sequencing

As of 6/ 25/ 04

1128 genome projects:

199 complete (includes 28 eukaryotes)

508 prokaryotic genomes in progress

421 eukaryotic genomes in progress

smallest: archaebacterium Nanoarchaeum equitans 500 kb

Bacillus anthracis (anthrax) 5228 kb

S. cerivisiae (yeast) 12,069 kb

Arabidopsis thaliana 115,428 kb

Drosophila melanogaster (fruit fly) 137,000 kb

Anopheles gambiae (malaria mosquito) 278,000 kb

Oryza sativa (rice) 420,000 kb

Mus musculus (mouse) 2,493,000 kb

Homo sapiens (human) 2,900,000 kb

http:// www. genomesonline. org/

1980 - $10/bp2001 - $0.1/bp

S. cerevisiae200xH. sapiens200xA. dubia

Page 4: L14 human genome

Human Genome Project timeline

E. coli Drosophila

C. elegans

Yeast

NRCRecommends

HGP

U.S.HGP

Begins

1990 1995 2000

Human Gene Map(16,000 genes)

Human Gene Map(30,181 genes)

Goal for Human Genetic Map

Exceeded

Physical MapCovers 98% of

Genome

Pilot Human Sequencing

Begins

Full-Scale Human Sequencing

Begins

Human draft

Phil Hieter

Page 5: L14 human genome

Completion of the genome

GenBank entries

double every 18 months

“Working Draft”

“Complete”4-5 coverage

9x coverage99.99 % acc

Page 6: L14 human genome

Completion of the genome

The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases.Human genome seems to encode only 20,000-25,000 protein-coding genes

International Human Genome Sequencing Consortium.Finishing the euchromatic sequence of the human genome.Nature 2004 Oct 21;431(7011):931-45.

Page 7: L14 human genome

Institutes that produced 85 % of the sequence

1.Whitehead Institute for Biomedical Research, Center for Genome Research, Cambridge, MA2. The Sanger Centre, Cambridge, UK3. Washington University Genome Sequencing Center, St. Louis, MI4. US Department of Energy, JGI, Walnut Creek, CA5. Baylor College of Medicine Human Genome Sequencing Center, Houston, TX

Countries: USA, UK, Japan, Germany, China, France

Page 8: L14 human genome

Genome SequencingGenome: 3 Gb

Cut genome into large pieces

Clone into BACs: 100 kb

Order based on sequence features (markers) = mapping

Cut again

Sequence AGAACAGGACGTATGTGGTTGTGGTTTTCTACTCC

CTACTCCTGTGTT

TTGTAAGTGAGAACAAssemble each BAC

…TTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTT…

Assemble entire sequence

Page 9: L14 human genome
Page 10: L14 human genome
Page 11: L14 human genome
Page 12: L14 human genome

What does the sequence mean?TCACAATTTAGACATCTAGTCTTCCACTTAAGCATATTTAGATTGTTTCCAGTTTTCAGCTTTTATGACTAAATCTTCTAAAATTGTTTTTCCCTAAATGTATATTTTAATTTGTCTCAGGAGTAGAATTTCTGAGTCATAAAGCGGTCATATGTATAAATTTTAGGTGCCTCATAGCTCTTCAAATAGTCATCCCATTTTATACATCCAGGCAATATATGAGAGTTCTTGGTGCTCCACATCTTAGCTAGGATTTGATGTCAACCAGTCTCTTTAATTTAGATATTCTAGTACATACAAAATAATACCTCAGTGTAACCTCTGTTTGTATTTCCCTTGATTAACTGATGCTGAGCACATCTTCATGTGCTTATTGACCATTAATTAGTCTTATTTGTTAAATGTCTCAAATATTTTATACAGTTTTACATTGTGTTATTCATTTTTTAAAAAATTCATTTTAGGTTATATGTATGTGTGTGTCAAAGTGTGTGTACATCTATTTGATATATGTATGTCTATATATTCTGGATACCATCTCTGTTTCATGCATTGCATATATATTTGCCTATTTAGTGGTTTATCTTTTCATTTTCTTTTGGTATCTTTTCATTAGAAATGTTATTTATTTTGAGTAAGTAACATTTAATATATTCTGTAACATTTAATGAATCATTTTATGTTATGTTTAGTATTAAATTTCTGAAAACATTCTATGTATTCTACTAGAATTGTCATAATTTTATCTTTTATATACATTGATATTTTTATGTCAAATATGTAGGTATGTGATATTATGCACATGGTTTTAATTCAGTTAATTGTTCTTCCAGATGTTTGTACCATTCCAACATCATTTAAATCATTAAATGAAAAGCCTTTCCTTACTAGCTAGCCAGCTTTGAAAATCCATTCATAGGGTTTGTGTTAATATATTTTTGTTCTTTTTTTTCCTTTCTACTGATCTCTTTATATTAATACCTACTGTGGCTTTATATGAAGTCATGGAATAATACGTAGTAAGCCCTCTAACACTGTTCTGTTACTGTTGTTATTGTTTTCTCAGGGTACTTTGAAATATTCGAGATTTTATTATTTTTTAGTAGCCTAGATTTCAAGATTGTTTTGACGATCAATTTTTGAATCAATTGTCAATATTTTTAGTAATAAAATGATGATTTTTGATTGGAAATACATTAAATCTATAAGCCAAATTGGAGATTATTGATATATTAACAAAAATGAGTTTTCCAGTCCATGAATGTATGCACATTATAAAATTCATTCTTAAGTATGTCATTTTTTAAGTTTTAGTTTCAGCAGTATATGTTTGTTACATAGGTAAACTCCTGTCATGGGGGTTAGTTGTACAGGTTATTTTATCATCCAGGCATAAAGCCCAGTACCCAGTAGTTATCTTTTCTGCTCCTCTCCCTCCTGTCACCCTCCACTCTCAAGTAGACCCCAGTTTCTGTTGTTCTCTTCTTTGCATTAATGACTTCTCATCATTTAGATTGCACTTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTTAGTTTGCTAAGGATAACCACCTCCATCTCCATCCATGTTCCCACAAAAGACATGATCTCCTTTTTTATGGCTGCATATTATTCCATGGTATATATGTACCACATTTTCTTTATCCAATCTGTCATTGATGGACATTTAGGTTGTTTCCACATCATTGCCGTTGTAAATACTGCTGCAGTGAATATTCGTGTGTATGTCTTTATGGTAGAATGATTTATATTCCTCTGGGTATATTTCCAAGTAATGGGATGGTTGGGTCAAATGGTAATTCTGCTTTTAGCTTTTTGAGGAATTGCCATATTGCCTTTCACAACGGTTGAACTAATTTATACTCCCAAGAGTGTATAAGTTGTTCCTTTTTCTCTGCAACCTCGACATCACCTGTTATTTATGACTTTTATATAATAGCCATTCTGCTGGTCTGAGATGGTATCTCATTATGATTTTGATTTGCATTTCTCTAATGCTCAGTGATATTGAGCTTGGCTGCATATATGTCTTCTTTTAAAAATATCTGTTCATGTCCTTTGCCTAATTTATAACGGGGTTGTTTGTTTTTCTCTTGTAAATTTGTTTAAGTTCCTTATAGATTCTAGGTATTAAACCTTTTTTCAGAGGCGTGGCTTGCAAATATTTTCTCCCATTCTATAGGTTGTCTGTTTATTCTGTTGATAGTTTCCCTTGCTGTGCAGAAGCTCTTAACTTTAATTAGATCCGACTTGTCAATTTTTGCTTTGGTCGCAATTGCTTTTGATGTTATTGTCGTGAAATCTTTGCTAGTTCTTAGGTCCAGGATGATATTGCCCAAGTTGTCTTCCAGGGCTTTTATAATTTTGGATTTTACATTTAAGTCTTAATATATTTATTAAATTTGTTAGGGTTTCAGGATACAAGGACAATATAGCAGCAAACAATGTAAAAGTAAAATCTGAAAAATAATAGAAAACAGTTTAATTGAACACTTTACCATTATGTAATGCCCTTCTTTGTCTTTCCTGATCTTTGTTGGTTTGAAGTTCAAAAAAGACAAACTTAATGGTACAATAGGTATTGTAGATTTCAGGACTTTCTGTATAAAATATTTTGTATATATGAATAGATCATTTTTTATTTCCAGTCTTTAAACATTTTCTTAACATTTTCTTCTATTGCTTCACTTCACTCGCTAGGACCATCAGGACAGTGTTGAACAGAAATTGTCAGACTGATCATCACAACTTTTTCTAGATTTTAGAAGGAAATTTTTCTTTATTTCAACATAAAGCAGCATGTTAATGCCAAGTTTTAATATGTGTTATCAGATTGAAATTTTTTTGTATATTTCTACATTACCAAGAATTTTTAGCAAGAGTTTTTGTTGAGTTTTAATTTAAAAATCATTTGTTAATTTCATCTGATTTTTTTATTTCTCTTTTTACCTTAAGAGATTAAACTGACTACAGATTGAATATAAACAAACAAACAAACAAACAAAAACTCTAAAATGCTGTGGATCAACACCACTTAGTAATTTGTATACTTGGATTCAATTTGCTGAAATTTTGTTAGACATTTTTGCGTCGATATTTATGAGGGATGTTGATCTGTAAAAGTATTAAAATGCCTTTGACAGATTTTGATAGCAGTGTTATTCTGGCCTAATAAATCAAACTGAGGTATGATCCTTCCTTTTCTATTTCTTAATAGCATTTTTAAAATTGGTGGTTTTTTCCTTCCTTAGTGAAATTTACCAGCAAAGTAACAGGCCTTATATTTCTCTTGTGGAAATATTTTAATTTCAAATTAATGGTATTTTGTTCTTGTAGGGTGGTAATTTTCTCTGTGTTTGGTCTTAATGGACTCTTAGCTGATCACCCAGTTACTCAGCGAGGTCTCTTCACTCTGGAAGAGCTGGAACTCCAGTGTGTTTTAGTGCAGCATGACCACGGGTATTACCGTTCAACATTTAGGCTTTATCAGTGATAACTATTTGTCCTCATGGAGTTTTTGCCGCTGGGCCTACACAGTTTAGGCTTCAGCTTAGAACACATAATGAATTCTTATGCAGATTTCTGCCCACCTTTGACCTTTCATGATTTCCTCTTCTTGGGTAAGCTGCCTTATTAATCTGATACACTTCAGCAGTCCAGAACTACACTCTTTCCCTTCTCTGCTCTTGGAGATGACTCTTTTGTCTGAGATTCACTTTGCTGTGCTGAAAAAGAAAAGTGCTTCAAGGAAGATACCAAGGAAAATCACAGGGCTCATTTATGTATTTCTCTTCTTTCAAGGACTACAGCTTTGTGTTGCCTATGTTCAATTTCTGAAAATAATTAGAGCATATATACTCTGTGTGAGAAGGCAAATCCAGACAGTTAGTTTGTATGACTAGAAGCAGAAGTCTACATGGAGAATTTTACTTAACTGTGTTATAGTTTCTTTAATTATTTCAAGAGTATGTTTAATGTTCCACAGATCTCATTCTATAAATCTTTATCATCTTAGAGCTCTGATACTATTTAGAATTACTATTCCTTCAAATAAGAGATTAGAAACAGGGTTATATTTGGGGTAGGTTGACTTACTTTTCTGGGAACCAAAGCATATTAAATTGACCAGTTTTAACACACTTCTATGTATGCACAAAGATATATATTTACATTCTGCAAAATCATTCTTTCCTTTTTGAATTTGAAAAGGATCTTTGGTATACAGATATTCAATAGCCAGCCTGAAGATTCATTTGAATTCATTTAATGTTTAGATTCACTACATGAAATGATCCAGAAGAGAGTACTCAAATATAAGTATCTATAACGATGGAAATATACATCTCCACTGCCCAAGATGGTAGTCATGAGTCAATATTGATCATGTGAGACGTGGCAAGTGTTACTCAGGGTCTCAATATTTAAATGTATTAAGCTTTAATTAATGTAAATTTGAATTTAGCAAAACATGTATAGCTTGTGGTTACTGTTTTATTCAGTGCCAATATAGAACATTTCCATGATTACAGAAAGTTATCTTAGAATACTCAGTTCTGGACTATTTTATCTGGCTAAATTAAATGTTAAAATATTACAAATTCATCTTCAGGCTGGCTGTTGAATATTTTTATAGCAAAAGTCATTTATAAATTTAAAACTCAAATAATTATCTTTTTCAATATGTAAAATATGTCTTTACATATTCTACTCCCTTCTTACATACATATTCTGATGTAACATAGGTATTCTCTTATTCATGCACACTGAAATGACAACATAAATAATTTTACTAAGTGTCACCATATAAAAAACTTTGAACAAAATCAGATTATATCACTGTGGATATTTCTATTTTGAACTAACTTAGATGATAATTTTAATCTATATCCTAGATGAACTTTAAATCAATAAAATCTCTCAATGGTGTTATAAATCTCAAGCCATTAGCCACTGATTATCCCATTTTTATTCTTTTCATATTAATTTTATTGCCATGTATGAATGCTGTAGCATCCATGTTTAAATACTAGTTAACAAAATGCACTGGCATCAGATACAATAAGGATGAAATGAGATATAATTAGGACTCTGGTAACACACATAAAATTGGAAAGATACCCTGAAATTCAAGCCAAGAAGATATTTATCCAGCTTATTTTATTTTGAGACAGAGTCTTGCTCTCTCACTCAGGCTGGAGTGCAGTGGACCATTCTAGGCTCGCTCCAACCTCTGTCTCCCAAATTGAAGTAATTCTCGTGCCTCAATCTCCCGAGTAGCTGGGATTACAGGCATGTGTCACCAAGCCTGGCTGATTTTTGTAGTTTTAGTAGAGACGGGGTTTCACCATGATGGCCAGGCTGGTCTTGAACTCCTGGCCTCAAGTGACTGGAACACCTCGGCCTCCTAAAGTGCTGGGATTACAGACGAGAGCCACTGAACAGCTTTGATCCAACTTATTTGGATGAATGAGTTACATATTTTACATTAAATCTGTTATTGTGATAATTCTTCATGTTATTTTCCATGTATAGATTTATATATAATGTAATTTTAATTTTTTTTCACCGGAGAGTATAAACAACAATTATTTTATAAACAGGATAATAAAAATAAGACAAAAATTGTTGAAATGTCTTCATTTGACTACTAACTTTTTACATGTTTGTTACTTTGAAGCTGTTATCAATACTTGTGATGTATTACAATTAAGTAAAGATTTAAAGATGCCATTTTTAACTTATTATGACACAAAGTCTATAAATTCTTATATTTTGAGATTTGTATTTAAATAACTTGTGAAATTTAATTTTAAAATAAAATTTCTTCTATGGATTGGTCTTCAATCGAGGCATAAAAAGGAATATAACAGTGTGGCACTATAACTTCTATATTGAATTTCTATATTATTTAACACAATTATAATTTTGCTAATGAATTGTAATGTTTTTAAAAAGCTAGGTGAATTTTATTAAATTCATTACATGGCGATAACACAGAGAAAACATTTTGGGGATTCTTTTAAAATGGTATGTACAAAAGCTTAAAAGTTGTTATGTAGTGGCAGAGATAAAAAAGTAAAACAAAAAAAAGCTTAAAAGTTTGCTTTACTATTTATAGGCTCATAAGTGTAAGTGTGCCAGAAAATGAAAAAGAAAGGAGAGAAATTATAAATAACTGTGTGGAAAACACAGATAAAGCATAAAGATAGAATATAAAGATAGAAGCATTTTAATATGAGGCAGTGATGGCTTTTTGAAGAATCCCAACTAAGGACCTACTTTTAGTTAATAAATAATATGTTTCTAATCCCTATATTGTCCACAGCAACCTTTTTAGGACATGGAGCAGTGACTATGAGTGCCAGAAGGCAAGAGTAGAAGCAATTGTAAAATCATGAACACTAGTTTGTAAAATCCTCACTGAGATATAATATCTGTTTGCCTCTACCTTAGAATTATTAATGTCTTGAGGGCTGGGA

A very small piece of chromosome 21

Page 13: L14 human genome
Page 14: L14 human genome

What’s in a genome?

Genes (i. e., protein coding)

But. . . only <2% of the human genome encodes proteins

Other than protein coding genes, what is there?

• genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)

• structural sequences (scaffold attachment regions)

Regulatory sequences

• “junk” (including transposons, retroviral insertions, etc.)

Page 15: L14 human genome
Page 16: L14 human genome
Page 17: L14 human genome
Page 18: L14 human genome

Genome overview

Marked variation in distribution of number of features (GC, CpG, repetitions) 20.000-25.000 protein coding genes Proteome is more complex than those of invertebrates Hundreds of genes resulted from horizontal transfer More than 1.4 million SNPs, 10 million SNPs

Page 19: L14 human genome

Application to Medicine and Biology

Disease genes – positional cloning (30 genes already) Paralogues of disease genes (achromatopsia, CNGA3, CNGB3); (971 known disease genes => 286 paralogues) Drug targets – recent compendium = 483 drug targets, 18 new identified; Alzheimer’s disease, -amyloid is generated by processing APP by BACE; BACE2 in obligatory Down’s syndrom region of chromosome 21 Basic biology – bitter taste - new family of G-protein coupled receptors

Page 20: L14 human genome

The next steps

Large scale identification of regulatory regionsSequencing of additional large genomesCompleting the catalogue of human variationSequence-based functional prediction

Page 21: L14 human genome

Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome.

Nature 2005 Sep 1;437(7055):69-87.

Thirty-five million single-nucleotide changes, five million insertion/deletion events, and various chromosomal rearrangements. 98,6 % identitity to human genome sequenceDifferences in gene/exon structures

Page 22: L14 human genome

Apparent differences between humans and great apes in the incidence or severity of medically important conditions (excluding differences explained by obvious anatomical differences).

Medical Condition Humans Great Apes

Definite

HIV progression to AIDS Common Very rare

Influenza A symptomatology Moderate to severe Mild

Hepatitis B/C late complications Moderate to severe Mild

P. falciparum malaria Susceptible Resistant

Menopause Universal Rare

Likely

E. coli K99 gastroenteritis Resistant Sensitive?

Alzheimer’s disease pathology Complete Incomplete

Coronary atherosclerosis Common Uncommon

Epithelial cancers Common Rare