The Human Genome Project-Babak Nami

  • 8/3/2019 The Human Genome Project-Babak Nami

    The Human Genome Project
    Babak Nami
    Department of Medical Genetics
    Selçuk University

    Human Genome

    The human genome is the genome of Homo sapiens, which is stored on 23 chromosome pairs.

    22 of these are autosomal chromosome pairs, while the remaining pair is sex-determining.

    The haploid human genome comprises about 3 billion DNA base pairs.

    The haploid human genome contains ca. 23,000 protein-coding genes, far fewer than had been expected before its sequencing.

    In fact, only about 1.5% of the genome codes for proteins, while the rest consists of non-coding RNA genes, regulatory sequences, introns, and noncoding DNA (once known as "junk DNA").

    Human Genome

    Information content of the haploid human genome by chromosome:

    Haploid means we only count one of each chromosome pair. For this reason, the total information content for a woman (XX) is less than for a man (XY), where both the X and the Y are counted.

    How much data make up the human genome?

    3 bookcases with 40 books per bookcase x 5000 pages per book x 5000 bases per page = 3,000,000,000 bases!
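    The bookcase arithmetic can be checked with a short sketch. The four figures come from the slide; the constant and function names are illustrative, and the 2-bits-per-base storage estimate is an added assumption (four possible bases, no compression), not something the slides state.

```python
# Figures taken from the slide; names are illustrative.
BOOKCASES = 3
BOOKS_PER_BOOKCASE = 40
PAGES_PER_BOOK = 5000
BASES_PER_PAGE = 5000


def total_bases() -> int:
    """Multiply out the bookcase analogy to get the number of bases."""
    return BOOKCASES * BOOKS_PER_BOOKCASE * PAGES_PER_BOOK * BASES_PER_PAGE


def size_in_megabytes(bases: int, bits_per_base: int = 2) -> float:
    """Storage needed if each base is packed into bits_per_base bits.

    The default of 2 bits is an assumption: A, C, G and T are four symbols.
    """
    return bases * bits_per_base / 8 / 1_000_000


if __name__ == "__main__":
    n = total_bases()
    print(f"{n:,} bases")
    print(f"{size_in_megabytes(n):,.0f} MB at 2 bits per base")
```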

    Human Genome Project

    The Human Genome Project (HGP) is an international scientific research project with a primary goal of determining the sequence of chemical base pairs which make up DNA and to identify and map the approximately 20,000-30,000 genes of the human genome from both a physical and functional standpoint.

    History

    1985. Proposed.

    1988. Initiated and funded by NIH and US Dept. of Energy ($3 billion set aside).

    1990. Work begins.

    ... project early

    2001. Published in Science and Nature in February.

    2002. The quest for genome sequencing was being pursued simultaneously in over 20 laboratories in six countries.

    2003. The whole genome sequenced.

    History

    Initiative Office of HGP

    Human Genome Project Goals and Completion Dates

    [Figure: timeline, 1990 to 2005]

    1994: Genetic map, 1 cM resolution (3,000 markers)

    1998: Physical map (3,000 markers)

    2003: DNA sequence (99% of gene-containing ...)

    2003: 3 million mapped SNPs

    2003: 15,000 full-length cDNAs

    NIH put the human genome sequence on the web July 7, 2000.

    Cyber geeks searched for hidden messages, and ...

    UCSC put the human genome sequence on CD in October 2000, with varying results.

    The first printout of the human genome to be presented as a series of books, displayed at the Wellcome Collection, London

    Goals of the Project

    identify all the approximately 30,000 genes in human DNA,

    determine the sequences of the 3 billion chemical base pairs that make up human DNA,

    store this information in databases,

    improve tools for data analysis,

    transfer related technologies to the private sector, and

    address the ethical, legal, and social issues (ELSI) that may arise from the project.

    In Biotechnology

    Production of useful protein products for use in medicine, agriculture, bioremediation and pharmaceutical industries.

    Protein replacement (factor VIII, TPA, streptokinase, insulin, interferon)

    BT insecticide toxin (from Bacillus thuringiensis)

    Herbicide resistance (glyphosate resistance)

    Bioengineered foods

    Pharm animals

    Proteomics

    Investigates patterns and levels of gene expression in diseased cells that can be ... profiles.

    DNA Chip Technology

    In Pharmacogenomics

    Investigates DNA mutations associated with disease susceptibility and drug sensitivities.

    Prodrug gene therapy for cancers

    In Developmental Biology

    Regulation of embryonic development.

    Regulation of the aging process.

    Regulation of metabolism.

    Evolutionary and Comparative Biologists

    Because DNA mutates at a constant rate, comparisons of DNA between different ...

    Human Genome Sequence Variation

    Develop technologies for rapid, large-scale identification and scoring of single-nucleotide polymorphisms and other DNA sequence variants.

    Identify common variants in the coding regions of the majority of identified genes during this 5-year period.

    Create a SNP map of at least 100,000 markers.

    Develop the intellectual foundations for studies of sequence variation.

    Create public resources of DNA samples and cell lines.

    Model organisms

    Bacteria (E. coli, H. influenzae, several others)

    Yeast (Saccharomyces cerevisiae)

    Plant (Arabidopsis thaliana)

    Roundworm (Caenorhabditis elegans)

    Fruit fly (Drosophila melanogaster)

    Mouse (Mus musculus)

    How does the human genome stack up?

    Organism                              Genome Size (Bases)   Estimated Genes
    Human (Homo sapiens)                  3 billion             30,000
    Laboratory mouse (M. musculus)        2.6 billion           30,000
    Mustard weed (A. thaliana)            100 million           25,000
    Roundworm (C. elegans)                97 million            19,000
    Fruit fly (D. melanogaster)           137 million           13,000
    Yeast (S. cerevisiae)                 12.1 million          6,000
    Bacterium (E. coli)                   4.6 million           3,200
    Human immunodeficiency virus (HIV)    9700                  9
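    One way to read the comparison is as gene density: genome size divided by estimated gene count. The sketch below uses only the figures from the table; the list name and function name are illustrative, and "bases per gene" is a rough average that says nothing about how genes are actually spaced.

```python
# (organism, genome size in bases, estimated genes), copied from the table.
GENOMES = [
    ("Human (H. sapiens)", 3_000_000_000, 30_000),
    ("Laboratory mouse (M. musculus)", 2_600_000_000, 30_000),
    ("Mustard weed (A. thaliana)", 100_000_000, 25_000),
    ("Roundworm (C. elegans)", 97_000_000, 19_000),
    ("Fruit fly (D. melanogaster)", 137_000_000, 13_000),
    ("Yeast (S. cerevisiae)", 12_100_000, 6_000),
    ("Bacterium (E. coli)", 4_600_000, 3_200),
    ("HIV", 9_700, 9),
]


def bases_per_gene(genome_size: int, genes: int) -> float:
    """Average number of bases of genome per estimated gene."""
    return genome_size / genes


if __name__ == "__main__":
    for name, size, genes in GENOMES:
        print(f"{name:32s} {bases_per_gene(size, genes):>12,.0f} bases per gene")
```

    On these numbers the human genome carries far more sequence per gene than the bacterial or yeast genomes, which is consistent with the small protein-coding fraction quoted elsewhere in the deck.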

    AAGTTC CTAAGC ATTCGG

    AAGTTC CTAAGC

    AAGTTC

    Practical Goals

    http://www.genome.gov/Pages/News/PaceofDiseaseGeneDiscovery.pdf

    Sequencing Strategy

    Once a contig map of the genome was obtained, it was necessary to sequence each individual clone.

    Most of the actual human genome sequencing was done on BAC clones, which are less prone to rearrangement than YAC clones. BACs are about 100-200 kbp long.

    Shotgun sequencing: The large cloned DNA is randomly broken up into a series of small fragments (less than 1 kb). These fragments are cloned and sequenced. A computer program then assembles them based on overlaps between the sequences of each clone.

    To ensure that every bit has been covered, you need to sequence random clones until you have covered each spot 5-10 times on average.
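    The overlap-based assembly step can be illustrated with a toy greedy assembler. This is a minimal sketch, not the software used by the project: it assumes error-free reads, a single strand, and no repeats, and the function names and the `min_len` threshold are illustrative choices.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the remaining contigs once no pair overlaps by min_len or more.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    # Three overlapping fragments of a short made-up sequence.
    fragments = ["AGCATTCGG", "AAGTTCCTA", "TCCTAAGCAT"]
    print(greedy_assemble(fragments))
```

    Greedy merging is easy to follow but can misassemble repeated sequence, which is one reason real genomes need high coverage and a prior clone map.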

    Sequencing: BAC-based method

    Each clone 150,000-200,000 bp

    Cloned in bacteria

    BAC clones, mapped clones

    Subclones

    Subclones 2,000 bp

    Sequenced 10 times in 500-800 bp segments

    Subclone sequences re-assembled
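    The figures on this slide imply a rough read count per BAC. The sketch below assumes reads are placed uniformly at random; the Poisson estimate of the uncovered fraction (about e to the minus coverage) is a standard approximation added here for illustration, and the function names are hypothetical.

```python
import math


def reads_needed(target_bp: int, read_len_bp: int, coverage: float) -> int:
    """Number of reads so that total read length is coverage x target length."""
    return math.ceil(coverage * target_bp / read_len_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson approximation: chance that a given base is hit by no read."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # A 150,000 bp BAC read in 500 bp segments at 10-fold coverage.
    print(reads_needed(150_000, 500, 10))
    for c in (1, 5, 10):
        print(c, f"{expected_uncovered_fraction(c):.5f}")
```

    Under these assumptions, going from 5-fold to 10-fold coverage shrinks the expected gaps by more than a factor of 100, which is why redundant sequencing is worth the cost.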

    Sequencing Technologies

    The two basic sequencing approaches, Maxam-Gilbert and Sanger, differ primarily in the way the nested DNA fragments are produced.

    Maxam-Gilbert sequencing (also called the chemical degradation method) uses chemicals to cleave ... lengths. A refinement to the Maxam-Gilbert method known as multiplex sequencing enables investigators to analyze about 40 clones on a single DNA sequencing gel.

    Sanger sequencing (also called the chain termination or dideoxy method) involves using an enzymatic procedure to synthesize DNA chains of varying length in four different reactions, stopping the DNA replication at positions occupied by one of the four bases, and then determining the resulting fragment lengths.
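    The logic of reading a sequence from fragment lengths in the Sanger method can be sketched in a few lines. This is an idealized model, assuming every possible termination position is represented and lengths are measured exactly; the function names are illustrative, and the sequence read is that of the newly synthesized strand.

```python
BASES = "ACGT"


def termination_lengths(synthesized: str) -> dict[str, list[int]]:
    """For each of the four reactions, the chain lengths that end at that base."""
    lengths: dict[str, list[int]] = {base: [] for base in BASES}
    for position, base in enumerate(synthesized, start=1):
        lengths[base].append(position)
    return lengths


def read_gel(lengths: dict[str, list[int]]) -> str:
    """Sort all fragments by length and read off which reaction each came from."""
    by_length = {n: base for base, ns in lengths.items() for n in ns}
    return "".join(by_length[n] for n in sorted(by_length))


if __name__ == "__main__":
    lanes = termination_lengths("AAGTTC")
    print(lanes)
    print(read_gel(lanes))
```

    Each fragment length appears in exactly one of the four reactions, so ordering the fragments from shortest to longest recovers the base order.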

    Advanced Techniques

    SOLiD Sequencing

    Helicos High speed Gene Sequencing

    Laser Sequencing

    How the Code was Decoded

    DoubleTwist Inc, an application service provider (ASP) devoted to empowering life scientists, completed the first annotation of the human genome.

    The DoubleTwist human genome database was created, ... is, a total of more than 350 processors.

    It brought to a close an extensive analysis of the available HGP data that revealed genes and other valuable information. The task was accomplished using Sun Enterprise supercomputers, including Starfire servers.

    Genome Map

    A genome map describes the order of genes or other markers and the spacing between them on each chromosome. Human genome maps are constructed on several different scales or levels of resolution.

    Genetic Map

    Genetic linkage maps of each chromosome are made by determining how frequently two markers are passed together from parent to child. Because genetic material is sometimes exchanged during the production of sperm and egg cells, groups of traits (or markers) originally together on one chromosome may not be inherited together.
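    The idea of measuring how often two markers are separated can be put as a small calculation. The sketch assumes the usual convention that 1 cM corresponds to about 1% recombinant offspring, which only holds well for closely spaced markers; the counts in the example are made up and the function names are illustrative.

```python
def recombination_fraction(recombinant: int, total: int) -> float:
    """Fraction of offspring in which the two markers were separated."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return recombinant / total


def map_distance_cm(recombinant: int, total: int) -> float:
    """Approximate genetic distance in centimorgans (1 cM ~ 1% recombinants)."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return 100.0 * recombinant / total


if __name__ == "__main__":
    # Hypothetical family data: 3 recombinants among 200 offspring.
    print(recombination_fraction(3, 200))
    print(map_distance_cm(3, 200), "cM")
```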

    . . . to a multi-resolution view . . .

    . . . at the gene cluster level . . .

    . . . the single gene level . . .

    . . . and at the single base level

    caggcggactcagtggatctggccagctgtgacttgacaag

    caggcggactcagtggatctagccagctgtgacttgacaag
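    The two sequences on this slide differ at a single base. A minimal sketch for locating such a difference, assuming the two sequences are already aligned and of equal length (the function name is illustrative):

```python
def single_base_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    seq1 = "caggcggactcagtggatctggccagctgtgacttgacaag"
    seq2 = "caggcggactcagtggatctagccagctgtgacttgacaag"
    print(single_base_differences(seq1, seq2))  # [(21, 'g', 'a')]
```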

    X-ray hybrid mapping

    X-ray hybrids are made by irradiating a human cell line with 3000 rad of X-rays, fusion to hamster cells, and isolation of hybrid cell lines in culture.

    A panel of 100-200 hybrids with 5-10 different fragments of human DNA in each gives about 1000 fragments in total, i.e. the human genome has been divided into 1000 bits.

    The closer together two markers are, the more likely it is that they will be present in the same hybrids (since they are less likely to be separated by an X-ray induced break).

    By doing a PCR assay for each marker on all the hybrids, a map can be made. The units are called cR (centiray, where 1 cR is a 1% chance that the markers will be separated by X-ray breakage).

    For each pair of markers in turn the "co-retention frequency" is the number of hybrids in which both markers are present, divided by the number of hybrids in which one or other (or both) markers are present. On the figure, there are 5 hybrids containing both markers B and C, and 6 containing B and/or C. Therefore the co-retention frequency is 5/6 or 0.83. Likewise it is 6/7 for markers E and F, and 2/10 for markers C and E. This shows that B and C are close together, and E and F are close together, while C and E are further apart.
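    The co-retention frequency is a ratio of two set sizes, which makes it easy to compute from PCR results. In the sketch below each marker is represented by the set of hybrids that tested positive for it; the hybrid labels are hypothetical and were chosen only so that markers B and C reproduce the 5/6 quoted from the figure. The function name is illustrative.

```python
from fractions import Fraction


def co_retention(hybrids_with_a: set[str], hybrids_with_b: set[str]) -> Fraction:
    """Hybrids containing both markers divided by hybrids containing either."""
    either = hybrids_with_a | hybrids_with_b
    if not either:
        raise ValueError("neither marker was retained in any hybrid")
    both = hybrids_with_a & hybrids_with_b
    return Fraction(len(both), len(either))


if __name__ == "__main__":
    # Hypothetical PCR results: hybrid cell lines positive for each marker.
    marker_b = {"h1", "h2", "h3", "h4", "h5", "h6"}
    marker_c = {"h1", "h2", "h3", "h4", "h5"}
    freq = co_retention(marker_b, marker_c)
    print(freq, f"= {float(freq):.2f}")
```

    Computing this ratio for every pair of markers gives a matrix of pairwise closeness from which a marker order can be inferred.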

    Clone contigs

    A clone contig is a series of cloned DNA segments that overlap each other, assembled in the correct order along the genome.

    cosmids (capacity 45 kb)

    BACs or YACs (Bacterial or Yeast Artificial Chromosomes), which can clone 100s of kb of DNA, are more suitable for dealing with large stretches of mammalian DNA.

    Making a clone contig by fingerprinting

    What does the draft human genome sequence tell us?

    By the Numbers

    The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).

    ... largest known human gene being dystrophin at 2.4 million bases.

    The total number of genes is estimated at around 30,000, much lower than previous estimates of 80,000 to 140,000.

    Almost all (99.9%) nucleotide bases are exactly the same in all people.

    The functions are unknown for over 50% of discovered genes.
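    The 99.9% figure can be turned into an absolute number of differing positions. This is a back-of-the-envelope sketch using only the two numbers from this slide; it uses exact fractions to avoid floating-point rounding, and the names are illustrative.

```python
from fractions import Fraction

GENOME_BASES = 3_000_000_000        # from the slide
IDENTICAL = Fraction(999, 1000)     # 99.9%, from the slide


def expected_differences(bases: int = GENOME_BASES,
                         identical: Fraction = IDENTICAL) -> int:
    """Number of base positions expected to differ between two people."""
    return int(bases * (1 - identical))


if __name__ == "__main__":
    print(f"{expected_differences():,} differing bases")
```

    The result is on the same order as the roughly 3 million SNP locations reported in the deck.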

    What does the draft human genome sequence tell us?

    How It's Arranged

    The human genome's gene-dense "urban centers" are predominantly composed of the DNA building blocks G and C.

    In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T.

    GC- and AT-rich regions usually can be seen through a microscope as light and ...

    Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.

    Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to help regulate gene activity.

    Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
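    GC-rich regions and CpG islands are usually characterized with two simple statistics: the GC fraction and the ratio of observed to expected CpG dinucleotides. The sketch below computes both for a short string. The observed/expected formula is a commonly used criterion added here for illustration; the slides do not give one, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies.

    Expected count is (number of C * number of G) / length.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    for s in ("CGCGCG", "ATATGCAT"):
        print(s, f"GC={gc_content(s):.2f}", f"CpG o/e={cpg_observed_expected(s):.2f}")
```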

    What does the draft human genome sequence tell us?

    The Wheat from the Chaff

    Less than 2% of the genome codes for proteins.

    Repeated sequences that do not code for proteins ("junk DNA").

    Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics.

    The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).

    What does the draft human genome sequence tell us?

    How the Human Compares with Other Organisms

    Unlike the human's seemingly random distribution of gene-rich areas, many other organisms' genomes are more uniform, with genes evenly spaced throughout.

    Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This process can yield different protein products from the same gene.

    Humans share most of the same protein families with worms, flies, and plants; but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity.

    Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, although gene estimates are similar in these species. Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including those of life span, litter sizes, inbreeding, and genetic drift.

    What does the draft human genome sequence tell us?

    Variations and Mutations

    Scientists have identified about 3 million locations where single-base DNA differences (SNPs) occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.

    The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers point to several reasons for the higher mutation rate in the male germline, including the greater number of cell divisions required for sperm formation than for eggs.