The eukaryotic chromosome (Chapter 16)
Monday, November 8, 2011; Wednesday, November 10, 2011
Genomics 260.605.01, J. Pevsner
Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J Pevsner, © 2009 by Wiley-Blackwell.
These images and materials may not be used without permission.
Visit http://www.bioinfbook.org
Copyright notice
Today: The eukaryotic chromosome (Chapter 16)
Wednesday Nov. 10: The eukaryotic chromosome (continued)
Friday November 12: The fungi (Chapter 17)
The following week: lab; Sarah Wheelan; David Sullivan
Schedule
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
C value paradox and genome sizes
Organization of genomes into chromosomes
Genome browsers
The ENCODE project
Repetitive DNA content of eukaryotic genomes
Gene content of eukaryotic chromosomes
Regulatory regions of eukaryotic chromosomes
Comparison of eukaryotic DNA
Variation in chromosomal DNA
Techniques to measure chromosomal change
Introduction to the eukaryotes
Eukaryotes are single-celled or multicellular organisms that are distinguished from bacteria and archaea by the presence of a membrane-bound nucleus, an extensive system of intracellular organelles, and a cytoskeleton.
Later we will explore the eukaryotes using a phylogenetic tree by Baldauf et al. (Science, 2000). This tree was made by concatenating four protein sequences: elongation factor 1α, actin, α-tubulin, and β-tubulin.
Page 640
Eukaryotes(after Baldauf et al., 2000)
Page 730
Note that we will learn how to make phylogenetic trees in lab as soon as MEGA software is installed in the computer labs.
You should visit: http://www.megasoftware.net/
Download it for Windows, MacOS or Linux
General features of the eukaryotes
Some of the general features of eukaryotes that distinguish them from bacteria and archaea are:
• eukaryotes include many multicellular organisms, in addition to unicellular organisms
• eukaryotes have
[1] a membrane-bound nucleus
[2] intracellular organelles
[3] a cytoskeleton
• Most eukaryotes undergo sexual reproduction
Page 641
General features of the eukaryotes (cont.)
Some of the general features of eukaryotes that distinguish them from bacteria and archaea are:
• The genome size of eukaryotes spans a wider range than that of most bacteria and archaea
• Eukaryotic genomes have a lower density of genes
• Bacteria are haploid; eukaryotes have varying ploidy
• Eukaryotic genomes tend to be organized into linear chromosomes with a centromere and telomeres.
Page 641
Questions about eukaryotic chromosomes
What are the sizes of eukaryotic genomes, and how are they organized into chromosomes?
What are the types of repetitive DNA elements? What are their properties and amounts?
What are the types of genes? How can they be identified?
What is the mutation rate across the genome; what are the selective forces affecting genome evolution?
What is the spectrum of variation between species (comparative genomics) and within species?
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
C value paradox and genome sizes
Organization of genomes into chromosomes
Genome browsers
The ENCODE project
C value paradox: why eukaryotic genome sizes vary
The haploid genome size of eukaryotes, called the C value, varies enormously.
Encephalitozoon cuniculi: ~3 Mb
A variety of fungi: 10-40 Mb
Takifugu rubripes (pufferfish): 365 Mb (same number of genes as other fish or as the human genome, but 1/8th the size)
Pinus resinosa (Canadian red pine): 68 Gb
Protopterus aethiopicus (Marbled lungfish): 140 Gb
Amoeba dubia (amoeba): 690 Gb
Page 643
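To get a feel for how wide the range of C values is, the sizes listed above can be compared directly. The following is a minimal sketch, not part of the original slides; the dictionary and function names are illustrative, and the "~3 Mb" value is treated as exactly 3 Mb.

```python
# Haploid genome sizes (C values) in base pairs, copied from the list above.
# "~3 Mb" is treated as exactly 3 Mb for the purpose of this sketch.
C_VALUES_BP = {
    "Encephalitozoon cuniculi": 3_000_000,
    "Takifugu rubripes": 365_000_000,
    "Pinus resinosa": 68_000_000_000,
    "Protopterus aethiopicus": 140_000_000_000,
    "Amoeba dubia": 690_000_000_000,
}


def fold_range(sizes):
    """Return the ratio of the largest to the smallest genome size."""
    return max(sizes.values()) / min(sizes.values())


smallest = min(C_VALUES_BP, key=C_VALUES_BP.get)
largest = max(C_VALUES_BP, key=C_VALUES_BP.get)
print(f"{largest} vs {smallest}: {fold_range(C_VALUES_BP):,.0f}-fold")
```

With these values the largest genome in the list is roughly five orders of magnitude larger than the smallest.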
C value paradox: why eukaryotic genome sizes vary
The range in C values does not correlate well with the complexity of the organism. This phenomenon is called the C value paradox.
The solution to this “paradox” is that genomes are filled with large tracts of noncoding, often repetitive DNA sequences.
Page 643
Eukaryotic genomes are organized into chromosomes
Genomic DNA is organized in chromosomes. The diploid number of chromosomes is constant in each species (e.g. 46 in human). Chromosomes are distinguished by a centromere and telomeres.
The chromosomes are routinely visualized by karyotyping (imaging the chromosomes during metaphase, when each chromosome is a pair of sister chromatids).
Page 644
Fig. 16.1, Page 645
Plate II. First P.G. mitosis in polar view. Tradescantia virginiana, Commelinaceae, n = 9 (from aberrant plant with 22 chromosomes). 2 BE - CV smears. x 1200. Printed on multigrade paper. Darlington.
Mitosis in Paris quadrifolia, Liliaceae, showing all stages from prophase to telophase. n = 10 (Darlington).
Root tip squashes showing anaphase separation. Fritillaria pudica, 3x = 39, spiral structure of chromatids revealed by pressure after cold treatment. Darlington.
Cleavage mitosis in the teleostean fish, Coregonus clupeoides, in the middle of anaphase. Spindle structure revealed by slow fixation. Darlington.
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
C value paradox and genome sizes
Organization of genomes into chromosomes
Genome browsers
The ENCODE project
The eukaryotic chromosome: the centromere
The centromere is a primary constriction where the chromosome attaches to the spindle fibers; here the boundary between sister chromatids is not clear. The centromere may be in the middle (metacentric) or near one end (acrocentric).
If a chromosome has two centromeres spaced apart (dicentric) then at anaphase there is a 50% chance that a single chromatid would be pulled to opposite poles of the mitotic spindle. This would result in a bridge formation and chromosome breakage.
Page 644
View a centromere on chromosome 11
Go to UCSC, human, hg18 assembly, chr11:51,000,001-55,000,000
Set the following tracks (all dense):
Mapping and Sequencing: FISH Clones, Gap, BAC End Pairs, Fosmid End Pairs
Gene Tracks: RefSeq Genes
Variation and Repeats: Segmental Dups, RepMask 3.2.7
(1) Try zooming out 3-fold: are there more gaps?
(2) Switch to the hg19 assembly: do the answers differ?
(3) How many gaps are on chromosome 11? What sizes?
How many gaps are on chromosome 11? What are their sizes?
[1] Go to Table Browser, human
Assembly = hg19
Group = Mapping and Sequencing Tracks
Track = Gap
Region = chr11 (click lookup)
Output = table in browser
Output = BED, send to Galaxy
[2] Galaxy result: there are 15 regions
Left sidebar, Text Manipulation: Compute c3-c2 (round the result)
Left sidebar, Filter and Sort: Sort on column c4
Execute
[3] Note the Galaxy output (implicitly) includes the telomeres (10 kb, 50 kb) and the centromere (3 Mb).
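The Galaxy steps above (compute c3-c2, then sort on the new column) can also be reproduced offline once the BED output has been saved. The following is a rough sketch under stated assumptions: the function name is made up, and the three rows in EXAMPLE_BED are invented for illustration and are not the real hg19 gap table.

```python
def gap_sizes(bed_text):
    """Parse BED lines and return (chrom, start, end, size) sorted by size.

    BED intervals are 0-based and half-open, so size = end - start,
    which is the same c3-c2 computation used in Galaxy.
    """
    rows = []
    for line in bed_text.strip().splitlines():
        fields = line.split()
        # Skip blank lines and BED header/comment lines.
        if len(fields) < 3 or fields[0].startswith(("track", "browser", "#")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        rows.append((chrom, start, end, end - start))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical rows for illustration only; NOT the real hg19 gap table.
EXAMPLE_BED = """\
chr11 50000000 53000000
chr11 0 10000
chr11 1000000 1050000
"""

for chrom, start, end, size in gap_sizes(EXAMPLE_BED):
    print(chrom, start, end, size)
```

Sorting by the computed size makes the largest gaps stand out at the bottom of the listing, the same way the sorted Galaxy table does.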
The eukaryotic chromosome: the acrocentrics
The short arm of the acrocentric autosomes has a secondary constriction usually containing a nucleolar organizer. This contains the genes for 18S and 28S ribosomal RNA.
Page 644
The eukaryotic chromosome: the telomere
The telomere is a region of highly repetitive DNA at either end of a linear chromosome. Telomeres include nucleoprotein complexes that function in the protection, replication, and stabilization of chromosome ends. Telomeres of many eukaryotes have tandemly repeated DNA sequences (discussed below).
Page 644
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
C value paradox and genome sizes
Organization of genomes into chromosomes
Genome browsers
The ENCODE project
Two main genome browsers
There are two principal genome browsers for eukaryotes:
(1) Ensembl (www.ensembl.org) offers browsers for dozens of genomes
(2) UCSC (http://genome.ucsc.edu) offers genome and table browsers for dozens of organisms. We will focus on this browser.
Page 645
Example #1 of a human genome web browser: human chromosome 21 at NCBI
nucleolar organizing center
centromere
nucleolar organizing center
centromere
Example #2 of a human genome web browser: human chromosome 21 at www.ensembl.org
centromere
Example #3 of a human genome web browser:human chromosome 21 at UCSC
Example #4 of a human genome web browser:the beta globin gene cluster on human chromosome 11 at UCSC
[1] Enter hbb (for beta globin) into the gene box, and this takes you to chr11:5,246,696-5,248,301 (for the hg19 assembly), a 1,606 base pair region.
[2] Enter chr11:5,200,001-5,750,000 (a 550,000 bp region). Set the UCSC Genes and the RefSeq genes to “pack.” Note that the UCSC Genes track includes ENORMOUS models for HBE1 and HBG2. Why?
[3] We will instead focus on the RefSeq region. Enter: chr11:5,245,001-5,295,000 (50 kb region)
Beta globin cluster on chr11:5,245,001-5,295,000 (50 kb region, hg19 build, default tracks shown)
HBB, HBD, HBG1, HBG2, HBE1
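The region sizes quoted in the steps above follow from the browser's coordinate convention: positions typed into the position box are 1-based and inclusive, so the length is end - start + 1. A small sketch for checking this (the function names are illustrative, not part of any UCSC tool):

```python
import re


def parse_region(region):
    """Split a string such as 'chr11:5,245,001-5,295,000' into (chrom, start, end)."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end string: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def region_length(region):
    """Length in base pairs, treating both positions as 1-based and inclusive."""
    _, start, end = parse_region(region)
    return end - start + 1


for region in ("chr11:5,246,696-5,248,301",
               "chr11:5,200,001-5,750,000",
               "chr11:5,245,001-5,295,000"):
    print(region, region_length(region))
```

The three regions used in this exercise come out as 1,606 bp, 550,000 bp, and 50,000 bp, matching the sizes stated on the slides.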
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
C value paradox and genome sizes
Organization of genomes into chromosomes
Genome browsers
The ENCODE project
Page 647
The ENCODE project
► The ENCyclopedia Of DNA Elements (ENCODE) project was launched in 2003
► Pilot phase (completed): devise and test high-throughput approaches to identify functional elements. 44 DNA targets: 1 percent of the human genome, ~30 million base pairs (Mb).
► Second phase (simultaneous): technology development.
► Third phase: production. Expand the ENCODE project to analyze the remaining 99 percent of the human genome.
Key reference: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 447:799-816. PMID: 17571346
Page 647
The ENCODE project
Goal of ENCODE: build a list of all sequence-based functional elements in human DNA. This includes:
► protein-coding genes
► non-protein-coding genes
► regulatory elements involved in the control of gene transcription
► DNA sequences that mediate chromosomal structure and dynamics.
Page 647
http://genome.cse.ucsc.edu/ENCODE/encode.hg17.html
ENCODE data at the UCSC Genome Browser
ENCODE data at the UCSC Genome Browser: beta globin (switch from hg19 to hg18 assembly to access tracks!)
ENCODE tracks available at the UCSC Genome Browser (hg18 assembly)
The ENCODE EGASP competition shows that leading gene prediction software generates gene models that differ greatly in sensitivity and specificity. This often results in quite distinct predictions of gene structure. Jigsaw is one of the best.
ENCODE variation tracks available at UCSC (hg18 assembly; try chr11:5,200,001-5,250,000)
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Noncoding and repetitive sequences
1. Interspersed repeats
2. Processed pseudogenes
3. Simple sequence repeats
4. Segmental duplications
5. Blocks of tandem repeats
Britten & Kohne’s analysis of repetitive DNA
In the 1960s, Britten and Kohne defined the repetitive nature of genomic DNA in a variety of organisms. They isolated genomic DNA, sheared it, dissociated the DNA strands, and measured the rates of DNA reassociation.
For dozens of eukaryotes, but not bacteria or viruses, a large amount of DNA reassociates extremely rapidly. This represents repetitive DNA.
Page 650
Fig. 16.5, Page 651
Britten and Kohne (1968) identified repetitive DNA classes
Software to detect repetitive DNA
It is essential to identify repetitive DNA in eukaryotic genomes. RepBase Update is a database of known repeats and low-complexity regions.
RepeatMasker is a program that searches DNA queries against RepBase. There are many RepeatMasker sites available on-line.
We will use 50,000 base pairs from human chromosome 11 as an example. This region includes the beta globin gene cluster.
Page 653
http://www.repeatmasker.org/current 11/11
RepeatMasker software screens DNA for repeats
RepeatMasker results for 50 kb of DNA in the human HBB locus
RepeatMasker masks repetitive DNA (FASTA format)
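Once a masked FASTA file is in hand, the fraction of the sequence that was masked can be tallied with a few lines of code. This is a minimal sketch, assuming that masked bases appear either as lowercase letters (soft masking, as in UCSC downloads) or as N or X characters (hard masking); the function name and the example record are invented for illustration and are not real HBB sequence.

```python
def masked_fraction(fasta_text):
    """Return the fraction of bases that are masked in FASTA-formatted text.

    Assumption: a base counts as masked if it is lowercase (soft masking)
    or is an N or X (hard masking). Header lines starting with '>' are skipped.
    """
    total = 0
    masked = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue
        for base in line.strip():
            total += 1
            if base.islower() or base in "NX":
                masked += 1
    return masked / total if total else 0.0


# Made-up record for illustration; not real sequence from the HBB locus.
EXAMPLE_FASTA = ">example\nACGTacgtacgtNNNNACGT\n"
print(f"{masked_fraction(EXAMPLE_FASTA):.0%} masked")
```

Running the same function on the 50 kb beta globin region would give a quick check against the summary table that RepeatMasker itself reports.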
Five main classes of repetitive DNA
Page 652
1. Interspersed repeats (transposon-derived repeats) constitute ~45% of the human genome. They involve RNA intermediates (retroelements) or DNA intermediates (DNA transposons).
• Long-terminal repeat transposons (RNA-mediated)
• Long interspersed elements (LINEs); these encode a reverse transcriptase
• Short interspersed elements (SINEs) (RNA-mediated); these include Alu repeats
• DNA transposons (3% of human genome)
UCSC displays interspersed repeats identified by RepeatMasker
(hg18 assembly; chr11:5,200,001-5,250,000)
Turn on the RepeatMasker track from the “Variation and Repeats” section; or the RepMask 3.2.7 track
Five main classes of repetitive DNA
Table 16.6, Page 653
1. Interspersed repeats (transposon-derived repeats)
Examples include retrotransposed genes that lack introns, such as:
Gene     Accession    Location   Original gene location
ADAM20   NM_003814    14q        8p
Cetn1    NM_004066    18p        Xq
Glud2    NM_012084    Xq         10q
Pdha2    NM_005390    4q         Xp
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Noncoding and repetitive sequences
1. Interspersed repeats
2. Processed pseudogenes
3. Simple sequence repeats
4. Segmental duplications
5. Blocks of tandem repeats
Five main classes of repetitive DNA
Page 653
2. Processed pseudogenes
These genes have a stop codon or frameshift mutation and do not encode a functional protein. They commonly arise from retrotransposition, or following gene duplication and subsequent gene loss.
For a superb on-line resource, visit Mark Gerstein’s website, http://www.pseudogene.org. Gerstein and colleagues (2006) suggest that there are ~19,000 pseudogenes in the human genome, slightly fewer than the number of functional protein-coding genes. (11,000 non-processed, 8,000 processed [lack introns].)
Pseudogenes in the UCSC genome browser
Yale pseudogenes
VEGA pseudogenes
Pseudogenes in the UCSC genome browser
Note one retrogene prediction (duplicated HBG1 gene)
Pseudogenes in the HOX cluster ENCODE region
HOX genes
From the Entrez Gene entry for human HOXA1:
In vertebrates, the genes encoding the class of transcription factors called homeobox genes are found in clusters named A, B, C, and D on four separate chromosomes. Expression of these proteins is spatially and temporally regulated during embryonic development. This gene is part of the A cluster on chromosome 7 and encodes a DNA-binding transcription factor which may regulate gene expression, morphogenesis, and differentiation.
Vertebrate Genome Annotation (VEGA) database
From the VEGA home page (http://vega.sanger.ac.uk):
"The Vertebrate Genome Annotation (VEGA) database build 30 is designed to be a central repository for manual annotation of different vertebrate finished genome sequence. In collaboration with the genome sequencing centres Vega attempts to present consistent high-quality curation of the published chromosome sequences."
"Finished genomic sequence is analysed on a clone by clone basis using a combination of similarity searches against DNA and protein databases as well as a series of ab initio gene predictions (GENSCAN, Fgenes)."
"In addition, comparative analysis using vertebrate datasets such as the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores (Evolutionary Conserved Regions) are used for novel gene discovery."
Vertebrate Genome Annotation (VEGA) database
VEGA definition of pseudogenes (http://vega.sanger.ac.uk):
Yale pseudogene database
http://www.pseudogene.org
The human genome has as many pseudogenes as genes
Pseudogenes: example
Mouse GULO, required for vitamin C biosynthesis, has become a pseudogene in the primate lineage (ψGULO). Here is an output for GULO on the human genome:
Pseudogenes: example
GULO pseudogene in Entrez nucleotide (NG_001136):
Pseudogenes: example
Mouse GULO in Entrez protein:
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Noncoding and repetitive sequences
1. Interspersed repeats
2. Processed pseudogenes
3. Simple sequence repeats
4. Segmental duplications
5. Blocks of tandem repeats
Five main classes of repetitive DNA
Page 657
3. Simple sequence repeats
Microsatellites: from one to a dozen base pairs
Examples: (A)n, (CA)n, (CGG)n
These may be formed by replication slippage.
Minisatellites: a dozen to 500 base pairs
Simple sequence repeats of a particular length and composition occur preferentially in different species. In humans, an expansion of triplet repeats such as CAG is associated with over a dozen disorders (including Huntington’s disease).
Example of simple sequence repeats (e.g. (AC)n) from the globin locus
Page 657
>hg18_rmskRM327_(TAAAA)n range=chr11:5206265-5206312 5'pad=0 3'pad=0 strand=+ repeatMasking=lower
aaaataaaataaaataaaataaaataaaacaataaaatgaaataaaat
>hg18_rmskRM327_(CA)n range=chr11:5207526-5207560 5'pad=0 3'pad=0 strand=+ repeatMasking=lower
acacacacacacacacacacacacacacacacaca
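Perfect simple sequence repeats like the two records above can be picked out with a regular expression that looks for a short unit followed by further copies of itself. This is only a sketch of the idea, not how RepeatMasker or TRF work internally; the function name and thresholds are assumptions, and the test string is built as 17 copies of "ac" plus one "a" to mimic the 35-base (CA)n record.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_copies=5):
    """Find perfect tandem repeats of a short unit.

    Returns a list of (unit, copies, start, end) with 0-based, end-exclusive
    coordinates. The shortest repeating unit is preferred because the unit
    group is matched lazily.
    """
    pattern = re.compile(
        r"([acgt]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.lower()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((unit, copies, match.start(), match.end()))
    return hits


# 35 bases mimicking the (CA)n record above: 17 copies of "ac" plus a final "a".
ca_like = "ac" * 17 + "a"
print(find_ssrs(ca_like))  # [('ac', 17, 0, 34)]
```

Note that the unit is reported as "ac" rather than "ca"; these are the same dinucleotide repeat read in a different phase.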
Repetitive DNA in the UCSC genome browser
Sequences of at least 15 perfect di-nucleotide and tri-nucleotide repeats identified by TRF. These tend to be highly polymorphic in the population.
Beta globin locus: tandem repeats, microsatellites, and RepeatMasker
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Noncoding and repetitive sequences
1. Interspersed repeats
2. Processed pseudogenes
3. Simple sequence repeats
4. Segmental duplications
5. Blocks of tandem repeats
Five main classes of repetitive DNA
Page 658
4. Segmental duplications
These are blocks of about 1 kilobase to 300 kb that are copied intra- or interchromosomally. Evan Eichler and colleagues estimate that about 5% of the human genome consists of segmental duplications. Duplicated regions often share very high (99%) sequence identity.
Beta globin locus: segmental duplications of the HBG1 and HBG2 genes
Fig. 16.12, Page 659
Successive tandem gene duplications (after Lacazette et al., 2000)
observed today
Beta globin locus: segmental duplications involving HBG1 and HBG2 genes
Page 660
Beta globin locus: segmental duplications (pairwise alignment of duplicated regions at UCSC)
Region of beta globin locus: segmental duplications (one region is duplicated at dozens of loci)
By definition, non-RepeatMasked
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Noncoding and repetitive sequences
1. Interspersed repeats
2. Processed pseudogenes
3. Simple sequence repeats
4. Segmental duplications
5. Blocks of tandem repeats
Five main classes of repetitive DNA
Page 660
5. Blocks of tandem repeats
These include telomeric repeats (e.g. TTAGGG in humans) and centromeric repeats (e.g. a 171 base pair repeat of α satellite DNA in humans).
Such repetitive DNA can span millions of base pairs, and it is often species-specific.
Fig. 16.14, Page 661
Example of telomeric repeats (obtained by blastn searching (TTAGGG)4)
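The blastn query used here is simply four tandem copies of the human telomeric hexamer. As a rough sketch (the function names and the toy sequence are made up for illustration), the query can be built and the longest perfect tandem run in a sequence can be counted as follows:

```python
def tandem_query(unit, copies):
    """Build a query of `copies` tandem repeats, e.g. (TTAGGG)4."""
    return unit * copies


def longest_tandem_run(seq, unit):
    """Largest number of back-to-back perfect copies of `unit` anywhere in `seq`."""
    if not unit:
        raise ValueError("unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for i in range(len(seq)):
        run = 0
        # Count consecutive copies of the unit starting at position i.
        while seq.startswith(unit, i + run * len(unit)):
            run += 1
        best = max(best, run)
    return best


query = tandem_query("TTAGGG", 4)
print(query, len(query))

# Toy sequence for illustration: three flanking bases, six repeats, one base.
toy = "ccc" + "ttaggg" * 6 + "a"
print(longest_tandem_run(toy, "TTAGGG"))
```

Real telomeric hits found by blastn will tolerate mismatches, which this exact-match count does not.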
Five main classes of repetitive DNA
Page 660
5. Blocks of tandem repeats
In two exceptional cases, chromosomes lack satellite DNA:
• Saccharomyces cerevisiae (very small centromeres)
• Neocentromeres (an ectopic centromere; 60 have been described in human, often associated with disease)
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Gene content of eukaryotic chromosomes
Regulatory regions of eukaryotic chromosomes
Comparison of eukaryotic DNA
Variation in chromosomal DNA
Techniques to measure chromosomal change
Finding genes in eukaryotic DNA
Two of the biggest challenges in understanding any eukaryotic genome are
• defining what a gene is, and
• identifying genes within genomic DNA
Page 662
Finding genes in eukaryotic DNA
Types of genes include
• protein-coding genes
• pseudogenes
• functional RNA genes
-- tRNA: transfer RNA
-- rRNA: ribosomal RNA
-- snoRNA: small nucleolar RNA
-- snRNA: small nuclear RNA
-- miRNA: microRNA
Page 662
Finding genes in eukaryotic DNA
RNA genes have diverse and important functions. However, they can be difficult to identify in genomic DNA, because they can be very small, and lack open reading frames that are characteristic of protein-coding genes.
tRNAscan-SE identifies 99 to 100% of tRNA molecules, with a rate of 1 false positive per 15 gigabases. Visit http://lowelab.ucsc.edu/tRNAscan-SE/
Page 662
Finding genes in eukaryotic DNA
Protein-coding genes are relatively easy to find in prokaryotes, because the gene density is high (about one gene per kilobase). In eukaryotes, gene density is lower, and exons are interrupted by introns.
There are several kinds of exons:
-- noncoding
-- initial coding exons
-- internal exons
-- terminal exons
-- some single-exon genes are intronless
Page 663
Fig. 16.16, Page 664
Eukaryotic gene prediction algorithms distinguish several kinds of exons
Finding genes in eukaryotic DNA
Algorithms that find protein-coding genes are extrinsic or intrinsic (refer to Chapter 13).
Page 664
Homology-based searches (“extrinsic”): rely on previously identified genes
Algorithm-based searches (“intrinsic”): investigate nucleotide composition, open-reading frames, and other intrinsic properties of genomic DNA
[Figure: DNA, RNA, mature RNA, protein; introns marked]
Page 664
Extrinsic, homology-based searching: compare genomic DNA to expressed genes (ESTs)
[Figure: DNA, RNA, protein; introns marked]
Page 664
Intrinsic, algorithm-based searching: Identify open reading frames (ORFs). Compare DNA in exons (unique codon usage) to DNA in introns (unique splice sites) and to noncoding DNA.
[Figure: DNA, RNA]
Page 664
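The simplest intrinsic signal is the open reading frame. The sketch below scans the three forward reading frames for ATG...stop stretches; it is a toy illustration only (real gene finders such as GENSCAN model codon usage, splice sites, and both strands), and the function name, threshold, and test sequence are all made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start is 0-based and end is exclusive; the stop codon is included in the
    ORF and counted toward min_codons. Only the first ATG after the previous
    stop is used in each frame.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


toy = "CCATGAAATTTTAGGG"  # invented sequence with one short ORF in frame 2
for frame, start, end in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

In eukaryotic genomic DNA this approach breaks down quickly because introns interrupt the reading frame, which is why exon-level signals are needed.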
Comparative genomics: Compare gene models between species. (For annotation of the chimpanzee genome reported in 2005, BLAT and BLASTZ searches were used to align the two genomes.)
[Figure: chimpanzee DNA aligned to human DNA]
Finding genes in eukaryotic DNA
While ESTs are very helpful in finding genes, beware of several caveats.
-- The quality of EST sequence is sometimes low
-- Highly expressed genes are disproportionately represented in many cDNA libraries
-- ESTs provide no information on genomic location
Page 665
Finding genes in eukaryotic DNA
Both intrinsic and extrinsic algorithms vary in their rates of false-positive and false-negative gene identification. Programs such as GENSCAN and Grail account for features such as the nucleotide composition of coding regions, and the presence of signals such as promoter elements. Try using the on-line genome annotation pipeline offered by Oak Ridge National Laboratory. Google ORNL pipeline, or visit http://compbio.ornl.gov/tools/pipeline/
Page 665
Oak Ridge National Laboratory (ORNL) offers an on-line annotation pipeline
Finding genes in eukaryotic DNA
We used 100,000 base pairs of human DNA. The pipeline correctly identified several exons of RBP4, but failed to generate a complete gene model.
As another example, initial annotation of the rice genome yielded over 75,000 gene predictions, only 53,000 of which were complete (having initial and terminal exons). Also, it is very difficult to accurately identify exon-intron boundaries.
Estimates of gene content improve dramatically when finished (rather than draft) sequence is analyzed.
EGASP: the human ENCODE Genome Annotation Assessment Project
EGASP goals:
[1] Assess the accuracy of computational methods to predict protein-coding genes. 18 groups competed to make gene predictions, blind; these were evaluated relative to reference annotations generated by the GENCODE project.
[2] Assess the completeness of the current human genome annotations as represented in the ENCODE regions.
Page 666
UCSC: tracks for Gencode and for various gene prediction algorithms (focus on 50 kb encompassing five globin genes)
JIGSAW
Gencode
Page 667
EGASP: the human ENCODE Genome Annotation Assessment Project
“RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.”
Guigo R et al., Genome Biology (2006) 7 Suppl 1: S2.1-31
Page 667
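The nucleotide-level accuracy figures quoted above can be made concrete with a short calculation. The sketch below assumes the convention common in the gene-prediction literature, where sensitivity = TP / (TP + FN) and "specificity" = TP / (TP + FP), that is, the fraction of predicted coding bases that are actually annotated as coding. The function names and exon coordinates are invented for illustration and are not EGASP data.

```python
def positions(exons):
    """Expand (start, end) exon intervals (1-based, inclusive) into a set of positions."""
    return {p for start, end in exons for p in range(start, end + 1)}


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    Assumed convention: specificity = TP / (TP + FP), as is customary
    when evaluating gene predictions.
    """
    tp = len(annotated & predicted)   # coding bases correctly predicted
    fn = len(annotated - predicted)   # coding bases that were missed
    fp = len(predicted - annotated)   # predicted bases that are not annotated
    sensitivity = tp / (tp + fn) if annotated else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical exon coordinates, for illustration only.
annotated = positions([(101, 200), (301, 400)])
predicted = positions([(111, 200), (301, 410)])
sn, sp = nucleotide_accuracy(annotated, predicted)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

A prediction can score well at the nucleotide level while still getting exon boundaries or whole transcripts wrong, which is consistent with the lower transcript-level accuracy reported in the EGASP results.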
Protein-coding genes in eukaryotic DNA: a new paradox
The C value paradox is answered by the presence of noncoding DNA.
Why is the number of protein-coding genes about the same for worms, flies, plants, and humans?
This has been called the N value paradox (N for number of genes) or the G value paradox (G for genes).
Page 668
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Gene content of eukaryotic chromosomes
Regulatory regions of eukaryotic chromosomes
Comparison of eukaryotic DNA
Variation in chromosomal DNA
Techniques to measure chromosomal change
Transcription factor databases
In addition to identifying repetitive elements and genes, it is also of interest to predict the presence of genomic DNA features such as promoter elements and GC content.
See Table 16.10 (p. 670) for a list of websites that predict transcription factor binding sites and related sequences.
Page 669
http://www.sanger.ac.uk/Users/td2/eponine
Eponine predicts transcription start sites in promoter regions. The algorithm uses a set of DNA weight matrices recognizing sequence motifs that are associated with a position distribution relative to the transcription start site. The model is as follows:
The specificity is good (~70%), and the positional accuracy is excellent. The program identifies ~50% of TSSs—although it does not always know the direction of transcription.
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Gene content of eukaryotic chromosomes
Regulatory regions of eukaryotic chromosomes
Comparison of eukaryotic DNA
Variation in chromosomal DNA
Techniques to measure chromosomal change
Comparison of eukaryotic DNA: PipMaker and VISTA
In studying genomes, it is important to align large segments of DNA.
PipMaker and VISTA are two tools for sequence alignment and visualization. They show conserved segments, including the order and orientation of conserved elements. They also display large-scale genomic changes (inversions, rearrangements, duplications).
Try VISTA (http://www-gsd.lbl.gov/vista) or PipMaker (http://bio.cse.psu.edu/pipmaker) with genomic DNA from Hs10 and Mm19 (containing RBP4).
Page 673
VISTA offers comparative analysis of genomes
Fig. ~16.22, Page 674
VISTA output for an alignment of human beta globin region with seven vertebrates
VISTA analysis of HBB gene locus
[Figure legend: conserved noncoding; coding exon; UTR]
VISTA allows comparison of conserved transcription factor binding sites
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Gene content of eukaryotic chromosomes
Regulatory regions of eukaryotic chromosomes
Comparison of eukaryotic DNA
Variation in chromosomal DNA
Techniques to measure chromosomal change
The spectrum of variation
Category of variation                    Size            Type
Single base pair changes                 1 bp            SNPs, point mutations
Small insertions/deletions               1 – 50 bp
Short tandem repeats                     1 – 500 bp      microsatellites
Fine-scale structural variation          50 bp – 5 kb    del, dup, inv, tandem repeats
Retroelement insertions                  0.3 – 10 kb     SINEs, LINEs, LTRs, ERVs
Intermediate-scale structural variation  5 kb – 50 kb    del, dup, inv, tandem repeats
Large-scale structural variation         50 kb – 5 Mb    del, dup, inv, large tandem repeats
Chromosomal variation                    >>5 Mb          aneuploidy
Adapted from Sharp AJ et al. (2006) Annu Rev Genomics Hum Genet 7:407-42
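Because the size ranges in this table overlap, a variant of a given length can fall into more than one category. The following sketch makes that explicit by returning every category whose size range contains a given length. The names are illustrative, and ">>5 Mb" is treated as "anything above 5 Mb", which is an assumption made only for this example.

```python
# (category, minimum size in bp, maximum size in bp or None for open-ended)
SIZE_CLASSES = [
    ("Single base pair changes", 1, 1),
    ("Small insertions/deletions", 1, 50),
    ("Short tandem repeats", 1, 500),
    ("Fine-scale structural variation", 50, 5_000),
    ("Retroelement insertions", 300, 10_000),
    ("Intermediate-scale structural variation", 5_000, 50_000),
    ("Large-scale structural variation", 50_000, 5_000_000),
    ("Chromosomal variation", 5_000_001, None),  # assumption: ">>5 Mb" means > 5 Mb
]


def candidate_classes(size_bp):
    """Return the names of all categories whose size range includes size_bp."""
    return [
        name
        for name, low, high in SIZE_CLASSES
        if size_bp >= low and (high is None or size_bp <= high)
    ]


for size in (1, 6_000, 10_000_000):
    print(size, candidate_classes(size))
```

Size alone therefore narrows the possibilities but does not settle the type of variant; a 6 kb event, for instance, could be a retroelement insertion or an intermediate-scale structural variant.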
Eukaryotic chromosomes can be dynamic
Chromosomes can be highly dynamic, in several ways.
• Whole genome duplication (autopolyploidy) can occur, as in yeast (Chapter 15) and some plants.
• The genomes of two distinct species can merge, as in the mule (male donkey, 2n = 62 and female horse, 2n = 64)
• An individual can acquire an extra copy of a chromosome (e.g. Down syndrome, TS13, TS18)
• Chromosomes can fuse; e.g. human chromosome 2 derives from a fusion of two ancestral primate chromosomes
• Chromosomal regions can be inverted (hemophilia A)
• Portions of chromosomes can be deleted (e.g. D11q syndrome)
• Segmental and other duplications occur
• Chromatin diminution can occur (Ascaris)
Page 675
Conservative nature of chromosome evolution
Among placental mammals, the number of diploid chromosomes is:
84 in black rhinoceros
46 in Homo sapiens
17 in two rodent species
The process of chromosome evolution tends to remain conservative. Heterozygous carriers of most types of chromosomal rearrangements are semisterile. Thus many chromosomal changes cannot be fixed.
Ohno (1970) p. 41
Inversions in chromosome evolution
Chromosomal inversions occur when a fragment of a chromosome breaks at two places, inverts, and is reinserted. This is a useful mechanism for producing a sterility barrier during speciation. An example is in deer mice; another example is in Anopheles gambiae.
Ohno (1970) p. 42
The eukaryotic chromosome: Robertsonian fusion creates one metacentric by fusion of two acrocentrics
Translocations occur when chromosomal material is exchanged between two non-homologous chromosomes. Robertsonian fusion, which often accompanies speciation, is the creation of one metacentric chromosome by the centric fusion of two acrocentrics.
Robertsonian fusions are often tolerated and may sometimes be considered selectively neutral. An example is the house mouse (Mus musculus, 2n = 40) and a small group of tobacco mice in Switzerland (Mus poschiavinus, 2n = 26). Mus poschiavinus is homozygous for seven Robertsonian fusions.
Page 675; Ohno (1970) p. 43
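The chromosome counts in the mouse example are consistent with simple arithmetic: each Robertsonian fusion that is homozygous replaces two pairs of acrocentrics with one pair of metacentrics, lowering 2n by two. A small sketch of that bookkeeping (the function name is made up, and it assumes enough acrocentric pairs are available to fuse):

```python
def diploid_number_after_fusions(diploid_number, homozygous_fusions):
    """Diploid chromosome number after a given number of homozygous centric fusions.

    Each homozygous fusion turns two acrocentric pairs into one metacentric
    pair, so 2n drops by 2 per fusion. Assumes enough acrocentric pairs exist.
    """
    pairs_consumed = 2 * homozygous_fusions
    if pairs_consumed > diploid_number // 2:
        raise ValueError("not enough chromosome pairs for that many fusions")
    return diploid_number - 2 * homozygous_fusions


# Mus musculus (2n = 40) with seven homozygous fusions, as in Mus poschiavinus
print(diploid_number_after_fusions(40, 7))
```

Seven fusions starting from 2n = 40 give 2n = 26, matching the tobacco mouse karyotype described above.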
The eukaryotic chromosome: Robertsonian fusion creates one metacentric by fusion of two acrocentrics
Ohno (1970) Plate II
ordinary male house mouse (Mus musculus, 2n = 40)
male tobacco mouse (Mus poschiavinus, 2n = 26)
Male first meiotic metaphase from an interspecific F1 hybrid. Note seven trivalents (each from one poschiavinus metacentric and two musculus acrocentrics)
Diploidization of the tetraploid
Ohno (1970) pp. 98-101
A species can become tetraploid. All loci are duplicated, and what was formerly the diploid chromosome complement is now the haploid set of the genome.
Polyploid evolution occurs commonly in plants. For example, in the cereal plant Sorghum:
S. versicolor (diploid) 2n = 2 x 5; 10 chromosomes
S. sudanense (tetraploid) 4n = 4 x 5; 20 chromosomes
S. halepense (octoploid) 8n = 8 x 5; 40 chromosomes
In plants, the male sex organ (stamen) and female organ (pistil or carpel) are present in the same flower; they are hermaphroditic.
Diploidization of the tetraploid
Ohno (1970) pp. 98-101
Polyploid evolution occurs rarely in vertebrates and other metazoans. For diploid organisms with XY/XX sex determination, in tetraploidy the male must maintain 4AXXYY and the female 4AXXXX. But during meiosis of the 4AXXYY male, the four sex elements may pair off as the XX bivalent and the YY bivalent such that every gamete is 2AXY. All offspring of the tetraploid male and tetraploid female would be 4AXXXY. If this were male, there would be no females. The 4AXXYY male cannot produce the necessary two classes of gametes, 2AXX and 2AYY.
Mammals, birds, and reptiles are thus not polyploid.
Tetraploidy: Odontophrynus americanus is a newly arisen bisexual autotetraploid vertebrate
Ohno (1970) pp. 98-101
Polyploid evolution can occur in fish and amphibians, because of differences in sex determination (X and Y in males, Z and W in females).
Autopolyploidy: in a diploid organism, two daughter cells at the end of mitotic telophase may fuse into one cell, forming a tetraploid cell. Two diploid gametes may produce a tetraploid zygote.
Allopolyploidy: interspecies polyploidy.
Tetraploidy: Odontophrynus americanus is a newly arisen bisexual autotetraploid vertebrate
Ohno (1970) pp. 98-101
South American frog species of the family Ceratophrydidae may be autopolyploids. The diploid chromosome number has a wide range:
O. cultripes: 22 chromosomes, 11 bivalents in meiosis
O. americanus: 44 chromosomes, 11 quadrivalents in meiosis
other species of this family: 110 chromosomes
Ohno (1970) plate III; p. 100
Karyotype of the tetraploid frog Odontophrynus americanus (4n = 44)
44 chromosomes: 11 sets of four homologs
sperm head
two bivalents
ten quadrivalents
Diploidization of the tetraploid
Ohno (1970) p. 102
As an autotetraploid arises, it has four homologous chromosomes for each linkage group. These must change to a disomic state to allow functional diversification of the loci. If the four homologues form a quadrivalent, there cannot be functional diversification. Two distinct, separate bivalents must form, for example via a pericentric inversion.
In fish of the suborder Salmonoidea, trout, salmon, whitefish and graylings are probably autotetraploid species.
Ohno (1970) plate IV; p. 102
Karyotypes of a species in the process of diploidization: rainbow trout Salmo irideus
Cell type     Chromosomes   Metacentrics   Acrocentrics   Chromosome arms
Liver cell    61            43             18             104
Spleen cell   59            45             14             104
4 quadrivalents 1 quadrivalent
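Although the two cells differ in chromosome number, the number of chromosome arms is the same, which is what centric fusion or fission would be expected to leave unchanged. Counting two arms per metacentric and one per acrocentric reproduces the figures in the karyotypes above. A minimal sketch (the function and variable names are illustrative):

```python
def chromosome_arms(metacentrics, acrocentrics):
    """Count chromosome arms: two per metacentric, one per acrocentric."""
    return 2 * metacentrics + acrocentrics


# (metacentrics, acrocentrics) for the two rainbow trout cells described above
karyotypes = {"liver cell": (43, 18), "spleen cell": (45, 14)}

for cell, (meta, acro) in karyotypes.items():
    print(cell, meta + acro, "chromosomes,", chromosome_arms(meta, acro), "arms")
```

Both cells come out at 104 arms, with 61 and 59 chromosomes respectively.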
Trisomy and polysomy
Ohno (1970) p. 107
Nondisjunction results in two chromatids of one chromosome moving to the same division pole. In diploid species, one daughter cell receives three homologous chromosomes (trisomy). If this occurs in germ cells, the progeny may be trisomic.
In the Jimson weed (Datura stramonium) trisomy for each of the 12 chromosomes was observed by Blakeslee (1930). A mating between trisomic individuals may produce tetrasomic progeny having four homologous chromosomes (thus duplicating an entire chromosome).
Trisomy and polysomy
Ohno (1970) p. 107
For vertebrates, this mechanism is too severe. Generally, only trisomy of chromosome 13, 18, or 21 is compatible with postnatal survival in humans.
In rainbow trout that have become tetraploid, trisomy (i.e. from four to five copies) and monosomy (i.e. from four to three copies) may be tolerated.
Outline: eukaryotic chromosomes
General features of eukaryotic chromosomes
Repetitive DNA content of eukaryotic genomes
Gene content of eukaryotic chromosomes
Regulatory regions of eukaryotic chromosomes
Comparison of eukaryotic DNA
Variation in chromosomal DNA
Techniques to measure chromosomal change