The human genome structure and organization - …€¦ · The human genome structure and ... that...

40
I dedicate this review in memory of Professor Jacek Augustyniak, who introduced me to the world of genes and genomes The human genome structure and organization Wojciech Maka³owski National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, U.S.A. Vol. 48 No. 3/2001

Transcript of The human genome structure and organization - …€¦ · The human genome structure and ... that...

I dedicate this review in memory of Professor Jacek Augustyniak, who introduced me to

the world of genes and genomes

Review

The human genome structure and organization*

Wojciech Maka³owski½

National Center for Biotechnology Information, National Library of Medicine,

National Institutes of Health, Bethesda, U.S.A.

Received: 22 January, 2001; accepted: 26 February, 2001

Genetic information of human is encoded in two genomes: nuclear and mitochon-

drial. Both of them reflect molecular evolution of human starting from the beginning

of life (about 4.5 billion years ago) until the origin of Homo sapiens species about

100000 years ago. From this reason human genome contains some features that are

common for different groups of organisms and some features that are unique for

Homo sapiens. 3.2 ´ 109base pairs of human nuclear genome are packed into 23

chromosomes of different size. The smallest chromosome � 21st contains 5 ´ 107

base pairs while the biggest one �1stcontains 2.63´ 10

8base pairs. Despite the fact

that the nucleotide sequence of all chromosomes is established, the organisation of

nuclear genome put still questions: for example: the exact number of genes encoded

by the human genome is still unknown giving estimations from 30 to 150 thousand

genes. Coding sequences represent a few percent of human nuclear genome. The ma-

jority of the genome is represented by repetitive sequences (about 50%) and

noncoding unique sequences. This part of the genome is frequently wrongly called

�junk DNA�. The distribution of genes on chromosomes is irregular, DNA fragments

containing low percentage of GC pairs code lower number of genes than the frag-

ments of high percentage of GC pairs.

Vol. 48 No. 3/2001

587�598

QUARTERLY

*Presented at the XXXVIMeeting of the Polish Biochemical Society, Poznañ, 13 September 2000, Poland.½

Mailing address: NCBI/NLM/NIH, 45 Center Drive, MSC 6510, Bldg. 45, Room 6As.47A, Bethesda,MD

20892-6510, U.S.A., phone: (301) 435 5989; fax: (301) 480 2918; e-mail: [email protected]

Abbreviations:CDS, coding DNA sequence; EST, expressed sequence tag; FISH, fluorescence in situ hy-

bridization; HERV, human endogenous retrovirus; LINE, long interspersed repetitive element; LTR,

long terminal repeat; SAR, scaffold-attachement region; SINE, short interspersed repetitive element;

TE, transposable element; UTR, untranslated region.

INTRODUCTION � HISTORICAL

PERSPECTIVE

From the beginning of humanity, people

have been interested in themselves. They

were well aware of two aspects of living na-

ture: an immense variability within each spe-

cies and the tendency for characteristics of

parents to be transmitted to their offspring.

Already pre-Socratic philosophers noticed

that people shared some characteristics, e.g.

had usually, with some exceptions, two hands,

a nose, large forehead, in other words they

were alike. On the other hand, everybody was

different and nobody should have a problem

to distinguish those two gentlemen by such

characteristics as eyes, cheeks, or shirts. An-

cient people were also aware that the above

was true for both intra- and inter-species com-

parison.

The question arises: how does it happen that

our children are more similar to parents than

to monkeys? The problem already intrigued

pre-Socratic philosophers. Probably the first

person who publicly expressed his thoughts

on the subject was Anaxagoras of Clazome-

nae. According to his teaching, seed material

is carried from all parts of the body to repro-

ductive organs by the humors. Fertilization is

the mixing of the seed material of father and

mother. That all parts of the body participate

in the production of seed material is docu-

mented by the fact that blue-eyed parents

have blue-eyed children and baldheaded men

have sons that become baldheaded — not a

very good prospect for my own children. The

idea of panspermy or pangenesis was adapted

and taught by the famous physician

Hipocrates (about 460–377 B.C.) and was

widely accepted until the end of the nine-

teenth century, also by Charles Darwin. One

of the greatest scientists of all time, Aristotle

of Stagira had a different view on the prob-

lem. Aristotle�s theory of inheritance, as de-

scribed in one of his major works De

generatione animalium, was holistic. He held

that the contributions by males and females

were not equal. The semen of the male con-

tributes the form-giving principle, eidos, while

the menstrual blood, cantemina, of the female

is the unformed substance shaped by the eidos

of the semen. “The female always provides the

material, the male provides that which fash-

ions the material into shape; this in our view,

is the specific characteristic of each sex: that

is what it means to be male or to be female.”

(Aristotle, 1965).

The twentieth century witnessed accelerated

development of biology and with it the nature

of the inheritance process was understood.

Consequently, an effort to decipher the

blueprint of our species has started. Several

biological discoveries were especially impor-

tant to decipher the human genome. Every-

thing started with the rediscovery of Mendel�s

laws by Hugo Marie De Vries (1900), followed

by discovery of chromosomes by Thomas H.

Morgan in 1910 (Morgan, 1910). In 1953,

James D. Watson and Francis H.C. Crick un-

raveled the structure of DNA (Watson &

Crick, 1953a; Watson & Crick, 1953b). Fours

years later, Johan H. Matthaei and Marshall

Nirenberg performed experiments which en-

abled deciphering the genetic code. With the

development of the fast methods of DNA se-

quencing in the mid-seventies (Maxam &

Gilbert, 1977; Sanger et al., 1977), followed by

automation of cloning and sequencing in the

nineties, the way to understand our blueprint

became clear. By now, many complete

genomes of both prokaryotic and eukaryotic

organisms have been sequenced. For

up-to-date tables with completed genomes, go

to http://www.ebi. ac.uk/genomes/. On June

26, 2000, virtually all news agencies in the

world announced completion of a working

draft of the human genome. This accomplish-

ment was so important for humankind that in-

stead of announcing it at a scientific confer-

ence or in a scientific journal, as used to be

with a scientific milestones, a special press

conference was organized in The White House

in Washington, D.C. In several days faces of

major players from both private and public

588 W. Makalowski 2001

sectors appeared on journals� covers around

the world, including the Polish weeklies

Polityka and Wprost. It is worth pointing out

that the public genome project already com-

pleted sequence of two chromosomes: 22 (De-

cember, 1999) (Dunham et al., 1999) and 21

(May, 2000) (Hattori et al., 2000). The work-

ing draft of the human genome was published

by both projects last January.

HUMAN GENOME � GENERAL

INFORMATION

Our genetic material is stored in two

organelles: nucleus and mitochondria. This re-

view is focused on the nuclear genome in

which 3.2 miliard bp are packed in 22 pairs of

autosomes and two sex chromosomes, X and

Y. Human chromosomes are not of equal

sizes; the smallest, chromosome 21, is 54 mln

bp long; the largest, chromosome 1, is almost

five times bigger with 249 mln bp (see Ta-

ble 1).

Genomic sequences can be divided in several

ways. From the functional point of view we

can distinguish genes, pseudogenes, and

non-coding DNA (Fig. 1). Only a minute frac-

tion of the genome — about 3% — codes for pro-

teins. There are many pseudogenes in the hu-

man genome (0.5%) but most of the genome

consists of introns and intergenic DNA. Al-

most half of these sequences consist of differ-

ent transposons; moreover, the remaining

non-coding DNA most likely originated from

transposable elements as well but with time

they have mutated beyond recognition.

SEQUENCE COMPLEXITY

The human genome contains various levels

of complexity as demonstrated by reasso-

ciation kinetics. Such analyses of the human

genome estimate that 60% of the DNA is ei-

Vol. 48 The human genome 589

Table 1. Physical sizes of human chromosomes

Chromosome Size (Mbp)

1 249

2 237

3 192

4 183

5 174

6 165

7 153

8 135

9 132

10 132

11 132

12 123

13 108

14 105

15 99

16 84

17 81

18 75

19 69

20 63

21 54

22 57

X 141

Y 60

Figure 1. Fractions of differ-

ent sequences in the human

genome.

ther single copy or in very low copies; 30% of

the DNA is moderately repetitive; and 10% is

considered highly repetitive.

Various staining techniques demonstrate al-

ternative banding patterns of mitotic chromo-

somes referred to as karyograms. Although

the three broad classes of DNA are scattered

throughout the chromosome, chromosomal

banding patterns reflect levels of com-

partmentalization of the DNA. Using the

C-banding technique yields dark-staining re-

gions of the chromosome (or C bands), re-

ferred to as heterochromatin. These regions

are highly coiled, contain highly repetitive

DNA, and are typically found at the

centromeres, telomeres, and on the Y chromo-

some. They are composed of long arrays of

tandem repeats and therefore some may con-

tain a nucleotide composition that differs sig-

nificantly from the remainder of the genome

(approximately 40–42% GC). That means that

they can be separated from the bulk of the ge-

nome by buoyant density (caesium chloride)

gradient centrifugation. Gradient centrifu-

gation results in a major band and three mi-

nor bands referred to as satellite bands —

hence the term satellite DNA.

The G-banding technique yields a pattern of

alternating light and dark bands reflecting

variations in base composition, time of repli-

cation, chromatin conformation, and the den-

sity of genes and repetitive sequences. There-

fore, the karyograms define chromosomal or-

ganization and allow for identification of the

different chromosomes. The darker bands, or

G bands, are comparatively more condensed,

more AT-rich, less gene-rich and replicate

later than the DNA within the pale bands,

which correspond to the R bands by an alter-

native staining technique. More recently,

these alternative banding patterns have been

correlated to the level of compaction of scaf-

fold-attachment regions (SARs).

The human genome may also be compart-

mentalized into large (> 300 kb) segments of

DNA that are homogeneous in base composi-

tion referred to as isochores (Bernardi, 2000),

based on sequence analysis and compositional

mapping. L1 and L2 are GC-poor (or ‘light�)

isochore families representing about 62% of

the genome. The H1, H2 and H3 (heavy)

isochore classes are increasingly GC-rich.

There is some correlation between isochores

and chromosomal bands. G bands are almost

exclusively composed of GC-poor isochores,

with a minor contribution from H1. R bands

can be classified further into T bands (R band-

ing at elevated temperatures), which are com-

posed mainly of H2 and H3 isochores, and R�

(non-T R bands) which are comprised of

nearly equal amounts of GC-rich (primarily

H1) and GC-poor isochores.

Additionally, there are five human chromo-

somes (13, 14, 15, 21, 22) distinguished at

their terminus by a thin bridge with rounded

ends referred to as chromosomal satellites.

These contain repeats of genes coding for

rRNA and ribosomal proteins that coalesce to

form the nucleolus and are known as the nu-

cleolar organizing regions.

HUMAN GENE NUMBER

It is interesting that the number of genes

coded by our genome is not known and proba-

bly will not be known long after completion of

the human genome sequencing. Nevertheless,

in the last decade, several groups tried to an-

swer this question using different methods

(see Table 2). Unfortunately, the estimations

differ very much with prediction as low as

28000 up to 80000 genes per human haploid.

The whole genomic community is so excited

with this mysterious number that they de-

cided to organize the Gene Sweepstake. The

Gene Sweepstake will run between 2000 and

2003 and its detailed rules may be found at:

http://www. ensembl.org/Genesweep/. As of

January 2001, 165 bets were made with gene

number between 27462 and 153478 and a

mean value of 61710.

590 W. Makalowski 2001

EXONS CHARACTERISTIC

In most human genes, coding sequences are

interrupted by stretches of non-coding se-

quences, which are spliced out during mRNA

maturation. Using nomenclature introduced

by Walter Gilbert (Gilbert, 1978), the human

genes look like mosaics, consisting of series of

exons (DNA sequences that can be subse-

quently found in the mature mRNA) and

introns (silent DNA sequences that are absent

from the final mRNA). As nothing in nature is

simple, some of the introns carry significant

information and even code for other complete

genes (see description of nested genes below).

Initially, it was thought that introns occured

only in untranslated parts of mRNA and cod-

ing sequences (CDS) were not interrupted.

However, it soon became clear that introns

could be found in all domains of mRNA mole-

cule. Therefore, exons can be classified as fol-

lows: 5� UTR exons, coding exons, 3� UTR

exons, and all possible combinations of those

three main types, including single exons that

cover the whole mRNA. The latter are very in-

teresting from the evolutionary biology point

of view, because in most cases they are

retroposed copies of “regular” genes with

introns. Michael Zhang of Cold Spring Harbor

Laboratory analyzed 4731 human exons

(Zhang, 1998). It appears that human exons

are relatively short with median value of 167

bp and mean equal to 216 bp. The shortest

exon was only 12 bp while the longest one

6609 bp. These numbers have to be taken with

some caution because they are based on

GenBank annotation, which sometimes is not

very precise. Mixed (including coding and

non-coding sequences) exons tend to be longer

than single type exons, especially those at the

end of the message; not surprisingly so, since

3� UTRs are relatively long in mammalian

mRNAs. In our analysis of over 2000 human

mRNA sequences the median and mean sizes

of human message domains were as follow:

118 nt and 191 nt for 5� UTR, 1191 and 1424

for CDSs, and 534 and 576 for 3� UTRs, re-

spectively (Makalowski et al., 1996; Maka-

lowski & Boguski, 1998).

GENE DISTRIBUTION

Genes may be transcribed from either the

same or from the opposite strand of the ge-

nome, i.e. they may lie in the same

(tail-to-head) or opposite orientation

(head-to-head or tail-to-tail). Although the vast

majority of the human genome accounts for

non-exonic sequences, a surprisingly large

number of genes occupy the same genomic

space. About 6% of human genes reside in

introns of other genes (Wong et al., 2000). For

example, intron 27th of NF1 gene hosts three

other genes that have small introns on their

own, suggesting that they are not products of

retroposition (see Fig. 2). Additionally, over

100 gene pairs are overlapping at 3� end, i.e.

their 3� UTRs occupy the same region though

different strands (I. Makalowska, personal

communication). TPR and MSF genes map to

the same region of chromosome 1. The last

exon of the TPR gene is 872 nt long and over-

laps completely with the last exon of the MSF

gene (200 nt). Interestingly, the very end of

Vol. 48 The human genome 591

Table 2. Estimation of human gene number using different methods

Gene number Method Reference

80000 CpG islands (Antequera & Bird, 1994)

64000 ESTs (Fields et al., 1994)

35000 ESTs (Ewing & Green, 2000)

28000�34000 Comparative genomics (Roest Crollius et al., 2000)

30000 Gene punctuation (Yang et al., 2001)

the MSF gene overlaps with the intron of the

TPR gene (see Fig. 3).

Unlike in plant genomes, most of non-exonic

sequences in human genome account for

introns (Wong et al., 2000). However, genes

are not equally distributed throughout the ge-

nome. There is a distinct association between

GC-richness and gene density. This is consis-

tent with the association of most genes with

CpG islands, the 500–1000 bp GC-rich seg-

ments flanking (usually at the 5� end) most

housekeeping and many tissue-specific genes.

The clustering of CpG islands, as demon-

strated by fluorescence in situ hybridization

further depicts gene-poor and gene-rich chro-

mosomal segments (Craig & Bickmore, 1994).

As a consequence, more than half of human

genes locate in the so-called “genomic core”

(isochores H2 and H3) comprising only 12% of

the human genome (see Table 3).

592 W. Makalowski 2001

Figure 2. An example of nested genes.

The human sequence from chromosome 1 (GenBank accession number AC004526) was analyzed using

GeneMachine (Makalowska et al., 2001). Connected closed boxes represent gene models as predicted by GenScan

software (Burge&Karlin, 1997) and boxes with arrows represent results of BLASTn search; AC004526was used as

a query against nr database.

Figure 3. An example of

overlapping genes.

The human sequence from

chromosome 1 (GenBank

accession number AL13-

3533) was analyzed using

GeneMachine (Makalow-

ska et al., 2001). Connect-

ed closed boxes represent

gene models as predicted

by GenScan software

(Burge & Karlin, 1997)

and open boxes with ar-

rows represent results of

BLASTn search; AL13-

3533 was used as a query

against nr database.

GENE FAMILIES

Many genes can be clustered in groups of dif-

ferent sizes based on sequence similarity. The

similarity between two genes varies from

genes coding identical products to genes in

which product similarity is barely detectable

and/or limited to short sequence stretches

called sequence motives. Genes families arose

during the evolution by gene duplications

over the different periods of time as reflected

in sequence similarity. In general, more simi-

lar genes shared a common ancestor later (in

nearer past) than genes with a weaker similar-

ity, although gene conversion can result in

very similar or identical gene copies regard-

less of gene duplication time. Gene duplica-

tion can occur by different mechanisms, like

unequal recombination or retroposition. Not

all duplicated genes remain active, some of

them end up in genomic oblivion and are

called pseudogenes. Some of the pseudogenes

can be rescued from the genomic death by cap-

turing a promoter and regulatory elements in

the course of evolution as happened with

�-globin gene which was rescued by an Alu el-

ement after 200 mln years of silent existence

(see discussion in Makalowski, 1995).

The histone gene family is an example of

very similar genes. It consists of five genes

that tend to be linked, although in differing ar-

rays of variable copy numbers dispersed in

the human genome. The individual genes of a

particular histone family encode essentially

identical products (i.e. all H4 genes code for

the identical H4 protein). Analysis of individ-

ual human genomic clones has identified iso-

lated histone genes, e.g. H4, clusters of two or

more histone genes, or clusters of all histone

genes, e.g. H3-H4-H1-H3-H2A-H2B (Hentschel

& Birnstiel, 1981). A majority of histone genes

form a large cluster on human chromosome 6

(6p21.3) and a small cluster at 1q21. Interest-

ingly, histone genes lack introns; a rare fea-

ture for eukaryotic genes.

Genes that encode ribosomal RNA (rRNA)

total about 0.4% of the DNA in the human ge-

nome. The individual genes of a particular

rRNA family are essentially identical. The

28S, 5.8S and 18S rRNA genes are clustered

with spacer units in tandem arrays of approxi-

mately 60 copies each yielding about 2 million

bp of DNA. These clusters are present on the

short arms of five acrocentric chromosomes

and form the nucleolar organizing regions,

hence approximately 300 copies. These three

rRNA genes are transcribed as a single unit

and then cleaved. 5S rRNA genes are clus-

tered on chromosome 1q.

Some genes in the human genome share

highly conserved amino-acid domains with

weak overall similarity. These often have de-

velopmental function. There are nine dis-

persed paired box (Pax) genes that contain

highly conserved DNA binding domains with

six �-helices. The homeobox or Hox genes

share a common 60 amino-acid sequence. In

humans there are four Hox gene clusters,

each on a different chromosome. However,

the individual genes in the cluster demon-

strate greater similarity to a counterpart gene

in another cluster than to the other genes in

the same cluster.

There are pseudogenes that are the result of

retroposition (retropseudogenes). The

pseudogenes lack introns and the flanking

Vol. 48 The human genome 593

Table 3. Gene density in different isochores

Genomic core �Empty� space

Isochore type H2 and H3 L, H1

Genome fraction 12% 88%

Gene fraction 54% 46%

Gene density 1/10 kbp 1/100 kbp

DNA sequences of the functional locus and

therefore are not products of gene duplica-

tion. The generation of these types of ele-

ments is dependent on the reverse transcrip-

tase of other retroelements such as LINEs.

REPETITIVE SEQUENCES

The human genome is occupied by stretches

of DNA sequences of various length that exist

in variable copy number. These repetitive se-

quences may be in a tandem orientation or

they may be dispersed throughout the ge-

nome. Repetitive sequences may be classified

by function, dispersal patterns, and sequence

relatedness. Satellite DNA typically refers to

highly repetitive sequences with no known

function and interspersed repeat sequences

are typically the products of transposable ele-

ment integration, including retrogenes and

retropseudogenes of a functional gene. For

the up-to-date list of human repetitive ele-

ments visit the RepBase at http://www.

girinst.org/.

GENOMIC DUPLICATIONS

Thirty years ago, Suzumu Ohno put forth a

hypothesis about two duplications of the

whole genome in the early stages of vertebrate

evolution (Ohno, 1970). According to his hy-

pothesis, most vertebrate gene families

should give three or four well-defined

branches, as presented in Fig. 4. Unfortu-

nately, analysis of over 10000 vertebrate gene

families does not support Ohno�s hypothesis

(Makalowski, unpublished observation). Nev-

ertheless, duplications in human genome do

exist and they play a significant role in genes

and the genome evolution. Although some-

times very large, they appear to be on a local,

not a global scale. For example, the compari-

son of the complete human chromosome 21

sequence with both itself and other human se-

quences revealed many large duplications

with the largest intra-chromosomal duplica-

tion being 189 kb (position 188–377 and

14795–15002 in q arm) and the largest de-

tected inter-chromosomal duplication of over

100 kb region from q arm of chromosome 21

(position 646–751) duplicated in chromosome

22 (position 45–230) (RIKEN, 2000).

MICROSATELLITES, MINISATELLITES,

AND MACROSATELLITES

Microsatellites are small arrays of short sim-

ple tandem repeats, primarily 4 bp or less. Dif-

594 W. Makalowski 2001

Figure 4. A hypothetical phylogenetical tree of

vertebrate gene family under Ohno¢s hypothesis

about two genomeduplications in early vertebrate

evolution.

Drosophila gene represents an outgoup and four clus-

ters of a gene family are encircled. An asterisk (*)

marks first genome duplication and a hash sign (#)

marks points of second genome duplication. Different

branch lengths suggest different evolutionary rates af-

ter ancestral gene duplication.

ferent arrays are found dispersed throughout

the genome, although dinucleotide CA/TG re-

peats are most common, yielding 0.5% of the

genome. Runs of As and Ts are common as

well. Microsatellites have no known functions.

However, CA/TG dinucleotide pairs can form

the Z-DNA conformation in vitro, which may

indicate some function. Repeat unit copy

number variation of microsatellites appar-

ently occurs by replication slippage. The ex-

pansion of trinucleotide repeats within genes

has been associated with genetic disorders

such as Huntington disease or fragile-X syn-

drome.

Minisatellites are tandemly repeated se-

quences of DNA of lengths ranging from 1 kbp

to 15 kbp. For example, telomeric DNA se-

quences contain 10–15 kb of hexanucleotide

repeats, most commonly TTAGGG in the hu-

man genome, at the termini of the chromo-

somes. These sequences are added by telo-

merase to ensure complete replication of the

chromosome.

Macrosatellites are very long arrays, up to

hundreds of kilobases, of tandemly repeated

DNA. There are three satellite bands observed

by buoyant density centrifugation. However,

not all satellite sequences are resolved by den-

sity gradient centrifugation, e.g. alpha satel-

lite DNA or alphoid DNA that constitute the

bulk of centromeric heterochromatin on all

chromosomes. The interchromosomal diver-

gence of the alpha satellite families allows the

different chromosomes to be distinguished by

fluorescence in situ hybridization (FISH).

TRANSPOSABLE ELEMENTS

The human genome contains interspersed

repeat sequences that have largely amplified

in copy number by movement throughout the

genome. Those sequences (transposable ele-

ments or TEs) can be divided into two classes

based on the mode of transposition (Fin-

negan, 1989). The Class I elements are TEs

which transpose by replication that involves

an RNA intermediate which is reverse tran-

scribed back to DNA prior to reinsertion.

These are called retroelements and include

LTR transposons, which are structurally simi-

lar to integrated retroviruses, non-LTR ele-

ments (LINEs and SINEs), and retrogenes

(see Fig. 5). Class II elements move by a con-

servative cut-and-paste mechanisms, the exci-

sion of the donor element is followed by its re-

insertion elsewhere in the genome. Integra-

tion of Class I and Class II transposable ele-

ments results in the duplication of a short se-

quence of DNA, the target site. There are

about 500 families of such transposons. Most

of transposition has occurred via an RNA in-

termediate, yielding classes of sequences re-

ferred to as retroelements (more than 400

families, e.g. Alu, L1, retrogenes, MIR). How-

ever, there is also evidence of an ancient

DNA-mediated transposition (more than 60

families of class II (DNA) transposons, e.g.

THE-1, Charlie, Tigger, mariner).

RETROELEMENTS

Short interspersed repetitive elements

(SINEs) and long interspersed repetitive ele-

ments (LINEs) are the two most abundant

classes of repeats in human, and represent

the two major classes of mammalian retro-

transposons. Structural features shared by

LINEs and SINEs include an A-rich 3� end

and the lack of long terminal repeats (LTRs);

these features distinguish them from retro-

viruses and related retroelements.

A full-length LINE (or L1 element) is approx-

imately 6.1 kbp although most are truncated

pseudogenes with various 5� ends due to in-

complete reverse transcription. There are

about 100000 copies of L1 sequences in our

genome. Approximately 1% of the estimated

3500 full-length LINEs have functional RNA

polymerase II promoter sequences along with

two intact open reading frames necessary to

generate new L1 copies. Individual LINEs

contain a poly-A tail and are flanked by direct

Vol. 48 The human genome 595

repeats. LINE mobilization activity has been

verified in both germinal and somatic tissues.

The Alu element is estimated at 500000–

900000 copies in the human genome repre-

senting the primary SINE family, the most

successful transposon in any genome. Se-

quence comparisons suggest that Alu repeats

were derived from the 7SL RNA gene. Each

Alu element is about 280 bp with a dimeric

structure, contains RNA polymerase III pro-

moter sequences, and typically has an A-rich

tail and flanking direct repeats (generated

during integration). Although Alu elements

are present in all primate genomes, more than

2000 Alu elements have integrated within the

human genome subsequent to the divergence

of humans from the great apes.

The human genome also contains families of

retroviral-related sequences. These are char-

acterized by sequences encoding enzymes for

retroposition and contain LTRs. In addition,

solitary LTRs of these elements may be lo-

cated throughout the genome. There are sev-

eral low abundant (10–1000 copies) human

endogenous retrovirus (HERV) families, with

individual elements ranging from 6 to 10 kb,

collectively encompassing about 1% of the ge-

nome.

CLASS II ELEMENTS

Class II elements contain inverted repeats

(10–500 bp) at their termini and encode a

transposase that catalyses transposition.

They move by excision at the donor site and

reinsertion elsewhere in the genome by a

non-replicative mechanism. The human ge-

nome hosts a number of repeated sequences

originated in more than 60 different DNA

transposons.

The mariner ‘fossils’ present in our genome

closely resemble members of three sub-

families identified in insects, adding to the al-

ready extensive evidence that horizontal

transfer between genomes has been impor-

596 W. Makalowski 2001

Figure 5. The structure of different human transposable elements.

Open arrows denote duplicated target sites and closed arrows denote long terminal repeats (LTRs). The following

abbreviations are used: CP, capside; NC, nucleocapsid; Pr, proteinase; RT, reverse transcriptase; Int, integrase;

ORF, open reading frame, and A and B denote polymerase III internal promoter.

tant in genomic evolution. Other human DNA

transposon remains also show high similarity

to sequences in distantly related organisms.

Nevertheless, the level of sequence divergence

suggests that activity of all identified ele-

ments predates human evolution.

CONCLUSIONS

The 3.2 billion bp of our genetic blueprint is

packed into 23 pairs of chromosomes, or 46

DNA molecules. Only a fraction of the genome

is occupied by protein-coding exons and the

majority of non-exonic sequences consists of

repetitive elements. Functional exons contrib-

ute merely 2% of a genome, up to 50% of a ge-

nome is occupied by repetitive element, the re-

maining 48% is called unique DNA, most of

which probably originated in mobile elements

diverged over time beyond recognition. Differ-

ent evolutionary forces shape the human ge-

nome composition and structure. It appears

that different mobile elements play a signifi-

cant role in this process (reviewed recently in

Makalowski, 2000). The human genome is a

dynamic entity, new functional elements ap-

pear and old ones become extinct as genes

that evolve according to birth and death rule

(Ota & Nei, 1994) similarly to species evolu-

tion. This confirms that the theory of evolu-

tion is truly universal and applies not only to

all organisms but to all levels of life as well.

I would like to thank Izabela Makalowska for

sharing unpublished data and Jakub Maka-

lowski for preparing Fig. 5.

R E F E R E N C E S

Antequera, F. & Bird, A. (1994) Predicting the to-

tal number of human genes.Nat. Genet. 8, 114.

Aristotle (1965)De generatione animalium. Oxonii,

E Typographeo Clarendoniano.

Bernardi, G. (2000) Isochores and the evolution-

ary genomics of vertebrates.Gene241, 3�17.

Burge, C. & Karlin, S. (1997) Prediction of com-

plete gene structures in human genomic DNA.

J. Mol. Biol. 268, 78�94.

Craig, J.M. & Bickmore, W.A. (1994) The distribu-

tion of CpG islands in mammalian chromo-

somes. Nat. Genet. 7, 376�382.

Dunham, I., Shimizu, N. et al. (1999) The DNA se-

quence of human chromosome 22. Nature

402, 489�495.

Ewing, B. & Green, P. (2000) Analysis of ex-

pressed sequence tags indicates 35,000 hu-

man genes. Nat. Genet. 25, 232�234.

Fields, C., Adams, M.D., White, O. & Venter, C.O.

(1994) How many genes in the human ge-

nome? Nat. Genet. 7, 345�346.

Finnegan, D.J. (1989) Eukaryotic transposable el-

ements and genome evolution. Trends Genet.

5, 103�107.

Gilbert, W. (1978) Why genes in pieces? Nature

271, 501.

Hattori, M. et al. (2000) The DNA sequence of hu-

man chromosome 21. The chromosome 21

mapping and sequencing consortium (see

comments). Nature 405, 311�319.

Hentschel, C.C. & Birnstiel, M.L. (1981) The orga-

nization and expression of histone gene fami-

lies. Cell 25, 301�313.

Makalowska, I. et al. (2001) GeneMachine: A tool

for seqence analysis and annotation. submit-

ted.

Makalowski, W. (1995) SINEs as a genomic scrap

yard: An essay on genomic evolution; in The

Impact of Short Interspersed Elements (SINEs)

on the Host Genome (Maraia, R.J. & Austin,

R.G., eds.) pp. 81�104, Landes Company.

Makalowski, W. (2000) Genomic scrap yard: How

genomes utilize all that junk. Gene 259,

61�67.

Makalowski, W. & Boguski, M.S. (1998) Evolution-

ary parameters of the transcribedmammalian

genome: An analysis of 2,820 orthologous ro-

dent and human sequences. Proc. Natl. Acad.

Sci. U.S.A. 95, 9407�9412.

Vol. 48 The human genome 597

Makalowski,W., Zhang, J. & Boguski, M.S. (1996)

Comparative analysis of 1196 orthologous

mouse and human full-length mRNA and pro-

tein sequences. Genome Res. 6, 846�857.

Maxam, A.M. & Gilbert, W. (1977) A new method

for sequencing DNA. Proc. Natl. Acad. Sci.

U.S.A. 74, 560�564.

Morgan, T.H. (1910) Chromosomes and heredity.

Amer. Nat. 44, 449�496.

Ohno, S. (1970) Evolution by Gene Duplication.

Springer Verlag, New York.

Ota, T. & Nei, M. (1994) Divergent evolution and

evolution by the birth-and-death process in the

immunoglobulin VH gene family. Mol. Biol.

Evol. 11, 469�482.

RIKEN (2000) http://hgp.gsc.riken.go.jp/ chr21/

annotation.htm

Roest Crollius, H. et al. (2000) Estimate of human

gene number provided by genome-wide analy-

sis using Tetraodon nigroviridis DNA se-

quence. Nat. Genet. 25, 235�238.

Sanger, F., Nicklen, S. & Coulson, A.R. (1977)

DNA sequencing with chain-terminating in-

hibitors. Proc. Natl. Acad. Sci. U.S.A. 74,

5463�5467.

Watson, J.D. & Crick, F.H.C. (1953a) Genetical im-

plications of the structure of deoxyribonucleic

acid. Nature 171, 964�967.

Watson, J.D. & Crick, F.H.C. (1953b) Molecular

structure of nucleic acids: A structure for

deoxyribose nucleic acid. Nature 171,

737�738.

Wong, G.K., Passey, D.A., Huang, Y., Yang, Z. &

Yu, J. (2000) Is �Junk� DNA mostly intron

DNA? Genome Res. 10, 1672�1678.

Yang, C.Z. et al. (2001) Gene Punctuation. Sub-

mitted.

Zhang, M.Q. (1998) Statistical features of human

exons and their flanking regions. Hum. Mol.

Genet. 7, 919�932.

598 W. Makalowski 2001

The third and final part of this book explores the bio-chemical mechanisms underlying the apparently con-

tradictory requirements for both genetic continuity andthe evolution of living organisms. What is the molecularnature of genetic material? How is genetic informationtransmitted from one generation to the next with highfidelity? How do the rare changes in genetic materialthat are the raw material of evolution arise? How is ge-netic information ultimately expressed in the amino acidsequences of the astonishing variety of protein mole-cules in a living cell?

The fundamental unit of information in living sys-tems is the gene. A gene can be defined biochemicallyas a segment of DNA (or, in a few cases, RNA) that en-codes the information required to produce a functionalbiological product. The final product is usually a pro-tein, so much of the material in Part III concerns genesthat encode proteins. A functional gene product mightalso be one of several classes of RNA molecules. Thestorage, maintenance, and metabolism of these infor-mational units form the focal points of our discussion inPart III.

Modern biochemical research on gene structure andfunction has brought to biology a revolution compara-ble to that stimulated by the publication of Darwin’s the-ory on the origin of species nearly 150 years ago. An un-derstanding of how information is stored and used in

cells has brought penetrating new insights to some ofthe most fundamental questions about cellular structureand function. A comprehensive conceptual frameworkfor biochemistry is now unfolding.

Today’s understanding of information pathways hasarisen from the convergence of genetics, physics, andchemistry in modern biochemistry. This was epitomizedby the discovery of the double-helical structure of DNA,postulated by James Watson and Francis Crick in 1953(see Fig. 8–15). Genetic theory contributed the conceptof coding by genes. Physics permitted the determina-tion of molecular structure by x-ray diffraction analysis.Chemistry revealed the composition of DNA. The pro-found impact of the Watson-Crick hypothesis arose fromits ability to account for a wide range of observationsderived from studies in these diverse disciplines.

This revolution in our understanding of the struc-ture of DNA inevitably stimulated questions about itsfunction. The double-helical structure itself clearly sug-gested how DNA might be copied so that the informa-tion it contains can be transmitted from one generationto the next. Clarification of how the information in DNAis converted into functional proteins came with the dis-covery of both messenger RNA and transfer RNA andwith the deciphering of the genetic code.

These and other major advances gave rise to thecentral dogma of molecular biology, comprising thethree major processes in the cellular utilization of ge-netic information. The first is replication, the copyingof parental DNA to form daughter DNA molecules withidentical nucleotide sequences. The second is tran-

scription, the process by which parts of the geneticmessage encoded in DNA are copied precisely into RNA.The third is translation, whereby the genetic messageencoded in messenger RNA is translated on the ribo-somes into a polypeptide with a particular sequence ofamino acids.

PART

INFORMATION PATHWAYS

III24 Genes and Chromosomes 923

25 DNA Metabolism 948

26 RNA Metabolism 995

27 Protein Metabolism 1034

28 Regulation of Gene Expression 1081

921

8885d_c24_920-947 2/11/04 1:36 PM Page 921 mac76 mac76:385_reb:

Part III explores these and related processes. InChapter 24 we examine the structure, topology, andpackaging of chromosomes and genes. The processesunderlying the central dogma are elaborated in Chap-ters 25 through 27. Finally, we turn to regulation, ex-amining how the expression of genetic information iscontrolled (Chapter 28).

A major theme running through these chapters isthe added complexity inherent in the biosynthesis ofmacromolecules that contain information. Assemblingnucleic acids and proteins with particular sequences ofnucleotides and amino acids represents nothing lessthan preserving the faithful expression of the template

upon which life itself is based. We might expect the for-mation of phosphodiester bonds in DNA or peptidebonds in proteins to be a trivial feat for cells, given thearsenal of enzymatic and chemical tools described inPart II. However, the framework of patterns and rulesestablished in our examination of metabolic pathwaysthus far must be enlarged considerably to take intoaccount molecular information. Bonds must be formedbetween particular subunits in informational biopoly-mers, avoiding either the occurrence or the persistenceof sequence errors. This has an enormous impact on thethermodynamics, chemistry, and enzymology of thebiosynthetic processes. Formation of a peptide bond re-quires an energy input of only about 21 kJ/mol of bondsand can be catalyzed by relatively simple enzymes. Butto synthesize a bond between two specific amino acidsat a particular point in a polypeptide, the cell investsabout 125 kJ/mol while making use of more than 200enzymes, RNA molecules, and specialized proteins. Thechemistry involved in peptide bond formation does notchange because of this requirement, but additionalprocesses are layered over the basic reaction to ensurethat the peptide bond is formed between particularamino acids. Information is expensive.

The dynamic interaction between nucleic acids andproteins is another central theme of Part III. With theimportant exception of a few catalytic RNA molecules(discussed in Chapters 26 and 27), the processes thatmake up the pathways of cellular information flow arecatalyzed and regulated by proteins. An understandingof these enzymes and other proteins can have practicalas well as intellectual rewards, because they form thebasis of recombinant DNA technology (introduced inChapter 9).

Part III Information Pathways922

The central dogma of molecular biology, showing the general path-ways of information flow via replication, transcription, and transla-tion. The term “dogma” is a misnomer. Introduced by Francis Crick ata time when little evidence supported these ideas, the dogma has be-come a well-established principle.

RNA

Protein

Transcription

Translation

DNAReplication

8885d_c24_922 2/11/04 3:11 PM Page 922 mac76 mac76:385_reb:

chapter

A lmost every cell of a multicellular organism containsthe same complement of genetic material—its

genome. Just look at any human individual for a hintof the wealth of information contained in each humancell. Chromosomes, the nucleic acid molecules that arethe repository of an organism’s genetic information, arethe largest molecules in a cell and may contain thou-sands of genes as well as considerable tracts of inter-genic DNA. The 16 chromosomes in the relatively smallgenome of the yeast Saccharomyces cerevisiae havemolecular masses ranging from 1.5 � 108 to 1 � 109 dal-tons, corresponding to DNA molecules with 230,000 to1,532,000 contiguous base pairs (bp). Human chromo-somes range up to 279 million bp.

The very size of DNA molecules presents an inter-esting biological puzzle, given that they are generallymuch longer than the cells or viral packages that con-

tain them (Fig. 24–1). In this chapter we shift our focusfrom the secondary structure of DNA, considered inChapter 8, to the extraordinary degree of organizationrequired for the tertiary packaging of DNA into chromo-somes. We first examine the elements within viral andcellular chromosomes, then assess their size and organi-zation. We next consider DNA topology, providing a

GENES AND CHROMOSOMES24.1 Chromosomal Elements 924

24.2 DNA Supercoiling 930

24.3 The Structure of Chromosomes 938

DNA topoisomerases are the magicians of the DNA world.By allowing DNA strands or double helices to passthrough each other, they can solve all of the topologicalproblems of DNA in replication, transcription and othercellular transactions.

—James Wang, article in Nature Reviews in Molecular Cell Biology, 2002

Supercoiling, in fact, does more for DNA than act as anexecutive enhancer; it keeps the unruly, spreading DNAinside the cramped confines that the cell has providedfor it.

—Nicholas Cozzarelli, Harvey Lectures, 1993

24

923

0.5 m�

FIGURE 24–1 Bacteriophage T2 protein coat surrounded by its sin-gle, linear molecule of DNA. The DNA was released by lysing thebacteriophage particle in distilled water and allowing the DNA tospread on the water surface. An undamaged T2 bacteriophage parti-cle consists of a head structure that tapers to a tail by which the bac-teriophage attaches itself to the outer surface of a bacterial cell. Allthe DNA shown in this electron micrograph is normally packaged in-side the phage head.

8885d_c24_920-947 2/11/04 1:36 PM Page 923 mac76 mac76:385_reb:

description of the coiling of DNA molecules. Finally, wediscuss the protein-DNA interactions that organizechromosomes into compact structures.

24.1 Chromosomal ElementsCellular DNA contains genes and intergenic regions,both of which may serve functions vital to the cell. Themore complex genomes, such as those of eukaryoticcells, demand increased levels of chromosomal organi-zation, and this is reflected in the chromosome’s struc-tural features. We begin by considering the differenttypes of DNA sequences and structural elements withinchromosomes.

Genes Are Segments of DNA That Code for Polypeptide Chains and RNAs

Our understanding of genes has evolved tremendouslyover the last century. Classically, a gene was defined asa portion of a chromosome that determines or affects asingle character or phenotype (visible property), suchas eye color. George Beadle and Edward Tatum proposeda molecular definition of a gene in 1940. After exposingspores of the fungus Neurospora crassa to x rays andother agents known to damage DNA and cause alterationsin DNA sequence (mutations), they detected mutantfungal strains that lacked one or another specific en-zyme, sometimes resulting in the failure of an entiremetabolic pathway. Beadle and Tatum concluded that agene is a segment of genetic material that determinesor codes for one enzyme: the one gene–one enzyme

hypothesis. Later this concept was broadened to one

gene–one polypeptide, because many genes code forproteins that are not enzymes or for one polypeptide ofa multisubunit protein.

The modern biochemical definition of a gene is evenmore precise. A gene is all the DNA that encodes theprimary sequence of some final gene product, which canbe either a polypeptide or an RNA with a structural or

catalytic function. DNA also contains other segments orsequences that have a purely regulatory function. Reg-

ulatory sequences provide signals that may denote thebeginning or the end of genes, or influence the tran-scription of genes, or function as initiation points forreplication or recombination (Chapter 28). Some genescan be expressed in different ways to generate multiplegene products from one segment of DNA. The specialtranscriptional and translational mechanisms that allowthis are described in Chapters 26 through 28.

We can make direct estimations of the minimumoverall size of genes that encode proteins. As describedin detail in Chapter 27, each amino acid of a polypep-tide chain is coded for by a sequence of three consec-utive nucleotides in a single strand of DNA (Fig. 24–2),with these “codons” arranged in a sequence that corre-sponds to the sequence of amino acids in the polypep-tide that the gene encodes. A polypeptide chain of 350amino acid residues (an average-size chain) corre-

Chapter 24 Genes and Chromosomes924

George W. Beadle,1903–1989

Edward L. Tatum,1909–1975

UCU

AGA

CGU

GCA

GGA

CCT

UAC

ATG

ACU

TGA

UUU

AAA

GCC

CGG

GUU

CAA

5� 3�

3� 5�

DNA mRNA

TCT

CGTGGATACACTTTTGCCGTT

3�

5�

Arg

Gly

Tyr

Thr

Phe

Ala

Val

SerCarboxylterminus

Aminoterminus

Polypeptide

Template strand

FIGURE 24–2 Colinearity of the coding nucleotide sequences ofDNA and mRNA and the amino acid sequence of a polypeptide chain.The triplets of nucleotide units in DNA determine the amino acids ina protein through the intermediary mRNA. One of the DNA strandsserves as a template for synthesis of mRNA, which has nucleotidetriplets (codons) complementary to those of the DNA. In some bacte-rial and many eukaryotic genes, coding sequences are interrupted atintervals by regions of noncoding sequences (called introns).

8885d_c24_920-947 2/11/04 1:36 PM Page 924 mac76 mac76:385_reb:

sponds to 1,050 bp. Many genes in eukaryotes and a fewin prokaryotes are interrupted by noncoding DNA seg-ments and are therefore considerably longer than thissimple calculation would suggest.

How many genes are in a single chromosome? TheEscherichia coli chromosome, one of the prokaryoticgenomes that has been completely sequenced, is a cir-cular DNA molecule (in the sense of an endless looprather than a perfect circle) with 4,639,221 bp. Thesebase pairs encode about 4,300 genes for proteins andanother 115 genes for stable RNA molecules. Among eu-karyotes, the approximately 3.2 billion base pairs of thehuman genome include 30,000 to 35,000 genes on 24different chromosomes.

DNA Molecules Are Much Longer Than the CellularPackages That Contain Them

Chromosomal DNAs are often many orders of magni-tude longer than the cells or viruses in which they arefound (Fig. 24–1; Table 24–1). This is true of every classof organism or parasite.

Viruses Viruses are not free-living organisms; rather,they are infectious parasites that use the resources of ahost cell to carry out many of the processes they re-quire to propagate. Many viral particles consist of nomore than a genome (usually a single RNA or DNA mol-ecule) surrounded by a protein coat.

Almost all plant viruses and some bacterial and an-imal viruses have RNA genomes. These genomes tendto be particularly small. For example, the genomes ofmammalian retroviruses such as HIV are about 9,000 nu-cleotides long, and that of the bacteriophage Q� has4,220 nucleotides. Both types of viruses have single-stranded RNA genomes.

The genomes of DNA viruses vary greatly in size(Table 24–1). Many viral DNAs are circular for at leastpart of their life cycle. During viral replication within ahost cell, specific types of viral DNA called replicative

forms may appear; for example, many linear DNAs be-come circular and all single-stranded DNAs become

double-stranded. A typical medium-sized DNA virus isbacteriophage � (lambda), which infects E. coli. In itsreplicative form inside cells, � DNA is a circular doublehelix. This double-stranded DNA contains 48,502 bp andhas a contour length of 17.5 �m. Bacteriophage �X174is a much smaller DNA virus; the DNA in the viral par-ticle is a single-stranded circle, and the double-strandedreplicative form contains 5,386 bp. Although viralgenomes are small, the contour lengths of their DNAsare much greater than the long dimensions of the viralparticles that contain them. The DNA of bacteriophageT4, for example, is about 290 times longer than the vi-ral particle itself (Table 24–1).

Bacteria A single E. coli cell contains almost 100 timesas much DNA as a bacteriophage � particle. The chro-mosome of an E. coli cell is a single double-strandedcircular DNA molecule. Its 4,639,221 bp have a contourlength of about 1.7 mm, some 850 times the length ofthe E. coli cell (Fig. 24–3). In addition to the very large,circular DNA chromosome in their nucleoid, many bac-teria contain one or more small circular DNA moleculesthat are free in the cytosol. These extrachromosomalelements are called plasmids (Fig. 24–4; see alsop. 311). Most plasmids are only a few thousand basepairs long, but some contain more than 10,000 bp. Theycarry genetic information and undergo replication toyield daughter plasmids, which pass into the daughtercells at cell division. Plasmids have been found in yeastand other fungi as well as in bacteria.

In many cases plasmids confer no obvious advan-tage on their host, and their sole function appears to beself-propagation. However, some plasmids carry genesthat are useful to the host bacterium. For example,some plasmid genes make a host bacterium resistant to antibacterial agents. Plasmids carrying the gene forthe enzyme �-lactamase confer resistance to �-lactam antibiotics such as penicillin and amoxicillin (see Box20–1). These and similar plasmids may pass from an antibiotic-resistant cell to an antibiotic-sensitive cell of thesame or another bacterial species, making the recipientcell antibiotic resistant. The extensive use of antibiotics

24.1 Chromosomal Elements 925

TABLE 24–1 The Sizes of DNA and Viral Particles for Some Bacterial Viruses (Bacteriophages)

Size of viral Length of Long dimension of Virus DNA (bp) viral DNA (nm) viral particle (nm)

�X174 5,386 1,939 25T7 39,936 14,377 78� (lambda) 48,502 17,460 190T4 168,889 60,800 210

Note: Data on size of DNA are for the replicative form (double-stranded). The contour length is calculated assuming that each base pair occupies a length of 3.4 Å (see Fig. 8–15).

8885d_c24_925 2/12/04 11:21 AM Page 925 mac76 mac76:385_reb:

E. coli

E. coliDNA

mosomes (Fig. 24–5). Each chromosome of a eukary-otic cell, such as that shown in Figure 24–5a, containsa single, very large, duplex DNA molecule. The DNAmolecules in the 24 different types of human chromo-somes (22 matching pairs plus the X and Y sex chro-mosomes) vary in length over a 25-fold range. Each typeof chromosome in eukaryotes carries a characteristic setof genes. Interestingly, the number of genes does notvary nearly as much as does genome size (see Chapter9 for a discussion of the types of sequences, besidesgenes, that contribute to genome size).

The DNA of one human genome (22 chromosomesplus X and Y or two X chromosomes), placed end toend, would extend for about a meter. Most human cellsare diploid and each cell contains a total of 2 m of DNA.An adult human body contains approximately 1014 cellsand thus a total DNA length of 2 � 1011 km. Comparethis with the circumference of the earth (4 � 104 km)or the distance between the earth and the sun(1.5 � 108 km)—a dramatic illustration of the extraor-dinary degree of DNA compaction in our cells.

in some human populations has served as a strong selective force, encouraging the spread of antibiotic resistance–coding plasmids (as well as transposable el-ements, described below, that harbor similar genes) indisease-causing bacteria and creating bacterial strainsthat are resistant to several antibiotics. Physicians arebecoming increasingly reluctant to prescribe antibioticsunless a clear clinical need is confirmed. For similar rea-sons, the widespread use of antibiotics in animal feedsis being curbed.

Eukaryotes A yeast cell, one of the simplest eukary-otes, has 2.6 times more DNA in its genome than an E.

coli cell (Table 24–2). Cells of Drosophila, the fruit flyused in classical genetic studies, contain more than 35times as much DNA as E. coli cells, and human cellshave almost 700 times as much. The cells of many plantsand amphibians contain even more. The genetic materialof eukaryotic cells is apportioned into chromosomes, thediploid (2n) number depending on the species (Table24–2). A human somatic cell, for example, has 46 chro-

Chapter 24 Genes and Chromosomes926

FIGURE 24–3 The length of the E. coli chromosome (1.7 mm) depicted inlinear form relative to the length of a typical E. coli cell (2 �m).

FIGURE 24–4 DNA from a lysed E. coli cell. In this electron micrograph several small, circu-lar plasmid DNAs are indicated by white arrows. The black spots and white specks are artifactsof the preparation.

8885d_c24_920-947 2/11/04 1:36 PM Page 926 mac76 mac76:385_reb:

Eukaryotic cells also have organelles, mitochondria(Fig. 24–6) and chloroplasts, that contain DNA. Mito-chondrial DNA (mtDNA) molecules are much smallerthan the nuclear chromosomes. In animal cells, mtDNAcontains fewer than 20,000 bp (16,569 bp in humanmtDNA) and is a circular duplex. Each mitochondriontypically has two to ten copies of this mtDNA molecule,and the number can rise to hundreds in certain cellswhen an embryo is undergoing cell differentiation. In afew organisms (trypanosomes, for example) each mito-chondrion contains thousands of copies of mtDNA, or-ganized into a complex and interlinked matrix known asa kinetoplast. Plant cell mtDNA ranges in size from200,000 to 2,500,000 bp. Chloroplast DNA (cpDNA) alsoexists as circular duplexes and ranges in size from120,000 to 160,000 bp. The evolutionary origin of mito-chondrial and chloroplast DNAs has been the subject ofmuch speculation. A widely accepted view is that theyare vestiges of the chromosomes of ancient bacteria thatgained access to the cytoplasm of host cells and becamethe precursors of these organelles (see Fig. 1–36).

24.1 Chromosomal Elements 927

(a)

(b)

FIGURE 24–6 A dividing mitochondrion. Some mitochondrialproteins and RNAs are encoded by one of the copies of the mito-chondrial DNA (none of which are visible here). The DNA (mtDNA)is replicated each time the mitochondrion divides, before cell division.

FIGURE 24–5 Eukaryotic chromosomes. (a) A pair of linked and condensedsister chromatids from a human chromosome. Eukaryotic chromosomes arein this state after replication and at metaphase during mitosis. (b) A completeset of chromosomes from a leukocyte from one of the authors. There are 46chromosomes in every normal human somatic cell.

8885d_c24_920-947 2/11/04 1:36 PM Page 927 mac76 mac76:385_reb:

Mitochondrial DNA codes for the mitochondrial tRNAsand rRNAs and for a few mitochondrial proteins. Morethan 95% of mitochondrial proteins are encoded by nu-clear DNA. Mitochondria and chloroplasts divide whenthe cell divides. Their DNA is replicated before and dur-ing division, and the daughter DNA molecules pass intothe daughter organelles.

Eukaryotic Genes and Chromosomes Are Very Complex

Many bacterial species have only one chromosome percell and, in nearly all cases, each chromosome containsonly one copy of each gene. A very few genes, such asthose for rRNAs, are repeated several times. Genes andregulatory sequences account for almost all the DNA inprokaryotes. Moreover, almost every gene is preciselycolinear with the amino acid sequence (or RNA se-quence) for which it codes (Fig. 24–2).

The organization of genes in eukaryotic DNA isstructurally and functionally much more complex. Thestudy of eukaryotic chromosome structure, and morerecently the sequencing of entire eukaryotic genomes,has yielded many surprises. Many, if not most, eukary-otic genes have a distinctive and puzzling structural feature: their nucleotide sequences contain one or moreintervening segments of DNA that do not code for theamino acid sequence of the polypeptide product. Thesenontranslated inserts interrupt the otherwise colinearrelationship between the nucleotide sequence of thegene and the amino acid sequence of the polypeptide itencodes. Such nontranslated DNA segments in genesare called intervening sequences or introns, and thecoding segments are called exons. Few prokaryoticgenes contain introns.

In higher eukaryotes, the typical gene has muchmore intron sequence than sequences devoted to ex-ons. For example, in the gene coding for the singlepolypeptide chain of the avian egg protein ovalbumin(Fig. 24–7), the introns are much longer than the ex-ons; altogether, seven introns make up 85% of the gene’sDNA. In the gene for the � subunit of hemoglobin, a sin-gle intron contains more than half of the gene’s DNA.The gene for the muscle protein titin is the intron cham-pion, with 178 introns. Genes for histones appear to haveno introns. In most cases the function of introns is notclear. In total, only about 1.5% of human DNA is “cod-ing” or exon DNA, carrying information for protein orRNA products. However, when the much larger intronsare included in the count, as much as 30% of the hu-man genome consists of genes.

The relative paucity of genes in the human genomeleaves a lot of DNA unaccounted for. Figure 24–8provides a summary of sequence types. Much of the nongene DNA is in the form of repeated sequences of several kinds. Perhaps most surprising, about half thehuman genome is made up of moderately repeated se-quences that are derived from transposable elements—segments of DNA, ranging from a few hundred to sev-eral thousand base pairs long, that can move from onelocation to another in the genome. Transposable ele-ments (transposons) are a kind of molecular parasite,efficiently making a home within the host genome. Manyhave genes encoding proteins that catalyze the trans-position process, described in more detail in Chapters25 and 26. Some transposons in the human genome areactive, moving at a low frequency, but most are inactiverelics, evolutionarily altered by mutations. Althoughthese elements generally do not encode proteins orRNAs that are used in human cells, they have played a

Chapter 24 Genes and Chromosomes928

TABLE 24–2 DNA, Gene, and Chromosome Content in Some Genomes

Total DNA (bp) Number of Approximate chromosomes* number of genes

Bacterium (Escherichia coli) 4,639,221 1 4,405Yeast (Saccharomyces cerevisiae) 12,068,000 16† 6,200Nematode (Caenorhabditis elegans) 97,000,000 12‡ 19,000Plant (Arabidopsis thaliana) 125,000,000 10 25,500Fruit fly (Drosophila melanogaster) 180,000,000 18 13,600Plant (Oryza sativa; rice) 480,000,000 24 57,000Mouse (Mus musculus) 2,500,000,000 40 30,000–35,000Human (Homo sapiens) 3,200,000,000 46 30,000–35,000

Note: This information is constantly being refined. For the most current information, consult the websites for the individual genome projects.*The diploid chromosome number is given for all eukaryotes except yeast.†Haploid chromosome number. Wild yeast strains generally have eight (octoploid) or more sets of these chromosomes.‡Number for females, with two X chromosomes. Males have an X but no Y, thus 11 chromosomes in all.

8885d_c24_920-947 2/11/04 1:36 PM Page 928 mac76 mac76:385_reb:

major role in human evolution: movement of trans-posons can lead to the redistribution of other genomicsequences.

Another 3% or so of the human genome consists ofhighly repetitive sequences, also referred to assimple-sequence DNA or simple sequence repeats

(SSR). These short sequences, generally less than10 bp long, are sometimes repeated millions of times percell. The simple-sequence DNA has also been calledsatellite DNA, so named because its unusual base com-

position often causes it to migrate as “satellite” bands(separated from the rest of the DNA) when fragmentedcellular DNA samples are centrifuged in a cesium chlo-ride density gradient. Studies suggest that simple-sequence DNA does not encode proteins or RNAs. Un-like the transposable elements, the highly repetitiveDNA can have identifiable functional importance inhuman cellular metabolism, because much of it is asso-ciated with two defining features of eukaryotic chro-mosomes: centromeres and telomeres.

24.1 Chromosomal Elements 929

A B C D E F G

1 2 3 4 5 6 7

Ovalbumingene

A131 bp

B851 bp

190 bp

2222 bp

3126 bp

L

Hemoglobin� subunit

ExonIntron

FIGURE 24–7 Introns in two eukaryotic genes. The gene for ovalbu-min has seven introns (A to G), splitting the coding sequences intoeight exons (L, and 1 to 7). The gene for the � subunit of hemoglobin

has two introns and three exons, including one intron that alone con-tains more than half the base pairs of the gene.

Genes

30%M

isce

llan

eou

s

25%

Transposons45%

13%SINEs

8%Retroviruslike

3% SSR

5% SD

17% ?

28.5% Introns andnoncodingsegments

21%LINEs

1.5% Exons

FIGURE 24–8 Types of sequences in the human genome. This piechart divides the genome into transposons (transposable elements),genes, and miscellaneous sequences. There are four main classes oftransposons. Long interspersed elements (LINEs), 6 to 8 kbp long (1 kbp� 1,000 bp), typically include a few genes encoding proteins that cat-alyze transposition. The genome has about 850,000 LINEs. Short inter-spersed elements (SINEs) are about 100 to 300 bp long. Of the 1.5million in the human genome more than 1 million are Alu elements,so called because they generally include one copy of the recognitionsequence for AluI, a restriction endonuclease (see Fig. 9–3). Thegenome also contains 450,000 copies of retroviruslike transposons,1.5 to 11 kbp long. Although these are “trapped” in the genome andcannot move from one cell to another, they are evolutionarily relatedto the retroviruses (Chapter 26), which include HIV. A final class oftransposons (making up �1% and not shown here) consists of a vari-ety of transposon remnants that differ greatly in length.

About 30% of the genome consists of sequences included in genesfor proteins, but only a small fraction of this DNA is in exons (codingsequences). Miscellaneous sequences include simple-sequence re-peats (SSR) and large segmental duplications (SD), the latter being seg-ments that appear more than once in different locations. Among theunlisted sequence elements (denoted by a question mark) are genesencoding RNAs (which can be harder to identify than genes for pro-teins) and remnants of transposons that have been evolutionarily al-tered so that they are now hard to identify.

8885d_c24_920-947 2/11/04 1:36 PM Page 929 mac76 mac76:385_reb:

The centromere (Fig. 24–9) is a sequence of DNAthat functions during cell division as an attachmentpoint for proteins that link the chromosome to the mi-totic spindle. This attachment is essential for the equaland orderly distribution of chromosome sets to daugh-ter cells. The centromeres of Saccharomyces cere-

visiae have been isolated and studied. The sequencesessential to centromere function are about 130 bp longand are very rich in AUT pairs. The centromeric se-quences of higher eukaryotes are much longer and, un-like those of yeast, generally contain simple-sequenceDNA, which consists of thousands of tandem copies ofone or a few short sequences of 5 to 10 bp, in the sameorientation. The precise role of simple-sequence DNAin centromere function is not yet understood.

Telomeres (Greek telos, “end”) are sequences atthe ends of eukaryotic chromosomes that help stabilizethe chromosome. The best-characterized telomeres arethose of the simpler eukaryotes. Yeast telomeres endwith about 100 bp of imprecisely repeated sequences ofthe form

(5�)(TxGy)n

(3�)(AxCy)n

where x and y are generally between 1 and 4. The num-ber of telomere repeats, n, is in the range of 20 to 100for most single-celled eukaryotes and generally morethan 1,500 in mammals. The ends of a linear DNA mol-ecule cannot be routinely replicated by the cellular repli-cation machinery (which may be one reason why bac-terial DNA molecules are circular). Repeated telomericsequences are added to eukaryotic chromosome endsprimarily by the enzyme telomerase (see Fig. 26–35).

Artificial chromosomes (Chapter 9) have been con-structed as a means of better understanding the func-tional significance of many structural features of eukar-yotic chromosomes. A reasonably stable artificial linearchromosome requires only three components: a centro-mere, telomeres at each end, and sequences that allowthe initiation of DNA replication. Yeast artificial chromo-somes (YACs; see Fig. 9–8) have been developed as aresearch tool in biotechnology. Similarly, human artificialchromosomes (HACs) are being developed for the treat-ment of genetic diseases by somatic gene therapy.

SUMMARY 24.1 Chromosomal Elements

■ Genes are segments of a chromosome thatcontain the information for a functionalpolypeptide or RNA molecule. In addition togenes, chromosomes contain a variety ofregulatory sequences involved in replication,transcription, and other processes.

■ Genomic DNA and RNA molecules aregenerally orders of magnitude longer than theviral particles or cells that contain them.

■ Many genes in eukaryotic cells, and a few inbacteria, are interrupted by noncodingsequences called introns. The coding segmentsseparated by introns are called exons.

■ Less than one-third of human genomic DNAconsists of genes. Much of the remainderconsists of repeated sequences of varioustypes. Nucleic acid parasites known astransposons account for about half of thehuman genome.

■ Eukaryotic chromosomes have two importantspecial-function repetitive DNA sequences:centromeres, which are attachment points forthe mitotic spindle, and telomeres, located atthe ends of chromosomes.

24.2 DNA SupercoilingCellular DNA, as we have seen, is extremely compacted,implying a high degree of structural organization. Thefolding mechanism must not only pack the DNA but alsopermit access to the information in the DNA. Beforeconsidering how this is accomplished in processes suchas replication and transcription, we need to examine animportant property of DNA structure known as super-

coiling.

Supercoiling means the coiling of a coil. A telephonecord, for example, is typically a coiled wire. The pathtaken by the wire between the base of the phone andthe receiver often includes one or more supercoils (Fig.24–10). DNA is coiled in the form of a double helix, withboth strands of the DNA coiling around an axis. Thefurther coiling of that axis upon itself (Fig. 24–11) pro-duces DNA supercoiling. As detailed below, DNAsupercoiling is generally a manifestation of structuralstrain. When there is no net bending of the DNA axisupon itself, the DNA is said to be in a relaxed state.

We might have predicted that DNA compaction in-volved some form of supercoiling. Perhaps less pre-dictable is that replication and transcription of DNA alsoaffect and are affected by supercoiling. Both processes

Chapter 24 Genes and Chromosomes930

Unique sequences (genes), dispersed repeats,and multiple replication origins

TelomereCentromereTelomere

FIGURE 24–9 Important structural elements of a yeast chromosome.

8885d_c24_920-947 2/11/04 1:36 PM Page 930 mac76 mac76:385_reb:

Chatchawan Srisawat
Line
Chatchawan Srisawat
Line
Chatchawan Srisawat
Line

extended right-handed supercoils characteristic of theplectonemic form, solenoidal supercoiling involves tightleft-handed turns, similar to the shape taken up by agarden hose neatly wrapped on a reel. Although theirstructures are dramatically different, plectonemic andsolenoidal supercoiling are two forms of negative super-coiling that can be taken up by the same segment ofunderwound DNA. The two forms are readily intercon-vertible. Although the plectonemic form is more stablein solution, the solenoidal form can be stabilized byprotein binding and is the form found in chromatin. Itprovides a much greater degree of compaction (Fig.24–24b). Solenoidal supercoiling is the mechanism bywhich underwinding contributes to DNA compaction.

SUMMARY 24.2 DNA Supercoiling

■ Most cellular DNAs are supercoiled. Under-winding decreases the total number of helicalturns in the DNA relative to the relaxed, B form.To maintain an underwound state, DNA mustbe either a closed circle or bound to protein.Underwinding is quantified by a topologicalparameter called linking number, Lk.

■ Underwinding is measured in terms of specificlinking difference, � (also called superhelical

density), which is (Lk � Lk0)/Lk0. For cellularDNAs, � is typically �0.05 to �0.07, whichmeans that approximately 5% to 7% of thehelical turns in the DNA have been removed.DNA underwinding facilitates strand separationby enzymes of DNA metabolism.

■ DNAs that differ only in linking number arecalled topoisomers. Enzymes that underwindand/or relax DNA, the topoisomerases, catalyzechanges in linking number. The two classes oftopoisomerases, type I and type II, change Lk

in increments of 1 or 2, respectively, percatalytic event.

24.3 The Structure of ChromosomesThe term “chromosome” is used to refer to a nucleicacid molecule that is the repository of genetic informa-tion in a virus, a bacterium, a eukaryotic cell, or an or-ganelle. It also refers to the densely colored bodies seenin the nuclei of dye-stained eukaryotic cells, as visual-ized using a light microscope.

Chromatin Consists of DNA and Proteins

The eukaryotic cell cycle (see Fig. 12–41) produces re-markable changes in the structure of chromosomes (Fig.24–25). In nondividing eukaryotic cells (in G0) andthose in interphase (G1, S, and G2), the chromosomalmaterial, chromatin, is amorphous and appears to berandomly dispersed in certain parts of the nucleus. Inthe S phase of interphase the DNA in this amorphousstate replicates, each chromosome producing two sisterchromosomes (called sister chromatids) that remain as-sociated with each other after replication is complete.The chromosomes become much more condensed dur-ing prophase of mitosis, taking the form of a species-specific number of well-defined pairs of sister chro-matids (Fig. 24–5).

Chromatin consists of fibers containing protein andDNA in approximately equal masses, along with a smallamount of RNA. The DNA in the chromatin is verytightly associated with proteins called histones, whichpackage and order the DNA into structural units callednucleosomes (Fig. 24–26). Also found in chromatin aremany nonhistone proteins, some of which help maintainchromosome structure, others that regulate the ex-pression of specific genes (Chapter 28). Beginning withnucleosomes, eukaryotic chromosomal DNA is packagedinto a succession of higher-order structures that ulti-mately yield the compact chromosome seen with thelight microscope. We now turn to a description of thisstructure in eukaryotes and compare it with the pack-aging of DNA in bacterial cells.

Chapter 24 Genes and Chromosomes938

(b)(a)

Plectonemic

Solenoidal

FIGURE 24–24 Plectonemic and solenoidal supercoiling. (a) Plec-tonemic supercoiling takes the form of extended right-handed coils.Solenoidal negative supercoiling takes the form of tight left-handedturns about an imaginary tubelike structure. The two forms are read-ily interconverted, although the solenoidal form is generally not ob-served unless certain proteins are bound to the DNA. (b) Plectonemic(top) and solenoidal supercoiling of the same DNA molecule, drawnto scale. Solenoidal supercoiling provides a much greater degree ofcompaction.

8885d_c24_920-947 2/11/04 1:36 PM Page 938 mac76 mac76:385_reb:

Chatchawan Srisawat
Line
Chatchawan Srisawat
Line
Chatchawan Srisawat
Line
Chatchawan Srisawat
Line

Histones Are Small, Basic Proteins

Found in the chromatin of all eukaryotic cells, histoneshave molecular weights between 11,000 and 21,000 andare very rich in the basic amino acids arginine and ly-sine (together these make up about one-fourth of theamino acid residues). All eukaryotic cells have five ma-jor classes of histones, differing in molecular weight andamino acid composition (Table 24–3). The H3 histonesare nearly identical in amino acid sequence in alleukaryotes, as are the H4 histones, suggesting strictconservation of their functions. For example, only 2 of102 amino acid residues differ between the H4 histonemolecules of peas and cows, and only 8 differ betweenthe H4 histones of humans and yeast. Histones H1, H2A,and H2B show less sequence similarity among eukary-otic species.

Each type of histone has variant forms, because cer-tain amino acid side chains are enzymatically modifiedby methylation, ADP-ribosylation, phosphorylation, gly-cosylation, or acetylation. Such modifications affect thenet electric charge, shape, and other properties ofhistones, as well as the structural and functional prop-erties of the chromatin, and they play a role in the reg-ulation of transcription (Chapter 28).

24.3 The Structure of Chromosomes 939

Interphase

Mitosis

Metaphase

Anaphase Prophase

Spindlepole

G2G1

condensation

replicationand cohesion

Condensins

Cohesins

Replicationcompleted

Cohesin

DuplexDNA

SReplication occursfrom multipleorigins of replication;daughter chromatidsare linked by cohesins

alignment

separation

FIGURE 24–25 Changes in chromosome structure duringthe eukaryotic cell cycle. Cellular DNA is uncondensedthroughout interphase. The interphase period can besubdivided (see Fig. 12–41) into the G1 (gap) phase; the S(synthesis) phase, when the DNA is replicated; and the G2phase, in which the replicated chromosomes cohere to oneanother. The DNA undergoes condensation in the prophaseof mitosis. Cohesins (green) and condensins (red) areproteins involved in cohesion and condensation (discussedlater in the chapter). The architecture of the cohesin-condensin-DNA complex is not yet established, and theinteractions shown here are figurative, simply suggestingtheir role in condensation of the chromosome. Duringmetaphase, the condensed chromosomes line up along aplane halfway between the spindle poles. One chromosomeof each pair is linked to each spindle pole via microtubulesthat extend between the spindle and the centromere. Thesister chromatids separate at anaphase, each drawn towardthe spindle pole to which it is connected. After cell divisionis complete, the chromosomes decondense and the cyclebegins anew.

Histone core of nucleosome

Linker DNA of nucleosome

(a)

(b)50 nm

FIGURE 24–26 Nucleosomes. Regularly spaced nucleosomes consistof histone complexes bound to DNA. (a) Schematic illustration and(b) electron micrograph.

8885d_c24_920-947 2/11/04 1:36 PM Page 939 mac76 mac76:385_reb:

Nucleosomes Are the Fundamental OrganizationalUnits of Chromatin

The eukaryotic chromosome depicted in Figure 24–5represents the compaction of a DNA molecule about105 �m long into a cell nucleus that is typically 5 to10 �m in diameter. This compaction involves severallevels of highly organized folding. Subjection of chromo-somes to treatments that partially unfold them revealsa structure in which the DNA is bound tightly to beadsof protein, often regularly spaced (Fig. 24–26). Thebeads in this “beads-on-a-string” arrangement are com-plexes of histones and DNA. The bead plus the con-necting DNA that leads to the next bead form the nu-cleosome, the fundamental unit of organization uponwhich the higher-order packing of chromatin is built.The bead of each nucleosome contains eight histonemolecules: two copies each of H2A, H2B, H3, and H4.The spacing of the nucleosome beads provides a re-peating unit typically of about 200 bp, of which 146 bpare bound tightly around the eight-part histone core andthe remainder serve as linker DNA between nucleosomebeads. Histone H1 binds to the linker DNA. Brief treat-ment of chromatin with enzymes that digest DNA causespreferential degradation of the linker DNA, releasing his-tone particles containing 146 bp of bound DNA that havebeen protected from digestion. Researchers have crys-tallized nucleosome cores obtained in this way, and x-ray diffraction analysis reveals a particle made up ofthe eight histone molecules with the DNA wrappedaround it in the form of a left-handed solenoidal super-coil (Fig. 24–27).

A close inspection of this structure reveals why eu-karyotic DNA is underwound even though eukaryoticcells lack enzymes that underwind DNA. Recall that thesolenoidal wrapping of DNA in nucleosomes is but oneform of supercoiling that can be taken up by under-wound (negatively supercoiled) DNA. The tight wrap-ping of DNA around the histone core requires the re-moval of about one helical turn in the DNA. When theprotein core of a nucleosome binds in vitro to a relaxed,closed-circular DNA, the binding introduces a negativesupercoil. Because this binding process does not breakthe DNA or change the linking number, the formationof a negative solenoidal supercoil must be accompaniedby a compensatory positive supercoil in the unbound re-gion of the DNA (Fig. 24–28). As mentioned earlier, eu-karyotic topoisomerases can relax positive supercoils.Relaxing the unbound positive supercoil leaves the neg-ative supercoil fixed (through its binding to the nucle-osome histone core) and results in an overall decreasein linking number. Indeed, topoisomerases have provednecessary for assembling chromatin from purified his-tones and closed-circular DNA in vitro.

Another factor that affects the binding of DNA tohistones in nucleosome cores is the sequence of the

Chapter 24 Genes and Chromosomes940

H2BH4

H2A

H2A

H3

H4

H3

H2B(a)

(b)

(c)

FIGURE 24–27 DNA wrapped around a nucleosome core. (a) Space-filling representation of the nucleosome protein core, with differentcolors for the different histones (PDB ID 1AOI). (b) Top and (c) sideviews of the crystal structure of a nucleosome with 146 bp of boundDNA. The protein is depicted as a gray surface contour, with the boundDNA in blue. The DNA binds in a left-handed solenoidal supercoilthat circumnavigates the histone complex 1.8 times. A schematic draw-ing is included in (c) for comparison with other figures depicting nucleosomes.

8885d_c24_920-947 2/11/04 1:36 PM Page 940 mac76 mac76:385_reb:

bound DNA. Histone cores do not bind randomly toDNA; rather, they tend to position themselves at certainlocations. This positioning is not fully understood but insome cases appears to depend on a local abundance ofAUT base pairs in the DNA helix where it is in contactwith the histones (Fig. 24–29). The tight wrapping ofthe DNA around the nucleosome’s histone core requirescompression of the minor groove of the helix at thesepoints, and a cluster of two or three AUT base pairsmakes this compression more likely.

Other proteins are required for the positioning ofsome nucleosome cores on DNA. In several organisms,certain proteins bind to a specific DNA sequence andthen facilitate the formation of a nucleosome corenearby. Precise positioning of nucleosome cores canplay a role in the expression of some eukaryotic genes(Chapter 28).

24.3 The Structure of Chromosomes 941

TABLE 24–3 Types and Properties of Histones

Number of Content of basic amino

Molecular amino acid acids (% of total)

Histone weight residues Lys Arg

H1* 21,130 223 29.5 11.3H2A* 13,960 129 10.9 19.3H2B* 13,774 125 16.0 16.4H3 15,273 135 19.6 13.3H4 11,236 102 10.8 13.7

*The sizes of these histones vary somewhat from species to species. The numbers given here are for bovine histones.

FIGURE 24–28 Chromatin assembly. (a) Relaxed, closed-circularDNA. (b) Binding of a histone core to form a nucleosome induces onenegative supercoil. In the absence of any strand breaks, a positivesupercoil must form elsewhere in the DNA (�Lk � 0). (c) Relaxationof this positive supercoil by a topoisomerase leaves one net negativesupercoil (�Lk � �1).

DNA

Histonecore

(a)

(b)

(c)

One (net) negative supercoil

�Lk � 0

�Lk � �1

topoisomerase

Boundnegativesupercoil

(solenoidal)Unbound positivesupercoil (plectonemic)

DNA

Histone core

A T pairs abundant

FIGURE 24–29 Positioning of a nucleosome to make optimal use ofAUT base pairs where the histone core is in contact with the minorgroove of the DNA helix.

8885d_c24_920-947 2/11/04 1:36 PM Page 941 mac76 mac76:385_reb:

Nucleosomes Are Packed into Successively Higher Order Structures

Wrapping of DNA around a nucleosome core compactsthe DNA length about sevenfold. The overall compactionin a chromosome, however, is greater than 10,000-fold—ample evidence for even higher orders of structural or-ganization. In chromosomes isolated by very gentlemethods, nucleosome cores appear to be organized intoa structure called the 30 nm fiber (Fig. 24–30). Thispacking requires one molecule of histone H1 per nucle-osome core. Organization into 30 nm fibers does not ex-tend over the entire chromosome but is punctuated byregions bound by sequence-specific (nonhistone) DNA-binding proteins. The 30 nm structure also appears todepend on the transcriptional activity of the particularregion of DNA. Regions in which genes are being tran-scribed are apparently in a less-ordered state that con-tains little, if any, histone H1.

The 30 nm fiber, a second level of chromatin or-ganization, provides an approximately 100-fold com-paction of the DNA. The higher levels of folding are notyet understood, but it appears that certain regions ofDNA associate with a nuclear scaffold (Fig. 24–31). Thescaffold-associated regions are separated by loops ofDNA with perhaps 20 to 100 kbp. The DNA in a loopmay contain a set of related genes. For example, inDrosophila complete sets of histone-coding genes seemto cluster together in loops that are bounded by scaf-fold attachment sites (Fig. 24–32). The scaffold itselfappears to contain several proteins, notably large

amounts of histone H1 (located in the interior of thefiber) and topoisomerase II. The presence of topoiso-merase II further emphasizes the relationship betweenDNA underwinding and chromatin structure. Topoiso-merase II is so important to the maintenance of chro-matin structure that inhibitors of this enzyme can kill

Chapter 24 Genes and Chromosomes942

FIGURE 24–30 The 30 nm fiber, a higher-order organization of nu-cleosomes. (a) Schematic illustration of the probable structure of thefiber, showing nucleosome packing. (b) Electron micrograph.

FIGURE 24–31 A partially unraveled human chromosome, revealingnumerous loops of DNA attached to a scaffoldlike structure.

30nm

30 nm Fiber Histonegenes

Nuclearscaffold

H1

H3 H4

H2B

H2A

FIGURE 24–32 Loops of chromosomal DNA attached to a nuclearscaffold. The DNA in the loops is packaged as 30 nm fibers, so theloops are the next level of organization. Loops often contain groupsof genes with related functions. Complete sets of histone-coding genes,as shown in this schematic illustration, appear to be clustered in loopsof this kind. Unlike most genes, histone genes occur in multiple copiesin many eukaryotic genomes.

(a)

(b)

8885d_c24_920-947 2/11/04 1:36 PM Page 942 mac76 mac76:385_reb:

rapidly dividing cells. Several drugs used in cancerchemotherapy are topoisomerase II inhibitors that allowthe enzyme to promote strand breakage but not the re-sealing of the breaks.

Evidence exists for additional layers of organizationin eukaryotic chromosomes, each dramatically enhanc-ing the degree of compaction. One model for achievingthis compaction is illustrated in Figure 24–33. Higher-order chromatin structure probably varies from chro-mosome to chromosome, from one region to the next ina single chromosome, and from moment to moment inthe life of a cell. No single model can adequately de-scribe these structures. Nevertheless, the principle isclear: DNA compaction in eukaryotic chromosomes islikely to involve coils upon coils upon coils . . . Three-

Dimensional Packaging of Nuclear Chromosomes

Condensed Chromosome Structures Are Maintainedby SMC Proteins

A third major class of chromatin proteins, in addition tothe histones and topoisomerases, is the SMC proteins

(structural maintenance of chromosomes). The primarystructure of SMC proteins consists of five distinct do-mains (Fig. 24–34a). The amino- and carboxyl-terminalglobular domains, N and C, each of which has part ofan ATP hydrolytic site, are connected by two regions of�-helical coiled-coil motifs (see Fig. 4–11) that are joinedby a hinge domain. The proteins are generally dimeric,forming a V-shaped complex that is thought to be tiedtogether through their hinge domains (Fig. 24–34b). OneN and one C domain come together to form a completeATP hydrolytic site at each end of the V.

Proteins in the SMC family are found in all types oforganisms, from bacteria to humans. Eukaryotes havetwo major types, cohesins and condensins (Fig. 24–25).The cohesins play a substantial role in linking togethersister chromatids immediately after replication andkeeping them together as the chromosomes condenseto metaphase. This linkage is essential if chromosomesare to segregate properly at cell division. The detailedmechanism by which cohesins link sister chromosomes,and the role of ATP hydrolysis, are not yet understood.The condensins are essential to the condensation ofchromosomes as cells enter mitosis. In the laboratory,condensins bind to DNA in a manner that creates pos-itive supercoils; that is, condensin binding causes theDNA to become overwound, in contrast to the under-winding induced by the binding of nucleosomes. It is notyet clear how this helps to compact the chromatin, al-though one possibility is presented in Figure 24–35.

Bacterial DNA Is Also Highly Organized

We now turn briefly to the structure of bacterial chro-mosomes. Bacterial DNA is compacted in a structurecalled the nucleoid, which can occupy a significant

24.3 The Structure of Chromosomes 943

Nuclearscaffold

Twochromatids(10 coils each)

One coil(30 rosettes)

One rosette(6 loops)

One loop(~75,000 bp)

30 nm Fiber

“Beads-on-a-string”form ofchromatin

DNA

FIGURE 24–33 Compaction of DNA in a eukaryotic chromosome.Model for levels of organization that could provide DNA compactionin the chromosomes of eukaryotes. The levels take the form of coilsupon coils. In cells, the higher-order structures (above the 30 nm fibers)are unlikely to be as uniform as depicted here.

8885d_c24_920-947 2/11/04 1:36 PM Page 943 mac76 mac76:385_reb:

fraction of the cell volume (Fig. 24–36). The DNA ap-pears to be attached at one or more points to the inner surface of the plasma membrane. Much less isknown about the structure of the nucleoid than of eu-karyotic chromatin. In E. coli, a scaffoldlike structureappears to organize the circular chromosome into aseries of looped domains, as described above for chro-matin. Bacterial DNA does not seem to have any struc-ture comparable to the local organization provided bynucleosomes in eukaryotes. Histonelike proteins areabundant in E. coli—the best-characterized exampleis a two-subunit protein called HU (Mr 19,000)—butthese proteins bind and dissociate within minutes, andno regular, stable DNA-histone structure has beenfound. The bacterial chromosome is a relatively dy-

namic molecule, possibly reflecting a requirement formore ready access to its genetic information. The bac-terial cell division cycle can be as short as 15 min,whereas a typical eukaryotic cell may not divide forhours or even months. In addition, a much greaterfraction of prokaryotic DNA is used to encode RNAand/or protein products. Higher rates of cellular me-tabolism in bacteria mean that a much higher propor-tion of the DNA is being transcribed or replicated ata given time than in most eukaryotic cells.

Chapter 24 Genes and Chromosomes944

Condensin

+

Relaxed DNA

(–)

topoisomerase I

(–)

(+)(+) (+)(+)

FIGURE 24–35 Model for the effect of condensins on DNA super-coiling. Binding of condensins to a closed-circular DNA in the pres-ence of topoisomerase I leads to the production of positive supercoils(�). Wrapping of the DNA about the condensin introduces positivesupercoils because it wraps in the opposite sense to a solenoidal su-percoil (see Fig. 24–24). The compensating negative supercoils (�) thatappear elsewhere in the DNA are then relaxed by topoisomerase I. Inthe chromosome, it is the wrapping of the DNA about condensin thatmay contribute to DNA condensation.

2 m�

FIGURE 24–36 E. coli cells showing nucleoids. The DNA is stainedwith a dye that fluoresces when exposed to UV light. The light areadefines the nucleoid. Note that some cells have replicated their DNAbut have not yet undergone cell division and hence have multiplenucleoids.

ATP

ATP

(a)

N Hinge

Coiled coil Coiled coil

50 nm

C

(b)

(c)

FIGURE 24–34 Structure of SMC proteins. (a) The five domains ofthe SMC primary structure. N and C denoted the amino-terminal andcarboxyl-terminal domains, respectively. (b) Each polypeptide isfolded so that the two coiled-coil domains wrap around each otherand the N and C domains come together to form a complete ATP-binding site. Two of these domains are linked at the hinge region toform the dimeric V-shaped molecule. (c) Electron micrograph of SMCproteins from Bacillus subtilis.

8885d_c24_920-947 2/11/04 1:36 PM Page 944 mac76 mac76:385_reb:

With this overview of the complexity of DNA struc-ture, we are now ready to turn, in the next chapter, toa discussion of DNA metabolism.

SUMMARY 24.3 The Structure of Chromosomes

■ The fundamental unit of organization in thechromatin of eukaryotic cells is thenucleosome, which consists of histones and a200 bp segment of DNA. A core proteinparticle containing eight histones (two copieseach of histones H2A, H2B, H3, and H4) isencircled by a segment of DNA (about 146 bp)in the form of a left-handed solenoidalsupercoil.

■ Nucleosomes are organized into 30 nm fibers,and the fibers are extensively folded to providethe 10,000-fold compaction required to fit atypical eukaryotic chromosome into a cellnucleus. The higher-order folding involvesattachment to a nuclear scaffold that containshistone H1, topoisomerase II, and SMCproteins.

■ Bacterial chromosomes are also extensivelycompacted into the nucleoid, but thechromosome appears to be much moredynamic and irregular in structure thaneukaryotic chromatin, reflecting the shorter cellcycle and very active metabolism of a bacterialcell.

Chapter 24 Further Reading 945

Key Terms

gene 921genome 923chromosome 923phenotype 924mutation 924regulatory

sequence 924plasmid 925intron 928

exon 928simple-sequence

DNA 929satellite DNA 929centromere 930telomere 930supercoil 930relaxed DNA 930topology 931

underwinding 932linking number 933specific linking difference

(�) 933superhelical

density 933topoisomers 934topoisomerases 935plectonemic 937

solenoidal 937chromatin 938histones 938nucleosome 93830 nm fiber 942SMC proteins 943cohesins 943condensins 943nucleoid 943

Terms in bold are defined in the glossary.

Further Reading

GeneralBlattner, F.R., Plunkett, G., III, Bloch, C.A., Perna, N.T.,

Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode,

C.K., Mayhew, G.F., et al. (1997) The complete genomesequence of Escherichia coli K-12. Science 277, 1453–1474.

New secrets of this common laboratory organism are revealed.

Cozzarelli, N.R. & Wang, J.C. (eds) (1990) DNA Topology and

Its Biological Effects, Cold Spring Harbor Laboratory Press, ColdSpring Harbor, NY.

Kornberg, A. & Baker, T.A. (1991) DNA Replication, 2nd edn,W. H. Freeman & Company, New York.

A good place to start for further information on the structureand function of DNA.

Lodish, H., Berk, A., Matsudaira, P., Kaiser, C.A., Krieger,

M., Scott, M.P., Zipursky, S.L., & Darnell, J. (2003) Molecular

Cell Biology, 5th edn, W. H. Freeman & Company, New York.Another excellent general reference.

Genes and ChromosomesBromham, L. (2002) The human zoo: endogenous retroviruses inthe human genome. Trends Ecol. Evolut. 17, 91–97.

A thorough description of one of the transposon classes thatmakes up a large part of the human genome.

Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B.,

Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C.,

Johnston, M., et al. (1996) Life with 6000 genes. Science 274,

546, 563–567.Report of the first complete sequence of a eukaryotic genome,the yeast Saccharomyces cerevisiae.

Greider, C.W. & Blackburn, E.H. (1996) Telomeres, telomeraseand cancer. Sci. Am. 274 (February), 92–97.

Huxley, C. (1997) Mammalian artificial chromosomes and chromo-some transgenics. Trends Genet. 13, 345–347.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody,

M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M.,

FitzHugh, W., et al. (2001) Initial sequencing and analysis of thehuman genome. Nature 409, 860–921.

One of the first reports on the draft sequence of the humangenome, with lots of analysis and many associated articles.

Long, M., de Souza, S.J., & Gilbert, W. (1995) Evolution of theintron-exon structure of eukaryotic genes. Curr. Opin. Genet.

Dev. 5, 774–778.

McEachern, M.J., Krauskopf, A., & Blackburn, E.H. (2000)Telomeres and their control. Annu. Rev. Genet. 34, 331–358.

8885d_c24_945 2/12/04 11:22 AM Page 945 mac76 mac76:385_reb:

Chapter 24 Genes and Chromosomes946

Schmid, C.W. (1996) Alu: structure, origin, evolution, significanceand function of one-tenth of human DNA. Prog. Nucleic Acid Res.

Mol. Biol. 53, 283–319.

Tyler-Smith, C. & Floridia, G. (2000) Many paths to the top ofthe mountain: diverse evolutionary solutions to centromere struc-ture. Cell 102, 5–8.

Details of the diversity of centromere structures from differentorganisms, as currently understood.

Zakian, V.A. (1996) Structure, function, and replication ofSaccharomyces cerevisiae telomeres. Annu. Rev. Genet. 30,

141–172.

Supercoiling and TopoisomerasesBerger, J.M. (1998) Type II DNA topoisomerases. Curr. Opin.

Struct. Biol. 8, 26–32.

Boles, T.C., White, J.H., & Cozzarelli, N.R. (1990) Structure ofplectonemically supercoiled DNA. J. Mol. Biol. 213, 931–951.

A study that defines several fundamental features of supercoiled DNA.

Champoux, J.J. (2001) DNA topoisomerases: structure, function,and mechanism. Annu. Rev. Biochem. 70, 369–413.

An excellent summary of the topoisomerase classes.

Cozzarelli, N.R., Boles, T.C., & White, J.H. (1990) Primer onthe topology and geometry of DNA supercoiling. In DNA Topology

and Its Biological Effects (Cozzarelli, N.R. & Wang, J.C., eds),pp. 139–184, Cold Spring Harbor Laboratory Press, Cold SpringHarbor, NY.

A more advanced and thorough discussion.

Lebowitz, J. (1990) Through the looking glass: the discovery ofsupercoiled DNA. Trends Biochem. Sci. 15, 202–207.

A short and interesting historical note.

Wang, J.C. (2002) Cellular roles of DNA topoisomerases: a molecular perspective. Nat. Rev. Mol. Cell Biol. 3, 430–440.

Chromatin and NucleosomesFilipski, J., Leblanc, J., Youdale, T., Sikorska, M., & Walker,

P.R. (1990) Periodicity of DNA folding in higher order chromatinstructures. EMBO J. 9, 1319–1327.

Hirano, T. (2002) The ABCs of SMC proteins: two-armed ATPasesfor chromosome condensation, cohesion and repair. Genes Dev.

16, 399–414.Description of the rapid advances in understanding of thisinteresting class of proteins.

Kornberg, R.D. (1974) Chromatin structure: a repeating unit ofhistones and DNA. Science 184, 868–871.

A classic paper that introduced the subunit model for chromatin.

Nasmyth, K. (2002) Segregating sister genomes: the molecular biology of chromosome separation. Science 297, 559–565.

Wyman, C. & Kanaar, R. (2002) Chromosome organization:reaching out to embrace new models. Curr. Biol. 12, R446–R448.

A good, short summary of chromosome structure and the rolesof SMC proteins within it.

Zlatanova, J. & van Holde, K. (1996) The linker histones andchromatin structure: new twists. Prog. Nucleic Acid Res. Mol.

Biol. 52, 217–259.

1. Packaging of DNA in a Virus Bacteriophage T2 hasa DNA of molecular weight 120 � 106 contained in a headabout 210 nm long. Calculate the length of the DNA (assumethe molecular weight of a nucleotide pair is 650) and com-pare it with the length of the T2 head.

2. The DNA of Phage M13 The base composition ofphage M13 DNA is A, 23%; T, 36%; G, 21%; C, 20%. Whatdoes this tell you about the DNA of phage M13?

3. The Mycoplasma Genome The complete genome ofthe simplest bacterium known, Mycoplasma genitalium, isa circular DNA molecule with 580,070 bp. Calculate the mo-lecular weight and contour length (when relaxed) of this mol-ecule. What is Lk0 for the Mycoplasma chromosome? If� � �0.06, what is Lk?

4. Size of Eukaryotic Genes An enzyme isolated fromrat liver has 192 amino acid residues and is coded for by agene with 1,440 bp. Explain the relationship between thenumber of amino acid residues in the enzyme and the num-ber of nucleotide pairs in its gene.

5. Linking Number A closed-circular DNA molecule inits relaxed form has an Lk of 500. Approximately how manybase pairs are in this DNA? How is the linking number altered(increases, decreases, doesn’t change, becomes undefined)when (a) a protein complex is bound to form a nucleosome,

(b) one DNA strand is broken, (c) DNA gyrase and ATP areadded to the DNA solution, or (d) the double helix is dena-tured by heat?

6. Superhelical Density Bacteriophage � infects E. coli

by integrating its DNA into the bacterial chromosome. Thesuccess of this recombination depends on the topology of theE. coli DNA. When the superhelical density (�) of the E. coli

DNA is greater than �0.045, the probability of integration is�20%; when � is less than �0.06, the probability is �70%.Plasmid DNA isolated from an E. coli culture is found to havea length of 13,800 bp and an Lk of 1,222. Calculate � for thisDNA and predict the likelihood that bacteriophage � will beable to infect this culture.

7. Altering Linking Number (a) What is the Lk of a5,000 bp circular duplex DNA molecule with a nick in onestrand? (b) What is the Lk of the molecule in (a) when thenick is sealed (relaxed)? (c) How would the Lk of the mole-cule in (b) be affected by the action of a single molecule ofE. coli topoisomerase I? (d) What is the Lk of the moleculein (b) after eight enzymatic turnovers by a single molecule ofDNA gyrase in the presence of ATP? (e) What is the Lk of themolecule in (d) after four enzymatic turnovers by a single mol-ecule of bacterial type I topoisomerase? (f) What is the Lk ofthe molecule in (d) after binding of one nucleosome?

Problems

8885d_c24_920-947 2/11/04 1:36 PM Page 946 mac76 mac76:385_reb:

Chapter 24 Problems 947

8. Chromatin Early evidence that helped researchersdefine nucleosome structure is illustrated by the agarose gelbelow, in which the thick bands represent DNA. It was gen-erated by briefly treating chromatin with an enzyme thatdegrades DNA, then removing all protein and subjecting thepurified DNA to electrophoresis. Numbers at the side of thegel denote the position to which a linear DNA of the indicatedsize would migrate. What does this gel tell you about chro-matin structure? Why are the DNA bands thick and spreadout rather than sharply defined?

9. DNA Structure Explain how the underwinding of a B-DNA helix might facilitate or stabilize the formation of Z-DNA.

10. Maintaining DNA Structure (a) Describe two struc-tural features required for a DNA molecule to maintain a neg-atively supercoiled state. (b) List three structural changesthat become more favorable when a DNA molecule is nega-tively supercoiled. (c) What enzyme, with the aid of ATP, cangenerate negative superhelicity in DNA? (d) Describe thephysical mechanism by which this enzyme acts.

11. Yeast Artificial Chromosomes (YACs) YACs areused to clone large pieces of DNA in yeast cells. What threetypes of DNA sequences are required to ensure proper repli-cation and propagation of a YAC in a yeast cell?

200 bp

400 bp

600 bp

800 bp

1,000 bp

8885d_c24_920-947 2/11/04 1:36 PM Page 947 mac76 mac76:385_reb:

343

SINEs and LINEs are short and long interspersedretrotransposable elements, respectively, that invade newgenomic sites using RNA intermediates. SINEs and LINEs arefound in almost all eukaryotes (although not in Saccharomycescerevisiae) and together account for at least 34% of thehuman genome. The noncoding SINEs depend on reversetranscriptase and endonuclease functions encoded by partnerLINEs. With the completion of many genome sequences,including our own, the database of SINEs and LINEs hastaken a great leap forward. The new data pose new questionsthat can only be answered by detailed studies of themechanism of retroposition. Current work ranges from thebiochemistry of reverse transcription and integration in vitro,target site selection in vivo, nucleocytoplasmic transport of theRNA and ribonucleoprotein intermediates, and mechanisms ofgenomic turnover. Two particularly exciting new ideas are thatSINEs may help cells survive physiological stress, and that theevolution of SINEs and LINEs has been shaped by the forcesof RNA interference. Taken together, these studies promise toexplain the birth and death of SINEs and LINEs, and thecontribution of these repetitive sequence families to theevolution of genomes.

AddressesDepartment of Biochemistry, HSB J417, University of Washington, Box 357350, Seattle, WA 98195-7350, USA;e-mail: [email protected]

Current Opinion in Cell Biology 2002, 14:343–350

0955-0674/02/$ — see front matter© 2002 Elsevier Science Ltd. All rights reserved.

AbbreviationsAP apurinic/apyrimidinicEN endonucleaseLINE long interspersed repeated sequenceORF open reading framepA polyadenylation sitepol RNA polymerase RNAi RNA interferenceRT reverse transcriptaseSINE short interspersed repeated sequenceSRP signal recognition particle

IntroductionThe dawn of the genomic age has transformed every aspectof biology, and the study of retrotransposable elements isno exception. Until complete genomic sequences began toappear in 1997, whoever studied SINEs and LINEs (shortand long interspersed repeated sequences, as rhymefullynamed by Singer [1]) had to worry that any individualretrotransposon sequence might be misleadingly differentfrom other members of the same sequence class. Forexample, with a database in 1981 of only a few dozenAlu sequences (the most abundant SINE in the humangenome), no firm conclusions could be drawn about theremaining 1,090,000 genomic Alu elements [2•• ] whose

existence was known solely from DNA reassociationkinetics. Sampling 0.002% of the genome simply does notinspire confidence.

Now, with many genomic sequences nearing completion,we can examine essentially all members of a particularclass of SINEs or LINEs in a particular genome, askingquestions and drawing conclusions that should withstandthe test of time. This is the good news.

Genomic anatomy rarely reveals mechanism, which is bestinvestigated in the cold room or the tissue culture hood. Sofor now, the most that can be said for the genomesequences of yeast, Arabidopsis, flies, worms, mice andhumans is that they have sharpened our questions aboutSINEs and LINEs without really answering any of them.

Historically, the first breakthrough from genome structureto molecular mechanism came in 1993, when the R2 proteinof the insect retrotransposon R2Bm was shown to nick thetarget DNA to generate a primer for reverse transcriptionof the R2Bm RNA in situ [3]. This brought retropositionwithin reach of conventional divide-and-conquer bio-chemistry. The second breakthrough came in 1996, whenhigh-frequency retroposition of a human LINE elementwas achieved in cultured somatic cells [4]. This meant thatretroposition could be dissected using the powerfulmethods of surrogate somatic cell genetics, includingselection for rare events. Together, these two breakthroughsdispelled any residual fears that retroposition might occuronly in experimentally inaccessible germ line cells [5] ormight occur less frequently than the typical research grantmust be renewed.

SINEs and LINEs: a parts list and owner’s manualLINEs are autonomous retroelements; SINEs are theirdependants. As shown in Figure 1, LINEs contain anunusual internal promoter for RNA polymerase II (pol II),one or two open reading frames (ORFs) (where the secondORF is accessed through a frameshift), and a 3′-terminalpolyadenylation site (pA) lacking the usual downstreamefficiency element [6]. ORF1 encodes an essential proteinof unknown function [7•• ] while ORF2 encodes a bifunc-tional polypeptide with both reverse transcriptase (RT)and DNA endonuclease (EN) activity.

EN is upstream of RT in older LINEs (e.g. R2, CRE1and -2, SLACS, CZAR, Dong and R4), and downstreamof RT in younger LINEs (e.g. L1, Jockey and CR1) [8].The downstream EN module in older LINEs is asequence-specific restriction-like EN; the upstream ENmodule in younger LINEs is an apurinic/apyrimidinic(AP) endonuclease that usually, but not always, lacks

SINEs and LINEs: the art of biting the hand that feeds youAlan M Weiner

sequence-specificity [9]. The polyadenylated LINEtranscript is exported to the cytoplasm (presumably likeany other intron-less mRNA) [10,11] and translated.However, the nascent RT preferentially binds in cis tothe LINE RNA encoding it [7•• ,12,13]. The resultingribonucleoprotein (RNP) complex of a functional LINERNA with the bifunctional RT/EN polypeptide thenenters the nucleus, where it initiates a process called‘target-primed reverse transcription’. In this process, theEN component of the RT/EN polypeptide nicks the targetDNA to generate a 3′-hydroxyl group, which is used bythe RT component of the RT/EN polypeptide to primereverse transcription of the LINE RNA in situ on thechromosome [3]. Thus, the LINE cDNA never exists freeof the chromosome, as originally postulated [14].

Moreover, the R2 endonuclease is activated by RNA,presumably to prevent it from wreaking freelance chromo-somal damage [15]. Following reverse transcription, theDNA repair machinery present in somatic [7•• ], as well asgerm line, cells mends the broken DNA, creating a staggeredbreak and generating flanking target site duplications of7–20 nucleotides. Most LINEs are 5′ truncated, an observationcommonly attributed to incomplete reverse transcription.But other interpretations are possible. A newly retroposed,full-length LINE element is ‘fertile’ — that is, capableof further rounds of retroposition — because both theinternal pol II promoter at the 5′ end of the LINE, andthe fully internal polyadenylation signal at the 3′ end,guarantee that the new element will be transcriptionallycompetent to produce an essentially identical polyadeny-lated LINE mRNA.

SINEs are similar to LINEs, but shorter, simpler, andalmost certainly dependent on LINE RT/EN functions

for retroposition. SINEs have an internal promoter forRNA polymerase III (pol III) instead of pol II, a 3′-terminalA-rich tract instead of the pol II polyadenylation signal (oroccasionally dinucleotide or trinucleotide repeats), containno significant ORFs, but otherwise function much likeLINEs. A functional SINE must be free of any olig-othymidylate tracts (where n?4) which can function aspol III termination signals. As a result, SINE transcriptioncontinues through the A-rich region until it encounters arandom oligothymidylate tract downstream.

The A-rich tract of SINE elements is presumed to functionas the template for initiation of reverse transcription,because sequences downstream of the A-rich region arenot retroposed; however, this has not been proven. Anewly retroposed SINE element is ‘fertile’ because theinternal pol III promoter and the 3′-terminal A-rich tracttogether guarantee that the new element will be tran-scriptionally competent to produce new SINE RNAs thatare essentially identical to the original.

The best evidence that SINEs piggyback on LINERT/EN functions is the discovery that SINEs sometimesshare common 3′-terminal sequences with ‘partner’ LINEs[16]. This would facilitate ‘retropositional parasitism’ ofSINEs on LINEs by increasing the efficiency with whichthe LINE RT recognises the 3′ end of a cognate SINE[17,18•• ]. Not surprisingly, some SINEs are truncatedLINEs, such as those arising from the RTE class of retro-transposons [19], but truncated LINEs are not necessarilySINEs [20].

No discussion of SINEs and LINEs would be completewithout mentioning ‘retropseudogenes’ (also known as‘processed genes’) — complete or 5′-truncated copies of

344 Nuclear and gene expression

Figure 1

A parts list for SINEs and LINEs.+1 represent the transcription start site.AP-EN, apurinic/apyrimidinic endonuclease;pA, polyadenylation signal lacking downstreamefficiency element; pol II and pol III, RNApolymerase II and III promoters; R-EN,restriction-like endonuclease; RT, reversetranscriptase. Solid black arrows represent7–20 base pair target-site duplications.Dashed lines indicate deletion of intronsequences in retropseudogenes derived frommature mRNA. Only the most common formsof LINE gene organization are shown; variantsexist. Note that some SINEs have 3′-terminaldinucleotide or trinucleotide repeats insteadof A-rich tails, and that partner SINEs andLINEs share common 3′-terminal sequences.

pol III AAAAAASINEs

+1

ORF1pol II ORF2 (AP-EN/RT) pA AAAAAAyounger LINEs(L1, jockey, CR1)

pApol II

Current Opinion in Cell Biology

pA AAAAAA

protein-encodinggenes

+1

+1

ORF (RT/R-EN) pA AAAAAA

retropseudogenes(′processed genes′)

pol II older LINEs(R2, CRE1, SLACS)

+1

mature mRNAs flanked by short target-site duplications of7–20 nucleotides. As long suspected, and now demonstratedexperimentally [7•• ], retropseudogenes are generatedwhen the LINE RT/EN binds a mature (presumablycytoplasmic) mRNA instead of a LINE or SINE RNA.The resulting retroposed mRNA lacks introns, but hasgained a 3′-terminal poly(A) tail, added post-transcriptionallyduring maturation of the mRNA precursor. Unlike SINEsand LINEs, retropseudogenes are infertile (‘dead onarrival’) because they lack an internal promoter. Shared3′-terminal sequences help SINEs exploit LINEs byovercoming the natural cis preference of LINE RTs for theRNAs that encoded them [17,18•• ]. Although generationof retropseudogenes depends on random binding of theLINE RT to mRNAs, retroposition is apparently drivenforward by the overwhelming abundance of potentialmRNA templates, and perhaps also by the ability of3′-terminal poly(A) tracts to facilitate initiation of reversetranscription by template–primer slippage [21].

Crossing the great divide: nucleocytoplasmictransportThe study of SINEs and LINEs will not get far withoutsome serious cell biology, because all available evidenceindicates that the RNA intermediates for retroposition ofSINEs, LINEs and retropseudogenes are captured — ifnot actually reverse transcribed — in the cytoplasm. LINERT/EN grabs LINE mRNA primarily in cis in thecytoplasm [13], and carries the mRNA into the nucleus asa RNP, as is also the case for retroviruses [22]. The almostcomplete absence of retropseudogenes containing unexcisedintrons (rat preproinsulin 1 being one of the few exceptions[23,24]) provides further evidence that the LINE RT/ENfirst binds RNA intermediates in the cytoplasm. However,it remains to be seen whether SINEs such as primateAlu elements [25] and rodent ID elements [5], which arederived from cytoplasmic RNAs, must retropose throughcytoplasmic RNA intermediates.

If the RNA intermediates for retroposition are cytoplasmic,SINEs and LINEs may have always been under selectionfor cytoplasmic stability as well as efficient nuclear export(an active process driven by a GTP gradient and requiringspecific cargo-binding proteins [10,11,26,27]). Interestingly,retroposition of dimeric human Alu elements derived fromthe 7SL RNA component of the signal recognition particle(SRP) appears to be facilitated by binding of two of the sixSRP proteins (SRP9 and SRP14) to the right monomer,possibly to assure nuclear export and/or stabilize full-lengthAlu RNAs in the cytoplasm [25]. Thus, cytoplasmic RNAretroposition intermediates may evolve to bind proteinsthat guarantee nuclear export and/or cytoplasmic stability,while avoiding proteins that would interfere with reversetranscription or nuclear import. For example, the nuclearexport apparatus only recognises mature tRNA withcorrect 5′ and 3′ ends [28]. This could be one explanationfor why SINEs derived from tRNA often retain a recog-nisable tRNA fold [29,30]. Another explanation might be

that the tRNA fold, like the shared 3′ terminus with apartner LINE, facilitates binding of the LINE RT/EN[17]. Similarly, binding of poly(A) binding proteins [31] tothe poly(A) tail of LINEs, and possibly the 3′-terminalA-rich tract of SINEs [32], may facilitate nuclear export orcytoplasmic stability.

LINEs: a maelstrom of modules?A phylogeny of the individual parts (the LINE EN andRT functions) is not necessarily a phylogeny of the wholebecause retroelements, like viruses and organismal genomes,are a ‘maelstrom of modules’ [33]. The most comprehensivephylogeny to date is extremely revealing [34,35•• ]: allLINE-like elements (or ‘non-LTR retrotransposons’)share a common ancestral RT, most closely related to theRT of certain group II introns. The earliest LINE-likeelements possessed a sequence-specific restriction-likeendonuclease downstream of the RT module. Thisdownstream restriction endonuclease was later replacedby an upstream AP EN, and still later some of theseelements acquired an RNase H domain downstream of theRT. All of the restriction-like endonucleases are sequence-specific [8], but the AP EN can be nonspecific or, morerarely, sequence-specific, as in the Bombyx R1Bm element[36]. Perhaps when we learn the function of the secondORF that is present only in the younger elements, we willbegin to understand whether LINEs were assembled frompre-existing cellular parts, or cellular functions weredevised for pre-existing LINE or group II intron functions.

A SINE is bornMost but not all SINEs are derived from tRNA; exceptionsinclude rodent ID elements derived from neuronalBC1 RNA, also found in male germ cells [5], and primateAlu elements derived from the ubiquitous 7SL RNAcomponent of the SRP [37]. However, any SINE that tooclosely resembles the parent RNA could function as adominant-negative mutant, or be sequestered fromretroposition by interaction with proteins that normallybind the parent RNA. Thus, as mentioned above, the trickmay be to retain sufficient resemblance to the parentRNA to bind proteins that are important for transport orstability, but not those proteins that would interfere withretroposition. Certainly, preservation of the internal pol IIItranscription factor binding sites (the A and B boxes) is notsufficient to explain preservation of a recognisable tRNAfold. We also can’t explain why an abundant pol IIItranscript like 5S ribosomal RNA has apparently nevergiven rise to a SINE; why primate Alu elements are dimericwhile the vast majority of mammalian SINEs aremonomeric; why the V-SINE (vertebrate-specific SINE)superfamily maintains a highly conserved core sequencesandwiched between the tRNA-like 5′ end and theLINE-like 3′ end [18•• ]. We can’t even explain why thereare no LINEs or SINEs in the budding yeast S. cerevisiae,despite an abundance of retroviral-like elements and thepresence of L1-like retrotransposons in the pathogenic yeastsCandida albicans and Cryptococcus neoformans [38,39]

SINEs and LINEs: the art of biting the hand that feeds you Weiner 345

For SINEs that share 3′-terminal sequences with a partnerLINE and persist by ‘retropositional parasitism’ [17,18•• ],it is not difficult to imagine how the SINE got its tail: theLINE RT would generate a short cDNA (equivalent to aretroviral ‘strong stop’ cDNA) by copying the 3′-terminalLINE RNA sequence. This cDNA would then switchfrom the LINE template to the RNA parent of theSINE-to-be, thus attaching a RT landing pad to an RNAcarrying an internal pol III promoter. One can also imaginethat 3′-terminal identity with a partner LINE enablesSINEs to more efficiently pirate the LINE RT, a cis-actingenzyme designed to keep functional LINEs alive byignoring RNAs derived from the vast excess of moribundor nonfunctional elements [7•• ,13]. However, ‘retropositionalparasitism’ is not risk-free: the human SINE MIR, whichshares 50 bases of 3′-terminal sequence with the partnerLINE2 element, was doomed when LINE2 (L2)became extinct [2•• ].

Other SINEs, such as primate Alu sequences, lack partnerLINEs, perhaps because the L1 RT binds tightly to theA-rich tail and is less dependent on auxiliary or adjacentRNA sequences. Only careful studies of LINE RTactivities will confirm, modify or enable us to rejectthese stories.

Knowing when to stop: dodging host genesand seeking safe havensAll retrotransposable elements are insertional mutagensthat can cause disease or disability by inserting near orwithin essential genes [40,41]. As a result, retroelementsface an existential dilemma [42]. Unlike infectious viralelements that can afford to kill one host in the process ofinfecting others, noninfectious retrotransposable elementsare captive but rebellious passengers within the hostgenome [42]. (We ignore the possible horizontal transfer ofSINEs and LINEs as stowaways in retroviral particles[43–45].) If the retroelement multiplies too recklessly,it will kill the host; but if it does not multiply fast enoughto offset natural loss and decay, it cannot survive. Thisbalancing act pits the ingenuity of the retroelement againstthe ingenuity of the host; the goal is to overrun, but notovercome the host. The abundance of dead retroelementslittering the human genomic landscape provides mutetestimony regarding this endless battle [2•• ]; however, wedo not understand why one family of LINEs (say, L1)succeeds another (say, L2), or what enables one family ofSINEs or LINEs to replicate explosively, while others ekeout a minimal existence without vanishing into oblivion.

New data from the (nearly) complete human genomesequence underscore the magnitude of the problem:1,500,000 SINEs (70% of them Alu elements) accountfor 13% of our genome, and 850,000 LINEs for another21% of the genome, giving a grand total of 34% trans-posable elements [2•• ]. Moreover, the distribution oftransposable elements is inexplicably uneven, bestexemplified by a 525 kb segment of chromosome Xp11,

where the density of retroposable elements is 89%, and thefour human homeobox gene clusters (HoxA, HoxB, HoxC,and HoxD), where the density of interspersed repeats isless than 2%. It is a wonder that our genes have survivedthe bombardment.

The choice of integration site is a difficult one for anytransposable element: On the one hand, it is advantageousto avoid integration in highly active genomic regions wheremajor damage might be done [46•]. On the other hand, it isdisadvantageous to be sequestered in inactive regionswhere transcriptional activity is low. One solution is totarget tandemly repeated genes (rRNA, trypanosome andnematode trans-spliced leaders [47,48]) or dispersedmultigene families (tRNA). For example, the insect R1and R2 LINE elements carry a sequence-specificendonuclease that cuts within the rDNA repeat unit [36].With excess rRNA coding capacity provided by severalhundred tandem repeats of the rDNA, the host can endurea significant burden of insertions. Yet the R1 element isscarce in Bombyx rDNA where concerted evolution or otherforces select against integrants [49•], while the very sameR1 element is superabundant in Drosophila rDNA where itmay interrupt 50–70% of the tandem rDNA repeats [50].

Another strategy — not yet shown for a SINE or LINE — isto land near genes, but not in them. In yeast, the Ty3endogenous retrovirus targets the 5′ flanking region oftRNA genes by a protein–protein interaction between the Ty3integration machinery and components of the transcriptionfactor TFIIIB [51,52]. tRNA gene expression is apparentlyunaffected, but the Ty3 element is thereby guaranteed ahome in a constitutively open chromatin region.

Mysteries of the genome: inverse distributionof SINEs and LINEsCuriously, human SINEs (Alu elements) are concentratedin gene-rich GC-rich regions of the genome, and LINEs ingene-poor AT-rich regions [2•• ,53•• ], supporting previousevidence for the existence of isochores (extended regionsof compositionally or functionally similar DNA) [54,55•].This inverse distribution is unlikely to reflect preferentialintegration of SINEs in GC-rich regions, because SINEsare widely believed to pirate the LINE RT/EN protein,and also unlikely to reflect preferential loss from AT-richregions, as these appear to tolerate high levels of ‘junk’DNA. Thus, SINEs may be preferentially retained inGC-rich regions (perhaps positively selected to augmentthe stress response, as described below [56,57,58•• ]). OrLINEs might be preferentially lost from GC-rich regions(perhaps because the 20-fold larger LINEs are moredangerous insertional mutagens [46•]).

Alternatively, an excess of L1 elements on the human Ychromosome, and to a lesser extent on the human Xchromosome, suggests that LINEs if not SINEs may bepurged by recombination between homologues containingoccupied and empty target sites [46•]. Although these gene

346 Nuclear and gene expression

Chatchawan
Highlight
Chatchawan
Highlight
Chatchawan
Highlight
Chatchawan
Highlight
Chatchawan
Highlight
Chatchawan
Highlight

conversion events would have to be directional to purgenewly integrated retrotransposable elements instead offixing them, recombinational differences between GC-richand AT-rich regions of the genome might then explain theinverse distribution of SINEs and LINEs.

Another possible explanation for the inverse genomicdistribution of SINEs and LINEs would be differentialcell cycle regulation at one (or more) of the many stepsin the retroposition process. Among the steps that couldbe cell-cycle regulated are transcription of the RNAsthemselves [59], nuclear export and cytoplasmic stabilizationof the RNAs, nuclear import of the RNAs or RNP intermediates, differential chromatin condensation (preferential decondensation of gene-rich GC-rich regionsmay facilitate integration), different DNA replicationschedules (gene-rich GC-rich DNA is usually replicatedearlier than gene-poor AT-rich regions), and differential DNArepair (preferential decondensation of gene-rich GC-richregions may render the underlying DNA more susceptibleto damage, with integration as a byproduct or consequence).Although there is no direct evidence that chromatinstructure can affect the integration of SINEs and LINEs,there are retroviral precedents: nucleosomes generallyblock, but occasionally enhance, mammalian retroviralintegration [60] and protein–protein interactions areresponsible for targeting Ty3 to tRNA promoters [51,52]and Ty5 to silent chromatin [61].

The inverse distribution of SINEs and LINEs could alsobe influenced by differential DNA methylation withingene-poor AT-rich and gene-rich GC-rich regions, butthere are conflicting reports of the effect of CpGmethylation on SINE and LINE transcription. NaturalCpG methylation in HeLa cells inhibits Alu transcription,and inhibition is relieved by treating the cells with thedemethylating drug 5-azacytidine [32]. In contrast, artificialCpG methylation inhibited L1 but not Alu transcriptionin both transient and stable expression assays when themethylated CpG binding protein MeCP2 was over-expressed and/or targeted to the SINE or LINE reporterconstructs by a Gal4 DNA-binding domain [62•]. MeCP2is a transcriptional repressor that tethers the Sin3A–HDAC1and Sin3A–HDAC2 histone deacetylase complexes toCpG-methylated DNA.

The taming of the shrewd: SINEs, LINEs andRNA interferenceRNA interference (RNAi) is a form of post-transcriptionalgene silencing triggered by double-stranded RNA. It isfound in many organisms, including flies [63,64], worms[65] and mammals [66]. In flies [67•• ], worms [68,69] andvertebrates, including humans [67•• ], RNAi appears to bea normal regulatory mechanism in which single-stranded‘micro RNAs’, around 22 nt in size and derived fromsomewhat larger developmentally controlled RNA pre-cursors, anneal with target mRNAs and trigger theirdegradation. RNAi also plays a protective or defensive role,

and has been implicated in silencing transposons [70–72]and viruses [73–75] in fungi and plants.

SINEs and LINEs appear to be subject to RNAi [76,77•• ].Although SINEs and LINEs can be independentlytranscribed from internal promoters, co-transcription ofSINEs and LINEs from external promoters will generateantisense as well as sense transcripts. These sense andantisense transcripts of SINEs and LINEs could inprinciple anneal to form double-stranded RNAs capableof triggering the degradation of any other transcriptscontaining SINE and LINE sequences, whether inde-pendently transcribed from internal SINE and LINEpromoters or co-transcribed as part of larger RNAs. Toexplain how mRNAs and mRNA precursors containing allor part of a SINE or LINE sequence can survive attack byRNAi, one might speculate that divergence between anypair of SINE or LINE sequences is usually too great totrigger RNAi. In any event, RNAi may have played a rolein the evolution of SINEs and LINEs, as it is part of thecellular environment in which SINEs and LINEs arise andpropagate. We do not know whether RNAi evolved first asa defensive or developmental regulatory mechanism, but anextreme view would be that SINEs and LINEs could notflourish until RNAi evolved to protect the cell from them.

Accidental travellersSINEs and LINEs, no less than retropseudogenes, canprovide the ‘seeds of evolution’ [78]. Perhaps the singlemost conspicuous demonstration of the power of retroposi-tion is the functional, tissue-specific human pgk-2 locus, anautosomal phosphoglycerate kinase retrogene expressedduring spermatogenesis and lacking the ten introns of theubiquitously expressed X-linked pgk-1 gene. How PGK-2acquired both a promoter and tissue specificity is aninteresting question. Was it a lucky insertion into a tissue-specific promoter or chromosomal region, or was thespecificity an accident that was then perfected by sub-sequent selection? Almost equally remarkable is the ratpreproinsulin I gene, a functional retroposon which, havinglost one of the two ancestral preproinsulin II introns, maybe the sole instance in which a partially spliced mRNAappears to have served as the RNA intermediate [23].

Instances of useful genomic mayhem created by retroposition of SINEs and LINEs are still relatively sparse(see http://exppc01.uni-muenster.de/expath/alltables.htm),but more will surely emerge from detailed analysis ofthe human genome sequence. For example, L1s canco-transduce 3′ flanking sequences [79,80], and integrationof Alu elements in reverse orientation can introducefortuitous 3′ splice sites that are capable of diversifyingthe spectrum of alternatively spliced mRNAs (R Sorek andG Ast, personal communication). There is also one reportof a rodent B2 SINE that carries a pol II promoter [81•].Remarkably, this portable pol II promoter does not interferewith the internal pol III promoter function required forB2 retroposition, and in one case the mobile pol II promoter

SINEs and LINEs: the art of biting the hand that feeds you Weiner 347

Chatchawan
Highlight
Chatchawan
Highlight
Chatchawan
Highlight
Chatchawan
Highlight

drives transcription of a typical gene encoding the lamininvariant, Lama3.

Are SINEs useful parasites?From the moment of their discovery, SINEs and LINEswere treated as genomic parasites [82], an internal infectionthat could be kept in check but rarely cured. However,evolution, like science, is rife with reversals of fortune.The first group to sequence human Alu elements has nowproposed that SINEs, once considered a source of genomicstress, may in fact help ease cells through physiologicalstresses that induce Alu transcription such as heat shockand translational inhibition [56,57,58•• ]. The inducedSINE transcripts would then bind to PKR kinase, blockingthe ability of this kinase to inhibit translation by phos-phorylating the initiation factor eIF2α.

Although initially greeted with scepticism, the view thatSINEs may lend us a helping hand has received significantsupport from an unexpected direction: the InternationalHuman Genome Sequencing Consortium argued, inpresenting the first draft of the human genome sequence[2•• ], that over-representation of Alu elements in gene-richGC-rich DNA is most easily explained if ‘SINEs actuallyearn their keep in the genome’ by positive selection, asproposed by Schmid [56,57,58•• ]. Although the jury is stillout, at least we can now look forward to a fair trial.

UpdateA recent publication [83] suggests that methylation may beresponsible for the mysteriously uneven genomic distributionof SINES, perhaps by affecting the insertion and/or deletionof the elements.

References and recommended readingPapers of particular interest, published within the annual period of review,have been highlighted as:

• of special interest••of outstanding interest

1. Singer MF: SINEs and LINEs: highly repeated short and longinterspersed sequences in mammalian genomes. Cell 1982,28:433-434.

2. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, •• Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al.: Initial

sequencing and analysis of the human genome. Nature 2001,409:860-921.

Although the first scholarly presentation of the human genome sequencemight have been expected to focus primarily on genes, the public consortiumdid a marvellous job of discussing the general appearance of the genomiclandscape, and the role that SINEs and LINEs may have played in shapingthat rugged geography. The uneven distribution of SINEs in the humangenome suggests that retroposable elements are subject to positive as wellas negative selection, reinforcing provocative recent evidence that SINEsmay be good for us after all (see Li et al. [2001] [58•• ] below).

3. Luan DD, Korman MH, Jakubczak JL, Eickbush TH: Reversetranscription of R2Bm RNA is primed by a nick at thechromosomal target site: a mechanism for non-LTRretrotransposition. Cell 1993, 72:595-605.

4. Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD,Kazazian HH Jr: High frequency retrotransposition in culturedmammalian cells. Cell 1996, 87:917-927.

5. Muslimov IA, Lin Y, Heller M, Brosius J, Zakeri Z, Tiedge H: A smallRNA in testis and brain: implications for male germ celldevelopment. J Cell Sci 2002, 115:1243-1250.

6. Hans H, Alwine JC: Functionally significant secondary structure ofthe simian virus 40 late polyadenylation signal. Mol Cell Biol2000, 20:2926-2932.

7. Esnault C, Maestre J, Heidmann T: Human LINE retrotransposons •• generate processed pseudogenes. Nat Genet 2000, 24:363-367.Since the discovery of SINEs, LINEs and processed pseudogenes in theearly 1980s, it was widely assumed that LINE reverse transcriptase wouldbe responsible for propagating noncoding SINEs and generating processedretropseudogenes derived from mRNAs. This work provides the long-awaitedevidence for this long-held hypothesis, and develops the experimental toolsfor understanding the functions of LINE reverse transcriptase and integraseat the molecular level.

8. Yang J, Malik HS, Eickbush TH: Identification of the endonucleasedomain encoded by R2 and other site- specific, non-long terminalrepeat retrotransposable elements. Proc Natl Acad Sci USA 1999,96:7847-7852.

9. Feng Q, Moran JV, Kazazian HH Jr, Boeke JD: Human L1retrotransposon encodes a conserved endonuclease required forretrotransposition. Cell 1996, 87:905-916.

10. Pasquinelli AE, Ernst RK, Lund E, Grimm C, Zapp ML, Rekosh D,Hammarskjold ML, Dahlberg JE: The constitutive transport element(CTE) of Mason–Pfizer monkey virus (MPMV) accesses a cellularmRNA export pathway. EMBO J 1997, 16:7500-7510.

11. Paca RE, Ogert RA, Hibbert CS, Izaurralde E, Beemon KL: Roussarcoma virus DR posttranscriptional elements use a novel RNAexport pathway. J Virol 2000, 74:9507-9514.

12. Kimberland ML, Divoky V, Prchal J, Schwahn U, Berger W,Kazazian HH Jr: Full-length human L1 insertions retain the capacityfor high frequency retrotransposition in cultured cells. Hum MolGenet 1999, 8:1557-1560.

13. Wei W, Gilbert N, Ooi SL, Lawler JF, Ostertag EM, Kazazian HH,Boeke JD, Moran JV: Human L1 retrotransposition: cis preference versus trans complementation. Mol Cell Biol 2001,21:1429-1439.

14. Van Arsdell SW, Denison RA, Bernstein LB, Weiner AM, Manser T,Gesteland RF: Direct repeats flank three small nuclear RNApseudogenes in the human genome. Cell 1981, 26:11-17.

15. Yang J, Eickbush TH: RNA-induced changes in the activity of theendonuclease encoded by the R2 retrotransposable element.Mol Cell Biol 1998, 18:3455-3465.

16. Okada N, Hamada M, Ogiwara I, Ohshima K: SINEs and LINEs share common 3′′ sequences: a review. Gene 1997,205:229-243.

17. Ogiwara I, Miya M, Ohshima K, Okada N: Retropositional parasitismof SINEs on LINEs: identification of SINEs and LINEs inelasmobranchs. Mol Biol Evol 1999, 16:1238-1250.

18. Ogiwara I, Miya M, Ohshima K, Okada N: V-SINEs: a new •• superfamily of vertebrate SINEs that are widespread in vertebrate

genomes and retain a strongly conserved segment within eachrepetitive unit. Genome Res 2002, 12:316-324.

The Okada group was the first to show that SINEs sometimes impersonateLINEs by imitating (or stealing) their 3′-terminal sequences. The shared3′-terminal sequences may facilitate retroposition by enabling SINE RNAto bind LINE reverse transcriptase more tightly. V-SINEs are a stunningexample of this kind of ‘retropositional parasitism’ in which one mobileelement exploits another.

19. Malik HS, Eickbush TH: The RTE class of non-LTR retrotransposonsis widely distributed in animals and is the origin of many SINEs.Mol Biol Evol 1998, 15:1123-1134.

20. Nikaido M, Okada N: CetSINEs and AREs are not SINEs but areparts of cetartiodactyl L1. Mamm Genome 2000, 11:1123-1126.

21. Bebenek K, Abbotts J, Roberts JD, Wilson SH, Kunkel TA: Specificityand mechanism of error-prone replication by humanimmunodeficiency virus-1 reverse transcriptase. J Biol Chem1989, 264:16948-16956.

22. Kenna MA, Brachmann CB, Devine SE, Boeke JD: Invading theyeast nucleus: a nuclear localization signal at the C terminus ofTy1 integrase is required for transposition in vivo. Mol Cell Biol1998, 18:1115-1124.

23. Soares MB, Schon E, Henderson A, Karathanasis SK, Cate R, Zeitlin S,Chirgwin J, Efstratiadis A: RNA-mediated gene duplication: the ratpreproinsulin I gene is a functional retroposon. Mol Cell Biol1985, 5:2090-2103.

348 Nuclear and gene expression

Chatchawan
Highlight
Chatchawan
Highlight

24. Weiner AM, Deininger PL, Efstratiadis A: Nonviral retroposons:genes, pseudogenes, and transposable elements generated bythe reverse flow of genetic information. Annu Rev Biochem 1986,55:631-661.

25. Sarrowa J, Chang DY, Maraia RJ: The decline in human Aluretroposition was accompanied by an asymmetric decrease in SRP9/14 binding to dimeric Alu RNA and increased expression of small cytoplasmic Alu RNA. Mol Cell Biol 1997,17:1144-1151.

26. Kuersten S, Ohno M, Mattaj IW: Nucleocytoplasmic transport: Ran,beta and beyond. Trends Cell Biol 2001, 11:497-503.

27. Dahlberg JE, Lund E: Functions of the GTPase Ran in RNA exportfrom the nucleus. Curr Opin Cell Biol 1998, 10:400-408.

28. Lund E, Dahlberg JE: Proofreading and amino acylation of tRNAs before export from the nucleus. Science 1998,282:2082-2085.

29. Daniels GR, Deininger PL: Repeat sequence families derived frommammalian tRNA genes. Nature 1985, 317:819-822.

30. Sakamoto K, Okada N: Rodent type 2 Alu family, rat identifiersequence, rabbit C family, and bovine or goat 73-bp repeat mayhave evolved from tRNA genes. J Mol Evol 1985, 22:134-140.

31. Voeltz GK, Ongkasuwan J, Standart N, Steitz JA: A novel embryonicpoly(A) binding protein, ePAB, regulates mRNA deadenylation inXenopus egg extracts. Genes Dev 2001, 15:774-788.

32. Liu WM, Maraia RJ, Rubin CM, Schmid CW: Alu transcripts:cytoplasmic localisation and regulation by DNA methylation.Nucleic Acids Res 1994, 22:1087-1095.

33. Gibbs A, Calisher CH, Garcia-Arenal F (eds): Molecular Basis ofVirus Evolution. Proceedings of a Fundacion Juan March Symposium.Cambridge: Cambridge University Press, 1995:603.

34. Malik HS, Burke WD, Eickbush TH: The age and evolution ofnon-LTR retrotransposable elements. Mol Biol Evol 1999,16:793-805.

35. Malik HS, Eickbush TH: Phylogenetic analysis of ribonuclease H •• domains suggests a late, chimeric origin of LTR retrotransposable

elements and retroviruses. Genome Res 2001, 11:1187-1197.The Eickbush group has led the way in using phylogeny to attack a fun-damental problem in the evolution of mobile elements: who came first, andwho stole what from whom? Were new elements assembled from parts ofthe old, or from pre-existing cellular parts; or did mobile elements both stealcellular functions and bequeath new functions to the cell? This paper arguesthat all LINE-like elements share a common ancestral reverse transcriptase(RT) most closely related to the RT of certain mobile group II introns that alsomove through RNA intermediates.

36. Feng Q, Schumann G, Boeke JD: Retrotransposon R1Bmendonuclease cleaves the target sequence. Proc Natl Acad SciUSA 1998, 95:2083-2088.

37. Ullu E, Tschudi C: Alu sequences are processed 7SL RNA genes.Nature 1984, 312:171-172.

38. Goodwin TJ, Poulter RT: The diversity of retrotransposons in theyeast Cryptococcus neoformans. Yeast 2001, 18:865-880.

39. Goodwin TJ, Ormandy JE, Poulter RT: L1-like non-LTRretrotransposons in the yeast Candida albicans. Curr Genet 2001,39:83-91.

40. Kazazian HH Jr: An estimated frequency of endogenous insertionalmutations in humans. Nat Genet 1999, 22:130.

41. Deininger PL, Batzer MA: Alu repeats and human disease. MolGenet Metab 1999, 67:183-193.

42. Craigie R: Hotspots and warm spots: integration specificity ofretroelements. Trends Genet 1992, 8:187-190.

43. Peters G, Harada F, Dahlberg JE, Panet A, Haseltine WA, Baltimore D:Low-molecular-weight RNAs of Moloney murine leukemia virus:identification of the primer for RNA-directed DNA synthesis. J Virol1977, 21:1031-1041.

44. Ikawa Y, Ross J, Leder P: An association between globinmessenger RNA and 60S RNA derived from Friend leukemiavirus. Proc Natl Acad Sci USA 1974, 71:1154-1158.

45. Linial M, Medeiros E, Hayward WS: An avian oncovirus mutant(SE 21Q1b) deficient in genomic RNA: biological and biochemicalcharacterization. Cell 1978, 15:1371-1381.

46. Boissinot S, Entezam A, Furano AV: Selection against deleterious • LINE-1-containing loci in the human lineage. Mol Biol Evol 2001,

18:926-935.This is the clearest evidence to date showing that LINEs can be subject tonegative selection, and are kicked out of genomic regions where they havethe potential to do harm.

47. Aksoy S, Williams S, Chang S, Richards FF: SLACS retrotransposonfrom Trypanosoma brucei gambiense is similar to mammalianLINEs. Nucleic Acids Res 1990, 18:785-792.

48. Malik HS, Eickbush TH: NeSL-1, an ancient lineage of site-specificnon-LTR retrotransposons from Caenorhabditis elegans. Genetics2000, 154:193-203.

49. Perez-Gonzalez CE, Eickbush TH: Dynamics of R1 and R2 elements • in the rDNA locus of Drosophila simulans. Genetics 2001,

158:1557-1567.The mobile retroelements R1 and R2 insert specifically into tandem repeatsof arthropod rRNA genes but, as determined by a new PCR assay, R1 andR2 seldom spread to other chromosomes in the population. This indicatesthat recombinational bias purges new R1 and R2 insertions, and also suggeststhat the actual frequency of R1 and R2 retrotransposition is much higherthan previously thought.

50. Wellauer PK, Dawid IB: The structural organization of ribosomalDNA in Drosophila melanogaster. Cell 1977, 10:193-212.

51. Aye M, Dildine SL, Claypool JA, Jourdain S, Sandmeyer SB:A truncation mutant of the 95-kilodalton subunit of transcriptionfactor IIIC reveals asymmetry in Ty3 integration. Mol Cell Biol2001, 21:7839-7851.

52. Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF: Transposableelements and genome organization: a comprehensive survey ofretrotransposons revealed by the complete Saccharomycescerevisiae genome sequence. Genome Res 1998, 8:464-478.

53. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, •• Smith HO, Yandell M, Evans CA, Holt RA et al.: The sequence of the

human genome. Science 2001, 291:1304-1351.The human genome sequence has fully confirmed the evidence, painstakinglyassembled by the Bernardi group over many years, documenting large-scaleinhomogeneities within the human genome. These unexpected local differencesin base composition, gene density and density of mobile elements result in‘isochores’ — extended chromosomal regions of similar composition andpresumably function — but we still do not know how or why isochores arise.Perhaps clever bioinformatics will suggest some useful experiments to probethe significance and origin of isochores.

54. Pavlicek A, Jabbari K, Paces J, Paces V, Hejnar JV, Bernardi G:Similar integration but different stability of Alus and LINEs in thehuman genome. Gene 2001, 276:39-45.

55. Pavlicek A, Paces J, Clay O, Bernardi G: A compact view of • isochores in the draft human genome sequence. FEBS Lett 2002,

511:165-169.See annotation Venter et al. (2001) [53•• ].

56. Chu WM, Ballard R, Carpick BW, Williams BR, Schmid CW:Potential Alu function: regulation of the activity ofdouble-stranded RNA-activated kinase PKR. Mol Cell Biol 1998,18:58-68.

57. Liu WM, Chu WM, Choudary PV, Schmid CW: Cell stress andtranslational inhibitors transiently increase the abundance ofmammalian SINE transcripts. Nucleic Acids Res 1995,23:1758-1765.

58. Li TH, Schmid CW: Differential stress induction of individual Alu •• loci: implications for transcription and retrotransposition. Gene

2001, 276:135-141.This is the most recent in a series of experiments arguing, against commonwisdom, that SINEs may be useful or even subject to positive selection. It isonly fitting that this provocative hypothesis should emerge from the groupthat discovered Alu elements over two decades ago.

59. Scott PH, Cairns CA, Sutcliffe JE, Alzuherri HM, McLees A, Winter AG,White RJ: Regulation of RNA polymerase III transcription duringcell cycle entry. J Biol Chem 2001, 276:1005-1014.

60. Cost GJ, Golding A, Schlissel MS, Boeke JD: Target DNAchromatinization modulates nicking by L1 endonuclease. NucleicAcids Res 2001, 29:573-577.

61. Xie W, Gai X, Zhu Y, Zappulla DC, Sternglanz R, Voytas DF:Targeting of the yeast Ty5 retrotransposon to silent chromatin ismediated by interactions between integrase and Sir4p. Mol CellBiol 2001, 21:6606-6614.

SINEs and LINEs: the art of biting the hand that feeds you Weiner 349

62. Yu F, Zingler N, Schumann G, Stratling WH: Methyl-CpG-binding • protein 2 represses LINE-1 expression and retrotransposition but

not Alu transcription. Nucleic Acids Res 2001, 29:4493-4501.Using both transient and stable assays for transcription of artificially methylatedDNA, the authors show that methylation may differentially affect SINE andLINE retroposition. This unexpected observation has the potential to explainthe uneven distribution of SINEs and LINEs. See also Greally (2002) [83].

63. Hammond SM, Bernstein E, Beach D, Hannon GJ: An RNA-directednuclease mediates post-transcriptional gene silencing inDrosophila cells. Nature 2000, 404:293-296.

64. Caplen NJ, Parrish S, Imani F, Fire A, Morgan RA: Specific inhibitionof gene expression by small double-stranded RNAs ininvertebrate and vertebrate systems. Proc Natl Acad Sci USA2001, 98:9742-9747.

65. Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC:Potent and specific genetic interference by double-stranded RNAin Caenorhabditis elegans. Nature 1998, 391:806-811.

66. Elbashir SM, Lendeckel W, Tuschl T: RNA interference is mediatedby 21- and 22-nucleotide RNAs. Genes Dev 2001, 15:188-200.

67. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T: Identification •• of novel genes coding for small expressed RNAs. Science 2001,

294:853-858.In many eukaryotes, including humans, double-stranded RNAs areprocessed into ‘micro RNAs’ (miRNAs) that can regulate expression ofcomplementary mRNAs, or mediate an unusual post-transcriptional gene-silencing phenomenon called RNAi (‘RNA interference’). RNAi appears tobe a potent cellular defence mechanism against viruses and DNA transposons,and it may do the same for SINEs and LINEs (see Jensen et al. [1999] [77•• ]).

68. Lau NC, Lim LP, Weinstein EG, Bartel DP: An abundant class of tinyRNAs with probable regulatory roles in Caenorhabditis elegans.Science 2001, 294:858-862.

69. Lee RC, Ambros V: An extensive class of small RNAs inCaenorhabditis elegans. Science 2001, 294:862-864.

70. Tabara H, Sarkissian M, Kelly WG, Fleenor J, Grishok A, Timmons L,Fire A, Mello CC: The rde-1 gene, RNA interference, andtransposon silencing in C. elegans. Cell 1999, 99:123-132.

71. Ketting RF, Haverkamp TH, van Luenen HG, Plasterk RH: Mut-7 ofC. elegans, required for transposon silencing and RNAinterference, is a homolog of Werner syndrome helicase andRNaseD. Cell 1999, 99:133-141.

72. Wu-Scharf D, Jeong B, Zhang C, Cerutti H: Transgene andtransposon silencing in Chlamydomonas reinhardtii by aDEAH-box RNA helicase. Science 2000, 290:1159-1162.

73. Dougherty WG, Lindbo JA, Smith HA, Parks TD, Swaney S,Proebsting WM: RNA-mediated virus resistance in transgenic plants: exploitation of a cellular pathway possiblyinvolved in RNA degradation. Mol Plant Microbe Interact 1994,7:544-552.

74. Mourrain P, Beclin C, Elmayan T, Feuerbach F, Godon C, Morel JB,Jouette D, Lacombe AM, Nikic S, Picault N et al.: ArabidopsisSGS2 and SGS3 genes are required for posttranscriptional gene silencing and natural virus resistance. Cell 2000, 101:533-542.

75. Dalmay T, Horsefield R, Braunstein TH, Baulcombe DC: SDE3encodes an RNA helicase required for post-transcriptional genesilencing in Arabidopsis. EMBO J 2001, 20:2069-2078.

76. Jensen S, Gassama MP, Heidmann T: Cosuppression of Itransposon activity in Drosophila by I-containing sense andantisense transgenes. Genetics 1999, 153:1767-1774.

77. Jensen S, Gassama MP, Heidmann T: Taming of transposable •• elements by homology-dependent gene silencing. Nat Genet

1999, 21:209-212.If SINEs and LINEs give rise to duplex RNA, as might be expected if theseretroelements are subject to readthrough transcription from external promoters,RNA interference may be responsible for preventing sudden explosivegrowth of SINE and LINE families.

78. Brosius J: Retroposons — seeds of evolution. Science 1991,251:753.

79. Moran JV, DeBerardinis RJ, Kazazian HH Jr: Exon shuffling byL1 retrotransposition. Science 1999, 283:1530-1534.

80. Goodier JL, Ostertag EM, Kazazian HH Jr: Transduction of3′′-flanking sequences is common in L1 retrotransposition. HumMol Genet 2000, 9:653-657.

81. Ferrigno O, Virolle T, Djabari Z, Ortonne JP, White RJ, Aberdam D: • Transposable B2 SINE elements can provide mobile RNA

polymerase II promoters. Nat Genet 2001, 28:77-81.As mobile elements, SINEs and LINEs have the potential to destroy (byinsertional mutagenesis), to create (by generating functional retropseudo-genes), and to empower (by giving old genes new promoters or regulatorysignals as described in this novel story). Thus, in genomic evolution, as in life,mayhem is sometimes useful.

82. Orgel LE, Crick FH: Selfish DNA: the ultimate parasite. Nature1980, 284:604-607.

83. Greally JM. Short interspersed transposable elements (SINEs) areexcluded from imprinted regions in the human genome. Proc NatlAcad Sci USA 2002, 99:327-332.

350 Nuclear and gene expression