Vol. 4, No. 10
MOLECULAR AND CELLULAR BIOLOGY, Oct. 1984, p. 2120-2127
0270-7306/84/102120-08$02.00/0
Copyright © 1984, American Society for Microbiology

Identification of a Recently Evolved Goat Embryonic β-Globin Pseudogene Which Retains Transcriptional Activity in Vitro

STEVEN G. SHAPIRO AND JERRY B. LINGREL*

Department of Microbiology and Molecular Genetics, University of Cincinnati College of Medicine, Cincinnati, Ohio 45267-0524

Received 26 March 1984/Accepted 11 July 1984

* Corresponding author.

A clone containing the entire goat εV β-globin gene, which lies downstream from the two tandemly duplicated four-gene sets containing the βC and βA genes in the linkage group 5'-εI-εII-ψβX-βC-εIII-εIV-ψβZ-βA-εV-3', was isolated, and the sequence of the gene was determined. εV is most homologous to the first gene in each of these sets, εI and εIII, and appears to be a third duplicated copy of these genes, possibly the first gene in a third four-gene set. Homology of εV to εI is very high (93.2%) in coding regions, and all transcription, processing, and potential translation consensus sequence elements appear to be present, although the Hogness box of εV is altered compared with that of εI by the deletion of an A (AATAAAA → AATAAA). Nevertheless, εV is clearly a pseudogene as a result of two deletions and one insertion (or insertion-deletion) in its coding sequence, the first of which produces an in-frame stop codon at amino acid 54. Unlike the more highly mutated goat β-like pseudogene duplicates ψβX and ψβZ, εV acquired its defects after the duplication event in which it was created. Its recently acquired defects have left the εV promoter sufficiently conserved to retain transcriptional activity in vitro. The acquisition of defects by this gene may be related to the multiple gene duplications which have created at least five ε type genes in the goat β-globin locus.

The genes encoding the developmentally regulated β-globins of mammalian species are arranged in closely linked clusters which differ in complexity in the different species (14, 23, 26, 42). The simplest cluster described thus far is that of rabbits, which consists of four genes arranged in the following order: 5'-embryonic-embryonic-pseudoadult-adult-3' (18-20, 26, 27). In humans (1, 28, 39, 41), mice (17, 25), and goats (37, 38), the functional β-globin genes which are most homologous to these rabbit genes are located in analogous positions in their respective loci after allowances are made for recent single-gene duplications, gene conversions, and block duplications (19, 38). Homology across species between analogous genes is generally higher than that between genes in the same cluster, indicating that precursors to the analogous genes existed before the species diverged. The set of these precursors therefore describes a mammalian progenitor locus which must have appeared substantially like the current rabbit locus.

In rabbits, mice, and humans, temporal developmental expression of the β-globin genes occurs roughly in the same order in which the genes are aligned on the chromosome. This pattern is uniformly interrupted in all these loci by internal unexpressed pseudogenes in essentially analogous positions, i.e., between the adult gene(s) and the nearest upstream nonadult gene(s) (22, 23, 27). Since sequence analysis suggests that these pseudogenes arose independently after duplications of different genes, their prevalence at this particular position is as yet unexplained.

The basic unit of the goat β-globin locus is a four-gene set very similar to that of rabbits (38), which is also interrupted by unexpressed pseudogenes (5, 6). However, the goat locus, unlike those of mice, humans, and rabbits, in which only single genes have been duplicated, has undergone expansion by a mechanism of block duplication (6). We have to date linked nine genes in this cluster, consisting of two block duplicates of the rabbit-like four-gene set plus a downstream embryonic globin homologous structure (42; see Fig. 1). Chromosomal alignment in order of expression has been maintained for each of the four-gene sets, but not overall. The block duplication resulted in the creation of two adult-type β genes, occupying the fourth position in each set, which evolved into the juvenile and adult expressed genes (6, 37, 42). It also resulted in the duplication of an adult β-type pseudogene occupying the third position, a typical internal position, in the four-gene cluster. The first two positions in each cluster are occupied by embryonic type genes, two of which (εI and εII) have been completely sequenced and contain all the structural features of functional globin genes (38), thus giving goats an abundance of ε genes, more than any other mammal that has been characterized.

We now report the sequence of the gene downstream from the two goat four-gene clusters. This gene appears to be another highly homologous duplicate of the first position εI gene. It has, however, recently become defective and is a clearly demonstrable example of an embryonic β-globin pseudogene. The characteristics of this pseudogene and the process which may have led to its formation are presented.

MATERIALS AND METHODS

Isolation of the complete εV gene. Clone 30, containing the εV gene, was obtained by the method of Benton and Davis (2) from a partial MboI goat genomic library cloned in λ Charon 28 and propagated in Escherichia coli DP50 SupF. A mixture of two probes, subcloned into pBR322, was used for the plaque purification hybridizations. One was composed of the EcoRI-BamHI fragment containing the first two exons and unique 5'-flanking sequence of goat εI, and the other was a unique sequence of ca. 400 base pairs extending upstream from position -13 of ψβZ. These were nick translated (35) to a specific activity in excess of 10^8 cpm/μg, and equivalent counts per minute of each were used in the screenings.


[Figure 1: map of the goat β-globin gene cluster, with gene labels and the positions of clones 20 and 30 (CL 20, CL 30).]

FIG. 1. Current set of linked and unlinked goat β-globin genes. The region of clone 30 containing εV was sequenced according to the strategy shown below the gene. The complete map for restriction sites relevant to the sequencing is shown. The solid boxes represent regions of εV which correspond to globin gene coding sequence (exons). The open boxes between the exons are the intervening sequences (IVSs or introns). The second exon is 222 bases long.

Purified hybridizing plaques were isolated, and the phage were grown by standard techniques (3). Restriction mapping and identification of the globin-hybridizing regions of clone 30 were performed as previously described (21). Small εV gene fragments for use as hybridization probes, in sequencing, and for in vitro transcriptions were subcloned into plasmid pUC8 (43) and grown in saturated cultures of E. coli JM83. Plasmid DNA was prepared by the alkaline lysis method as described by Maniatis et al. (30).

Genomic Southern blots. Total goat genomic DNA was digested to completion with EcoRI or BamHI, electrophoresed on a 0.5% agarose gel, and transferred to nitrocellulose paper as described by Southern (40). The blots were hybridized as previously described (24) with nick-translated εV probe in pUC8 (see below), which was prepared as above except for omitting the CsCl gradient steps. Instead, RNA was degraded by hydrolysis in 0.2 M NaOH at 90°C for 5 min (10). The DNA was then neutralized by the addition of an equimolar amount of HCl and separated from the hydrolyzed RNA by passage over a Sephadex G-50 column. After ethanol precipitation, the plasmid was ready for nick translation.

DNA sequencing. All DNA fragments, whether prepared from the phage clone 30 or subclones in pUC8, were end labeled at the 3' termini with appropriate [α-32P]dNTPs with the Klenow fragment of E. coli polymerase I, followed by restriction digestion at internal sites. DNA sequencing was performed by the procedure of Maxam and Gilbert (31) except that the acid depurination reaction was performed in unbuffered formic acid as described by Maniatis et al. (30).

Transcriptions in vitro. The template for transcription in vitro for the εV promoter was made by subcloning the 2.5-kilobase (kb) BamHI-EcoRI restriction fragment which contains ca. 2 kb of 5'-flanking region and extends to +580 in the εV pseudogene into EcoRI- and BamHI-digested pUC8. The goat εI and ψβX constructions in pBR322 and their transcriptional activity in vitro have been described previously (29). Transcriptions were done by using nuclear extracts which were a gift from D. S. Luse and were prepared by the procedure of Coppola et al. (7). All transcriptions shown were performed as described previously (29), incorporating [α-32P]GTP into the transcripts. RNA synthesized in vitro was electrophoresed on 5% polyacrylamide denaturing gels (7 M urea) and visualized by autoradiography.

RESULTS

Most of the εV gene was obtained previously in a clone (no. 20) isolated from a partial EcoRI goat genomic library in λ Charon 4A (42). However, we have determined that this clone contains only the 5' end of the εV gene up to the codon for amino acid 122, at which point the clone ends. We have now obtained the entire εV gene in a clone (no. 30; Fig. 1) from a partial goat MboI library in λ Charon 28. Verification of the natural environment of the εV gene in this clone was made by genomic Southern analysis (24, 40). A small, unique HaeIII-EcoRI fragment containing IVS-2 (second intervening sequence) and some third coding block sequence of εV was prepared from the 1.0-kb EcoRI fragment located in the 6-kb BamHI fragment of the phage clone (Fig. 2a) and subcloned into pUC8. This probe was hybridized to total genomic DNA which had been digested with either BamHI or EcoRI. The results are shown in Fig. 2b. The probe hybridizes only to the appropriate fragments containing the εI and εIII genes (38, 42), which are highly homologous to εV (see below), and to EcoRI and BamHI fragments of the same size as those in clone 30. The results of the BamHI genomic hybridizations demonstrate the presence of natural flanking sequences for at least 2 kb on either side of the εV gene in this clone.

[Figure 2: (a) restriction map with BamHI (B) and EcoRI (E) sites on a 0 to 6 kb scale; (b) Southern blot with EcoRI and BamHI lanes.]

FIG. 2. (a) EcoRI and BamHI map of the region of clone 30 containing the εV pseudogene. The fragment used to probe the genomic DNA in (b) is indicated by a thick line over the gene. (b) Southern hybridization of total genomic DNA. Goat DNA (10 μg) was digested with EcoRI or BamHI and hybridized with the nick-translated εV probe shown in (a). Sizes of the bands are in kb.

TABLE 1. Evolutionary comparisons of the coding sequence of goat εV

                               Corrected % divergence (b)
Gene          % Nucleotide     Replacement     Silent
              homology         sites           sites
Goat εI       93.2             4.6             23.2
Goat εII      77.8             15.7            75.8
Goat βC       74.4             22.2            70.5
Human ε       84.6             9.5             58.0
Human Gγ      79.6             14.7            76.0
Human β       74.6             18.0            82.8

a Each insertion or deletion in εV was scored as one replacement site change.
b Analysis was performed by the method of Perler et al. (32).

Sequence of goat εV. The complete sequence of goat εV, including near 5'- and 3'-flanking regions, was obtained by the chemical degradation method of Maxam and Gilbert (31). The two coding region deletions described below were sequenced on both strands for verification. The sequence (Fig. 3) describes a typical mammalian globin gene with three exons and two introns. Transcription and processing signals appear to conform to those of functional globin genes. A CCAAT sequence (12) occurs at position -85, and an ATA box sequence (M. Goldberg, Ph.D. dissertation, Stanford University, Stanford, Calif., 1979) is present at position -31. The ATA box is slightly altered in goat εV relative to goat εI and has the sequence AATAAA rather than AATAAAA. An intact cap sequence occurs at a position analogous to the transcription initiation site of other globin genes and is designated +1 in Fig. 3. All splice junction sequences follow the rule of GT/AG (4), and a polyadenylation signal sequence, AATAAA (33), occurs 92 bases downstream from the normal globin translation termination site.
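The kind of check described above (locating consensus elements and testing the GT/AG rule) can be sketched in a few lines of Python. This is only an illustration: the toy sequence, its coordinates, the chosen cap index, and the function names are hypothetical and are not taken from the goat εV sequence or from the authors' methods.

```python
import re

def find_motif(seq, motif):
    """Return 0-based start indices of every occurrence of motif, overlaps included."""
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(seq)]

def obeys_gt_ag(seq, intron_start, intron_end):
    """True if the intron seq[intron_start:intron_end] begins with GT and ends with AG."""
    intron = seq[intron_start:intron_end]
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

def relative_position(index, cap_index):
    """Convert a 0-based index to a position relative to the cap site (+1); there is no position 0."""
    offset = index - cap_index
    return offset + 1 if offset >= 0 else offset

# Hypothetical toy sequence assembled from labeled pieces (NOT the goat sequence).
toy = (
    "GG" + "CCAAT" + "CTGC" + "AATAAA" + "GGC"  # 5' flank with CCAAT and an ATA-like box
    + "ATGGTGCATCTG"                             # exon (cap site assumed at its first base)
    + "GTAAGTCTTTTCCCACAG"                       # intron, written to follow the GT/AG rule
    + "ACTGCTGAGTAA"                             # exon ending in a stop codon
    + "GC" + "AATAAA" + "GC"                     # 3' flank with a polyadenylation signal
)
cap_index = 20  # assumed cap site for the toy sequence

for motif in ("CCAAT", "AATAAA"):
    hits = [relative_position(i, cap_index) for i in find_motif(toy, motif)]
    print(motif, hits)

print("intron follows GT/AG:", obeys_gt_ag(toy, 32, 50))
```

The lookahead pattern reports overlapping matches, which matters for A-rich elements such as AATAAAA versus AATAAA, where a plain search could skip a second copy.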

Analysis of sequence homology within the goat globin locus indicates that this gene is most homologous to the first position εI gene in the first four-gene set (Table 1) (38). It is additionally highly homologous to the available sequence of the εIII gene (42) in the second four-gene set (data not shown). Homology to the sequenced genes of the human β-globin locus confirms the orthology with first-position ε genes (Table 1). In addition, goat εV contains, in an identical location in its large intervening sequence (underlined in Fig. 3), a copy of the inserted ungulate repetitive element present in goat εI IVS-2 (38), and is therefore undoubtedly a duplicated copy of the εI and εIII genes.

Goat εV is a pseudogene. Although εV is highly homologous to εI, it cannot encode a functional globin protein due to two deletion mutations and one insertion mutation in the coding region. A 1-base deletion in codon 38 produces a frame shift, bringing into phase a termination codon from amino acids 54 and 55 (double underline in Fig. 3). A second 4-base deletion encompassing codon 112 well downstream in exon III also produces a frame shift, as well as deleting an amino acid. A fairly complex insertion or insertion-deletion after codon position 135 resulted finally in the insertion of 5 bases. The first mutation (codon 38) alone would be sufficient to prevent translation of an mRNA transcribed from this gene, but the presence of additional mutations indicates loss of selection after the initial, albeit indistinguishable, inactivation event.

When did εV become defective? The homology values obtained when εV is compared with εI are very high in both coding (93.2%) and noncoding (IVS 87.3%) regions, suggesting that the duplication of these genes occurred recently. In spite of the fact that εV is defective, it is more homologous to εI than is any non-goat εI orthologous gene (Table 1) (38). Obviously, since εI is not defective, εV must have become a pseudogene after duplication. We attempted to analyze the acquisition of mutations by εV for evidence of evolution of this gene under selective pressure subsequent to divergence from εI. Upon cursory examination, εV appears to have diverged from εI as a functional gene for nearly all of its existence because of the low ratio of replacement to silent site substitutions between them, <1 (random would be about 3:1). However, this type of analysis is susceptible to errors which would result from an uneven distribution of mutations between the two genes being compared. In this case, an analysis of each replacement site substitution between εI and εV indicates that every assignable mutation (11 of a total of 12 mutations) occurred in εV. The analysis (38) was made by comparison to an appropriate reference gene, in this case the human ε gene, whose evolutionary position is known and close to the genes in question. Whenever the nucleotide at a replacement site in one gene, εI or εV, was the same as that in the human ε gene, then the mutation was assumed to have occurred in the other gene. When all three were different, two mutations occurred at the site, and it was therefore unassignable and discarded from the analysis. It can be calculated that double mutations to the same base at a single site occur with about one-half the frequency of those that produce three different bases, and they therefore do not contribute significantly to the analysis in this case.

The same analysis was done, also relative to the human ε gene, for silent site changes between goat εI and εV. Of 14 assignable sites (of 15 silent site differences), 9 occurred in εV and 5 occurred in εI. The ratio of replacement/silent site changes for each gene (Table 2) was therefore 0 for εI and 1.22 for εV. Since a fairly low value for this ratio, approaching 0.5, is expected for embryonic genes (38), εV appears to have been defective for a substantial period of time. This conclusion is supported by the particular sites at which εV replacement mutations occurred. Of the 11 changes in εV, 1 was at a heme contact site, and 3 were at interchain contact sites (as defined by Eaton [11]). The percentages for these (Table 2) show no evidence of constraint on εV against mutations at these sites, as would occur in functional genes.
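The assignment rule described above can be written out as a short Python sketch. It assumes the three genes have already been aligned and reduced to the sites of interest (replacement sites or silent sites, handled separately); the function names and the toy bases are hypothetical, while the counts passed to the ratio are the assignable changes reported in the text.

```python
def assign_mutations(gene_a, gene_b, reference):
    """Assign each difference between two aligned genes to one of them by using a reference gene.

    If one gene matches the reference at a differing site, the change is assigned to the
    other gene. If all three differ, the site is counted as unassignable.
    """
    counts = {"a": 0, "b": 0, "unassignable": 0}
    for x, y, ref in zip(gene_a, gene_b, reference):
        if x == y:
            continue  # no difference between the two genes at this site
        if x == ref:
            counts["b"] += 1  # gene_a kept the reference base, so gene_b changed
        elif y == ref:
            counts["a"] += 1  # gene_b kept the reference base, so gene_a changed
        else:
            counts["unassignable"] += 1  # three different bases: discarded
    return counts

def replacement_to_silent(replacement_changes, silent_changes):
    """Ratio of assignable replacement changes to assignable silent changes."""
    return replacement_changes / silent_changes

# Hypothetical aligned sites, for illustration only.
print(assign_mutations("ACGTAC", "ACCTGA", "ACGTGT"))

# Assignable counts reported in the text (replacement, silent).
print(round(replacement_to_silent(11, 9), 2))  # εV: 1.22
print(replacement_to_silent(0, 5))             # εI: 0.0
```

Counting the two lineages separately is what exposes the asymmetry: a pooled comparison of the two genes would average the εV changes with the unchanged εI lineage.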

Transcription in vitro of goat εV. A restriction fragment containing the 5' end of εV beginning ca. 2 kb upstream at the BamHI site and extending to the EcoRI site 100 base pairs into IVS-2 (see map, Fig. 2a) was subcloned into EcoRI- and BamHI-digested pUC8 for use as a template for run-off transcription (Fig. 4). This plasmid was digested with EcoRI (or with other restriction enzymes [data not shown]) so that accurate initiation would produce a truncated transcript of defined length (580 bases for EcoRI-digested plasmid).

TABLE 2. Assignable mutations fixed in εI and εV

        No. of silent   No. of replacement   Replacement/     % Sites with replacement mutation (a)
Gene    changes         changes              silent changes   HC      IC      Other
εV      9               11                   1.22             5%      9.1%    7.9%
εI      5               0                    0.00

a HC, Heme contact; IC, interchain contact; Other, nonfunctional + Bohr effect (no Bohr effects are detectable for embryonic hemoglobin [13]; 2,3-diphosphoglycerate effects are unknown) as defined for adult β-globin genes by Eaton (11). Deletions and insertions in εV were not considered in the analysis. Only the first replacement change was scored at amino acid 135 because the other is a half-replacement site (i.e., caused by the first site).


Restriction-digested pUC8 alone and BamHI-digested goat εI subclone (470-base transcript) were used as controls. The results of these transcriptions indicate that promoter components are sufficiently conserved in this pseudogene to allow accurate initiation of transcription and elongation in vitro, giving a transcript of 580 bases (Fig. 4a, lanes 1 and 7). The transcription of εV was inhibited by 0.5 μg of α-amanitin per ml (Fig. 4a, lanes 2 and 8), which verifies that this is a polymerase II-catalyzed synthesis. An additional polymerase II transcript was initiated within the plasmid vector at approximately the site described in Fig. 4b, giving the additional α-amanitin-sensitive band at ca. 1,100 bases in the εV transcription (Fig. 4a, lanes 1 and 7). These are the only α-amanitin-sensitive bands present in the pUC8 control transcriptions (Fig. 4a, lanes 3 and 5). Goat ψβX, an adult β-like pseudogene which has been shown previously to give no transcript in vitro (29), was tested side by side with εV to ensure that no transcript was visible from it under the conditions in which the weak transcript from εV was observed. Correct initiation of ψβX would have produced a transcript 406 bases long. As expected, no transcript was produced from the highly mutated ψβX promoter (Fig. 4a, lane 11), even after very long exposure (data not shown). Two bands did appear in the ψβX lane, but do not represent transcripts of this gene. One is a transcript of ca. 185 bases which is a polymerase III product, as indicated by its insensitivity to 0.5 μg of α-amanitin per ml (Fig. 4a, lane 12). The other band, which is very faint and migrates just below the 470-base marker, is also present in the amanitin-lacking lanes 7 and 9 of Fig. 4a and is therefore a minor transcription system artifact.
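The reasoning used to interpret the paired lanes (each reaction run with and without 0.5 μg of α-amanitin per ml) can be summarized as a small decision rule. The sketch below is only an illustration of that logic; the function name, the labels, and the boolean encoding of the bands are hypothetical, and only the three bands whose α-amanitin sensitivity is stated in the text are included.

```python
def classify_band(seen_without_amanitin, seen_with_amanitin):
    """Interpret one band from a pair of reactions run without and with 0.5 ug/ml alpha-amanitin."""
    if not seen_without_amanitin:
        return "no transcript in the uninhibited reaction"
    if seen_with_amanitin:
        # Insensitive at this concentration; the text attributes such a band to polymerase III.
        return "not polymerase II"
    return "polymerase II"

# (seen without inhibitor, seen with inhibitor), as described in the text for Fig. 4a.
bands = {
    "epsilon-V run-off, 580 bases": (True, False),
    "vector-initiated, ca. 1,100 bases": (True, False),
    "psi-beta-X lane, ca. 185 bases": (True, True),
}

calls = {name: classify_band(*seen) for name, seen in bands.items()}
for name, call in calls.items():
    print(f"{name}: {call}")
```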

DISCUSSION

Based on sequence homology and divergence analysis, εV is by far most closely related to the first-position genes, εI and εIII, of the two linked upstream four-gene subsets of the goat β-globin gene cluster (see Fig. 1). This, coupled with the fact that both εI and εV contain the same ungulate inserted element in identical locations in their large IVSs (εIII has not yet been sequenced in this region), clearly points to the conclusion that these genes were produced by a recent duplication of the goat ancestral first-position ε gene already containing this repeat sequence. The ancestral gene must, of course, have acquired the inserted element after divergence from humans, rabbits, and mice, which do not have this element in their respective orthologous genes ε, β4, and εy2 (1, 17, 19). From the location of εV, it appears that it is probably the first gene of a third block duplication unit similar to the four-gene units containing the βA and βC genes which could terminate with the β-homologous γ gene.

The currently identified goat β-globin genes include five genes which could, based on sequence, be embryonic. Genes εI and εII have been completely sequenced (38), appear to be nondefective in structural, noncoding, and flanking regions, and should be able to provide sufficient embryonic β-globins to support life. The expression of additional embryonic β-globin genes may be unnecessary in goats and may afford no selective advantage. To the contrary, the α to β chain imbalance which could result from expression of all the β-embryonic genes might be deleterious, causing conditions similar to the α-thalassemias. Since εV undoubtedly cannot be expressed at the protein level, and the partial sequence of εIII shows an alteration in the CCAAT conserved promoter sequence which could result in decreased expression of that gene (42), it may be that we are observing an elimination from the goat β-globin locus of unnecessary (and therefore unselected) or perhaps even deleterious genes.

An alternative explanation for the demise of these ε genes is that, as a result of either the duplication process or the final chromosomal environment of the genes, some of them were unable to be expressed after the duplication events. Specific possibilities are (i) the lack of inclusion in the duplication unit of sequences required for expression of the genes or (ii) the placement of some of the duplicated ε genes in chromosomal environments which are inaccessible or inhibitory for transcription. Supporting these possibilities is the observation that it may be that only the first two ε genes in the goat β-globin locus, εI and εII, have maintained their expressibility after the duplications. These two genes are located in the same 5'-proximal position as all other expressed mammalian embryonic β-globin genes.

The possible advantage of the 5'-terminal position for expression could result from proximity to upstream regulatory elements which are required for expression but which have not been included in the other duplicated copies of the four-gene set. Our previously reported finding of unexpectedly high sequence homology well upstream from goat εI and human ε (38) could reflect the maintenance of regulatory elements in this region. Although the exact 5' extent of this homology is not known, cross-hybridization which maps far (7 to 13 kb) upstream from the human ε and goat εI genes (Q. Li, P. Powers, and O. Smithies, personal communication) has been observed, although the nature of these sequences is still under investigation. The related model, namely, that internal positions are inaccessible for transcription (in closed chromosomal conformation), appears less likely because it is not supported by general DNase sensitivity data, which indicate that the entire goat locus is sensitive regardless of which genes are being expressed. Furthermore, both εV and εIII are adjacent to internal expressed genes, βA and βC, respectively, indicating that if the environment of the embryonic genes is inhibitory to expression, this environment does not extend very far upstream from these genes.

The ratio of replacement/silent site changes of 1.22 for εV is intermediate between that anticipated for a functional mammalian embryonic β-globin gene (0.5) and the theoretical value for an unselected β-globin gene sequence (ca. 3.0). This intermediate value suggests a priori a history for εV composed of a period of functional evolution followed by a period of nonfunctional evolution. The degree of confidence in this conclusion cannot be high, however, mainly due to the small number of mutations the gene has acquired during its short period of independent evolution. In addition, the theoretical values predicted for the rate of accumulation of replacement and silent site mutations for unselected globin gene sequences do not always hold. For example, the goat ψβX and ψβZ pseudogenes, which were pseudogenes at the time of duplication (6), accumulated replacement and silent mutations at a ratio of only 2.0 after the duplication. It is possible, therefore, that εV was in excess or unexpressible (and consequently unselected) from its creation, in which case its fixation in the goat population would have to be attributed to an overall selective advantage in maintaining another part of its duplication unit, possibly the γ gene ancestor. Alternatively, εV may have been selected itself for a period of time and then rendered superfluous or unexpressible by a subsequent event, such as the very recent block duplication producing the βA and βC four-gene sets. An analysis of the sequence of the orthologous gene in cows could provide data relevant to this question.


[Figure 3: nucleotide sequence of goat εV (sequence not reproduced here).]


[Fig. 4 image not reproduced. Size-marker positions labeled on the gel: 854, 570, 470, and 265 bases; lanes 7 to 12.]

FIG. 4. (a) Transcriptions in vitro of goat εV. Samples (1 µg) of restriction-digested subclones of various β-globin genes or plasmid alone were transcribed, and the transcripts were prepared as described in the text. Lanes 1 and 7, EcoRI-digested εV subclone; lane 3, EcoRI-digested pUC8; lane 5, HindIII-digested pUC8; lane 9, BamHI-digested εI subclone; lane 11, SalI-digested ψβX subclone. Each even-numbered lane was identical to the preceding odd-numbered lane except that it additionally contained 0.5 µg of α-amanitin per ml. Size markers (in bases) were three run-off transcripts of known size generated from the adenovirus 2 major late promoter and the εI BamHI 470-base transcript. (b) Sites of initiation of in vitro transcripts from pUC8 clones. Thin line, pUC8 sequence; thick line, goat sequence. Arrows indicate length and direction of transcripts from εV subclone in pUC8 digested with EcoRI (top), EcoRI-digested pUC8 (middle), and HindIII-digested pUC8 (bottom). H, B, and E represent the locations of HindIII, BamHI, and EcoRI sites in these plasmids.

Although εV is a pseudogene, its 5'-flanking region is highly conserved relative to other, nondefective ε genes, maintaining globin gene consensus structural features which have been implicated in expression. Although no ε-globin gene promoters have been systematically studied, the components of the promoters of adult mammalian β-globin genes have been studied in some detail. Maximum and faithful expression of the rabbit β gene transfected into tissue culture cells depends on the integrity of several 5' sequence elements (8, 9, 15), including the ATA or Hogness-Goldberg sequence at -30, the globin consensus sequence CCAAT at -75, and an upstream -100 short repeated sequence (8). Although the two upstream components appear to be required only to obtain the maximum level of expression, the ATA sequence appears to contribute to maximum expression as well as specifying the site of initiation at a location ca. 30 bases downstream (8, 15). In contrast, transcription of globin genes in currently available in vitro systems is dependent only upon the presence of the ATA box for both maximum expression and accurate initiation (16). Although other components of the natural promoter are not recognized, these systems can be useful in evaluating the function of the ATA region sequence.

The ATA box of goat εV is altered compared with εI by a 1-base deletion of the core sequence from AATAAAA (also present in the human ε gene [1]) to AATAAA. Although this sequence has never been previously found per se as an entire ATA core, we have found that it is still recognized in vitro and directs accurate transcription initiation at or near the site of initiation of functional ε genes (29, 34). The transcription of εV appears to be reduced, however, in comparison with εI, suggesting that this Hogness-Goldberg sequence is less efficient in initiation. It is interesting in this regard that the εV AATAAA sequence is also the consensus polyadenylation cleavage signal sequence of pol II genes (33). Since this sequence occurs in the 3' untranslated region of most pol II genes, its efficient recognition by polymerase II could significantly reduce the concentration of this enzyme which is available for authentic promoters. Polymerase II could therefore have evolved so that it does not recognize this sequence well. A second possible reason for the decreased transcription of εV is that the alteration in its ATA box was the result of a 1-base deletion. This mutation causes a repositioning of the ATA sequence one base closer to the standard cap site, which might have an effect on initiation efficiency at this site.
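The two consequences of the 1-base deletion described above (loss of the AATAAAA heptamer and a 1-base shift of the ATA box toward the cap site) can be made concrete with a short sketch. The flanking, spacer, and cap-region sequences below are hypothetical filler chosen only for illustration; they are not the goat sequence, and the helper names are likewise assumptions.

```python
def find_all(sequence, motif):
    """Return every 0-based start position of motif in sequence (overlaps allowed)."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def delete_base(sequence, index):
    """Return sequence with the single base at the given 0-based index removed."""
    return sequence[:index] + sequence[index + 1:]


# Hypothetical toy promoter: filler + ATA box + spacer + cap region.
upstream = "GGCC"
ata_box = "AATAAAA"          # core sequence reported for the functional gene
spacer = "GC" * 12           # 24 filler bases containing no A
cap_region = "CTTCTG"        # placeholder for the start of the transcript

wild_type = upstream + ata_box + spacer + cap_region
box_start = len(upstream)
cap_index_wt = len(upstream) + len(ata_box) + len(spacer)

# Delete the last A of the box, mimicking AATAAAA -> AATAAA.
mutant = delete_base(wild_type, box_start + len(ata_box) - 1)
cap_index_mut = cap_index_wt - 1  # everything downstream shifts by one base

wt_distance = cap_index_wt - box_start
mut_distance = cap_index_mut - box_start

print(find_all(wild_type, "AATAAAA"))  # heptamer present in the toy wild type
print(find_all(mutant, "AATAAAA"))     # heptamer absent after the deletion
print(find_all(mutant, "AATAAA"))      # hexamer remains at the same start position
print(wt_distance - mut_distance)      # the box start is one base closer to the cap site
```

Note that the hexamer AATAAA is also contained within the intact heptamer, so a plain substring search cannot by itself distinguish the two boxes; searching for the heptamer is what separates them in this toy case.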

In addition to the alteration in the ATA box of εV, there are several changes between εI and εV immediately downstream in the region between the ATA box and the cap site. Although there are no data to indicate that the DNA in this region can affect the template efficiency of β-globin genes in vitro, we cannot rule out the possibility that these changes contribute to the reduced transcription of εV. It is also important to note that the εV gene was cloned into a closely related but different plasmid for the transcriptions than was εI or ψβX (pUC8 versus pBR322). Since the plasmids also contain ATA sequence homologous structures that are recognized by pol II, as has been described by others (36), this could result in slightly different competitive effects due to these adventitious in vitro promoters. However, the initiation site(s) of transcripts starting within the plasmid vector containing εV are also contained within the εI and ψβX plasmid DNA, so this possibility is considered unlikely.

FIG. 3. Complete sequence of the goat εV pseudogene showing differences from the goat εI gene. The strand which is homologous to the mRNA is shown. Dashes have been introduced into the εV coding sequence and flanking regions (but not IVSs) to maintain colinearity with goat εI. Amino acids encoded by εV are shown immediately above the sequence. Where goat εI-encoded amino acids are different from those of εV or where there is no amino acid encoded by εV, the εI amino acid is shown two lines above the sequence. Silent and replacement site differences between εI and εV are indicated under the sequence by a (+) and an (*), respectively. Amino acid numbering is in parentheses above the sequence. Nucleotides in the 5'-flanking region are numbered above the sequence based on the undeleted sequence of goat εI. Consensus transcription and processing signals and the embryonic -114 conserved sequence (38) are outlined. The first in-frame stop is indicated by a double underline. The IVS-2 inserted repetitive element, including both 13-base flanking direct repeats (38), is underlined.


The unusual pattern of expansion of the goat β-globin cluster through multigene block duplication events has produced a novel set of circumstances for the creation of pseudogenes. Under these conditions, at least one embryonic pseudogene has recently arisen in goats which has no counterpart in other mammals and which retains limited transcriptional activity. Analysis of the structure and function of newly formed pseudogenes of this type should be helpful both in defining the nature of the elements required for gene expression and in shedding light on the processes leading to pseudogene formation.

ACKNOWLEDGMENTS

We thank Paul Liberator for providing DNase sensitivity data for the goat locus before publication. We also are indebted to Donal Luse for providing valuable advice and discussion concerning in vitro transcription. We thank Mary C. Fitzgerald for her excellent technical assistance.

This work was supported by Public Health Service grants HL-15996 and GM-10999 from the National Institutes of Health.

LITERATURE CITED

1. Baralle, F. E., C. C. Shoulders, and N. J. Proudfoot. 1980. The primary structure of the human ε-globin gene. Cell 21:621-626.

2. Benton, W. D., and R. W. Davis. 1977. Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196:180-182.

3. Blattner, F., B. G. Williams, A. E. Blechl, K. D. Thompson, H. R. Faber, L. A. Furlong, D. J. Grunwald, D. O. Kiefer, D. D. Moore, J. W. Schumm, E. L. Sheldon, and O. Smithies. 1977. Charon phages: safer derivatives of bacteriophage lambda for DNA cloning. Science 196:161-169.

4. Breathnach, R., C. Benoist, K. O'Hare, F. Gannon, and P. Chambon. 1980. Ovalbumin gene: evidence for a leader sequence in mRNA and DNA sequences at the exon-intron boundaries. Proc. Natl. Acad. Sci. U.S.A. 75:4853-4857.

5. Cleary, M. L., J. R. Haynes, E. A. Schon, and J. B. Lingrel. 1980. Identification by nucleotide sequence analysis of a goat pseudoglobin gene. Nucleic Acids Res. 8:4791-4801.

6. Cleary, M. L., E. A. Schon, and J. B. Lingrel. 1981. Two related pseudogenes are the result of a gene duplication in the goat β-globin locus. Cell 26:181-190.

7. Coppola, J. A., A. S. Field, and D. S. Luse. 1983. Promoter-proximal pausing by RNA polymerase II in vitro: transcripts shorter than 20 nucleotides are not capped. Proc. Natl. Acad. Sci. U.S.A. 80:1251-1255.

8. Dierks, P., A. van Ooyen, M. D. Cochran, C. Dobkin, J. Reiser, and C. Weissmann. 1983. Three regions upstream from the cap site are required for efficient and accurate transcription of the rabbit β-globin gene in mouse 3T6 cells. Cell 32:695-706.

9. Dierks, P., A. van Ooyen, N. Mantei, and C. Weissmann. 1981. DNA sequences preceding the rabbit β-globin gene are required for formation in mouse L cells of β-globin RNA with the correct 5' terminus. Proc. Natl. Acad. Sci. U.S.A. 78:1411-1415.

10. Dolan, M., J. B. Dodgson, and J. D. Engel. 1983. Analysis of the adult chicken β-globin gene. J. Biol. Chem. 258:3983-3990.

11. Eaton, W. A. 1980. The relationship between coding sequences and function in hemoglobin. Nature (London) 284:183-185.

12. Efstratiadis, A., J. W. Posakony, T. Maniatis, R. M. Lawn, C. O'Connell, R. A. Spritz, J. K. DeRiel, B. G. Forget, S. M. Weissman, J. L. Slightom, A. E. Blechl, O. Smithies, F. E. Baralle, C. C. Shoulders, and N. J. Proudfoot. 1980. The structure and evolution of the human β-globin gene family. Cell 21:653-668.

13. Fantoni, A., M. G. Farace, and R. Gambari. 1981. Embryonic hemoglobins in man and other mammals. Blood 57:623-633.

14. Fritsch, E. F., R. M. Lawn, and T. Maniatis. 1980. Molecular cloning and characterization of the human β-like globin gene cluster. Cell 19:959-972.

15. Grosveld, G. C., E. deBoer, C. K. Shewmaker, and R. A. Flavell. 1982. DNA sequences necessary for transcription of the rabbit β-globin gene in vivo. Nature (London) 295:120-126.

16. Grosveld, G. C., C. K. Shewmaker, P. Jat, and R. A. Flavell. 1981. Localization of DNA sequences necessary for transcription of the rabbit β-globin gene in vitro. Cell 25:215-226.

17. Hansen, J. N., D. A. Konkel, and P. Leder. 1982. The sequence of a mouse embryonic β-globin gene. J. Biol. Chem. 257:1048-1052.

18. Hardison, R. C. 1981. The nucleotide sequence of rabbit embryonic globin gene β3. J. Biol. Chem. 256:11780-11786.

19. Hardison, R. C. 1983. The nucleotide sequence of the rabbit embryonic globin gene β4. J. Biol. Chem. 258:8739-8744.

20. Hardison, R. C., E. T. Butler III, E. Lacy, T. Maniatis, N. Rosenthal, and A. Efstratiadis. 1979. The structure and transcription of four linked rabbit β-like globin genes. Cell 18:1285-1297.

21. Haynes, J. R., P. Rosteck, Jr., E. A. Schon, P. M. Gallagher, D. J. Burks, K. Smith, and J. B. Lingrel. 1980. The isolation of βA, βC and γ globin genes and a presumptive embryonic globin gene from a goat DNA library. J. Biol. Chem. 255:6355-6367.

22. Jagadeeswaran, P., J. Pan, B. G. Forget, and S. M. Weissman. 1983. Sequences of non-α-globin genes in man. Cold Spring Harbor Symp. Quant. Biol. 47:1081-1083.

23. Jahn, C. L., C. L. Hutchison III, S. J. Phillips, S. Weaver, N. L. Haigwood, C. F. Voliva, and M. H. Edgell. 1980. DNA sequence organization of the β globin complex in the BALB/c mouse. Cell 21:159-168.

24. Jeffreys, A. J., and R. A. Flavell. 1977. The rabbit β-globin gene contains a large insert in the coding sequence. Cell 12:1097-1108.

25. Konkel, D. A., J. V. Maizel, Jr., and P. Leder. 1979. The evolution and sequence comparison of two recently diverged mouse chromosomal β globin genes. Cell 18:865-873.

26. Lacy, E., R. C. Hardison, D. Quon, and T. Maniatis. 1979. The linkage arrangement of four rabbit β-like globin genes. Cell 18:1273-1283.

27. Lacy, E., and T. Maniatis. 1980. The nucleotide sequence of a rabbit β globin pseudogene. Cell 21:545-553.

28. Lawn, R. M., A. Efstratiadis, C. O'Connell, and T. Maniatis. 1980. The nucleotide sequence of the human β globin gene. Cell 21:647-651.

29. Luse, D. S., J. R. Haynes, D. VanLeeuwen, E. A. Schon, M. L. Cleary, S. G. Shapiro, J. B. Lingrel, and R. G. Roeder. 1981. Transcription of the β-like globin genes and pseudogenes of the goat in a cell-free system. Nucleic Acids Res. 9:4339-4354.

30. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

31. Maxam, A. M., and W. Gilbert. 1980. Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65:499-560.

32. Perler, F., A. Efstratiadis, P. Lomedico, W. Gilbert, R. Kolodner, and J. Dodgson. 1980. The evolution of genes: the chicken preproinsulin gene. Cell 20:555-565.

33. Proudfoot, N. J., and G. G. Brownlee. 1976. 3' non-coding region sequences in eukaryotic messenger RNA. Nature (London) 263:211-214.

34. Proudfoot, N. J., M. H. M. Shander, J. L. Manley, M. L. Gefter, and T. Maniatis. 1980. Structure and in vitro transcription of human globin genes. Science 209:1329-1335.

35. Rigby, P. W. J., M. Dieckmann, L. Rhodes, and P. Berg. 1977. Labeling DNA to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113:237-251.

36. Sassone-Corsi, P., J. Corden, C. Kedinger, and P. Chambon. 1981. Promotion of specific in vitro transcription by excised "TATA" box sequences inserted in a foreign nucleotide environment. Nucleic Acids Res. 9:3941-3958.

37. Schon, E. A., M. L. Cleary, J. R. Haynes, and J. B. Lingrel. 1981. Structure and evolution of goat γ-, βC-, and βA-globin genes: three developmentally regulated genes contain inserted elements. Cell 27:359-369.

38. Shapiro, S. G., E. A. Schon, T. M. Townes, and J. B. Lingrel. 1983. Sequence and linkage of the goat εI and εII β-globin genes. J. Mol. Biol. 169:31-52.

39. Shen, S., J. L. Slightom, and O. Smithies. 1981. A history of the human fetal globin gene duplication. Cell 26:191-203.

40. Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503-517.

41. Spritz, R. A., J. K. DeRiel, B. G. Forget, and S. M. Weissman. 1980. Complete nucleotide sequence of the human δ-globin gene. Cell 21:639-646.

42. Townes, T. M., S. G. Shapiro, S. M. Wernke, and J. B. Lingrel. 1984. Duplication of a four-gene set during the evolution of the goat β-globin locus produced genes now expressed differentially in development. J. Biol. Chem. 259:1896-1900.

43. Vieira, J., and J. Messing. 1982. The pUC plasmids: an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19:259-268.
