THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1991 by The American Society for Biochemistry and Molecular Biology, Inc.
Vol. 266, No. 16, Issue of June 5, pp. 10461-10469, 1991. Printed in U.S.A.
Homologies between Members of the Germin Gene Family in Hexaploid Wheat and Similarities between These Wheat Germins and Certain Physarum Spherulins*
(Received for publication, November 14, 1990)
Byron G. Lane‡, François Bernier§, Ella Dratewka-Kos, Roshan Shafai, Theresa D. Kennedy, Caron Pyne, J. Ronald Munro, Tristan Vaughan, Dawn Walters, and Filiberto Altomare
From the Biochemistry Department, University of Toronto, Toronto, Ontario, Canada M5S 1A8 and the §Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Ste.-Foy, Québec G1K 7P4, Canada
By screening ~10^6 plaques in a wheat DNA library with a "full-length" germin cDNA probe, two genomic clones were detected. When digested with EcoRI, one clone yielded a 2.8-kilobase pair fragment (gf-2.8) and the other yielded a 3.8-kilobase pair fragment (gf-3.8). By nucleotide sequencing, each of gf-2.8 and gf-3.8 was found to encode a complete sequence for germin and germin mRNA, and to contain appreciable amounts of 5'- and 3'-flanking sequences. The "cap" site in gf-2.8 was determined by primer extension and the corresponding site in gf-3.8 was deduced by analogy. The mRNA coding sequences in gf-2.8 and gf-3.8 are intronless and 87% homologous with one another. The 5'-flanking regions in gf-2.8 and gf-3.8 contain recognizable sites of what are probably cis-acting elements but there is otherwise little if any significant similarity between them. In addition to putative TATA and CAAT boxes in the 5'-flanking regions of gf-2.8 and gf-3.8, there are AT-rich inverted-repeats, GC boxes, long purine-rich sequences, two 19-base pair direct-repeat sequences in gf-2.8, and a remarkably long (200-base pair) inverted-repeat sequence (~90% homology) in gf-3.8. An 8% difference between the mature-protein coding regions in gf-2.8 and gf-3.8 is reflected by a corresponding 7% difference between the corresponding 201-residue proteins. Most significantly, the same 8% difference between the mature-protein coding regions in gf-2.8 and gf-3.8 is allied with no change whatever in a central part (61-151) of the encoded polypeptide sequences. It seems likely that this central, strongly conserved core in the germins is of first importance in the biochemical involvements of the proteins. When an equivalence is assumed between like amino acids, the gf-2.8 and gf-3.8 germins show significant (~44%) similarity to spherulins 1a and 1b of Physarum polycephalum, a similarity that increases to ~50% in the conserved core of germin. Near the middle (87-96) of the conserved core in the germins is a rare PH(I/T)HPRATEI decapeptide sequence which is shared by spherulins (1a and 1b) and germins (gf-2.8 and gf-3.8). These similarities are discussed in the context of evidence which can be interpreted to suggest that the biochemistry of germins and spherulins is involved with cellular, perhaps cell-wall, responses to desiccation, hydration, and osmotic stress. Of special interest in this regard, the 5'-flanking region in the gf-2.8 gene contains two sequences which are characteristic of auxin-responsive genes.
* Continuous financial support, solely by Grant MRC-MT-1226 from the Medical Research Council of Canada over the past 30 years, is warmly appreciated. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M63223 and M63224.
‡ To whom correspondence should be addressed.
Mature embryos (~5% water) can be isolated en masse from dry grains of field-ripened wheat (Johnston and Stern, 1957). When cultured in water, these embryos imbibe water (~1 h) but the resulting, partially hydrated embryos (~60% water) resume growth only after a lag period (~4 h). The lag interval between the end of the period of partial hydration and the resumption of growth has been called a period of "germination" (Marcus, 1969). During germination, the translatable mRNA population undergoes significant change (Thompson and Lane, 1980): a conserved mRNA population in the mature embryo is replaced by a newly synthesized mRNA population which supports renewed growth of the germinated embryo (reviewed in Lane (1988)).
Although change in the translatable mRNA population is largely completed during germination, it is not finished: only in concert with the onset of renewed growth at ~5 h postimbibition is there nascent synthesis of a translatable mRNA that encodes a novel protein we initially called g (Thompson and Lane, 1980) and later named germin (Lane, 1985). The resumption of growth leads, by ~24 h postimbibition (Lane et al., 1986), to full hydration of the embryos (~85% water), and this "water growth" (Jaikaran et al., 1990) occurs in alliance with a selective accumulation of germin mRNA (Rahman et al., 1988) and germin itself (Lane and Kennedy, 1981; Grzelczak et al., 1982, 1985; Grzelczak and Lane, 1983, 1984; Lane et al., 1986, 1987; Lane, 1988).
Germin is a rather rare water-soluble homopentameric protein (~0.1% of the soluble proteins). It is refractory to digestion by broad-specificity proteases and to dissociation in SDS-containing reducing environments (Grzelczak and Lane, 1984). Germin is made in antigenically related isoforms during germination of all of the economically important cereals examined: barley, oat, rye, wheat (Grzelczak et al., 1985), corn, and rice.² Because of its peculiar temporal expression, it is possible that a significant part of the changes that occur during cereal germination is directed toward expression of the germin gene.
¹ The abbreviations used are: SDS, sodium dodecyl sulfate; bp, base pair; kbp, kilobase pair.
² D. Walters, E. Dratewka-Kos, T. D. Kennedy, and B. G. Lane, unpublished results.
10462 Germin Gene Family
A virtually full-length germin cDNA was isolated (Rahman et al., 1988) and its polynucleotide sequence was determined (Dratewka-Kos et al., 1989). This germin cDNA has been used as a probe to show that germin is encoded in a multigene family which maps primarily to chromosomes 4A (~5 copies), 4B (~3 copies), and 4D (~9 copies) in hexaploid wheat.³ In this study, germin cDNA has been used to detect genomic clones by screening ~10^6 plaques in a wheat DNA library (Murray et al., 1984). The nucleotide sequences of a 2.8-kbp fragment (gf-2.8) from one genomic clone, and of a 3.8-kbp fragment (gf-3.8) from another clone, have been determined; each encodes a germin mRNA and contains 5'- and 3'-flanking sequences.
Structural-protein (224 amino acids) coding regions in gf-2.8 and gf-3.8 are intronless and 90% similar, but aside from some scattered sites of what are likely to be cis-acting elements, there is no similarity between the 5'-flanking sequences in gf-2.8 and gf-3.8. The degree of similarity (92%) is fairly constant throughout the mature-protein (201 amino acids) coding regions in gf-2.8 and gf-3.8 but, significantly, a central part (91 amino acids) of the corresponding proteins is fully conserved. This conserved core in the wheat germins is ~50% similar to the corresponding region in two of the slime-mold spherulins (1a and 1b) that have been shown to accumulate, specifically, during spherulation of the Physarum polycephalum plasmodium (Bernier et al., 1986, 1987).
Spherulation is a process that is induced if the Physarum polycephalum plasmodium is subjected to starvation, osmotic stress, extremes of temperature, or other forms of environmental stress, and it leads to encystment, desiccation, and developmental arrest (Jump, 1954; Chet and Rusch, 1969) (see also Gorman and Wilkins (1980) and Raub and Aldrich (1982)). The principal spherulation-specific mRNAs (for spherulins 1a, 1b, 2a, 3a) are not present in encysting amoebae, sporulating plasmodia, or vegetative plasmodia but, by 24 h after the onset of plasmodium starvation, they account for ~10% of the total mRNA in the organism (Bernier et al., 1986). Full nucleotide sequences have been determined for the principal spherulin-specific mRNAs (Bernier et al., 1987). Similarities between germins (gf-2.8 and gf-3.8) and spherulins (1a and 1b) imply, not only that these proteins may have related biochemical involvements, but that there may also be a previously unsuspected affinity, at the molecular level, between the biology of cereal germination and Physarum spherulation. These and other findings and their implications are subjects of this report.
EXPERIMENTAL PROCEDURES
Materials-The wheat DNA library used in this work was generously supplied by Dr. Michael G. Murray of the Advanced Research Division at Agrigenetics Corporation (Madison, WI). Dr. Murray also supplied the host organism (ED8767) used to prepare his Charon 32 library (Murray et al., 1984). The library was constructed in November, 1982, using 15-23-kbp fragments, which had been derived by partial EcoRI digestion of wheat DNA (Murray and Thompson, 1980). The library originally contained 1.5 × 10^6 different clones, enough to ensure 99% probability of detecting a single-copy sequence in the wheat genome (6 × 10^9 bp/haploid genome), and when it was used in this investigation, in January, 1987, it had a titer of ~0.5 × 10^8 plaque-forming units/ml. The virtually full-length cDNA used to screen the wheat DNA library was prepared as described (Rahman et al., 1988). The polynucleotide sequence of this cDNA has been reported (Dratewka-Kos et al., 1989). Deoxyribonucleotide primers used for primer-extension studies were prepared by and purchased from The Biotechnology Service Centre of The Hospital For Sick Children Research Institute (Toronto). [35S]DNA molecular weight markers were purchased from Amersham Life Science Products (catalog No. SJ.5000) and unlabeled DNA (DRIgest) markers (catalog No. 27-4056-01) were purchased from Pharmacia LKB Biotechnology Inc.
³ M. D. Gale, unpublished results.
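The statement that 1.5 × 10^6 clones give a 99% chance of recovering a single-copy sequence can be checked with the standard Clarke-Carbon estimate, N = ln(1 - P)/ln(1 - f). The text does not say which formula was used, so the formula and the 19-kbp mean insert size (the midpoint of the 15-23-kbp range) are assumptions of this sketch.

```python
import math

def clones_needed(genome_bp: float, insert_bp: float, probability: float) -> float:
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - f), where f = insert/genome."""
    fraction = insert_bp / genome_bp
    return math.log(1.0 - probability) / math.log(1.0 - fraction)

genome_bp = 6e9        # haploid wheat genome size quoted in the text
mean_insert_bp = 19e3  # assumed midpoint of the 15-23-kbp insert range
n = clones_needed(genome_bp, mean_insert_bp, 0.99)
print(f"clones needed for P = 0.99: {n:.2e}")
```

Under these assumptions the estimate comes out slightly below the 1.5 × 10^6 clones reported for the library.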
Procedures Used to Screen the Wheat DNA Library-The library was diluted 50-fold with SM buffer (0.58 g of NaCl, 0.2 g of MgSO4·7H2O, 5 ml of 1 M Tris chloride (pH 7.5), 0.5 ml of 2% gelatin in 100 ml of water) and the diluted library (50 µl) was mixed with 300 µl of ED8767 in 10-ml Falcon tubes for incubation at 37 °C (2 h) before being mixed with 6.5 ml of top-agarose (0.7%) and then poured over bottom-agar (1.5%) in a 137-mm Petri plate (150 cm^2). The top-agarose and bottom-agar were prepared in NZC medium and the liquid culture of ED8767 was prepared by inoculating a single colony (from an overnight NZC plate) in 50 ml of NZC broth (supplemented with 500 µl of 20% maltose); after shaking overnight to obtain A600 nm ~1.5, the culture was centrifuged and the pellet was suspended in 10 mM MgSO4 for storage at 4 °C before use. After solidification of the top-agarose, 20 such plates were inverted and incubated at 37 °C for ~6 h to obtain an appropriate plaque density (50,000 plaque-forming units/plate).
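The plating figures above are mutually consistent, as the following arithmetic sketch shows. The stock titer of ~0.5 × 10^8 plaque-forming units/ml is an assumed reading of a damaged exponent in the Materials paragraph; it is the value that reproduces the stated 50,000 plaques per plate.

```python
# Values quoted in the text; the stock-titer exponent is an assumed reading.
stock_titer_pfu_per_ml = 0.5e8  # ~0.5 x 10^8 pfu/ml (assumption)
dilution_factor = 50            # 50-fold dilution in SM buffer
plated_volume_ul = 50           # diluted library added per plate
plates = 20
plate_area_cm2 = 150

diluted_titer = stock_titer_pfu_per_ml / dilution_factor  # pfu/ml after dilution
pfu_per_plate = diluted_titer * plated_volume_ul / 1000   # convert ul to ml
total_plaques = pfu_per_plate * plates
density = pfu_per_plate / plate_area_cm2                  # plaques per cm^2

print(f"{pfu_per_plate:.0f} pfu/plate, {total_plaques:.0e} plaques screened")
print(f"about {density:.0f} plaques per cm^2")
```

Twenty plates at this density account for the ~10^6 plaques screened.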
In order to screen the plaques, duplicate lifts (first 2 min; second 3 min) were prepared using either Schleicher and Schuell BA85 or Millipore HATF nitrocellulose. The filters were placed, in succession, in each of the following solutions: 1.5 M NaCl, 0.5 M NaOH (1 min), 1.5 M NaCl, 0.5 M Tris chloride (pH 8) (5 min) and 2 × SSC (5 min) before air-drying on filter paper. The filters were prewashed (first 0.5 h; second 1.5 h) at ~45 °C in a solution that was prepared by mixing 100 ml of 1 M Tris chloride (pH 8), 116.8 g of NaCl, 4 ml of 0.5 M EDTA, 10 ml of 20% SDS, and diluting to a final volume of 2 liters; 1 liter of this prewash solution was used for 40 filters. The filters were then "prehybridized" (3 h) at ~60 °C in a solution that was made by mixing 250 ml of 20 × SSC, 100 ml of 5% Blotto (5 g of Carnation milk powder, 10 ml of phosphate-buffered saline, 90 ml of sterile water), 6.9 g of NaH2PO4·H2O, 5 ml of 20% SDS, 10 g of glycine, 200 mg of yeast tRNA (in 40 ml of 1 M Tris (pH 7.5)), and diluting to a final volume of 1 liter, using 1 liter of prehybridization solution for 40 filters.
FIG. 1. Sites at which scissions were made with restriction enzymes and overlapping sequences that were determined in arriving at the deoxynucleotide sequences of the genomic fragments gf-2.8 and gf-3.8. Restriction sites are oriented with respect to the 5'-end of the noncoding strands of DNA. [Restriction maps not legible in this copy.]
FIG. 2. Deoxynucleotide sequence of the noncoding strand in gf-2.8. The parts of the sequence which correspond to possible regulatory and mRNA sequences in the germin gene are in upper-case lettering and the remainder of the sequence is in lower case. [Sequence listing not legible in this copy. Annotations recoverable from the figure: polypeptide initiation codon, 1799; polypeptide termination codon, 2471; mRNA termination signal, 2729; the listing ends at residue 2822.]
For hybridization, 40 filters (20 duplicates), in 230 ml of hybridization medium, were shaken ~16 h at 65 °C at a speed setting of 3.25 (Brunswick Shaker) in a FridgeOSeal bowl. The hybridization medium contained 6 × 10^6 cpm (5 ng) of cDNA probe per ml of prehybridization buffer, the probe having been made by the random-hexamer procedure (Feinberg and Vogelstein, 1983): 12.5 µl of cDNA (0.1 µg/µl), 12.5 µl of random hexamers (0.1 unit/µl) and 55 µl of sterile water were mixed and the resulting solution was heated for 3 min at 100 °C before being mixed with 100 µl of 2.5 × random-hexamer buffer, 10 µl of bovine serum albumin (10 µg/µl), 10 µl of Klenow fragment and 50 µl of [α-32P]dCTP (3000 Ci/mmol; 10 µCi/µl), in a total volume of 250 µl. The probe was freed of unreacted nucleoside triphosphates by passage through a Sephadex G-50 column before use. After hybridization, the 40 filters were washed in the following solutions: 1 liter of 2 × SSC/0.1% SDS at room temperature (10 min), 1 liter of 2 × SSC/0.1% SDS at room temperature (15 min), twice in 1 liter of 2 × SSC/0.1% SDS at 65 °C (1 h), and twice with 1 liter of 1 × SSC/0.1% SDS at 65 °C (0.5 h). After air-drying, the filters were exposed to X-Omat AR film overnight in order to detect signals.
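The labeling recipe can be checked by simple bookkeeping. The sketch below assumes that the volume units, which are damaged in this copy, are microliters; under that reading the components sum to the stated 250 µl and the 2.5 × buffer ends up at 1 ×.

```python
# Component volumes in microliters (units assumed; they are garbled in this copy).
components_ul = {
    "cDNA (0.1 ug/ul)": 12.5,
    "random hexamers": 12.5,
    "sterile water": 55.0,
    "2.5x random-hexamer buffer": 100.0,
    "bovine serum albumin": 10.0,
    "Klenow fragment": 10.0,
    "[alpha-32P]dCTP (10 uCi/ul)": 50.0,
}

total_ul = sum(components_ul.values())
buffer_final_x = 2.5 * components_ul["2.5x random-hexamer buffer"] / total_ul
template_ug = components_ul["cDNA (0.1 ug/ul)"] * 0.1
label_uci = components_ul["[alpha-32P]dCTP (10 uCi/ul)"] * 10

# Probe mass needed for 230 ml of hybridization medium at 5 ng/ml.
probe_needed_ng = 230 * 5

print(f"total volume: {total_ul} ul, buffer at {buffer_final_x:.1f}x")
print(f"template: {template_ug:.2f} ug, label: {label_uci:.0f} uCi")
print(f"probe needed: {probe_needed_ng} ng")
```

The 1.25 µg of template in one reaction is of the same order as the ~1.15 µg of probe implied by 230 ml of medium at 5 ng/ml.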
Procedures Used for Nucleotide Sequencing-Standard procedures used for deoxyribonucleotide sequencing have been described elsewhere (Dratewka-Kos et al., 1989); in addition, for this study, M13mp18(19) phage (Messing, 1988) as well as pEMBL18(19) plasmids (Dente et al., 1983) were used to generate single-strand templates, and Sequenase (U. S. Biochemical, catalog No. 70700) and Taq polymerase (Promega, catalog No. PRQ5530) were used to transcribe single-strand templates. Full sequences were deduced from the overlaps obtained when deoxyribonucleotide sequences were determined for (~70% of) each strand of the gf-2.8 and gf-3.8 clones.
RESULTS
Isolation of Genomic Clones-A virtually full-length clone of germin cDNA (Rahman et al., 1988) was labeled with 32P in order to screen ~10^6 plaques in a wheat DNA library. The "stuffer fragments" in the library were derived by partial EcoRI digestion of wheat DNA and were incorporated between the 17-kbp and 13-kbp "lambda arms" of Charon 32 (Murray et al., 1984). Seven possibly positive signals were detected in the primary screen and two of these proved to be bona fide positive plaques after rescreening. After full digestion with EcoRI and separation of the resulting products by electrophoresis in 0.8% agarose gel, each genomic clone yielded a series of discrete fragments, some of which gave positive signals when DNA blots of the digestion products were screened with 32P-labeled germin cDNA. One genomic clone (stuffer fragment ~11 kbp) yielded three such fragments (~0.6, ~2.8, and ~7 kbp), and the other (stuffer fragment ~16 kbp) yielded two such fragments (~3.8 and ~11 kbp) which hybridized with germin cDNA. The strongest signals were obtained with the 2.8-kbp (gf-2.8) and 3.8-kbp (gf-3.8) fragments and therefore these were chosen for detailed study. Each of gf-2.8 and gf-3.8 was found to encode a full sequence for germin and germin mRNA.
Nucleotide Sequencing of gf-2.8 and gf-3.8-Strategies used to obtain overlaps from which the full deoxyribonucleotide sequences of gf-2.8 and gf-3.8 could be deduced are shown in Fig. 1. The sequence of the putative noncoding strand in gf-2.8 is shown in Fig. 2, where segments corresponding to the mRNA sequence and possible 5'- and 3'-flanking regulatory sequences are depicted in upper-case lettering, whereas the remainder of the sequence is shown in lower case. The mRNA sequence of gf-2.8 is identical with that previously determined for a virtually full-length germin cDNA (Dratewka-Kos et al., 1989), excepting only that the results of primer-extension mapping (see below) have shown that the "cap" site is at a position which is displaced 19 nucleotide residues 5'- to the 5'-end of the virtually (1075 bp) full-length (1094 bp) cDNA previously prepared (Rahman et al., 1988) by the technique of Gubler and Hoffman (1983). The sequence of the putative noncoding strand in gf-3.8 is shown in Fig. 3, and again, segments corresponding to the mRNA sequence and possible 5'- and 3'-flanking regulatory sequences are depicted in upper case and the remainder of the sequence is in lower case.
FIG. 3. Deoxynucleotide sequence of the noncoding strand in gf-3.8. The parts of the sequence which correspond to possible regulatory and mRNA sequences in the germin gene are in upper-case lettering and the remainder of the sequence is in lower case. [Sequence listing not legible in this copy.]
Comparison of the mRNA Coding Sequences in gf-2.8 and gf-3.8-The parts of gf-2.8 and gf-3.8 that correspond to germin mRNA, and germin itself, are elaborated in Fig. 4 and subdivided into (i) 5'-untranslated regions (lower case), (ii) structural-protein coding regions that correspond to signal-peptide and mature-protein sequences (upper case), and (iii) 3'-untranslated regions (lower case). The 5'- and 3'-boundaries in gf-3.8 mRNA were deduced from the structure of gf-2.8 mRNA, the sequence of which, excepting only a 5'-terminal extension of 19 nucleotides (see the primer-extension study, below), is identical with that reported for germin cDNA (Dratewka-Kos et al., 1989). Hollow symbols are used for gf-3.8 mRNA in Fig. 4 if nucleotide residues in gf-3.8 mRNA differ from those in gf-2.8 mRNA (Dratewka-Kos et al., 1989), or if amino acid residues encoded by gf-3.8 differ from those encoded by gf-2.8. Deletion (-) and insertion (underlining) of nucleotides in gf-3.8, relative to gf-2.8, are also indicated in Fig. 4. There are no amino acid deletions or insertions between the gf-2.8 and gf-3.8 germin sequences.
There is extensive similarity between the 5'-UTR (80%) regions in gf-2.8 and gf-3.8 mRNA, as there is between their 3'-UTR (77%) regions. There is also impressive similarity between the 672-residue intronless nucleotide sequences which encode the signal-peptide (75%) and mature-protein (92%) sequences in gf-2.8 and gf-3.8, and accordingly, there is extensive similarity between the 224-residue amino acid sequences of the signal peptide (70%) and mature protein (93%) encoded in gf-2.8 and gf-3.8. A large (91 amino acid) central domain of identity exists between the mature-protein sequences encoded in gf-2.8 and gf-3.8 (see "Discussion").
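The segment lengths quoted above fit together as codon arithmetic, and the percentages are position-by-position identities over aligned sequences. The following is a minimal sketch; the `percent_identity` helper and the toy strings are illustrative assumptions, not the procedure or data used in the study.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Lengths quoted in the text and in the Fig. 4 panel titles.
signal_nt, mature_nt = 69, 603
signal_aa, mature_aa = signal_nt // 3, mature_nt // 3  # 23 and 201 residues

coding_nt = signal_nt + mature_nt  # 672 nucleotides
coding_aa = signal_aa + mature_aa  # 224 amino acids

# Toy example only: one mismatch in ten positions.
toy = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(coding_nt, coding_aa, toy)
```

A 7% difference over 201 residues corresponds to about 14 amino acid substitutions in the mature protein.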
Possible Regulatory Elements in the 5'- and 3'-Flanking Regions of gf-2.8 and gf-3.8-Aside from some recognizable sites of possible cis-acting elements in the 5'-flanking regions of both sequences there is little similarity between the 5'-flanking regions in gf-2.8 and gf-3.8 (Fig. 5A). The sites of possible regulatory sequences in the 5'- (Efstratiadis et al., 1980) and 3'- (Birnstiel et al., 1985) flanking regions of gf-2.8 are indicated by lower case lettering in Fig. 2: upstream of the cap site, at -31 bp, a TATA box flanked by G + C rich sequences; at approximately -50 bp, two CAAT boxes; at approximately -110 through -150 bp, three AT-rich inverted repeat sequences; at approximately -180 bp, two GC-rich boxes; at approximately -600 bp, a 45-bp purine-rich sequence; at approximately -1500 bp, two direct repeat (19 bp) sequences; and downstream of the poly(A) addition site, at +4 bp, a T-rich sequence that is probably involved with transcription termination. Possible proximal sites of cis-acting elements in the 5'-flanking region of gf-3.8 are less recognizable, but upstream of the cap site, at -26 bp, there is a possible TATA box, and at approximately -50 bp, a possible CAAT box (Fig. 3). The most unusual 5'-flanking sequence in gf-3.8 is a long inverted-repeat (~200 bp) sequence at approximately -400 bp.
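Positions such as "-31 bp" are offsets measured upstream from the cap site. As a hedged illustration of how such offsets can be tabulated, the sketch below scans a sequence for a motif and reports each hit relative to the cap-site nucleotide. The function name, the convention of measuring to the first base of the motif, and the toy sequence are all assumptions; the sequence is not taken from gf-2.8.

```python
def upstream_hits(seq: str, cap_index: int, motif: str) -> list[int]:
    """Offsets (negative = upstream) of motif starts relative to the cap-site base.

    cap_index is the 0-based index of the cap-site nucleotide in seq.
    """
    hits = []
    start = seq.find(motif)
    while start != -1 and start < cap_index:
        hits.append(start - cap_index)
        start = seq.find(motif, start + 1)
    return hits

# Toy sequence: a TATAAA motif placed 31 bases upstream of a cap-site "A".
toy_seq = "GG" + "TATAAA" + "C" * 25 + "A" + "TTT"
toy_cap_index = 33  # index of the cap-site "A" in toy_seq

print(upstream_hits(toy_seq, toy_cap_index, "TATAAA"))
```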
FIG. 4. Deoxynucleotide sequences of noncoding strands in mRNA regions of gf-2.8 and gf-3.8. The nucleotide sequences are subdivided into four parts: 5'-UTR (lower case), 3'-UTR (lower case) as well as protein-coding regions for the signal-peptide (upper case) and mature-protein (upper case) parts of germin. The corresponding sequences of the proteins are shown in the one-letter code for amino acids. Differences between nucleotides and amino acids in gf-2.8 and gf-3.8 are indicated, in the case of gf-3.8, by hollow lettering, and sites of deletion (-) or insertion (underlining), used to maximize homology between gf-2.8 and gf-3.8, are likewise indicated in the structure of gf-3.8. Boldface lettering is used in the 5'-UTR to indicate the cap site (position 1) as well as position 20 (gf-2.8) or 17 (gf-3.8), which correspond to the 5'-end of a virtually full-length cDNA in gf-2.8 (Fig. 2), or T^1233 in gf-3.8 (Fig. 3) (see text). [Sequence panels not legible in this copy. Panel titles recoverable from the figure: signal-peptide coding sequence, 69 nucleotides (beginning at ATG, position 105 in gf-2.8 and position 102 in gf-3.8); mature-protein coding sequence, 603 nucleotides; 5'-untranslated sequence, 101 nucleotides in gf-3.8; 3'-untranslated sequence, 318 nucleotides in gf-2.8 and 312 nucleotides in gf-3.8.]
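The residue numbers quoted for the conserved core (61-151) and the PH(I/T)HPRATEI decapeptide (87-96) can be checked against the mature gf-2.8 protein. The sequence below is a transcription of the gf-2.8 amino acid lines of Fig. 4 as they appear in this damaged copy, cross-checked against the codons where legible. It should be treated as an assumption and verified against the deposited records before any real use.

```python
import re

# Mature gf-2.8 germin, transcribed from Fig. 4 in 14-residue rows (assumed reading).
GF28_MATURE = (
    "TDPDPLQDFCVADL" "DGKAVSVNGHTCKP" "MSEAGDDFLFSSKL" "AKAGNTSTPNGSAV"
    "TELDVAEWPGTNTL" "GVSMNRVDFAPGGT" "NPPHIHPRATEIGI" "VMKGELLVGILGSL"
    "DSGNKLYSRVVRAG" "ETFLIPRGLMHFQF" "NVGKTEASMVVSFN" "SQNPGIVFVPLTLF"
    "GSNPPIPTPVLTKA" "LRVEARVVELLKSK" "FAAGF"
)

# The shared decapeptide allows I or T at its third position.
match = re.search(r"PH[IT]HPRATEI", GF28_MATURE)
first, last = match.start() + 1, match.end()  # 1-based, inclusive

# Conserved core, residues 61-151 (1-based, inclusive).
core = GF28_MATURE[60:151]

print(len(GF28_MATURE), (first, last), len(core))
```

With this transcription the protein has 201 residues, the decapeptide falls at residues 87-96, and the core spans 91 residues, in agreement with the text.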
Determination of Cap Sites in gf-2.8 and gf-3.8 by Primer Extension-The primer-extension approach to the determination of cap sites (Qu et al., 1983; Shelness and Williams, 1985; Fouser and Friesen, 1986; Kunz et al., 1989; Ham et al., 1989) was used in this study (Shafai, 1989). Primers (20-mers) were chosen in order to maximize differences between gf-2.8 and gf-3.8 mRNA. The primers were complementary to residues 1817-1836 (CTA...CGC) and 1862-1881 (TTG...CCC) in gf-2.8 (Fig. 2) and they were 65-70% homologous with the corresponding primers prepared for gf-3.8: residues 1336-1355 (ATA...TGC) and 1381-1400 (CTG...CCC) (Fig. 3). These primers discriminated between the respective single-strand templates: SphI-1/KpnI: 1594-2062 (Fig. 2) in M13mp19 for gf-2.8, and PstI/SacI-2: 1153-1693 (Fig. 3) in pEMBL19 for gf-3.8, i.e. the primers only gave discrete sequencing ladders with their 100% homologous partners.
As illustrated in Fig. 6, when bulk mRNA from germinated wheat embryos (isolated at 35 h postimbibition) was used as a source of templates, the gf-2.8 primer, in this case the 1817-1836 primer (CTA...CGC), gave a number of bands, most of which were between residues ~1690 and 1720. This is the expected neighborhood of the cap site since residue 1714 corresponds to the 5'-end of the virtually full-length cDNA that was previously sequenced (Dratewka-Kos et al., 1989). The strongest band, at residue A^1695, was also strongest when the other gf-2.8 primer, 1862-1881 (TTG...CCC), was used as primer with the same mRNA (data not shown), and it is therefore assumed to be the cap site in the mRNA that is encoded by gf-2.8.
In accord with the results of Northern analyses, which have indicated that the amount of gf-3.8 mRNA is at least 10-fold smaller than the amount of gf-2.8 mRNA in bulk mRNA from germinated wheat embryos,⁴ prominent bands were not observed when either of the gf-3.8 primers (e.g. see Fig. 6) was used to prime synthesis with the same bulk mRNA specimen. The 5'-UTR domains are the most highly conserved parts of the regions which are 5'- to the structural-protein coding regions in homologous genes (Efstratiadis et al., 1980). Accordingly, because strong similarity between gf-2.8 (Fig. 2) and gf-3.8 (Fig. 3) begins, very abruptly (Fig. 5A), at sites which correspond to A^1695 (the cap site) in gf-2.8, and to A^1217 in gf-3.8, it seems highly probable that the cap site can be assigned to residue A^1217 in gf-3.8. It is relevant to note that this interpretation conforms well with the conclusion (see above) that the putative sites of TATA and CAAT boxes in gf-2.8 and gf-3.8 are about the same distances upstream of the putative cap sites in both mRNA molecules.
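Two numbering systems are used in this section: residue numbers in the genomic fragments (Figs. 2 and 3) and positions in the mRNA counted from the cap site as position 1 (Fig. 4). A minimal sketch of the conversion follows. The cap-site residues 1695 (gf-2.8) and 1217 (gf-3.8) are readings of superscripts that are damaged in this copy, and the helper names are hypothetical.

```python
# Cap-site residues in genomic numbering (assumed readings of damaged superscripts).
CAP_SITE = {"gf-2.8": 1695, "gf-3.8": 1217}

def to_mrna(fragment: str, genomic_residue: int) -> int:
    """Genomic residue number -> mRNA position (cap site = position 1)."""
    return genomic_residue - CAP_SITE[fragment] + 1

def to_genomic(fragment: str, mrna_position: int) -> int:
    """mRNA position (cap site = position 1) -> genomic residue number."""
    return mrna_position + CAP_SITE[fragment] - 1

# 5'-end of the earlier cDNA in gf-2.8 (residue 1714) and its gf-3.8 counterpart (1233).
print(to_mrna("gf-2.8", 1714), to_mrna("gf-3.8", 1233))
# Initiation codon at mRNA position 105 in gf-2.8.
print(to_genomic("gf-2.8", 105))
```

With these readings, residue 1714 maps to position 20 in gf-2.8, residue 1233 maps to position 17 in gf-3.8, and the gf-2.8 initiation codon at position 105 maps to residue 1799, matching the Fig. 2 annotation and the Fig. 4 legend.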
DISCUSSION
Studies of germin and its allied coding elements (mRNA and DNA) were initiated in order to broaden perspectives about the molecular basis of developmental change in germinating wheat (Lane, 1988). It is now apparent that investigations of the biochemistry and molecular biology of germin have expanding consequence for studies of other cereals, other organisms, and disparate biological phenomena. Accordingly, this discussion of our current findings will be subdivided into two parts, one that deals with the molecular biology of cereal development, and another which deals with what appear to be germin-related processes in closely and distantly related organisms.
⁴ T. Vaughan, unpublished results.
FIG. 5. Homology matrix comparisons of gf-2.8 and gf-3.8 (A) and of germin and spherulin 1b cDNAs (B). [Axis labels: gf-3.8 (A); gf-2.8 cDNA and spherulin 1b cDNA (B).] The search element was 75 nucleotides and the number of allowable mismatches was 35, using the Inspector II DNA program. In A, the upward pointing arrowhead indicates the positions of the cap sites in gf-2.8 and gf-3.8 and the downward pointing arrowhead indicates the positions of the 3'-ends of the "mRNA sequences" in gf-2.8 and gf-3.8. In B, the arrowheads define the 5'- (left) and 3'- (right) extremities of the sequence which encodes the conserved core in the germins.
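The homology-matrix comparison summarized in the legend to Fig. 5 (a 75-nucleotide search element with up to 35 allowable mismatches) can be sketched in Python as follows. This is a minimal illustration of the windowed-mismatch idea only: it is not the Inspector II program, the function name `homology_matrix` is hypothetical, and the two short sequences are toy examples rather than parts of gf-2.8 or gf-3.8.

```python
def homology_matrix(seq_a, seq_b, window=75, max_mismatches=35):
    """Return (i, j) start positions (0-based) at which the window-length
    segments of seq_a and seq_b differ at no more than max_mismatches sites.
    Each returned pair corresponds to one dot in a homology matrix."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        seg_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            seg_b = seq_b[j:j + window]
            mismatches = sum(1 for x, y in zip(seg_a, seg_b) if x != y)
            if mismatches <= max_mismatches:
                hits.append((i, j))
    return hits


# Toy sequences (hypothetical). A small window is used so that the
# diagonal of related segments is easy to see.
a = "ACGTACGTTTGACC"
b = "GGACGTACGATTGA"
dots = homology_matrix(a, b, window=8, max_mismatches=1)
print(dots)
```

With these toy inputs the hits fall on a single diagonal (offset of two positions), which is how a shared region appears in a matrix such as Fig. 5. The defaults of 75 and 35 mirror the values quoted in the legend.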
Molecular Biology of Cereal Development. The previously reported sequence (Dratewka-Kos et al., 1989) of a virtually full-length germin cDNA (Rahman et al., 1988) was precisely the same as the mRNA part of the gf-2.8 germin-gene sequence determined in this study (Fig. 2), excepting only that the 5'-end of the virtually full-length germin cDNA corresponded to residue G1714 (rather than A1695) in the mRNA part of the gf-2.8 germin gene. It is interesting that 15 of the first 16 residues in the putative mRNA domains of gf-2.8 and gf-3.8 are identical (Fig. 4) but that the three residues which intervene between this sequence and G1714 in gf-2.8, or the corresponding T residue in gf-3.8, are putative sites of deletion in gf-3.8 (Fig. 4). This suggests that termination of reverse transcription at this position in gf-2.8 mRNA, during preparation of the cDNA, may be related to a structural idiosyncrasy at, or immediately adjacent to, the site. It is not surprising that A1695 (Fig. 2) is the principal cap site in germin mRNA since the (type 0) m7GpppA cap structure is the principal (4040%) form found in bulk mRNA from germinating (Kennedy and Lane, 1979) and mature (Lane, 1981) wheat embryos.
FIG. 6. Autoradiogram (%day exposure) of a dried sequencing gel that was used to determine the cap sites in gf-2.8 and gf-3.8. The sequencing ladder for gf-2.8 was generated using as template SphI/KpnI:1594-2062 (from gf-2.8, Fig. 2) in single-stranded M13mp19, and using as primer a synthetic oligonucleotide complementary to residues 1817-1836 of gf-2.8 (Fig. 2). The sequencing ladder for gf-3.8 was generated using as template PstI/SacI:1153-1693 (from gf-3.8, Fig. 3) in single-stranded pEMBL19, and using as primer a synthetic oligonucleotide which was 65% homologous with the gf-2.8 primer, but 100% homologous to the corresponding region in gf-3.8. Primer-extension reaction mixtures contained gf-2.8 (a, b, c) or gf-3.8 (a', b', c') primers and mRNA (1 µg) (a, a'), or bulk NaCl-insoluble RNA (10 µg) from germinated (b, b') or mature (c, c') embryos. The reference residues (T, T, A, G), indicated by arrowheads in the gf-2.8 ladder, denote sites in the noncoding strand of gf-2.8 (mRNA sense) and are complementary to the corresponding residues (A, A, T, C) in the ladder (for the coding strand of gf-2.8). Bands were not seen in primer-extension experiments when
  1  Spherulin 1b   T~TPLPSSAASPELVA~LLNAPSELDRIKL
     Germin gf-2.8  TDPDPLQDFCVADLDGKAVSVNGHTCKPMS
 31  Spherulin 1b   LKDNQFVFDFKNSKLGVT~QTQGKTVAT-S
     Germin gf-2.8  EAQDDFLFSSKLAKAGNTSTPNGSAVTELD
 61  Spherulin 1b   RTNFPAVIQHNVAMTVQFIEACGINLPHTH
     Germin gf-2.8  VAEWPGTNTLQVSMNRVDFAPQGTNPPHIH
 91  Spherulin 1b   PRATEINFIA~GKFEAGFF--LEN~AKFIG
     Germin gf-2.8  PRATEIQIVMKGELLVGILQ~LDSQNKLYS
121  Spherulin 1b   HTLEAGMATVFP~GAIHFEINMNCEPAMFV
     Germin gf-2.8  RVVRAGETFLIPRGLMHFQFNVQKTEASMV
151  Spherulin 1b   AAFNNEDPGV~TTASSFFGLPADVVGVSLI
     Germin gf-2.8  VSFNSQNPGIVFVPLTLFGSNPPIPTPVLT
181  Spherulin 1b   SS~~TVEDLQKHLP~NPAVAM~ACMKRCQFSD
     Germin gf-2.8  KALRVEARVVELLKSKFAAGF
FIG. 7. Comparison of the amino acid sequences of spherulin 1b and gf-2.8 germin. The one-letter code for amino acids is used and the signal-peptide and mature-protein sequences are numbered, separately, each beginning with the numeral one. For simplicity, a single numbering system (the one for germin) is used for both proteins; this overlooks five deletions (-) and two insertions which were entered in the spherulin 1b sequence in order to maximize similarities with germin gf-2.8. The two insertions in spherulin 1b, one a 15-member amino acid sequence between A16 and P17 in the signal-peptide sequence, and the other, an asparagine residue following a leucine residue in the mature-protein sequence, have been omitted. The larger font is used to show similarities detected by the “Simplify” program (see text), and boldface is used to emphasize identities. The rectangle defines the conserved-core region of germin (residues 61-151) and the asterisks indicate F = L = I = M sites in the core which, if included in the comparison, increase the similarity between spherulin 1b and germin gf-2.8 to ~60% in this region (see text).
This investigation has shown that there is strong (87%) similarity between the mRNA parts of the gf-2.8 and gf-3.8 germin genes (Fig. 4). An 8% difference between their mature-protein coding regions (174-776) is generally reflected in a 7% difference between the corresponding polypeptide sequences of the mature proteins (1-201) but, most significantly, in a central region of the mature protein, the same 8% difference between their mature-protein coding regions (354-627) results in no change whatever between the corresponding polypeptide sequences (61-151). Constraint against change between residues 61 and 151 of the mature protein (Fig. 4) strongly suggests that this is a biochemically important part of germin.
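The contrast drawn above, between divergence at the nucleotide level and conservation at the polypeptide level, can be illustrated with a short Python sketch that scores two aligned, gap-free coding sequences at both levels. The two 30-nucleotide sequences are hypothetical toy examples (they are not taken from gf-2.8 or gf-3.8), the function names are illustrative, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code; codons are generated in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence whose length is a multiple of three."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def percent_difference(x, y):
    """Percentage of aligned positions at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(a != b for a, b in zip(x, y)) / len(x)


# Hypothetical coding sequences that differ only at third-codon positions.
cds_a = "CCTCATATTCATCCTCGTGCTACTGAAATT"
cds_b = "CCCCATATCCATCCTCGTGCAACTGAAATT"

nucleotide_diff = percent_difference(cds_a, cds_b)
protein_diff = percent_difference(translate(cds_a), translate(cds_b))
print(nucleotide_diff, protein_diff)
```

In this toy case three synonymous substitutions give a nucleotide difference with no change in the encoded peptide, which is the pattern reported for the region that encodes residues 61-151.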
Two lines of evidence suggest that there is selective expression of gf-2.8 mRNA during germination: first, the only two germin cDNA clones detected among 2000 colonies in a “full-length” wheat cDNA library (Rahman et al., 1988) were identical and corresponded to the mRNA part of gf-2.8; second, the amount of gf-2.8 mRNA greatly exceeded the amount of gf-3.8 mRNA in Northern analyses of bulk mRNA from germinated wheat embryos (see “Results”). It therefore seems not unlikely that cis-acting elements in the 5'-flanking region of the gf-2.8 gene (see “Results” and Fig. 2) are selectively responsive to trans-acting factors which are present in germinated wheat embryos. Moreover, because chromosome mapping of hexaploid wheat shows that gf-2.8 derives from chromosome 4D, there may even be selective expression, during germination, of the germin gene of chromosome 4D, which derives from the Tauschii/Aegilops (weed) progenitor of hexaploid wheat (Sears, 1974). Additionally, since the isoform of germin that is peculiar to mature embryos (pseudogermin) is present in germinated embryos in only small proportion (relative to the amount of germin) (Lane, 1988), as is gf-3.8 mRNA (relative to gf-2.8 mRNA), the gf-3.8 and gf-2.8 genes may encode the different isoforms.
FIG. 6, continued. ...mRNA from ungerminated embryos was used as template (data not shown). Accordingly, most bands obtained with bulk NaCl-insoluble RNA are not related to germin mRNA since they are identical for the bulk NaCl-insoluble RNA of germinated and ungerminated embryos. It is in fact doubtful if a number of the bands seen in primer-extension experiments, most particularly any which correspond to T1692 and T1693 in the gf-2.8 ladder, are related to cap sites, since m7GpppU cap structures are not present in the bulk mRNA of wheat embryos (Kennedy and Lane, 1979; Lane, 1981).
EcoRI digests of the parent genomic fragment (~11 kbp) from which gf-2.8 was prepared yielded two other fragments (~7 and ~0.6 kbp) that gave hybridization signals when Southern blots were probed with germin cDNA. Similarly, EcoRI digests of the parent genomic fragment (~16 kbp) from which gf-3.8 was prepared also yielded an ~11-kbp fragment that gave a hybridization signal when Southern blots were probed with germin cDNA (see “Results”). Since each of gf-2.8 and gf-3.8 contains a full mRNA sequence, these findings indicate that (at least part of) the germin-mRNA sequence (or a closely related sequence) is repeated in each of the parent ~11- and ~16-kbp fragments. When Southern blots of EcoRI digests of either of the parent genomic fragments (~11 or ~16 kbp) were probed with a fragment made from the 5'-flanking region of gf-2.8 (i.e. EcoRI/SphI:1-1600 in Fig. 2), only gf-2.8 (from the ~11-kbp genomic fragment) gave a hybridization signal. Similarly, when Southern blots of EcoRI digests of either of the parent genomic fragments (~11 or ~16 kbp) were probed with a fragment made from the 5'-flanking region of gf-3.8 (i.e. EcoRI/PstI:1-1153 in Fig. 3), only gf-3.8 (from the ~16-kbp genomic fragment) gave a hybridization signal. In summary, unlike the structural-protein coding sequences in each of gf-2.8 and gf-3.8, the bulk of the 5'-flanking sequences in each of gf-2.8 and gf-3.8 are not repeated in the parent genomic fragments.
Information about sequence divergence among structural-protein coding alleles in the A, B, and D homeologues of hexaploid wheat (Sears, 1974) is sparse. Since divergence of the A, B, and D progenitors of hexaploid wheat ~4 × 10⁶ years ago⁵ would not be expected to lead to extreme divergence between allelic genes in hexaploid wheat (Smith and Raikhel, 1989), the absence of similarity between their 5'-flanking sequences indicates that gf-2.8 and gf-3.8 are unlikely to be allelic. Relevant information will emerge as information from chromosome mapping of gf-2.8 and gf-3.8 becomes available, and as nucleotide sequences are determined for the linked structural-protein coding regions in the parent fragments (~11 and ~16 kbp) from which gf-2.8 and gf-3.8 are derived. In the meantime, it is encouraging that comparable information about linked (Futers et al., 1990) and possibly unlinked (Guiltinan et al., 1990) genes for the Em protein of hexaploid wheat may soon become available.
Implications of These and Other Investigations for Understanding the Biochemical Involvements of Germin and Related Proteins in Cereals and Other Organisms. A protein which had previously been implicated in the osmotic-stress response of salt-resistant barley cultivars (Hurkman, 1990) was recently identified as germin (Hurkman et al., 1990). Our early studies had shown that standard barley cultivars synthesize germin during germinative growth (Grzelczak et al., 1985) and our more recent studies (Jaikaran et al., 1990) led us, also, to relate germin to the osmotic properties of cells. To explain the “water growth” that follows germination of wheat embryos (see the Introduction), we proposed (Jaikaran et al., 1990) that germin may play a role in altering the properties of cell walls during germinative growth. In this context, it is of interest that the 5'-flanking region in gf-2.8 contains sequences (1612GCACATGCA1620 and 1632GCTCCATGCA1640) which are very similar to ones which are known to occur in auxin-responsive genes (McClure et al., 1989).
⁵ J. Dvorak, personal communication.
An important similarity between the structural-protein coding regions in germin mRNAs (gf-2.8 and gf-3.8) (Fig. 4) and spherulin (1a and 1b) mRNAs (Bernier et al., 1987), all of which may encode cell-wall proteins, is detectable under conditions of reduced stringency that fail to detect similarities between the 5'-flanking regions in gf-2.8 and gf-3.8 (Fig. 5A). This is shown in the comparison of the mRNA sequences that encode germin (gf-2.8) and spherulin 1b (Fig. 5B). Most significantly, the similarity between germin and spherulin mRNAs is greatest in a region in which the gf-2.8 and gf-3.8 mRNAs differ by 8% but still encode the same, absolutely conserved protein sequence: amino acids 61-151 (see Fig. 4). Overall, there is 44% similarity between the germin gf-2.8 and spherulin 1b sequences when they are compared using the “Bestfit” program, and the default “grouping” of amino acids, in the “Simplify” program of the sequence-analysis package from the University of Wisconsin Genetics Computer Group (UWGCG) (Fig. 7): Pro = Ala = Gly = Ser = Thr; Gln = Asn = Glu = Asp; His = Lys = Arg; Leu = Ile = Val = Met; Phe = Tyr = Trp. The similarity between germins (gf-2.8 and gf-3.8) and spherulins (1a and 1b) increases to 50% in the region of protein sequence that is absolutely conserved between the gf-2.8 and gf-3.8 germins, and if F = L = I = M equivalence is allowed (to emphasize hydrophobicity distribution), similarity between the germins and spherulins increases to ~60% in the same conserved core of germin (residues 61-151). In a more restricted domain (63-123), one which encompasses ~60% of the conserved germin core and includes a decapeptide that is 90% homologous with a sequence in Escherichia coli glycerophosphate acyl transferase (Dratewka-Kos et al., 1989), the Bestfit program also shows ~50% similarity between gf-2.8 germin and spherulin 1b (Fig. 7). In a still more restricted domain (81-96), one that encompasses ~20% of the conserved germin core and contains a unique decapeptide sequence (PH(T/I)HPRATEI) that is 90% conserved between the germins (gf-2.8 and gf-3.8) and the spherulins (1a and 1b), similarity increases to ~70% even without allowance for equivalences between different amino acids.
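The scoring of similarity under amino acid equivalence groups, as described above, can be sketched in Python as follows. This is only an illustration of how a gap-free alignment might be scored with and without grouping; it does not reproduce the Bestfit or Simplify programs, and the function name `percent_similarity` is hypothetical. The groups are those listed in the text, the decapeptides are the T and I variants of PH(T/I)HPRATEI, and the tripeptides are toy inputs.

```python
# Equivalence groups quoted in the text (one-letter codes).
DEFAULT_GROUPS = ["PAGST", "QNED", "HKR", "LIVM", "FYW"]
# Additional F = L = I = M equivalence, used to emphasize hydrophobicity.
HYDROPHOBIC_GROUPS = DEFAULT_GROUPS + ["FLIM"]


def percent_similarity(seq_a, seq_b, groups=()):
    """Percentage of aligned positions that are identical or that fall in
    a common equivalence group. Assumes a gap-free alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    group_sets = [set(g) for g in groups]

    def alike(x, y):
        return x == y or any(x in g and y in g for g in group_sets)

    matches = sum(alike(x, y) for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# The two variants of the conserved decapeptide differ at one position.
identity = percent_similarity("PHTHPRATEI", "PHIHPRATEI")
grouped = percent_similarity("PHTHPRATEI", "PHIHPRATEI", DEFAULT_GROUPS)

# Toy tripeptides showing the effect of the extra F = L = I = M group.
default_score = percent_similarity("LKF", "IRM", DEFAULT_GROUPS)
hydrophobic_score = percent_similarity("LKF", "IRM", HYDROPHOBIC_GROUPS)
print(identity, grouped, default_score, hydrophobic_score)
```

Because T and I belong to different default groups, grouping does not raise the score for the decapeptide above its 90% identity, whereas the toy tripeptides show how admitting F = L = I = M can raise a score that the default groups leave unchanged.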
These structural relations between germins and spherulins 1a and 1b are especially interesting in the context of a recently reported evolutionary relation between vertebrate-lens crystallins and spherulin 3a (Bernier et al., 1987). Molecular modeling has shown that spherulin 3a (unrelated, structurally, to spherulins 1a and 1b) can adopt the tertiary structure which is characteristic of a single γ-crystallin domain. It has been suggested that spherulin 3a and γ-crystallins, together with protein S, which is found in bacterial spores, are part of a superfamily whose members share a similar three-dimensional architecture. It is posited (Wistow, 1990) that the earliest members of the family predate the prokaryote/eukaryote separation and that they originated in cellular responses to environmental stress, including osmotic stress (e.g. during spore and spherule formation). Accordingly, it may be that spherulins generally, and germins in particular, emerged and evolved in a common or related biological context: cellular desiccation/hydration.
In connection with the desiccation of developing wheat embryos, we once suggested (Hofmann et al., 1984) and later adduced evidence in support of (McCubbin et al., 1985) an “anhydrobiosis” role for a protein that we initially designated “spot 7” (Cuming and Lane, 1979), later called the E protein (Grzelczak et al., 1982) and finally named the Em protein in order to distinguish it from the Ec protein (Hanley-Bowdoin and Lane, 1983), which has since been shown to be a zinc metallothionein (Lane et al., 1987). As the most abundant protein in mature wheat embryos (Grzelczak et al., 1982), the Em protein is well-equipped to “infiltrate” and conserve cytoplasmic structures in the desiccated cytoplasm by virtue of its high content of hydrophilic amino acids (Grzelczak et al., 1982) and its random-coil conformation (McCubbin et al., 1985).
A similar role has since been proposed for a variety of other glycine-rich proteins in mature seed-embryos (Galau et al., 1987; Chandler et al., 1988; Gomez et al., 1988; Mundy and Chua, 1988; Close et al., 1989), including an Em analogue (D19 protein) among the so-called Lea (Baker et al., 1988) or WSP (water-stress proteins) (Dure et al., 1989) proteins in cotton-seed embryos (for review, see Morris et al. (1990)). There is a reciprocal relation between the degradation of the Em protein (Thompson and Lane, 1980; Grzelczak et al., 1982; Cuming, 1984) and the emergence of germin (Thompson and Lane, 1980; Grzelczak and Lane, 1983; Rahman et al., 1988) in germinating wheat embryos: degradation of Em and its mRNA is completed just as the nascent synthesis of germin and its mRNA begins at ~5 h postimbibition. The roles of Em and germin in desiccation and hydration during wheat-embryo development, maturation and germination will be subjects of continuing study in the laboratory, as will the relation between germins and spherulins.
Acknowledgments. It is a pleasure to express our profound gratitude to Dr. Michael G. Murray (Agrigenetics Corporation, Madison, WI) who kindly provided the wheat DNA library that was used in this investigation. The counsel of Prof. Robert Dunn and Dr. Anthony Rafalski in connection with procedures used to screen the DNA library is warmly acknowledged, as is the counsel of Dr. Jan Dvorak and Dr. Ken Armstrong in connection with the estimated divergence of the A, B, and D genomes in hexaploid wheat, and of Dr. Wenyan Shen in connection with the use of M13mp18(19) phage for deoxynucleotide sequencing. We are also pleased to thank Prof. P. N. Lewis for assistance in connection with the Write Now Computer Program.
REFERENCES
Baker, J., Steele, C., and Dure, L. S., III (1988) Plant Mol. Biol. 11, 277-291
Bernier, F., Seligy, V. L., Pallotta, D., and Lemieux, G. (1986) Biochem. Cell Biol. 64, 337-343
Bernier, F., Lemieux, G., and Pallotta, D. (1987) Gene (Amst.) 59, 265-277
Birnstiel, M. L., Busslinger, M., and Strub, K. (1985) Cell 41, 349-359
Chandler, P. M., Walker-Simmons, M., King, R. W., Crouch, M., and Close, T. J. (1988) J. Cell. Biochem. Suppl. 12C, 143
Chet, I., and Rusch, H. P. (1969) J. Bacteriol. 100, 674-678
Close, T. J., Kortt, A. A., and Chandler, P. M. (1989) Plant Mol. Biol. 13, 95-108
Cuming, A. C. (1984) Eur. J. Biochem. 145, 351-357
Cuming, A. C., and Lane, B. G. (1979) Eur. J. Biochem. 99, 217-224
Dente, L., Cesareni, G., and Cortese, R. (1983) Nucleic Acids Res. 11, 1645-1655
Dratewka-Kos, E., Rahman, S., Grzelczak, Z. F., Kennedy, T. D., Murray, R. K., and Lane, B. G. (1989) J. Biol. Chem. 264, 4896-4900
Dure, L. S., III, Crouch, M., Harada, J., Ho, T.-H. D., Mundy, J., Quatrano, R. S., Thomas, T., and Sung, Z. R. (1989) Plant Mol. Biol. 12, 475-486
Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C., and Proudfoot, N. J. (1980) Cell 21, 653-668
Feinberg, A. F., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13
Fouser, L. A., and Friesen, J. D. (1986) Cell 45, 81-93
Futers, S., Vaughan, T. J., Sharp, P. J., and Cuming, A. C. (1990) J. Theoret. Appl. Genet. 80, 43-48
Galau, G. A., Bijaisoradat, N., and Hughes, D. W. (1987) Dev. Biol. 123, 198-212
Gomez, J., Sanchez-Martinez, D., Stiefel, V., Rigau, J., Puigdomenech, P., and Pages, M. (1988) Nature 334, 262-264
Gorman, J. A., and Wilkins, A. S. (1980) in Growth and Differentiation in Physarum polycephalum (Dove, W. F., and Rusch, H. P., eds) pp. 157-202, Princeton University Press, Princeton, NJ
Grzelczak, Z. F., and Lane, B. G. (1983) Can. J. Biochem. Cell Biol. 61, 1233-1243
Grzelczak, Z. F., and Lane, B. G. (1984) Can. J. Biochem. Cell Biol. 62, 1351-1353
Grzelczak, Z. F., Sattolo, M. H., Hanley-Bowdoin, L. K., Kennedy, T. D., and Lane, B. G. (1982) Can. J. Biochem. 60, 389-397
Grzelczak, Z. F., Rahman, S., Kennedy, T. D., and Lane, B. G. (1985) Can. J. Biochem. Cell Biol. 63, 1003-1013
Gubler, U., and Hoffman, B. J. (1983) Gene (Amst.) 25, 263-269
Guiltinan, M. J., Marcotte, W. R., and Quatrano, R. S. (1990) Science 250, 267-271
Ham, J., Moore, D., Rosamond, J., and Johnston, I. R. (1989) Nucleic Acids Res. 17, 5781-5792
Hanley-Bowdoin, L., and Lane, B. G. (1983) Eur. J. Biochem. 135, 9-15
Hofmann, T., Kells, D. I. C., and Lane, B. G. (1984) Can. J. Biochem. Cell Biol. 62, 908-913
Hurkman, W. J. (1990) in Environmental Injury to Plants (Katterman, F., ed) pp. 205-229, Academic Press, San Diego, CA
Hurkman, W. J., Tao, H. P., and Tanaka, C. K. (1990) Plant Physiol. 93 (suppl.), 108
Jaikaran, A. S. I., Kennedy, T. D., Dratewka-Kos, E., and Lane, B. G. (1990) J. Biol. Chem. 265, 12503-12512
Johnston, F. B., and Stern, H. (1957) Nature 179, 160-161
Jump, J. A. (1954) Am. J. Bot. 41, 561-567
Kennedy, T. D., and Lane, B. G. (1979) Can. J. Biochem. 57, 927-931
Kunz, D., Zimmermann, R., Heisig, M., and Heinrich, P. C. (1989) Nucleic Acids Res. 17, 1121-1138
Lane, B. G. (1981) Can. J. Biochem. 59, 868-870
Lane, B. G. (1985) in Lipmann Symposium: Cellular Regulation and Malignant Growth (Ebashi, S., ed) pp. 311-319, Japan Scientific Societies Press, Tokyo
Lane, B. G. (1988) in The Roots of Modern Biochemistry (Kleinkauf, H., von Dohren, H., and Jaenicke, L., eds) pp. 457-476, Walter de Gruyter and Co., New York
Lane, B. G., and Tumaitis-Kennedy, T. D. (1981) Eur. J. Biochem. 114, 457-463
Lane, B. G., Grzelczak, Z. F., Kennedy, T. D., Kajioka, R., Orr, J., D'Agostino, S., and Jaikaran, A. (1986) Biochem. Cell Biol. 64, 1025-1037
Lane, B., Grzelczak, Z., Kennedy, T. D., Hew, C., and Joshi, S. (1987) Biochem. Cell Biol. 65, 354-362
Lane, B., Kajioka, R., and Kennedy, T. (1987) Biochem. Cell Biol. 65, 1001-1005
Marcus, A. (1969) Symp. Soc. Exp. Biol. 23, 143-160
McClure, B. A., Hagen, G., Brown, C. S., Gee, M. A., and Guilfoyle, T. J. (1989) Plant Cell 1, 229-239
McCubbin, W. D., Kay, C. M., and Lane, B. G. (1985) Can. J. Biochem. Cell Biol. 63, 803-811
Messing, J. (1988) Focus (Bethesda Research Laboratories) 10, 21-26
Morris, P. C., Kumar, A., Bowles, D. J., and Cuming, A. C. (1990) Eur. J. Biochem. 190, 625-630
Mundy, J., and Chua, N. H. (1988) EMBO J. 7, 2279-2286
Murray, M. G., and Thompson, W. F. (1980) Nucleic Acids Res. 8, 4321-4325
Murray, M. G., Kennard, W. C., Drong, R. F., and Slightom, J. L. (1984) Gene (Amst.) 30, 237-240
Qu, L. H., Michot, B., and Bachellerie, J-P. (1983) Nucleic Acids Res. 11, 5903-5920
Rahman, S., Grzelczak, A., Kennedy, T., and Lane, B. (1988) Biochem. Cell Biol. 66, 100-106
Raub, T. J., and Aldrich, H. C. (1982) in Cell Biology of Physarum and Didymium (Aldrich, H. C., and Daniel, J. W., eds) Vol. 2, pp. 21-75, Academic Press, New York
Sears, E. R. (1974) in Handbook of Genetics, Vol. 2, Plants, Plant Viruses, and Protists (King, R. C., ed) pp. 59-91, Plenum Press, New York
Shafai, R. (1989) The Polynucleotide Structure of a Germin Gene. M.Sc. thesis, University of Toronto
Shelness, G. S., and Williams, D. L. (1985) J. Biol. Chem. 260, 8637-8646
Smith, J. J., and Raikhel, N. V. (1989) Plant Mol. Biol. 13, 601-603
Thompson, E. W., and Lane, B. G. (1980) J. Biol. Chem. 255, 5965-5970
Wistow, G. (1990) J. Mol. Evol. 30, 140-145