Structure prediction of the EcoRV DNA methyltransferase based on ...


Protein Engineering vol.9 no.5 pp.413-423, 1996

Structure prediction of the EcoRV DNA methyltransferase based on mutant profiling, secondary structure analysis, comparison with known structures of methyltransferases and isolation of catalytically inactive single mutants

Albert Jeltsch¹, Tatjana Sobotta and Alfred Pingoud

Institut für Biochemie, Fachbereich Biologie, Justus-Liebig-Universität, Heinrich-Buff-Ring 58, 35392 Giessen, Germany

¹To whom correspondence should be addressed

The EcoRV DNA methyltransferase (M.EcoRV) is an α-adenine methyltransferase. We have used two different programs to predict the secondary structure of M.EcoRV. The resulting consensus prediction was tested by a mutant profiling analysis. 29 neutral mutations of M.EcoRV were generated by five cycles of random mutagenesis and selection for active variants to increase the reliability of the prediction and to get a secondary structure prediction for some ambiguously predicted regions. The predicted consensus secondary structure elements could be aligned to the common topology of the structures of the catalytic domains of M.HhaI and M.TaqI. In a complementary approach we have isolated nine catalytically inactive single mutants. Five of these mutants contain an amino acid exchange within the catalytic domain of M.EcoRV (Val20Ala, Lys81Arg, Cys192Arg, Asp193Gly, Trp231Arg). The Trp231Arg mutant binds DNA similarly to wild-type M.EcoRV, but is catalytically inactive. Hence this mutant behaves like a bona fide active site mutant. According to the structure prediction, Trp231 is located in a loop at the putative active site of M.EcoRV. The other inactive mutants were insoluble. They contain amino acid exchanges within the conserved amino acid motifs X, III or IV in M.EcoRV, confirming the importance of these regions.

Keywords: in vitro evolution/neutral mutations/protein structure prediction/random mutagenesis/restriction modification system

© Oxford University Press

Introduction

The methylation status of DNA is hereditary and adds additional information to the genome. In eukaryotes DNA methylation has been implicated in the control of gene regulation, genomic imprinting and embryonic development; in prokaryotes it serves to control the initiation of DNA replication, post-replicative repair and phage DNA packaging and also to protect host DNA from attack by restriction enzymes (reviews: Heitman, 1993; Noyer-Weidner and Trautner, 1993; Roberts and Halford, 1993; Razin and Cedar, 1994; Cheng, 1995).

DNA methyltransferases (reviews: Hornby, 1993; Cheng, 1995) can be divided into two classes according to the chemistry of the reaction catalysed: those transferring the methyl group to a carbon atom (C-methyltransferases), i.e. cytosine C⁵-methyltransferases, and those transferring it to a nitrogen atom (N-methyltransferases), i.e. cytosine N⁴- and adenine N⁶-methyltransferases. All DNA methyltransferases employ S-adenosylmethionine (AdoMet) as donor for an activated methyl group. Whereas cytosine C⁵-methyltransferases have various amino acid motifs in common, adenine N⁶-methyltransferases are a more heterogeneous family of enzymes having in common two loosely conserved F_G_G and (N/D)PPY motifs, whose sequential order may even be reversed (Smith et al., 1990; review: Wilson, 1992). Based on the order and spacing of the two motifs, cytosine N⁴- and adenine N⁶-methyltransferases can be divided into three groups, namely α-methyltransferases, e.g. M.EcoRV, β-methyltransferases, e.g. M.HindIII, and γ-methyltransferases, e.g. M.TaqI. The order and spacing of the F_G_G and (N/D)PPY motifs of γ-methyltransferases corresponds in cytosine C⁵-methyltransferases to the spacing between the F_G_G motif and motif IV containing the conserved cysteine residue which is responsible for covalent catalysis.

Currently, the structures of the two cytosine C⁵-methyltransferases, M.HhaI [AdoMet complex (Cheng et al., 1993); DNA complex (Klimasauskas et al., 1994)] and M.HaeIII [DNA complex (Reinisch et al., 1995)], and of one γ-adenine methyltransferase, M.TaqI [AdoMet complex (Labahn et al., 1994)], are known. All enzymes comprise two domains, one responsible for catalysis, the other for DNA recognition. Although M.HhaI and M.TaqI do not have significant amino acid sequence similarities, the structures of both catalytic domains are very similar to each other (Schluckebier et al., 1995). In both enzymes, however, the positioning of the DNA recognition domain is different with respect to the secondary structure elements of the conserved catalytic domain. Recently it has been shown by structure-guided multiple sequence alignments that N-methyltransferases, like C-methyltransferases, contain up to nine weakly conserved amino acid motifs. These motifs correspond structurally and functionally to those identified in C-methyltransferases, suggesting that all methyltransferases have a catalytic domain of similar structure (Malone et al., 1995).
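The grouping by order and spacing of the F_G_G and (N/D)PPY motifs can be illustrated with a minimal sketch. It only locates the two motifs and reports their order and the number of residues between them; it does not assign α, β or γ labels, because the text gives no numerical thresholds. The regular expressions (reading "_" as any single residue), the function name `motif_layout` and the toy sequence are assumptions made for illustration.

```python
import re

# Regular expressions for the two loosely conserved motifs. Reading "_" as
# "any single residue" is an assumption made for this sketch.
F_G_G = re.compile(r"F.G.G")
NDPPY = re.compile(r"[ND]PPY")


def motif_layout(sequence):
    """Report order and spacing of the first F_G_G and (N/D)PPY matches.

    Returns None when either motif is missing.
    """
    fgg = F_G_G.search(sequence)
    ppy = NDPPY.search(sequence)
    if fgg is None or ppy is None:
        return None
    first, second = sorted((fgg, ppy), key=lambda m: m.start())
    return {
        "order": "F_G_G first" if first is fgg else "(N/D)PPY first",
        # residues between the end of the first motif and the start of the second
        "spacing": second.start() - first.end(),
    }


# Hypothetical toy sequence, not a real methyltransferase.
toy = "MKAEPFMGTGVVA" + "A" * 20 + "VYCDPPYIG"
print(motif_layout(toy))
```

Taking only the first match of each pattern keeps the sketch short; a real analysis would have to handle multiple or degenerate matches.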

The EcoRV DNA methyltransferase is part of the EcoRV restriction modification system. It specifically methylates DNA within GATATC sequences at the N⁶-position of the first adenine (Nwosu et al., 1988), thereby protecting the DNA from cleavage by the EcoRV restriction endonuclease. The enzyme is active as a monomer and consists of 298 amino acids (Bougueleret et al., 1984). It is an α-adenine DNA methyltransferase showing amino acid sequence similarities to some GATC-specific DNA methyltransferases (Lauster et al., 1987). As the protein is not very soluble, the determination of the 3D structure by NMR spectroscopy or crystallography is impeded. An appropriate structural model, however, is a prerequisite for further experiments to understand the basis of the specificity of this enzyme and its catalytic mechanism. In this work, we attempted to obtain a structural model of the EcoRV methyltransferase by combining computational and biochemical methods. Our strategy was first to predict the secondary structure elements of M.EcoRV and then to refine and verify this prediction by identifying various neutral mutations produced by an in vitro evolution procedure. As the order of the predicted secondary structure elements of M.EcoRV corresponds to the common topology of the catalytic domains of M.HhaI and M.TaqI, we were able to align the predicted secondary structure of M.EcoRV on the topology of the catalytic domains of M.HhaI and M.TaqI. This approach led us to propose a structure for M.EcoRV, which in part differs from the general structure predicted by Malone et al. (1995). In a complementary approach, we identified nine catalytically inactive single mutants which support the structural model.

Downloaded from https://academic.oup.com/peds/article-abstract/9/5/413/1463479 by guest on 14 March 2018

Materials and methods

Cloning, sequencing and purification of M.EcoRV

The gene for M.EcoRV was obtained by PCR from pLBM4422 (Thielking et al., 1991). To minimize the number of mutations introduced by the PCR, Pfu polymerase (Promega), a thermostable polymerase possessing proofreading activity, was used, and only 20 PCR cycles were carried out. The 5' PCR primer contained a BamHI cleavage site, the 3' PCR primer an EcoRV and a SalI cleavage site. The PCR product was purified with a PCR purification kit (Qiagen), cleaved with 10 U of BamHI (Amersham) and SalI (USB) and purified once again to remove the short cleavage products. The insert was ligated into the large BamHI/SalI fragment of pHISRV (Wende, 1994), which was dephosphorylated using shrimp alkaline phosphatase (USB). In the resulting pRVMetH6 plasmid the codons for an affinity tag of six histidine residues were genetically fused to the 5' terminus of the M.EcoRV gene. The plasmid also contains an EcoRV recognition site immediately 3' to the gene, an important feature for the mutant profiling procedure to be described. The M.EcoRV gene (918 bp) and the adjacent EcoRV site were sequenced using an automated ABI DNA sequencer according to the instructions of the supplier. We used six sequencing primers, three for each DNA strand, which were synthesized on a Millipore DNA synthesizer and purified by denaturing PAA gel electrophoresis. This sequencing protocol ensured that each base pair was sequenced by at least three different sequencing primers independently, so that all mutations could be identified unequivocally.

The expression of the M.EcoRV gene, which is under the control of ptac, was performed in Escherichia coli LK111(λ). Purification of the protein was carried out by Ni-NTA-agarose affinity chromatography essentially as described for R.EcoRV (Wenz et al., 1994). Briefly, the cells were grown in 500 ml of LB medium and gene expression was induced by IPTG. Cell lysis was carried out in 30 mM potassium phosphate (pH 7.2)-0.1 mM DTE-0.01% lubrol-100 mM NaCl-10 mM imidazole by sonication and cell debris was removed by centrifugation at 13 000 g. The supernatant was applied to a 2 ml Ni-NTA-agarose column (Qiagen) equilibrated with lysis buffer. After washing with 200 ml of lysis buffer, the protein was eluted with 4 ml of 30 mM potassium phosphate (pH 7.2)-0.1 mM DTE-0.01% lubrol-100 mM NaCl-0.5 mM EDTA-250 mM imidazole. The concentration of the homogeneous preparations obtained was determined by the Bradford method using BSA as standard.

Biochemical characterization of M.EcoRV

To test the enzymatic activity of M.EcoRV, two assays were employed. The first is based on the in vivo methylation of plasmids by the methyltransferase. Owing to this activity, plasmids prepared from cells containing an active methyltransferase are resistant to cleavage by the corresponding restriction endonuclease. Plasmids were grown in E.coli LK111(λ) cells containing chromosomally encoded lacIq. The transcription of the M.EcoRV gene, which is under the control of ptac, was not induced. Plasmid preparations were carried out using DNA mini- and midi-preparation kits (Qiagen) according to the instructions of the supplier. To test plasmid protection against R.EcoRV cleavage, 1-2 µg of plasmid DNA were incubated with 100-200 nM R.EcoRV in 10 µl of Tris-HCl (pH 7.5)-10 mM MgCl₂-50 mM NaCl for 30-60 min. This corresponds to ~100 U of a homogeneous R.EcoRV preparation obtained as described (Wenz et al., 1994). Subsequently, the samples were analysed by agarose gel electrophoresis. Under these conditions an unmethylated control plasmid was completely cleaved after 5 min, but no non-specific DNA degradation was observed.

To assay the in vitro activity of purified M.EcoRV, plasmid pAT153(2) DNA, which harbours two EcoRV sites, was incubated in the presence of M.EcoRV in 100 mM NaCl-50 mM Tris-HCl (pH 7.5)-1 mM AdoMet. After appropriate times (10-120 min), aliquots were withdrawn and cleaved with R.EcoRV as described for the in vivo methylation assay. DNA binding of M.EcoRV and M.EcoRV mutants was analysed by gel electrophoretic mobility shift experiments essentially as described for R.EcoRV (Jeltsch et al., 1995) in a buffer containing 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, 5 mM DTE, 0.5 µg/µl BSA, 1 mM EDTA and 1 mM AdoMet (Sigma).

[Figure 1: flow scheme. Random mutagenesis → transformation → expression of the MTase → plasmid preparation → restriction enzyme cleavage. Outcomes: plasmid-encoded MTase does not protect against DNA cleavage by the restriction enzyme; plasmid-encoded MTase protects against DNA cleavage by the restriction enzyme. Symbols: EcoRV methyltransferase (MTase); pRVMet; methylated pRVMet.]

Fig. 1. In vitro selection scheme for M.EcoRV. In vitro selection was employed to enrich catalytically active and inactive mutants after random mutagenesis of the M.EcoRV gene (for details, see text).

Random mutagenesis and selection for neutral mutations (mutant profiling)

It should be noted that DNA methyltransferases are ideally suited for an in vitro evolution approach, because a very efficient selection procedure for active and inactive mutants exists (Figure 1). This procedure is based on the fact that upon propagation in the cell each plasmid is modified by the methyltransferase it is coding for. Hence, to separate plasmids coding for inactive and active methyltransferase variants, a plasmid pool containing randomly mutated genes simply has to be cleaved with the corresponding restriction enzyme. Subsequently, uncleaved plasmids (coding for an active methyltransferase) and cleaved plasmids (coding for an inactive methyltransferase) can be separated and transformed into E.coli cells. Hence no screening of individual clones is necessary.
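The logic of this selection can be written down as a toy model. In the sketch below each plasmid carries a flag saying whether its encoded methyltransferase is active, and cleavage simply partitions the pool. The class `Plasmid`, the function `cleave_pool` and the three-member pool are hypothetical names and data introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Plasmid:
    name: str
    mtase_active: bool  # True if the encoded methyltransferase variant is active


def cleave_pool(pool):
    """Split a plasmid pool by the outcome of restriction enzyme cleavage.

    A plasmid coding for an active methyltransferase has methylated its own
    recognition site in vivo and stays uncleaved; all others are linearized.
    """
    uncleaved = [p for p in pool if p.mtase_active]
    cleaved = [p for p in pool if not p.mtase_active]
    return uncleaved, cleaved


# Hypothetical pool of three variants.
pool = [Plasmid("wt", True), Plasmid("m1", False), Plasmid("m2", True)]
uncleaved, cleaved = cleave_pool(pool)
print([p.name for p in uncleaved], [p.name for p in cleaved])
```

The model treats protection as all-or-nothing, which matches the description in the text but ignores partially active variants.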

Production of active M.EcoRV variants containing multiple neutral mutations.

As all methods for undirected mutagenesis have some biases, random mutagenesis was carried out using five different mutagenic treatments, namely the chemical agents nitrous acid, formic acid, hydrazine and potassium permanganate, used similarly as described by Myers et al. (1985), and UV radiation. In four separate reactions, 0.1 µg of pRVMetH6 was incubated in 20 µl containing (i) 250 mM sodium acetate (pH 4.3) and 1.0 M sodium nitrite, (ii) 12 M formic acid, (iii) 60% hydrazine or (iv) 100 mM KMnO₄ for 30-60 min. For mutagenesis by UV radiation, 0.1 µg of DNA in 20 µl was illuminated for 15-45 min at 254 nm with a UV hand lamp (Bachofer, Reutlingen, Germany). Subsequently, the DNA was precipitated with ethanol. The mutated DNA was amplified by PCR using the same primers as described above, except that Taq DNA polymerase (Promega) and 30 cycles were used. This step introduces additional mutations in the gene and ensures that all cloned mutants contain an intact EcoRV recognition site, because the site is located on the 3' PCR primer. The success of this mutagenesis procedure is shown by the fact that in the sequenced clones all possible kinds of nucleotide exchanges were observed.

The PCR fragment was ligated into the large BamHI/SalI fragment of pRVMetH6 as described and the pool was transformed into E.coli LK111(λ). Subsequently, the plasmids were prepared as a pool. Lower limits for the mutation rates were estimated from the amount of plasmid which is not protected from R.EcoRV cleavage. These plasmids contain an inactive M.EcoRV gene and, hence, must contain at least one mutation. To produce clones containing as many neutral mutations as possible, incubation times were chosen such that 50-95% of all plasmids were not protected. After R.EcoRV cleavage, the reaction mixture was transformed into LK111(λ). As superhelical plasmid is transformed much better than linear DNA, usually all of the resulting clones coded for an active M.EcoRV mutant. We carried out five cycles of random mutagenesis by the various methods, and selected for active mutants. After each cycle, the DNA coding for active M.EcoRV variants was pooled.
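The lower-limit argument can be made quantitative under one added assumption that the text does not state: that the number of mutations per gene copy follows a Poisson distribution. Every unprotected plasmid carries at least one mutation, so the probability of at least one mutation, 1 - exp(-mean), is at least the unprotected fraction. The function name `min_mean_mutations` is hypothetical.

```python
import math


def min_mean_mutations(fraction_unprotected):
    """Lower bound on the mean number of mutations per gene.

    Assumes (for this sketch only) Poisson-distributed mutation counts, so
    P(at least one mutation) = 1 - exp(-mean) >= fraction_unprotected.
    """
    if not 0.0 <= fraction_unprotected < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction_unprotected)


for f in (0.50, 0.95):  # range of unprotected plasmid quoted in the text
    print(f"{f:.0%} unprotected -> mean >= {min_mean_mutations(f):.2f}")
```

The bound is conservative because neutral and silent mutations leave the plasmid protected and therefore go uncounted.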

Production of catalytically inactive M.EcoRV single mutants.

To obtain inactive mutants containing only one amino acid exchange, the M.EcoRV gene was amplified by PCR and cloned into pRVMetH6×BamHI×SalI as described. Owing to errors of the Taq polymerase, ~1% of the resulting plasmid pool was cleaved by R.EcoRV and, hence, codes for an inactive M.EcoRV variant. The linear DNA was isolated from agarose gels using Geneclean (Bio101, La Jolla, CA, USA). The DNA was ligated using 30 U of T4 DNA ligase (MBI Fermentas, Vilnius, Lithuania) and transformed into E.coli LK111(λ). All clones obtained were checked for their in vivo methylation activity. Inactive clones were sequenced. Single mutants were expressed and purified as described above.

Computational methods

Profile secondary structure predictions were carried out using PHD (Rost and Sander, 1993, 1994) and SSP (Mehta et al., 1995). Both programs employ profiles obtained from multiple sequence alignments to predict secondary structure. Multiple alignments were automatically calculated by PHD and SSP. SSP was used in its standard configuration, i.e. employing a PAM120 matrix, a gap penalty of 13 and a Z-score of 7.0. With PHD two protein sequences were used to construct the multiple sequence alignment, namely mte5_ecoli (M.EcoRV) and mt21_strpn (M.DpnII). With SSP these and five additional enzymes were used, namely mtc1_chvn1 (M.CviBI), dma7_ecoli (E.coli retron EC67 DNA adenine methyltransferase), dma_bpt4 (phage T4 methyltransferase), dma_bpt2 (phage T2 methyltransferase) and mt1a_morbo (M.MboI). This set enlarges the number of M.EcoRV homologous proteins described by Lauster et al. (1987). Besides M.EcoRV and M.CviBI, which recognize and methylate GATATC and GANTC, respectively, all of these proteins are GATC-specific adenine methyltransferases. The alignments are shown in Figure 2.

We used two different methods for profile secondary structure predictions to improve the reliability of prediction. The rationale of this approach is that secondary structure elements which are independently predicted by two different programs employing different prediction algorithms are expected to have a higher probability of being correct. This point is illustrated in Figure 3, where the results of secondary structure predictions of M.HhaI and M.TaqI by PHD and SSP are compared with the structures of both enzymes. The secondary structure (α-helix or β-strand) of 145 amino acid residues was consistently predicted by PHD and SSP. With three exceptions (M.TaqI positions 110, 111 and 244) the secondary structure prediction of all of these amino acid residues is correct. The reliability of the combined prediction (98%) contrasts with the fraction of residues whose secondary structure is correctly predicted by each program alone, which is 67% (PHD) and 57% (SSP) for these example proteins. The deviations in the secondary structure prediction of PHD and SSP for M.EcoRV were not due to the different set of proteins used, as shown by a PHD prediction using the alignment produced by SSP. This PHD prediction did not differ substantially from that obtained with the original alignment (data not shown).
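The consensus idea can be sketched in a few lines: keep a helix (H) or strand (E) call only where both predictions agree, then score the agreed positions against a known structure. The strings below are hypothetical toy data, not PHD or SSP output, and the one-letter state encoding and function names are assumptions of this sketch. The last line reproduces the arithmetic quoted in the text, (145 - 3)/145.

```python
def consensus(pred_a, pred_b):
    """Keep H/E calls on which both predictions agree; '-' elsewhere."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must have equal length")
    return "".join(
        a if a == b and a in "HE" else "-" for a, b in zip(pred_a, pred_b)
    )


def consensus_accuracy(cons, observed):
    """Fraction of consensus H/E calls that match the observed structure."""
    called = [(c, o) for c, o in zip(cons, observed) if c != "-"]
    if not called:
        return None
    return sum(c == o for c, o in called) / len(called)


# Hypothetical toy strings (H = helix, E = strand, - = neither).
phd = "HHHH-EEEE-HH"
ssp = "HHH--EEEEEHH"
xray = "HHHH-EEEE-HE"
cons = consensus(phd, ssp)
print(cons, consensus_accuracy(cons, xray))

# Arithmetic from the text: 145 consistently predicted residues, 3 wrong.
print(round((145 - 3) / 145 * 100))
```

Restricting the score to agreed positions trades coverage for reliability, which is the point the paragraph makes with the 98% versus 67% and 57% figures.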

Results and Discussion

Cloning, expression and purification of M.EcoRV

We inserted the M.EcoRV gene into an expression vector and obtained pRVMetH6. The cloned protein contains a His6 tag on its N-terminus, to facilitate purification. It was shown to be catalytically active, because plasmids grown in E.coli cells containing pRVMetH6 are protected from cleavage by the EcoRV restriction endonuclease and, hence, are completely methylated at EcoRV sites. Complete protection was observed, although the transcription of the M.EcoRV gene, which is under the control of the ptac promoter, was repressed in vivo. As ptac is known not to be very leaky, one can assume that only a few M.EcoRV molecules are present in each cell, similarly to the case under physiological conditions, where methyltransferases of restriction/modification systems are usually expressed at a low level. We purified the M.EcoRV enzyme by affinity chromatography to homogeneity. The purified protein has a specific activity of 2.5 × 10⁶ U/mg, which is slightly higher than reported previously (Nwosu et al., 1988), demonstrating that the His6 tag does not interfere with activity.


[Figure 2: sequence alignment panels. (a) Multiple sequence alignment used by PHD (mte5_ecoli, mt21_strpn). (b) Multiple sequence alignment used by SSP (mte5_ecoli, mt21_strpn, mtc1_chvn1, dma7_ecoli, dma_bpt4, dma_bpt2, mt1a_morbo).]

Fig. 2. Multiple sequence alignments constructed with all proteins sufficiently similar to M.EcoRV by (a) PHD and (b) SSP. The numbers given correspond to the amino acid sequence of M.EcoRV.


Fig. 3. PHD and SSP secondary structure prediction of M.HhaI and M.TaqI. The results of profile secondary structure prediction by PHD (lanes labelled PHD) and SSP (lanes labelled SSP) are shown. The secondary structure as deduced from the X-ray structure of both enzymes is given (Cheng et al., 1993; Labahn et al., 1994). Regions which are consistently predicted by PHD and SSP are boxed.

Profile secondary structure prediction

Two methods for profile secondary structure prediction, PHD (Rost and Sander, 1993, 1994) and SSP (Mehta et al., 1995), were employed as starting points for a structure prediction of M.EcoRV. The two programs constructed slightly different multiple alignments for the prediction of the M.EcoRV structure: PHD identified one homologous protein, the GATC-specific adenine DNA methyltransferase M.DpnII, whereas SSP, in the configuration used, found six homologous enzymes which are GATC-specific adenine methyltransferases and the CviBI methyltransferase, which methylates GANTC. It must be pointed out, however, that in both cases the number of homologous proteins is too low to achieve the optimal prediction accuracy. The secondary structure predictions of both programs for M.EcoRV are compared in Figure 4 (lines labelled PHD and SSP). Thirteen regions are predicted to be composed of α-helices or β-strands by both methods (boxed in Figure 4; for a discussion of the region between amino acids 45 and 48, see below) and, hence, are likely to be predicted correctly. Moreover, the boundaries between secondary structure elements are well defined in most cases by the predictions and also by the occurrence of insertions and deletions in the alignments (Figure 4).

Fig. 4. Secondary structure prediction of M.EcoRV. Amino acid residues which are similar to residues at equivalent positions in the M.TaqI or M.HhaI structures are boxed. InDel gives the location of insertions and deletions in the alignments computed by PHD and SSP. PHD and SSP show the secondary structure prediction by PHD or SSP, respectively, using wild-type M.EcoRV as input. PHD_mut and SSP_mut show the corresponding predictions using the neutral mutations of M.EcoRV for additional information. Pred. gives the final prediction obtained employing all of the information available. The predicted secondary structure elements are numbered consecutively. Structure elements that are consistently predicted by PHD and SSP are boxed and those that are predicted using the neutral mutations for additional information are shaded. The regions of the predicted catalytic and DNA recognition domains are indicated.
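The consensus step described above, keeping only those regions on which both programs agree, can be illustrated with a minimal sketch. It assumes each prediction is available as a per-residue string over H (helix), E (strand) and - (no regular structure); the function names and the toy strings are hypothetical and are not the published M.EcoRV predictions.

```python
from itertools import groupby


def consensus(pred_a: str, pred_b: str) -> str:
    """Keep H or E where both predictions agree; mark everything else '-'."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same residues")
    return "".join(
        a if a == b and a in ("H", "E") else "-"
        for a, b in zip(pred_a, pred_b)
    )


def segments(states: str, offset: int = 1):
    """List (state, first_residue, last_residue) for every run of H or E."""
    found = []
    position = offset
    for state, run in groupby(states):
        length = len(list(run))
        if state in ("H", "E"):
            found.append((state, position, position + length - 1))
        position += length
    return found


# Toy input (assumption): two predictions that disagree at the C-terminus.
phd = "--HHHHHH--EEEE---HHHH"
ssp = "---HHHHH--EEEEE--EEEE"
agreed = consensus(phd, ssp)
elements = segments(agreed)
```

In this toy case only one helix and one strand survive as consensus elements; the C-terminal region, where one string says helix and the other says strand, is left undecided, which is the kind of ambiguity the mutant profiling is meant to resolve.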

Refinement of the secondary structure prediction using neutral mutations

To refine and verify these predictions, we carried out an in vitro evolution experiment aimed at identifying as many neutral mutations of the M.EcoRV protein as possible. To this end, we carried out five cycles of random mutagenesis and selected for active mutants to analyse which amino acid exchanges are compatible with enzymatic activity and, hence, structurally tolerated by the protein. We sequenced 21 of the clones coding for an active EcoRV methyltransferase. On average, each clone contained 2.5 amino acid exchanges, and at most we found six mutations in one clone. In total, we identified 28 neutral mutations (Table I), most of them being found in duplicate or in triplicate in independent clones. One additional mutation (Ile275Thr) was identified in a similar approach directed towards the isolation of inactive single M.EcoRV mutants. For the secondary structure prediction with the mutants, we constructed a hypothetical M.EcoRV sequence that contained all neutral mutations identified and used this sequence as input for the secondary structure prediction programs. This approach is justified as the mutations do not cluster (cf. Table I). The results are shown in Figure 4 (lines labelled PHD_mut and SSP_mut). As expected, there are differences in the secondary structure predictions based on the sequences of wild-type M.EcoRV and the hypothetical M.EcoRV hypermutant protein. These differences are especially useful in regions where the original predictions of the two programs disagree when only the wild-type sequence is used. At seven ambiguously predicted secondary structure elements the results obtained for the mutated sequence could be used to decide for one alternative. At positions 178-185 PHD predicted a β-sheet whereas SSP predicted an α-helix. Using the M.EcoRV hypermutant protein sequence as input, PHD also predicted an α-helix. Similarly, at amino acid positions 198-201 only SSP predicted a β-strand, but this strand was not predicted for the hypermutant protein. Several other regions could be refined by the mutant analysis:

• a β-strand is unlikely between Lys4 and Pro8;
• an α-helix is probable between Asp78 and Arg87;
• an α-helix is unlikely between Asn108 and Pro112;
• the boundaries of β-strand 5 become much better defined;
• the region of α-helix J acquires a higher probability of being helical.

Table I. Compilation of neutral mutations found by the in vitro mutagenesis/selection procedure

No.   Amino acid exchange (wt → mutant)   Corresponding position in M.HhaI
1     Ile15 → Met                         302
2     Ile26 → Met                         322
3     Val34 → Thr                         13
4     Val34 → Ala                         13
5     Val45 → Asp                         25
6     Lys52 → Arg                         32
7     Asp58 → Glu                         40
8     Lys71 → Gln                         58
9     Val80 → Ala                         62
10    Leu84 → Phe                         TRD(b)
11    Glu104 → Lys                        TRD
12    Phe123 → Leu                        TRD
13    Phe129 → Val                        TRD
14    Ser131 → Asn                        TRD
15    Lys141 → Arg                        TRD
16    Asn169 → Asp                        TRD
17    Met183 → Val                        TRD
18    Ile197 → Val                        83
19    Arg210 → Cys                        97
20    Leu222 → Ser                        110
21    Glu238 → Asp                        130
22    Asp244 → Val                        139
23    Phe249 → Ser                        144
24    Leu252 → Ser                        147
25    Glu255 → Gly                        150
26    Asn266 → Ser                        161
27    Ile275 → Thr(a)                     171
28    Ala279 → Thr                        175
29    Leu294 → Met                        -

(a) Not found by the mutagenesis/selection procedure.
(b) TRD: target DNA recognition domain.
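The construction of the hypothetical hypermutant sequence can be sketched as follows. This is an illustration only: the helper names, the seven-residue toy sequence and the rule for handling two exchanges at one position are assumptions, since the text does not state which exchange was used where two were found at the same residue (as for Val34 in Table I).

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

MUTATION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_mutation(name: str):
    """Split a name such as 'Trp231Arg' into ('W', 231, 'R')."""
    match = MUTATION.match(name)
    if match is None:
        raise ValueError(f"cannot parse mutation name: {name}")
    wild_type, position, mutant = match.groups()
    return THREE_TO_ONE[wild_type], int(position), THREE_TO_ONE[mutant]


def build_hypermutant(wild_type: str, mutations) -> str:
    """Apply all exchanges to one sequence (positions are 1-based).

    Assumption: if two exchanges hit the same position, the first one
    listed is kept and later ones are skipped.
    """
    residues = list(wild_type)
    used_positions = set()
    for name in mutations:
        expected, position, mutant = parse_mutation(name)
        if position in used_positions:
            continue
        if wild_type[position - 1] != expected:
            raise ValueError(f"{name}: wild-type residue does not match")
        residues[position - 1] = mutant
        used_positions.add(position)
    return "".join(residues)


# Toy sequence (assumption), not the M.EcoRV sequence.
toy_hypermutant = build_hypermutant("MKVIDLA", ["Val3Thr", "Val3Ala", "Leu6Met"])
```

The resulting string would then be submitted to the prediction programs in place of the wild-type sequence. Checking the expected wild-type residue at each position guards against numbering errors in the mutation list.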

Taken together, the secondary structure of an additional 35 amino acid residues can be predicted by this method and three regions can be shown probably not to have a regular secondary structure. All regions in which the neutral mutations served to decide between two different predictions are shaded in Figure 4.

No significant deviations between the secondary structure predictions based on the wild-type and the hypermutant protein sequences are observed in those 13 secondary structure elements that are predicted both by PHD and SSP for the wild-type protein to have the same secondary structure. This finding supports the conclusion that these elements were predicted correctly.

Among the secondary structure predictions of nine inactive single mutants, which were produced by a similar in vitro evolution approach, that of the Phe172Ser mutant differed from the prediction for the wild-type enzyme. For this variant PHD does not predict a β-sheet between Tyr170 and Cys174. This result shows that even a single mutation can change the prediction of a whole secondary structure element, lending credence to the effectiveness of our approach.

Topological alignment of M.EcoRV and known structures of methyltransferases

As the structures of the catalytic domains of M.HhaI and M.TaqI are similar to each other (Schluckebier et al., 1995), although the two enzymes belong to different families of methyltransferases, it was reasonable to test whether the catalytic domain of M.EcoRV could also have a similar structure. Indeed, the order of the predicted secondary structure elements in the C-terminal half of M.EcoRV corresponds to the common topology of the catalytic domains of M.HhaI and M.TaqI. We therefore aligned the predicted secondary structure elements of the whole enzyme on the topology of M.HhaI and M.TaqI (Figure 5).

The motifs characteristic of adenine methyltransferases were used as starting points in the comparison (Wilson, 1992; Malone et al., 1995). Beginning at the DPPY region (motif IV), the predicted secondary structure elements between amino acids 158 and 277 in M.EcoRV correspond to those of the X-ray structures. In analogy with the structure of M.TaqI, we predict a β-sheet C-terminal to Asn277. At this position a β-sheet is predicted by SSP, but not by PHD. The following β-sheet (Asp292-Glu298) corresponds to the M.TaqI structure. The secondary structures N-terminal to the F_G_G motif (motif I) also are similar to those in the known methyltransferase structures. However, the β-sheet predicted between Thr42 and Asn58 most probably does not exist, because the F_G_G motif (Phe39-Gly43 in M.EcoRV) is located in a loop in all three known methyltransferase structures. This loop is always directly followed by an α-helix. Hence we suggest that Val45-Val49 form an α-helix. This helix is expected to be followed by a β-strand. Based on the locations of the insertions and deletions in the alignments, this strand probably begins with Asp53. The occurrence of motif II (Malone et al., 1995) suggests that β-strand 2 ends with Asp58. The following α-helix is very short in M.TaqI (helix B in M.TaqI, Figure 5). According to the prediction it could correspond to Pro61-Leu63. A following β-strand is predicted by SSP. α-Helix D of M.EcoRV starts with motif III, suggesting that this helix corresponds to helix C of M.TaqI and M.HhaI (Figure 5). It should be noted that the regions between Gly43 and Asn60 and between Ile64 and Leu70 are the only parts of M.EcoRV in which the secondary structure predictions by PHD and SSP could not readily be aligned to the topology characteristic of the two methyltransferase structures. The failure of secondary structure prediction in these regions might be explained by their proximity to the active site of the enzyme (Jenny et al., 1995).

Fig. 5. (a) Topological drawing of the structures of M.HhaI and M.TaqI. Helices are shown as cylinders, β-strands as arrows. This figure is adapted from Schluckebier et al. (1995). Similar amino acids at corresponding positions in both proteins are boxed. (b) Topological alignment of the predicted secondary structure elements in the catalytic domain of M.EcoRV on the structure of M.HhaI and M.TaqI (a). Amino acid residues of M.EcoRV which are similar to residues at equivalent positions in the M.TaqI or M.HhaI structures are boxed. The locations of the motifs X, I, II, III and IV are given (Malone et al., 1995). The positions of the amino acid exchanges leading to catalytically inactive mutants are indicated by asterisks.

Taken together, 18 secondary structure elements of the catalytic domain of M.EcoRV are predicted, 11 of them by the PHD and SSP programs, four using 29 neutral mutations in the M.EcoRV protein as additional information and four by analogy with the structures of M.HhaI and M.TaqI. With the exception of β-strand 10, which is completely deleted in the phage T2 and T4 dam methyltransferases, the predicted secondary structure elements in M.EcoRV do not span gap regions of the multiple alignments used (Figure 5). All connectivities between the secondary structure elements are identical between M.EcoRV, M.HhaI and M.TaqI. The final structure prediction deviates in about 30% of all residues from the secondary structure predictions of PHD or SSP, respectively. This roughly corresponds to the reported accuracy of these methods (Rost and Sander, 1993, 1994; Mehta et al., 1995). This structural model is the first for α-N-methyltransferases; it suggests that α-N-methyltransferases have a similar structure to γ-N-methyltransferases, C-methyltransferases and type I methyltransferases (Dryden et al., 1995).

Comparison with alternative structure predictions of M.EcoRV

Recently, a global structure prediction for N-methyltransferases has been published (Malone et al., 1995). It is based on two crystal structures, the locations of nine amino acid motifs in N-methyltransferases and the assignment of these motifs to those observed in C-methyltransferases. The global structure prediction presented there and the specific structure prediction presented here are very similar, although the borders of the secondary structure elements could be defined more precisely in the specific structure prediction. The two predictions deviate from each other in the C-terminal region of M.EcoRV (following Thr230). In this region the global structure prediction rests on the observation of two rather spurious motifs, whose relationship to the motifs observed in C- and γ-N-methyltransferases is not clear. As part of this prediction is not in accordance with the specific secondary structure prediction presented here, we conclude that a refined secondary structure prediction might be useful to improve a structure prediction that is based only on lead structures and sequence motifs. On the other hand, it must be pointed out that the prediction of M.EcoRV helix B, strand 2 and strand 3 would have been impossible without the identification of two motifs within or near these regions (Malone et al., 1995).

Helical wheel plots of α-helices predicted to occur in the catalytic domain of M.EcoRV

Based on the structure prediction presented here, the catalytic domain of M.EcoRV contains six α-helices which correspond to helices in the catalytic domain of M.HhaI and M.TaqI. The helices corresponding to M.EcoRV helices A and B are buried in the M.HhaI structure, whereas those corresponding to the helices D, I and J are exposed at the surface of the protein. Helical wheel analyses of these regions (helix C is too short for a meaningful helical wheel projection) demonstrate that helix D, the C-terminal part of helix I and helix J are amphipathic, whereas helices A and B are hydrophobic (Figure 6). This result supports the prediction of the helices D, I and J. Furthermore, it supports the assignment of the M.EcoRV helices to corresponding helices in M.HhaI.

Fig. 6. Helical wheel projections of the α-helices predicted within the catalytic domain of M.EcoRV. Panels: helix A (18-25), helix B (43-49), helix D (78-86), helix I (213-221) and helix J (239-248).
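The distinction between amphipathic and uniformly hydrophobic helices drawn from the helical wheels can also be expressed numerically. The sketch below computes a hydrophobic moment for a peptide segment, assuming an ideal α-helix with 100° rotation per residue and the Kyte-Doolittle hydropathy scale; neither the scale nor this calculation is taken from the paper, and the two test peptides are artificial.

```python
import math

# Kyte-Doolittle hydropathy values (assumption: scale chosen for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(segment: str) -> float:
    """Average hydropathy of the segment."""
    if not segment:
        raise ValueError("segment must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def hydrophobic_moment(segment: str, angle_deg: float = 100.0) -> float:
    """Length of the summed hydropathy vectors on a helical wheel, per residue."""
    if not segment:
        raise ValueError("segment must not be empty")
    sin_sum = 0.0
    cos_sum = 0.0
    for index, aa in enumerate(segment):
        theta = math.radians(index * angle_deg)
        sin_sum += KYTE_DOOLITTLE[aa] * math.sin(theta)
        cos_sum += KYTE_DOOLITTLE[aa] * math.cos(theta)
    return math.hypot(sin_sum, cos_sum) / len(segment)


# Artificial peptides (assumption): one amphipathic, one uniformly hydrophobic.
amphipathic = hydrophobic_moment("LKKLLKKLLKKL")
uniform = hydrophobic_moment("LLLLLLLLLLLL")
```

A surface-exposed helix such as those predicted for D, I and J would be expected to give a large moment, whereas a buried helix combines a high mean hydrophobicity with a small moment.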

Isolation of catalytically inactive single mutants of M.EcoRV

To map catalytically important regions in the M.EcoRV protein, we carried out random mutagenesis experiments and selected for clones devoid of functional methyltransferase activity. Forty clones were sequenced. About one third of them contained more than one mutation and, hence, are not meaningful for the analysis, as one cannot correlate inactivity with one particular amino acid exchange. A considerable fraction of clones (about one quarter) had single nucleotide deletions in the coding region of the gene, leading to a frameshift. These variants also are not informative. We could identify nine mutants containing only one amino acid exchange (Table II), many of them being found more than once. The inactive single mutants were expressed and purified, but only the Trp231Arg variant turned out to be soluble. All other proteins remained in the pellet after cell disruption. The Trp231Arg mutant is catalytically inactive in vitro. However, as shown by gel shift analyses, the DNA binding ability of the Trp231Arg mutant is very similar to that of wild-type M.EcoRV (data not shown). Hence this mutant behaves like a bona fide active site mutant.

Localization of the mutations in the structure

It is tempting to localize the mutations identified in the catalytic domain of M.EcoRV in the framework of the structure of M.HhaI (Table I). Three classes of neutral mutations were obtained, as follows. (i) Conservative exchanges at positions that are buried (Ile15, Ile26, Asp58, Ile275 and Ala279). All mutations of buried residues identified are conservative with respect to the physical properties of the amino acid residues. Most of the residues are hydrophobic; the only exception, Asp58, forms a contact to AdoMet which could be preserved in the Asp58Glu mutation. (ii) Conservative exchanges of surface-exposed residues at Val34, Val80, Ile197, Glu238 and Asn266. (iii) Non-conservative exchanges of surface-exposed residues at Val45, Lys71, Arg210, Leu222, Asp244, Phe249, Leu252 and Glu255. The fact that some of these presumably surface-exposed residues are hydrophobic might explain why M.EcoRV tends to aggregate even at micromolar concentrations.

Table II. Compilation of catalytically inactive single mutants found by the in vitro mutagenesis/selection procedure

Amino acid exchange (wt → mutant)   Biochemical characterization         Localization in the M.EcoRV structure prediction
Val20 → Ala                         insoluble                            motif X(a)
Lys81 → Arg                         insoluble                            motif III
Phe115 → Ser                        insoluble                            TRD(b)
Ser121 → Pro                        insoluble                            TRD
Phe139 → Leu                        insoluble                            TRD
Phe172 → Ser                        insoluble, probably misfolded        TRD
Cys192 → Arg                        insoluble                            motif IV
Asp193 → Gly                        insoluble                            motif IV
Trp231 → Arg                        binds DNA, catalytically inactive    catalytic centre

(a) Motif classification as described by Malone et al. (1995); motif X, 11-29; motif III, 74-83; motif IV, 187-204.
(b) TRD: target DNA recognition domain.

The location of the inactive mutants is shown in Table II. Four of the five mutations identified within the putative catalytic domain of M.EcoRV are located within the amino acid sequence motifs X, III and IV (Malone et al., 1995). This result confirms the importance of these motifs. Strikingly, the only amino acid exchange not located within one of these motifs, Trp231Arg, is located in the loop connecting β-strand 9 and helix J. The corresponding loop in M.HhaI forms part of the binding site for the target cytosine when it is looped out of the DNA helix. By analogy, this loop has been suggested also to be part of the active site of N-methyltransferases (Malone et al., 1995). As described, the properties of the Trp231Arg mutant suggest it to be an active site mutant, confirming the putative role of this loop in catalysis. As Trp231 is not conserved even within the closely related methyltransferases (Figure 2), it is unlikely that this residue by itself has a catalytic role. It rather appears as if the Trp→Arg mutation has disturbed the conformation of the loop, thereby preventing catalysis. Taken together, on the basis of the structure prediction for M.EcoRV, important functional roles can be assigned to all amino acid residues found to be exchanged in catalytically inactive single mutants. On the one hand, this result supports the structural model, and on the other it demonstrates the great potential of random mutagenesis/selection approaches for the investigation of structure-function relationships in proteins.
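The assignment of the five catalytic-domain exchanges to sequence motifs can be reproduced with a simple range lookup. The motif boundaries below are those quoted in the footnote to Table II (motif X, 11-29; motif III, 74-83; motif IV, 187-204); the function and variable names are hypothetical.

```python
# Motif boundaries in M.EcoRV numbering, from the footnote to Table II.
MOTIFS = {"X": (11, 29), "III": (74, 83), "IV": (187, 204)}


def motif_for(position: int):
    """Return the name of the motif containing the residue, or None."""
    for name, (start, end) in MOTIFS.items():
        if start <= position <= end:
            return name
    return None


# The five inactive single mutants located in the putative catalytic domain.
catalytic_domain_mutants = {
    "Val20Ala": 20,
    "Lys81Arg": 81,
    "Cys192Arg": 192,
    "Asp193Gly": 193,
    "Trp231Arg": 231,
}

assignment = {
    name: motif_for(position)
    for name, position in catalytic_domain_mutants.items()
}
outside_motifs = [name for name, motif in assignment.items() if motif is None]
```

Four of the five exchanges fall inside a motif; Trp231Arg is the only one left unassigned, in line with its location in the loop discussed above.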

Structure of the DNA recognition domain

According to our prediction, the DNA recognition domain of M.EcoRV is located between α-helix D and β-strand 8. Based on alignments of M.EcoRV with DNA methyltransferases recognizing the related sequence GATC and with methyltransferases which recognize unrelated sequences, one part of this region (Phe123-Ile151) was suggested previously to be responsible for DNA recognition in M.EcoRV (Guschlbauer, 1988). The length of the proposed recognition domain (93 amino acid residues) is in good agreement with the length of the various recognition domains in C5-methyltransferases. For example, the small domain of M.HhaI comprises 81 amino acid residues (Cheng et al., 1993).

Unfortunately, the profile methods employed here are not well suited for the prediction of the secondary structure of the M.EcoRV DNA recognition domain. As these methods work with sequence profiles, their structure predictions apply to an alignment rather than to a real protein. This point becomes evident in cases where the structures of the proteins used for the alignment deviate from each other. Such deviations are to be expected in the DNA recognition domains of enzymes which are designed by evolution to recognize different DNA sequences. Unfortunately, M.EcoRV is the only enzyme in the set of homologous proteins which recognizes GATATC, most of the others being directed towards GATC. Since SSP used seven proteins recognizing sequences different from GATATC, but PHD only one, the predictions of PHD for amino acids within the recognition domain appear to be more relevant for the structure analysis of M.EcoRV. With this in mind, β-strand 4 and α-helices E and F were predicted. As two of the inactivating mutations were on β-strand 5 (Phe115Ser and Ser121Pro), one might speculate that this strand has an important structural or functional role. It should be pointed out that the mutant profiling approach circumvents this difficulty, because active mutants can be considered with great confidence to have a structure nearly identical with that of the wild-type protein. This is another important advantage of mutant profiling; it is expected that the generation of more neutral mutations within the DNA recognition domain in the future will allow for a refinement of the structure prediction of this domain.

Conclusion

We have presented a refined secondary structure prediction and a prediction of the topology of the secondary structure elements of M.EcoRV. The model suggests that the catalytic domains of α-adenine methyltransferases have a similar structure to γ-adenine methyltransferases and cytosine methyltransferases. It is in accordance with a structural model derived on the basis of amino acid motifs conserved among adenine methyltransferases (Malone et al., 1995). Moreover, we have identified a number of catalytically inactive single mutants of M.EcoRV. These mutants demonstrate the importance of the motifs defined by Malone et al. (1995) and support the location of the active centre of the enzyme.

Acknowledgements

Technical assistance by Ms H.Büngen is gratefully acknowledged. We thank Dr W.Wende for his help with the cloning procedure. Thanks are due to Drs X.Cheng, B.A.Connolly, D.T.F.Dryden, W.Saenger and G.Schluckebier for the communication of results prior to publication and Dr X.Cheng and Dr W.Saenger for making available coordinates of the M.TaqI and M.HhaI structure, respectively. This work was supported by grants from the BMBF and the Fonds der Chemischen Industrie.

References

Bougueleret,L., Schwarzstein,M., Tsugita,A. and Zabeau,M. (1984) Nucleic Acids Res., 12, 3659-3676.
Cheng,X. (1995) Curr. Opin. Struct. Biol., 5, 4-10.
Cheng,X., Kumar,S., Posfai,J., Pflugrath,J.W. and Roberts,R.J. (1993) Cell, 74, 299-307.
Chothia,C. and Lesk,A.M. (1986) EMBO J., 5, 823-826.
Dryden,D.T.F., Sturrock,S.S. and Winter,M. (1995) Nature Struct. Biol., 2, 632-635.
Guschlbauer,W. (1988) Gene, 74, 211-214.
Heitman,J. (1993) In Setlow,J.K. (ed.), Genetic Engineering. Plenum Press, New York, Vol. 15, pp. 57-108.
Hornby,D.P. (1993) Methods Mol. Biol., 16, 201-211.
Jeltsch,A., Maschke,H., Selent,U., Wenz,C., Köhler,E., Connolly,B.A., Thorogood,H. and Pingoud,A. (1995) Biochemistry, 34, 6239-6246.
Jenny,T.F., Gerloff,D.L., Cohen,M.A. and Benner,S.A. (1995) Proteins: Struct. Funct. Genet., 21, 1-10.
Klimasauskas,S., Kumar,S., Roberts,R.J. and Cheng,X. (1994) Cell, 76, 357-369.
Labahn,J., Granzin,J., Schluckebier,G., Robinson,D.P., Jack,W.E., Schildkraut,I. and Saenger,W. (1994) Proc. Natl Acad. Sci. USA, 91, 10957-10961.
Lauster,R., Kriebardis,A. and Guschlbauer,W. (1987) FEBS Lett., 220, 167-176.
Malone,T., Blumenthal,R.M. and Cheng,X. (1995) J. Mol. Biol., 253, 618-632.
Mehta,P.K., Heringa,J. and Argos,P. (1995) Protein Sci., 4, 2517-2525.
Myers,R.M., Lerman,L.S. and Maniatis,T. (1985) Science, 229, 242-247.
Noyer-Weidner,M. and Trautner,T.A. (1993) In Jost,J.P. and Saluz,H.P. (eds), DNA Methylation: Molecular Biology and Biological Significance. Birkhäuser, Basle, pp. 40-108.
Nwosu,V.U., Connolly,B.A., Halford,S.E. and Garnett,J. (1988) Nucleic Acids Res., 16, 3705-3720.
Razin,A. and Cedar,H. (1994) Cell, 77, 473-476.
Reinisch,K.M., Chen,L., Verdine,G.L. and Lipscomb,W.N. (1995) Cell, 82, 143-153.
Roberts,R.J. and Halford,S.E. (1993) In Linn,S.M., Lloyd,R.S. and Roberts,R.J. (eds), Nucleases. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2nd edn, pp. 35-88.
Rost,B. and Sander,C. (1993) J. Mol. Biol., 232, 584-599.
Rost,B. and Sander,C. (1994) Proteins, 19, 55-77.
Sali,A. and Blundell,T.L. (1990) J. Mol. Biol., 212, 403-428.
Schluckebier,G., O'Gara,M., Saenger,W. and Cheng,X. (1995) J. Mol. Biol., 247, 16-20.
Smith,H.O., Annau,T.M. and Chandrasegaran,S. (1990) Proc. Natl Acad. Sci. USA, 87, 826-830.
Thielking,V. et al. (1991) Biochemistry, 30, 6416-6422.
Wende,W. (1994) PhD Thesis, Universität Hannover.
Wenz,C., Selent,U., Wende,W., Jeltsch,A., Wolfes,H. and Pingoud,A. (1994) Biochim. Biophys. Acta, 1219, 73-80.
Wilson,G.G. (1992) Methods Enzymol., 216, 259-279.

Received December 14, 1995; revised January 29, 1996; accepted February 5, 1996
