Exploring the evolutionary route of the acquisition of betaine aldehyde dehydrogenase activity by...

RESEARCH ARTICLE Open Access

Exploring the evolutionary route of the acquisition of betaine aldehyde dehydrogenase activity by plant ALDH10 enzymes: implications for the synthesis of the osmoprotectant glycine betaine

Rosario A Muñoz-Clares1*, Héctor Riveros-Rosas2, Georgina Garza-Ramos2, Lilian González-Segura1, Carlos Mújica-Jiménez1 and Adriana Julián-Sánchez2

Abstract

Background: Plant ALDH10 enzymes are aminoaldehyde dehydrogenases (AMADHs) that oxidize different ω-amino or trimethylammonium aldehydes, but only some of them have betaine aldehyde dehydrogenase (BADH) activity and produce the osmoprotectant glycine betaine (GB). The latter enzymes possess alanine or cysteine at position 441 (numbering of the spinach enzyme, SoBADH), while those ALDH10s that cannot oxidize betaine aldehyde (BAL) have isoleucine at this position. Only the plants that contain A441- or C441-type ALDH10 isoenzymes accumulate GB in response to osmotic stress. In this work we explored the evolutionary history of the acquisition of BAL specificity by plant ALDH10s.

Results: We performed extensive phylogenetic analyses and constructed and characterized, kinetically and structurally, four SoBADH variants that simulate the parsimonious intermediates in the evolutionary pathway from I441-type to A441- or C441-type enzymes. All mutants had a correct folding, average thermal stabilities and similar activity with aminopropionaldehyde, but whereas A441S and A441T exhibited significant activity with BAL, A441V and A441F did not. The kinetics of the mutants were consistent with their predicted structural features obtained by modeling, and confirmed the importance of position 441 for BAL specificity. The acquisition of BADH activity could have happened through any of these intermediates without detriment to the original function or protein stability. Phylogenetic studies showed that this event occurred independently several times during angiosperm evolution when an ALDH10 gene duplicate changed the critical Ile residue for Ala or Cys in two consecutive single mutations. ALDH10 isoenzymes frequently group in two clades within a plant family: one includes peroxisomal I441-type, the other peroxisomal and non-peroxisomal I441-, A441- or C441-type. Interestingly, high-GB-accumulator plants have non-peroxisomal A441- or C441-type isoenzymes, while low-GB accumulators have the peroxisomal C441-type, suggesting some limitations in the peroxisomal GB synthesis.

Conclusion: Our findings shed light on the evolution of the synthesis of GB in plants, a metabolic trait of great ecological and physiological relevance for their tolerance to drought, hypersaline soils and cold. Together, our results are consistent with smooth evolutionary pathways for the acquisition of the BADH function from ancestral I441-type AMADHs, thus explaining the relatively high occurrence of this event.

Keywords: Osmoprotection, Osmotic stress, Aminoaldehyde dehydrogenase, Enzyme kinetics, Substrate specificity, Enzyme subcellular location, Protein stability, Protein structure, Protein evolution

* Correspondence: [email protected]
1 Departamento de Bioquímica, Facultad de Química, Universidad Nacional Autónoma de México, México D.F., México
Full list of author information is available at the end of the article

© 2014 Muñoz-Clares et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Muñoz-Clares et al. BMC Plant Biology 2014, 14:149 http://www.biomedcentral.com/1471-2229/14/149





Background

The synthesis of the osmoprotectant glycine betaine (GB) is a metabolic trait of great adaptive importance that allows the plants possessing it to contend with osmotic stress caused by drought, salinity or low temperatures. Since these adverse environmental conditions are the major limitations of agricultural production, engineering the synthesis of GB in crops that naturally lack this ability has been, and still is, a biotechnological goal for improving their tolerance to osmotic stress (reviewed in [1]). Also, it is becoming increasingly appreciated that the GB content of an edible plant is valuable for human and animal nutrition [2].

In plants, GB is formed by the NAD+-dependent oxidation of betaine aldehyde (BAL) catalyzed by betaine aldehyde dehydrogenases (BADHs). Within the aldehyde dehydrogenase (ALDH) superfamily, plant BADHs belong to family 10 [3], whose members are ω-aminoaldehyde dehydrogenases (AMADHs) that in vitro can oxidize small aldehydes possessing an ω-primary amine group, such as 3-aminopropionaldehyde (APAL) and 4-aminobutyraldehyde (ABAL) [4-12], a trimethylammonium group, such as 4-trimethylaminobutyraldehyde (TMABAL) [9,12], or a dimethylsulfonium group, such as 3-dimethylsulfoniopropionaldehyde [4,5]. In vivo, depending on the substrate used, these enzymes may participate in diverse biochemical processes, which range from the catabolism of polyamines to the synthesis of several osmoprotectants (alanine betaine, 4-aminobutyrate, carnitine or 3-dimethylsulfoniopropionate) in addition to GB. Although the biochemically characterized plant ALDH10s oxidize all the above-mentioned aldehydes, only some of them efficiently use BAL as substrate [9,13-18] and therefore can participate in the synthesis of GB. The difference in BAL specificity among the plant ALDH10s was puzzling given the high structural similarity between BAL and the other ω-aminoaldehydes, as well as between the plant ALDH10 enzymes. Recently, by means of X-ray crystallography, in silico model building, site-directed mutagenesis, and kinetic studies of the ALDH10 enzyme from spinach (SoBADH), we found that a single amino acid residue, at position 441, is critical for whether an ALDH10 enzyme accepts or rejects BAL as a substrate [19]. This residue, located in the second sphere of interaction with the substrate behind the indole group of the tryptophan equivalent to W456 in SoBADH, determines the size of the pocket formed by the Tyr and Trp residues equivalent to Y160 and W456 (SoBADH numbering), where the bulky trimethylammonium group of BAL binds. If this residue is an Ile it pushes the Trp against the Tyr, thus hindering the binding of BAL, whereas if it is an Ala or a Cys the Trp adopts a conformation that leaves enough room for productive BAL binding [19]. This conclusion was drawn by Díaz-Sánchez et al. [19] by comparing the crystal structure of SoBADH (PDB code 4A0M) with those of the two pea AMADH enzymes, which do not have BADH activity (PsAMADH1 and PsAMADH2, PDB codes 3IWK and 3IWJ, respectively [12]). It was later confirmed by Kopečný et al. [20] when they reported the crystal structures of a maize ALDH10 isoenzyme, which contains Cys at the position equivalent to 441 (SoBADH numbering) and exhibits BADH activity (ZmAMADH1a; PDB code 4I8P), and of a tomato ALDH10 isoenzyme, which contains Ile at this position and is devoid of BADH activity (SlAMADH1; PDB code 4I9B). Moreover, by correlating the reported level of BADH activity of ALDH10 enzymes with the presence of either of these residues, Díaz-Sánchez et al. [19] predicted that those enzymes that have an Ile at position 441 (hereafter named I441-type isoenzymes) would have only AMADH activity, while those that have either Ala or Cys (hereafter named A441- or C441-type isoenzymes) would also exhibit BADH activity. Since an almost perfect correlation was found between the reported ability of a plant to accumulate GB and the presence of an ALDH10 isoenzyme with proved or predicted BADH activity, it was proposed that the absence of this kind of isoenzyme is a major limitation for the synthesis of GB in plants [19]. Indeed, a significant BADH activity would be necessary not only to produce significant levels of GB but also to prevent the accumulation, up to toxic concentrations, of BAL, which is formed in the oxidation of choline by choline monooxygenase (CMO).

Amino acid sequence analysis showed that most plants have two ALDH10 isoenzymes, probably as a consequence of gene duplication, and that the I441-type isoenzyme was the commonest [19]. The latter observation led to the suggestions that this residue corresponds to the ancestral feature in the plant ALDH10 family, and that a functional specialization occurred in some plants when the Ile at the position equivalent to 441 of SoBADH mutated to Ala or Cys in one of the two copies of the duplicated gene [19]. Since the codons for Ile differ from those for Ala or Cys in two positions, we reasoned that any of these changes had to occur through an intermediate. To explore the evolutionary history of the synthesis of GB in plants, we generated and characterized the SoBADH mutants A441V, A441S, A441T and A441F, which simulate the four parsimonious intermediates in the pathway from the plant ALDH10 isoenzymes exhibiting only AMADH activity, exemplified by the A441I SoBADH mutant, to those that also exhibit BADH activity, exemplified by the wild-type SoBADH or the A441C mutant. In this work, by comparing the kinetic properties and the thermostabilities of the mutants with those of the wild-type enzyme, we confirm that the size of the residue at position 441 greatly affects the specificity for betaine aldehyde, and conclude that the acquisition of the new BADH function occurred without detriment to either the oxidation of other aminoaldehydes or the protein stability. Also, we present here strong phylogenetic evidence that confirms that peroxisomal I441-type isoenzymes correspond to the ALDH10 ancestral form and that independent duplication events occurred in monocot and eudicot plants. Indeed, the change to A441-type isoenzymes was the commonest in eudicots, whereas the change to C441-type isoenzymes was the commonest in monocots.

Results

Phylogenetic analysis of the ALDH10 enzymes

We expanded the amino acid sequence alignments of plant ALDH10 enzymes, including in this phylogenetic study three times more sequences than in previous works [19,20]. The retrieved non-redundant sequences belong mainly to plants (122 sequences), but ALDH10 proteins were also found in fungi, protists, and proteobacteria; none were found in animals or archaea (Additional file 1: Table S1). Figure 1 shows an unrooted phylogenetic tree that includes all identified ALDH10 sequences (panel A), as well as detailed phylogenetic trees from monocots (panel B) and eudicots (panel C). As expected, land plants (Embryophytes) form a well-supported monophyletic group, as do Spermatophytes (seed plants) and Angiosperms (flowering plants). In Figure 1B it can be observed that primitive plants with a known genome, like Ostreococcus tauri, O. lucimarinus, Micromonas pusilla, Chlamydomonas reinhardtii, Volvox carteri (Chlorophyta), Physcomitrella patens (Bryophyta) and Selaginella moellendorffii (Lycopodiophyta), contain only one ALDH10 enzyme. Interestingly, all these enzymes possess Ile at the position equivalent to 441 (SoBADH numbering), which is also the residue most frequently found in ALDH10 enzymes of the other plant families (Figures 1B and 1C). Thus, among the 122 non-redundant plant ALDH10 sequences analyzed, 88 possess Ile, 19 Ala, and 10 Cys. Only three ALDH10 isoenzymes (from Vitis vinifera, Solanum tuberosum and Pandanus amaryllifolius) have Val at this position, and two (from Aeluropus lagopoides and Theobroma cacao) have Thr. These data strongly support the previous proposal [19] that I441-type isoenzymes correspond to the ancestral protein of the ALDH10 family.

Figure 1B also shows that all known monocot ALDH10 genes cluster together, which suggests that the duplicated ALDH10 genes in monocots originated after the monocot-eudicot divergence. All monocot plants of known genome possess two genes coding for ALDH10 proteins, except maize, which possesses three genes. As previously found [20], in the Poaceae family, which includes most of the known sequences from monocots, each of the two ALDH10 genes forms a different clade in the phylogenetic tree: one (which we name Poaceae 1) exclusively includes I441-type isoenzymes, while the second (which we name Poaceae 2) mainly contains C441-type. Because of the limited number of monocot ALDH10 sequences available, it is not yet possible to know whether or not every monocot family, besides Poaceae, possesses two ALDH10 isoenzymes.

Eudicots of known genomes have a variable number of genes coding for ALDH10 proteins (Figure 1C). Some species have only one gene: Ricinus communis (Euphorbiaceae), Citrus clementina (Rutaceae), Aquilegia coerulea (Ranunculaceae), Fragaria vesca (Rosaceae), Cucumis sativus (Cucurbitaceae), and Mimulus guttatus (Phrymaceae). Others have two genes: Arabidopsis thaliana, A. lyrata, Capsella rubella, Brassica rapa (Brassicaceae), Glycine max, Medicago truncatula (Fabaceae), Gossypium raimondii, Theobroma cacao (Malvaceae), Populus trichocarpa (Salicaceae), Solanum lycopersicum, S. tuberosum (Solanaceae), and Beta vulgaris (Amaranthaceae). One species, Vitis vinifera (Vitaceae), has three genes. Glycine max, in addition to the two ALDH10 genes, possesses an additional copy that corresponds to a pseudogene. The complex distribution pattern exhibited by ALDH10 genes in eudicots strongly suggests that several independent gene-duplication events occurred during their evolution after the monocot-eudicot divergence (Figure 1C). Thus, at least four independent duplication events, those that took place in Fabaceae, Salicaceae, Solanaceae and Amaranthaceae, exhibit a very high bootstrap support (>90%). In species of the Brassicaceae, Fabaceae, Salicaceae, Rosaceae, and Solanaceae families, the protein coded by the duplicate gene conserved the Ile at the position equivalent to 441, but in plants of the Amaranthaceae and Acanthaceae this residue was changed to an Ala, in Malvaceae to an Ala or Thr, and in the only sequenced species of Vitaceae to a Val. As in the case of monocots, two different clades can be observed in the phylogenetic tree of several eudicot families: the first includes the original I441-type isoenzymes, with the only exception of Solanum tuberosum, where this Ile mutated to a Val; the second includes the duplicate I441-type isoenzymes or the A441-, V441- or T441-type derived from the I441-type. Interestingly, the majority of the A441-type isoenzymes are clustered in the Amaranthaceae 2 clade, with the exception of the A441-type of Amaranthus hypochondriacus, which is phylogenetically very close to the I441-type of the same plant, suggesting a recent duplication event. The genome of this plant has not yet been completely sequenced, so it could be that this plant possesses another A441-type isoenzyme that groups with the Amaranthaceae 2. Also, we cannot yet explain the unexpected position of the A441-type isoenzyme from Ophiopogon japonicus (a monocot), which clustered with the A441-type isoenzymes in the Amaranthaceae 2 clade. Since the A441-type isoenzymes are predicted to have BADH activity, i.e. the ability to oxidize BAL [19], it is interesting that O. japonicus CMO also has higher amino acid sequence identity with CMO proteins from the Amaranthaceae family than with CMO from monocots [21]. One possible explanation for this anomalous behavior is that both genes were acquired by O. japonicus by horizontal gene transfer, which is a significant force in the evolution of plant genomes [22,23]. Further studies are needed to provide evidence in favor of or against this possibility.

Figure 1 Phylogenetic analysis of plant ALDH10 enzymes. A) Unrooted phylogenetic tree that includes all identified ALDH10 protein sequences, showing the taxonomic group to which they belong. B) Monocot and non-flowering plant ALDH10 sequences. C) Eudicot ALDH10 sequences. Indicated are the presence/absence of a peroxisomal-targeting signal PST1 that fits the consensus sequence (S/A/C)-(K/R/H)-(L/M) (in red and underlined) as well as the amino acid residue and codon at the position equivalent to 441 of SoBADH. The tree was inferred from 500 replicates using the ML method [61]. The best tree with the highest log likelihood (−32886.4851) is shown. Similar trees were obtained with MP, ME and NJ methods. The analysis involved 131 amino acid sequences (122 from plants and 9 from non-plants). In panel A the branches of the unrooted tree are drawn to scale, with the bar length indicating the number of substitutions per site. In panels B and C only the branch topology is shown. The proportion of replicate trees in which the associated taxa clustered together in a bootstrap test (500 replicates) is given next to the branches. Branches with a very low bootstrap value (<20%) are collapsed. For each sequence, the accession number and the name assigned in published papers (in the case of proteins previously studied) are given. X indicates an unidentified amino acid/nucleotide.

We confirmed the previous observation [19,20] that the majority of the I441-type isoenzymes possess a peroxisomal targeting signal type 1 (PST1) that fits the consensus sequence (S/A)-(K/R)-(L/M/I) [24,25], whereas all the A441-type and the majority of the C441-type isoenzymes lack it (Figure 1). In the case of the C441-type, the exceptions are those from maize, sorghum and foxtail millet (Setaria italica), which have the SKL signal, and that of Zoysia, which has an SKI signal. The peroxisomal targeting signal was lost by the change of a residue or by truncation of the C-terminal region. The codon that encodes the missing Leu in the peroxisomal targeting signal of some of the C441-type isoenzymes of Poaceae 2 can be changed to a stop codon with only one point mutation. The same occurs with the genes that code for A441-type isoenzymes from the Amaranthaceae family, where the codons for the first missing Ser can be transformed to a stop codon with just one point mutation. The sequence divergence pattern of the Amaranthaceae supports the proposal that the loss of the peroxisomal signal PST1 occurred in this family before the gene duplication that gave rise to the A441-type isoenzymes. In other eudicot and monocot families, the enzymes with a truncated or mutated C-terminus cluster in the phylogenetic clades 2, a finding that gives additional support to the idea that these enzymes derived from the peroxisomal I441-type of clades 1.
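The two checks described in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes the C-terminal consensus (S/A)-(K/R)-(L/M/I) quoted in the text (the Figure 1 legend gives a slightly different consensus), and the function names and toy sequences are hypothetical, not taken from the article's data.

```python
import re

# Consensus (S/A)-(K/R)-(L/M/I) at the extreme C-terminus, as quoted in the text.
# The pattern name and helper names below are illustrative assumptions.
PST1_PATTERN = re.compile(r"[SA][KR][LMI]$")

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def has_pst1(protein_seq: str) -> bool:
    """Return True if the last three residues fit the consensus tripeptide."""
    return bool(PST1_PATTERN.search(protein_seq.upper()))


def one_substitution_from_stop(codon: str) -> bool:
    """Return True if a single nucleotide change can turn the codon into a stop codon."""
    codon = codon.upper()
    return any(
        sum(a != b for a, b in zip(codon, stop)) == 1
        for stop in STOP_CODONS
    )


if __name__ == "__main__":
    # Toy C-terminal fragments (hypothetical, for illustration only).
    for fragment in ("MADGSKL", "MADGSKI", "MADGSK"):
        print(fragment, has_pst1(fragment))
    # TTA (Leu) and TCA (Ser) are each one substitution away from TAA or TGA.
    for codon in ("TTA", "TCA", "CTT"):
        print(codon, one_substitution_from_stop(codon))
```

A truncated fragment such as the third toy sequence no longer ends in the tripeptide, which mirrors the loss of the signal by C-terminal truncation described above.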

Possible ALDH10 evolutionary intermediates

The phylogenetic analysis described above strongly supports that A441- and C441-type isoenzymes evolved from the I441-type ones. Ile can be coded by three different triplets, ATT, ATC and ATA. Of these, the most frequently found in monocot and eudicot ALDH10 genes is ATT and the least frequent is ATA, which indeed was not found in monocots. Alanine is coded by four triplets, GCT, GCC, GCA, and GCG, of which GCT is the most used in eudicot ALDH10 genes. Cysteine is coded by two, TGT and TGC, and both are present in monocot ALDH10 genes. The observed frequency of each of these codons in monocots and eudicots is given in Figure 2A. From these data it can be observed that the triplet ATT at this position is more frequent than the ATC one.

Since the most parsimonious pathways from Ile to either Ala or Cys involve two nucleotide substitutions, there should have been intermediates in the evolution from the I441-type to the A441- or C441-type ALDH10 isoenzymes. Several pathways could be followed depending on the Ile codon of the original enzyme, but in all cases the amino acid substitution in the evolutionary intermediate has to be either Val or Thr in the pathway from Ile to Ala, and Phe or Ser in that from Ile to Cys (Figure 2B). This is consistent with the five different amino acids found at the critical position 441 in the ALDH10 enzymes of known sequence (Figure 1). Ile is the most frequently coded amino acid (71.9%), followed by Ala (15.7%), Cys (8.3%), Val (2.5%), and Thr (1.6%). Interestingly, ALDH10 isoenzymes containing Ser or Phe at the position equivalent to 441, which are the possible intermediates in the pathway from I441 to C441, have not been found so far. Since C441-type isoenzymes are present in monocots, and the number of available ALDH10 sequences from monocots is still low (30 sequences) when compared with the number of available eudicot sequences (82 sequences), it is to be expected that the missing intermediates will be found when the number of known monocot ALDH10 sequences increases.
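The codon argument above can be checked with a short Python sketch. It assumes the standard genetic code and considers only codon pairs that differ at exactly two positions; the function names are illustrative and are not part of the article's methods.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def parsimonious_intermediates(start_aa: str, end_aa: str) -> set:
    """Amino acids encoded by codons lying one substitution away from both
    a start_aa codon and an end_aa codon that differ at two positions."""
    starts = [c for c, aa in CODON_TABLE.items() if aa == start_aa]
    ends = [c for c, aa in CODON_TABLE.items() if aa == end_aa]
    found = set()
    for s, e in product(starts, ends):
        if hamming(s, e) != 2:
            continue  # keep only the two-substitution (most parsimonious) pairs
        for mid, aa in CODON_TABLE.items():
            if hamming(s, mid) == 1 and hamming(mid, e) == 1:
                found.add(aa)
    return found


if __name__ == "__main__":
    print("Ile -> Ala via:", sorted(parsimonious_intermediates("I", "A")))
    print("Ile -> Cys via:", sorted(parsimonious_intermediates("I", "C")))
```

Under these assumptions the enumeration returns Val and Thr for the Ile-to-Ala route and Phe and Ser for the Ile-to-Cys route, in agreement with the intermediates named in the text.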

Construction and kinetic characterization of the SoBADHA441 mutantsTo simulate the possible evolutionary intermediates wegenerated four SoBADH variants: A441V, A441S, A441T,and A441F, and we characterized them, both kineticallyand structurally. In a previous work [19] we had con-structed the A441I mutant, which represents the putativeoriginal ALDH10 isoenzyme with only AMADH activity,i.e., devoid of significant BADH activity, and the A441Cmutant, which, together with the wild-type SoBADH, rep-resents those isoenzymes that in addition to the AMADHactivity also have BADH activity. The kinetics of the wild-type SoBADH and of the A441I and A441C mutant en-zymes were previously studied using BAL, APAL, ABAL

Page 6: Exploring the evolutionary route of the acquisition of betaine aldehyde dehydrogenase activity by plant ALDH10 enzymes

Figure 2 Evolutionary pathways for the acquisition of BADHactivity by plant ALDH10 enzymes. Possible nucleotide changesin the pathway from the isoenzymes containing Ile at position 441(SoBADH numbering) to those containing Ala (A) or Cys (B) at thisposition. The frequency of the observed codons for monocots (M;30 sequences), and eudicots (E; 82 sequences) is given. Experimentallyobserved codons and amino acids are underlined.

Muñoz-Clares et al. BMC Plant Biology 2014, 14:149 Page 6 of 16http://www.biomedcentral.com/1471-2229/14/149

and TMABAL as substrates [19], at pH 8.0 and at fixed 0.2 mM NAD+, both of which are nearly physiological conditions [26,27]. We now extend that work by studying the steady-state kinetics of the other four mutants with BAL or APAL as substrates. APAL can be used as representative of the other ω-aminoaldehydes that do not have a bulky trimethylammonium group close to the carbonyl group, and whose binding is therefore not sterically constrained by the size of the cavity formed by the residues equivalent to Y160 and W456 (PaBADH numbering). Consequently, the kinetics of these aldehydes are similar to each other and very different from the kinetics of BAL, as previously found not only in the wild-type SoBADH but also in the A441C and A441I mutants [19].

Taking the kinetics of BAL as the criterion, two groups of enzymes were observed: one that includes the wild-type and the A441C, A441S and A441T mutants, which had relatively high kcat, low Km(BAL) and high kcat/Km(BAL) values, and another formed by the A441V, A441F and A441I mutants, which exhibited low kcat, high Km(BAL), and very low kcat/Km(BAL), particularly the A441I mutant (Figure 3 and Table 1). The enzymes in the first group have, therefore, significant BADH activity, while the enzymes in the second group will be devoid of this activity at the expected intracellular BAL levels, which should not be high given the known toxicity of this aldehyde [28]. It should be noted that the BADH activity of the enzymes in the first group is achieved not only by their much smaller Km(BAL) values, which most likely reflect a much better binding of the aldehyde, but also, although to a lesser extent, by their significantly higher kcat(BAL) values when compared with the enzymes of the second group. We also found clear differences between these two groups in their Km(NAD+) values, which were higher in the enzymes exhibiting BADH activity than in the enzymes with only AMADH activity, with the exception of A441F (Table 1). As the Km value for the first substrate in a bi-bi ordered steady-state kinetic mechanism depends not only on the second-order rate constant for the binding of this substrate but also on first-order rate constants associated with the steps after the central ternary complex is formed, the finding that the BADH enzymes have higher Km(NAD+) values than the AMADH-only ones could be due to the higher kcat values of the former.

The differences between the two groups of enzymes vanish when their kinetics with APAL as substrate are compared (Table 2), indicating that APAL oxidation was hardly affected by the kind of residue at position 441. The exception was the A441F mutant, which showed lower kcat/Km values for both BAL and APAL than the wild-type and the other mutant enzymes, which suggests important structural alterations in the active site of this mutant. The catalytic efficiencies (kcat/Km) for APAL of the wild-type and mutant SoBADH enzymes are higher than those for BAL (Tables 1 and 2), as has been found with other A441- or C441-type ALDH10 isoenzymes that exhibit BADH activity in addition to AMADH activity [7,9,20]. Another difference between the kinetics with BAL and APAL is that APAL produced a small but clear substrate inhibition in the wild-type and mutant SoBADHs, while this inhibition was not observed in the saturation kinetics with BAL in the concentration range studied. The observed degree of inhibition by high APAL concentrations was roughly the same in all the enzymes, but the highest APAL concentration used in our experiments, 0.2 mM, did not allow an accurate estimation of the substrate inhibition constant (KIS) values, which for this reason are not given in Table 2. As previously shown [29], substrate inhibition arises from the non-productive binding of the aldehyde to the enzyme-NADH complex. Therefore, these results suggest that the enzyme-NADH complexes have lower affinity for BAL than for APAL, as is also the case for the productive enzyme-NAD+ complexes, as judged by the Km values for the aldehydes.
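As a rough numerical illustration of the two kinetic groups described above, the following Python sketch computes kcat/Km from the kcat and Km(BAL) values reported in Table 1 for the wild-type and A441I enzymes. The function names are ours, and the substrate-inhibition expression is the common textbook form, assumed here only for illustration; it is not necessarily the article's Equation 1.

```python
def michaelis_menten(s, kcat, km):
    """Turnover rate (s^-1 per subunit) at substrate concentration s (same units as km)."""
    return kcat * s / (km + s)


def substrate_inhibition(s, kcat, km, kis):
    """Textbook substrate-inhibition rate law (assumed form, for illustration only)."""
    return kcat * s / (km + s + s * s / kis)


def catalytic_efficiency(kcat, km_um):
    """kcat/Km in mM^-1 s^-1, from kcat in s^-1 and Km in micromolar."""
    return kcat / (km_um / 1000.0)


# (kcat in s^-1, Km(BAL) in micromolar), copied from Table 1
table1_bal = {"Wild type": (3.36, 98.0), "A441I": (0.74, 1791.0)}

efficiency = {name: catalytic_efficiency(*p) for name, p in table1_bal.items()}
# Ratio of the wild-type efficiency to that of the I441-type mutant
fold = efficiency["Wild type"] / efficiency["A441I"]

for name, value in efficiency.items():
    print(f"{name}: kcat/Km(BAL) = {value:.2f} mM^-1 s^-1")
print(f"wild type / A441I = {fold:.0f}-fold")
```

The ratios computed this way can differ slightly from the tabulated kcat/Km values, because the latter are means of the parameters estimated in individual experiments.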

Figure 3 Effects of mutation of A441 on the steady-state kinetic parameters of SoBADH. Wild-type and mutant SoBADH enzymes were assayed at pH 8.0 and 30°C with BAL as variable substrate at fixed 0.2 mM NAD+. Other conditions are given in the Methods section. The kinetic parameter values were calculated from the best fit of initial velocity data to the Michaelis-Menten equation by non-linear regression. Each saturation curve was determined at least in duplicate using enzymes from two different purification batches. Bars indicate standard deviations. In the inset the kcat/Km values of A441V, A441F and A441I are plotted using a scale smaller than that of the main figure.

Table 1 Steady-state kinetic parameters of wild-type and mutant SoBADH enzymes in the oxidation of BAL

Variable substrate  Enzyme     kcat (s-1)    Km (μM)      kcat/Km (mM-1 s-1)
BAL                 Wild type  3.36 ± 0.13   98 ± 15      35 ± 4
BAL                 A441C      2.99 ± 0.19   90 ± 6       33 ± 2
BAL                 A441S      3.29 ± 0.12   119 ± 8      28 ± 3
BAL                 A441T      2.64 ± 0.16   180 ± 6      15 ± 1
BAL                 A441V      0.54 ± 0.05   512 ± 79     1.1 ± 0.3
BAL                 A441F      0.29 ± 0.02   605 ± 26     0.48 ± 0.05
BAL                 A441I      0.74 ± 0.05   1791 ± 115   0.41 ± 0.05
NAD+                Wild type  4.25 ± 0.16   22 ± 2       195 ± 8
NAD+                A441C      2.20 ± 0.09   14 ± 1       179 ± 17
NAD+                A441S      3.27 ± 0.18   29 ± 3       114 ± 20
NAD+                A441T      2.39 ± 0.00   24 ± 4       100 ± 15
NAD+                A441V      0.51 ± 0.00   6.4 ± 0.5    80 ± 5
NAD+                A441F      0.39 ± 0.02   18 ± 1       22 ± 0
NAD+                A441I      0.68 ± 0.03   2.8 ± 0.0    243 ± 11

Initial velocities were obtained at 30°C in 50 mM HEPES-KOH buffer, pH 8.0, containing 0.1 mM EDTA. In the experiments with variable BAL, the fixed concentration of NAD+ was 0.2 mM, and in the experiments with variable NAD+ the fixed BAL concentrations were at least 10-times their appKm values estimated for each enzyme at fixed 0.2 mM NAD+. The apparent kinetic parameters were estimated by non-linear regression fit of the experimental data to the Michaelis-Menten equation. The values given in the Table are the mean ± standard deviation of the kinetic parameters estimated in two duplicate saturation experiments performed with enzymes from two different purification batches. Values for kcat are expressed per subunit.

Table 2 Steady-state kinetic parameters of wild-type and mutant SoBADH enzymes in the oxidation of APAL

Variable substrate  Enzyme     kcat (s-1)    Km (μM)       kcat/Km (mM-1 s-1)
APAL                Wild type  0.99 ± 0.04   3.9 ± 0.1     256 ± 5
APAL                A441C      0.67 ± 0.02   0.72 ± 0.16   931 ± 188
APAL                A441S      1.12 ± 0.00   2.0 ± 0.5     550 ± 142
APAL                A441T      1.50 ± 0.12   4.6 ± 0.5     326 ± 13
APAL                A441V      0.52 ± 0.10   1.1 ± 0.3     473 ± 216
APAL                A441F      0.34 ± 0.04   3.7 ± 0.0     92 ± 11
APAL                A441I      1.85 ± 0.06   4.8 ± 1.3     375 ± 80
NAD+                Wild type  0.99 ± 0.04   4.0 ± 0.1     250 ± 7
NAD+                A441C      0.89 ± 0.00   2.6 ± 0.5     342 ± 76
NAD+                A441S      0.88 ± 0.02   2.8 ± 0.0     314 ± 13
NAD+                A441T      0.76 ± 0.02   4.0 ± 0.4     190 ± 16
NAD+                A441V      0.54 ± 0.06   1.7 ± 0.2     318 ± 2
NAD+                A441F      0.43 ± 0.01   5.9 ± 1.4     73 ± 17
NAD+                A441I      2.11 ± 0.01   5.5 ± 0.1     382 ± 9

Initial velocities were obtained at 30°C in 50 mM HEPES-KOH buffer, pH 8.0, containing 0.1 mM EDTA. In the experiments with variable APAL, the fixed concentration of NAD+ was 0.2 mM, and in the experiments with variable NAD+ the fixed APAL concentrations were at least 10-times their appKm values estimated for each enzyme at fixed 0.2 mM NAD+. The apparent kinetic parameters were estimated by non-linear regression fit of the experimental data to the Michaelis-Menten equation (saturation by NAD+ at fixed APAL) or to Equation 1 given in the main text (saturation by APAL at fixed NAD+). The values given in the Table are the mean ± standard deviation of the kinetic parameters estimated in two duplicate saturation experiments performed with enzymes from two different purification batches. Values for kcat are expressed per enzyme subunit. Substrate inhibition constants for APAL are not given because they could not be accurately estimated in the concentration range used in these experiments, but the observed degree of inhibition by high APAL concentrations was roughly the same in all the enzymes.

Structural characterization of the SoBADH A441 mutants
Since the two main aspects that determine the evolution of a protein are function and stability, we investigated whether the changes made at position 441 affect the structure and/or thermal stability of the mutant enzymes. The amino acid substitutions made at position 441 were well tolerated, and the levels of expression of soluble mutant proteins were similar to that of the wild-type enzyme (results not shown). The mutations affected neither the native dimeric state of the enzymes, as judged by gel filtration experiments (results not shown), nor the protein secondary structure, as judged by their almost identical far-UV CD spectra (Figure 4A). Changes in the protein tertiary structure could be detected in their CD spectra in the near-UV range (Figure 4B), where the signals originate from aromatic residues. The near-UV CD spectrum of wild-type SoBADH shows well-defined positive maximum bands at 284 and 291 nm, characteristic of tryptophan residues, and a minimum between 260 and 280 nm [30], which were also observed in the mutant enzymes. The exception was A441F, which exhibited an altered near-UV CD spectrum with a pronounced decrease in the intensity of the peaks at 284 and 291 nm and a reduction in the depth of the trough between 260 and 280 nm. These changes are probably the result of the interaction of the Phe benzyl ring with the neighboring side chain of W456, as will be discussed below.

The stability of the mutant SoBADH enzymes was measured by thermal denaturation, which was monitored by following the far-UV CD signal at 222 nm. As previously found with the wild-type enzyme [30], all mutant proteins were irreversibly denatured at 90°C and the melting curves exhibited monophasic transitions (Figure 4C). The irreversibility of the thermal denaturation of all enzymes precludes equilibrium thermodynamic analysis of the process. However, the use of the same measurement parameters and experimental conditions allowed us to evaluate the possible effect of the changed amino acid on the stability of the mutant enzymes by comparing the midpoints of their thermal transitions, i.e. their apparent Tm values. The estimated apparent Tm values of the A441C, A441S, A441T and A441F mutants were similar to that of the wild-type enzyme, around 50°C, but those of A441V and A441I were approximately 10°C higher (Figure 4C). These findings clearly indicate the stabilizing effect of a hydrophobic side chain of medium or large volume inside the protein at position 441. In the case of the A441F mutant, although the side chain introduced is highly hydrophobic and the minimized model of this mutant indicates that it may make π-stacking interactions with the side chain of W456 (see below), the strain exerted on the protein to accommodate the bulky benzyl ring of Phe, also indicated by the model, probably causes a decrease in the stability of this enzyme when compared with that of the A441I or A441V mutants. Leaving aside the A441F mutant, it is interesting that the differences in thermostability of the enzymes also indicate the existence of the same two groups identified by the differences in the kinetic parameters of the reaction with BAL as substrate. Clearly, both effects depend on the packing of the side chains in the region surrounding position 441.
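The apparent Tm used above is the midpoint of a sigmoidal melting curve; the legend of Figure 4 names a Boltzmann function for the fit. The sketch below, in plain Python, evaluates such a function and locates the midpoint by linear interpolation instead of performing the non-linear regression. The plateau signals and slope are hypothetical values chosen only to mimic curves with midpoints near 50°C and 60°C; they are not data from this work.

```python
import math


def boltzmann(temp, folded, unfolded, tm, slope):
    """Sigmoidal Boltzmann function: signal at `temp` for a two-plateau transition."""
    return folded + (unfolded - folded) / (1.0 + math.exp((tm - temp) / slope))


def apparent_tm(temps, signal):
    """Temperature at which the signal crosses halfway between its first and last values."""
    half = (signal[0] + signal[-1]) / 2.0
    for i in range(len(temps) - 1):
        s0, s1 = signal[i], signal[i + 1]
        if s0 != s1 and (s0 - half) * (s1 - half) <= 0:
            # Linear interpolation inside the interval that brackets the midpoint
            return temps[i] + (half - s0) * (temps[i + 1] - temps[i]) / (s1 - s0)
    raise ValueError("signal never crosses its midpoint")


temps = list(range(20, 91))  # 20-90 °C, the range stated for Figure 4C
# Hypothetical plateau values (arbitrary CD units) and slope; only tm differs
curve_low = [boltzmann(t, -30.0, -5.0, 50.0, 2.0) for t in temps]
curve_high = [boltzmann(t, -30.0, -5.0, 60.0, 2.0) for t in temps]

tm_low = apparent_tm(temps, curve_low)
tm_high = apparent_tm(temps, curve_high)
print(f"apparent Tm: {tm_low:.1f} and {tm_high:.1f} °C")
```

Because the denaturation is irreversible, such midpoints are only comparative values obtained under identical scan conditions, not thermodynamic parameters.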

Figure 4 Effects of mutation of A441 on the structural properties of SoBADH. Conformational characteristics of wild-type and mutant SoBADH enzymes examined by near- (A) and far-UV (B) CD spectra. (C) Thermal denaturation followed by changes at 222 nm in the far-UV CD signal. The temperature range was 20–90°C and the scan rate 1.5°C/min. The solid lines represent the best fit of the thermal transition data to a sigmoidal Boltzmann function by non-linear regression.

Models of the SoBADH A441 mutants
To interpret the observed kinetic and stability properties of the SoBADH mutants, we estimated the possible position and contacts of the changed 441 residue in the SoBADH structure by performing in silico mutations followed by energy minimization of the mutated structures. The results of these simulations are consistent with the known crystal structures of the ALDH10 enzymes from pea and tomato, which have Ile at position 441, and maize, which has a Cys at this position (Figure 5). This supports the validity of the models of the mutant enzymes for which there is no homologous crystal structure. When compared with the wild-type enzyme, the models of A441V and, particularly, of A441I show a displacement of W456 similar to that observed in the crystal structures of the I441-type isoenzymes of pea and tomato. This displacement narrows the cavity where the trimethylammonium group of BAL binds, thus explaining the low BADH activity of these two mutants. In contrast, W456 occupies almost the same position in the models of A441C, A441S and A441T as in the wild-type spinach and maize enzymes (Figure 5A and B), which is consistent with the significant BADH activity exhibited by these three mutants.

Figure 5 Structural comparisons of the A441 region of ALDH10 isoenzymes. (A) Superimposition of the side chains of residues at position 441, 443 and 456 (SoBADH numbering) in the known crystal structures of plant ALDH10 enzymes. Side chains are shown as sticks with oxygen atoms in red, nitrogen in blue, and sulphur in yellow. Carbon atoms are green in SoBADH (PDB code 4A0M), cyan in PsAMADH2 (PDB code 3IWJ), magenta in ZmAMADH1a (PDB code 4I8P), and black in SlAMADH1 (PDB code 4I9B). (B) Superimposition of the same region in the minimized models of the in silico SoBADH mutants in which the residue at position 441 was changed. Side chains are shown as sticks with oxygen atoms in red, nitrogen in blue, and sulphur in yellow. Carbon atoms are green in the wild-type enzyme, magenta in A441C, grey in A441S, yellow in A441T, brown in A441V, salmon in A441F, and cyan in A441I. In the figure, the wild-type and A441T models mask the A441C and A441S models, respectively. The figure was generated using PyMOL (www.pymol.org).

In the crystal structure of SoBADH the side chain of A441 interacts only with the active site residue W456, but in the known structures of the other three plant ALDH10s the residue at this position also makes contacts with W443 (SoBADH numbering) (Additional file 2: Figure S1A and Table S2). In all plant ALDH10s of known sequence, W456 is highly conserved (97.5%) and W443 strictly conserved (100%). W443 is located at the interface between monomers and has a similar conformation in the ALDH10 crystal structures so far determined. The models of the SoBADH A441 mutants show that the number of contacts made by the side chain of the residue at position 441 increases considerably as the size of its side chain increases (Additional file 2: Figure S1B and Table S2). Regardless of the position of the thiol group at the start of the simulation, we always obtained a model of A441C that has the thiol group pointing at W443, as in the ZmAMADH1a crystal structure, where the sulphur of the cysteine residue at position 441 makes closer contacts with W448 (equivalent to W443 in SoBADH) than with W461 (equivalent to W456 in SoBADH). The serine residue in A441S can have two alternative positions: one in which the hydroxyl group points at W443 and another in which it points at W456. The first of these two possible conformations, which is similar to that adopted by Cys, is less stable since the hydroxyl in this conformation would be too far from the aromatic ring of W443 to make any interaction with it. The second conformation, which is the one in the model shown in Additional file 2: Figure S1B, is favored because the hydroxyl group makes van der Waals contacts, and possibly polar-π interactions, with the ring of W456. In the A441T mutant, the hydroxyl group has only one possible conformation, the one that points at W456, similar to that of the Ser in the A441S mutant model. In the A441T minimized model the distance of the hydrogen atom of the hydroxyl group from the centroid of the aromatic face of the W456 ring is 2.96 Å, suggesting the possible existence of a hydroxyl-aromatic-ring hydrogen bond of the kind described by Levitt and Perutz [31]. The Val side chain in A441V fits well in this position and makes contacts with W456. The side chain of the Ile in the mutant A441I makes almost the same contacts as those observed in the crystal structures of the pea and tomato enzymes (PDB codes 3IWJ and 4I9B, respectively), and similarly pushes the side chain of W456. Finally, in the mutant A441F, the only possible way to accommodate the bulky benzyl group is by stacking against the aromatic ring of W456. However, the A441F model showed a clash of the F441 ring with the main chain in the region of residues P455, W456 and G457, which causes this region to move in order to accommodate the phenylalanine residue, thus narrowing the aldehyde entrance tunnel (not shown).
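The hydroxyl-to-ring distance quoted above is a point-to-centroid measurement. The following Python sketch shows how such a distance could be computed from atomic coordinates. It uses an idealized planar six-membered ring and a hypothetical hydrogen position placed directly above the ring centre; these are assumptions made for illustration, not coordinates taken from the A441T model.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the units of the input."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Idealized aromatic six-membered ring in the xy-plane (radius 1.39 Å, assumed)
ring = [
    (1.39 * math.cos(math.radians(60 * k)), 1.39 * math.sin(math.radians(60 * k)), 0.0)
    for k in range(6)
]
# Hypothetical hydroxyl hydrogen located 2.96 Å above the ring centre
hydroxyl_h = (0.0, 0.0, 2.96)

d = distance(hydroxyl_h, centroid(ring))
print(f"H to ring centroid: {d:.2f} Å")
```

With real coordinates, the ring atoms would be read from the minimized model and the same two functions applied unchanged.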

Discussion
Importance of size, polarity and conformation of the side chain at position 441 of SoBADH for the kinetics and stability
All data in this work agree with our previous proposal that only one amino acid residue, at position 441, is critical for ALDH10 enzymes to accept or reject BAL as substrate [19]. I441-type isoenzymes possess low or very low activity with BAL, whereas A441- and C441-type isoenzymes exhibit high activity with BAL [19]. The SoBADH mutants that have a residue of similar size to Ala or Cys, such as A441S or A441T, exhibit a high activity with BAL, but the mutants with a bulky nonpolar residue, such as A441V or A441I, have a very low activity with BAL (Figure 3). The exquisite sensitivity of the SoBADH affinity for BAL to the size of the side chain of the residue at position 441 is reflected in the finding that the Km(BAL) of the A441T mutant is lower than that of A441V. Val and Thr have similar sizes, but the methyl group of Val pointing at W456 is replaced by a hydroxyl group in Thr. Since the van der Waals radius of oxygen is 0.27 Å lower than that of carbon [32], this difference would result in a lesser steric impediment of W456 to the binding of the trimethylammonium group of BAL in the A441T than in the A441V mutant. Also, the polar-π interaction between the hydroxyl group of T441 and the aromatic ring of W456 suggested by the model would reduce the distance between these two groups, thus widening the trimethylammonium-binding cavity when compared with that of the A441V mutant. Although the van der Waals radius of oxygen is 0.39 Å lower than that of sulphur [32], the A441C mutant has a slightly but significantly lower Km(BAL) than A441S, which can be explained by the different conformations adopted by their side chains according to the models (Figure 5B and Additional file 2: Figure S1B). In the A441C mutant the atom closest to the W456 ring is a hydrogen, as in the wild-type enzyme, whereas in the A441S mutant the position of this hydrogen is occupied by a bulkier hydroxyl group. This could explain why the A441C SoBADH mutant has kinetic parameters for BAL similar to those of the wild-type enzyme. Regarding the mutant A441F, which has the bulkiest of the side chains introduced at this position, it was surprising to us that it had a lower Km(BAL) than the mutant A441I. This may be due to the stacking of the aromatic ring of F441 with that of W456, as suggested by the model, which would result in a more compact packing of both side chains, and therefore in a lesser steric impediment of W456 to the binding of the trimethylammonium group of BAL than in the A441I mutant. The movement of the main chain, also suggested by our model, and the consequent narrowing of the aldehyde-binding tunnel could explain the low kcat/Km(APAL) exhibited by this enzyme. In summary, it is not only the size but also the conformation adopted by the side chain of residue 441 that matters in determining the position of the side chain of W456, and therefore the size of the pocket where the trimethylammonium group of BAL binds and the affinity for BAL. Assuming that Km values are an indication of affinity, the size of the residue at position 441 also appears to affect the binding of the nucleotide, although to a much lesser extent than the binding of the aldehyde, for reasons that are not yet clear.

The acquisition of new functions by proteins is limited because most mutations have destabilizing effects, particularly if the mutated residue is buried [33], as is the residue at position 441. However, the findings that the variants of SoBADH with six different residues at position 441 were correctly folded and have thermal stabilities in the range expected for mesophilic enzymes indicate that this position is highly evolvable and able to accommodate several mutations, some of them giving rise to a new function: the BADH activity and, therefore, the capacity to participate in the synthesis of the osmoprotectant GB. The higher thermal stability of the A441V and A441I mutants relative to that of the wild-type and A441C, A441S, and A441T enzymes is consistent with the extensive interactions made by the side chains of Val and Ile, as indicated by our models and the known crystal structures of ALDH10 enzymes that contain Ile at this position, and with the hydrophobicity of these two residues. In the case of the A441F mutant, the strain exerted on the protein by the bulky benzyl ring of Phe probably causes a decrease in the stability of this enzyme when compared with that of the A441I or A441V mutants.

Evolution of plant ALDH10 enzymes
One of the strategies used by plants to respond to the demands of their environment involves the evolution of novel metabolic pathways by recruiting enzymes from old ones. The phylogenetic evidence presented here indicates that this is the case for the enzymes exhibiting BADH activity that participate in the synthesis of the osmoprotectant glycine betaine (GB), which in angiosperms evolved from I441-type ALDH10 enzymes.

Our results show that non-flowering plants possess only one ALDH10 gene expressing an I441-type enzyme, but flowering plants (monocots plus eudicots) have a variable number of ALDH10 genes coding for I441-, A441- or C441-type isoenzymes. The presence of additional gene copies in monocots and eudicots can be explained by several small- and large-scale duplications, which were accompanied by gene losses and genome rearrangements [34]. Indeed, the distribution of the duplicated ALDH10 enzymes that can be seen in the phylogenetic tree in Figure 1 is in agreement with previous observations: (i) various eudicot families independently underwent genome duplication [35,36]; (ii) one whole-genome duplication occurred early in the monocot lineage after its divergence from the eudicot lineage [37]; and (iii) recent duplication events took place in maize, soybean and grape [34]. Duplication is a prominent feature of the plant genomic architecture, and much of plant diversity may have arisen following the duplication and adaptive specialization of pre-existing genes [38]. In this context, duplication of the ALDH10 gene in some plants allowed a functional specialization when I441 mutated to A441 or C441 in one of the two copies of the duplicated gene, as proposed by Díaz-Sánchez et al. [19]. The ancient enzymes most probably had the same AMADH activity as the present-time I441-type isoenzymes, but they may also have had a vestigial BADH activity, which was improved during the evolution of plants via an evolutionary pathway of sequential single mutations of the duplicates. In this way, the A441- or C441-type isoenzymes acquired an extra BADH activity, necessary for GB synthesis, while the original peroxisomal I441-type isoenzymes remained as AMADHs devoid of BADH activity and most likely retained their metabolic role. Although the gain of the BADH activity does not imply the loss of the other AMADH activities when assayed in vitro ([19]; this work), the above observations indicate that the physiological functions of these isoenzymes are not interchangeable. Indeed, in many plant species the duplication of the ALDH10 gene did not result in functional redundancy, since the duplicated genes gave rise not only to enzymes with BADH activity but also to non-peroxisomal I441-type AMADH isoenzymes, as in the Solanaceae, Malvaceae, Rosaceae and Brassicaceae families, whose different cellular location probably allows them to perform a different physiological function from that of the peroxisomal ones. This is one possible reason for the permanence of the duplicate genes that encode non-peroxisomal I441-type isoenzymes.

Our phylogenetic analysis also shows that all known plant genomes contain at least one ALDH10 gene coding for a peroxisomal I441-type isoenzyme, which is the one present even in those plants with a sequenced genome that possess only one ALDH10 gene. This underscores the importance of the biochemical processes in which these peroxisomal I441-type isoenzymes participate, and indicates that, as mentioned above, in vivo they cannot be replaced by the ALDH10 isoenzymes that have gained the extra BADH activity, i.e. the A441- or C441-type isoenzymes. Together, our findings strongly support that the activity of the peroxisomal I441-type isoenzymes is essential for the plant, which is not unexpected considering that the AMADH activity of these enzymes is involved in the catabolism of polyamines taking place in the peroxisome [39]. As mentioned above, the cytosolic I441-type isoenzyme may be involved in other physiological processes, for instance the synthesis of several osmoprotectants, such as 4-aminobutyrate, β-alanine, or carnitine, by oxidizing ABAL, APAL or TMABAL. The different intracellular location of the ALDH10 isoenzymes, predicted and in some cases experimentally proven, suggests that there are two kinds of the I441-type (peroxisomal, the commonest, and cytosolic), both devoid of significant BADH activity, and possibly two kinds of the A441- and C441-type (chloroplastic and cytosolic in the first case, and peroxisomal and cytosolic in the second), all of them having BADH activity.

Regarding the intracellular location, the Amaranthaceae A441-type isoenzymes are most likely chloroplastic, as are those from spinach [40] and sugarbeet [28], although they lack a typical chloroplast-targeting signal [14]. Therefore, in the Amaranthaceae the synthesis of GB most likely takes place in the chloroplasts, as experimentally proven for spinach [13]. This is in accordance with the chloroplastic location of the CMO enzyme in this plant [41] and with the fact that the CMO activity requires the electrons provided by reduced ferredoxin [17] and plant ferredoxins are plastidic proteins [42]. But in the Poaceae plants that have the C441-type isoenzyme, GB synthesis may take place in the cytosol, since some of these isoenzymes are non-peroxisomal and do not have a clear chloroplast-targeting signal, or in peroxisomes, since other C441-type isoenzymes have the peroxisomal signal. Indeed, it has been suggested that in barley the synthesis of GB may take place in the peroxisome, given the finding of a peroxisomal CMO in this plant [43]. But this proposal is at odds with the proven non-peroxisomal location of the C441-type ALDH10 isoenzyme of barley [9], and with the fact that peroxisomal plant ferredoxins have not been reported so far. Regardless of the peroxisomal or non-peroxisomal CMO location, it seems probable that the need to transport either choline or BAL into the peroxisomes limits the synthesis of GB in those plants with C441-type peroxisomal isoenzymes in comparison with those in which this isoenzyme is non-peroxisomal. This is in full agreement with the finding that the level of GB accumulation observed in cereal plants that have a predicted peroxisomal C441-type isoenzyme (maize, sorghum and foxtail millet) is much lower than that in plants that have the non-peroxisomal C441-type, such as wheat and barley [44-46]. It could be predicted that other Poaceae species which have a moderate ability to accumulate GB, such as Pennisetum [44] and Panicum [46], have peroxisomal C441-type isoenzymes, while those that are high GB accumulators, such as Secale [44,46], have the non-peroxisomal C441-type isoenzyme.

Since the ability to synthesize the osmoprotectant GB protects the plant against the most frequent environmental stresses, such as salinity, drought, and low temperatures, as well as indirectly against other stresses that usually accompany the former, such as oxidative stress and high temperatures (reviewed in [47,48]), it is clear that the presence of A441- or C441-type isoenzymes provides a strong adaptive advantage, not only to halophytes, which grow in habitats where saline soils are prevalent, but also to mesophytes, which may experience sporadic episodes of water deficit or freezing temperatures. This, together with the relative ease of the evolutionary process, explains why the A441- and C441-type isoenzymes evolved independently several times through the evolution of flowering plants. However, the A441- or C441-type isoenzymes exhibit a limited phyletic distribution (Figures 1B and 1C), which agrees with the finding that GB accumulation is also restricted to some eudicot and monocot families [19]. This finding further supports the proposal that a significant BADH activity is essential for GB accumulation [19]. Indeed, a significant BADH activity would be necessary not only to produce the levels of GB needed for an osmoprotectant, but also to prevent the accumulation of the BAL produced by CMO up to toxic concentrations. But although the BADH activity is a sine qua non condition for the plant to become a GB-accumulator, the current experimental evidence, obtained studying wild-type GB-accumulator plants and transgenic non-accumulator plants that express BADH or CMO genes (reviewed in [1]), suggests that the acquisition of the ability to synthesize GB by evolving a BADH activity in some plant ALDH10 enzymes should have been accompanied by other adaptations, such as: (i) the gain of a significant CMO activity; (ii) the adequate intracellular targeting of both the CMO and BADH enzymes; (iii) the regulation by osmotic stress signals of the expression of the CMO and BADH genes, as well as of those coding for critical enzymes of the choline biosynthetic pathway; and (iv) probably also, the development of an efficient choline and GB transport through intracellular membranes.
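The sequential single mutations by which I441 is proposed to have changed to A441 or C441 can be enumerated at the codon level. The Python sketch below lists the amino acids encoded halfway along every two-nucleotide path between Ile codons and Ala or Cys codons. It assumes the standard genetic code and point substitutions only; the function names are ours and are not part of the analyses performed in this work.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def two_step_intermediates(src, dst):
    """Amino acids encoded after the first of two point substitutions from src to dst."""
    found = set()
    for a, b in product(codons_for(src), codons_for(dst)):
        diff = [i for i in range(3) if a[i] != b[i]]
        if len(diff) != 2:
            continue  # keep only codon pairs exactly two substitutions apart
        for i in diff:
            mid = a[:i] + b[i] + a[i + 1:]  # apply one of the two changes first
            found.add(CODON_TABLE[mid])
    return found


print("Ile -> Ala via:", sorted(two_step_intermediates("I", "A")))
print("Ile -> Cys via:", sorted(two_step_intermediates("I", "C")))
```

Under these assumptions the intermediates are Val or Thr on the way to Ala, and Phe or Ser on the way to Cys, which are the four residues introduced in the SoBADH variants studied here.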

Evolutionary pathway for the acquisition of BADH activityby plant ALDH10 enzymesOur results indicate that plant ALDH10 enzymes mayhave followed several evolutionary pathways for the acqui-sition of BADH activity, particularly the ones involvingany of the parsimonious intermediates with Val, Thr, Serat position equivalent to 441 of SoBADH. The evolution-ary intermediate carrying F441 would have a diminishedAMADH activity, as judged by the low kcat/Km values ofthe SoBADH A441F mutant, which could make the routethough it less probable. But as the enzymes that undergothese mutations were coded by gene duplicates and theoriginal I441-type enzymes were conserved, the plantwould not have experienced any deleterious effect even ifthe mutation to Phe took place. But this mutation wouldhave not provided any advantage to the plant either, incontrast to the Ala o Ser mutations, and even to the muta-tion that led to non-peroxisomal I441-type isoenzymes.Thus, the Phe441 mutation could have been eliminatedduring evolution. Interestingly, natural ALDH10s of theT441-type (2 sequences) and V441-type (3 sequences),which the intermediary mutants in the parsimoniouspathway to A441 from I441, were found within the plantALDH10 protein sequences retrieved from public data-bases. We anticipate that enzymes of the S441-type will befound when more monocot sequences are known. Therelatively high frequency of A441-type ALDH10 isoen-zymes may result from a bias in the available reports, sincemost of the studies have been carried out in halophyticplants in which these enzymes participate in their osmoto-lerance mechanisms. Regarding the C441-type isoen-zymes, they are present in cereal plants, which have beenextensively studied for their high agronomical interest.

Page 13: Exploring the evolutionary route of the acquisition of betaine aldehyde dehydrogenase activity by plant ALDH10 enzymes

Figure 6 Comparison of thermal stabilities and catalyticefficiencies of wild-type and A441 SoBADH mutants. Thepercentage of catalytic efficiency for the oxidation of BAL (greendown triangles) or APAL (red up triangles) was calculated from thekcat/Km values given in Table 1 and Table 2, respectively, taking as100% the values of the A441I mutant, which was assumed torepresent the ancestral AMADH enzyme. Thermal stability data(apparent Tm values) are those given in Figure 4. SoBADH Enzymes:A, wild-type; C, A441C; S, A441S; T, A441T; V, A441V; F, A441F; I, A441I.

Muñoz-Clares et al. BMC Plant Biology 2014, 14:149 Page 13 of 16http://www.biomedcentral.com/1471-2229/14/149

Indeed, in cereals an evolutionary process directed by humans to increase their tolerance to drought or saline stress may have occurred.

The mutation of the original Ile residue at this position to Val, Thr or Ser could be considered neutral, since it would not have had any deleterious effect on the primitive AMADH function, as judged by the kcat/Km(APAL) values of the mutant enzymes studied here. But while the mutation to Val would yield an enzyme that is still an AMADH devoid of significant BADH activity, the mutation to Thr or Ser would produce an important increase in the vestigial BADH activity of the original enzyme, which would make these enzymes true BADHs. The kinetic data of the mutant enzymes suggest that the BADH activity of the T441-type or S441-type intermediates would be increased further if a second mutation to Ala or Cys took place. The same BADH activity would be gained after the mutation of Val to Ala. These increases could be of enough physiological importance to explain the selection of the enzymes with this second mutation to either Ala or Cys, which in turn would explain, at least in part, the relatively higher frequency of the A441- or C441-type isoenzymes over that of the V441-, T441-, or S441-type.

The decrease in the thermal stability of those ALDH10 enzymes that underwent the mutational event by which they acquired BADH activity, as suggested by our results with the SoBADH mutants at position 441, would not compromise their stability under physiological conditions. Neither would folding be affected, as judged by the similar level of expression of these mutant enzymes as soluble proteins in E. coli. As a summary, Figure 6 compares the thermal stabilities and catalytic efficiencies of the seven SoBADH proteins studied in this work (the wild type and six mutants) to illustrate the possible fitness landscape of the evolution of BADH activity within the ALDH10 family. As can be seen, the change of the Ile at position 441 does not affect the catalytic efficiency with APAL as substrate but has a profound effect on the catalytic efficiency with BAL. The thermostability cost of this change would be outweighed by the advantage of gaining BADH activity. Indeed, this may be the reason for the scarcity of V441-type isoenzymes in present-day plant ALDH10s, which is rather surprising given that the change of I441 for V441 has almost no effect on either AMADH activity or protein stability. It can be speculated that a subsequent mutation to A441 is so advantageous for the plant that most of the V441 enzymes evolved to this final stage.
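The idea of parsimonious intermediates between Ile and Ala or Cys can be made concrete by enumerating, under the standard genetic code, the residues that lie one nucleotide substitution away from an Ile codon and one further substitution away from an Ala or Cys codon. The Python sketch below is illustrative only: the function names are hypothetical, and the enumeration assumes equal weight for all substitutions, ignoring codon usage and mutation bias.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, base in product(range(3), BASES):
        if base != codon[i]:
            yield codon[:i] + base + codon[i + 1:]


def intermediates(start_aa, end_aas):
    """Map each intermediate residue to the end residues it can reach.

    A path is start codon -> intermediate codon -> end codon, with one
    nucleotide substitution per step (hypothetical helper for illustration).
    """
    found = {}
    for c0 in (c for c, aa in CODE.items() if aa == start_aa):
        for c1 in neighbors(c0):
            mid = CODE[c1]
            if mid in (start_aa, "*") or mid in end_aas:
                continue  # skip synonymous, stop and direct changes
            for c2 in neighbors(c1):
                if CODE[c2] in end_aas:
                    found.setdefault(mid, set()).add(CODE[c2])
    return found


paths = intermediates("I", {"A", "C"})
for mid, ends in sorted(paths.items()):
    print(f"I -> {mid} -> {'/'.join(sorted(ends))}")
```

Under these assumptions the enumeration returns Val and Thr as intermediates toward Ala, and Ser and Phe as intermediates toward Cys, which are the four residues at position 441 examined in this work.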

Conclusions
The phylogenetic, biochemical, and structural evidence presented here supports relatively smooth evolutionary processes for the gain of BADH activity in plants, and therefore for the acquisition of the potential to synthesize the osmoprotectant GB, since evolution took place in duplicated genes and the four parsimonious evolutionary intermediates are functionally and structurally feasible, particularly three of them. Our results strongly support the view that ancestral genes encoding peroxisomal I441-type isoenzymes devoid of BADH activity mutated on several occasions during terrestrial plant evolution, so that the present-day A441- or C441-type isoenzymes that exhibit BADH activity do not form a monophyletic group inside the ALDH10 protein family. We also provide additional evidence of the critical role played by the residue at position 441 (SoBADH numbering) in the affinity of plant ALDH10 enzymes for BAL. Finally, our findings about the differences in intracellular location of the isoenzymes with BADH activity indicate that the peroxisomal synthesis of GB is constrained by some factor(s), and explain the known fact that there are high- and medium-GB accumulator plants, the former having non-peroxisomal A441- or C441-type isoenzymes, the latter being those with a peroxisomal C441-type.

Methods
Chemicals and biochemicals
Betaine aldehyde chloride, the diethylacetal of APAL and NAD+ were obtained from Sigma-Aldrich Química, S.A. de C.V. (Toluca, México). APAL was freshly prepared by hydrolyzing its diethylacetal form as described by Flores and Filner [49]. The exact concentration of the resulting free APAL was determined in each experiment from the amount of NADH produced after its complete oxidation in the reaction catalyzed by SoBADH in the presence of an excess of NAD+.
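The end-point titration of APAL described above rests on the 1:1 stoichiometry between aldehyde oxidized and NADH formed. A minimal sketch of the arithmetic follows, assuming that NADH is read at 340 nm with the commonly used molar absorptivity of 6220 M−1 cm−1; neither the wavelength nor this constant is stated in the text, and the function name and example numbers are hypothetical.

```python
# Assumption (not given in the text): NADH molar absorptivity at 340 nm.
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1


def aldehyde_stock_mM(delta_a340, path_length_cm=1.0, dilution_factor=1.0):
    """Aldehyde concentration (mM) in the stock solution.

    delta_a340      total absorbance increase after complete oxidation
    path_length_cm  optical path of the cuvette
    dilution_factor fold dilution of the stock in the assay mixture
    One NADH is formed per aldehyde oxidized, so [aldehyde] = [NADH].
    """
    nadh_molar = delta_a340 / (EPSILON_NADH_340 * path_length_cm)
    return nadh_molar * dilution_factor * 1000.0


# Hypothetical reading: 0.311 absorbance units after a 100-fold dilution.
stock = aldehyde_stock_mM(0.311, dilution_factor=100)
print(f"APAL stock: {stock:.2f} mM")
```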

Site-directed mutagenesis, production and purification of wild-type and mutant SoBADH enzymes
The plasmid pET28-SoBADH, containing the full sequence of the spinach BADH gene and an N-terminal His-tag [19], was used as template for site-directed mutagenesis, which was performed via polymerase chain reaction (PCR) using the Quick Change XL-II Site Directed Mutagenesis system (Agilent) and the following mutagenic primers: A441V, GAAGGCTCTAGAAGTTGGAGTTGTTTGGGTTAATTGCTCAC (forward) and TTGTGAGCAATTAACCCAAACAACTCCAACTTCTAGAGCC (reverse); A441S, GAAGGCTCTAGAAGTTGGAACTGTTTGGGTTAATTGCTCAC (forward) and TTGTGAGCAATTAACCCAAACAGTTCCAACTTCTAGAGCC (reverse); A441T, GAAGGCTCTAGAAGTTGGAACCGTTTGGGTTAATTGCTCAC (forward) and TTGTGAGCAATTAACCCAAACGGTTCCAACTTCTAGAGCC (reverse); A441F, GAAGGCTCTAGAAGTTGGATTTGTTTGGGTTAATTGCTCAC (forward) and TTGTGAGCAATTAACCCAAACAAATCCAACTTCTAGAGCC (reverse). The non-complementary mutagenic codons are in italics. Mutagenesis was confirmed by DNA sequencing. The expression and purification of the recombinant proteins were carried out as reported [19]. Protein concentrations were determined spectrophotometrically, using the molar absorptivity at 280 nm of 86,400 M−1 cm−1 deduced from the amino acid sequence by the method of Gill and von Hippel [50].
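A quick consistency check of the primer pairs listed above can be sketched as follows. The script confirms that each reverse primer is the reverse complement of its forward partner over their shared region and extracts the mutagenic codon. The codon position (forward-primer bases 20 to 22) is an assumption inferred from where the four printed primers differ, and the helper names are hypothetical. Read in this frame, the A441S pair as printed carries ACT, which encodes Thr in the standard genetic code, so the printed sequence for that entry may contain a typographical error; the constructs themselves were confirmed by sequencing.

```python
PRIMERS = {
    "A441V": ("GAAGGCTCTAGAAGTTGGAGTTGTTTGGGTTAATTGCTCAC",
              "TTGTGAGCAATTAACCCAAACAACTCCAACTTCTAGAGCC"),
    "A441S": ("GAAGGCTCTAGAAGTTGGAACTGTTTGGGTTAATTGCTCAC",
              "TTGTGAGCAATTAACCCAAACAGTTCCAACTTCTAGAGCC"),
    "A441T": ("GAAGGCTCTAGAAGTTGGAACCGTTTGGGTTAATTGCTCAC",
              "TTGTGAGCAATTAACCCAAACGGTTCCAACTTCTAGAGCC"),
    "A441F": ("GAAGGCTCTAGAAGTTGGATTTGTTTGGGTTAATTGCTCAC",
              "TTGTGAGCAATTAACCCAAACAAATCCAACTTCTAGAGCC"),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Subset of the standard genetic code, enough for the codons seen here.
CODON_TO_AA = {"GTT": "V", "ACT": "T", "ACC": "T", "TTT": "F",
               "GCT": "A", "TCT": "S", "AGT": "S"}
CODON_SLICE = slice(19, 22)  # assumed position of the mutagenic codon


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(forward, reverse):
    """Length of the forward-primer suffix matching the start of the
    reverse complement of the reverse primer (0 if they do not overlap)."""
    rc = reverse_complement(reverse)
    for start in range(len(forward)):
        if rc.startswith(forward[start:]):
            return len(forward) - start
    return 0


codons = {name: fwd[CODON_SLICE] for name, (fwd, _) in PRIMERS.items()}
for name, (fwd, rev) in PRIMERS.items():
    codon = codons[name]
    print(name, overlap_length(fwd, rev), codon, CODON_TO_AA.get(codon, "?"))
```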

Activity assay and kinetic characterization of the wild-type and mutant SoBADH enzymes
Steady-state initial velocities of wild-type SoBADH and its mutants were measured spectrophotometrically as reported [19] at pH 8.0 using 0.2 mM NAD+ and variable concentrations of BAL or APAL. The fixed concentration of NAD+ used in these experiments was saturating, as indicated by the finding of very close kcat values in experiments where the nucleotide was the variable substrate and the aldehyde was kept constant at a concentration ten times its Km value. All assays were initiated by addition of the enzyme. Each saturation curve was determined at least in duplicate using enzymes from two different purification batches. Initial velocity data were analyzed by non-linear regression calculations using the Michaelis-Menten equation. Equation 1 was used for the APAL saturation data, which exhibited substrate inhibition:

$$\frac{v}{[E]} = \frac{k_{cat}\,[S]}{K_m + [S]\left(1 + [S]/K_{IS}\right)} \qquad (1)$$

where v is the experimentally determined initial velocity, [E] is the enzyme concentration, kcat is the first-order rate constant for product formation at saturating substrate (equal to the maximal velocity, Vmax, divided by the enzyme concentration), [S] is the concentration of the variable substrate, Km is the concentration of substrate at half-maximal velocity, and KIS is the substrate inhibition constant. Although substrate inhibition of SoBADH by aminoaldehydes is partial [19], the highest substrate concentration used in our experiments did not allow distinguishing between total and partial inhibition, and therefore Equation 1 for total inhibition was used.
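To illustrate how Equation 1 behaves, the following sketch evaluates the rate law for total substrate inhibition and locates the substrate concentration at which velocity peaks. Setting the derivative of Equation 1 with respect to [S] to zero gives [S] = sqrt(Km·KIS). The parameter values are hypothetical and are not taken from the kinetic tables; the function names are illustrative.

```python
import math


def rate_per_enzyme(s, kcat, km, kis=math.inf):
    """v/[E] according to Equation 1 (total substrate inhibition).

    With kis = infinity the inhibition term vanishes and the expression
    reduces to the Michaelis-Menten equation.
    """
    return kcat * s / (km + s * (1.0 + s / kis))


def optimal_substrate(km, kis):
    """[S] giving the maximum velocity, from d(v/[E])/d[S] = 0."""
    return math.sqrt(km * kis)


# Hypothetical parameters: kcat in s^-1, Km and KIS in mM.
KCAT, KM, KIS = 2.0, 0.01, 1.0
s_opt = optimal_substrate(KM, KIS)
v_opt = rate_per_enzyme(s_opt, KCAT, KM, KIS)
print(f"velocity peaks at [S] = {s_opt:.3f} mM, v/[E] = {v_opt:.3f} s^-1")
```

Because the velocity declines on both sides of this optimum, saturation curves that stop near it cannot distinguish total from partial inhibition, which is consistent with the choice of Equation 1 stated above.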

Circular dichroism spectroscopy
CD signals were recorded with a Jasco J-715 spectropolarimeter (Jasco Inc., Easton, MD) equipped with a Peltier-type temperature control system (Model PTC-423S) and calibrated with d-10-(+)-camphorsulfonic acid. Near-UV (250–320 nm) and far-UV (200–250 nm) CD spectra were recorded for samples of 0.25 or 1.0 mg/mL protein concentration, respectively, placed in quartz cuvettes of 1.0-cm and 0.1-cm path length, respectively. Data were collected at 0.5 nm (near-UV) or 1.0 nm (far-UV) intervals, with a bandwidth of 1.0 nm and a scan rate of 20 nm/min. Spectra were averaged over 5 scans and the average spectrum of a reference sample without protein was subtracted. The observed ellipticities were converted to mean residue ellipticities [Θ] on the basis of a mean molecular mass per residue of 109.1. Thermally induced protein denaturation was monitored by following the changes in ellipticity at 222 nm while increasing the temperature from 20 to 90°C at a constant rate of 1.5°C/min. Apparent Tm values were determined by non-linear regression fit of the data to a single Boltzmann sigmoidal function. ORIGIN software (OriginLab Corp.) was used for data analysis and display.
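The two calculations mentioned in this section can be sketched as follows. The conversion to mean residue ellipticity uses the standard relation [Θ] = θ·MRW/(10·l·c), with θ in millidegrees, l in cm and c in mg/mL. The sigmoid is written in the usual four-parameter Boltzmann form, which is assumed here to be the one fitted; the function names and example readings are hypothetical.

```python
import math


def mean_residue_ellipticity(theta_mdeg, conc_mg_per_ml, path_cm, mrw=109.1):
    """[Theta] in deg cm^2 dmol^-1 from an observed ellipticity in mdeg.

    mrw is the mean molecular mass per residue (109.1 in this work).
    """
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_per_ml)


def boltzmann(temp, folded, unfolded, tm, slope):
    """Four-parameter Boltzmann sigmoid; the signal is midway between the
    folded and unfolded baselines when temp equals tm (apparent Tm)."""
    return unfolded + (folded - unfolded) / (1.0 + math.exp((temp - tm) / slope))


# Hypothetical far-UV reading: -20 mdeg at 1.0 mg/mL in a 0.1-cm cuvette.
mre = mean_residue_ellipticity(-20.0, conc_mg_per_ml=1.0, path_cm=0.1)

# Hypothetical melting curve sampled from 20 to 90 degrees C.
curve = [boltzmann(t, folded=-10000.0, unfolded=-3000.0, tm=50.0, slope=2.0)
         for t in range(20, 91, 10)]
print(round(mre), [round(y) for y in curve])
```

In practice the four parameters would be estimated by non-linear regression against the measured ellipticity at 222 nm; the sketch only evaluates the model.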

In silico mutagenesis and modeling
Mutations at position 441 in the crystal structure of SoBADH (PDB code 4A0M) were generated in silico using the standard rotamer library of Coot [51]. Mutant models were subjected to a 1000-step energy-minimization process using the Amber force field parameters in UCSF Chimera [52].

Sequence analyses
ALDH10 amino acid and nucleotide sequences were retrieved by Blast searches at the NCBI site ([53]; http://blast.ncbi.nlm.nih.gov/Blast.cgi) or the Phytozome v9.1 database ([54]; http://www.phytozome.net/). Progressive multiple amino acid sequence alignments were performed with ClustalX version 2 ([55]; http://www.clustal.org/clustal2/) using as a guide a structural alignment constructed with the VAST algorithm [56] that included all non-redundant ALDH10 protein structures in the PDB [57]. Amino acid sequence alignments were corrected manually using BioEdit ([58]; http://www.mbio.ncsu.edu/bioedit/bioedit.html) according to gapped BLASTP results. To identify protein sequences as members of the ALDH10 family, the phylogenetic analyses previously performed by Julián-Sánchez et al. [59] were used as a reference. Phylogenetic analyses were conducted using the MEGA5 software ([60]; http://www.megasoftware.net). Four methods were used to infer phylogenetic relationships: maximum likelihood (ML), maximum parsimony (MP), minimum evolution (ME), and neighbor joining (NJ). The amino acid substitution model described by Whelan and Goldman [61], using a discrete Gamma distribution with five categories, was chosen as the best substitution model, since it gave the lowest Bayesian Information Criterion values and corrected Akaike Information Criterion values [62] in MEGA5 [60]. The gamma shape parameter value (+G parameter = 1.1824) was estimated directly from the data with MEGA5. Confidence for the internal branches of the phylogenetic tree obtained using the ML method was determined through bootstrap analysis (500 replicates each).
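The model-selection step can be illustrated with the textbook definitions of the criteria involved: BIC = k·ln(n) − 2·lnL and AICc = 2k − 2·lnL + 2k(k+1)/(n − k − 1), where k is the number of free parameters and n the sample size. The sketch below does not reproduce MEGA5 output: the candidate names, log-likelihoods and parameter counts are hypothetical, and taking n as the number of alignment sites is an assumption.

```python
import math


def aic(log_likelihood, k):
    return 2.0 * k - 2.0 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction term."""
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_likelihood, k, n):
    return k * math.log(n) - 2.0 * log_likelihood


# Hypothetical candidates: name -> (log-likelihood, free parameters).
candidates = {
    "model_A": (-12500.0, 120),
    "model_B": (-12300.0, 121),  # one extra parameter, e.g. a gamma shape
}
n_sites = 500  # assumed sample size

best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
best_aicc = min(candidates, key=lambda m: aicc(*candidates[m], n_sites))
print(best_bic, best_aicc)
```

Both criteria penalize extra parameters, so a richer model is preferred only when its gain in likelihood exceeds the penalty.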

Additional files

Additional file 1: Table S1. Aldehyde dehydrogenases identified as members of the ALDH10 family.

Additional file 2: Figure S1. Interactions of the residue at position equivalent to A441 of SoBADH. Table S2. Distances of the side-chain atoms of the residue at position 441 to their closest neighbors.

Abbreviations
ALDH: Aldehyde dehydrogenase; ABAL: 4-Aminobutyraldehyde; AMADH: Aminoaldehyde dehydrogenase; APAL: 3-Aminopropionaldehyde; BADH: Betaine aldehyde dehydrogenase; BAL: Betaine aldehyde; CMO: Choline monooxygenase; GB: Glycine betaine; PDB: Protein data bank; PsAMADH: Pea AMADH; SlAMADH: Tomato AMADH; SoBADH: Spinach BADH; TMABAL: 4-Trimethylaminobutyraldehyde; WT: Wild type; ZmAMADH: Maize AMADH.

Competing interests
All the authors declare that they have no competing interests.

Authors’ contributions
RAMC conceived, designed and coordinated the study, analyzed kinetic and structural data, and wrote the manuscript. HRR and AJS carried out the phylogenetic analysis, interpreted and wrote the results of this analysis, and contributed to the general discussion and writing of the manuscript. CMJ constructed and purified the mutant proteins and performed the kinetic experiments. GGR carried out the CD and thermostability experiments and analyzed these data. LGS made the in silico mutants, constructed their models and helped with the making of the figures. All authors read and approved the final manuscript.

Acknowledgements
This work was supported by UNAM-PAPIIT grants IN217814 to RAMC and IN216513 to HRR. We thank Dr. LP Martínez-Castilla (Faculty of Chemistry, UNAM) for helpful discussions and Dr. I. Chávez-Béjar (Faculty of Chemistry, UNAM) for her help with the sequencing of the recombinant mutant enzymes.

Author details
1 Departamento de Bioquímica, Facultad de Química, Universidad Nacional Autónoma de México, México D.F., México. 2 Departamento de Bioquímica, Facultad de Medicina, Universidad Nacional Autónoma de México, México D.F., México.

Received: 27 March 2014 Accepted: 22 May 2014 Published: 29 May 2014

References
1. Chen THH, Murata N: Glycinebetaine protects plants against abiotic stress: mechanisms and biotechnological applications. Plant Cell Environ 2011, 34(1):1–20.
2. Craig SA: Betaine in human nutrition. Am J Clin Nutr 2004, 80(3):539–549.
3. Vasiliou V, Bairoch A, Tipton KF, Nebert DW: Eukaryotic aldehyde dehydrogenase (ALDH) genes: human polymorphisms, and recommended nomenclature based on divergent evolution and chromosomal mapping. Pharmacogenetics 1999, 9(4):421–434.
4. Vojtechová M, Hanson AD, Muñoz-Clares RA: Betaine-aldehyde dehydrogenase from amaranth leaves efficiently catalyzes the NAD-dependent oxidation of dimethylsulfoniopropionaldehyde to dimethylsulfoniopropionate. Arch Biochem Biophys 1997, 337(1):81–88.
5. Trossat C, Rathinasabapathi B, Hanson AD: Transgenically expressed betaine aldehyde dehydrogenase efficiently catalyzes oxidation of dimethylsulfoniopropionaldehyde and ω-aminoaldehydes. Plant Physiol 1997, 113(4):1457–1461.
6. Šebela M, Brauner F, Radová A, Jacobsen S, Havliš J, Galuszka P, Peč P: Characterisation of a homogeneous plant aminoaldehyde dehydrogenase. Biochim Biophys Acta 2000, 1480(1–2):329–341.
7. Livingstone JR, Maruo T, Yoshida I, Tarui Y, Hirooka K, Yamamoto Y, Tsutui N, Hirasawa E: Purification and properties of betaine aldehyde dehydrogenase from Avena sativa. J Plant Res 2003, 116(2):133–140.
8. Oishi H, Ebina M: Isolation of cDNA and enzymatic properties of betaine aldehyde dehydrogenase from Zoysia tenuifolia. J Plant Physiol 2005, 162(10):1077–1086.
9. Fujiwara T, Horia K, Ozaki K, Yokota Y, Mitsuya S, Ichiyanagi T, Hattori T, Takabe T: Enzymatic characterization of peroxisomal and cytosolic betaine aldehyde dehydrogenases in barley. Physiol Plant 2008, 134(1):22–30.
10. Bradbury LMT, Gillies SA, Brushett D, Waters DLE, Henry RJ: Inactivation of an aminoaldehyde dehydrogenase is responsible for fragrance in rice. Plant Mol Biol 2008, 68(4–5):439–449.
11. Brauner F, Šebela M, Snégaroff J, Peč P, Meunier J-C: Pea seedling aminoaldehyde dehydrogenase: primary structure and active site residues. Plant Physiol Biochem 2003, 41(1):1–10.
12. Tylichová M, Kopečný D, Moréra S, Briozzo P, Lenobel R, Snégaroff J, Šebela M: Structural and functional characterization of plant aminoaldehyde dehydrogenase from Pisum sativum with a broad specificity for natural and synthetic aminoaldehydes. J Mol Biol 2010, 396(4):870–882.
13. Hanson AD, May AM, Grumet R, Bode J, Jamieson GC, Rhodes D: Betaine synthesis in chenopods: localization in chloroplasts. Proc Natl Acad Sci U S A 1985, 82(11):3678–3682.
14. Weretilnyk EA, Hanson AD: Betaine aldehyde dehydrogenase from spinach leaves: purification, in vitro translation of the mRNA, and regulation by salinity. Arch Biochem Biophys 1989, 271(1):56–63.
15. Arakawa K, Takabe T, Sugiyama T, Akazawa T: Purification of betaine-aldehyde dehydrogenase from spinach leaves and preparation of its antibody. J Biochem 1987, 101(6):1485–1488.
16. Valenzuela-Soto E, Muñoz-Clares RA: Purification and properties of betaine aldehyde dehydrogenase extracted from detached leaves of Amaranthus hypocondriacus L. subjected to water deficit. J Plant Physiol 1994, 143(2):145–152.
17. Burnet M, Lafontaine PJ, Hanson AD: Assay, purification, and partial characterization of choline monooxygenase from spinach. Plant Physiol 1995, 108(2):581–588.


18. Hibino T, Meng YL, Kawamitsu Y, Uehara N, Matsuda N, Tanaka Y, Ishikawa H, Baba S, Takabe T, Wada K, Ishii T, Takabe T: Molecular cloning and functional characterization of two kinds of betaine-aldehyde dehydrogenase in betaine-accumulating mangrove Avicennia marina (Forsk.) Vierh. Plant Mol Biol 2001, 45(3):353–363.
19. Díaz-Sánchez AG, González-Segura L, Mújica-Jiménez C, Rudiño-Piñera E, Montiel C, Martínez-Castilla LP, Muñoz-Clares RA: Amino acid residues critical for the specificity for betaine aldehyde of the plant ALDH10 isoenzyme involved in the synthesis of glycine betaine. Plant Physiol 2012, 158(4):1570–1582.
20. Kopečný D, Konečitíková R, Tylichová M, Vigouroux A, Moskalíková H, Soural M, Šebela M, Moréra S: Plant ALDH10 family: identifying critical residues for substrate specificity and trapping a thiohemiacetal intermediate. J Biol Chem 2013, 288(13):9491–9507.
21. Joseph S, Murphy D, Bhave M: Glycine betaine biosynthesis in saltbushes (Atriplex spp.) under salinity stress. Biologia 2013, 68(5):879–895.
22. Richardson AO, Palmer JD: Horizontal gene transfer in plants. J Exp Bot 2007, 58(1):1–9.
23. Bock R: The give-and-take of DNA: horizontal gene transfer in plants. Trends Plant Sci 2010, 15(1):11–22.
24. Lingner T, Kataya AR, Antonicelli GE, Benichou A, Nilssen K, Chen X-Y, Siemsen T, Morgenstern B, Meinicke P, Reumann S: Identification of novel plant peroxisomal targeting signals by a combination of machine learning methods and in vivo subcellular targeting analyses. Plant Cell 2011, 23(4):1556–1572.
25. Chowdhary G, Kataya ARA, Lingner T, Reumann S: Non-canonical peroxisome targeting signals: identification of novel PTS1 tripeptides and characterization of enhancer elements by computational permutation analysis. BMC Plant Biol 2012, 12:142.
26. Werdan K, Heldt HW: Accumulation of bicarbonate in intact chloroplasts following a pH gradient. Biochim Biophys Acta 1972, 283(3):430–441.
27. Heineke D, Riens B, Grosse H, Hoferichter P, Peter U, Flugge U, Heldt HW: Redox transfer across the inner chloroplast envelope membrane. Plant Physiol 1991, 95(4):1131–1137.
28. Rathinasabapathi B, McCue KF, Gage DA, Hanson AD: Metabolic engineering of glycine betaine synthesis: plant betaine aldehyde dehydrogenases lacking typical transit peptides are targeted to tobacco chloroplasts where they confer betaine aldehyde resistance. Planta 1994, 193(2):155–162.
29. Vojtechová M, Rodríguez-Sotres R, Valenzuela-Soto EM, Muñoz-Clares RA: Substrate inhibition by betaine aldehyde dehydrogenase from leaves of Amaranthus hypochondriacus L. Biochim Biophys Acta 1997, 1341(1):49–57.
30. Garza-Ramos G, Mújica-Jiménez C, Muñoz-Clares RA: Potassium and ionic strength effects on the conformational and thermal stability of two aldehyde dehydrogenases reveal structural and functional roles of K+-binding sites. PLoS One 2013, 8(1):e54899.

31. Levitt M, Perutz MF: Aromatic rings act as hydrogen bond acceptors. J Mol Biol 1988, 201(4):751–754.
32. Alvarez S: A cartography of the van der Waals territories. Dalton Trans 2013, 42(24):8617–8636.
33. Tokuriki N, Stricher F, Serrano L, Tawfik DS: How protein stability and new functions trade off. PLoS Comput Biol 2008, 4(2):e1000002.
34. Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K: The flowering world: a tale of duplications. Trends Plant Sci 2009, 14(12):680–688.
35. Van de Peer Y, Maere S, Meyer A: The evolutionary significance of ancient genome duplications. Nat Rev Genet 2009, 10(10):725–732.
36. Fawcett JA, Maere S, Van de Peer Y: Plants with double genomes might have had a better chance to survive the Cretaceous–Tertiary extinction event. Proc Natl Acad Sci U S A 2009, 106(14):5737–5742.
37. Tang H, Bowers JE, Wang X, Paterson AH: Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad Sci U S A 2010, 107(1):472–477.
38. Flagel LE, Wendel JF: Gene duplication and evolutionary novelty in plants. New Phytol 2009, 183(3):557–564.
39. Moschou PN, Sanmartin M, Andriopoulou AH, Rojo E, Sanchez-Serrano JJ, Roubelakis-Angelakis KA: Polyamine oxidase responsible for a full back-conversion pathway in Arabidopsis. Plant Physiol 2008, 147(4):1845–1857.
40. Weigel P, Weretilnyk EA, Hanson AD: Betaine aldehyde oxidation by spinach chloroplasts. Plant Physiol 1986, 82(3):753–759.
41. Brouquisse R, Weigel P, Rhodes D, Yocum CF, Hanson AD: Evidence for a ferredoxin-dependent choline monooxygenase from spinach chloroplast stroma. Plant Physiol 1989, 90(1):322–329.
42. Hanke G, Mulo P: Plant type ferredoxins and ferredoxin-dependent metabolism. Plant Cell Environ 2013, 36(6):1071–1084.
43. Mitsuya S, Kuwahara J, Ozaki K, Saeki E, Fujiwara T, Takabe T: Isolation and characterization of a novel peroxisomal choline monooxygenase in barley. Planta 2011, 234(6):1215–1226.
44. Hitz WD, Hanson AD: Determination of glycine betaine by pyrolysis-gas chromatography in cereals and grasses. Phytochemistry 1980, 19(11):2371–2374.
45. Lerma C, Rich PJ, Ju GC, Yang W-J, Hanson AD, Rhodes D: Betaine deficiency in maize: complementation tests and metabolic basis. Plant Physiol 1991, 95(4):1113–1119.
46. Ishitani M, Arakawa K, Mizuno N, Kishitani S, Takabe T: Betaine aldehyde dehydrogenase in the gramineae: levels in leaves of both betaine-accumulating and nonaccumulating cereal plants. Plant Cell Physiol 1993, 34(3):493–495.
47. Chen THH, Murata N: Glycinebetaine: an effective protectant against abiotic stress in plants. Trends Plant Sci 2008, 13(9):499–505.
48. Ahmad R, Lim CJ, Kwon S-Y: Glycine betaine: a versatile compound with great potential for gene pyramiding to improve crop plant performance against environmental stresses. Plant Biotechnol Rep 2013, 7:49–57.

49. Flores HE, Filner P: Polyamine catabolism in higher plants: characterization of pyrroline dehydrogenase. Plant Growth Reg 1985, 3:277–291.
50. Gill SC, von Hippel PH: Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem 1989, 182(2):319–326.
51. Emsley P, Cowtan K: Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 2004, 60:2126–2132.
52. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE: UCSF Chimera--a visualization system for exploratory research and analysis. Comput Chem 2004, 13:1605–1612.
53. Benson DA, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucl Acids Res 2014, 42:D32–D37.
54. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS: Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 2012, 40:D1178–D1186.
55. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947–2948.
56. Madej T, Lanczycki CJ, Zhang D, Thiessen PA, Geer RC, Marchler-Bauer A, Bryant SH: MMDB and VAST+: tracking structural similarities between macromolecular complexes. Nucleic Acids Res 2013, 42:D297–D303.
57. Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, Dutta S, Green RK, Goodsell DS, Prlic A, Quesada M, Quinn GB, Ramos AG, Westbrook JD, Young J, Zardecki C, Berman HM, Bourne PE: The RCSB protein data bank: new resources for research and education. Nucleic Acids Res 2013, 41:D475–D482.
58. Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 1999, 41:95–98.
59. Julián-Sánchez A, Riveros-Rosas H, Martínez-Castilla LP, Velasco-García R, Muñoz-Clares RA: Phylogenetic and structural relationships of the betaine aldehyde dehydrogenases. In Enzymology and Molecular Biology of Carbonyl Metabolism. 13th edition. Edited by Weiner H, Plapp B, Lindahl R, Maser E. West Lafayette, IN: Purdue University Press; 2007:64–76.
60. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011, 28(10):2731–2739.
61. Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 2001, 18(5):691–699.
62. Posada D, Buckley TR: Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 2004, 53(5):793–808.

doi:10.1186/1471-2229-14-149
Cite this article as: Muñoz-Clares et al.: Exploring the evolutionary route of the acquisition of betaine aldehyde dehydrogenase activity by plant ALDH10 enzymes: implications for the synthesis of the osmoprotectant glycine betaine. BMC Plant Biology 2014 14:149.


Table S1. Aldehyde dehydrogenases identified as members of ALDH10 family

Organisms with a completely sequenced genome are shaded in light blue. Phytozome data are colored green and NCBI data black.

Organism

Lineage

Genome assembly

Number of genes

ALDH10

Protein accession number

Gene locus

Amino acid sequence length

Plants

Chlorophyta

Mamiellophyceae

Micromonas pusilla CCMP1545

Chlorophyta; Mamiellophyceae; Mamiellales

Phytozome: partially finished v3 genome assembly and annotation

10 660 total loci containing protein-coding transcripts

XP_003064309

(C1N8N9)

MicpuC2.e_gw1.17.47.1

(MICPUCDRAFT_23534)

505 aa

Micromonas pusilla RCC299

Chlorophyta; Mamiellophyceae; Mamiellales

Phytozome: finished v3 genome assembly and annotation

17 chromosomes

10 103 total loci containing protein-coding transcripts

Absent

Ostreococcus lucimarinus CCE9901

Chlorophyta; Mamiellophyceae; Mamiellales

Phytozome: finished v.2 genome assembly and annotation

21 chromosomes

7,796 total loci containing protein-coding transcripts

XP_001420154

estExt_Genewise_ext.C_Chr_100439

OSTLU_50750

Chromosome 10

523 aa

Ostreococcus tauri

Chlorophyta; Mamiellophyceae; Mamiellales

8 110 genes

CAL56178

XP_003081654

Ot10g02210

Chromosome 10

506 aa

Trebouxiophyceae

Coccomyxa subellipsoidea C-169

Chlorophyta; Trebouxiophyceae; Coccomyxaceae

Phytozome: improved v2 genome assembly and annotation

9 629 total loci containing protein-coding transcripts

EIE20955

I0YRD6

fgenesh1_pm.14_#_152

COCSUDRAFT_37718

498 aa

Chlorophyceae

Chlamydomonas reinhardtii (strain: CC-503 cw92 mt+)

(Chlamydomonas smithii)

Chlorophyta; Chlorophyceae; Chlamydomonadales; Chlamydomonadaceae

Phytozome: v5.3.1 of Chlamydomonas annotations

17 linkage groups (chromosomes)

12,264 stable gene (13,450 stable transcript)

XP_001699134

A8JAX9

Cre13.g605650

CHLREDRAFT_24394

504 aa

Volvox carteri f. nagariensis (strain: Eve, forma: nagariensis)

(Green alga)

Chlorophyta; Chlorophyceae; Chlamydomonadales; Volvocaceae

Phytozome: version 2, 8x genome assembly and annotation

14,971 total loci containing protein-coding transcripts

(314 total alternatively spliced transcripts)

XP_002947147

D8TLH8

Vocar20010574m.g

VOLCADRAFT_73155

503 aa

Streptophyta


Embryophyta

Physcomitrella patens subsp. patens (Moss)

Streptophyta; Embryophyta; Bryophyta; Bryophytina; Bryopsida; Funariidae; Funariales; Funariaceae

Phytozome: v3.0 assembly (preliminary)

27 chromosomes

26610 loci containing protein-coding transcripts

XP_001756623

A9RQZ4

PHYPADRAFT_204867

507 aa

Selaginella moellendorffii (Spikemoss)

Streptophyta; Embryophyta; Tracheophyta; Lycopodiophyta; Isoetopsida; Selaginellales; Selaginellaceae

Phytozome: v1.0 Dec 20, 2007 FilteredModels3 annotation

27 chromosomes

22 285 protein-coding transcripts

XP_002974519

D8RTU8

174224

SELMODRAFT_174224

503 aa

XP_002990779

D8T5B3

SELMODRAFT_272160

503 aa

(redundant copy because the assembled genome includes two haplotypes)

Gymnosperm

Picea sitchensis (Sitka spruce) (Pinus sitchensis)

Tracheophyta; Spermatophyta; Coniferopsida; Coniferales; Pinaceae

ABK24463

A9NV02

(ABK24261)

503 aa

Angiosperm

Liliopsida (monocots)

Aegilops tauschii (Tausch's goatgrass)

Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Triticeae

EMT10403

M8C5I2

F775_31015

503 aa

Aeluropus lagopoides

Liliopsida; Poales; Poaceae; PACMAD clade; Chloridoideae; Cynodonteae; Aeluropinae

AEV53927

G9JNB0

334 aa (fragment)

Agropyron cristatum (Crested wheatgrass) (Bromus cristatus)

Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Triticeae

ACZ67850

D2K6H6

394 aa (fragment)

Brachypodium distachyon

Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Brachypodieae

BioProject PRNJ74771

v1.0 8x assembly

5 chromosomes

26,552 loci containing protein-coding transcripts

XP_003579919

LOC100826926

Chromosome 5

506 aa

XP_003574495

LOC100845613

Chromosome 3

501 aa


Hordeum brevisubulatum

Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Triticeae

AAS66641

505 aa

Hordeum vulgare subsp. vulgare (domesticated barley)

Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Triticeae /cultivar="Haruna-nijyo"

BAB62846

BBD2

503 aa

BAB62847

BBD1

506 aa

Hordeum vulgare

Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Triticeae /cultivar: 116 Jeonju Native Korca

Q40024

LOC548296

505 aa

(BAB62847 ortholog)

Leymus chinensis

Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Triticeae

BAD86758

LcBADH2

502 aa

BAM10432

BAD86757

LcBADH1

506 aa

Lolium perenne (Perennial ryegrass)

Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Poeae; Loliinae

AFA36547

288 aa (fragment)

Ophiopogon japonicus

Liliopsida; Asparagales; Asparagaceae; Nolinoideae

ABG34273

Q153G6

BADH

500 aa

Oryza sativa Japonica Group

Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; BEP clade; Ehrhartoideae; Oryzeae

BioProject PRJNA13141:

30534 genes (12 chromosomes)

28555 proteins

BioProject PRJNA13139

37032 genes (12 chromosomes)

35394 proteins

XP_482470

NP_001061833

BADH2

Os08g0424500

(Phytozome: LOC_Os08g32870)

Chromosome 8

503 aa

O24174

NP_001053016

Os04g0464200

BADH1

(Phytozome: LOC_Os04g39020)

Chromosome 4

505 aa


Oryza sativa Indica Group (long-grained rice)

Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; BEP clade; Ehrhartoideae; Oryzeae

BioProject PRJN361

39285 genes (12 chromosomes)

37358 proteins

CM000133

Chromosome 8

503 aa

(unreported gene)

EEC77423

B8AV58

OsI_16212

Chromosome 4

505 aa

Almost identical to:

ABB83473

505 aa

Pandanus amaryllifolius

Liliopsida; Pandanales; Pandanaceae

AFD62259

H9BS79

BADH2

387 aa (fragment)

Setaria italica (foxtail millet)

Liliopsida; Poales; Poaceae; PACMAD clade; Panicoideae; Paniceae

chromosome-scale release of the 8.3x whole genome shotgun

assembly (98.9% of the sequence data is represented in the 9

pseudomolecules).

35,471 loci containing protein-coding transcripts

Si009902m

Si009902m.g

505 aa

Si013592m

Si013592m.g

505 aa

Sorghum bicolor (sorghum)

Liliopsida; Poales; Poaceae; PACMAD clade; Panicoideae;

Andropogoneae

NCBI: BioProject PRJNA38691, PRJNA13876; Assembly: Sorbi1

10 chromosomes

33,080 genes

Phytozome: v1.0 release, comprising the Sbi1 assembly and Sbi1.4

gene set

20 chromosomes

34,496 loci containing protein-coding transcripts

AAC49268_NC012875

SORBIDRAFT_06g019200

Chromosome 6

506 aa

XP_002444357

SORBIDRAFT_07g020650

Chromosome 7

505 aa

Triticum aestivum (bread wheat)

Liliopsida; Poales; Poaceae; BEP clade; Pooideae; Triticeae

AAL05264

CBK51339

503 aa

Triticum urartu (Red wild einkorn)

Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales;

Poaceae; BEP clade; Pooideae; Triticeae

EMS68665

M8ATW1

TRIUR3_04819

417 aa

EMS48376

M7YN64

TRIUR3_05640

544 aa

Zea mays

Liliopsida; Poales; Poaceae; PACMAD clade; Panicoideae;

Andropogoneae

BioProject B73RefGen_V2

10 chromosomes

39 454 genes

AAT70230

NP_001105781

LOC606443

AMADH1B

505 aa

NP_001157807

(PDB: 4I8P)

C0P9J6

gpm154

AMADH1A

LOC100302679

505 aa

NP_001157804

AMADH2

506 aa

Zoysia tenuifolia

Liliopsida; Poales; Poaceae; PACMAD clade; Chloridoideae;

Zoysieae; Zoysiinae

BAD34955

clone="18"

BAD34954

BAD34947

504 aa

BAD34957

clone="ZBD1"

BAD34952

BAD34953

507 aa

Eudicotyledons

Amaranthus hypochondriacus (grain amaranth)

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

O04895

AAB58165

BADH4

Chloroplastic

501 aa

AAB70010

O22467

BADH17

500 aa

Ammopiptanthus mongolicus

eudicotyledons; core eudicotyledons; rosids; fabids; Fabales;

Fabaceae; Papilionoideae; Thermopsideae

ABC86862

Q2I693

241 aa (fragment)

Aquilegia coerulea Goldsmith (Rocky mountain columbine;

Colorado blue columbine)

Streptophyta; Embryophyta; Tracheophyta; Spermatophyta;

Magnoliophyta; eudicotyledons; Ranunculales; Ranunculaceae

Phytozome: initial 8X unmapped genome assembly and the version

1.1 annotation

24,823 loci containing protein-coding transcripts

Aquca_002_00337

503 aa

Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)

eudicotyledons; core eudicotyledons; rosids; malvids; Brassicales;

Brassicaceae

Phytozome: includes JGI release v1.0

32670 protein-coding transcripts

XP_002877599

D7LRJ5

ARALYDRAFT_906054

937567

ALDH10A9

503 aa

XP_002889000

D7KSI7

ARALYDRAFT_895359

926872

ALDH10A8

501 aa

Arabidopsis thaliana (thale cress)

eudicotyledons; core eudicotyledons; rosids; malvids; Brassicales;

Brassicaceae; Camelineae

Phytozome; includes TAIR annotation release 10 of the Arabidopsis

thaliana genome release 9

27416 loci containing protein-coding transcripts

AAL34161

NP_190400

ALDH10A9

AT3G48170

Chromosome 3

Peroxisome

503 aa

Q9S795

NP_565094

NP_001185399

ALDH10A8

AT1G74920

ALDH10A8

Chromosome 1

501 aa

Atriplex centralasiatica

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

AAM19157

Q8L8I3

BADH

500 aa

Atriplex hortensis

(Mountain spinach)

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

ABF72123

Q19QV8

(P42575; same gene)

500 aa

Atriplex micrantha

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

ABM97658

A2TJX7

BADH

500 aa

Atriplex prostrata

(Spear-leaved orache) (Atriplex triangularis)

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

AAM08913

Q8RX99

BADH1

(AAP13999)

500 aa

AAM08914

Q8RX98

BADH2

424 aa (fragment)

Atriplex tatarica

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

ABQ18317

A5HMM5

BADH

500 aa

Avicennia marina (betaine-accumulating mangrove) (Grey

mangrove) (Sceura marina)

eudicotyledons; core eudicotyledons; asterids; lamiids; Lamiales;

Acanthaceae; Avicennioideae

BAB18544

Q9FRY2

Clone 13

502 aa

BAB18543

Q9FRY3

Clone 2

503 aa

Beta vulgaris (sugar beet)

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

NCBI: BioProject PRJNA176558, assembly BwSeq-1

9 chromosomes

P28237

CAA41377

CAA41376

Chloroplastic

500 aa

(not reported yet at NCBI)

Locus 1572

500 aa

Brassica napus (rape)

eudicotyledons; core eudicotyledons; rosids; malvids; Brassicales;

Brassicaceae; Brassiceae

AAQ55493

503 aa

Brassica rapa (field mustard; Turnip)

eudicotyledons; core eudicotyledons; rosids; malvids; Brassicales;

Brassicaceae; Brassiceae

Phytozome: v1.2 annotation is on assembly v1.1.

26,374 total loci containing protein-coding transcripts

Bra003781

Bra003781

501 aa

Bra019528

Bra019528

503 aa

Camellia sinensis (tea)

eudicotyledons; core eudicotyledons; asterids; Ericales; Theaceae

AFP19449

I7F760

505 aa

Capsella rubella

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; malvids; Brassicales; Brassicaceae;

Camelineae

Phytozome: initial 22x genome assembly and a preliminary

annotation

26,521 total loci containing protein-coding transcripts

EOA23841

CARUB_v10017058mg

503 aa

EOA35065

CARUB_v10020176mg

501 aa

Carthamus tinctorius (safflower)

eudicotyledons; core eudicotyledons; asterids; campanulids;

Asterales; Asteraceae; Carduoideae; Cardueae; Centaureinae

ADW80905

G8D1H1

510 aa

Chorispora bungeana

eudicotyledons; core eudicotyledons; rosids; malvids; Brassicales;

Brassicaceae; Chorisporeae

AAV67891

Q5Q033

502 aa

Chrysanthemum lavandulifolium (Daisy) (Dendranthema

lavandulifolium)

eudicotyledons; core eudicotyledons; asterids; campanulids;

Asterales; Asteraceae; Asteroideae; Anthemideae; Artemisiinae

AAY33871

Q4U5F2

DlBADH1

503 aa

AAY33872

Q4U5F1

DlBADH2

506 aa

Cicer arietinum (chickpea) (garbanzo)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; fabids; Fabales; Fabaceae; Papilionoideae;

Cicereae

XP_004508822

LOC101506136

Chromosome Ca7

503 aa

XP_004501961

LOC101507930

Chromosome Ca5

503 aa

Citrus clementina (clementine mandarin)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; malvids; Sapindales; Rutaceae

Phytozome: (clementine1.0) integrates 1.560 M ESTs with homology

and ab initio-based gene predictions.

9 chromosomes

24,533 protein-coding loci

Ciclev10019800m

Ciclev10019800m.g

505 aa

Ciclev10019822m

Ciclev10019800m.g

501 aa

(same locus, alternative transcript)

Citrus sinensis (sweet orange)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; malvids; Sapindales; Rutaceae

Phytozome: (v.1) of the assembly is 319 Mb spread over 12,574

scaffolds

25,376 protein-coding loci

Located in scaffold_0016_14

(non-reported gene)

Corylus heterophylla

eudicotyledons; core eudicotyledons; rosids; fabids; Fagales;

Betulaceae

ADW80331

E9NRZ9

503 aa

Cucumis melo (muskmelon)

eudicotyledons; core eudicotyledons; rosids; fabids; Cucurbitales;

Cucurbitaceae; Benincaseae

AEK81574

G1EH68

503 aa

Cucumis sativus (cucumber)

eudicotyledons; core eudicotyledons; rosids; fabids; Cucurbitales;

Cucurbitaceae; Benincaseae

Phytozome: Roche/JGI v1 annotation

21491 loci containing protein-coding transcripts

XP_004138132

LOC101213657

Cucsa.197230

503 aa

Fragaria vesca subsp. vesca (woodland strawberry)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; fabids; Rosales; Rosaceae; Rosoideae;

Potentilleae; Fragariinae

NCBI BioProject: PRJNA66853, PRJNA60037; assembly:

FraVesHawaii_1.0

7 chromosomes

25,247 genes

23,319 proteins

Phytozome: includes gene annotation of genome release 1.1.

7 chromosomes

32,831 loci containing protein-coding transcripts

XP_004299590

mrna09492.1-v1.0-hybrid

LOC101300913

gene09492-v1.0-hybrid

505 aa

Glycine max (Soybean) (Glycine hispida)

eudicotyledons; core eudicotyledons; rosids; fabids; Fabales;

Fabaceae; Papilionoideae; Phaseoleae

50202 genes (20 chromosomes)

44642 proteins

Phytozome: Soybean Glyma1.0 annotation; 20 chromosomes, with a

small additional amount of mostly repetitive sequence in unmapped

scaffolds.

54,175 protein-coding loci and 73,320 transcripts have been

predicted

NP_001234990

BADH1

B0M1A6

Glyma06g19820

Chromosome 6

503 aa

NP_001238427

B0M1A5

ADN03184

BADH2

Glyma05g01770

Chromosome 5

503 aa

XP_003550754

LOC100795267

Glyma17g10120

Chromosome 17

315 aa (pseudogene)

Gossypium hirsutum (upland cotton)

eudicotyledons; core eudicotyledons; rosids; malvids; Malvales;

Malvaceae; Malvoideae

AAR23816

503 aa

Gossypium raimondii (cotton)

eudicotyledons; core eudicotyledons; rosids; malvids; Malvales;

Malvaceae; Malvoideae

Phytozome: v2.1 annotation release is on genome assembly v2.0

13 chromosomes

37,505 protein coding genes

Gorai.001G071200.2

Gorai.001G071200

503 aa

Gorai.007G047400.1

Gorai.007G047400

502 aa

Halocnemum strobilaceum

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

AFB74193

H6VX92

BADH

500 aa

Halostachys caspica

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

ABO45931

A4LAP2

500 aa

Haloxylon ammodendron

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

ACS96437

D0E0H5

BADH

500 aa

Haloxylon persicum

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

AEW31327

G9I1P9

BADH

500 aa

Helianthus annuus (common sunflower)

eudicotyledons; core eudicotyledons; asterids; campanulids;

Asterales; Asteraceae; Asteroideae; Heliantheae alliance; Heliantheae

ACU65243

C8CBI9

BADH

503 aa

Jatropha curcas

eudicotyledons; core eudicotyledons; rosids; fabids; Malpighiales;

Euphorbiaceae; Crotonoideae; Jatropheae

ABO69575

B2BBY6

BADH

503 aa

(AFY98894-521aa, incorrect

insertion)

Kalidium foliatum

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

ABI95806

Q06AI9

500 aa

Ligusticum sinense

eudicotyledons; core eudicotyledons; asterids; campanulids; Apiales;

Apiaceae; Apioideae; apioid superclade; Selineae

ADL61811

H2KKR8

508 aa

Lycium barbarum (Matrimony vine)

eudicotyledons; core eudicotyledons; asterids; lamiids; Solanales;

Solanaceae; Solanoideae; Lycieae

ACQ99195

D2DEK8

503 aa

Medicago sativa (alfalfa)

eudicotyledons; core eudicotyledons; rosids; fabids; Fabales;

Fabaceae; Papilionoideae; Trifolieae

AFS33786

J9XXT4

BADH

505 aa

Medicago truncatula (barrel medic)

eudicotyledons; core eudicotyledons; rosids; fabids; Fabales;

Fabaceae; Papilionoideae; Trifolieae

45000 genes (8 chromosomes)

46092 proteins

ABE82378

XP_003608928

G7JNS2

(AFK34878)

(AFK43263)

MTR_4g106510

Chromosome 4

503 aa

(ACJ85836; fragment)

AFK44601

(NC_016409)

Chromosome 3

503 aa

Mimulus guttatus (spotted monkey flower; Yellow monkey flower)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; asterids; lamiids; Lamiales; Phrymaceae

Phytozome: includes the JGI gene annotation v1.1 of assembly v1.0

26718 loci containing protein-coding genes

mgv1a004929m

mgv1a004929m.g

504 aa

Morus alba var. multicaulis (mulberry)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; fabids; Rosales; Moraceae

AGG82698

M4NDU5

501 aa

Panax ginseng (Korean ginseng)

eudicotyledons; core eudicotyledons; asterids; campanulids; Apiales;

Araliaceae

AAQ76705

Q6JSK3

BADH1

503 aa

Phaseolus vulgaris (Kidney bean; French bean)

eudicotyledons; core eudicotyledons; rosids; fabids; Fabales;

Fabaceae; Papilionoideae; Phaseoleae

Phytozome: release of V1.0, the first chromosome scale version

11 chromosomes

Phvul.009G182300.1

Phvul.009G182300

Chromosome 9

503 aa

Phvul.003G196700.1

Phvul.003G196700

Chromosome 3

503 aa

Pisum sativum (pea)

eudicotyledons; core eudicotyledons; rosids; fabids; Fabales;

Fabaceae; Papilionoideae; Fabeae

CAC48393

Q93YB2

(PDB: 3IWJ)

AMADH2

503 aa

CAC48392

(PDB: 3IWK)

AMADH1

503 aa

Populus euphratica (Euphrates poplar)

eudicotyledons; core eudicotyledons; rosids; fabids; Malpighiales;

Salicaceae; Saliceae

AFA53117

H6V966

BADH2

503 aa

AFA53116

H6V965

BADH1

503 aa

Populus trichocarpa (Populus balsamifera subsp. trichocarpa) (black

cottonwood tree; Western balsam poplar)

eudicotyledons; core eudicotyledons; rosids; fabids; Malpighiales;

Salicaceae; Saliceae

JGI v3.0 gene annotation of assembly v3

41335 loci containing protein-coding transcripts

73013 protein-coding transcripts

45555 protein-coding genes (19 chromosomes)

XP_002322147

(B9IEL4)

Potri.015G070600

(POPTRDRAFT_666405)

Chromosome LGXV

503 aa

XP_002318630

(B9I351)

Potri.012G075600

(POPTRDRAFT_661953)

Chromosome LGXII

503 aa

Prunus persica (peach)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; fabids; Rosales; Rosaceae; Maloideae;

Amygdaleae

EMJ18033

M5X943

PRUPE_ppa022568mg

503 aa

EMJ11127

M5WBJ6

PRUPE_ppa004563mg

503 aa

Pyrus betulifolia (birch-leaf pear)

eudicotyledons; core eudicotyledons; rosids; fabids; Rosales;

Rosaceae; Amygdaloideae; Maleae

AER10508

G8EWJ4

503 aa

Ricinus communis (Castor bean)

eudicotyledons; core eudicotyledons; rosids; fabids; Malpighiales;

Euphorbiaceae; Acalyphoideae; Acalypheae

Phytozome: includes TIGR/JCVI release v0.1

31221 protein-coding transcripts

XP_002511463

B9R8Y8

RCOM_1512620

30147.t000284

503 aa

Sesuvium portulacastrum (Shoreline sea purslane) (Portulaca

portulacastrum)

eudicotyledons; core eudicotyledons; Caryophyllales; Aizoaceae

AEK98521

G1FG21

500 aa

Solanum lycopersicum (tomato) (Lycopersicon esculentum)

eudicotyledons; core eudicotyledons; asterids; lamiids; Solanales;

Solanaceae; Solanoideae; Solaneae

NCBI BioProject: PRJNA66163, assembly SL2.40

12 chromosomes

27,466 genes

Phytozome: ITAG2.3 made by International Tomato Annotation

Group (ITAG) on assembly ITAG2.3

12 chromosomes

34,727 loci containing protein-coding transcripts

AAX73303

Q56R04

(PDB: 4I8Q; 4I9B)

Chromosome 6

AMADH1

Solyc06g071290.2

504 aa

NP_001234235

B6ECN9

Chromosome 3

AMADH2

Solyc03g113800.2

505 aa

Solanum torvum

eudicotyledons; core eudicotyledons; asterids; lamiids; Solanales;

Solanaceae; Solanoideae; Solaneae

AFA37976

H6V8T9

BADH

505 aa

Solanum tuberosum (Potato) (SOLTU)

eudicotyledons; core eudicotyledons; asterids; lamiids; Solanales;

Solanaceae; Solanoideae; Solaneae

Phytozome: Solanum tuberosum Group Phureja DM1-3 516R44

(CIP801092) Genome Annotation v3.4 mapped to pseudomolecule

sequence

12 chromosomes

35,119 loci containing protein-coding transcripts

PGSC0003DMP400055759

PGSC0003DMT400083025

PGSC0003DMG400033028

504 aa

PGSC0003DMP400042549

PGSC0003DMT400063205

PGSC0003DMG400024582

505 aa

Spinacia oleracea (spinach)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; Caryophyllales; Amaranthaceae; Chenopodioideae;

Anserineae

P17202

(PDB: 4A0M)

chloroplastic

497 aa

Suaeda liaotungensis

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

AAL33906

Q8W5A1

BADH

501 aa

Suaeda maritima (Annual sea blite) (Suaeda spicata)

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

AFW04226

K7T3P2

501 aa

Suaeda salsa (Seepweed) (Chenopodium salsum)

eudicotyledons; core eudicotyledons; Caryophyllales; Amaranthaceae

ABG23669

Q155V4

BADH

501 aa

Theobroma cacao cultivar="Matina 1-6" (cacao) (cocoa)

Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core

eudicotyledons; rosids; malvids; Malvales; Malvaceae;

Byttnerioideae

Phytozome: V1.1 release

29,452 total loci containing protein-coding transcripts

EOY20585

Domain 1

TCM_011968

Chromosome 3

503 aa

EOY20585

Domain 2

TCM_011968

Chromosome 3

511 aa

Comprises two adjacent ALDH10 genes

Vitis vinifera (wine grape)

eudicotyledons; core eudicotyledons; rosids; Vitales; Vitaceae

NCBI BioProject: PRJNA34679

19 chromosomes

24,508 genes

Phytozome: 12X March 2010 release of the draft genome and

annotation of Vitis vinifera by the French-Italian Public Consortium

for Grapevine Genome Characterization

19 chromosomes

26346 loci containing protein-coding transcripts

XP_003634296

LOC100251899

Chromosome 17

497 aa

(CBI15092) fragment

XP_002283690

D7SHY3

GSVIVT01007829001

LOC100246770

GSVIVT01007829001

Chromosome 17

503 aa

XP_002281984

GSVIVG01032588001

LOC100250859

GSVIVG01032588001

Chromosome 14

499 aa

Fungi

Ascomycota

Schizosaccharomyces pombe (fission yeast)

Fungi; Dikarya; Ascomycota; Taphrinomycotina;

Schizosaccharomycetes; Schizosaccharomycetales;

Schizosaccharomycetaceae

Bioproject: PRJNA127, PRJNA13836, PRJNA20755

3 chromosomes

5133 protein coding genes

O59808

NP_588102

SPCC550.10

Chromosome III

500 aa

Protist

Alveolata

Perkinsus marinus ATCC 50983

Eukaryota; Alveolata; Perkinsea; Perkinsida; Perkinsidae

Bioproject: PRJNA46451, PRJNA12737

23654 protein coding genes

XP_002784436

Pmar_PMAR003695

394 aa (fragment)

incomplete on both ends

Cryptophyta

Guillardia theta CCMP2712

Eukaryota; Cryptophyta; Pyrenomonadales; Geminigeraceae

Bioproject: PRJNA223305, PRJNA53577

24822 protein coding genes

EKX35999

GUITHDRAFT_117911

542 aa

Stramenopiles

Ectocarpus siliculosus (brown alga)

Eukaryota; Stramenopiles; PX clade; Phaeophyceae; Ectocarpales;

Ectocarpaceae

Bioproject: PRJEA42625

4005 protein coding genes

CBN79171

Esi_0010_0069

517 aa

Phytophthora sojae

Eukaryota; Stramenopiles; Oomycetes; Peronosporales

Bioproject: PRJNA17989

26489 protein coding genes

EGZ28599

G4YMH3

PHYSODRAFT_248120

473 aa

EGZ28472

G4YIQ2

PHYSODRAFT_284276

418 aa

Bacteria

Alphaproteobacteria

Rhizobium leguminosarum bv. trifolii WSM2297

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales;

Rhizobiaceae; Rhizobium/Agrobacterium group

WP_003575619

EJC83393

ZP_18315655

Rleg4DRAFT_5155

496 aa

Rhizobium sp. CCGE 510

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales;

Rhizobiaceae; Rhizobium/Agrobacterium group

EJT02404

ZP_10838419

J6DKY5

RCCGE510_27926

plasmid="pRspCCGE510d"

496 aa

Betaproteobacteria

Burkholderia ambifaria MC40-6

Betaproteobacteria; Burkholderiales; Burkholderiaceae; Burkholderia

6784 genes

ZP_01555907

YP_001809915

BamMC406_3226

Chromosome 2

493 aa

Burkholderia multivorans ATCC 17616

Betaproteobacteria; Burkholderiales; Burkholderiaceae; Burkholderia

6373 genes

ZP_01568328

YP_001585229

Bmul_5267

Chromosome 2

493 aa

Burkholderia pseudomallei 305

Betaproteobacteria; Burkholderiales; Burkholderiaceae; Burkholderia

ZP_01765086

BURPS305_6395

482 aa

Burkholderia sp. 383

Burkholderia cepacia R18194

Betaproteobacteria; Burkholderiales; Burkholderiaceae; Burkholderia

7824 genes

YP_366413

Bcep18194_C6720

Chromosome 3

490 aa

Gammaproteobacteria

Colwellia psychrerythraea 34H (COLPS)

5054 genes (chromosome)

YP_266864

CPS_0096

491 aa

Pseudomonas fluorescens Pf0-1

Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae;

Pseudomonas

5833 genes (chromosome)

YP_350198

Pfl01_4470

483 aa

Pseudomonas protegens Pf-5 (PSEF5)

Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae;

Pseudomonas

6233 genes (chromosome)

YP_261892

PFL_4811

482 aa

YP_260018

PFL_2912

476 aa

Pseudomonas putida W619 (PSEPW)

5309 genes (chromosome)

ZP_01639992

YP_001751324

PputW619_4475

476 aa

[Figure S1 image. Panel A: ZmAMADH1a (C446, W448, W461); SoBADH (A441, W443, W456); PsAMADH2 (I444, W446, W459); SlAMADH1 (I445, W447, W460). Panel B: SoBADH models with C441, S441, T441, V441, F441 or I441, each shown with W443 and W456.]

Figure S1 Interactions of the residue at position equivalent to Ala441 of SoBADH. (A) Molecular surface representations showing the side-chains of residues at position 441, 443 and 456 (SoBADH numbering) in the known crystal structures of plant ALDH10 enzymes. (B) Molecular surface representations of the same residues in the minimized models of the in silico SoBADH mutants in which A441 was changed. Side-chains are shown as sticks with oxygen atoms in red, nitrogen in blue, and sulfur in yellow. Carbon atoms are colored as shown. The following PDB codes were used: SoBADH, 4A0M; PsAMADH2, 3IWJ; ZmAMADH1a, 4I8P; SlAMADH1, 4I9B. The figure was generated using PyMOL (www.pymol.org).

Table S2. Distances of the side-chain atoms of the residue at position 441 to their closest neighbors

| Enzyme (residue) | Atoms | Distance (Å) |
|---|---|---|
| SoBADH (A441) | CB A441-CZ3 W456 | 3.71 |
| ZmAMADH1a (C446) | CB C446-CZ3 W461 | 3.90 |
| | SG C446-CZ3 W461 | 3.94 |
| | SG C446-CZ2 W448 | 3.62 |
| | SG C446-CE2 W448 | 3.52 |
| | SG C446-NE1 W448 | 3.76 |
| PsAMADH2^b (I444) | CD1 I444-CG W459 | 3.90 |
| | CD1 I444-CD2 W459 | 3.39 |
| | CD1 I444-CE2 W459 | 3.54 |
| | CD1 I444-CZ2 W459 | 3.85 |
| | CD1 I444-CZ3 W459 | 3.84 |
| | CD1 I444-CE3 W459 | 3.55 |
| | CG2 I444-CH2 W446 | 3.74 |
| | CG2 I444-CZ2 W446 | 3.59 |
| | CG2 I444-CE2 W446 | 3.81 |
| SlAMADH1 (I445) | CD1 I445-CH2 W460 | 3.81 |
| | CD1 I445-CD2 W460 | 3.51 |
| | CD1 I445-CE2 W460 | 3.56 |
| | CD1 I445-CZ2 W460 | 3.74 |
| | CD1 I445-CZ3 W460 | 3.76 |
| | CD1 I445-CE3 W460 | 3.62 |
| | CG2 I445-CH2 W447 | 3.81 |
| | CG2 I445-CZ2 W447 | 3.54 |
| | CG2 I445-CE2 W447 | 3.67 |
| SoBADH A441C mutant | CB C441-CZ3 W456 | 3.67 |
| | SG C441-CZ3 W456 | 3.94 |
| | SG C441-CE2 W443 | 3.74 |
| SoBADH A441S mutant | CB S441-CZ3 W456 | 3.67 |
| | OG S441-CZ3 W456 | 3.56 |
| | OG S441-CE3 W456 | 3.50 |
| SoBADH A441T mutant | CB T441-CZ3 W456 | 3.69 |
| | CG2 T441-CZ3 W456 | 3.72 |
| | OG1 T441-CZ3 W456 | 3.34 |
| | OG1 T441-CE3 W456 | 3.30 |
| SoBADH A441V mutant | CG2 V441-CZ3 W456 | 3.73 |
| | CG2 V441-CE3 W456 | 3.70 |
| SoBADH A441F mutant | CG F441-CZ3 W456 | 3.60 |
| | CG F441-CE3 W456 | 3.56 |
| | CD1 F441-CE3 W456 | 3.70 |
| | CE1 F441-CE3 W456 | 3.67 |
| | CZ F441-CE3 W456 | 3.48 |
| | CZ F441-CD2 W456 | 3.70 |
| | CZ F441-CG W456 | 3.74 |
| | CD2 F441-CZ3 W456 | 3.59 |
| | CD2 F441-CE3 W456 | 3.36 |
| | CE2 F441-CE3 W456 | 3.32 |
| | CE2 F441-CD2 W456 | 3.40 |
| | CE2 F441-CG W456 | 3.71 |
| SoBADH A441I mutant | CD1 I441-CZ3 W456 | 3.41 |
| | CD1 I441-CH2 W456 | 3.86 |
| | CD1 I441-CE3 W456 | 3.17 |
| | CD1 I441-CD2 W456 | 3.41 |
| | CD1 I441-CE2 W456 | 3.85 |

^a The distances given are the average of those observed in the two monomers of each crystal structure or model. The cutoff was made at 4.0 Å.

^b The crystal structure of the PsAMADH2 enzyme is very similar in this region to that of PsAMADH1, which is not included here for this reason.

The PDB codes of the crystal structures are: SoBADH, 4A0M; ZmAMADH1a, 4I8P; PsAMADH2, 3IWJ; SlAMADH1, 4I9B.
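
The procedure described in the footnotes to Table S2 (averaging each atom-pair distance over the two monomers and applying a 4.0 Å cutoff) can be illustrated with a minimal Python sketch. The function name `averaged_contacts` and the coordinates are hypothetical and are not taken from 4A0M or any other structure. The sketch also assumes that the cutoff is applied to the averaged distance, which the footnote does not state explicitly.

```python
import math

CUTOFF = 4.0  # Å; the cutoff stated in the footnote to Table S2


def averaged_contacts(monomers, cutoff=CUTOFF):
    """Average each atom-pair distance over all monomers, then keep pairs <= cutoff.

    `monomers` is a list of (side_chain, neighbors) tuples. Each element is a
    dict mapping an atom name to an (x, y, z) coordinate in Å. All monomers
    are assumed to contain the same atom names.
    """
    side_chain, neighbors = monomers[0]
    contacts = {}
    for a in side_chain:
        for b in neighbors:
            # Mean distance for this atom pair across the monomers
            mean = sum(math.dist(sc[a], nb[b]) for sc, nb in monomers) / len(monomers)
            if mean <= cutoff:
                contacts[(a, b)] = round(mean, 2)
    return contacts


# Hypothetical coordinates (illustrative only): one side-chain atom and two
# tryptophan ring atoms in each of two monomers.
monomer_a = ({"CB": (0.0, 0.0, 0.0)},
             {"CZ3": (3.70, 0.0, 0.0), "CH2": (5.00, 0.0, 0.0)})
monomer_b = ({"CB": (0.0, 0.0, 0.0)},
             {"CZ3": (0.0, 3.72, 0.0), "CH2": (0.0, 5.20, 0.0)})

contacts = averaged_contacts([monomer_a, monomer_b])
print(contacts)
```

With real structures, the coordinate dictionaries would be filled from the ATOM records of the two chains in each PDB entry or model; only pairs whose averaged distance falls within the cutoff would be reported, as in the table.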