Post on 15-Apr-2022
Acc
epte
d A
rtic
le
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/tpj.13625
This article is protected by copyright. All rights reserved.
PROFESSOR YILIU XU (Orcid ID : 0000-0003-1564-9972)
Article type : Resource
The pomegranate (Punica granatum L.) genome and the
genomics of punicalagin biosynthesis
Gaihua Qin1,2,*, Chunyan Xu3,*, Ray Ming4,5*, Haibao Tang4, Romain Guyot6, Elena M.
Kramer7, Yudong Hu3, Xingkai Yi1,2, Yongjie Qi1,2, Xiangyang Xu3, Zhenghui Gao1,2, Haifa
Pan1,2, Jianbo Jian3, Yinping Tian3, Zhen Yue3,# & Yiliu Xu1,2,#
1Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei 230031, China;
2Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Anhui
Province, Hefei 230031, China;
3BGI Genomics, BGI-Shenzhen, Shenzhen 518083, China;
4Fujian Agriculture and Forestry University and University of Illinois at Urbana-Champaign
School of Integrative Biology Joint Center for Genomics and Biotechnology, Fujian
Agriculture and Forestry University, Fuzhou 350002, China;
5Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
61822, USA;
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
6Institut de Recherche pour le Développement, Diversité, Adaptation et Développement des
Plantes, Montpellier 34394, France;
7Department of Organismic and Evolutionary Biology, Harvard University, Cambridge MA
02138, USA
* Equal contributors.
# Corresponding authors.
Yiliu Xu
Address: No. 40 Nongkenan Road, Hefei, Anhui Province, PR China
Tel: 0086-551-62160121
Fax: 0086-551-65160937
E-mail:yiliuxu@163.com
Zhen Yue
Address: No. 21 Hongan 3rd Street, Yantian District, Shenzhen, 518083, PR China
Tel: 0086- 189-2675-8015
Fax: 0086-755-36307125
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
E-mail:yuezhen@genomics.cn
Running head: Pomegranate genome
Keywords: Pomegranate; Genome; System evolution; Adaption; Punicalagin biosynthesis
SUMMARY
Pomegranate (Punica granatum L.) is a perennial fruit crop grown since ancient times
has been planted worldwide and is known for its functional metabolites, particularly
punicalagins. We have sequenced and assembled the pomegranate genome with 328 Mb
anchored into nine pseudo-chromosomes and annotated 29 229 gene models. A Myrtales
lineage-specific whole-genome duplication (WGD) event was detected that occurred in the
common ancestor before the divergence of pomegranate and Eucalyptus. Repetitive
accounted for 46.1% of the assembled genome. We found that the integument development
gene INNER NO OUTER (INO) was under positive selection and potentially contributed to
development of the fleshy outer layer of the seed coat, an edible part of pomegranate fruit.
genes encoding the enzymes for synthesis and degradation of lignin, hemicelluloses, and
cellulose were also differentially expressed between soft- and hard-seeded varieties,
differences in their accumulation in cultivars differing in seed hardness. Candidate genes for
punicalagin biosynthesis were identified and their expression patterns indicated that gallic
synthesis in tissues could follow different biochemical pathways. The genome sequence of
pomegranate provides a valuable resource for the dissection of many biological and
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
biochemical traits and also provides important insights for the acceleration of breeding.
Elucidation of the biochemical pathway(s) involved in punicalagin biosynthesis could assist
breeding efforts to increase production of this bioactive compound.
INTRODUCTION
Pomegranate (P. granatum L., Lythraceae) is a perennial fruit tree that is native from Iran
to the Himalayan Mountains in northern India (Morton and Dowling, 1987). It has been
cultivated since prehistoric times and was domesticated during the Neolithic period (Levin,
2006; Seeram et al., 2006). Pomegranate was introduced to China during the Tang Dynasty
(618–907 AD) via the Silk Road (McColl, 1991; Bretschneider, 1935), and has been
cultivated widely since then. Today, China has become one of the world’s main producers of
pomegranate (http://www.discoverlife.org/mp/20m?kind=Punica+granatum). In addition to
being consumed as fresh fruit, juice, and wine, pomegranate is also used as a traditional
medicine in many countries (Li, 1998; Kim et al., 2002; Sanchez-Lamar et al., 2008).
The potential antioxidant, anticancer, and antiatherosclerotic properties of the phenolic
compounds in pomegranate fruit have been extensively investigated (Longtin, 2003;
Sreekumar et al., 2014). These bioactive phenolic compounds include ellagitannins (ETs),
anthocyanins, flavonols, and flavonoids. Punicalagin (an ET unique to pomegranate), gallagic
acid, ellagic acid (EA), and punicalin are the most abundant antioxidant polyphenolic
compounds in pomegranate (Gil et al., 2000; Tzulker et al., 2007). Ingested EA from
pomegranate juice does not accumulate in the blood (Seeram et al., 2004); thus, EA from
pomegranate juice does not appear to be directly biologically active in vivo. However,
punicalagin is hydrolyzed in the gut to release EA, which is then further processed by the
microflora into urolithins (Cerda et al., 2004; Selma et al., 2014), which have biological
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
in organisms (Espin et al., 2013; Ryu et al., 2016). Urolithin A (UA) might prevent
mitochondrial dysfunction with aging to extend lifespan in C. elegans, and to prevent
age-related muscle function decline in mice (Ryu et al., 2016). To date, 69 clinical trials have
been registered with the National Institutes of Health to examine the effects of consumption
pomegranate extracts or juice on a variety of human disorders such as cancer, cardiovascular
diseases, and aging (https://clinicaltrials.gov/ct2/results?term=pomegranate). Although there
have been many studies to identify and characterize pomegranate phenolic compounds
et al., 2011; Nallely et al., 2015), and their nutritional and medical properties (Viuda-Martos
al., 2010; Landete et al., 2011), the absence of genomic data and need for more
information for pomegranate have seriously limited its genetic improvement.
is the first fruit tree in the Myrtales to be sequenced. Its genome will provide an important
reference for genetic improvement to increase the content of punicalagin and other secondary
metabolites.
In addition to its rich array of secondary metabolites, unique morphological features make
pomegranate attractive for studying system evolution and selective adaptation. The word
pomegranate is derived from Old French pome grenate (Modern French grenade) and arose
directly from Medieval Latin pomum granatum, literally ‘apple with many seeds’ (Holland et
al., 2009), although the fruit is actually a berry. This fruit was mentioned in the Bible (Janick,
2007) and Koran (Langley, 2000) and is often associated with fertility. The hundreds of seeds
in each fruit are derived from an equal number of ovules distributed among ~5–7 carpels,
which makes pomegranate an interesting subject for the study of reproductive development
and evolutionary biology. The edible fleshy outer seed coat that encapsulates the fibrous
inner seed coat is unusual among fruits and has important evolutionary and adaptive
significance. Here, we report a high-quality draft of the pomegranate genome, which not only
valuable resources for dissection of many biological and metabolic traits, but also provides a
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
powerful tool to accelerate breeding to increase production of its bioactive compounds,
particularly punicalagin.
RESULTS
Genome assembly and annotation
The genome of an elite pomegranate cultivar ‘Dabenzi’ (2n = 2x = 18) (Figure S1),
commonly grown in Anhui Province, China, was sequenced at 100× coverage using Illumina
paired-end reads of libraries with insert sizes ranging from 170 bp to 40 kb (Table S1). The
final genome assembly is 328.38 Mb, or ~92% of the estimated total genome size based on
17-mer analysis (356.98 Mb) (Figure S2), close to the estimates obtained by flow cytometry
(Figure S3). The N50 contig length is 66.97 kb and the N50 scaffold length is 1.89 Mb (Table
S2), which is a satisfactory result from assembling only whole-genome shotgun reads. When
the quality of the genome was assessed using assembled unigenes from the transcriptome,
more than 97.37 % of 105 472 unigenes could be aligned to the genome, which indicated
extensive genomic coverage (Table S3).
To construct pseudo-chromosomes, we sequenced 195 F1 individuals from the cross
between ‘Dabenzi’ and ‘Baiyushizi’, constructed a genetic map for ‘Dabenzi’ spanning
1018.67 cM with an average of 232.45 kb/cM, and established nine linkage groups
corresponding to the haploid chromosome number of pomegranate (Figure S4 and Table S4).
A total of 310 scaffolds (>2 Kb) were anchored to the nine linkage groups with 2421 SNP
markers, corresponding to 70.32% of the reference genome. Among these, 140 scaffolds,
corresponding to 62.8 % of the reference genome (Table S5), were oriented.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
A combined structure- and homology-based analysis identified 155.3 Mb of repetitive
sequences that account for 46.1% of the pomegranate assembly. Analysis of these repetitive
sequences indicated that retrotransposons comprised 40.5% of the pomegranate genome and
that only 2.1% of these repeats were uncharacterized. Long terminal repeat (LTR)
retrotransposons comprise 18.9% of the genome (Gypsy 9.8% and Copia 4.8%), and
non-autonomous LTR retrotransposons comprise 19.8% of the genome (Table S6).
Somewhat lower percentages of non-autonomous LTR retrotransposons have been observed
in the coffee (10%) (Denoeud et al., 2014) and Eucalyptus (15%) (Myburg et al., 2014)
genomes. By estimating the insertion time of identified LTR retrotransposons based on the
divergence of LTR sequences (Sanmiguel et al., 1998), we found relatively lower divergence
of Copia than Gypsy elements, suggesting a recent or ongoing wave of retrotransposition for
Copia elements (Figure S5). Considering that Copia elements had inserted in both gene-rich
and gene-poor regions (Figure 1), their activities represent a potential source of genetic and
epigenetic variation (Lisch, 2013).
To facilitate gene annotation, we conducted12 transcriptomes of six different organs or
tissues, including root, leaf, flower, pericarp, and the inner and outer seed coats. Next, we
created protein-coding gene models by combining de novo and homology-based prediction
with RNA-Seq data to train the prediction program. In total, 29,229 protein-coding gene
models with an average gene length of 2574.61 bp were annotated (Table S7), excluding
sequences similar to known TEs. Functional annotations for 62.52% of these predicted genes
were found in the SwissProt database (http://www.uniprot.org/) (Table S8).
Genome evolution
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
The 29 229 annotated gene models were clustered by sequence similarity and yielded 13
640 gene families comprised of at least two genes. A total of 6230 orphan pomegranate genes
were identified that could not be clustered with any genes from P. granatum, A. thaliana,
Carica papaya, Eucalyptus grandis, Malus domestica (apple), Oryza sativa (rice), Populus
trichocarpa (poplar), Theobroma cacao (cocoa), Vitis vinifera (grape), Ziziphus jujuba
(jujube), Solanum lycopersicum (tomato), or Prunus persica (peach) (Table S9). We
conducted analysis of Gene Ontology (GO) annotations for these orphan genes and found
they were enriched in GO terms including catalytic activity, metabolic process, cellular
process, and cell and cell part (Figure S6). KEGG pathway enrichment analysis of these
orphan genes indicated that they were enriched mainly in pathways for biosynthesis of
secondary metabolites and phenylpropanoids (Table S10). Polyphenolic compounds
including flavonoids, anthocyanins, and tannins are the main secondary metabolites in
pomegranate, and have potential antiviral, antifungal, or antibacterial properties
(Johanningsmeier and Harris, 2011). The orphan genes identified here might be responsible
for the biosynthesis of these secondary metabolites, and could ultimately have contributed to
the adaptation of pomegranate.
A comparison of pomegranate, Eucalyptus, grape, tomato, and Arabidopsis sequences
indicated 8854 gene families in common among these species and 1109 gene families unique
to pomegranate, while 1028 were unique to grape, 953 were unique to Arabidopsis and 1435
were unique to tomato (Figure S7). To study gene families that had expanded or contracted
only in pomegranate, we compared gene families from pomegranate to those of 11 other
representative species and to ancestors species. In total, 546 gene families had expanded, and
1926 gene families had contracted in the pomegranate genome compared to its most recent
common ancestor (Figure 2). Results of KEGG pathway enrichment analysis revealed that
these expanded gene families were concentrated in phenylpropanoid biosynthesis (Table
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
S11), an observation consistent with the enrichment of genes unique to pomegranate. This
enrichment information provided vital insights into secondary metabolism in pomegranate.
To address the phylogenetic position of pomegranate, we performed whole-genome
analysis of 12 sequenced plant genomes, and generated 509 orthologous gene clusters.
Pomegranate, just as Eucalyptus, is in the order Myrtales, which is classified as a sister taxon
to the rosids (Figure 2), but not to the Malvids clade within the eurosids as in the Angiosperm
Phylogeny Group (APG, IV) classification (Angiosperm Phylogeny Group, 2016). The
phylogenetic position of pomegranate based on this genome-wide analysis provided more
evidence as to the grouping of Myrtales, and is in agreement with other recent whole-genome
studies (Argout, et al., 2011). The discrepancy between these classifications highlights the
trade-offs between the two methodologies (Martin et al., 2005), because the genetic evidence
used in the APG classification is based on single genes or on chloroplast, plastid,
mitochondrial, and nuclear ribosomal genes (Soltis et al.,2011).
Based on previous studies of pairwise synonymous substitution rates (Ks values)
between paralogs, we calculated the fourfold synonymous third-codon transversion (4DTv)
rates (Figure S8) to determine the age of the pomegranate WGD based on diversification of
these nucleotide sequences. A lineage-specific WGD and additional small-scale duplication
events were detected in pomegranate that had occurred since its divergence from grape.
In addition to the temporal evidence such as the distribution of 4DTv rates, spatial
evidence such as cross-species synteny further supports the lineage-specific WGD. Synteny
analyses between the genome of pomegranate and those of grape and peach showed clear
evidence of a single WGD event in the pomegranate lineage (Figure 3). For each genomic
region in grape and peach, we typically found two matching regions in pomegranate with a
similar level of divergence and gene-level fractionation (Figure S9a and S10a). The genomes
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
of grape and peach all lacked recent WGDs after the genome triplication event (the γ event)
that is shared among all core eudicots (Argout et al., 2011; Jaillon et al., 2007; Jiao et al., 2012;
Verde et al., 2013). The overall 2:1 synteny patterns between pomegranate and grape or peach
suggest that pomegranate had experienced another WGD after its divergence from these
related rosid genomes, which lacked recent WGDs. The pomegranate WGD was further
confirmed by comparing the pomegranate genome to itself (Figure S9b) and readily
identifying large duplicated blocks of relatively recent origin.
The dating of the pomegranate WGD can be further narrowed down by comparison to the
related Eucalyptus genome. Both Eucalyptus and pomegranate belong to the order Myrtales
under the APG IV classification (Angiosperm Phylogeny Group 2016; Magallon et al., 2015),
a conclusion that is also well supported by our whole-genome phylogeny. A previous study
inferred that Eucalyptus had undergone a lineage-specific WGD event, estimated to have
occurred 109.9 (105.9–113.9) million years ago (mya) (Myburg et al., 2014). With the
pomegranate genome sequence, we found that the most recent WGDs in Eucalyptus
and pomegranate occurred in their common ancestor. The Eucalyptus genome had undergone a
WGD after the known paleo-hexaploidization event γ (Myburg et al., 2014), and also showed a
clear 2:1 synteny pattern when compared to the genomes of grape or peach (Figure 3). The
similar 2:1 synteny patterns suggested that Eucalyptus and pomegranate might once have had
the same ploidy level (Figure 3 and Figure S10b). Additionally, a clear 1:1 synteny pattern was
observed in a Eucalyptus-pomegranate comparison by reciprocal best BLAST matches (Figure
S9c and S10c), indicating that the WGD predated their divergence, i.e., they shared a WGD
event. If the WGDs in these two species were independent events that occurred after the
divergence of Eucalyptus and pomegranate, we would expect instead to see a 2:2 synteny
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
pattern. Thus, the observed 1:1 synteny pattern excludes the possibility of two separate WGD
events.
The WGD in the Myrtales, first discovered in the Eucalyptus genome, has now also been
observed in the pomegranate genome. The typical macro- and micro-synteny pattern reflected
a 1:2 relationship between genomic regions from grape or peach compared to the two
Myrtales genomes (Figure 3). The pomegranate genome sequences have provided more
precise timing for the Myrtales WGD event, which has now been placed closer to the origin
of this clade.
Disease resistance genes and transcription factors in the pomegranate genome
Plants have evolved a wide range of resistance mechanisms in their constant struggle for
survival. Beyond simple physical or chemical barriers, more sophisticated biochemical
mechanisms are based on gene-for-gene interactions between plants and infectious agents. By
querying specific protein domains and applying classification schemes (Sanseverino et al.,
2010; Lehti-Shiu and Shiu, 2012), we identified 710 (2.43 % of the gene models) R genes and
995 (3.40% of the gene models) genes encoding protein kinases in the pomegranate genome
(Table S12 and S13), numbers similar to those in the grape and Arabidopsis genomes (Jaillon
et al., 2007; Arabidopsis Genome Initiative 2000). We mapped the putative R genes to nine
pseudo-chromosomes and found that they were nonrandomly distributed (Figure S11). The
observed enrichment of R genes in these certain genomic regions indicated that R genes
might have evolved by tandem duplication and subsequent divergence of linked gene
families.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Transcription factors (TFs) and transcriptional regulators (TRs) also play key roles in
adaption to environmental changes by regulating the transcription of their target genes in a
spatiotemporal manner. Based on the comprehensive rules for TF identification and
classification in the Plant Transcription Factor Database (PlantTFDB 3.0, Jin et al., 2014), we
identified a total of 1867 (6.39% of the gene set) TFs in the pomegranate genome. For
comparison, the total numbers of TFs in grape, Arabidopsis and Eucalyptus were 1690
(Jaillon et al., 2007), 2195 (Arabidopsis Genome Initiative 2000) and 2207 (Myburg et al.,
2014), respectively (Table S14).
To analyze the evolution of transcription factors, we explored the evolutionary patterns of
the MADS-box superfamily in pomegranate. MADS-box TFs are central controllers of floral
organ identity (Smaczniak et al., 2012). The development and growth of floral organs is of
great importance from both biological and agricultural perspectives, for the reproduction and
spread of species and the improvement of yield in seed crops, respectively. Over recent
decades, extensive genetic and molecular research have supported the elaboration of a
framework for the development of flower organ identity, known as the ABC(E) model
(Cucinotta et al., 2014). Variation in flower structure is key to the evolutionary success of
angiosperms (Yan et al., 2016). Genome-wide genetic information could provide a global
view of the gene regulatory networks necessary for specifying flower organ development in
pomegranate. Specific MADS-box gene lineages in pomegranate have been duplicated and
become diversified. In particular, we see dramatic expansions in the copy number of type I
homologs of the Mα and Mγ subfamilies, although this type of expansion is relatively
common for the type I loci (Figure S12a). Members of these lineages control aspects of
gametophyte and endosperm development that appear to evolve quite rapidly (Masiero et al.,
2011). More surprising are the large expansions observed in the AGL22/SHORT
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
VEGETATIVE PHASE (AGL22/SVP) subfamily, although independent increases in copy
number in this subfamily were also observed in both Vitis and the close pomegranate relative
Eucalyptus (Figure S12b). AGL22/SVP homologs contribute to the regulation of flowering
time in Arabidopsis, but participate in other aspects of floral development, including
inflorescence structure and fruit abscission (Thouet et al., 2012) in other dicots. Further study
of these recent paralogs will be necessary to understand how they might contribute to
pomegranate development.
Development of seed coats in pomegranate
The ovules of pomegranate are bitegmic; that is, they possess two integuments that
develop into the distinct inner and outer seed coats of pomegranate seeds that protect the
endosperm and embryo and contribute to seed dormancy, dispersal, and germination
(Debeaujon et al., 2000). The soft, edible outer coat of pomegranate is highly vacuolized
and is rich in sugars, organic acids, anthocyanins, and minerals (Fawole and Opara, 2013a).
The compact cells of the inner seed coat show little to no vacuolization and are enriched in
lignin and fiber. The molecular mechanisms of integument and seed coat development were
previously unknown because of limited molecular genetic information available for
pomegranate. The genomic and transcription data described here will provide a basis for
further insights into the molecular mechanisms of integument and seed coat development in
pomegranate.
Lignin, cellulose, and hemicellulose comprise a substantial portion of seed biomass and
determine seed hardness (Hu et al., 1999; Perez et al., 2002). We extracted genes involved in
lignin, cellulose and hemicellulose metabolism, and conducted transcriptomic analysis of the
inner and outer seed coats of the hard-seeded pomegranate cultivar ‘Dabenzi’ at 50, 95, and
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
145 days after pollination (DAP), and of the inner seed coat of the semi-soft-seeded cultivar
‘Baiyushizi’ and the soft-seeded cultivar ‘Tunisi’ at 50 DAP (Table S15). The sets of genes
encoding the key enzymes involved in lignin, cellulose, and hemicellulose synthesis and
degradation were nearly the same in the outer and inner seed coats. Expression of genes
involved in lignin, cellulose, and hemicellulose synthesis increased regardless of stage of
development or cultivar, as did those involved in degradation of these molecules (Figure 4).
Thus, the structural differences in seed coats among cultivars might be related to the relative
abundance of transcripts encoding enzymes catalyzing lignin, cellulose, and hemicellulose
synthesis and degradation.
Seed size, which can depend on seed coat development, is important for both fitness and
yield (Xia et al., 2013). The functions of genes involved in the development of the inner or
outer integument have been shown previously in other plant species (Prasad et al., 2010;
Villanueva et al., 1999). In pomegranate, genes orthologous to FLORAL BINDING
PROTEIN 24 (FBP24) and ABERRANT TESTA SHAPE (ATS) are expressed during inner
integument development, whereas genes orthologous to INNER NO OUTER (INO), SEUSS
(SEU), and KANADI1/2 (KAN1/2) are expressed during outer integument development (Table
S16). Plant organs develop through processes involving organized cell growth, proliferation,
and expansion (Prasad and Ambrose 2010). The expression of genes involved in integument
development (Endress 2011a; Prasad and Ambrose 2010; Skinner et al., 2004) has been
analyzed for inner and outer seed coats and flowers to capture early developmental stages of
the inner and outer seed coats. TRANSPARENT TESTA GLABRA 1 (TTG1) and
TRANSPARENT TESTA GLABRA 2 (TTG2) (Gonzalez et al., 2009), which are involved in
integument development through cell expansion, and BIG BROTHER (BB) and DA 1 (which
means ‘large’ in Chinese) (Li et al., 2008; Xia et al., 2013), which affect integument
development through repression of cell expansion, are all expressed dynamically in the inner
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
and outer seed coats of pomegranate. The dynamic expression patterns of these genes during
integument development indicated that DSO, BELL1 (BEL1), TTG1, BB, YAB1, and YAB2
might contribute to inner seed coat development, and that INNER NO OUTER (INO), DSO,
KAN1, KAN2, and TTG2 might contribute to outer seed coat development in pomegranate
(Figure 4).
The fleshy, edible outer part of the pomegranate seed coat is unique among fruits,
compared to the receptacle of pear and apple, the ovary of peach and apricot, and the aril of
lychee, which represent the edible portions of those fruits. Genes in the YABBY family exhibit
polar expression in the apical meristem and flower primordium (Siegfried et al., 1999;
Villanueva et al., 1999) and the functions of INO, YABBY1 (FIL/AFO), YABBY2 (YAB2), and
YABBY3 (YAB3) in integument growth have been shown in Arabidopsis (Meister et al.,
2005). Given the highly conserved YABBY genes in eudicots (Toriba et al., 2007), no obvious
expansion of the YABBY gene family was observed in the pomegranate genome (Figure 5a).
The amino acid sequences of putative YABBY orthologs from 16 species belonging to 10
families in 14 genera (including pomegranate, apple, Eucalyptus, grape, and Arabidopsis)
were aligned (Figure S13 and S14) for analysis of selection using PAML (V4.8) software. An
amino acid site under positive selection was detected at the 35th amino acid residue (Q) in the
predicted protein encoded by the pomegranate INO gene (Figure 5b). INO is essential for the
development and asymmetric growth of the outer integument and is critical for outer
integument development (Mcabee et al., 2005). Polar expression of INO during outer
integument development is regulated by the expression of ANT and BEL1 (Meister et al.,
2004; Sieber et al., 2004; Skinner et al., 2004). Positive selection has also been detected in
the predicted protein encoded by the pomegranate BEL1 gene (Figure S15). Positive selection
acting on the INO, and its regulator BEL1 might account for the polar expression of INO
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
during integument development, which ultimately affects the expansion of the fleshy outer
seed coat of pomegranate.
The TPS gene family
The biosynthesis of terpenes, catalyzed by terpene synthase (TPS), is an important
metabolic pathway in plants. Terpenes are important volatiles with a variety of roles in
mediating both antagonistic and beneficial interactions among organisms (Gershenzon and
Dudareva 2007). Monoterpenes such as α-pinene, β-myrcene, and limonene are major
components of pomegranate flavor (Kirshinbaum, 2012; Melgarejo et al., 2011) that appeal
to consumers. Monoterpenes such as 1,8-cineole from Eucalyptus can act as powerful natural
antimicrobials and pesticides (Batish et al., 2008). To characterize the TPSs that participate in
terpene biosynthesis, we predicted and compared TPS genes from the pomegranate and
Eucalyptus genomes, and from their ancestral lineage, grape, with those from the Arabidopsis
genome. The TPS genes of pomegranate were evolutionarily conserved in every subfamily
branch of the TPS gene family. The phylogenetic tree indicated that the TPS-b and TPS-g
subfamilies in pomegranate were the result of post-speciation gene duplications and
functional diversification during evolution. Although gene duplication may be the
predominant pattern for evolution of genes in the TPS-a subfamily (Figure S16) that produces
monoterpenes in grape (Martin et al., 2010), the different monoterpenes produced in
pomegranate and Eucalyptus may not attribute to the differential evolutionary of TPSs.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Anthocyanin metabolism
Anthocyanins are the major pigments responsible for the color of pomegranate fruits and
flowers. In addition to their dietary antioxidant properties, anthocyanin pigments function to
attract pollinators to flowers and seed dispersers to fruits (Winkel-Shirley, 2001).
Anthocyanin metabolism in the plant phenylpropanoid pathway has been studied extensively,
and many genes involved in the pathway have been isolated (Zhao et al., 2015). Studies of
anthocyanin biosynthesis have revealed that the expression of late genes in the anthocyanin
biosynthetic pathway depend on the activity of WD-repeat/MYB/bHLH transcriptional
complexes (Gonzalez et al., 2008; Pelletier et al., 1999). In the present study, we predicted
gene models for the late anthocyanin structural genes and their transcription factors and
studied their expression patterns in flowers and outer seed coats at different developmental
stages (50, 95, and 140 DAP) (Table S17and S18). The expression of several genes in the
anthocyanin metabolism pathway encoding dihydroflavonol 4-reductase (DFR),
leucoanthocyanidin dioxygenase (LDOX/ANS), anthocyanidin 3-O-glucosyltransferase
(BZ1), and anthocyanidin 3-O-glucoside-5-O- glucosyltransferase (UGT75C), in addition to
genes encoding MYB, basic Helix-Loop-Helix (bHLH) and WD40-repeat, increased in
concert with the accumulation of anthocyanins in the outer seed coats during development
(Figure 6). These genes are therefore considered candidate genes for anthocyanin
biosynthesis in pomegranate. The expression patterns of candidate genes and TFs involved in
anthocyanin metabolism in pomegranate flowers and outer seed coats indicated that different
candidate genes and TFs contribute to the anthocyanin accumulation in flower and outer seed
coats. This information will be useful for further studies of anthocyanin metabolism in
pomegranate.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Punicalagin metabolism
Pomegranate is an important source of bioactive compounds and has been used in
medicine for many centuries (Longtin, 2003). Its potential pharmaceutical properties are
attributable to high contents of polyphenols, such as ellagitannins. Gallic acid is a common
precursor for EA and punicalagin synthesis, although its metabolism in plants is still an open
question. We have drafted a gallic acid biosynthesis pathway according to KEGG and the in
vivo reactions that have been reported (Mishra and Gold 2006; Ossipov et al., 2003), and
have surveyed genes encoding the corresponding enzymes and their expressions in different
tissues of pomegranate (Table S19). Genes encoding the bifunctional enzyme dehydroquinate
dehydratase/shikimate dehydrogenase (EC 4.2.1.10/1.1.1.25) (Singh and Christendat, 2006),
3-dehydroshikimate dehydratase (EC 4.2.1.118), dehydroxylase (1.17.98.x), and
methyltransferase (EC 2.1.1.-) were found in the pomegranate genome, so gallic acid in
pomegranate might be derived from shikimic acid and syringate, but not quinic acid.
Pgr015317.4 was the only gene encoding 3-dehydroshikimate dehydratase identified in
pomegranate genome.Its expression implied that gallic acid in root and flower might be
derived from shikimic acid, while that in leaf might be derived from syringate. The
expression of genes encoding these enzymes in the pericarp and in the inner and outer seed
coats at different developmental stages indicated that the two biosynthetic pathways might
both occur in the outer seed coat, while the gallic acid in inner seed coat and perpicarp might
be derived from syringate at early stages of development and shikimic acid at later stages of
development (Figure 7).
Derivatives of gallic acid, EA and punicalagin, have been identified in different
pomegranate tissues, but their metabolism has not yet been fully characterized.
Carboxylesterase (EC 3.1.1.1), glycosyltransferase (EC 2.4.1.x), acyltransferase (EC 2.3.1.x),
and glucohydrolase (EC 3.2.1.21) are candidate enzymes in the punicalagin biosynthesis
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
pathway (Figure 7). Because UDP-glycosyltransferases (UGT) that form glucose esters with
benzoates belong to the phylogenetic group L (Lim et al., 2002), genes encoding
galloyl-O-β-d-glucosyltransferase (UGGT) in pomegranate probably also belong to group L.
We detected 13 genes encoding UGGT in the pomegranate genome (Figure S17). Similarly,
genes encoding an acyltransferase that uses hydroxycinnamoyl/ benzoyl CoA as an acyl
donor (D'auria, 2006) might participate in the transfer of O-galloyl groups during punicalagin
synthesis. We identified eight genes encoding this acyltransferase in the pomegranate genome
(Figure S18). One gene encoding carboxylesterase, an enzyme that interconverts gallic acid
and ellagic acid, was identified in the pomegranate genome. Its low expression in the inner
seed coat implied that little interconversion occurred in this tissue, which might account for
its low ellagic acid contents (Fischer et al., 2011). The pathway for punicalagin metabolism
and candidate genes predicted here will provide invaluable information for the genetic
improvement of production of bioacitive compounds in pomegranate.
Punicalagins are hydrolyzed to EA in the gut, and then EA is then gradually metabolized
by the intestinal microbiota to produce different types of urolithins (Cerda et al., 2004). This
process could include lactone-ring cleavage, decarboxylation and dehydroxylation reactions
(Selma et al., 2009), and then tannin acetylhydrolase (EC 3.1.1.20), dehydroxylase (EC
1.17.98.x) and gallate decarboxylase (EC 4.1.1.59) are candidate enzymes that might
participate in the metabolism of EA to urolithins (Figure 7). We surveyed genes encoding
these enyzmes in intestinal microflora, such as Lactobacillus plantarum, Lactobacillus
paraplantarum, Clostridium scindens, and Debaryomyces hansenii and obtained orthologous
genes encoding acetylhydrolase and dehydroxylase, but no reference sequences encoding
gallate decarboxylase could be found in the data for these microflora. Refer to the domains of
reference sequences, we identified 33genes encoding dehydroxylase in pomegranate genome,
but no gene encoding acetylhydrolase. Alignment of pomegranate gene sequences for
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
dehydroxylase to the sequences from microflora (Figure S19) showed that a low degree of
conservation of genes encoding dehydroxylase among kingdoms might account for whether
these enzymes function in punicalagin degradation in different species. Lacking gene
encoding acetylhydrolase may also be responsible for the lacking punicalagin degradation in
pomegranate.
DISCUSSION
Pomegranate, an ellagitannin-rich fruit crop, is a widely used source of dietary
polyphenols. An analysis of the antioxidant activity of various dietary supplements has
shown that the antioxidant activity of pomegranate juice exceeded that of orange juice
(Parashar, 2011), milk thistle, green tea, grape seed, goji, acai (Henning et al., 2014), and
infusions of red wine or green tea (Gil et al., 2000). In addition to antioxidant function, EA
and punicalagins in pomegranate have been implicated as potent anticancer and
anti-atherosclerotic agents (Adams et al., 2010; Usta et al., 2013; Viuda-Martos et al., 2010).
Although the potential health benefits of punicalagins have been recognized, the biosynthesis
of punicalagins has not yet been thoroughly characterized. Based on the pathways for
biosynthesis of gallic acid and the structures of its derivatives, we predicted the punicalagin
biosynthetic pathway and identified candidate genes encoding enzymes in this pathway. The
pomegranate genome sequence offers an unprecedented opportunity to validate these
candidate genes and fully characterize punicalagin metabolism, which will provide
knowledge and tools for pomegranate breeding to increase punicalagin production.
Punicalagins are metabolized into bioavailable hydroxy-6H-dibenzopyran-6-one
derivatives (urolithins) by the colonic microflora in healthy humans. However, its precursors,
punicalagin, and ellagic acid, could not be detected in either plasma or urine (Espin et al.,
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
2013), although that could be due to rapid excretion (Seeram, et al., 2004). Health benefits of
urolithins in humans and animals have previously been reported (Espin, et al., 2013;
Kasimsetty et al., 2010; Ryu et al., 2016), and the putative pathway for urolithin metabolism
has provided a point of reference to understand their function and utility.
Besides their nutrient and potential health-promoting effects, secondary metabolites
including ellagitannins, fragrant monoterperenes, and colored anthocyanins in flowers and
fruits also play a vital role as defense (against herbivores, microbes, and competing plants)
and signal compounds (to attract pollinating or seed-dispersing animals) (Wink 2003). The
occurence of these secondary metabolites could reflect adaptation and particular life
strategies within a particular plant phylogenetic framework. The genome of pomegranate now
provides powerful tools for identification of genes on a large-scale and elucidation of
pathways for these secondary metabolites.
Analysis of adaptive evolution and survival mechanisms in plants are long-standing
research objectives. Enhancing stress resistance and modifying morphological structures via
gene duplication and neofunctionalization are strategies that have allowed plants to resist or
tolerate biotic and abiotic stresses during evolution (Benderoth et al., 2006). As the key
feature of angiosperm seeds, the integument can be traced back to the evolution of the earliest
angiosperm plants (Endress, 2011a). Pomegranate has bitegmic ovules (Endress, 2011b), and
its outer integument has evolved into a fleshy, juicy, and expanded outer seed coat, the edible
part of the fruit. From a molecular developmental genetics perspective, the outer integument
is thought to have arisen due to the assymmetric expression of INO, a marker gene for outer
integument development (Eshed, 2001). Positive selection acts on the frequency of mutations
and can result in species diversification that provides an evolutionary strategy to cope with
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
challenges imposed by biotic and abiotic stresses. Positive selection operating on the INO
gene in pomegranate might have contributed to expansion of the outer seed coat that attracts
herbivores to disperse seeds and contributes to seed size and yield, with subsequent effects on
species fitness and survival.
This high-quality genome of pomegranate will facilitate future investigation into its
adaptation and biosynthesis of secondary metabolites, and will provide valuable information
for elucidating many other desirable traits in this nutritious fruit crop with medicinal
properties.
EXPERIMENTAL PROCEDURES
Plant materials, DNA and RNA preparation
DNA extracted using CTAB method (Murray and Thompson 1980) from fresh leaf
tissues of pomegranate cultivar ‘Dabenzi’ was used for whole-genome shotgun sequencing.
RNAs extracted using the Plant RNA Isolation Kit (AutoLab, Beijing, China) were used for
transcriptome analysis of the following tissues: seeds of the semi-soft seeded cultivar
‘Baiyushizi’ (BY) and the soft-seeded cultivar ‘Tunisi’ (T) at 50 day after pollination (DAP);
and leaf (L), root (R), flower (F), and pericarp (P), inner seed coat (ISC), and outer seed coat
(OSC) at 50, 95, and 140 DAP of the hard-seeded cultivar ‘Dabenzi’ (DB). Flower and inner
seed coat at 50, 95, and 140 DAP of ‘Dabenzi’ were used for total anthocyanin content
determination. Outer seed coat at 50, 65, 80, 95, 110, 125, and 140 DAP of ‘Dabenzi’ were
used for seed hardness determination. Pericarp, inner seed coat and outer seed coat at 50, 95,
and 140 DAP of ‘Dabenzi’ are abbreviated as DB-P50, DB-P95, DB-P140, DB-ISC50,
DB-ISC95, DB-ISC140, DB-OSC50, DB-OSC95, and DB-OSC140, respectively (Table
S20).
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Genome sequencing and de novo assembly
We carried out whole-genome shotgun sequencing using the Illumina HiSeq 2000
platform. Two paired-end libraries with 170-bp or 500-bp inserts, and five mate-paired
libraries with 2-kb, 5-kb, 10-kb, 20-kb, or 40-kb inserts were constructed and sequenced.
Total sequencing depth was about 305.52×, considering the predicted genome size of 357 Mb
according to the k-mer method.
The pomegranate genome was assembled using SOAPdenovo2 (Luo et al., 2012)
(Version_2.04) with clean reads from seven libraries. Gap Closer
(http://sourceforge.net/projects/soapdenovo2/files/GapCloser/) was used to fill gaps based on
paired-end reads. This generated assembly V1.0 (Table S21).
Estimating the pomegranate genome size by flow cytometry (FCM)
The genome size of pomegranate cultivar ‘Dabenzi’ was estimated by FCM, using the
genome of Arabidopsis thaliana Col-0 as reference. Fluorescence intensity determination is
shown in Figure S3. The genome size calculation for pomegranate was performed as:
250.46/95.41 × 125 Mb = 328.13 Mb, which was close to the k-mer predicted size.
Genetic map construction and chromosome assembly
A genetic map was constructed using 195 F1 lines developed from a cross between
‘Dabenzi’ and ‘Baiyushizi’ by a reduced representation restriction-associated DNA (RAD)
sequencing method using the RAD-Seq protocol (Baird et al., 2008).
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
SNP-calling for each individual RAD read was carried out using a pipeline. Each
individual’s paired-end RAD reads were mapped onto the assembled reference genome using
the alignment software SOAP2 (Li et al., 2009). SNP calling was performed using SOAPsnp
(Li et al., 2009). Markers (SNPs) showing significantly distorted segregation (p-value < 0.01)
according to Chi-square tests were excluded from use in map construction. Markers with
missing rates of less than 50% were used for linkage analysis using JoinMap 4.0 software
(https://www.kyazma.nl/index.php/JoinMap/General/) with CP (outbreeder full-sib family)
population type codes and the double pseudo-test cross strategy was applied (Grattapaglia
and Sederoff, 1994). Linkage groups were designated at a logarithm of the odds (LOD)
threshold of 6.0 and ordered using the regression-mapping algorithm.
Genome quality evaluation
Evaluation of assembly redundancy. We aligned the entire assembled genome against
itself using the MUMmer3.23/nucmer (Kurtz et al., 2004) program with parameters
-maxmatch -c 90 -l 20 to assess genome redundancy. Sequences were considered redundant if
they met all of the following criteria: (i) Match length ≥200 bp; (ii) Match identity ≥90%;
and (iii) one contig or scaffold totally contained within another contig or scaffold.
Evaluation of gene region coverage. Assembled unigenes derived from RNA-Seq data
for the 12 tissues described above in Plant materials were used to verify the completeness of
the genome assembly. A total of 105 472 unigenes were mapped to the assembled genome
using BLAT software (Kent, 2002) under default parameters. Analysis was performed at
three different percent sequence homology levels and percent coverage using custom Perl
scripts.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Genome annotation
Annotation of repeats. De novo and homology-based approaches were used to identify
the content of repeats in the pomegranate genome. Firstly, a pomegranate de novo consensus
repeat database was built using the software packages LTR_FINDER (Xu and Wang 2007)
(Version 1.0.6), PILER (Edgar and Myers 2005) (V1.0), and RepeatScout (Price et al., 2005)
(Version 1.0.5). Transposable elements were then identified using RepeatMasker and
RepeatProteinMask Version 3.3.0 (http://www.repeatmasker.org/) by using the Repbase
database (Jurka et al., 2005) to query the pomegranate genome (Table S22 and S23). Tandem
repeats in the genome were annotated using TRF (Benson 1999) (Version 4.04). The REPET
TEdenovo package (V2.2) (Flutre et al., 2011; Quesneville et al., 2005) was also used for de
novo identification and classification of transposable elements.
To infer insertion times of LTR retrotransposons, 496 full-length LTR retrotransposons
were identified using LTR_STRUC (Mccarthy and Mcdonald, 2003) with default parameters
and classifications (Figure S5).
Gene prediction. We used both homology-based and de novo methods for gene
prediction. For homology-based prediction, translated proteins from Arabidopsis thaliana
(Arabidopsis Genome Initiative, 2000), apple (Malus domestica) (Velasco et al., 2010),
tomato (Solanum lycopersicum) (The Tomato Genome Consortium, 2012), Eucalyptus
grandis (Myburg et al., 2014) and grape (Vitis vinifera) (Jaillon et al., 2007) were mapped
onto the assembled pomegranate genome using tblastn (Mount, 2007), then the program solar
was used to concatenate the several high-scoring segments (HSP) to the completed region for
each gene, and then we included 1-kb upstream and downstream regions to locate start and
stop codons and ensure that each gene region was complete. Finally, the GeneWise 2.2.0
(Birney and Durbin 2000) was used to predict gene structures and define gene models based
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
on these complete regions. For de novo gene prediction, transposons and repeats were first
masked using RepeatMasker and RepeatProteinMask. Then AUGUSTUS (Stanke et al.,
2006) and SNAP (Majoros et al., 2004) were performed using a Hidden Markov Model
(HMM) trained by complete pomegranate genes (homolog predicted genes). These
complementary gene sets from homolog-based and de novo predictions were merged to
produce a non-redundant reference gene set using GLEAN
(http://sourceforge.net/projects/glean-gene/). In addition, RNA-Seq data from the 12 tissues
described above in Plant materials were also incorporated to aid gene annotation. The
RNA-Seq data were mapped to the assembled genome using TopHat (Trapnell et al., 2009),
and transcriptome-based gene structures (Figure S20) were obtained using Cufflinks
(http://cufflinks.cbcb.umd.edu/). We then compared this gene set with the previous GLEAN
gene set using Cuffcompare program to obtain the final non-redundant gene set (Table S7).
Functional annotation. To annotate predicted pomegranate genes, we aligned predicted
proteins translated from each gene to the SwissProt and TrEMBL databases (UniProt release
2015-04) using blastp (E-value ≤1e-5), and the function of the best hit was assigned to each
gene. We also annotated motifs and domains using InterProScan (InterProScan-5.11-51.0)
(Jones et al., 2014) by searching against publicly available databases, including Pfam
(Mistry and Finn 2007), PRINTS (Attwood et al., 1994), PANTHER
(http://www.pantherdb.org/), HAMAP (Pedruzzi et al., 2015), Gene3D (Lam et al., 2016),
PROSITE (Hulo et al., 2006), SUPERFAMILY (Gough et al., 2001), ProDom (Bru et al.,
2005), and SMART (Schultz et al., 1998). Gene Ontology (Ashburner et al., 2000) functional
annotations were retrieved from InterProScan. We also mapped all of the pomegranate genes
to KEGG pathways by searching the KEGG databases (Release 76) (Kanehisa and Goto,
2000) and identifying the best hit for each gene.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Genome evolution analysis
Gene sets from P. granatum, A. thaliana (Arabidopsis Genome Initiative, 2000)
(TAIR10), E. grandis (Myburg, et al., 2014) (JGI_9.0), C. papaya (Ming et al., 2008)
(JGI_8.0), V. vinifera (Jaillon et al., 2007) (Genoscope 12X), S. lycopersicum (The Tomato
Genome Consortium, 2012) (ITAG2.3_release), O. sativa (Goff et al., 2002) (IRGSP1.0), M.
domestica (Velasco et al., 2010) (JGI_9.0), P. trichocarpa (Tuskan et al., 2006) (JGI_9.0),
P. persica (Verde et al., 2013) (JGI_7.0), T. cacao (Argout, et al., 2011) (CIRAD_v0.9), and
Z. jujube (Liu et al., 2014) were used for analyses of genome evolution, including gene
clustering, phylogeny construction, divergence time estimation, and identification of
chromosome collinearity.
Gene clusters were identified using OrthoMCL (Li et al., 2003) with inflation value set
to 1.5 and using other default parameters. Gene families in these 12 species (Table S9) and
shared gene families in a subset of five species (P. granatum, E. grandis, A. thaliana, V.
vinifera, and S. lycopersicum) were also generated (Figure S7). Based on the OrthoMCL gene
clusters, the proteins of 509 single-copy gene families were aligned using MUSCLE (Edgar
2004) (http://www.ebi.ac.uk/Tools/msa/muscle/), then fourfold degenerate sites (4dTv) were
extracted and orthologous single-copy genes in each species were used for phylogeny
construction and divergence time estimation. The software MrBayes (Huelsenbeck and
Ronquist, 2001) was chosen to construct the phylogenetic tree of orthologs using the GTR
model. To validate this phylogenetic tree, we also constructed other phylogenetic trees using
PhyML (Guindon et al., 2010) software based on single-copy genes (Figure S21).
A Markov Chain Monte Carlo algorithm for Bayesian inference was implemented to
estimate the divergence times of species using the MCMCTree program in the PAML
package (Yang 2007) with a fixed corrected time point of ~71.9 (~70–74) mya as the time of
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
the split between Arabidopsis and papaya used for the calibration time (Wikstrom et al.,
2001). The phylogenetic relationships among these species and the estimates of divergence
time between species are shown in Figure S22. The time of the split between pomegranate
and Eucalyptus is estimated at 91.6 (66.4–112.6) mya within the Myrtales branch of the
rosids.
MCScan (http://chibba.agtec.uga.edu/duplication/mcscan) was used to construct blocks
of chromosome collinearity between pomegranate, Eucalyptus, and grape. Syntenic blocks
containing at least five genes were identified based on similarities between gene pairs. All of
the orthologous or paralogous gene pairs were extracted from the syntenic blocks, then used
to calculate the 4dTv (Huang et al., 2009) distances using the HKY substitution model
(Hasegawa et al., 1985).
We extracted similar gene pairs using the OrthoMCL gene families of pomegranate
(blastp: E <1e-15, identity >40%, match length >100 amino acids) after removing tandem
duplicated gene pairs, then calculated the average synonymous substitutions per synonymous
site (Ks) of the 5570 duplicated gene pairs in the pomegranate genome (Figure S23).
We performed pairwise genome alignments (Phytozome version 11) (Goodstein et al.,
2012) between the grape and Eucalyptus genomes and the pomegranate genome. For each
pairwise alignment, the coding sequences of predicted gene models were compared using
LAST (Kielbasa et al., 2011). Our synteny search pipeline defines syntenic blocks by
chaining the LAST hits with a distance cutoff of 20 genes, and requires at least four gene
pairs per synteny block. For homologs inferred from syntenic alignments, we aligned the
protein sequences using CLUSTALW (Larkin et al., 2007) and used the protein alignments to
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
guide coding sequence alignments in PAL2NAL (Suyama et al., 2006). Finally, we used the
Nei-Gojobori (Nei and Gojobori, 1986) method implemented in the yn00 program of the
PAML package (Yang 2007) to calculate Ks values.
Functional enrichment of genes duplicated in WGD
We extracted all genes in the syntenic blocks of the M-WGD, the γ-WGD, and recent
small-scale gene duplication events in the pomegranate genome, and then carried out
functional enrichment analysis using the KEGG pathways (Kanehisa et al., 2008) and Gene
Ontology (Conesa et al., 2005) (GO) databases with Q-value (corrected p-value) <0.01
(Table S24-25 and Figure S24-25).
Expansion and contraction of gene families
CAFÉ (De Bie et al., 2006) (V2.1) software was used to analyze the expansion and
contraction of gene families (Figure 1). In CAFÉ, a random birth and death model is
proposed to study gene gain and loss in gene families across a specified phylogenetic tree.
The divergence time is represented by the branch length values. The global parameter λ
(lambda) is estimated using a maximum likelihood method, and a gene family with p <0.05
is considered to have gained or lost genes. The 88 genes identified as members of
significantly expanding gene families (p-value ≤0.01) in pomegranate were mapped to the
KEGG pathways and used for further analysis of functional enrichment(Figure S26).
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Identification of transcription factors, transcriptional regulators, protein kinases and R
genes
Searches for consensus transcription factor (TF) and other transcriptional regulator (TR)
domains in pomegranate proteins were performed in the Plant Transcription Factor Database
PlnTFDB (Perez-Rodriguez et al., 2010) (http://plntfdb.bio.uni-potsdam.de/v3.0/) and the
PlantTFDB (Jin et al., 2014) (http://planttfdb.cbi.pku.edu.cn/) using HMMER3.0
(http://hmmer.org/). TFs and TRs were then identified and classified according to the
consensus rules including the required and prohibited protein domains for each TF or TR
gene family summarized at the PlnTFDB.
Plant protein kinases (PKs) were also identified based on domain information. Sequences
with a significant hit for protein kinase domains (PF00069 and PF07714) in the Pfam
database were classified into protein kinase gene families by comparing their sequences to a
set of HMMs of kinase domains (Lehti-Shiu and Shiu, 2012). Based on the above methods,
we predicted TFs and protein kinases in the genomes of pomegranate, Eucalyptus, grape, and
Arabidopsis.
Profile HMMs of Resistance (R) gene domains from the PRGDB database (Sanseverino
et al., 2010) (http://www.prgdb.org) were used to predict R genes. The predicted proteome of
pomegranate was used to query the PRGDB database using hmmsearch in HMMER 3.0 with
default thresholds. The putative R genes predicted in pomegranate, grape, Eucalyptus, and
Arabidopsis were then classified into 10 types according to the domains’ combination rules in
PRGDB (Sanseverino et al., 2010) (http://prgdb.crg.eu/). As for the CC motif in the
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
N-terminal region, all proteins were searched using the program paircoil2 (Mcdonnell et
al,2006) with a P-score cut-off of 0.025.
Identification, classification, and phylogenetic analysis of MADS-box genes
We identified 88 MADS-box genes in the pomegranate genome using protein
motif/domain designations and surveyed their expression patterns in the 12 transcriptomes of
‘Dabenzi’ described above in Plant materials. Each of these 88 protein sequences were used
to query the SwissProt database (release-2015_04) using blastp, and the function of the best
hit was assigned to each protein. Sequences of MADS-box genes from A. thaliana were
downloaded from The Arabidopsis Information Resource database
(http://www.arabidopsis.org/browse/genefamily/MADSlike.jsp) (Parenicova et al., 2003),
sequences of MADS-box genes from rice (O. sativa), poplar (P. trichocarpa), grape (V.
vinifera), petunia (Petunia×hybrida), Eucalyptus (E. grandis), and columbine (Aquilegia
coerulea) were downloaded from PlantTFDB
(http://planttfdb.cbi.pku.edu.cn/family.php?fam=M-type). Multiple alignments of these
protein sequences were conducted using MUSCLE software. The phylogenetic tree was
constructed with a maximum likelihood method using PhyML software and a Whelan and
Goldman (WAG) model, and was then visualized and customized using EvolView
(http://evolgenius.info/). Duplication, anchoring, and expression of MADS-box genes is
shown in Figure S27 and Table S26.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Determination of pomegranate seed hardness
Seed hardness in fruits of ‘Dabenzi’ at 50, 65, 80, 95, 110, 125, and 140 DAP,
‘Baiyushizi’ at 50 DAP, and ‘Tunisi’ at 50 DAP were determined using a TA.AX PLUS
texture analyzer(Food Technology Corporation, Sterling, VA)equipped with a HDP/VB
detector. The parameters were as follows. Pre-test, post-test, and test speeds were each 1
mm/s. The touch force was 5 g and the puncture depth was 2 mm. The value at the maximum
peak was the force at seed break, noted as seed hardness (g) (Fawole and Opara, 2013b). The
average hardness of 30 seeds per fruit represented the seed hardness of the fruit (Table S27).
Statistical analysis of hardness was conducted using Tukey’s test in SPSS (Statistical Package
for the Social Sciences,) V16.0 for Windows for fruits at different development stages.
Identification and analysis of candidate genes potentially involved in cellulose,
hemicellulose, and lignin metabolism
Enzymes that participate in the synthesis and degradation of cellulose, hemicellulose, and
lignin were surveyed in KEGG. All of the genes from the pomegranate genome were mapped
to KEGG pathways (Version 76) using blast. Genes that could be aligned to the KEGG
Orthology entries of relevant pathways using blastx (E-value ≤1e-20 and identity ≥60%)
were considered candidate genes for cellulose, hemicellulose, or lignin metabolism.
Log2-transformed FPKM values were used to represent gene expression as a heat map
(Figure 4 and Table S15).
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Identification and analysis of candidate genes potentially involved in integument
development
We aligned all pomegranate predicted genes to the SwissProt database
(release-2015_04) using blastx with identity ≥40% and reference coverage ≥50% to obtain
functional annotations for each gene. The function of the best hit was assigned to each
homologous pomegranate gene. Finally, we identified 43 candidate genes potentially
involved in integument development in pomegranate. Genes repressing integument
development ares marked with a minus sign to distinguish them from those promoting
integument development. Detailed data regarding gene expression are provided in Table S16
and Log2-transformed FPKM values were used to show genesexpression as a heat map
(Figure 4).
Phyletic evolution and analysis of positive selection in the YABBY gene family
Genes in the pomegranate genome that contain YABBY domains were identified using
the same rules those used for TF identification described above. Sequences of YABBY genes
in Arabidopsis including YAB1 (AFO), YAB2, YAB3, YAB5, CRC, and INO were downloaded
from the TAIR database (http://www.arabidopsis.org/index.jsp). Multiple sequence
alignments with protein sequences were conducted using MUSCLE. The phylogenetic tree
was constructed using PhyML software with a maximum likelihood method and a WAG
model. The phylogenetic tree was then visualized and customized using EvolView
(http://evolgenius.info/) (Figure 5). YAB1 (AFO), YAB2, YAB5, CRC, and INO were
identified in pomegranate genome.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
We downloaded the INO sequences for 15 species belonging to 13 genera from NCBI to
analyze selection pressures on these genes. Firstly, multiple sequence alignments of proteins
were conducted using GUIDANCE2 with the PRANK algorithm. Then, analysis of selection
was performed using the alignments and phylogenetic trees using the CodeML program in
the PAML (V4.8) package. The average sequencing depth of pomegranate INO was about
105×. Upon manual inspection, we determined that the 35th amino acid residue encoded by
the pomegranate INO gene had likely undergone positive selection during its evolutionary
history (Figure 5).
Sequences of four other YABBY genes, YAB1 (AFO), YAB2, YAB5, CRC, and the
regulators of INO, ANT and BEL1, were then predicted in the gene sets of the above 15
species using Genewise. The phylogenetic tree was constructed using PhyML software with a
maximum likelihood method and a WAG model, and was then visualized and
customized using EvolView (http://evolgenius.info/) (Figure S13). Multiple protein sequence
alignments were visualized and customized using ENDscript (Robert and Gouet 2014)
(Figure S14). Analysis of selection was performed using the PAML (V4.8) package and
positive selection was also detected in the predicted proteins encoded by the pomegranate
ANT and BEL1 genes (Figure S15).
Analysis of the terpene synthase gene family
Pomegranate TPS genes were identified on a genome-wide scale using a homology-based
method in (See Supplementary Note). For E. grandis, we downloaded TPS genes from its
genome (Myburg et al., 2014), then filtered and removed redundancy by picking one
representative protein coding gene of each locus. The sequences of grape and Eucalyptus TPS
genes were extracted from their respective published genomic data (Martin et al., 2010;
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Myburg et al., 2014). Sequences of the TPS genes from Arabidopsis were obtained from the
TAIR database. To investigate the phylogenetic relationships among members of the TPS
gene family in pomegranate grape, Eucalyptus, and Arabidopsis, we constructed a
phylogenetic tree as follows.
Multiple alignments of all TPS protein sequences from these species were conducted
using MUSCLE software. The phylogenetic tree was then constructed using PhyML software
with a maximum likelihood method and a WAG model. The phylogenetic tree of TPS
sequences was then visualized and customized using EvolView (Figure S16).
Identification and analysis of genes putatively involved in anthocyanin metabolism and
gene families of their putative transcriptional regulators
We mapped the pomegranate genes to the Anthocyanin biosynthesis pathway
(map00942) in KEGG and aligned them to the KO entries using blastx with E-value ≤1e-20
and identity ≥50%. The bHLH and MYB genes were identified according to same rules as
those used for the prediction of TFs above. WD40-repeat proteins were predicted by
searching the Pfam database with WD40 domains (PF00400).
To study the relationship between anthocyanin content and gene expression, anthocyanin
content and gene expression patterns were analyzed in pomegranate flowers and outer seed
coats at different developmental stages. The total anthocyanin content was determined by
extracting tissues with methanol containing 1% HCl, and measuring absorbence at 530, 620,
and 650 nm by UV spectrophotometer (Puxi Tongyong Apparatus Ltd., Beijing, China).
Anthocyanin content was determined by the formula: OD = (A530 - A620) - 0.1 (A650 -
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
A620) (Sarni-Manchado et al., 2000). Anthocyanin content was expressed in units g-1 fresh
weight (FW), where one unit is defined as a hundredfold change in OD.
We calculated the expression levels of the structural genes encoding later steps in the
anthocyanin pathway, DFR, LDOX, BZ1, and UGT75C, and the regulators bHLH, MYB, and
WD40, in flowers and outer seed coats and represented relative expression levels as
log2-transformed FPKM values in heat maps (Figure 6, Table S17 and S18).
Presumptive punicalagin metabolism pathway
Gallic acid is the common precursor for EA and punicalagin biosynthesis. Gallic acid
biosynthesis has been partially elucidated in microbes and plants, where quinate
dehydrogenase (EC 1.1.1.24), 3-dehydroquinate dehydratase (EC 4.2.1.10), shikimate
dehydrogenase (EC 1.1.1.25), dehydroshikimate dehydrogenase (EC 4.2.1.118),
dehydroxylase (1.17.98.x), and methyltransferase (EC 2.1.1.-) are known to participate in the
pathway. As a derivative of gallic acid and EA, the metabolism of punicalagin might also
involve carboxylesterase (EC 3.1.1.1), glycosyltransferase (EC 2.4.1.x), acyltransferases (EC
2.3.1.x), and glucohydrolase (EC 3.2.1.21), according to the molecular structure of each
substrate and the likely required characteristics of each participating enzyme (Figure 7).
In the human intestine, punicalagins are hydrolyzed to EA under physiological
conditions and EA is then gradually metabolized by the intestinal microbiota into different
types of urolithins (Cerda et al., 2004). This process may include lactone-ring cleavage,
decarboxylation, and dehydroxylation reactions (Selma et al., 2009). Tannin acetylhydrolase
(EC 3.1.1.20), dehydroxylase (EC 1.17.98.x), and gallate decarboxylase (EC 4.1.1.59) likely
participate in the transformation of EA to urolithins (Figure 7).
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Identification and analysis of candidate genes putatively involved in punicalagin
metabolism
To survey for genes encoding enzymes putatively involved in punicalagin metabolism,
the pomegranate gene set were aligned to KO entries using blastx with E-value ≤1e-20 and
identity ≥60%. For candidate enzymes, the function of the best hit was assigned to each
homologous pomegranate gene and their expression was analyzed in the transcriptomes of the
12 tissues mentioned above in Plant materials.
For methyltransferase (EC 2.1.1.-), the functions of the best hits KO15064 and
KO15066 were assigned to each gene, and then genes containing the domains IPR006222,
IPR013977, and IPR027266 were extracted and considered as candidate genes.
For glycosyltransferase (EC 2.4.1.x), 1174 genes obtained by alignment to KO entries
and 22 reference genes encoding A (AtUGT79B1, AtUGT91A1), B (AtUGT89B1), C
(AtUGT90A1), F (AtUGT78D1), G (AtUGT85A1), H (AtUGT76B1),
I (AtUGT83A1), J (AtUGT87A1), K (AtUGT86A1), L (AtUGT75B1, AtUGT84A1,
AtUGT74B1), M (AtUGT92A1), N (AtUGT82A1), O (AtUGT73B1), P (GRMZM5G834303),
Q (GRMZM2G082037, GRMZM2G037545), and R (Pilosella officinarum UGT95A1) group
glycosyltransferases were used to establish a phylogenetic tree. Genes clustered into the L
group were considered as candidate glycosyltransferase genes (Figure S17). For
acyltransferases (EC 2.3.1.x), eight genes containing IPR023213 and IPR003480, the
consensus domains of the V clade acyltransferases, and nine reference genes encoding
members of the I (NtMAT1, Pf3AT), II (CER2, Glossy2), III (SAAT), IV (ACT, SalAT), and V
(NtBEBT, AtHCT) clades of acyltransferases were used to establish a phylogenetic tree.
Genes clustered into clade V were considered as candidate acyltransferase (Figure S18).
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Log2-transformed FPKM values shown as a heat map indicate the gene expressions (Figure 7
and Table S19).
Genes encoding tannin acetylhydrolase (EC3.1.1.20) and dehydroxylase (EC 1.17.98.x)
from Lactobacillus plantarum, Lactobacillus paraplantarum, Clostridium scindens, and
Debaryomyces hansenii were downloaded from NCBI. Coding sequences of the IPR016040,
IPR002347, and IPR020904 71 domains were used to extract dehydroxylase (EC 1.17.98.x)
sequences from the pomegranate genome. Coding sequences of the IPR029058 and
IPR011118 domains were used to extract tannin acetylhydrolase (EC3.1.1.20) sequences
from the pomegranate genome. Multiple alignments of the protein sequences were then
conducted using MUSCLE (Figure S19).
ACCESSION NUMBERS
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under
the accession MTKT00000000. The version described in this paper is version
MTKT01000000. Raw sequence data of the transcriptomes have been deposited into the
Sequence Read Archive (SRA) under accession SRP100581.
ACKNOWLEDGEMENTS
This project is supported by funding from the Agriculture Research System of Anhui
Province (AHNYCYTX-10), and the Key Laboratory of Genetic Improvement and
Ecophysiology of Horticultural crop, Anhui Province.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Authors’ contributions
Y.X., G.Q., R.M., and Z.Y. conceived the project. G.Q., Z.Y., and C.X wrote the
manuscript. R.M. and E.K. revised the manuscript. G.Q., Y.X., R.M., Z.Y., and C.X designed
the analyses. Z.G. and H.P. prepared the samples. C.X., Y.H., and Y.T. contributed to
sequencing, sequence assembly, genome annotation, and genome structure analyses. Y.Q.
contributed to flow cytometry analysis. G.Q., C.X, X,Y., Y.Q., Z.G, X.X., Y.H., and H.P.
contributed to the transcript analysis. H.T and Y.C contributed to the evolutionary analyses.
R.G. contributed to the TE analysis. E.K. contributed to MADS-box analysis. J.J., R.M.,
C.X., G.Q., and Z.Y. contributed to genetic mapping and chromosome anchoring. G.Q.,
R.M., Y.X., and C.X. contributed to the pathway analysis.
CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.
SUPPORTING INFORMATION
Supplementary note. Additional information about detailed methods and results.
Supplementary Figures
Figure S1 Chromosomes in root tip cells of pomegranate (2n = 2x = 18).
Figure S2 Estimation of pomegranate genome size k-mer analysis.
Figure S3 Flow cytometric analysis of DNA content.
Figure S4 The nine linkage groups of the pomegranate genome.
Figure S5 Timing of LTR retrotransposon insertions in pomegranate.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Figure S6 Diagram showing the functional enrichment of GO (Gene Ontology) terms in
pomegranate orphan gene families.
Figure S7 Venn diagram showing orthologous groups shared among pomegranate, grape,
tomato, E. grandis, and A. thaliana.
Figure S8 Distribution of 4dTv distances between duplicated genes of syntenic regions in
pomegranate (green), Eucalyptus (red), and grape (blue).
Figure S9 Dot plots of pairwise syntenic blocks in pomegranate and grape and Eucalyptus.
Figure S10 Syntenic depths in comparisons of the pomegranate genome to the genomes of
grape and Eucalyptus.
Figure S11 The genomic distribution of pomegranate R genes.
Figure S12 Phylogenetic tree of MADS-box genes in pomegranate.
Figure S13 Phylogenetic tree of the YABBY gene family in pomegranate.
Figure S14 Multiple sequence alignment and analysis of positive selection in YABBY genes
in pomegranate.
Figure S15 Multiple sequence alignment of ANT genes and positive selection analysis of
BEL1 genes.
Figure S16 Phylogenetic tree of the TPS gene family in pomegranate.
Figure S17 Phylogenetic tree of glycosyltransferase (EC: 2.4.1.x) genes from pomegranate
and other species.
Figure S18 Phylogenetic tree of acyltransferases (EC: 2.3.1.x) genes from pomegranate and
other plant species.
FigureS19 Multiple sequence alignment dehydroxylase (EC 1.17.98.x) genes between
pomegranate and C. scindens.
Figure S20 Statistics of RNA-Seq support for the final gene set.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Figure S21 Phylogenetic tree constructed with Phase 1 sites of orthologous genes using
MrBayes.
Figure S22 Estimation of divergence time for pomegranate.
Figure S23 Distribution of Ks in the pomegranate gene set.
Figure S24 Enrichment of KEGG pathway functions in pomegranate genes during γ-WGD
events.
Figure S25 Enrichment of KEGG pathway functions in pomegranate genes during
lineage-specific M-WGD events.
Figure S26 Enrichment of KEGG pathway functions in significantly expanded gene families
in the pomegranate genome.
Figure S27 Duplication, anchoring, and expression of MADS-box genes.
Supplementary Tables
Table S1 Statistics for libraries with different insert sizes used in genome assembly.
Table S2 Comparison of the pomegranate genome to other sequenced plant genomes. Table
S3 Evaluation of the genome assembly with unigenes.
Table S4 Statistics regarding the genetic map of the F1 population from the cross between
pomegranate cultivars ‘Dabenzi’ and ‘Baiyushizi’.
Table S5 Comparison of final anchored genetic maps of pomegranate, pear, jujube, and mei.
Table S6 Length statistics of classified transposable elements identified de novo.
Table S7 Statistics regarding gene annotation of pomegranate.
Table S8 Functional annotation of the pomegranate gene set.
Table S9 Gene families clustered using OrthoMCL in 12 species.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Table S10 Significantly enriched KEGG pathways among orphan gene families (Q value <
0.01) in pomegranate.
Table S11 Significantly enriched KEGG pathways among genes in significantly expanded
pomegranate gene families.
Table S12 Resistance (R) genes in pomegranate, Arabidopsis, Eucalyptus, and grape
genome.
Table S13 Protein kinases in the pomegranate, Arabidopsis, Eucalyptus, and grape genomes.
Table S14 TFs in pomegranate, Arabidopsis, Eucalyptus, and grape genomes.
Table S15 Expression of genes involved in hemicellulose, cellulose, and lignin metabolism
in pomegranate.
Table S16 Expression of genes involved in integument development of pomegranate.
Table S17 The expression of genes encoding DFR, LDOX, BZ1 and UGT75C in
pomegranate.
Table S18 Expression of members of the bHLH, MYB, and WD40 gene families in
pomegranate.
Table S19 Expression of candidate genes for punicalagin metabolism in pomegranate.
Table S20 Plant materials used for genome sequencing and RNA-Seq.
Table S21 Statistics of the pomegranate genome assembly.
Table S22 Statistics of repetitive DNA sequences in the pomegranate genome.
Table S23 Statistics of TE content of the assembled pomegranate genome.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Table S24 Significantly enriched KEGG pathway terms among pomegranate genes affected
by two WGD events and one small-scale gene duplication event.
Table S25 Significantly enriched GO (Gene Ontology) terms among pomegranate genes
affected by two WGD events and one small-scale gene duplication event.
Table S26 Expression of pomegranate MADS-box genes in 10 RNA-Seq samples.
Table S27 Hardness of seeds at different developmental stages and among different cultivars.
REFERENCES
Adams, L.S., Zhang, Y., Seeram, N.P., Heber, D. and Chen, S. (2010) Pomegranate ellagitannin-derived
compounds exhibit antiproliferative and antiaromatase activity in breast cancer cells in vitro. Cancer
Prev. Res. (Phila), 3, 108-113.
Angiosperm Phylogeny Group. (2016) An update of the Angiosperm Phylogeny Group classification for
the orders and families of flowering plants: APG IV. Botanical J. Linn. Soc, 181, 1-20.
Arabidopsis Genome Initiative. (2000) Analysis of the genome sequence of the flowering plant
Arabidopsis thaliana. Nature, 408, 796-815.
Argout, X., Salse, J., Aury, J.M., et al. (2011) The genome of Theobroma cacao. Nat. Genet., 43, 101-108.
Ashburner, M., Ball, C.A., Blake, J.A., et al. (2000) Gene ontology: tool for the unification of biology.
The Gene Ontology Consortium. Nat. Genet., 25, 25-29.
Attwood, T.K., Beck, M.E., Bleasby, A.J. and Parry-Smith, D.J. (1994) PRINTS--a database of protein
motif fingerprints. Nucleic Acids Res., 22, 3590-3596.
Baird, N.A., Etter, P.D., Atwood, T.S., Currey, M.C., Shiver, A.L., Lewis, Z.A., Selker, E.U., Cresko,
W.A. and Johnson, E.A. (2008) Rapid SNP discovery and genetic mapping using sequenced RAD
markers. PLoS One, 3, e3376.
Batish, D.R., Singh, H.P., Kohli, R.K. and Kaur, S. (2008) Eucalyptus essential oil as a natural pesticide.
Forest Ecol. Manag., 256, 2166-2174.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Benderoth, M., Textor, S., Windsor, A.J., Mitchell-Olds, T., Gershenzon, J. and Kroymann, J. (2006)
Positive selection driving diversification in plant secondary metabolism. Proc. Natl. Acad. Sci. U S A.,
103, 9118-9123.
Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 27,
573-580.
Birney, E. and Durbin, R. (2000) Using GeneWise in the Drosophila annotation experiment. Genome Res.,
10, 547-548.
Bretschneider E. (1935). History of European botanical discoveries in China. Hamburg: K.F. Koehlers
Antiquarium, pp. 1-57.
Bru, C., Courcelle, E., Carrere, S., Beausse, Y., Dalmar, S. and Kahn, D. (2005) The ProDom database
of protein domain families: more emphasis on 3D. Nucleic Acids Res., 33, D212-D215.
Cerda, B., Espin, J.C., Parra, S., Martinez, P. and Tomas-Barberan, F.A. (2004) The potent in vitro
antioxidant ellagitannins from pomegranate juice are metabolised into bioavailable but poor antioxidant
hydroxy-6H-dibenzopyran-6-one derivatives by the colonic microflora of healthy humans. Eur. J. Nutr.,
43, 205-220.
Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M. and Robles, M. (2005) Blast2GO: a
universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics,
21, 3674-3676.
Cucinotta, M., Colombo, L. and Roig-Villanova, I. (2014) Ovule development, a new model for lateral
organ formation. Front Plant Sci., 5, 1-12.
D'Auria, J.C. (2006) Acyltransferases in plants: a good time to be BAHD. Curr. Opin. Plant Biol., 9,
331-340.
De Bie, T., Cristianini, N., Demuth, J.P. and Hahn, M.W. (2006) CAFE: a computational tool for the
study of gene family evolution. Bioinformatics, 22, 1269-1271.
Debeaujon, I., Leon-Kloosterziel, K.M. and Koornneef, M. (2000) Influence of the testa on seed
dormancy, germination, and longevity in Arabidopsis. Plant Physiol., 122, 403-414.
Denoeud, F., Carretero-Paulet, L., Dereeper, A., et al. (2014) The coffee genome provides insight into the
convergent evolution of caffeine biosynthesis. Science, 345, 1181-1184.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Res., 32, 1792-1797.
Edgar, R.C. and Myers, E.W. (2005) PILER: identification and classification of genomic repeats.
Bioinformatics, 21, i152-i158.
Endress, P.K. (2011a) Angiosperm ovules: diversity, development, evolution. Ann.. Bot., 107, 1465-1489.
Endress, P.K. (2011b) Evolutionary diversification of the flowers in angiosperms. Am. J. Bot.,
98,370-396.
Espin, J.C., Larrosa, M., Garcia-Conesa, M.T. and Tomas-Barberan, F. (2013) Biological significance
of urolithins, the gut microbial ellagic acid-derived metabolites: the evidence so far. Evid. Based
Complement. Alternat. Med., 2013, 270-418.
Fawole, O.A. and Opara, U.L. (2013a) Changes in physical properties, chemical and elemental
composition and antioxidant capacity of pomegranate (cv. Ruby) fruit at five maturity stages. Sci.
Hortic., 150, 37-46.
Fawole, O.A. and Opara, U.L. (2013b) Fruit growth dynamics, respiration rate and physico-textural
properties during pomegranate development and ripening. Sci. Hortic., 157: 90-98.
Fischer U A, Carle R, Kammerer D R. (2011) Identification and quantification of phenolic compounds
from pomegranate (Punica granatum, L.) peel, mesocarp, aril and differently produced juices by
HPLC-DAD–ESI/MSn. Food Chem.,127:807-821.
Flutre, T., Duprat, E., Feuillet, C. and Quesneville, H. (2011) Considering transposable element
diversification in de novo annotation approaches. PLoS One, 6, e16526.
Gershenzon, J. and Dudareva, N. (2007) The function of terpene natural products in the natural world.
Nat. Chem. Biol., 3, 408-414.
Gil, M.I., Tomas-Barberan, F.A., Hess-Pierce, B., Holcroft, D.M. and Kader, A.A. (2000) Antioxidant
activity of pomegranate juice and its relationship with phenolic composition and processing. J. Agric.
Food Chem., 48, 4581-4589.
Goff, S.A., Ricke, D., Lan, T.H., et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp.
japonica). Science, 296, 92-100.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Gonzalez, A., Mendenhall, J., Huo, Y. and Lloyd, A. (2009) TTG1 complex MYBs, MYB5 and TT2,
control outer seed coat differentiation. Dev. Biol., 325, 412-421.
Gonzalez, A., Zhao, M., Leavitt, J.M. and Lloyd, A.M. (2008) Regulation of the anthocyanin biosynthetic
pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. Plant J., 53,
814-827.
Goodstein, D.M., Shu, S., Howson, R., et al. (2012) Phytozome: a comparative platform for green plant
genomics. Nucleic Acids Res., 40, D1178-D1186.
Gough, J., Karplus, K., Hughey, R. and Chothia, C. (2001) Assignment of homology to genome
sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol.
Biol., 313, 903-919.
Grattapaglia, D. and Sederoff, R. (1994) Genetic linkage maps of Eucalyptus grandis and Eucalyptus
urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics, 137, 1121-1137.
Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W. and Gascuel, O. (2010) New
algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of
PhyML 3.0. Syst. Biol., 59, 307-321.
Hasegawa, M., Kishino, H. and Yano, T. (1985) Dating of the human-ape splitting by a molecular clock of
mitochondrial DNA. J. Mol. Evol., 22, 160-174.
Henning, S.M., Zhang, Y., Rontoyanni, V.G., Huang, J., Lee, R.P., Trang, A., Nuernberger, G. and
Heber, D. (2014) Variability in the antioxidant activity of dietary supplements from pomegranate, milk
thistle, green tea, grape seed, goji, and acai: effects of in vitro digestion. J. Agric. Food Chem., 62,
4313-4321.
Holland, D., Hatib, K. and Bar-Ya’akov, I. (2009) Pomegranate botany, horticulture, breeding. Hortic.
Rev. ( Am. Soc. Hortic. Sci.), 35, 127-191.
Hu, W.J., Harding, S.A., Lung, J., Popko, J.L., Ralph, J., Stokke, D.D., Tsai, C.J. and Chiang, V.L.
(1999) Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees.
Nat. Biotechnol, 17, 808-812.
Huang, S., Li, R., Zhang, Z., et al. (2009) The genome of the cucumber, Cucumis sativus L. Nat. Genet.,
41, 1275-1281.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Huelsenbeck, J.P. and Ronquist, F. (2001) MRBAYES: Bayesian inference of phylogenetic trees.
Bioinformatics, 17, 754-755.
Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Pagni, M.
and Sigrist, C.J. (2006) The PROSITE database. Nucleic Acids Res., 34, D227-D230.
Jaillon, O., Aury, J.M., Noel, B., et al. (2007) The grapevine genome sequence suggests ancestral
hexaploidization in major angiosperm phyla. Nature, 449, 463-467.
Janick, J. (2007) Fruits of the Bibles. HortScience, 42, 1072-1076.
Jiao, Y., Leebens-Mack, J., Ayyampalayam, S., et al. (2012) A genome triplication associated with early
diversification of the core eudicots. Genome Biol., 13, 1-14.
Jin, J., Zhang, H., Kong, L., Gao, G. and Luo, J. (2014) PlantTFDB 3.0: a portal for the functional and
evolutionary study of plant transcription factors. Nucleic Acids Res., 42, 1182-1187.
Johanningsmeier, S.D. and Harris, G.K. (2011) Pomegranate as a functional food and nutraceutical
source. Annu. Rev. Food Sci. Technol., 2, 181-201.
Jones, P., Binns, D., Chang, H.Y., et al. (2014) InterProScan 5: genome-scale protein function
classification. Bioinformatics, 30, 1236-1240.
Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O. and Walichiewicz, J. (2005)
Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic Genome Res., 110, 462-467.
Kanehisa, M., Araki, M., Goto, S., et al. (2008) KEGG for linking genomes to life and the environment.
Nucleic Acids Res., 36, D480-D484.
Kanehisa, M. and Goto, S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res.,
28, 27-30.
Kasimsetty, S.G., Bialonska, D., Reddy, M.K., Ma, G., Khan, S.I. and Ferreira, D. (2010) Colon cancer
chemopreventive activities of pomegranate ellagitannins and urolithins. J. Agric. Food Chem., 58,
2180-2187.
Kent, W.J. (2002) BLAT--the BLAST-like alignment tool. Genome Res., 12, 656-664.
Kielbasa, S.M., Wan, R., Sato, K., Horton, P. and Frith, M.C. (2011) Adaptive seeds tame genomic
sequence comparison. Genome Res., 21, 487-493.
Kim, N.D., Mehta, R., Yu, W., et al. (2002) Chemopreventive and adjuvant therapeutic potential of
pomegranate (Punica granatum) for human breast cancer. Breast Cancer Res. Treat., 71, 203-217.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Kirshinbaum, L.M. (2012) Identification of aroma-active compounds in ‘Wonderful’ pomegranate fruit
using solvent-assisted flavour evaporation and headspace solid-phase micro-extraction methods. Eur.
Food Res.Technol., 235,277-283.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C. and Salzberg, S.L.
(2004) Versatile and open software for comparing large genomes. Genome Biol., 5, R12.
Lam, S.D., Dawson, N.L., Das, S., Sillitoe, I., Ashford, P., Lee, D., Lehtinen, S., Orengo, C.A. and Lees,
J.G. (2016) Gene3D: expanding the utility of domain assignments. Nucleic Acids Res., 44, D404-409.
Landete, J.M. (2011) Ellagitannins, ellagic acid and their derived metabolites: A review about source,
metabolism, functions and health. Food Res. Int., 44, 1150–1160.
Langley, P. (2000) Why a pomegranate? BMJ, 321, 1153-1154.
Larkin, M.A., Blackshields, G., Brown, N.P., et al. (2007) Clustal W and Clustal X version 2.0.
Bioinformatics, 23, 2947-2948.
Lehti-Shiu, M.D. and Shiu, S.H. (2012) Diversity, classification and function of the plant protein kinase
superfamily. Philos. Trans. R. Soc. Lond. B Biol. Sci., 367, 2619-2639.
Levin, G.M. (2006) Pomegranate Roads: A Soviet Botanist's Exile from Eden. Forestville, CA: Floreat
Press, pp.37-38.
Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003) OrthoMCL: identification of ortholog groups for
eukaryotic genomes. Genome Res., 13, 2178-2189.
Li, R., Yu, C., Li, Y., Lam, T.W., Yiu, S.M., Kristiansen, K. and Wang, J. (2009) SOAP2: an improved
ultrafast tool for short read alignment. Bioinformatics, 25, 1966-1967.
Li, S. (1998) Compendium of materia medica essence. Beijing: Science Press, pp.416-417.
Li, Y., Zheng, L., Corke, F., Smith, C. and Bevan, M.W. (2008) Control of final seed and organ size by
the DA1 gene family in Arabidopsis thaliana. Genes. Dev., 22, 1331-1336.
Lim, E.K., Doucet, C.J., Li, Y., Elias, L., Worrall, D., Spencer, S.P., Ross, J. and Bowles, D.J. (2002)
The activity of Arabidopsis glycosyltransferases toward salicylic acid, 4-hydroxybenzoic acid, and other
benzoates. J. Biol. Chem., 277, 586-592.
Lisch, D. (2013) How important are transposons for plant evolution? Nat Rev Genet, 14, 49-61.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Liu, M.J., Zhao, J., Cai, Q.L., et al. (2014) The complex jujube genome provides insights into fruit tree
biology. Nat. Commun., 5, 5315.
Longtin, R. (2003) The pomegranate: nature's power fruit? J Natl Cancer Inst, 95, 346-348.
Luo, R., Liu, B., Xie, Y., et al. (2012) SOAPdenovo2: an empirically improved memory-efficient
short-read de novo assembler. Gigascience, 1, 18.
Magallon, S., Gomez-Acevedo, S., Sanchez-Reyes, L.L. and Hernandez-Hernandez, T. (2015) A
metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity. New Phytol.,
207, 437-453.
Majoros, W.H., Pertea, M. and Salzberg, S.L. (2004) TigrScan and GlimmerHMM: two open source ab
initio eukaryotic gene-finders. Bioinformatics, 20, 2878-2879.
Martin, D.M., Aubourg, S., Schouwey, M.B., Daviet, L., Schalk, M., Toub, O., Lund, S.T. and
Bohlmann, J. (2010) Functional annotation, genome organization and phylogeny of the grapevine (Vitis
vinifera) terpene synthase gene family based on genome assembly, FLcDNA cloning, and enzyme
assays. BMC Plant Biol., 10, 226.
Martin, W., Deusch, O., Stawski, N., Grunheit, N. and Goremykin, V. (2005) Chloroplast genome
phylogenetics: why we need independent approaches to plant molecular evolution. Trends Plant Sci., 10,
203-209.
Masiero, S., Colombo, L., Grini, P.E., Schnittger, A. and Kater, M.M. (2011) The emerging importance
of type I MADS box transcription factors for plant reproduction. Plant Cell, 23, 865-872.
McAbee, J.M., Kuzoff, R.K. and Gasser, C.S. (2005) Mechanisms of derived unitegmy among Impatiens
species. Plant Cell, 17, 1674-1684.
McCarthy, E.M. and McDonald, J.F. (2003) LTR_STRUC: a novel search and identification program for
LTR retrotransposons. Bioinformatics, 19, 362-367.
McColl, R. W. (1991). China's Silk Roads: A modern journey to China's western regions. Focus, 41, 1-6.
Mcdonnell, A.V., Jiang, T., Keating, A.E. and Berger, B. (2006) Paircoil2: improved
prediction of coiled coils from sequence. Bioinformatics, 22, 356-358.
Meister, R.J., Oldenhof, H., Bowman, J.L. and Gasser, C.S. (2005) Multiple protein regions contribute to
differential activities of YABBY proteins in reproductive development. Plant Physiol., 137, 651-662.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Meister, R.J., Williams, L.A., Monfared, M.M., Gallagher, T.L., Kraft, E.A., Nelson, C.G. and
Gasser, C.S. (2004) Definition and interactions of a positive regulatory element of the Arabidopsis
INNER NO OUTER promoter. Plant J., 37, 426-438.
Melgarejo, P., Calin-Sanchez, A., Vazquez-Araujo, L., Hernandez, F., Martinez, J.J., Legua, P. and
Carbonell-Barrachina, A.A. (2011) Volatile composition of pomegranates from 9 Spanish cultivars
using headspace solid phase microextraction. J. Food Sci., 76, S114-120.
Ming, R., Hou, S., Feng, Y., et al. (2008) The draft genome of the transgenic tropical fruit tree papaya
(Carica papaya Linnaeus). Nature, 452, 991-996.
Mishra, N.C. and Gold, B. (2006) Synthesis of 13C- and 14C‐labeled ellagic acid. J. Labelled Comp., 28,
927-941.
Mistry, J. and Finn, R. (2007) Pfam: a domain-centric method for analyzing proteins and proteomes.
Methods Mol. Biol., 396, 43-58.
Morton, J.F. and Dowling, C.F. (1987) Fruits of warm climates. Miami, Florida: Florida Flair Books, pp.
352-355.
Mount, D.W. (2007) Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc, 2007, pdb
top17.
Murray, M.G. and Thompson, W.F. (1980) Rapid isolation of high molecular weight plant DNA. Nucleic
Acids Res., 8, 4321-4325.
Myburg, A.A., Grattapaglia, D., Tuskan, G.A., et al. (2014) The genome of Eucalyptus grandis. Nature,
510, 356-362.
Nallely N.j., Paulina N., Sandra M.p., Francisca H., Ángel A. C.B, Aneta W. (2015) Identification and
quantification of major derivatives of ellagic acid and antioxidant properties of thinning and ripe
Spanish pomegranates. J. Funct. Foods, 12,354-364.
Nei, M. and Gojobori, T. (1986) Simple methods for estimating the numbers of synonymous and
nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3, 418-426.
Ossipov, V., Ossipova, S., Salminen, J.-P., Haukioja, E. and Pihlaja, K. (2003) Gallic acid and
hydrolysable tannins are formed in birch leaves from an intermediate compound of the shikimate
pathway. Biochem. Syst. Ecol., 31, 3-16.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Parashar, A. (2011) Pomegranate juice is potentially better than orange juice in improving antioxidant
function in elderly subjects. Pharma Res., 1, 14-23.
Parenicova, L., de Folter, S., Kieffer, M., et al. (2003) Molecular and phylogenetic analyses of the
complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world.
Plant Cell, 15, 1538-1551.
Pedruzzi, I., Rivoire, C., Auchincloss, A.H., et al. (2015) HAMAP in 2015: updates to the protein family
classification and annotation system. Nucleic Acids Res., 43, D1064-1070.
Pelletier, M.K., Burbulis, I.E. and Winkel-Shirley, B. (1999) Disruption of specific flavonoid genes
enhances the accumulation of flavonoid enzymes and end-products in Arabidopsis seedlings. Plant Mol.
Biol., 40, 45-54.
Perez-Rodriguez, P., Riano-Pachon, D.M., Correa, L.G., Rensing, S.A., Kersten, B. and
Mueller-Roeber, B. (2010) PlnTFDB: updated content and new features of the plant transcription factor
database. Nucleic Acids Res., 38, D822-827.
Perez, J., Munoz-Dorado, J., de la Rubia, T. and Martinez, J. (2002) Biodegradation and biological
treatments of cellulose, hemicellulose and lignin: an overview. Int. Microbiol., 5, 53-63.
Prasad, K. and Ambrose, B.A. (2010) Shaping up the fruit: control of fruit size by an Arabidopsis B-sister
MADS-box gene. Plant Signal. Behav., 5, 899-902.
Prasad, K., Zhang, X., Tobon, E. and Ambrose, B.A. (2010) The Arabidopsis B-sister MADS-box
protein, GORDITA, represses fruit growth and contributes to integument development. Plant J., 62,
203-214.
Price, A.L., Jones, N.C. and Pevzner, P.A. (2005) De novo identification of repeat families in large
genomes. Bioinformatics, 21 Suppl 1, i351-i358.
Quesneville, H., Bergman, C.M., Andrieu, O., Autard, D., Nouaud, D., Ashburner, M. and
Anxolabehere, D. (2005) Combined evidence annotation of transposable elements in genome
sequences. PLoS Comput. Biol., 1, 166-175.
Robert, X. and Gouet, P. (2014) Deciphering key features in protein structures with the new ENDscript
server. Nucleic Acids Res., 42, W320-324.
Ryu, D., Mouchiroud, L., Andreux, P.A., et al. (2016) Urolithin A induces mitophagy and prolongs
lifespan in C. elegans and increases muscle function in rodents. Nat. Med., 22, 879-888.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Sanchez-Lamar, A., Fonseca, G., Fuentes, J.L., et al. (2008) Assessment of the genotoxic risk of Punica
granatum L. (Punicaceae) whole fruit extracts. J. Ethnopharmacol., 115, 416-422.
SanMiguel, P., Gaut, B.S., Tikhonov, A., Nakajima, Y. and Bennetzen, J.L. (1998) The paleontology of
intergene retrotransposons of maize. Nat. Genet., 20, 43-45.
Sanseverino, W., Roma, G., De Simone, M., Faino, L., Melito, S., Stupka, E., Frusciante, L. and
Ercolano, M.R. (2010) PRGdb: a bioinformatics platform for plant resistance gene analysis. Nucleic
Acids Res., 38, D814-D821.
Sarni-Manchado, P., Le Roux, E., Le Guerneve, C., Lozano, Y. and Cheynier, V. (2000) Phenolic
composition of litchi fruit pericarp. J. Agric. Food Chem., 48, 5995-6002.
Schultz, J., Milpetz, F., Bork, P. and Ponting, C.P. (1998) SMART, a simple modular architecture
research tool: identification of signaling domains. Proc. Natl. Acad. Sci. U S A, 95, 5857-5864.
Seeram, N.P., Lee, R. and Heber, D. (2004) Bioavailability of ellagic acid in human plasma after
consumption of ellagitannins from pomegranate (Punica granatum L.) juice. Clin. Chim. Acta, 348,
63-68.
Seeram, N.P., Schulman, R.N. and Heber, D. (2006) Pomegranates ancient roots to modern medicine.
Boca Raton, Florial: CRC Press, pp. 63-183.
Selma, M.V., Beltran, D., Garcia-Villalba, R., Espin, J.C. and Tomas-Barberan, F.A. (2014)
Description of urolithin production capacity from ellagic acid of two human intestinal Gordonibacter
species. Food Funct., 5, 1779-1784.
Selma, M.V., Espin, J.C. and Tomas-Barberan, F.A. (2009) Interaction between phenolics and gut
microbiota: role in human health. J. Agric. Food Chem., 57, 6485-6501.
Sieber, P., Gheyselinck, J., Gross-Hardt, R., Laux, T., Grossniklaus, U. and Schneitz, K. (2004) Pattern
formation during early ovule development in Arabidopsis thaliana. Dev. Biol., 273, 321-334.
Siegfried, K.R., Eshed, Y., Baum, S.F., Otsuga, D., Drews, G.N. and Bowman, J.L. (1999) Members of
the YABBY gene family specify abaxial cell fate in Arabidopsis. Development, 126, 4117-4128.
Singh, S.A. and Christendat, D. (2006) Structure of Arabidopsis dehydroquinate dehydratase-shikimate
dehydrogenase and implications for metabolic channeling in the shikimate pathway. Biochem., 45,
7787-7796.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Skinner, D.J., Hill, T.A. and Gasser, C.S. (2004) Regulation of ovule development. Plant Cell, 16 Suppl,
S32-S45.
Smaczniak, C., Immink, R.G., Angenent, G.C. and Kaufmann, K. (2012) Developmental and
evolutionary diversity of plant MADS-domain factors: insights from recent studies. Development, 139,
3081-3098.
Soltis D. E., Smith S. A., Cellinese N., et al. (2011) Angiosperm phylogeny: 17 genes, 640 taxa. Am. J.
Bot., 98, 704-730.
Sreekumar, S., Sithul, H., Muraleedharan, P., Azeez, J.M. and Sreeharshan, S. (2014) Pomegranate
fruit as a rich source of biologically active compounds. Biomed. Res. Int., 2014, 1-12.
Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S. and Morgenstern, B. (2006) AUGUSTUS: ab
initio prediction of alternative transcripts. Nucleic Acids Res., 34, W435-W439.
Suyama, M., Torrents, D. and Bork, P. (2006) PAL2NAL: robust conversion of protein sequence
alignments into the corresponding codon alignments. Nucleic Acids Res., 34, W609-W612.
Tarailo-Graovac, M. and Chen, N. (2009) Using RepeatMasker to identify repetitive elements in genomic
sequences. Curr. Protoc. Bioinformatics, 4.10, 1-14.
The Tomato Genome Consortium. (2012) The tomato genome sequence provides insights into fleshy fruit
evolution. Nature, 485, 635-641.
Thouet, J., Quinet, M., Lutts, S., Kinet, J.M. and Perilleux, C. (2012) Repression of floral meristem fate
is crucial in shaping tomato inflorescence. PLoS One, 7, e31096.
Toriba, T., Harada, K., Takamura, A., Nakamura, H., Ichikawa, H., Suzaki, T. and Hirano, H.Y.
(2007) Molecular characterization the YABBY gene family in Oryza sativa and expression analysis of
OsYABBY1. Mol. Genet. Genomics, 277, 457-468.
Trapnell, C., Pachter, L. and Salzberg, S.L. (2009) TopHat: discovering splice junctions with RNA-Seq.
Bioinformatics, 25, 1105-1111.
Tuskan, G.A., Difazio, S., Jansson, S., et al. (2006) The genome of black cottonwood, Populus
trichocarpa (Torr. & Gray). Science, 313, 1596-1604.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Tzulker, R., Glazer, I., Bar-Ilan, I., Holland, D., Aviram, M. and Amir, R. (2007) Antioxidant activity,
polyphenol content, and related compounds in different fruit juices and homogenates prepared from 29
different pomegranate accessions. J. Agric. Food Chem., 55, 9559-9570.
Ulrike A. Fischer, R.C., and Dietmar R. K. (2011) Identification and quantification of phenolic
compounds from pomegranate (Punica granatum L.) peel, mesocarp, aril and differently
produced juices by HPLC-DAD–ESI/MSn. Food Chem., 127, 807-821.
Usta, C., Ozdemir, S., Schiariti, M. and Puddu, P.E. (2013) The pharmacological use of ellagic acid-rich
pomegranate fruit. Int. J. Food Sci. Nutr., 64, 907-913.
Velasco, R., Zharkikh, A., Affourtit, J., et al. (2010) The genome of the domesticated apple (Malus x
domestica Borkh.). Nat. Genet., 42, 833-839.
Verde, I., Abbott, A.G., Scalabrin, S., et al. (2013) The high-quality draft genome of peach (Prunus
persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet.,
45, 487-494.
Villanueva, J.M., Broadhvest, J., Hauser, B.A., Meister, R.J., Schneitz, K. and Gasser, C.S. (1999)
INNER NO OUTER regulates abaxial- adaxial patterning in Arabidopsis ovules. Genes Dev., 13,
3160-3169.
Viuda-Martos, M. Femández-López, J. Pérez-Álvarez,J.A. (2010) Pomegranate and its many functional
components as related to human health: a review. Compr Rev. Food Sci. Food Saf., 9, 635-654.
Wikstrom, N., Savolainen, V. and Chase, M.W. (2001) Evolution of the angiosperms: calibrating the
family tree. Proc. Biol. Sci., 268, 2211-2220.
Wink, M. (2003) Evolution of secondary metabolites from an ecological and molecular phylogenetic
perspective. Phytochem., 64:3-19.
Winkel-Shirley, B. (2001) Flavonoid biosynthesis. a colorful model for genetics, biochemistry, cell
biology, and biotechnology. Plant Physiol., 126, 485-493.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
Xia, T., Li, N., Dumenil, J., Li, J., Kamenski, A., Bevan, M.W., Gao, F. and Li, Y. (2013) The ubiquitin
receptor DA1 interacts with the E3 ubiquitin ligase DA2 to regulate seed and organ size in Arabidopsis.
Plant Cell, 25, 3347-3359.
Xu, Z. and Wang, H. (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR
retrotransposons. Nucleic Acids Res., 35, W265-W268.
Yan, W., Chen, D. and Kaufmann, K. (2016) Molecular mechanisms of floral organ specification by
MADS domain proteins. Curr. Opin. Plant Biol., 29, 154-162.
Yang, Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol., 24, 1586-1591.
Zhao, X., Yuan, Z., Feng, L. and Fang, Y. (2015) Cloning and expression of anthocyanin biosynthetic
genes in red and white pomegranate. J. Plant Res., 128, 687-696.
FIGURE LEGENDS
Figure 1 Characterization of the pomegranate genome. The innermost circle represents
ideograms of the nine pomegranate pseudo-chromosomes and the synteny relationship of
gene blocks between pseudo-chromosomes. All density information from the inner to the
outer circle was counted in non-overlapping 200-kb windows. a, LINE repeat; b, Gypsy
repeat; c, DNA repeat; d, Copia repeat; e, unknown repeat; f, exon; g, GC content.
Figure 2 Phylogenetic tree and expansion and contraction of gene families. The phylogenetic
tree was constructed from a concatenated alignment of 509 families of single-copy genes
from 12 higher plant species. MRCA is the most recent common ancestor. Gene family
expansions are indicated in red, and gene family contractions are indicated in green; the
corresponding proportions among total changes are shown using the same colors in the pie
charts. Blue portions of the pie charts represent the conserved gene families. Inferred
divergence dates (in millions of years) are indicated in black at each node. Circles in gray
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
represent WGD events (γ and M). The numbers of gene families, the numbers of genes within
each gene family, and the numbers of families of unique genes by species are shown on the
right.
Figure 3 Collinearity patterns between grape, peach, pomegranate, and Eucalyptus. (a)
Typical macro-collinearity patterns between genomic regions from grape, peach,
pomegranate, and Eucalyptus. The macro-colinearity pattern shows that a typical ancestral
region in the grape genome can be traced to one region in peach and up to two regions each
in pomegranate and Eucalyptus, highlighting the inferred Myrtales WGD. Grey wedges in the
background indicate syntenic blocks spanning more than 30 genes between the genomes; one
of them is highlighted in red. (b) Micro-collinearity pattern between grape, peach,
pomegranate, and Eucalyptus. Rectangles represent predicted gene models with colors
showing relative orientations (blue: same strand, green: opposite strand). Matching gene pairs
are displayed as connected boxes. Gray wedges connect matching gene pairs; one of them is
highlighted in red.
Figure 4 Expression patterns of genes involved in seed coat development. (a) Candidate
genes potentially involved in lignin, cellulose, and hemicellulose synthesis; (b) Candidate
genes potentially involved in lignin, cellulose, and hemicellulose degradation. RNAs isolated
from inner and outer seed coats of ‘Dabenzi’ at 50, 95, and 140 DAP, inner seed coat of
‘Baiyushizi’ and ‘Tunisi’ at 50 DAP designated as DB-ISC50, DB-ISC95, DB-ISC140,
DB-OSC50, DB-OSC95, DB-OSC140, BY-ISC50, and T-ISC50, respectively, were
sequenced to survey expression of genes potentially involved in lignin, cellulose, and
hemicellulose synthesis and degradation in pomegranate. Enzyme numbers (EC) correspond
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
to the following enzymes: EC 4.3.1.24, phenylalanine ammonia-lyase; EC 1.14.13.11,
trans-cinnamate 4-monooxygenase; EC 6.2.1.12, 4-coumarate--CoA ligase; EC 1.2.1.44,
Cinnamoyl-CoA reductase; EC 1.1.1.195, Cinnamyl-alcohol dehydrogenase; EC 1.11.1.7,
Peroxidase; EC 1.14.13.36, p-coumarate 3-hydroxylase; EC 2.1.1.68, Caffeic acid
3-O-methyltransferase; EC 1.14.-.-, ferulate-5-hydroxylase; EC 2.4.1.12, Cellulose synthase
(CesA,UDP-forming); EC2.4.1.32, beta-mannan synthase; EC 2.4.2.24, 1,4-beta-D-xylan
synthase; EC 1.10.3.2, laccase; EC 2.4.1.43, polygalacturonate
4-alpha-galacturonosyltransferase; EC 2.4.2.39, xyloglucan 6-xylosyltransferase; EC 3.2.1.4,
endoglucanase; EC 3.2.1.24, alpha-mannosidase; EC 3.2.1.6, endo-1,3-beta-D-glucanase; EC
3.2.1.21, beta-glucosidase/beta-xylosidase; EC 3.2.1.55, alpha-N-arabinofuranosidase; EC
3.2.1.78, endo-1,4-beta-mannosidase; EC 3.2.1.114, alpha-mannosidase II; EC 3.2.1.113,
mannosidase 1A; EC 3.2.1.37, xylan 1,4-beta-xylosidase/beta-D-xylosidase; EC 3.2.1.8,
Endo-1,4-beta-xylanase; EC 3.2.1.177, alpha-xylosidase; EC 2.4.1.207, xyloglucan
endotransglucosylase. Detailed gene expression data are provided in Table S15. (c) Candidate
genes potentially involved in integument development. RNAs isolated from flower, inner and
outer seed coats at 50, 95, and 140 DAP of ‘Dabenzi’ designated as DB-F, DB-ISC50,
DB-ISC95, DB-ISC140, DB-OSC50, DB-OSC95, and DB-OSC140, respectively, were used
to survey the expression patterns of genes potentially involved in inner and outer integument
development. The expression of genes that could function to promote development is
indicated in red, and the expression of genes that could function to repress integument
development is indicated in blue. Detailed gene expression data for genes potentially
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
involved in integument development are provided in Table S16. Gene expression is
represented as Log2FPKM (fragments per kilobase of exon per million fragments mapped).
Figure 5 Phylogenetic tree of the YABBY gene family and analysis of selection evolution on
INO genes. (a) Phylogenetic tree of the YABBY gene family. The tree was constructed using
the maximum likelihood method and 1000 bootstrap iterations. Bootstrap values supported by
≥50% are shown. Red indicates gene family members from pomegranate cultivar ‘Dabenzi’;
blue indicates gene family members from A. thaliana. (b) Analysis of selection evolution on
INO genes. The site that was subject to selection is indicated in yellow at the 35th amino acid
residue. The sequencing depth for each amino acid is shown in bp above the corresponding
amino acid site. The green arrow is the initial position of the domain. The 16 species used for
this analysis include: Ath, Arabidopsis thaliana; Bna, Brassica napus; Bol, Brassica
oleracea; Bra, Brassica rapa; Car, Cicer arietinum; Cme, Cucumis melo; Egr, Eucalyptus
grandis; Fve, Fragaria vesca; Gma, Glycine max; Mdo, Malus domestica; Pgr, Punica
granatum ‘Dabenzi’; Pmu, Prunus mume; Rco, Ricinus communis; Sin, Sesamum indicum;
Vvi, Vitis vinifera; and Zju, Ziziphus jujuba.
Figure 6 Schematic of pomegranate anthocyanin metabolism and expression profiles of
candidate genes involved in anthocyanin metabolism. RNAs isolated from flowers and outer
seed coats at 50, 95, and 140 DAP of ‘Dabenzi’ indicated as DB-F, DB-OSC50, DB-OSC95,
and DB-OSC140, respectively, were used to survey the expression patterns of genes
including transcription factors that are likely involved in late steps of anthocyanin
metabolism. The numbers in boxes below the photographs of fruit denote total anthocyanin
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
content (Unit·g-1 FW) in flowers and outer seed coats. Detailed expression data for genes
related to anthocyanin metabolism are provided in Tables S17 and 18.
Figure 7 Schematic of proposed punicalagin metabolism and expression patterns of candidate
genes encoding each enzyme. Solid black arrows indicate reaction pathways according to
KEGG; dotted black arrows indicate reaction pathways presumed according to the structures
of compounds. Arrows in black indicate that the reaction occurs only in plant cells; arrows in
red indicate that the reaction occurs only in the mammalian intestine; arrows in green indicate
that the reaction occurs in either plants or intestine. Enzyme numbers in black indicate that
there are genes in pomegranate that encode that enzyme; enzyme numbers in grey indicate
that there is no gene encoding that enzyme in pomegranate. Enzyme numbers (EC)
correspond to the following enzymes: EC 1.1.1.25, shikimate dehydrogenase; EC 1.1.1.282,
quinate/shikimate dehydrogenase; EC 1.1.1.24, quinate dehydrogenase; EC 4.2.1.10,
3-dehydroquinate dehydratase; EC 4.2.1.118, dehydroshikimate dehydrogenase; EC 2.1.1.-,
syringate/3-O-methylgallate O-demethylase; EC ? , an enzyme that catalyzes formation of a
dimeric derivative from two gallic acid molecules or from gallic acid and ellagic acid; EC
3.1.1.1, carboxylesterase; EC 2.4.1.x, glycosyltransferase; EC 2.3.1.x, O-galloyltransferase;
EC 3.1.1.21, β-D-glucoside glucohydrolase; EC 3.1.1.20, tannin acetylhydrolase;
EC 4.1.1.59, gallate decarboxylase; and EC 1.17.98.x, dehydroxylase. Samples along the
horizontal axes of heatmaps from left to right are DB-R, DB-L, DB-F, DB-ISC50,
DB-ISC95, DB-ISC140, DB-OSC50, DB-OSC95, DB-OSC140, DB-P50, DB-P95, and
DB-P140, represent root, leaf, flower, and inner and outer seed coats and pericarp at 50, 95,
and 140 DAP of ‘Dabenzi’. Detailed expression data for genes related to punicalagin
metabolism are provided in Table S19.
Acc
epte
d A
rtic
le
Figure 1 Characterization of the pome
egranate genome.
Acc
epte
d A
rtic
le
Figure 2 Phylogenetic tree and expan
nsion and contraction of gene families.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
(a)
(b)
Figure 3 Collinearity patterns between grape, peach, pomegranate and eucalyptus.
Acc
epte
d A
rtic
le(a)
(b)
Acc
epte
d A
rtic
le
(c)
Figure 4 Expression patterns of genes
s involved in seed coat development.
Acc
epte
d A
rtic
le
This article is protected by copyright. All rights reserved.
(a)
(b)
Figure 5 Phylogenetic tree of the YABBY gene family and selection analysis of INO genes.
Acc
epte
d A
rtic
le
Figure 6 Schematic of pomegranate a
genes involved in anthocyanin metabo
anthocyanin metabolism and expression profiles of
olism.
f
Acc
epte
d A
rtic
le
Figure 7 Schematic of proposed punic
encoding each enzyme.
calagin metabolism and expression patterns of gen
nes