The high quality draft genome of peach (Prunus persica) identifies
unique patterns of genetic diversity, domestication and genome evolution
The International Peach Genome Initiative
Principal Investigators: Ignazio Verde1, Albert G Abbott2,3, Francesco Salamini4,5, Jeremy
Schmutz6,7, Bryon Sosinski8, Michele Morgante9,10, Daniel S Rokhsar6,11.
Sequencing and assembly: Jeremy Schmutz6,7 (Leader), Daniel S Rokhsar6,11, Michele Morgante9,10,
Susan Lucas6, Federica Cattonaro9, Jane Grimwood6,7, Alberto Policriti9,12, Tatyana Zhebentyayeva2,
Jerry Jenkins6,7.
Linkage mapping, sequence anchoring and map integration: Ignazio Verde1 (Leader), Jeremy
Schmutz6,7, Maria Teresa Dettori1, Pere Arús13, Stefano Tartarini14, Albert G Abbott2,3, Tatyana
Zhebentyayeva2, Elisa Vendramin1, Jerry Jenkins6,7, Valeria Aramini1, Luca Dondini14, Stephen
Ficklin15.
Repeats analyses: Simone Scalabrin9 (Leader), Andrea Zuccolo9,20, Michele Morgante9,10, Veronique
Decroocq3, Albert G Abbott2,3, Pengfei Xuan16, Cristian Del Fabbro9, Tatyana Zhebentyayeva2, Dario
Copetti9.
Gene annotation: Shengqiang Shu6 (Leader), Daniel S Rokhsar6,11, Dorrie Main15, David M
Goodstein6, Simon Prochnik6.
Manual annotation, gene family characterization and small RNA analysis: Albert G Abbott2,3
(Leader), Francesco Salamini4,5, Herman Silva17, Giannina Vizzotto10, Christina Wells18, Ariel
Orellana19, Pietro Tonutti20, David S Horner21, Dorrie Main15, Ignazio Verde1, Federica Cattonaro9,
Tatyana Zhebentyayeva2, Lee A Meisel22,23, Guido Cipriani1,10, Laura Rossini4,24, Abdelali Barakat2,
Elisa Vendramin1, Simone Scalabrin9, Alessandra Stella4,25, Cristian Del Fabbro9, Susana Gonzalez19,
Rachele Falchi10, Jonathan Maldonado17, Erica Mica20, Stephen Ficklin15, Barbara Lazzari4, Douglas
Bielenberg18, Raul Pirona4.
Resequencing and Prunus diversity analysis: Michele Morgante9,10 (Leader), Ignazio Verde1,
Francesco Salamini4,5, Raffaele Testolin9,10, Fabio Marroni9,10, Simone Scalabrin9, Federica
Cattonaro9, Maria Teresa Dettori1, Elisa Vendramin1, Laura Rossini4,23, Mara Miculan9.
Comparative analysis and peach genome evolution: Sook Jung14 (Leader), Dorrie Main15, Daniel S
Rokhsar6,11, Francesco Salamini4,5, Therese Mitros26.
1) Consiglio per la Ricerca e la Sperimentazione in Agricoltura (CRA) – Centro di Ricerca per
la Frutticoltura, Rome 00134, Italy.
2) Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina
29634, USA.
3) Institut National de la Recherche Agronomique (INRA), Université de Bordeaux, UMR
1332 BFP, BP81, Villenave d’Ornon cedex 33883, France.
4) Parco Tecnologico Padano (PTP), Lodi 26900, Italy.
5) Istituto Agrario San Michele all’Adige (IASMA), Research and Innovation Centre,
Fondazione Edmund Mach, S. Michele all’Adige, Trento 38010, Italy.
6) US Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA.
7) HudsonAlpha Institute of Biotechnology, Huntsville, Alabama 35806, USA.
Nature Genetics: doi:10.1038/ng.2586
8) Department of Horticultural Science, North Carolina State University, Raleigh, North
Carolina 27695, USA.
9) Istituto di Genomica Applicata (IGA), Udine 33100, Italy.
10) Dipartimento di Scienze Agrarie e Ambientali, University of Udine, Udine 33100, Italy.
11) Center for Integrative Genomics, University of California, Berkeley, California 94720, USA.
12) Dipartimento di Matematica e Informatica, University of Udine, Udine 33100, Italy.
13) Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Centre de Recerca en
Agrigenòmica CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra (Cerdanyola del Vallès),
Barcelona 08193, Spain.
14) Department of Fruit Tree and Woody Plant Sciences, University of Bologna, Bologna
40127, Italy.
15) Department of Horticulture & Landscape Architecture, Washington State University,
Pullman, Washington 99163, USA.
16) School of Computing, Clemson University, Clemson, South Carolina 29634, USA.
17) Laboratorio de Genómica Funcional & Bioinformática, Facultad de Ciencias Agronómicas,
Universidad de Chile, 8820808 La Pintana, Chile.
18) School of Agricultural, Forest and Environmental Sciences, Clemson University Clemson,
South Carolina 29634, USA.
19) FONDAP-Center for Genome Regulation; Centro de Biotecnología Vegetal, Facultad de
Ciencias Biológicas, Universidad Andres Bello, Santiago 8370146, Chile.
20) Scuola Superiore Sant' Anna (SSSA), Pisa 56127, Italy.
21) Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano,
Milano 20133, Italy.
22) Instituto de Nutrición y Tecnología de los Alimentos (INTA), Universidad de Chile,
Santiago 7830490, Chile.
23) Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andrés
Bello, Santiago 8370146, Chile.
24) Dipartimento di Scienze Agrarie e Ambientali - Produzione, Territorio, Agroenergia,
Università degli Studi di Milano, Milano 20133, Italy.
25) Istituto di Biologia e Biotecnologia Agraria, CNR, Lodi 26900, Italy.
26) Energy Biosciences Institute, University of California, Berkeley California 94720, USA.
Correspondence should be addressed to Ignazio Verde ([email protected]), Albert G.
Abbott ([email protected]), Michele Morgante ([email protected]), Daniel S.
Rokhsar ([email protected]).
SUPPLEMENTARY INFORMATION
I. SUPPLEMENTARY NOTE
S1. SIGNIFICANCE AND IMPACT OF PEACH AS KEY GENOME FOR
TREES
The Rosaceae comprises many of the most important fruiting and forestry species worldwide.
Species in this family are grown for the fruit (e.g. peaches, strawberries, raspberries), the lumber (black
cherry) and the ornamental value (roses). According to FAO statistics in 2010 (http://faostat.fao.org)
rosaceous crops constitute the most important fruit species, providing about 135 million tons of edible
fruits per year (about 20% of the world fruit production). Within the family, the species that produce
drupe fruit (peaches, apricots, almonds, plums, cherries) are significant agricultural crops in many local
economies worldwide, providing important vitamins, minerals, fiber and antioxidant compounds for
healthy diets. Among these species, peach [Prunus persica (L.) Batsch], originally domesticated (about
4,000-5,000 years ago) in China with subsequent dispersion in Europe from Persia in the final centuries
B.C. (ref. 1), is one of the most important species with a world production of 20 million tons/year (FAOSTAT
2010, http://faostat.fao.org). However, as is the case in most crop species, rosaceous breeding programs
are continually confronted with the need to find genetic solutions to ever changing problems posed by
disease and pests (e.g. viruses, fungi, insects) and the ever-changing environmental landscape (e.g.
drought, global warming, cold temperatures, marginal lands).
Until recently, the genetic understanding of characters governing fruit and forest tree growth,
production and sustainability lagged well behind that of the large commodity crops species (e.g.
maize). This was primarily due to lack of significant public investment and the refractory nature of the
plants for achieving high-resolution genetic manipulations (long juvenility periods, requirements for
large amounts of labor and space). However, with the increased worldwide awareness of the
importance of the improvement and sustainability of our tree resources, the importance of developing a
fundamental understanding of the biology and genetics of key tree species came to the forefront.
Among the trees as genetic/genomic resources, the fruit trees are unique in that their great age of
domestication and intensive breeding have captured the genetic components necessary for identifying
the genes that control basic tree growth, development and sustainability. Arguably, peach is the most
highly genetically characterized tree species currently serving as a reference genome important to both
fruit and forest tree research. In peach, genetic and genomic efforts have identified gene containing
intervals and candidate genes regulating a number of traits controlling important characters (for
comprehensive list see Supplementary Table 1).
Although the species of Rosaceae are typified by broad diversity in growth habit, fruit type,
reproductive traits and other important characters, we have insufficiently capitalized on this unique
resource for basic research. However, with the introduction and application of molecular genetic
technologies and the tools of genomics and transgenics, these species, that in the past were considered
unsuitable for basic research, are now in the spotlight because they present unique opportunities to
extend and enhance our understanding of plant genome evolution and of specialized gene systems. It is
in this regard that the Rosaceae offers one of the best systems for the comparative study of tree genome
evolution. This fact has become rapidly apparent, as genomic projects on Rosaceae species have
revealed that the diploid species representatives of this family have very small, compact genomes: 200
Mb strawberry (ref. 2), 290 Mb raspberry (ref. 3), 220-294 Mb rose (ref. 4) and 227 Mb peach, and at the
same time exhibit a huge diversity in growth form ranging from herbaceous to cane to bush to tree forms,
respectively.
Thus, of the plant families available for genomic scale comparative studies in respect to the adoption of
particular growth forms, the Rosaceae genome is unparalleled. However, to develop an understanding
of the genome scale evolution of specific character states, whole genome sequences of the key diploid
species genomes are critical.
Completion of the peach genome sequence provides a critical cornerstone underpinning our
understanding of the evolution of tree genomes. We can now explore on a genome wide scale, the
evolution of specific gene systems representing different classes of genes associated with specific
developmental stages (e.g. drupe fruit development and quality). The evolution of genes comprising
these systems is intimately interwoven with the consequences of exploitation of specific growth habits
and specialized tissues (e.g. the phenylpropanoid pathway, genes encoding the lignification enzymes or
the synthetic pathway leading to fruit color). Thus, current research is capitalizing on the utility of this
compact genome system for comparative genomics while exploiting gene system knowledge from
model systems such as Arabidopsis to significantly accelerate our understanding of the genes that
underlie important traits.
Since the first release of the peach whole genome sequence there has been a rapid acceleration in basic
fruit tree research. The peach sequence has been instrumental for the development of important genetic
tools. Illumina SNP arrays have been developed in peach (ref. 5) and in cherry (ref. 6). These are being utilized to
significantly advance marker map precision and perform Genome Wide Association Studies (GWAS)
allowing the fine resolution mapping of trait containing intervals.
Concomitant with higher resolution mapping of trait containing intervals, candidate gene discovery in
mapped intervals in Prunus is occurring at an unprecedented level. This is evident in the recent
discovery of candidate genes for important traits, ranging from disease resistance (PPV resistance;
ref. 7) to fruit-related traits such as flesh color (ref. 8), tree architecture (ref. 9) and phenological
traits such as chilling requirement (ref. 10). Interestingly, the peach genome is also being used to
advance the identification of potential candidate genes for traits in tree species outside the family
(e.g. chestnut; refs. 11, 12).
Mining the peach genome for transcribed sequences has enabled several transcriptome and small RNA
analyses to investigate important biological phenomena: endodormancy (ref. 13), disease resistance
(ref. 14), and fruit and organ development (ref. 15).
Finally, the peach genome provides a high quality assembled reference of high contiguity,
completeness and correctness that greatly facilitates the reconstruction of the genomes of many other
species through the use of next generation sequencing and reference-guided assembly methods both
within the family: apricot (Zhebentyayeva T., unpublished results), plum (Dardick C., personal
communication) and sweet cherry (Morgante M., unpublished results; Dhingra A. and Silva, H.,
unpublished results); as well as outside the family: pseudoassembly for the Cannabis genome
(http://genomevolution.org/wiki/index.php/Pseudoassembly), creosote (a desert bush) genome (ref. 16) and
chestnut genome (Staton M., personal communication).
S2. SEQUENCING AND ASSEMBLY
S2.1 Genome Sequencing
All sequencing reads were collected with standard Sanger sequencing protocols on ABI 3730XL
capillary sequencing machines at the Department of Energy Joint Genome Institute in Walnut Creek,
California and the Istituto di Genomica Applicata in Udine, Italy from a doubled haploid of the cv.
Lovell. The tree (PLov2-2N) is hosted at Clemson University (accession number DPRU0280). Five
different sized libraries were used as templates for the subclone sequencing process and both ends were
sequenced. 536,032 reads from the 2.8 kb sized library, 606,680 reads from the 4.4 kb sized library,
2,106,103 reads from the 7.8 kb sized library, 419,424 reads from the 35.3 kb fosmid library, and
61,440 reads from the 69.5 kb BAC library were sequenced (see Supplementary Table 2 for clone size
breakdowns).
S2.2 Construction of the 8.5-fold scaffold assembly
A total of 3,729,679 reads (8.47-fold final sequence coverage; see Supplementary Table 2 for clone
size breakdowns) were assembled using a modified version of Arachne v.20071016 (ref. 17). This
produced 454 scaffold sequences, with L50 of 9.0 Mb, 49 scaffolds larger than 100 kb, and total
scaffold size of 226.6 Mb (see Supplementary Table 3 for scaffold and contigs totals). Each scaffold
was aligned against bacterial proteins, organelle sequences and GenBank nr and removed if found to be
a contaminant. Additional scaffolds were removed if they consisted of greater than 95% 24mers that
occurred 4 other times in the scaffolds larger than 50 kb or if the scaffold contained only unanchored
RNA sequences.
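The 24-mer criterion above can be illustrated with a minimal sketch. It assumes that "occurred 4 other times" means at least four occurrences among the scaffolds larger than 50 kb, and it ignores the reverse strand; both choices, and all function names, are illustrative assumptions rather than the procedure actually run with Arachne.

```python
from collections import Counter


def kmers(seq, k=24):
    """Yield every k-mer of seq in upper case, skipping k-mers that contain N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


def repetitive_fraction(scaffold, large_counts, k=24, min_occurrences=4):
    """Fraction of the scaffold's k-mers seen >= min_occurrences times in large_counts."""
    total = 0
    repeated = 0
    for kmer in kmers(scaffold, k):
        total += 1
        # Counter returns 0 for k-mers that were never seen.
        if large_counts[kmer] >= min_occurrences:
            repeated += 1
    return repeated / total if total else 0.0


def is_repetitive(scaffold, large_scaffolds, k=24, min_occurrences=4, threshold=0.95):
    """True if more than `threshold` of the scaffold's k-mers are repeated (assumed rule)."""
    counts = Counter()
    for large in large_scaffolds:
        counts.update(kmers(large, k))
    return repetitive_fraction(scaffold, counts, k, min_occurrences) > threshold
```

In practice the k-mer table for a 227 Mb genome would be built once and reused for every small scaffold; the sketch rebuilds it per call only to stay short.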
S2.3 Screening and final assembly release
We classified the scaffolds in various bins depending on sequence content. We checked contamination
using megablast against GenBank NT and BLASTX against a set of known microbial proteins. No
scaffolds were identified as contaminants. Scaffolds classified as unanchored rDNA (95),
mitochondrion (4), chloroplast (2), and repetitive (62) were removed from the final release. We also
removed 62 scaffolds that were less than 1 kb in sequence length.
S2.4 Assessment of accuracy and completeness of the assembly
A set of 13 fosmids were sequenced in order to assess the accuracy of the assembly. In 11/13 clones,
the alignments were relatively contiguous and of high quality, with a representative example given in
Supplementary Fig. 1. The overall error in aligned, non-gap-adjacent regions of these clones is estimated to be
0.04% (Supplementary Table 4). The 17,695 bp difference in fosmid length and aligned bases is due to
an insertion in fosmid AC253548 that is not present in the genome assembly, along with several small
low quality, gap-adjacent regions. Two of the fosmid clones (AC253548 and AC253545) exhibited
noteworthy discrepancies. Clone AC253548 aligns to a region where there is a 4.2 kb gap in the
genome relative to the clone (Supplementary Fig. 2), while clone AC253545 (Supplementary Fig. 3) is
wholly repetitive and did not align to the genome across the repeats. Clone AC253539 is a noteworthy
alignment, since it is primarily composed of a degraded transposon (Supplementary Fig. 4). However,
the overall accuracy of the assembly is quite high. Completeness of the euchromatic portion of the
assembly was assessed by aligning 74,606 Prunus ESTs obtained from GenBank onto the genome, and
only 1,402 (1.94%) of the ESTs were not found. Further examination of the unaligned ESTs revealed
two main problems with the publicly available EST set. A subset of the unaligned ESTs were
composed of several short homopolymer regions that are clearly not coding for a gene as they do not
align to any other plant gene in GenBank. Two examples are given below:
>gi|115592176|gb|AM289480.1|AM289480 AM289480 Prunus persica fruit skin mature fruit Prunus
persica cDNA clone Skin92H11, mRNA sequence
CCCGGGGTCAAGACCCAGGGGCTCTTGGTTTGGGGGTCCCCCCGGGCCCTTTCCGTTGGGG
CCCCCCCCCCGGGGGAAATCCCTTTCCAGCCCCCTTTTGGGCTTTTTGGGGCTTTCCCCGG
GTCCGCTTTTGGGGAGGGGGGGGCTGCTTCCCAGGGTGTCCGGGCCCAGAGGGGAGGGGG
GGGACCCCAGGGGGCTGGGGGAGTTTTAGAGGGGGGTTTAAAATGTCCAGGGGCCCCCCA
GGGTTTTTTTTTAAAGGCTTTGTCTTTGTCGGTGGGGTCTTGCTTTTTGCCCTTTGGGTTGG
GTTTTGTTGCTTTCCCGCCCTTTTTCTGGGGCTCCTTTAAAAAGAAATAAAATTGGGGTTTT
TTTTTCCCCCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAA
>gi|89498091|gb|DY653885.1|DY653885 PU4_plate2_N02 PU4 Prunus persica cDNA similar to
hypothetical protein (Q95JC9) Basic proline-rich protein, mRNA sequence
CTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTATAAAAAAA
AAAAAAAAAAAAAAAGGGGAAAAAAGACCCCCCCCACACCCCCGGGGGGGGCCGGGGG
GGGGCCGGCCCCCGGGGGGGGGGGGGCGGGGGGGGCGGGGCCGGGGAGGGGCCCCGCC
CCGGGGGGCCCCCCCCCCCCCCCCGGGGGGGGGGGCCCCCCCCCCCCCCAGGGGGGCGCC
GCCCCCGGGGGGGGCCCCCCCCCGGGAGAACCCGGGGAACCCCACGCCGGGGGGGCCCC
GGGGGCGGCGCTCGGGGGGCCCAAGGGGGGGGGAAAACGCGGGGCCAAGCCGCCGCAA
CCCCAGCCCGGGGGAACCCCCCCCCGGGCCGGGAGGGCAAGGGCCGGGCCGGGGGGCAA
AAAACCCAGCCCGGCCGGCACCCACGGGGGGCGGGGG
The second problem we identified with the public dataset is ESTs that have high alignment identity but
low coverage on the ends of the EST. This indicates that it is likely that a portion of the individual EST
is of low quality. Unfortunately, public EST datasets do not generally come with quality scores or trim
coordinates, making it impossible to trim low quality regions. We estimated that nearly 45% of the
unaligned ESTs could be excluded due to the problems outlined above, leaving only 1.0% of ESTs
unaligned.
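As an illustration of the first problem, the sketch below measures how much of an EST lies in single-base runs, which is one simple way to flag reads like the two examples shown earlier. The run length of 5 and the 50% cutoff are hypothetical values chosen for the example; the source does not state how the unaligned ESTs were screened.

```python
import re


def homopolymer_fraction(seq, min_run=5):
    """Fraction of bases that lie in single-base runs of at least min_run bases."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # One base followed by at least (min_run - 1) copies of the same base.
    run = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    covered = sum(m.end() - m.start() for m in run.finditer(seq))
    return covered / len(seq)


def looks_low_complexity(seq, min_run=5, cutoff=0.5):
    """Flag an EST when homopolymer runs cover at least `cutoff` of it (hypothetical cutoff)."""
    return homopolymer_fraction(seq, min_run) >= cutoff
```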
S3. LINKAGE MAPPING, SEQUENCE ANCHORING AND MAP
INTEGRATION
S3.1 Chromosome-scale assembly and pseudomolecule construction
For the map integration and chromosome-scale pseudomolecule construction, 827 markers from an
updated version of the previously published Prunus reference map (refs. 18, 19) were used to anchor the WGS
scaffolds. Five breaks were made in chimeric scaffolds based on mapped genetic marker data and 234
main genome scaffolds (> 1 kb) were retained after breaking and filtering. Forty mapped scaffolds were
appended to the Prunus linkage groups and 32 joins were made to generate 8 pseudomolecules
(Supplementary Fig. 5) that were named according to the linkage group nomenclature adopted in
Prunus (ref. 20). Each map join is denoted with a run of 10,000 Ns. The chromosome-scale pseudomolecules
contain 215.9 Mb (96.1%) of 224.6 Mb of the assembled sequence. Twenty-eight mapped scaffolds
with 192.4 Mb (85.6%) of the assembled sequence have known orientation along the pseudomolecules
while the remaining 12 mapped scaffolds are assigned with a random orientation. One hundred ninety-
four scaffolds (totaling 8.7 Mb) are unmapped in the current release. The final chromosome-scale
assembly, named Peach v1.0, contains 202 scaffolds (8 pseudomolecules and 194 scaffolds) that cover
224.6 Mb of the genome with a contig L50 of 214.2 kb and a scaffold L50 of 26.8 Mb. The resulting
final statistics of Peach v1.0 are shown in Supplementary Table 5.
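The construction described in this section can be sketched in a few lines: ordered, oriented scaffolds are concatenated with a 10,000-N spacer at each join, and L50 is taken, as in this document, to be the length of the sequence at which half of the total bases is reached. The function names and the toy input are assumptions made for illustration, not the pipeline used for Peach v1.0.

```python
GAP = "N" * 10_000  # each map join is denoted with 10,000 Ns

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(scaffolds, order, gap=GAP):
    """Join scaffolds in map order.

    scaffolds: dict mapping scaffold name to sequence.
    order: list of (name, strand) pairs, strand being '+' or '-'.
    """
    parts = []
    for name, strand in order:
        seq = scaffolds[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
    return gap.join(parts)


def l50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

With 40 mapped scaffolds placed on 8 pseudomolecules, this scheme needs 40 - 8 = 32 joins, which matches the number reported above.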
S3.2 Assembly refinements
Marker coverage was uneven in the Peach v1.0 genomic sequence. Ten scaffolds larger than 300 kb
had no markers at all with the largest marker-deficient scaffold being 2.1 Mb in size. To provide
additional markers, SSRs and SNPs were retrieved and primers were designed to univocally target
genomic regions chosen to address the following issues: a) scaffolds > 300 kb without markers; b)
scaffold regions without markers within 500 kb of a scaffold end; c) unoriented scaffolds with only one
marker or with co-segregating markers. SSRs were identified in the assembly using Sputnik software
(see Repeats analysis section). SNP markers were identified from the Illumina sequence reads of the F1
parent of ‘Contender’ x ‘Ambra’ aligned against the peach reference genome (see Resequencing and
Prunus diversity section). Three peach maps, Peach x P. ferganensis BC1 (P x F; ref. 21), ‘Contender’ x
‘Ambra’ F2 (C x A; ref. 22) and ‘Contender’ x Fla.92-2C (C x Fla; ref. 23), were also used to validate
marker-less scaffold ends and to test orientation and ordering of WGS scaffolds in critical compressed
regions. These maps have a higher power of resolution than the ‘Texas’ x ‘Earligold’ (T x E) map but a
lower marker density due to the low level of polymorphisms observed in peach intraspecific progenies.
Moreover, additional markers localized within the uncovered scaffold ends were also retrieved from
recent literature (refs. 24–26).
Ten unmapped scaffolds larger than 300 kb, totaling 7.2 Mb of the assembled sequence, were assigned
to a pseudomolecule location: 5 of them, 4.8 Mb in size, were also oriented (Supplementary Table 6).
Nine randomly oriented mapped scaffolds within the Peach v1.0 pseudomolecules, containing 20.0 Mb
of the assembled sequences, were oriented as well (Supplementary Table 7). The position of one
mapped scaffold (Sc23), wrongly ordered along the pseudomolecule 2, was also refined
(Supplementary Table 7). Finally, twenty-three scaffold ends were checked and six break points were
identified. The mapping of new SNP and SSR markers helped to relocate six broken-scaffolds (totaling
5.8 Mb of assembled sequence) and to orient five along the Peach v1.0 pseudomolecules
(Supplementary Table 8). A high density PxF linkage map (not shown) recently obtained using 1,259
SNPs from the IPSC peach 9K SNP array v1 (ref. 5) confirmed the structure of the refined
pseudomolecules and allowed the orienting of three small scaffolds (Sc450, Sc33 and Sc7_1, totaling
3.2 Mb; Supplementary Tables 6-8). As a result of all the refinements, 56 scaffolds, totaling 223.1 Mb
(99.3%) of 224.6 Mb of the assembled sequence, are now anchored to linkage groups, while 50 of
them, totaling 219.9 Mb (97.9%), are oriented. All these refinements will be included in a future
release of the peach genome and are summarized in Supplementary Fig. 5 and Supplementary Tables 6-
8. Supplementary Table 9 shows the statistics of the linkage maps used to anchor, order and orient the
WGS scaffolds. The refinements can be visualized in a dedicated track available on the GDR at the
following URL: http://www.rosaceae.org/species/prunus_persica/genome_v1.0_refinements. The
unmapped sequences (184 scaffolds, 1.5 Mb) contain fewer than 100 predicted protein-coding genes. The
resulting statistics of the upcoming refined assembly are shown in Supplementary Table 10.
S3.3 Comparison of genetic and genomic distances and position of centromeres and
telomeres
Genetic (cM) and physical distances were compared by plotting T x E, P x F and C x A maps
(Supplementary Fig. 6). The plots had a similar trend in all maps with a slight overall suppression of
recombination in the interspecific one (439.8 kb/cM in T x E vs. 383.4 and 330.9 kb/cM in C x A and P
x F, respectively). Differences were observed both among and within the individual linkage groups.
Clear signatures of putative centromeric regions can be seen in all chromosomes based on gene-poor
highly repeated regions and suppression of recombination observed in the three different linkage maps
(Fig. 1 and Supplementary Fig. 6). We have compared the predicted centromere positions of the
putative peach chromosomes, with the sequence methylation density plots for the eight
pseudomolecules in peach (Zongrang Liu, personal communication) and in each case, a peak of DNA
methylation corresponds to the centromere positions previously predicted. However, peach did not
show a broad suppression of recombination in the putative pericentromeric regions as observed in the
soybean genome (ref. 27). At the end of G4, a strong suppression of recombination was observed in all three
linkage maps, with values up to 9.2 Mb/cM in the C x A intraspecific cross. At the top of G2, broad
suppression was observed only in the interspecific cross (T x E), with values up to 6.1 Mb/cM; in
P x F, which has a higher number of markers in this region, no suppression was observed, whereas C x A,
which has a low number of markers in this region, displayed a moderate level of suppression of
recombination, with values up to 1.7 Mb/cM. These two regions also showed an increased level of sequence diversity
among peach accessions with an SNP density up to 5 fold higher than the rest of the genome (see Fig. 2
and Prunus diversity section) and the region on pseudomolecule 4 displayed also markedly higher
linkage disequilibrium (LD) levels than the genome average, which is consistent with a suppression of
recombination observed in intraspecific crosses. Telomeric signatures were captured in the Peach v1.0
assembly on three pseudomolecule ends (Pp 4, Pp 5 and Pp 7). Two small unmapped scaffolds
(Scaffold_140 and Scaffold_322) have telomeric signatures as well.
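The kb/cM and Mb/cM figures in this section are ratios of physical to genetic distance. A minimal sketch of that calculation follows, assuming each marker is given as a (physical position in bp, genetic position in cM) pair sorted along the pseudomolecule; the function names and any marker positions passed to them are hypothetical and are not taken from the T x E, P x F or C x A maps.

```python
def kb_per_cm_by_interval(markers):
    """Return (start_bp, end_bp, kb_per_cM) for each pair of adjacent markers.

    Co-segregating markers (0 cM apart) give an infinite ratio.
    """
    intervals = []
    for (bp1, cm1), (bp2, cm2) in zip(markers, markers[1:]):
        d_kb = (bp2 - bp1) / 1000.0
        d_cm = cm2 - cm1
        ratio = d_kb / d_cm if d_cm > 0 else float("inf")
        intervals.append((bp1, bp2, ratio))
    return intervals


def average_kb_per_cm(markers):
    """Physical span divided by genetic span over the whole marker series."""
    span_kb = (markers[-1][0] - markers[0][0]) / 1000.0
    span_cm = markers[-1][1] - markers[0][1]
    return span_kb / span_cm
```

An interval whose ratio is far above the map-wide average, for example several Mb/cM against a background of a few hundred kb/cM, is the kind of signal interpreted above as suppressed recombination.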
S3.4 Comparison to other plant genome sequences
In Supplementary Table 11 the peach genome assembly is compared to those of other plant genomes.
According to the standards reported by Chain et al. (ref. 28), the sequence described in this work can be
defined as “Improved high-quality draft genome sequences” with some steps toward “Annotation-directed
improvements”. Based upon the reported statistics (contiguity, % of mapped sequences), if we
exclude the Arabidopsis (ref. 29) and rice (ref. 30) genomes, which represent to date the only plants with finished
genomes, the peach genome can be considered one of the best plant genome assemblies, comparable to that of
Brachypodium (ref. 31), and certainly the best tree genome assembly. This goal was reached thanks to the
use of a completely homozygous genotype and of high quality Sanger reads.
S4. REPEATS ANALYSES
The analysis of repeats was done using a combination of de novo identification tools (REPET:
http://urgi.versailles.inra.fr/index.php/urgi/Tools/REPET; RepeatScout, ref. 32; ReAS, ref. 33),
structural tools (LTR_finder, ref. 34; Dotter, ref. 35) and similarity searches (RepeatMasker, ref. 36;
BLAST). All the sequences
identified as repetitive mask 37.14% of the genome (Supplementary Table 12). All the major orders of
TEs are represented. The most abundant one is that of Long Terminal Repeat retroelements (LTR-RTs)
representing 19.56% of the genome. The abundance of the two major LTR-RT superfamilies seems to
be similar: 9.97% for the Ty3-gypsy and 8.60% for the Ty1-copia elements. A small amount (1.00%) of
the LTR-RT related sequences could not be clearly assigned to any of the two superfamilies. As
commonly seen in plants, LINEs are quite underrepresented totaling just 0.63% of the genome. DNA
transposable elements constitute 9.05% of the peach genome. In addition, 7.54% of the genome has
been classified as repetitive but could not be further characterized.
S4.1 Class I elements identification
Both RepeatScout (ref. 32) and ReAS (ref. 33) were used to perform de novo identification of repeats. The
fourth step of the RepeatScout pipeline was run using a threshold of 20 copies; ReAS was run using default
parameters on a set of 100,000 sequences declared repetitive by the assembler. All the repeat
candidates were clustered using cd-hit-est (ref. 37) with a similarity threshold of 80% to remove
redundancies. The repeat sequence candidates were then characterized using similarity searches against
RepBase (ref. 38) and the nr division of GenBank (ref. 39). In the first case, we ran TBLASTX, and in the
second, we used BLASTX (ref. 40); in both searches we set an e-value threshold of 1 x 10^-5. The results
were parsed to find
evidence of significant similarity to known class I elements and to remove the candidates similar to
gene families or other duplicated coding regions.
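The parsing step can be sketched as follows, under the assumption that the search results are available in the standard 12-column tabular BLAST format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The source does not say which output format was parsed, and treating the threshold as inclusive is also an assumption.

```python
def significant_hits(lines, max_evalue=1e-5):
    """Yield (query, subject, evalue) for tabular BLAST rows with evalue <= max_evalue."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 of the tabular format
        if evalue <= max_evalue:
            yield fields[0], fields[1], evalue


def best_hit_per_query(lines, max_evalue=1e-5):
    """Keep, for each repeat candidate, the significant hit with the lowest e-value."""
    best = {}
    for query, subject, evalue in significant_hits(lines, max_evalue):
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

Candidates whose best hit is a known class I element would be kept, and those whose best hit is a gene family member would be discarded; that classification depends on the database annotations and is not shown here.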
Structural identification of complete LTR-RTs was done running LTR_finder (ref. 34) on the peach assembly
with the following structural constraints: LTRs having both TG and CA dinucleotides at their ends and
at least two out of the Target Site Replication (TSR), Primer Binding Site (PBS) and Polypurine Tract
(PPT) motifs.
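Read literally, these constraints amount to a simple predicate on each candidate element. The sketch below assumes that "TG and CA dinucleotides at their ends" means each LTR starts with TG and ends with CA, and that the presence of the TSR, PBS and PPT motifs has already been determined upstream; it is an illustration of the filter, not LTR_finder's own code.

```python
def passes_ltr_constraints(ltr5, ltr3, has_tsr, has_pbs, has_ppt):
    """Apply the structural filter: TG...CA on both LTRs and at least two of three motifs.

    ltr5, ltr3: sequences of the 5' and 3' LTRs.
    has_tsr, has_pbs, has_ppt: booleans reporting whether each motif was found.
    """
    def tg_ca(ltr):
        ltr = ltr.upper()
        return ltr.startswith("TG") and ltr.endswith("CA")

    motif_count = sum([has_tsr, has_pbs, has_ppt])
    return tg_ca(ltr5) and tg_ca(ltr3) and motif_count >= 2
```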
A careful manual refinement of the structural search for LTR-RTs enabled the identification of 2,015
complete elements. We also estimated the insertion time of these elements using the molecular
paleontology approach proposed by SanMiguel et al. (ref. 41). Given the lack of a substitution rate estimate
for peach, we limited our analysis to the estimate of the nucleotide divergence separating each LTR
pair. LTRs were aligned using the program Stretcher and the nucleotide distances for each alignment
were calculated using the program DistMat applying the Kimura 2-parameter model correction. Both
are part of the EMBOSS software package (ref. 42). The overall majority of insertions seems to have taken
place during a relatively recent evolutionary time since the nucleotide divergence between LTRs is
extremely low (Supplementary Fig. 7). Notably, 253 LTR-RTs (12.56% of the total) have identical
LTRs and could correspond to very recent insertions.
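For reference, the Kimura 2-parameter distance between two aligned LTRs is K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among the compared sites. The sketch below is a plain re-implementation of that formula; skipping gapped and ambiguous columns is an assumption and may differ from how DistMat treats them.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumed handling)
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the pair is too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

An element with identical LTRs gives K = 0, which is the case reported above for 253 of the 2,015 complete elements. Converting K into an insertion time would require a substitution rate, which the text notes is not available for peach.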
Two regions of the reverse transcriptase domain for Ty1-copia and Ty3-gypsy elements were searched
in the peach and woodland strawberry (ref. 2) genomes. Any hit having an e-value lower than 1 x 10^-5 and
a length greater than 60 amino acid residues was extracted. A random subset of 100 positive hits for each
species was collected and aligned using Muscle (ref. 43). The alignments were used to construct
phylogenetic trees using the Mega 5 software (ref. 44) and applying the Neighbor-Joining method. Bootstrap
values were
calculated on 1,000 replicates. For both Ty1-copia and Ty3-gypsy elements, we observed clades in
which elements from the two species are mixed (Supplementary Figs. 8 and 9) indicating that the
diversification into families largely predates the divergence between the two genera.
S4.2 Class II elements identification
As a first approach, we submitted the peach assembled scaffolds to the REPET pipeline to generate a
primary file of consensus, classified TE (TEdenovo file). Complete or incomplete transposable
elements belonging to DNA transposon families were identified. Sequences were further manually
curated using both BLASTX (http://www.ncbi.nlm.nih.gov/blast) and Censor from the RepBase
database (http://www.girinst.org/censor/index.php). The number of representatives of the MITE
subgroup was increased by submitting the eight major peach scaffolds to the MUST-MITE pipeline
(http://csbl1.bmb.uga.edu/ffzhou/MUST/). CACTA-like peach transposons were obtained using a
homology-based strategy. CACTA full-length, consensus sequences were retrieved from the TREP
database (http://wheat.pw.usda.gov/ITMI/Repeats/total_TREP_list.html) and aligned. We then
performed a BLASTX search within the peach genome and manually curated the data for full-length
peach CACTA sequences. Other DNA transposons were confirmed by subjecting the TEdenovo file to
Helitron Finder (ref. 45).
S4.3 Simple Sequence Repeats
Microsatellites were identified with Sputnik (http://espressosoftware.com/sputnik/) using default
parameters. A total of 32,091 type I SSRs (>19 bp) were detected and are displayed in a dedicated track
on the Gbrowse (http://services.appliedgenomics.org/fgb2/iga/prunus_public/gbrowse/prunus_public/).
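To make the ">19 bp" criterion concrete, the sketch below finds perfect tandem repeats of at least 20 bp with a regular expression. This is a stand-in technique, not Sputnik's algorithm, which scores repeats and tolerates some mismatches; the 2 to 5 bp range of repeat units is likewise an assumption made for the example.

```python
import re


def is_primitive(unit):
    """True if unit is not itself a repetition of a shorter motif (e.g. 'ACAC' is not)."""
    n = len(unit)
    return not any(n % d == 0 and unit[:d] * (n // d) == unit for d in range(1, n))


def find_ssrs(seq, min_unit=2, max_unit=5, min_length=20):
    """Return sorted (start, end, unit) tuples for perfect SSRs of at least min_length bp."""
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        min_repeats = -(-min_length // unit_len)  # ceiling division
        # One unit followed by at least (min_repeats - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # avoid reporting (AC)n again as (ACAC)n
                hits.append((m.start(), m.end(), unit))
    return sorted(hits)
```

Coordinates are 0-based and end-exclusive, so `end - start` is the SSR length in bp.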
S5. GENE ANNOTATION
Transcript assemblies were constructed using PASA (ref. 46) from P. persica ESTs (~88K) and ESTs of
related species (~424K, downloaded from NCBI, from the genera Malus, Rosa, Fragaria and
Prunus). Loci were determined by BLAST alignments of the above transcript assemblies and/or BLASTX
alignments of peptides from Arabidopsis thaliana, rice, soybean, grape and poplar to the P. persica
genome, repeat-soft-masked using RepeatMasker (ref. 36), with up to 2 kb of extension on both ends unless the
locus extended into another locus on the same strand. Gene models were predicted by homology-based
predictors, mainly by FGENESH+ (ref. 47) with additional use of GenomeScan (ref. 48) if FGENESH+ failed
to produce a model at the locus. Predicted gene models were improved by PASA. Improvement
included adding UTRs, splice site correction, and adding alternative transcripts. PASA-improved gene
model peptides were subject to peptide homology analyses to the above-mentioned proteomes to obtain
a Cscore and peptide coverage. Cscore is a ratio of a peptide BLAST score to a mutual best-hit (MBH)
BLAST score. Peptide coverage is defined by the highest percentage of peptide sequence that aligns to
its homologs. PASA-improved transcripts were selected based on Cscore, peptide coverage, EST
coverage, and the degree of overlap of the coding sequence (CDS) with repeat sequences. The
transcripts were selected if their Cscore was larger than or equal to 0.3 and their peptide coverage
larger than or equal to 0.3, or they had EST coverage, but their CDS overlapping with repeats was less
than 20%. The final selected set has 27,852 protein coding genes and 28,689 protein coding transcripts.
Out of 27,852 genes, 27,017 (97%) have either 80% of the CDS covered by ESTs or a 0.5 Cscore.
Using the BLASTX algorithm (Expect value < 1e-6), a parallel analysis revealed that 24,423 annotated
transcripts have Arabidopsis homologs, 18,822 have Swiss-Prot homologs and 26,731 have TrEMBL
homologs.
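The transcript selection rule described above can be written as a small predicate. This is a sketch under one reading of the text, namely that a transcript needs either homology support (Cscore and peptide coverage both at least 0.3) or EST support, and in both cases less than 20% of its CDS overlapping repeats. That grouping of the conditions is an assumption, as are the `Transcript` record, its field names and the example values.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record holding the quantities named in the text."""
    name: str
    blast_score: float       # BLAST score of the peptide against a homolog
    mbh_score: float         # BLAST score of the mutual best hit (MBH)
    peptide_coverage: float  # highest fraction of the peptide aligned to a homolog
    has_est_support: bool
    repeat_overlap: float    # fraction of the CDS overlapping repeats


def cscore(t):
    """Cscore: ratio of the peptide BLAST score to the MBH BLAST score."""
    return t.blast_score / t.mbh_score if t.mbh_score > 0 else 0.0


def is_selected(t, min_cscore=0.3, min_coverage=0.3, max_repeat=0.2):
    """Assumed reading: (homology support OR EST support) AND low repeat overlap."""
    homology_ok = cscore(t) >= min_cscore and t.peptide_coverage >= min_coverage
    return (homology_ok or t.has_est_support) and t.repeat_overlap < max_repeat


# Illustrative values only, not taken from the annotation.
examples = [
    Transcript("t1", 150.0, 300.0, 0.80, False, 0.05),  # homology support
    Transcript("t2", 30.0, 300.0, 0.10, True, 0.00),    # EST support only
    Transcript("t3", 280.0, 300.0, 0.95, True, 0.45),   # mostly repeat
]
for t in examples:
    print(t.name, round(cscore(t), 2), is_selected(t))
```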
S6. MANUAL ANNOTATION, GENE FAMILY CHARACTERIZATION AND
SMALL RNA ANALYSIS
S6.1 Transcriptome Analysis
Reads from different tissues (fruit at different ripening stages, roots, fully expanded leaves, embryos
and cotyledons; Supplementary Table 15) were obtained by an RNA-Seq approach and ranged from 24
to 43 million per tissue. The tracks obtained with these reads are hosted and displayed in the Peach
Genome Browser at the Istituto di Genomica Applicata
(http://services.appliedgenomics.org/projects/drupomics/gbrowse/).
To validate and improve the Peach v1.0 gene models, we manually annotated 672 gene models from
141 diverse gene families (Supplementary Table 16) relevant to fruit quality related traits. Gene models
were selected for manual annotation based on BLAST scores in various databases (e.g. NCBI,
Phytozome) with a sequence similarity cutoff of 70% or higher and a protein identity cutoff of
60% or higher, or a BLASTN e-value < 1e-20. We emphasized gene families relevant to fruit quality related
traits including genes involved in the pathways for cell wall metabolism, sugar metabolism and
transport, abscisic acid and carotenoid synthesis, volatile compound metabolism, anthocyanin
biosynthesis and specific global plant regulatory genes. Selected genes were examined and edited using
minor variations of essentially the following strategy: putative orthologous genes were examined
manually, and edited by means of existing EST information (relevant EST databases, e.g. GDR),
Illumina RNA-Seq data and comparison with previously described genes from other organisms. Two
hundred and twenty three genes (Supplementary Table 16) were corrected using sequences present in
other databases (GDR, Phytozome) and RNA-Seq data, thus confirming the quality of the automated
prediction of Peach v1.0 sequence release. A track displaying the edited gene models can be seen at
this URL: http://www.rosaceae.org/gb/gbrowse/prunus_persica/.
Gene family comparative analyses were performed with all gene families from other sequenced
genomes. Comparison was carried out between peach and other species based on the protein prediction
database available on Phytozome (http://www.phytozome.net/) (Arabidopsis, grape, papaya, poplar and
peach), and on the GDR apple and Fragaria Genome Browsers (Supplementary Table 17).
S6.2 Gene family analysis (See Supplementary Tables 16 and 17 for details)
S6.2.1 Cell wall biosynthesis
The peach genome offers a great opportunity to search for cell wall genes that are important in woody
plants. In addition, it provides a unique opportunity to identify genes that are involved in the formation
of polysaccharides that contribute to the formation of fiber and the composition of the cell wall and
middle lamella of the fruit.
The P. persica genome v1.0 was searched for genes encoding enzymes involved in the biosynthesis of
principal polysaccharides that compose the cell wall (cellulose, hemicelluloses and pectins). From this
analysis 42 genes had confirmed gene structure from expression data (ESTs plus RNA-Seq). Of these
confirmed genes, 40 are expressed in the fruit. They encode enzymes involved in the biosynthesis of
cellulose, the hemicelluloses (xyloglucan, glucuronoxylan and mannan/galactomannan) and the pectins
(homogalacturonan, rhamnogalacturonan I and rhamnogalacturonan II). Manual annotation confirmed
most structures as predicted by the UTRs and exon/intron boundaries, with the exception of two genes
likely involved in homogalacturonan biosynthesis and one gene likely involved in cellulose
biosynthesis that showed modifications.
The search for putative orthologues of the 42 cell wall genes in the other plant genomes showed that all
of them are present in most species analyzed. Some genes involved in cellulose and glucuronoxylan biosynthesis
show an expanded number of gene copies in poplar in comparison to the other species. Additionally,
apple shows the largest number of cell wall gene orthologues. Both these species have undergone more
recent polyploidization events than the other compared species and this would contribute to the
increase in copy number within the gene families.
S6.2.2 Cell wall degradation
Peach fruit softening during ripening involves a coordinated series of modifications to the
polysaccharide components of the primary cell wall and middle lamella, resulting in a weakening of the
structure and contributing to the loss of flesh firmness. As pectins and pectolytic enzymes such as
endo- and exo-polygalacturonases (PG, endo- EC 3.2.1.15; exo- EC 3.2.1.67), pectate lyases (PL; EC
4.2.2.2), and pectin methylesterases (PME; EC 3.1.1.11) are central to this process we focused our
annotations on these pectolytic genes. Depolymerization processes are carried out by PG and PL, while
de-esterification is the result of the activation of the PME. Other genes involved in pectin degradation
also include pectinacetylesterases (PAE) and polygalacturonase inhibitor (PGIP). On the whole, we
identified 68 genes belonging to these families, of which 14 have been modified in their structure, but
mainly in the UTR sequence. Twenty-three PG/PG-like were identified in the peach Genome v1.0 of
which only four have been modified in their structure. For PL, a total of
15 loci have been identified, along with a total of 16 putative PME genes.
Peach amino acid sequences of PG/PG-like, PME and PL genes were compared to other species, in
order to find putative orthologues in other sequenced genomes. The search for putative orthologues of
the 3 families involved in pectin metabolism in other plant genomes (strawberry, apple, papaya, Vitis,
poplar and Arabidopsis) showed that all of them are present in the analyzed species.
S6.2.3 Sugar metabolism and transport
In fruit crops, sugars represent a noteworthy determinant of fruit quality traits, directly affecting the
economic performance of the species. As a rule, sucrose accounts for the majority of translocated
carbon in plants, but in the Rosaceae, sorbitol is also synthesized and transported. The type and level of
sugar accumulated in the fruit is unique for each species, and the peach drupe is characterized by a high
concentration of sucrose during the final steps of development.
Manual annotation of genes involved in sugar transport/metabolism was based on the degree of
similarity to known genes. In total, 77 gene models were examined and annotated: 7 gene families
relevant to sugar metabolism and 3 to sugar transport. A comparison of these families in peach and six
other sequenced species (Arabidopsis, grape, poplar, papaya, apple and strawberry) was carried out.
The analysis identified a similar number of genes that are shared between peach and Arabidopsis for
sucrose synthase (SuSy), neutral invertase (NI), sucrose phosphate synthase (SPS), and aldose 6-
phosphate reductase (A6PR) families. As expected, peach and apple showed the highest number of
sorbitol dehydrogenase (SDH) gene copies (7 and 6, respectively), in accordance with the specific role
of sorbitol in the Rosaceae family. Notably, strawberry exhibited only two gene models encoding SDH,
and 3 sorbitol transporters (SUT), while apple displayed the highest number of copies for the gene
families SuSy, NI, SPS, and SDH. Forty-two amino acid sequences, classified as hexose (29), sorbitol
(10) and sucrose (3) transporters were used to characterize phylogenetic relationships. Annotated
sequences were compared to the 39 of Arabidopsis and the 35 of Vitis by the construction of a distance
tree using UPGMA algorithms of Mega 5 (ref. 44; Supplementary Fig. 10). The analysis of protein
sequences revealed that sucrose and monosaccharide transporters form two separate clades, validating
the classification procedure carried out in the annotation step. As expected, transcripts annotated as P.
persica sorbitol transporters cluster in the tree together with polyol transporters, and they are more
numerous than their orthologues, consistent with the importance of this carbohydrate in peach
metabolism. The mapping of SOT gene clusters in peach and apple supports the hypothesis of gene
expansion in this family predating the divergence of the two genera. SOT genes in P. persica are
located on scaffold 8 (9 genes) in position from bp 12,442,074 to 12,957,297. This region is
orthologous to LG12 of Malus x domestica which has 15 SOT genes (14 in tandem between bp
8,813,736 and 9,170,299); the same LG12 region contains a duplicated fragment present four times in
the genome. One is on LG3 and has three SOT genes (two in a tight cluster); the other is on scaffold 7
(one SOT gene and one SOT pseudogene from bp 20,683,386 to 20,687,445). This region is
orthologous to fragments of LG15 and LG2 of Malus x domestica. LG15 contains two SOT genes in a
large fragment of around 8 million bp; LG2 has two SOT genes tightly coupled in about 2,000 bp and a
third one 0.9 million bp apart. In the genome of Malus x domestica, in addition to the gene clusters
mentioned above, two syntenic regions on LG17 and LG9 also have SOT genes (six and two, respectively;
http://www.rosaceae.org/gb/gbrowse_syn/peach_apple_strawberry).
S6.2.4 ABA and carotenoid metabolism
Another attribute of fruit quality that influences the commodity preference/acceptance is color.
Carotenoids are a large family of pigments accumulated in leaves, flowers and in a wide variety of
yellow-orange colored fruits. In addition, these substances fulfill numerous physiological functions in
plants such as protecting the photosynthetic apparatus from photo-oxidation. Specific molecules (α-, β-,
γ-carotene) are also recognized as nutritional components because they are precursors of vitamin A.
Moreover, the phytohormone abscisic acid (ABA), an important signaling molecule, is synthesized via
an oxidative cleavage of carotenoids.
Manual annotation of genes involved in ABA/carotenoid biosynthetic pathway identified 19 gene
models belonging to 9 different families.
The number of members in the families was comparable in peach and other genomes, and apparently
was not related to fruit color, e.g. poplar and apple showed more gene copies of phytoene synthase
(PSY). On the other hand, peach, poplar and apple had the highest number of lycopene β-cyclase
(LYBC), and zeaxanthin epoxidase (ZEP) genes. The 9-cis-epoxycarotenoid dioxygenase (NCED)
family seemed crucial for ABA biosynthesis, being the most variable among the analyzed species.
S6.2.5 Volatile compound metabolism
Volatile compounds involved in peach fruit aroma have been intensively studied and more than 100
molecules have been identified (refs. 49–53). Among them, γ- and δ-decalactones, C6 aldehydes and alcohols
(e.g. (E)-2-hexenal and (E)-2-hexenol), esters (e.g. (Z)-3-hexenyl acetate), and terpenoids are
considered as compounds that significantly impact aroma in peach fruit (refs. 49–53). Using the whole genome
sequence assembly we: 1) identified and manually annotated genes potentially involved in the
metabolism of aroma volatile compounds; 2) characterized the epoxide hydrolase (EH) gene family in
peach and other fruit species with fully sequenced genomes using a comparative genomics approach.
In total, 95 gene models belonging to 12 gene families known to be involved in the biosynthesis of
aroma compounds such as esters, aldehydes, alcohols, lactones, ionones, monoterpenes, and aromatic
amino acid-derived volatiles were refined manually. As many volatiles are biosynthesized from
intermediates or products of primary metabolic pathways, selection of the genes was focused on the
specific final steps of the volatile biosynthetic pathways. Representation of each gene was compared in
the peach, apple, strawberry, grape, poplar, papaya and Arabidopsis genomes (Supplementary Table
17).
Because of the important contribution of lactones to peach aroma, the EH gene family (ref. 54) was analyzed
in more detail using BLASTP (e-value cutoff e-90). Twelve genes were identified, all containing the
conserved amino acids 36-39 and the distinctive amino acids of catalytic sites (ref. 55). The highest percentage
of identity (99%) among peach EHs was found between ppa008885m and ppa008897m, and between
ppa019902m and ppa008918m. To understand the relationships and evolution of peach EHs, a
phylogenetic analysis was performed comparing peach EHs with those found in six other sequenced
genomes (Supplementary Fig. 11): 16 in apple, 10 in grape, 14 in strawberry, 6 in papaya, 14 in poplar,
and 7 in Arabidopsis.
Phylogenetically, EHs formed three groups following the evolutionary relationships of the species
analyzed (ref. 56). Grape, poplar, papaya, and Arabidopsis EHs were grouped separately from strawberry,
apple, and peach. Compared to the other species, the absence of group 3 EH in the Arabidopsis genome
suggests a functional differentiation of this subfamily. This is consistent with previous results showing
that Arabidopsis has a lower number of genes involved in volatile, aromatic-compound, pigment,
antioxidant and sorbitol biosynthetic pathways compared to the apple (ref. 57), grape (ref. 58) and poplar (ref. 59)
genomes.
Rosaceae EHs were sub-grouped following species phylogeny, with apple and peach belonging to the
subfamily Spiraeoideae and strawberry belonging to the Rosoideae subfamily (ref. 56). EH syntenic blocks
were conserved among the three species (ref. 25).
S6.2.6 Phenylpropanoid metabolism
The Prunus fruit trees like the other rosaceous crops are valued in the human diet for their specialized
biosynthesis of phenylpropanoid metabolites, which contribute to the flavor and pigmentation of the
fruits and are also involved in response to pathogens. The total phenolic content in peach was estimated
to be in the range of 14-111 mg/kg of fresh tissue, and phenolic composition varies greatly among
cultivars (refs. 60, 61). Peach phenolic compounds accumulate mainly in the peel, though some cultivars have a
significant content of phenolic acids in the flesh (ref. 61). Developing stone fruit is a unique biological system
with spatially defined and well-timed switches in the flux of common precursors among the
anthocyanin (early and late fruit development), the lignin (stone formation) and the free phenolic acids
biosynthetic pathways. Therefore, functional annotation of genes involved in phenylpropanoid
metabolism in peach and comparative analysis with other woody trees (poplar and grapevine) and
weedy plants (Arabidopsis and diploid strawberry) is of special interest.
Identification of genes associated with the phenylpropanoid pathways for monolignol, free phenolic
acids and anthocyanin biosynthesis in peach was done using information available for other model
systems, wood formation in poplar and seed coat coloration in Arabidopsis. The functional annotation
of peach predicted proteins was done by running BLAST against functionally annotated proteins in
Arabidopsis thaliana and Populus trichocarpa at AraCyc8.0 and PoplarCyc3.0
(http://www.plantcyc.org).
Sequence comparison with functional phenylpropanoid enzymes from other species identified in total
56 genes potentially involved in monolignol and anthocyanin biosynthesis in peach. Of these, 39 genes
have signature domains for completing predicted enzymatic reactions and were considered as the ‘bona
fide’ genes. The low number of genes specifying the enzymes for producing the diversified set of
secondary metabolic products is the most striking feature of the peach phenylpropanoid network. With
two exceptions, C3H and HCT/HQT, the peach phenylpropanoid toolbox has a minimal number of
“players” required for a “generalized” angiosperm, fruit producing tree. The C3H and HCT/HQT gene
families involved in the critical enzymatic reactions towards monolignols biosynthesis are expanded in
peach compared with gene family sizes in herbaceous Fragaria vesca and woody plants such as poplar
and grape. We speculate that the specialized set of the C3H and HCT/HQT isoforms (in addition to a
common set of enzymes for cell wall biosynthesis in woody plants) might be required for formation of
a highly lignified endocarp in peach and other Prunus species.
S6.2.7 Regulatory genes: MADS-box genes
MADS-box genes encode a family of eukaryotic transcription factors distinguished by the presence of a
highly-conserved ~58 amino acid DNA-binding and dimerization domain at the N-terminus (the
MADS-box; ref. 62). In plants, MADS-box genes are best known as master regulators of flowering time and
floral organ development, although they also function in the development of leaves, roots, fruits, seeds
and gametophytes (ref. 63). Recent work in perennial species suggests an additional role for MADS-box genes
in the control of seasonal dormancy (ref. 64).
The HMMER-3.0 software package (ref. 65) was used to build profile hidden Markov models from full Pfam
alignment files for the MADS-box (SRF-TF PF00319) and K-box domains (K-box PF01486).
Resulting models were used to search the database of predicted peach peptides and identify potential
MADS-box proteins (e-value threshold 1e-10, with manual inspection of sequences close to the
threshold). The full peach genomic sequence was also queried with nucleic acid sequences from
representative Arabidopsis and Vitis MADS-box genes using NCBI BLAST tools (ref. 66) to identify putative
MADS box genes not present in the predicted protein set.
A 15 kb region around each peach MADS box was extracted, and the full gene structure was predicted
using the FGENESH (Softberry, Inc., Mount Kisco, NY), AUGUSTUS (ref. 67) and SNAP (ref. 68) gene prediction
programs in the DNA Subway annotation pipeline (http://dnasubway.iplantcollaborative.org). Predicted
models were refined by manual inspection and comparison with homologous Arabidopsis sequences,
peach ESTs and mapped RNA-Seq reads.
Our analyses identified 79 MADS-box gene models (41 Type I and 38 Type II) distributed across all 8
pseudomolecules. These include representatives from most known subfamilies within the Type II
lineage, with the exception of the TT16/GOA subfamily. Type I genes include members of the Mα,
Mβ, and Mγ subfamilies and are found predominantly in peach-specific monophyletic lineages. Type I
MADS box genes generally consist of a single exon, and approximately half of their JGI gene models
required no coding sequence modification. In contrast, Type II MADS-box genes typically contain 8 or
more exons. Their coding sequences and gene structures were predicted less accurately by the
annotation pipeline, with the exception of those genes whose sequences had previously been identified
in EST libraries. In general, JGI gene models contained no 5' or 3' UTR information for the MADS box
genes.
S6.2.8 Miscellaneous genes involved in fruit development and quality
Fruit quality depends upon the coordinated expression of genes during the developmental cycle as well
as the postharvest period. Fruit quality is also affected by environmental factors that cause biotic and/or
abiotic stresses, such as pathogen infection, cold stress or drought stress (refs. 69–73). With the completion of
the peach genome, we have identified loci that are associated with some of these processes. In
particular, we have focused our analyses on: pathogen related proteins, ROS related proteins, abiotic
stress related proteins, ripening related proteins, auxin and cytokinin related proteins, as well as genes
associated with the biosynthesis of jasmonic acid, salicylic acid and ethylene. Comparative genome
analyses with species such as apple, strawberry, grape, papaya, poplar and Arabidopsis demonstrate
that these genes are conserved in peach. Interestingly, there are several gene duplications that are
conserved between these genomes especially in the PR1, PR5 (thaumatin), PR10 (RNase),
polygalacturonase, pectin methylesterase inhibitors, pectate lyase and NAC1. Many of these gene
duplications are located in tandem within the peach genome. It has been suggested that similar tandem
duplicates in stress response genes in Arabidopsis are likely important for adaptive evolution to rapidly
changing environments (ref. 74). Remarkable expansion and reduction in copy number was observed in peach
for some gene families in comparison with the other sequenced plant genomes. The PR1 (pathogenesis-
related gene1) and the PR10 (pathogenesis-related RNase) gene families are notable instances of gene
number increase only in the peach lineage (respectively, 22 genes in contrast to an average of 5 for the
other six genomes, and 28 genes in contrast to 2.5). A unique case is represented by the gene family 1-
aminocyclopropane-1-carboxylate oxidase (ACO; ethylene biosynthesis) which is present with 24 and
41 members in apple and strawberry, respectively, but is reduced to four members in peach. Because
strawberry split before peach and apple, this observation suggests that a drastic reduction in gene
number has occurred in the peach lineage.
S6.3 Non-coding RNA analysis
S6.3.1 MicroRNAs
MicroRNAs (miRNAs) are small RNAs (sRNAs) approximately 21 nucleotides in length that
negatively control gene expression by cleaving or inhibiting the translation of target gene transcripts.
Within this context, miRNAs and sRNAs are coming to the forefront as molecular mediators of gene
regulation in plant development and plant response to biotic and abiotic stress.
Barakat et al. (ref. 13) queried the P. persica genome with mature miRNA sequences from miRBase (release
17); evaluating miRNA structures using the MirCheck program and by visual inspection allowed the
identification of 189 unique miRNAs distributed on all peach pseudomolecules (Fig. 1). The 189
miRNA sequences belong to 57 conserved miRNA families, with the number of members ranging from 1 to
52. All conserved miRNA families except one showed a size similar to their counterparts in
Arabidopsis; the exception, the miR5021 family, expanded in Prunus persica to 52 members. Most
of the miRNAs that are conserved in Arabidopsis, rice or Populus were found in peach. Other miRNA
families exhibit various gain/loss patterns in different angiosperm lineages (ref. 13).
A target search using the P. persica proteome (http://www.phytozome.net) enabled target prediction for
179 (94.7%) of the identified miRNAs (ref. 13). For five families (miR3627, miR4376, miR479, miR827,
miR861), no targets could be predicted. Those miRNAs could be young miRNA genes that are still in
the process of generation, as suggested by a previous study (ref. 75). An alternative hypothesis is that these
miRNAs lost their function in P. persica. Gene ontology annotation showed that peach miRNA targets
cover a wide range of genes governing various biological processes. New targets were predicted for
several conserved miRNA families in peach.
S6.3.2 Other Non-coding RNA (ncRNA) analysis
tRNAscan-SE predicted 474 tRNAs decoding 20 amino acids as well as 25 tRNA pseudogenes
(Supplementary Table 13). Fourteen predictions overlapped annotated genes. Of these, 10 fell within
introns of annotated coding sequences. As in other higher eukaryotes, 18S, 5.8S and 28S ribosomal
RNAs are present as multiple tandem repeats separated by two internal transcribed spacer sequences.
5S rRNAs (530 loci on main genome assembly, 419 on ribosomal RNA scaffolds) were typically
arranged as pairs of two closely positioned gene copies and were dispersed along the genome, but with
notable clustering in intervals of several megabases on scaffolds 4, 6 and 8, as well as in several
smaller unplaced scaffolds that consisted predominantly of tandem arrays of 5S rRNA genes. A similar
situation has been observed in A. thaliana, where only the main clustered loci produce functional 5S
rRNA (ref. 76). The number of 5S loci observed is likely to be an underestimate due to the highly repetitive
nature of the clustered loci.
Seven hundred and sixty nine other ncRNAs were predicted by INFERNAL 1.0 (Supplementary Table
14). The numbers of loci predicted for most families were comparable to those observed in other plant
genomes (ref. 77). Of these predictions, 144 overlapped or fell within gene models. Of these 144 loci, 66 were
predicted to be intronic, and several instances of multiple non-coding RNAs being clustered in single
introns were observed, notably in the cases of 60S ribosomal proteins L18, L28 and L32. Other
ncRNAs fall within introns of genes whose products are predicted to be involved in transcription,
translation, or RNA processing activities.
S7. RESEQUENCING AND PRUNUS DIVERSITY ANALYSES
Peaches are native to China and their domestication and cultivation go back at least 4,000 years. P.
persica was brought to Europe following the trade routes through Persia, reaching Egypt first and then
Greece a few centuries B.C. (ref. 1). The Romans supposedly spread the peach all around Europe. During the
16th-17th century the peach was brought to the Americas; the diversity was again strongly reduced when
a subsequent bottleneck occurred starting in the 19th century with the widespread use of grafting instead
of seed propagation and later with the modern peach breeding programs that began in the USA in the
20th century. These programs used a limited number of founders including cultivars of European origin,
the ‘Chinese Cling’ genotype that was introduced in 1850 from China and hybrids between these two
sources (ref. 1). Peach cultivars released by US breeders were very successful and soon were also spread to
Europe where they, or other cultivars derived from them, became the basis of peach commercial
production. To investigate nucleotide diversity and the path of peach domestication we resequenced
eleven P. persica accessions, one P. ferganensis, one P. kansuensis, one P. davidiana and one P. mira
accession (Supplementary Table 18). P. ferganensis, P. kansuensis, P. davidiana and P. mira represent
the 4 wild species that are most closely related to P. persica. P. persica is the only species grown
commercially except for some local uses of P. ferganensis and P. mira. All the 5 peach species are
interfertile. The eleven accessions of P. persica (Supplementary Table 18) have distinct geographical
and genetic origins. Four of them (‘Bolero’, ‘Earligold’, F1 ‘Contender’ x ‘Ambra’, IF7310828)
originated from modern breeding programs in the US or Europe, while ‘GF305’ is a rootstock selected in
France from a local population (refs. 78, 79). The double haploid reference accession (PLov2-2N) originated by
colchicine doubling of a haploid individual obtained from the cv. Lovell. ‘Lovell’ originated as a
chance seedling in the USA in 1882 (ref. 78) and has been widely used there as rootstock for peach.
‘Quetta’ is an old nectarine from Pakistan (ref. 78) and was widely used in western nectarine breeding
programs (ref. 80), representing the most important progenitor of the modern western nectarines. ‘Oro A’
originated from the Brazilian breeding programs and has no known progenitors in common with the
modern western accessions (ref. 78). ‘Shenzhou Mitao’ and ‘Sahua Hong Pantao’ are two white flesh Chinese
accessions; the first belongs to the Northern China germplasm and the second is a flat peach landrace
from Southern China (refs. 81, 82; Gao Z., personal communication). ‘Yumyeong’ is a white stony hard peach
originating from the South Korean breeding program (ref. 78). P. davidiana (clone P 1908) and P. kansuensis
(clone P1429) originated in Northwest China in the area where they are native (ref. 1), while the P. ferganensis
accession originated from the Fergana valley (former Soviet Union) and the P. mira accession from
Western China. Resequencing of the dihaploid PLov2-2N derived from ‘Lovell’ was only performed as
a control to estimate the robustness of our SNP calling pipeline and SNP data obtained from it were not
used for further analyses on genetic diversity, where we made use of the reference sequence from it
instead. Accessions were grouped into Eastern ones, including P. ferganensis, ‘Shenzhou Mitao’,
‘Yumyeong’, ‘Sahua Hong Pantao’ and Western ones, including ‘Bolero’, ‘Earligold’, F1 ‘Contender’ x
‘Ambra’, ‘GF305’, IF7310828, ‘Oro A’ and ‘PLov2-2N’, while ‘Quetta’ was not categorized as it
provided contradictory geographical origin and phylogenetic relationships.
The average sequence coverages, obtained after alignment of the reads to the reference genome
sequence, ranged from 9X (P. kansuensis) to 84X (dihaploid PLov2-2N; Supplementary Table 18) with
a mean coverage of 23.4X and a median of 19.0X. When comparing the sequences against the
reference genome sequence, three of the four wild species, P. kansuensis, P. davidiana and P. mira
were very different from the reference individual and from all other accessions, with almost 1 million
or more SNPs detected in each of them (Fig. 2; Supplementary Table 18). Conversely, P. ferganensis
appeared virtually indistinguishable from the 11 Prunus persica accessions in terms of both SNP
frequency and distribution. Therefore, P. ferganensis was considered jointly with the P. persica accessions in all the downstream
analyses.
A redundant set of 6,401,064 SNPs was detected in total among the 15 Prunus accessions (including
the reference individual) with a minimum of 6,363 in the dihaploid PLov2-2N and a maximum of
1,675,062 in P. davidiana (Supplementary Table 18). We combined all SNPs detected in P. persica and
P. ferganensis accessions to obtain a unique set using only those present in nucleotide positions that
were informative in at least four accessions and obtained a total of 996,285 SNPs, of which 953,357 belonged to
the 8 pseudomolecules (Supplementary Table 19). The set of SNPs on the pseudomolecules was used
to estimate nucleotide diversity (π). Total diversity was 1.5 × 10⁻³, with considerable differences
among pseudomolecules, ranging from a minimum of 1.1 × 10⁻³ for pseudomolecule 1 to a maximum of
2.2 × 10⁻³ for pseudomolecule 2 (Supplementary Table 20).
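As a rough illustration of how a per-site nucleotide diversity (π) of the kind reported above can be computed, the following Python sketch averages the pairwise differences among sampled chromosomes and divides by the number of callable sites. The function name, the toy alleles and the callable-site count are hypothetical, and the sketch is not the pipeline used in this study.

```python
from itertools import combinations

def nucleotide_diversity(haplotypes, n_sites):
    """Mean pairwise differences per site among sampled haplotypes.

    haplotypes: equal-length strings, one per sampled chromosome,
                holding the alleles at the variable sites only.
    n_sites:    total number of callable sites (variable + invariant).
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs or n_sites <= 0:
        raise ValueError("need at least two haplotypes and n_sites > 0")
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / len(pairs) / n_sites

# Toy data (hypothetical): 4 chromosomes, 3 SNPs within 1,000 callable sites.
toy = ["AAT", "AAT", "GAT", "GCC"]
pi = nucleotide_diversity(toy, 1000)
print(f"pi = {pi:.2e}")
```

Dividing by callable rather than total sites matters in resequencing data, because positions without adequate coverage cannot contribute differences.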
We resequenced the double haploid reference genotype to estimate the robustness of our SNP calling
pipeline. When comparing the double haploid Illumina reads with the reference genome, we observed
6,363 SNPs, giving a putative rate of inconsistencies between the two approaches of 0.034 × 10⁻³
(PLov2-2N has 186.2 Mb with sufficient coverage to allow SNP detection with the defined
parameters). However, some of these SNPs are related to errors in the reference assembly. We consider
this to be the case for most of the 1,682 homozygous SNPs found in the double haploid. The
other 4,681 SNPs, which were called as heterozygous, are characterized by relatively
high coverage (mean coverage of heterozygous SNPs 91.1, mean coverage of homozygous SNPs 68.5),
which suggests alignment of paralogous sequences. Thus, the putative rate of false positive SNPs
appears to be 0.025 × 10⁻³. This rate is an order of magnitude lower than that observed in other
resequencing experiments such as in maize83, chicken84 and rice85,86.
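The two rates quoted above can be checked with simple arithmetic. The short Python sketch below uses only the figures given in the text and assumes that the false positive rate counts the heterozygous calls alone, which is our reading of the paragraph rather than a stated definition.

```python
# Figures taken from the paragraph above.
total_snps = 6_363          # SNPs called in the resequenced PLov2-2N
homozygous_snps = 1_682     # mostly attributed to reference assembly errors
heterozygous_snps = 4_681   # attributed to paralogous alignments
callable_bases = 186.2e6    # bases with sufficient coverage for SNP calling

# The two classes should add up to the total number of calls.
assert homozygous_snps + heterozygous_snps == total_snps

inconsistency_rate = total_snps / callable_bases
# Assumption: only heterozygous calls are counted as false positives.
false_positive_rate = heterozygous_snps / callable_bases

print(f"{inconsistency_rate * 1e3:.3f} x 10^-3")   # 0.034 x 10^-3
print(f"{false_positive_rate * 1e3:.3f} x 10^-3")  # 0.025 x 10^-3
```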
A non-random distribution of diversity was evident when it was plotted along the pseudomolecules in
50 kb non-overlapping windows (Fig. 2), with some genomic regions, namely at the top of
pseudomolecule 2 and at the bottom of pseudomolecule 4, displaying much higher diversity than the
rest of the genome. This wide regional variation in SNP frequency was not evident when considering P.
kansuensis, P. davidiana or P. mira, which show a much more homogeneous distribution of variation
across their genomes (Fig. 2). When the 12 accessions were grouped according to their geographical
origin and phylogenetic relationships into oriental and western accessions, a clear difference in
diversity was also visible, with overall π being 1.6 × 10⁻³ for the oriental ones and 1.1 × 10⁻³
for the western ones (Supplementary Table 20).
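A window scan of the kind described above can be sketched as follows. This minimal Python example only counts SNPs in consecutive 50 kb non-overlapping windows as a simple proxy; turning counts into windowed diversity would additionally require the callable bases per window and a diversity estimator. The function name and the toy coordinates are hypothetical.

```python
from collections import Counter

def snps_per_window(positions, chrom_length, window=50_000):
    """Count SNPs in consecutive non-overlapping windows along one sequence.

    positions: 1-based SNP coordinates on a single pseudomolecule.
    Returns one count per window (the last window may be shorter).
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter((pos - 1) // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]

# Toy coordinates (hypothetical) on a 160 kb sequence.
toy_positions = [10, 49_999, 50_000, 50_001, 120_500, 149_000]
density = snps_per_window(toy_positions, chrom_length=160_000)
print(density)  # [3, 1, 2, 0]
```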
The minor allele frequency spectrum of SNPs shows that low frequency variants appear to be less
frequent than expected under the standard neutral model (Supplementary Fig. 15), as evidenced by the
positive mean estimates of Tajima's D (0.23 for all sites, 0.3 for synonymous sites) and their positively
skewed distribution (Supplementary Fig. 16). Within our sample, including all P. persica accessions
and the P. ferganensis accession, linkage disequilibrium (r²) between pairs of SNPs decayed rather
slowly, with marked differences among pseudomolecules and among genomic regions. At the whole
genome level (Supplementary Fig. 17) mean r² decayed to one-half of the initial value within 1 kb and
to less than 0.3 within 10 kb, while at 50 kb it was still well above the mean r² estimated among a
random sample of unlinked SNPs (mean r² 0.11). Pseudomolecules 4 and 5 exhibit slower than average
LD decay and pseudomolecules 2 and 3 exhibit faster than average LD decay (Fig. 1 and
Supplementary Fig. 17). The analysis of mean r² in 50 kb sliding windows across pseudomolecules
shows that LD varies considerably among regions, with some strong peaks of high LD extending over
hundreds of kb (Fig. 1). Peaks may result from selective sweeps related to domestication and
breeding; QTLs for fruit size, a typical domestication trait, have been mapped in regions showing LD
peaks on pseudomolecule 4 at ~2 Mb, ~8 Mb87 and ~20 Mb22,87,88 and on pseudomolecule 5 (15-17
Mb87; Fig. 1). Peach is predominantly a self-fertilizing species, and this may explain the slow decay
of LD as well. However, since average LD in peach seems to decay more slowly than in two selfing
non-cultivated species, Arabidopsis thaliana89 and Medicago truncatula90, the contribution of the
domestication bottleneck to the LD decay observed in peach today may have been significant.
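To make the LD decay measure concrete, the Python sketch below computes r² between pairs of biallelic SNPs and averages it within physical-distance bins. It assumes phased haplotypes coded as 0/1, which is a simplification: with unphased resequencing genotypes a genotype-based estimator would be needed. All names and toy data are hypothetical and do not reproduce the analysis of this study.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic loci from phased 0/1 haplotype vectors."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom

def mean_r2_by_distance(positions, haplotypes, bin_size=1_000, max_dist=50_000):
    """Average r^2 of SNP pairs grouped into physical-distance bins.

    positions:  sorted SNP coordinates.
    haplotypes: one 0/1 vector per SNP, in the same order as positions.
    """
    sums, counts = {}, {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break  # valid because positions are sorted
            b = dist // bin_size
            sums[b] = sums.get(b, 0.0) + r_squared(haplotypes[i], haplotypes[j])
            counts[b] = counts.get(b, 0) + 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}

# Toy data (hypothetical): three SNPs scored on four chromosomes.
decay = mean_r2_by_distance(
    [100, 600, 5_200],
    [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]],
)
print(decay)
```

The distance at which the binned mean falls to half of its initial value, or to the background level of unlinked SNPs, is then read from the resulting curve.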
Nucleotide diversity in peach appears to be strongly influenced by recent demographic events. The
overall estimate of π is much lower than the estimates recently obtained with similar approaches in
Medicago truncatula (4.3 × 10⁻³; ref. 90) and in wild soybean (3.0 × 10⁻³; ref. 91) but similar to
that present in cultivated soybean (1.9 × 10⁻³; ref. 91). Other cultivated fruit species such as
grapevine (5.1 × 10⁻³; ref. 92) and apple (3.7 × 10⁻³; ref. 93) appear to harbor much greater
diversity even when only genic regions were examined. The domestication history and the natural
tolerance to inbreeding may be responsible for this low level of diversity. These results are
consistent with three major genetic bottlenecks: 1) the original domestication bottleneck in China
about 4,000 years ago followed by the practice of vegetative propagation; 2) the dispersion from
China through Persia to Europe; and 3) the much more recent (16th-20th century) bottleneck of
breeding in the USA, which has subsequently served as the genetic foundation of the modern western
breeding germplasm94.
We evaluated the effects of the putative domestication step by comparing the variation present in the
resequenced heterozygous accessions of P. davidiana, P. mira and P. kansuensis, three close wild
relatives of peach, with that of the domesticated Asian peach varieties including P. ferganensis. The
nucleotide diversity (π) is estimated at 4.8 × 10⁻³ for P. davidiana, at 1.3 × 10⁻³ for P. mira and
at 0.8 × 10⁻³ for P. kansuensis, in contrast to a value of 1.6 × 10⁻³ for the Asian peach varieties,
highlighting the strong reduction of variability associated with domestication. The estimates of π
obtained from just two haplotypes derived from a single individual, such as those obtained for
P. davidiana, P. mira and P.
kansuensis, may be subject to a high sampling variance but are very likely to underestimate the
diversity present within the species, as a consequence either of deviations from random mating (with
a greater underestimation the more closely related the parents of the individual considered are, the
most extreme case being that of a selfing species) or of strong between-population
differentiation within the species. The three wild species, all interfertile with peach, are
characterized by different mating systems (allogamous for P. davidiana and autogamous for P. mira and
P. kansuensis)95, and therefore the most significant estimate is that of P. davidiana, which clearly
highlights the strong reduction of variability associated with domestication. The difference in π
values between eastern and western varieties (1.6 × 10⁻³ vs. 1.1 × 10⁻³) further supports the
proposed bottleneck that occurred during the breeding history of the cultivated peach since its
dispersion from China. These bottlenecks appear to have led to a considerable loss of diversity in
western varieties in comparison to their related oriental varieties and wild relatives. The recent
bottleneck shows its effects on sequence variation through a deficit of rare variants and a positive
mean value of Tajima's D (Supplementary Fig. 16), as predicted after a recent genetic bottleneck and
before population expansion and new mutations start to replenish variation. The slow decay of LD (r²)
compared to that of other selfing plant species90,91 is another consequence of the above-mentioned
bottlenecks.
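The link between a deficit of rare variants and a positive Tajima's D can be illustrated with a small Python sketch of the standard statistic, which contrasts the mean number of pairwise differences with the number of segregating sites scaled by the sample size. The function name and the toy alignments are hypothetical; this is an illustration of the statistic, not the software used for the estimates reported here.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("at least four sequences are required")
    # Number of segregating sites S.
    seg_sites = sum(len(set(col)) > 1 for col in zip(*haplotypes))
    if seg_sites == 0:
        raise ValueError("no segregating sites")
    # Mean number of pairwise differences k.
    pairs = list(combinations(haplotypes, 2))
    k = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    # Constants of the variance term (Tajima 1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / sqrt(variance)

# Toy alignments (hypothetical), four chromosomes and four SNPs each.
d_balanced = tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"])  # no rare variants
d_rare = tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"])      # singletons only
print(d_balanced > 0, d_rare < 0)
```

In the first toy sample every variant is at intermediate frequency, as expected shortly after a bottleneck, and D is positive; in the second every variant is a singleton and D is negative.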
The cluster analysis performed with the whole genome SNP genotyping data (Supplementary Fig. 14)
confirmed the genetic and geographical relationships among the accessions included in the analyses,
with one exception. All the Western accessions grouped together highlighting the genetic relationship
among them. For those deriving from the modern breeding programs, either in Europe or America
(‘Earligold’, ‘Oro A’, ‘Bolero’, IF7310828 and F1 ‘Contender’ x ‘Ambra’), a common origin from
recent ancestors can be inferred. The pedigree of all of them (with the exclusion of ‘Oro A’) can be
traced back to ‘Chinese Cling’ through its offspring ‘Elberta’, ‘J. H. Hale’ and their descendants. ‘Oro
A’ shares no common recent ancestors with the modern western germplasm but its pedigree
information is deficient going back only to around 1960, with accessions obtained from Brazil. Likely,
a common origin with the European (Iberian) germplasm can be historically inferred. The close
relationship between the ‘Lovell’ double haploid and ‘GF305’ points to a putative common origin,
although they seem to have no recent ancestors in common with the modern western germplasm.
‘Lovell’ and ‘GF305’ in fact originated as chance seedlings and were selected from local populations
(in the USA in 1882 and in France in 1950, respectively)78 as rootstocks. The Asian accessions
clustered together with the exception of ‘Quetta’. This is an old nectarine discovered near the
homonymous city
of Pakistan (then part of India, in 1906 when it was collected)78. It was expected to cluster with
the Asian accessions but unexpectedly it was found to be closer to the Western ones. The most
plausible reason for this unexpected clustering is the geographical separation of Pakistan from
North West China, the center of origin of peach. The northern Pakistan highlands include parts of the
Hindu Kush, the Karakoram Range, and the Himalayas. In this area more than one-half of the summits
are over 4,500 meters. Because of their rugged topography and the rigors of the climate, the northern
highlands and the Himalayas to the east have been formidable barriers to movement into Pakistan
throughout history. Likely, the ‘Quetta’ genotype was brought to Pakistan from the west. According to
Okie78, another name of the accession is ‘Persian’, indicating a possible movement of the genotype
eastward from Persia. In support of this hypothesis, the total number of SNPs within this accession
(Supplementary Table 18) is comparable to that of the Western ones, indicating a possible origin
after the bottleneck that occurred during the dissemination of peach to the West through Persia.
The close relationship of the P. ferganensis accession with peach varieties is of considerable interest. P.
ferganensis is indistinguishable in sequence composition from the cultivated peach varieties and
phylogenetic trees (Supplementary Fig. 14) constructed with whole genome SNP data support this. In
these analyses, it grouped with ‘Shenzhou Mitao’, a peach accession belonging to the Northern China
ecotypes that are most closely related to the wild peaches81. North West China, between the Kunlun
Shan mountains and the Tarim basin, is considered the center of origin of peach1. P. ferganensis is
found in the Fergana Valley on the west side of the Tarim basin in Central Asia. It exhibits some
undomesticated traits such as small fruit (70-80 g), absence of red coloration on the fruit skin, a
typical pattern of unbranched leaf veins and a groove in the pit96. A very plausible explanation for
these results
is that P. ferganensis is a wild ancestor, or, more likely, is an intermediate genome haplotype in peach
domestication. An alternative hypothesis, considering the P. ferganensis accession a highly introgressed
form between wild P. ferganensis and P. persica, does not appear to be supported by our data, since in
such a case we would expect to see the signatures of introgressions as regions of higher sequence
divergence (representing P. ferganensis progenitor sequences) alternated with others of lower
divergence (P. persica progenitor).
Interestingly, some areas of high sequence diversity and divergence are present: a number of
small ones on different chromosomes, in areas that tend to be repeat-rich and gene-poor, and two
very large ones visible at the distal ends of pseudomolecule 2 (0-5 Mb, approximately) and of
pseudomolecule 4 (22-27 Mb) (Fig. 2). The chromosome 2 region harbors nematode resistance
genes97–100, and the two rootstock varieties (‘Lovell’ and ‘GF305’) differ greatly in sequence from
all others. We have noted that the top of pseudomolecule 2 has a 5-fold higher density of NB-LRR
genes than the rest of the genome. As resistance regions are known to evolve rapidly101–103, this
could explain the unusually high level of sequence diversity in this region. The very end of
pseudomolecule 4 contains a block
(about 300 kb) of ribosomal 16S and 28S RNA genes. In terms of their population genetics properties,
the two regions behave quite differently: the pseudomolecule 2 region shows a positive Tajima’s D,
high levels of haplotype diversity, and an LD decay that is similar to the genomic average
(Supplementary Fig. 17) while the pseudomolecule 4 region shows high Tajima’s D, low haplotype
diversity and no evidence of LD decay within 50 kb (Supplementary Fig. 17). Recombination per unit
of physical distance is quite low in G4 in all the three maps while in G2 strong suppression was
observed only in the interspecific map (Supplementary Fig. 6). The large extent of the high diversity
segments may be explained by the considerable linkage drag that may be associated with these regions
and the predominantly self-fertilizing mode of reproduction of the species. An alternative explanation
to introgression after domestication may be provided by balancing selection but this appears unlikely
due to the previously mentioned good tolerance for inbreeding.
S8. COMPARATIVE ANALYSIS AND PEACH GENOME EVOLUTION
The dot plot analyses (Supplementary Fig. 19A) did not show chromosome-level duplications, but
seven major triplicated sub-genomic regions comparable to the ancient angiosperm triplication pattern
described in the grape genome58 were detected. The triplicated regions in peach were more
fragmentary than the pattern shown in the dot plot of the grape duplications detected by similar
analyses (Supplementary Fig. 19B). This suggests that more interchromosomal genome rearrangements
have occurred in peach since the evolutionary split of Rosaceae and Vitis from common ancestors.
Figure 3, plotted by Circos104, shows that in peach most of the paralogous regions reside in
sub-genomic regions with the same pre-hexaploidy ancestral origin, suggesting that the
duplicated/triplicated regions in peach were not generated by a more recent whole genome duplication
(WGD)105.
Supplementary Figure 18 depicts the duplicated regions in peach as revealed by the same analyses used
for the data depicted in Figure 3, except that each diagram (Supplementary Figure 18 - A through G -)
shows major triplicated regions originating from one of the putative seven ancestral pre-hexaploidy
linkage groups. The regions with only one paralogous segment may correspond to a loss of one of the
three paleosegments due to genome rearrangements. Supplementary Figure 18 also shows examples of
paralogous regions representing putative peach genome rearrangements.
Supplementary Figure 18D shows that the ancestral origin of the region at the top of the PC4 was not
identified by peach-grape orthology (grey in color), but it resides in one of the triplicated regions, TR4,
suggesting that it may have the same ancestral origin as other paralogous regions. Supplementary
Figure 18H shows one paralogous region that resides in sub-genomic regions with unidentified
ancestral origin.
The orthology map between peach and grape shows that each paralogue in the peach genome resided in
the regions that are orthologous to one of the three paralogues in the grape genome (Supplementary
Figs. 20 and 21), further supporting that the triplications in peach and grape genomes have been
generated by the same paleo-hexaploidization event. On the other hand, the orthology map between
peach and poplar (Supplementary Fig. 22) shows that each paralogous region in peach is orthologous to
two different regions in poplar, supporting the occurrence of a WGD event59 in poplar after the
peach and poplar separation.
Different parts of a single paralogue of a triplicated set in the grape genome often matched to different
chromosomes of the peach genome, suggesting that this genome went through more chromosomal
translocation events than grape since the split from their ancestors (Supplementary Figs. 20 and 21).
For example, the paralogous region in VC18 matched to regions in PC1, PC5, and PC8. Moreover, the
region in PC1 is fragmented into four major regions (Supplementary Fig. 20), consistent with
Supplementary Figure 18F showing that different parts of PC8 had a conserved synteny to three
different sets of PC1, PC5, PC6 and PC7. The breakage that generated VC4 and VC7, however,
seemed to have occurred before the separation of grape and peach lineages (Supplementary Fig. 20).
The regions lacking synteny to other chromosomal regions in peach were found (no lines in Fig. 3) to
correspond to regions with the highest SNP diversity (Fig. 2). These regions include the top of
pseudomolecule 1 (around 15 Mb), 2, 7 and 8, and the bottom part of the pseudomolecule 4. These
chromosomal segments should represent regions that underwent more changes in gene order and gene
sequences.
To further analyze the evolutionary divergence of peach and other species, we calculated 4DTv
(fourfold synonymous third-codon transversion)59 rates (Fig. 4). The 4DTv values peaked at 0.06 for
paralogous pairs in apple, highlighting the recent WGD in this species. A peak at 0.14 between peach
and apple should correspond to species divergence. The orthologues between grape and peach or those
of grape and apple showed 4DTv peaks at 0.36 and 0.38, respectively, agreeing with the more ancient
divergence between Vitaceae and Rosaceae. The 4DTv values between paralogues in peach and grape
peaked respectively at 0.56 and 0.50, as expected if the putative paleo-hexaploidy event occurred
before the split of Vitaceae and Rosaceae.
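For readers unfamiliar with the 4DTv distance, the Python sketch below shows one way to compute it for a pair of codon-aligned coding sequences: only codons whose first two bases are shared and define a fourfold degenerate family are compared, and the fraction of third positions differing by a transversion is returned. The logarithmic correction for multiple substitutions is a commonly used option and is included here as an assumption; whether and how it was applied in this study is not stated in this section. The function name and the toy sequences are hypothetical.

```python
from math import log

# Codon prefixes whose third position is fourfold degenerate (standard code).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}

def four_dtv(cds_a, cds_b, corrected=True):
    """Transversion rate at fourfold degenerate third-codon positions.

    cds_a, cds_b: in-frame, gap-free, codon-aligned sequences of equal length.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Compare only codons sharing a fourfold degenerate prefix.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        x, y = codon_a[2], codon_b[2]
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip ambiguous bases
        sites += 1
        # A transversion exchanges a purine and a pyrimidine.
        if (x in PURINES) != (y in PURINES):
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable fourfold degenerate sites")
    p = transversions / sites
    if not corrected:
        return p
    if p >= 0.5:
        raise ValueError("rate too high for the correction to be defined")
    return -0.5 * log(1 - 2 * p)  # assumed multiple-substitution correction

# Toy pair (hypothetical): 4 comparable sites, 1 transversion (T/A in codon 1).
seq_a = "GCTGGACCATTTCTG"
seq_b = "GCAGGGCCATTCCTG"
raw = four_dtv(seq_a, seq_b, corrected=False)
adj = four_dtv(seq_a, seq_b)
print(raw, round(adj, 4))
```

Because transversions at these positions are synonymous and accumulate slowly, the distribution of 4DTv values across gene pairs gives peaks that can be ordered in time, as in the comparison above.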
Finally, to examine the chromosomal evolution of species within the Rosaceae, the synteny and
organization of key Rosaceae species genomes were evaluated using Peach v1.0 and the available
whole genome sequences of apple57 and strawberry2. For details of this analysis, we refer the reader
to the recently published companion work of Jung et al.106, where evidence is presented for a
hypothetical ancestral Rosaceae genome of 9 chromosomes with further divergent evolutionary paths for
the three extant species.
In conclusion, the peach genome does not show any evidence of recent whole genome duplications
and contains triplicated chromosomal paleo-hexaploid blocks. The study shows that the peach genome,
unlike most other sequenced eudicots, did not go through extensive genome rearrangements.
Nevertheless it does exhibit a greater number of genome rearrangements than grape. In spite of these
limited additional changes, the peach genome still substantially retains the genome structure of the
eudicot ancestor.
SUPPLEMENTARY REFERENCES
1. Faust, M. & Timon, B. Origin and dissemination of peach. Hort. Reviews 17, 331–379 (1995).
2. Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109–116
(2011).
3. Meng, R. & Finn, C. Determining ploidy level and nuclear DNA content in Rubus by flow
cytometry. J. Am. Soc. Hortic. Sci. 127, 767–775 (2002).
4. Debener, T. & Linde, M. Exploring Complex Ornamental Genomes: The Rose as a Model Plant.
Crit. Rev. Plant Sci. 28, 267–280 (2009).
5. Verde, I. et al. Development and evaluation of a 9K SNP array for peach by internationally
coordinated SNP detection and validation in breeding germplasm. PLoS ONE 7, e35668 (2012).
6. Peace, C. et al. Development and Evaluation of a Genome-Wide 6K SNP Array for Diploid Sweet
Cherry and Tetraploid Sour Cherry. PLoS ONE 7, e48305 (2012).
7. Vera Ruiz, E. M. et al. Narrowing down the apricot Plum pox virus resistance locus and
comparative analysis with the peach genome syntenic region. Mol. Plant Pathol. 12, 535–547
(2011).
8. Brandi, F. et al. Study of ‘Redhaven’ peach and its white-fleshed mutant suggests a key role of
CCD4 carotenoid dioxygenase in carotenoid and norisoprenoid volatile metabolism. BMC Plant
Biol. 11, 24 (2011).
9. Dardick, C. et al. Identification of PpTAC1 as a functionally conserved regulator of axillary shoot
growth angle in Prunus persica (peach) trees. Plant J. (Submitted) (2012).
10. Zhebentyayeva, T. Chilling Requirement and Bloom Date in Peach: Genetic and Genomic
Approaches for Deciphering a Complex Gene Network. in Plant and Animal Genome Conference
XX (2012). at <https://pag.confex.com/pag/xx/webprogram/Paper1576.html>
11. Fang, G. C. et al. A physical map of the Chinese chestnut (Castanea mollissima) genome and its
integration with the genetic map. Tree Genet. Genomes DOI 10.1007/s11295-012-0576-6 (2012).
12. Kubisiak, T. et al. A transcriptome-based genetic map of Chinese chestnut, (Castanea
mollissima), and identification of regions of segmental homology with peach (Prunus persica).
Tree Genet. Genomes DOI 10.1007/s11295-012-0579-3 (2012).
13. Barakat, A. et al. Genome wide identification of chilling responsive microRNAs in Prunus
persica. BMC genomics 13, 481 (2012).
14. Navarro, B. et al. Small RNAs containing the pathogenic determinant of a chloroplast-replicating
viroid guide the degradation of a host mRNA as predicted by RNA silencing. The Plant Journal
70, 991–1003 (2012).
15. Leida, C., Conesa, A., Llácer, G., Badenes, M. L. & Ríos, G. Histone modifications and
expression of DAM6 gene in peach are modulated during bud dormancy release in a cultivar‐dependent manner. New Phytol. 193, 67–80 (2012).
16. Lyons, E. Quick and easy 2nd order genome assembly: Syntenic Path Assembly using CoGe.
Plant and Animal Genome Conference XX (2012). at
<https://pag.confex.com/pag/xx/webprogram/Paper1951.html>
17. Jaffe, D. B. et al. Whole-genome sequence assembly for mammalian genomes: Arachne 2.
Genome Res. 13, 91–96 (2003).
18. Dirlewanger, E. et al. Comparative mapping and marker-assisted selection in Rosaceae fruit
crops. P. Natl. Acad. Sci. USA 101, 9891–9896 (2004).
19. Howad, W. et al. Mapping with a few plants: using selective mapping for microsatellite saturation
of the Prunus reference map. Genetics 171, 1305–1309 (2005).
20. Joobeur, T. et al. Construction of a saturated linkage map for Prunus using an almond x peach F2
progeny. Theor. Appl. Genet. 97, 1034–1041 (1998).
21. Verde, I. et al. Microsatellite and AFLP markers in the Prunus persica [L.(Batsch)] x P.
ferganensis BC1 linkage map: saturation and coverage improvement. Theor. Appl. Genet. 111,
1013–1021 (2005).
22. Eduardo, I. et al. QTL analysis of fruit quality traits in two peach intraspecific populations and
importance of maturity date pleiotropic effect. Tree Genet. Genomes 7, 323–335 (2011).
23. Fan, S. et al. Mapping quantitative trait loci associated with chilling requirement, heat
requirement and bloom date in peach (Prunus persica). New Phytol. 185, 917–930 (2010).
24. Cabrera, A. et al. Development and bin mapping of a Rosaceae Conserved Ortholog Set (COS) of
markers. BMC Genomics 10, 562 (2009).
25. Illa, E. et al. Comparative analysis of rosaceous genomes and the reconstruction of a putative
ancestral genome for the family. BMC Evol. Biol. 11, 9 (2011).
26. Illa, E. et al. Saturating the Prunus (stone fruits) genome with candidate genes for fruit quality.
Mol. Breeding 28, 667–682 (2011).
27. Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178–183 (2010).
28. Chain, P. S. G. et al. Genome Project Standards in a New Era of Sequencing. Science 326, 236–
237 (2009).
29. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant
Arabidopsis thaliana. Nature 408, 796–815 (2000).
30. International Rice Genome Sequencing Project. The map-based sequence of the rice genome.
Nature 436, 793–800 (2005).
31. The International Brachypodium Initiative. Genome sequencing and analysis of the model grass
Brachypodium distachyon. Nature 463, 763–768 (2010).
32. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large
genomes. Bioinformatics 21, i351–i358 (2005).
33. Li, R. et al. ReAS: Recovery of ancestral sequences for transposable elements from the
unassembled reads of a whole genome shotgun. PLoS Comput. Biol. 1, e43 (2005).
34. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR
retrotransposons. Nucleic Acids Res. 35, (Web Server issue)W265–W268 (2007).
35. Sonnhammer, E. L. L. & Durbin, R. A dot-matrix program with dynamic threshold control suited
for genomic DNA and protein sequence analysis. Gene 167, GC1–GC10 (1995).
36. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. 1996-2011. (1996). at
<http://www.repeatmasker.org>
37. Huang, Y., Niu, B., Gao, Y., Fu, L. & Li, W. CD-HIT Suite: A Web Server for Clustering and
Comparing Biological Sequences. Bioinformatics 26, 680–682 (2010).
38. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome
Res. 110, 462–467 (2005).
39. Benson, D., Karsch-Mizrachi, I., Lipman, D., Ostell, J. & Sayers, E. GenBank. Nucleic Acids Res.
39, (Database issue) D32–7 (2011).
40. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
41. SanMiguel, P., Gaut, B. S., Tikhonov, A., Nakajima, Y. & Bennetzen, J. L. The paleontology of
intergene retrotransposons of maize. Nat. Genet. 20, 43–45 (1998).
42. Rice, P., Longden, I., Bleasby, A. & others. EMBOSS: the European molecular biology open
software suite. Trends in genetics 16, 276–277 (2000).
43. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Res. 32, 1792–1797 (2004).
44. Tamura, K., Peterson, D., Stecher, G., Nei, M. & Kumar, S. MEGA5: Molecular Evolutionary
Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony
Methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
45. Du, C., Caronna, J., He, L. & Dooner, H. K. Computational prediction and molecular
confirmation of Helitron transposons in the maize genome. BMC genomics 9, 51 (2008).
46. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript
alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
47. Salamov, A. A. & Solovyev, V. V. Ab initio gene finding in Drosophila genomic DNA. Genome
Res. 10, 516–522 (2000).
48. Yeh, R. F., Lim, L. P. & Burge, C. B. Computational inference of homologous gene structures in
the human genome. Genome Res. 11, 803–816 (2001).
49. Eduardo, I., Chietera, G., Bassi, D., Rossini, L. & Vecchietti, A. Identification of key odor volatile
compounds in the essential oil of nine peach accessions. J. Sci. Food Agr. 90, 1146–1154 (2010).
50. Horvat, R. J. et al. Comparison of the volatile compounds from several commercial peach
cultivars. J. Agr. Food Chem. 38, 234–237 (1990).
51. Aubert, C. & Milhet, C. Distribution of the volatile compounds in the different parts of a white-
fleshed peach (Prunus persica L. Batsch). Food Chem. 102, 375–384 (2007).
52. Wang, Y. et al. Volatile characteristics of 50 peaches and nectarines evaluated by HP-SPME with
GC-MS. Food Chem. 116, 356–364 (2009).
53. Kakiuchi, N. & Ohmiya, A. Changes in the composition and content of volatile constituents in
peach [Prunus persica] fruits in relation to maturity at harvest and artificial ripening. J. Jpn. Soc.
Hortic. Sci. 60, 209–216 (1991).
54. Schöttler, M. & Boland, W. Biosynthesis of dodecano-4-lactone in ripening fruits: Crucial role of
an epoxide-hydrolase in enantioselective generation of aroma components of the nectarine
(Prunus persica var. nucipersica) and the strawberry (Fragaria ananassa). Helv. Chim. Acta 79,
1488–1496 (1996).
55. Mowbray, S. L., Elfström, L. T., Ahlgren, K. M., Andersson, C. E. & Widersten, M. X-ray
structure of potato epoxide hydrolase sheds light on substrate specificity in plant enzymes. Protein
Sci. 15, 1628–1637 (2006).
56. Potter, D. et al. Phylogeny and classification of Rosaceae. Plant Syst. Evol. 266, 5–43 (2007).
57. Velasco, R. et al. The genome of the domesticated apple (Malus x domestica Borkh.). Nat. Genet.
42, 833–839 (2010).
58. Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major
angiosperm phyla. Nature 449, 463–467 (2007).
59. Tuskan, G. A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray).
Science 313, 1596–1604 (2006).
60. Tomás-Barberán, F. A. et al. HPLC-DAD-ESIMS analysis of phenolic compounds in nectarines,
peaches, and plums. J. Agr. Food Chem. 49, 4748–4760 (2001).
61. Gil, M. I., Tomas-Barberan, F. A., Hess-Pierce, B. & Kader, A. A. Antioxidant capacities,
phenolic compounds, carotenoids, and vitamin C contents of nectarine, peach, and plum cultivars
from California. J. Agr. Food Chem. 50, 4976–4982 (2002).
62. Ng, M. & Yanofsky, M. F. Function and evolution of the plant MADS-box gene family. Nat. Rev.
Genet. 2, 186–195 (2001).
63. Gramzow, L. & Theissen, G. A Hitchhiker’s guide to the MADS world of plants. Genome Biol
11, 214 (2010).
64. Horvath, D. Common mechanisms regulate flowering and dormancy. Plant Sci. 177, 523–531
(2009).
65. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity
searching. Nucleic Acids Res. 39, (Web Server issue): W29–W37 (2011).
66. Altschul, S. F. et al. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
67. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res.
34, (Web Server issue) W435–W439 (2006).
68. Korf, I. Gene finding in novel genomes. Bmc Bioinformatics 5, 59 (2004).
69. Crisosto, C. H., Mitchell, F. G. & Johnson, S. Factors in fresh market stone fruit quality.
Postharvest News Inf. 6, 17–21 (1995).
70. Hewett, E. W. An overview of preharvest factors influencing postharvest quality of horticultural
products. Int. J. Postharvest Technol. Innov 1, 4–15 (2006).
71. Romero, P. et al. Deficit irrigation and rootstock: their effects on water relations, vegetative
development, yield, fruit quality and mineral nutrition of Clemenules mandarin. Tree Physiol. 26,
1537–1548 (2006).
72. Gautier, H. et al. How does tomato quality (sugar, acid, and nutritional quality) vary with ripening
stage, temperature, and irradiance? J. Agr. Food Chem. 56, 1241–1250 (2008).
73. Kassim, A. et al. Environmental and seasonal influences on red raspberry anthocyanin antioxidant
contents and identification of quantitative traits loci (QTL). Mol. Nutr. Food Res. 53, 625–634
(2009).
74. Hanada, K., Zou, C., Lehti-Shiu, M. D., Shinozaki, K. & Shiu, S. H. Importance of lineage-
specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli.
Plant Physiol. 148, 993–1003 (2008).
75. Fahlgren, N. et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent
birth and death of MIRNA genes. PLoS One 2, e219 (2007).
76. Cloix, C. et al. Analysis of the 5S RNA pool in Arabidopsis thaliana: RNAs are heterogeneous
and only two of the genomic 5S loci produce mature 5S RNA. Genome Res. 12, 132–144 (2002).
77. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids
Res. 33, (Database Issue): D121–D124 (2005).
78. Okie, W. R. Handbook of peach and nectarine varieties: performance in the southeastern United
States and index of names. (U.S. Dept. of Agriculture, Agricultural Research Service ; National
Technical Information Service, distributor, 1998).
79. Dettori, M. T., Quarta, R. & Verde, I. A peach linkage map integrating RFLPs, SSRs, RAPDs,
and morphological markers. Genome 44, 783–790 (2001).
80. Werner, D. J. & Creller, M. A. Genetic studies in peach: inheritance of sweet kernel and male
sterility. J. Am. Soc. Hortic. Sci. 122, 215–217 (1997).
81. Yoon, J. et al. Genetic diversity and ecogeographical phylogenetic relationships among peach and
nectarine cultivars based on simple sequence repeat (SSR) markers. J. Am. Soc. Hortic. Sci. 131,
513–521 (2006).
82. Cheng, Z. & Huang, H. SSR fingerprinting Chinese peach cultivars and landraces (Prunus
persica) and analysis of their genetic relationships. Scientia Horticulturae 120, 188–193 (2009).
83. Gore, M. A. et al. A First-Generation Haplotype Map of Maize. Science 326, 1115–1117 (2009).
84. Rubin, C.-J. et al. Whole-genome resequencing reveals loci under selection during chicken
domestication. Nature 464, 587–591 (2010).
85. Xu, X. et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying
agronomically important genes. Nature biotechnology (2011). at
<http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2050.html>
86. Huang, X. et al. A map of rice genome variation reveals the origin of cultivated rice. Nature 490,
497–501 (2012).
87. Quilot, B. et al. QTL analysis of quality traits in an advanced backcross between Prunus persica
cultivars and the wild relative species P. davidiana. Theor. Appl. Genet. 109, 884–897 (2004).
88. Cantín, C. M. et al. Chilling injury susceptibility in an intra-specific peach [Prunus persica (L.)
Batsch] progeny. Postharvest Biol. Technol. 58, 79–87 (2010).
89. Cao, J. et al. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat.
Genet. 43, 956–963 (2011).
90. Branca, A. et al. Whole-genome nucleotide diversity, recombination, and linkage disequilibrium
in the model legume Medicago truncatula. P. Natl. Acad. Sci. USA 108, E864–E870 (2011).
91. Lam, H. M. et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of
genetic diversity and selection. Nat. Genet. 42, 1053–1059 (2010).
92. Lijavetzky, D., Cabezas, J., Ibáñez, A., Rodríguez, V. & Martínez-Zapater, J. High throughput
SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing
approach and SNPlex technology. BMC Genomics 8, 424 (2007).
93. Micheletti, D. et al. Genetic diversity of the genus Malus and implications for linkage mapping
with SNPs. Tree Genet. Genomes 7, 857–868 (2011).
94. Scorza, R., Mehlenbacher, S. A. & Lightner, G. W. Inbreeding and coancestry of freestone peach
cultivars of the eastern United States and implications for peach germplasm improvement. J. Am.
Soc. Hortic. Sci. 110, 547–552 (1985).
95. Foulongne, M., Minier, J., Pascal, T. & Kervella, J. Genetic relationships between peach (Prunus
persica (L.) Batsch) and closely related wild species using SSR markers. in XI Eucarpia
Symposium on Fruit Breeding and Genetics 663 629–634 (2003). at
<http://www.actahort.org/books/663/663_112.htm>
96. Okie, W. R. & Rieger, M. Inheritance of Venation Pattern in Prunus ferganensis × persica
Hybrids. Acta Hort. 622, 261–263 (2003).
97. Claverie, M. et al. Location of independent root-knot nematode resistance genes in plum and
peach. Theor. Appl. Genet. 108, 765–773 (2004).
98. Yamamoto, T. et al. Characterization of Morphological Traits Based on a Genetic Linkage Map in
Peach. Breeding Sci. 51, 271–278 (2001).
99. Lu, Z. X., Reighard, G. L., Nyczepir, A. P., Beckman, T. G. & Ramming, D. W. Inheritance of
resistance to root-knot nematodes in peach rootstocks. Acta Hort. 465, 111–116 (1998).
100. Bliss, F. A. et al. An expanded genetic linkage map of Prunus based on an interspecific cross
between almond and peach. Genome 45, 520–529 (2002).
101. Cannon, S. B. et al. Diversity, distribution, and ancient taxonomic relationships within the TIR
and non-TIR NBS-LRR resistance gene subfamilies. J. Mol. Evol. 54, 548–562 (2002).
102. McHale, L., Tan, X., Koehl, P. & Michelmore, R. W. Plant NBS-LRR proteins: adaptable guards.
Genome Biol. 7, 212 (2006).
103. Xing, Y., Frei, U., Schejbel, B., Asp, T. & Lübberstedt, T. Nucleotide diversity and linkage
disequilibrium in 11 expressed resistance candidate genes in Lolium perenne. BMC Plant Biol. 7,
43 (2007).
104. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res.
19, 1639–1645 (2009).
105. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
106. Jung, S. et al. Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes
of evolution between Rosaceous subfamilies. BMC Genomics 13, 129 (2012).
107. Wang, Y., Georgi, L. L., Reighard, G. L., Scorza, R. & Abbott, A. G. Genetic mapping of the
evergrowing gene in peach [Prunus persica (L.) Batsch]. J. Hered. 93, 352–358 (2002).
108. Verde, I., Quarta, R., Cedrola, C. & Dettori, M. T. QTL Analysis of Agronomic Traits in a BC1
Peach Population. Acta Hort. 592, 291–297 (2002).
109. Jáuregui, B. Identification of molecular markers linked to agronomic characters in an interspecific
almond x peach progeny. Dissertation. University of Barcelona, Spain. (1998).
110. Scorza, R., Melnicenco, L., Dang, P. & Abbott, A. G. Testing a microsatellite marker for selection
of columnar growth habit in peach [Prunus persica (L.) Batsch]. Acta Hort. 592, 285–289 (2002).
111. Chaparro, J. X., Werner, D. J., O’Malley, D. & Sederoff, R. R. Targeted mapping and linkage
analysis of morphological isozyme, and RAPD markers in peach. Theor. Appl. Genet. 87, 805–815
(1994).
112. Joobeur, T. Construction of a molecular marker map and genetic analysis of agronomic characters
in Prunus. Dissertation. University of Lleida, Spain. (1998).
113. Dirlewanger, E. et al. Genetic linkage map of peach [Prunus persica (L.) Batsch] using
morphological and molecular markers. Theor. Appl. Genet. 97, 888–895 (1998).
114. Dirlewanger, E. et al. Development of a second-generation genetic linkage map for peach [Prunus
persica (L.) Batsch] and characterization of morphological traits affecting flower and fruit. Tree
Genet. Genomes 3, 1–13 (2006).
115. Ogundiwin, E. A. et al. A fruit quality gene map of Prunus. BMC Genomics 10, 587 (2009).
116. Etienne, C. et al. Candidate genes and QTLs for sugar and organic acid content in peach [Prunus
persica (L.) Batsch]. Theor. Appl. Genet. 105, 145–159 (2002).
117. Warburton, M. L., Becerra-Velasquez, V. L., Goffreda, J. C. & Bliss, F. A. Utility of RAPD
markers in identifying genetic linkages to genes of economic interest in peach. Theor. Appl.
Genet. 93, 920–925 (1996).
118. Abbott, A. G. et al. Construction of saturated linkage maps of peach crosses segregating for
characters controlling fruit quality, tree architecture and pest resistance. Acta Hort. 465, 41–50
(1998).
119. Dirlewanger, E. et al. Mapping QTLs controlling fruit quality in peach (Prunus persica (L.)
Batsch). Theor. Appl. Genet. 98, 18–31 (1999).
120. Boudehri, K. et al. Phenotypic and fine genetic characterization of the D locus controlling fruit
acidity in peach. BMC Plant Biol. 9, 59 (2009).
121. Foulongne, M., Pascal, T., Pfeiffer, F. & Kervella, J. QTLs for powdery mildew resistance in
peach x Prunus davidiana crosses: consistency across generations and environments. Mol.
Breeding 12, 33–50 (2003).
122. Decroocq, V. et al. Analogues of virus resistance genes map to QTLs for resistance to sharka
disease in Prunus davidiana. Molecular Genetics and Genomics 272, 680–689 (2005).
123. Marandel, G., Pascal, T., Candresse, T. & Decroocq, V. Quantitative resistance to Plum pox virus
in Prunus davidiana P1908 linked to components of the eukaryotic translation initiation complex.
Plant Pathology 58, 425–435 (2009).
124. Rubio, M., Pascal, T., Bachellez, A. & Lambert, P. Quantitative trait loci analysis of Plum pox
virus resistance in Prunus davidiana P1908: new insights on the organization of genomic
resistance regions. Tree Genet. Genomes 6, 291–304 (2010).
125. Viruel, M. A., Madur, D., Dirlewanger, E., Pascal, T. & Kervella, J. Mapping quantitative trait
loci controlling peach leaf curl resistance. Acta Hort. 465, 79–88 (1998).
126. Pascal, T., Pfeiffer, F. & Kervella, J. Powdery mildew resistance in the peach cultivar Pamirskij 5
is genetically linked with the Gr gene for leaf color. HortScience 45, 150–152 (2010).
127. Banks, J. A. et al. The Selaginella genome identifies genetic changes associated with the
evolution of vascular plants. Science 332, 960 (2011).
128. Afoufa-Bastien, D. et al. The Vitis vinifera sugar transporter gene family: phylogenetic overview
and macroarray expression profiling. BMC Plant Biol. 10, 245–266 (2010).
129. Büttner, M. The monosaccharide transporter(-like) gene family in Arabidopsis. FEBS Letters 581,
2318–2324 (2007).
130. Hill, W. G. & Weir, B. S. Variances and covariances of squared linkage disequilibria in finite
populations. Theor. Popul. Biol. 33, 54–78 (1988).
131. Lyons, E., Pedersen, B., Kane, J. & Freeling, M. The value of nonmodel genomes and an example
using SynMap within CoGe to dissect the hexaploidy that predates the rosids. Tropical Plant
Biol. 1, 181–190 (2008).
132. Schwartz, S. et al. Human–mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003).
133. Haas, B. J., Delcher, A. L., Wortman, J. R. & Salzberg, S. L. DAGchainer: a tool for mining
segmental genome duplications and synteny. Bioinformatics 20, 3643–3646 (2004).
II. SUPPLEMENTARY TABLES
Supplementary Table 1. Description of major genes and QTLs affecting morphological or
agronomic characters in peach.
Characters | LG(a) | Symbol(b) | Populations | References
Architectural and Morphological Traits
Evergrowing | G1 | Evg | ‘Empress op op dwarf’ x PI442380 | 107
Internode length | G1 | QTL | (P. ferganensis x IF7310828) BC1 | 108
Flower color | G1 | B | ‘Garfi’ x ‘Nemared’ | 109
Broomy (or pillar) growth habit | G2 | Br | Various progenies | 110
Double flower | G2 | Dl | NC174RL x PI | 111
Anther color (yellow/anthocyanic) | G3 | Ag | ‘Texas’ x ‘Earligold’ | 112
Plant height | G4 | QTL | ‘Venus’ x ‘BigTop’ | 88
Polycarpel | G3 | Pcp | ‘Padre’ x 54P455 | 100
Flower color | G3 | Fc | ‘Akame’ x ‘Jusetou’ | 98
Plant height (normal/dwarf) | G6 | Dw | ‘Akame’ x ‘Jusetou’ | 98
Leaf shape (narrow/wide) | G6 | Nl | ‘Akame’ x ‘Jusetou’ | 98
Male sterility | G6 | Ps | ‘Ferjalou Jalousia®’ x ‘Fantasia’ | 113,114
Leaf color (red/yellow) | G6-G8 | Gr | ‘Garfi’ x ‘Nemared’; ‘Akame’ x ‘Jusetou’ | 109,98
Leaf gland (reniform/globose/eglandular) | G7 | E | (P. ferganensis x IF7310828) BC1 | 79
Flower morphology | G8 | Sh | ‘Contender’ x Fla.92-2C; Pop-DG | 23,115
Phenological traits
Chilling and heat requirement, blooming date | G1 | QTLs | ‘Contender’ x Fla.92-2C | 23
Ripening time | G2 | QTLs | (P. ferganensis x IF7310828) BC1 | 108
Blooming date | G2 | QTL | ‘Contender’ x Fla.92-2C; ‘Summergrand’ x P1908 | 23,87
Blooming time, ripening time, fruit development period | G4 | QTLs | ‘Ferjalou Jalousia®’ x ‘Fantasia’; (P. ferganensis x IF7310828) BC1; ‘Venus’ x ‘BigTop’; ‘Summergrand’ x P1908 | 116,108,88,87
Chilling requirement, blooming date | G4 | QTL | ‘Contender’ x Fla.92-2C | 23
Chilling requirement, blooming date | G5 | QTLs | ‘Contender’ x Fla.92-2C | 23
Ripening time | G6 | QTLs | (P. ferganensis x IF7310828) BC1 | 108
Chilling requirement, blooming date | G6 | QTLs | ‘Contender’ x Fla.92-2C | 23
Chilling requirement, blooming date | G7 | QTLs | ‘Contender’ x Fla.92-2C | 23
Chilling requirement, heat requirement | G8 | QTLs | ‘Contender’ x Fla.92-2C | 23
Fruit Quality Traits
Flesh color (white/yellow) | G1 | Y | ‘Padre’ x 54P455 | 100,117
Fruit skin color, soluble-solids content | G2 | QTLs | (P. ferganensis x IF7310828) BC1 | 108
Fruit weight, fruit diameter, glucose content | G3 | QTLs | ‘Suncrest’ x ‘Bailey’ | 118
Flesh color around the stone | G3 | Cs | ‘Akame’ x ‘Jusetou’ | 98
Fruit weight, fruit diameter, glucose content | G3 | QTLs | ‘Suncrest’ x ‘Bailey’ | 118
Flesh adhesion (clingstone/freestone) | G4 | F | (P. ferganensis x IF7310828) BC1; ‘Akame’ x ‘Jusetou’ | 79,98
Soluble-solids content, fructose, glucose | G4 | QTLs | ‘Ferjalou Jalousia®’ x ‘Fantasia’; ‘Venus’ x ‘BigTop’; ‘Summergrand’ x P1908 | 88,87,114
Chilling injury traits | G4 | QTL | ‘Venus’ x ‘BigTop’ | 88
Fruit size | G4 | QTLs | ‘Venus’ x ‘BigTop’; ‘Summergrand’ x P1908; ‘Contender’ x ‘Ambra’ | 88,87,22
pH, titratable acidity | G4 | QTLs | ‘Venus’ x ‘BigTop’ | 88
Non-acid fruit | G5 | D | ‘Ferjalou Jalousia®’ x ‘Fantasia’ | 113,119,120
Sucrose, malate, titratable acidity, pH, sucrose | G5 | QTLs | ‘Ferjalou Jalousia®’ x ‘Fantasia’; ‘Summergrand’ x P1908 | 116,87
Skin hairiness (nectarine/peach) | G5 | G | ‘Ferjalou Jalousia®’ x ‘Fantasia’; ‘Padre’ x 54P455 | 113,119,100
Fruit size | G5 | QTLs | ‘Summergrand’ x P1908 | 87
Kernel taste (bitter/sweet) | G5 | Sk | ‘Padre’ x 54P455 | 100
Fruit skin color, soluble-solids content | G6 | QTLs | (P. ferganensis x IF7310828) BC1 | 108
Fruit shape (flat/round) | G6 | S* | ‘Ferjalou Jalousia®’ x ‘Fantasia’ | 113,119,114
Aborting fruit | G6 | Af | ‘Ferjalou Jalousia®’ x ‘Fantasia’ | 114
Fruit skin color | G6-G8 | Sc | ‘Akame’ x ‘Jusetou’ | 98
Quinase | G8 | QTL | ‘Ferjalou Jalousia®’ x ‘Fantasia’ | 116
Resistance to biotic stresses
Powdery mildew resistance | G1 | QTL | ‘Summergrand’ x P1908 | 121
PPV resistance | G1 | QTLs | ‘Summergrand’ x P1908; ‘Summergrand’ x P1908 F2; ‘Rubira’ x P1908 | 122,123,124
Root-knot nematode resistance | G2 | Mi(c) | P.2175 x GN22; ‘Akame’ x ‘Jusetou’; ‘Lovell’ x ‘Nemared’; ‘Garfi’ x ‘Nemared’; ‘Padre’ x 54P455 | 97,98,99,100,109
PPV resistance | G2 | QTLs | ‘Summergrand’ x P1908; ‘Rubira’ x P1908 | 122,124
Leaf curl resistance | G3 | QTL | ‘Summergrand’ x P1908 | 125
PPV resistance | G4 | QTLs | ‘Summergrand’ x P1908; ‘Summergrand’ x P1908 F2; ‘Rubira’ x P1908 | 122,124
PPV resistance | G5 | QTLs | ‘Summergrand’ x P1908 F2; ‘Rubira’ x P1908 | 123,124
Powdery mildew resistance | G6 | Vr2 | ‘Rubira’ x ‘Pamirskij 5’ | 126
Powdery mildew resistance | G6 | QTL | ‘Summergrand’ x P1908 | 121
Leaf curl resistance | G6 | QTL | ‘Summergrand’ x P1908 | 125
PPV resistance | G6 | QTLs | ‘Summergrand’ x P1908; ‘Summergrand’ x P1908 F2; ‘Rubira’ x P1908 | 122,124
Powdery mildew resistance | G7 | QTL | (P. ferganensis x IF7310828) BC1 | 108
PPV resistance | G7 | QTLs | ‘Summergrand’ x P1908; ‘Summergrand’ x P1908 F2; ‘Rubira’ x P1908 | 122,124
Powdery mildew resistance | G8 | QTL | ‘Summergrand’ x P1908; (P. ferganensis x IF7310828) BC1 | 121,108
(a) LG = linkage group; G6-G8 = genes located close to the translocation break point between the two linkage groups.
(b) QTLs are included if they have been consistently found (in at least two independent measurements) in the indicated populations.
(c) One or two genes of nematode resistance with different notations and one QTL have been described in linkage group
Supplementary Table 2. Genomic libraries included in the Prunus persica genome assembly and
their respective sequence coverage levels in the final release.
Library Type | Average Insert Size | Read Number | Assembled Sequence Coverage (X)
3 kb | 2,846 | 536,032 | 1.30
4 kb | 4,381 | 606,680 | 1.33
8 kb | 7,780 | 2,106,103 | 4.72
Fosmid | 35,293 | 419,424 | 1.00
BAC | 69,504 | 61,440 | 0.13
Total | | 3,729,679 | 8.47
Supplementary Table 3. Summary statistics of the output of the whole genome shotgun
assembly before screening, removal of organelles and contaminating scaffolds and chromosome-scale
pseudomolecule construction. The table shows total contigs and total assembled basepairs
for each set of scaffolds greater than the given size.
Size | Number | Contigs | Scaffold Size | Basepairs | % Non-gap Basepairs
5,000,000 | 17 | 1,666 | 176,102,666 | 174,636,412 | 99.17%
2,500,000 | 24 | 1,989 | 203,348,610 | 201,501,921 | 99.09%
1,000,000 | 32 | 2,158 | 217,132,966 | 215,113,642 | 99.07%
500,000 | 42 | 2,270 | 224,220,399 | 222,121,572 | 99.06%
250,000 | 47 | 2,300 | 225,886,151 | 223,773,946 | 99.06%
100,000 | 49 | 2,305 | 226,261,811 | 224,137,473 | 99.06%
50,000 | 55 | 2,338 | 226,686,784 | 224,362,012 | 98.97%
25,000 | 66 | 2,364 | 227,047,017 | 224,683,712 | 98.96%
10,000 | 125 | 2,534 | 227,866,900 | 225,421,677 | 98.93%
5,000 | 252 | 2,836 | 228,749,031 | 226,158,702 | 98.87%
2,500 | 373 | 3,078 | 229,222,103 | 226,551,638 | 98.83%
1,000 | 391 | 3,099 | 229,249,132 | 226,577,745 | 98.83%
0 | 454 | 3,162 | 229,286,110 | 226,614,723 | 98.83%
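The last column of Supplementary Table 3 can be recomputed from the two columns before it. The following is a minimal sketch, assuming that "% Non-gap Basepairs" is simply Basepairs divided by Scaffold Size; the function name is illustrative and does not come from the original analysis.

```python
def non_gap_percent(scaffold_bp: int, non_gap_bp: int) -> float:
    """Percentage of scaffold sequence made of called bases rather than gaps."""
    if scaffold_bp <= 0:
        raise ValueError("scaffold_bp must be positive")
    return 100.0 * non_gap_bp / scaffold_bp


# (minimum scaffold size, scaffold size, basepairs) from the first and last table rows
rows = [
    (5_000_000, 176_102_666, 174_636_412),
    (0, 229_286_110, 226_614_723),
]

for min_size, scaffold_bp, non_gap_bp in rows:
    pct = non_gap_percent(scaffold_bp, non_gap_bp)
    print(f"scaffolds > {min_size:,} bp: {pct:.2f}% non-gap")
```

Under that assumption the two rows give 99.17% and 98.83%, matching the tabulated values.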
Supplementary Table 4. Summary of the accuracy using the fosmid resources.
Category | Value
Fosmids aligned | 13
Total fosmid length | 460,427
Aligned bases | 442,732
Non-gap adjacent bp mismatches and insertions/deletions | 187 (0.04% error)
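The 0.04% figure in Supplementary Table 4 is consistent with dividing the mismatch and indel count by the number of aligned bases. A minimal sketch, assuming that is how the rate was derived (the function names are hypothetical):

```python
def error_rate_percent(mismatch_and_indel_bp: int, aligned_bp: int) -> float:
    """Mismatches plus indels as a percentage of aligned bases."""
    return 100.0 * mismatch_and_indel_bp / aligned_bp


def aligned_percent(aligned_bp: int, total_bp: int) -> float:
    """Share of the total fosmid length that aligned to the assembly."""
    return 100.0 * aligned_bp / total_bp


rate = error_rate_percent(187, 442_732)
covered = aligned_percent(442_732, 460_427)
print(f"error rate: {rate:.2f}%")
print(f"fosmid bases aligned: {covered:.1f}%")
```

With the table values this yields an error rate that rounds to 0.04%, with about 96.2% of the fosmid bases aligned.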
Supplementary Table 5. Peach v1.0 chromosome-scale final summary statistics.
Scaffold total 202
Contig total 2,730
Scaffold sequence total 227.3 Mb
Mapped scaffold sequence total 218.3 Mb (96%)
Contig sequence total 224.6 Mb (1.2% gap)
Mapped contig sequence total 215.9 Mb (96.1%)
Scaffold N/L50 4/26.8 Mb
Contig N/L50 294/214.2 kb
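The N/L50 entries in Supplementary Table 5 read as a count followed by a length (for example, 4 scaffolds and 26.8 Mb). The sketch below assumes the convention in which N50 is the smallest number of sequences whose summed length reaches half of the total, and L50 is the length of the shortest sequence in that set. The function name and the toy lengths are hypothetical and are not peach data.

```python
def n_l50(lengths):
    """Return (N50, L50) under the count/length convention described above."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return count, length
    raise ValueError("lengths must contain at least one sequence")


# Toy example: total = 150, half = 75; 50 + 40 = 90 reaches it with two sequences.
toy_lengths = [10, 50, 30, 40, 20]
print(n_l50(toy_lengths))
```

For the toy input the function returns `(2, 40)`: two sequences are needed, and the shorter of them is 40 units long.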
Supplementary Table 6. Mapping, position and orientation along the Peach v1.0 pseudomolecules
of 10 unmapped scaffolds of the Peach v1.0 assembly.
Scaffold | Scaffold start (nt) | Scaffold end (nt) | PChr(a) | Pseudomolecule start (nt) | Orientation(c)
Scaffold_9 | 1 | 2,126,789 | Pp3 | 9,632,255 | -
Scaffold_10 | 1 | 851,981 | Pp3 | 1 | +
Scaffold_11 | 1 | 736,058 | Pp8 | 7,209,409 | -
Scaffold_12 | 1 | 675,284 | Pp2 | 10,630,597 | +
Scaffold_13 | 1 | 670,721 | Pp3 | 9,632,256(b) | 0
Scaffold_14 | 1 | 575,512 | Pp3 | 9,632,256(b) | 0
Scaffold_15 | 1 | 516,056 | Pp1 | 13,139,480 | +
Scaffold_16 | 1 | 390,024 | Pp2 | 10,620,598 | 0
Scaffold_17 | 1 | 370,749 | Pp6 | 14,804,932 | 0
Scaffold_18 | 1 | 333,953 | Pp6 | 14,794,933 | 0
a PChr = pseudomolecule
b Scaffold_13 and Scaffold_14 cannot be ordered along the pseudomolecule 3
c + = positive orientation, - = negative orientation (scaffold needs to be flipped), 0= random orientation
Supplementary Table 7. Orientation of randomly oriented scaffolds within Peach v1.0 assembly.
PChr(a) | Start (nt) | End (nt) | Mapped Scaffold | Size (bp) | Orientation(c)
Pp1 | 4,413,052 | 5,461,473 | Sc27 | 1,048,421 | -
Pp1 | 9,224,044 | 10,010,827 | Sc29 | 786,784 | +
Pp1 | 41,964,842 | 46,877,626 | Sc13 | 4,912,785 | -
Pp2 | 8,883,668 | 10,620,597 | Sc23(b) | 1,736,930 | -
Pp3 | 9,632,256 | 11,470,863 | Sc450 | 1,453,542 | +
Pp4 | 5,364,815 | 6,148,865 | Sc30 | 784,051 | -
Pp4 | 29,824,511 | 30,528,727 | Sc33 | 704,217 | +
Pp5 | 7,342,802 | 9,236,262 | Sc22 | 1,893,461 | +
Pp6 | 1 | 1,312,528 | Sc26 | 1,312,528 | -
Pp7 | 1 | 4,834,826 | Sc14 | 4,834,826 | -
Pp8 | 7,219,410 | 10,153,270 | Sc18 | 2,933,861 | -
aPChr = pseudomolecule
b The position of Sc_23 along the Pp2 will be refined in a future assembly and it will be placed at the beginning of
the pseudomolecule.
c + = positive orientation, - = negative orientation (scaffold needs to be flipped), 0= random orientation
Supplementary Table 8. Identification of misassembled pieces of the Peach v1.0 assembly and their
refined position along the Peach v1.0 pseudomolecules.
Peach v1.0 PChr(a) | Peach v1.0 start (nt) | Peach v1.0 end (nt) | Refined PChr(a) | Refined start (nt) | Broken scaffold | Size (Mb) | Orientation(b)
Pp3 | 11,085,798 | 11,470,863 | Pp7 | 8,448,496 | Sc450_1 | 0.4 | +
Pp4 | 14,985,648 | 16,511,758 | Pp3 | 9,622,256 | Sc4_1 | 1.5 | -
Pp4 | 16,511,759 | 16,912,335 | Pp1 | 5,461,474 | Sc4_2 | 0.4 | +
Pp4 | 20,720,501 | 22,406,691 | Pp2 | 8,883,668 | Sc10_1 | 1.7 | -
Pp4 | 28,708,822 | 29,814,510 | Pp6 | 14,794,933 | Sc7_1 | 1.1 | -
Pp7 | 11,265,533 | 12,080,203 | Pp2 | 15,232,751 | Sc451_1 | 0.8 | +
a PChr = pseudomolecule
b + = positive orientation, - = negative orientation (scaffold needs to be flipped), 0=random orientation
Supplementary Table 9. Linkage maps used for anchoring, ordering, orienting and checking the
WGS scaffolds.
Linkage map | N. of anchoring markers (N. of anchored scaffolds) | Anchored ungapped sequence, Mb (%) | Orientated ungapped sequence, Mb (%)
‘Texas’ x ‘Earligold’ (T x E) (ref. 18) | 909* (56) | 223.1 (99.3) | 204.5 (91)
‘Peach’ x ‘Ferganensis’ (P x F) (ref. 21) | 1321 (48) | 207.7 (92.5) | 190.9 (85)
‘Contender’ x ‘Ambra’ (C x A) (ref. 22) | 205 (45) | 198.1 (88.2) | 179.7 (80)
‘Contender’ x Fla 92-2C (C x Fla) (ref. 23) | 69 (26) | 179.3 (79.8) | 126.9 (56.5)
* Includes 564 BIN-mapped markers
Supplementary Table 10. Final summary assembly statistics for the upcoming
refined chromosome-scale release.
Scaffold total 192
Contig total 2,730
Scaffold sequence total 227.4 Mb
Mapped scaffold sequence total 225.7 Mb (99.3%)
Contig sequence total 224.6 Mb (1.2% gap)
Mapped contig sequence total 223.1 Mb (99.3%)
Scaffold N/L50 4/27.4 Mb
Contig N/L50 294/214.2 kb
Supplementary Table 11. Statistics and comparison of the peach assembly to other plant genomes.
Genome | Sequence Coverage | Assembled scaffold sequence (Mb) | Portion Mapped (Mb) | % Mapped | Scaffold N50 | Scaffold L50 (Mb) | Contig N50 | Contig L50 (kb)
Peach (Prunus persica) | 8.47x | 227.4 | 225.7 | 99.3 | 4 | 27.4 | 294 | 214.2
Apple (Malus x domestica) (ref. 57) | 16.9x | 598.3 | 528.3 | 88.3 | 80 | 2.0 | 16171 | 13.4
Arabidopsis thaliana (ref. 29) (a) | -- | 119.7 | 119.7 | 100 | 3 | 23.5 | -- | --
Rice (Oryza sativa) (ref. 30) (b) | -- | 382.2 | 382.2 | 100 | 6 | 30.8 | -- | --
Soybean (Glycine max) (ref. 27) (c) | 8.04x | 955.1 | 937.3 | 98.1 | 10 | 47.8 | 1492 | 189.4
Poplar (Populus trichocarpa) (c) | 7.45x | 403.8 | 370.4 | 91.7 | 9 | 18.8 | 448 | 242.2
Grape (Vitis vinifera) (c) | 8.4x | 467.5 | 290.2 | 62.1 | 14 | 13.9 | 2012 | 66.4
Papaya (Carica papaya) (c) | <3x | 271.7 | 235.0 | 86.5 | 74 | 1.3 | 7109 | 10.6
Brachypodium distachyon (ref. 31) | 9.4x | 271.9 | 271.1 | 99.7 | 3 | 59.3 | 252 | 347.8
Sorghum bicolor (c) | 8.5x | 697.6 | 625.6 | 89.7 | 6 | 62.4 | 958 | 195.4
Selaginella moellendorffii (ref. 127) (d) | 7x | 212.6 | -- | -- | 38 | 1.7 | 515 | 119.8
Physcomitrella patens (c, d) | 8.92x | 466.7 | -- | -- | 86 | 1.7 | 369 | 291.8
a The Arabidopsis assembly, obtained using a BAC-by-BAC approach, represents the gold standard for plant genomes. Statistics were calculated
from the TAIR10 release (http://www.ncbi.nlm.nih.gov/mapview/stats/BuildStats.cgi?taxid=3702&build=9&ver=2).
b The rice assembly, obtained using a BAC-by-BAC approach, represents the gold standard for plant genomes. Statistics were calculated from
IRGSP Releases Build 4.0 (http://rgp.dna.affrc.go.jp/IRGSP/Build4/build4.html).
c Data retrieved from Schmutz et al. (ref. 27); they recalculated the original statistics to better match chromosome-scale assemblies.
d Selaginella and Physcomitrella genomes are unmapped releases.
Supplementary Table 12. Analysis of repetitive sequences within Peach v1.0 assembly.
Transposable elements include Class I and Class II elements. Only the main contributing superfamilies
are reported. A non-negligible portion of the genome (7.54%) was identified as composed of repetitive
sequences that could not be classified as transposable elements and remained structurally unidentified.
Repeat class | Mb | Genome
Total | 84.41 | 37.14%
Transposable elements | 67.26 | 29.60%
  Class I: Retroelements | 46.70 | 20.55%
    LTR-retrotransposons | 44.45 | 19.56%
      Ty1/copia | 19.54 | 8.60%
      Ty3/gypsy | 22.65 | 9.97%
      Unclassified | 2.27 | 1.00%
    LINE | 1.44 | 0.63%
    Unclassified Class I | 0.80 | 0.35%
  Class II: DNA transposons | 20.56 | 9.05%
    DNA transposons (except helitrons) | 20.02 | 8.81%
    Helitrons | 0.54 | 0.24%
Unclassified repeats | 17.14 | 7.54%
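The Mb values in Supplementary Table 12 can be checked for internal consistency. The sketch below assumes a parent/child nesting of the categories (for example, that Class I is the sum of LTR-retrotransposons, LINE and unclassified Class I elements); this nesting is inferred from the row order and is not stated explicitly in the table. The dictionary layout and the function name are illustrative only.

```python
# Mb values copied from the table; the nesting of categories is an assumption.
HIERARCHY = {
    # parent: (parent Mb, [child Mb values])
    "Total": (84.41, [67.26, 17.14]),                       # TEs + unclassified repeats
    "Transposable elements": (67.26, [46.70, 20.56]),       # Class I + Class II
    "Class I": (46.70, [44.45, 1.44, 0.80]),                # LTR + LINE + unclassified Class I
    "LTR-retrotransposons": (44.45, [19.54, 22.65, 2.27]),  # copia + gypsy + unclassified
    "Class II": (20.56, [20.02, 0.54]),                     # non-helitron + helitron
}


def discrepancies(hierarchy):
    """Map each parent category to |parent - sum(children)| in Mb."""
    return {
        name: abs(parent - sum(children))
        for name, (parent, children) in hierarchy.items()
    }


for name, gap in discrepancies(HIERARCHY).items():
    print(f"{name}: children differ from parent by {gap:.2f} Mb")
```

Under the assumed nesting, every parent agrees with the sum of its children to within 0.02 Mb, which is what rounding each entry to two decimals would allow.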
Supplementary Table 13. tRNA genes and pseudogenes predicted in the Peach v1.0 assembly.
codon_AA # codon_AA # codon_AA # codon_AA #
AAC_Asn 15 GAC_Asp 24 CAC_His 11 TAC_Tyr 10
AAT_Asn 0 GAT_Asp 0 CAT_His 0 TAT_Tyr 0
AAA_Lys 14 GAA_Glu 10 CAA_Gln 7
AAG_Lys 17 GAG_Glu 14 CAG_Gln 6
ACC_Thr 0 GCC_Ala 0 CCC_Pro 0 TCC_Ser 0
ACT_Thr 7 GCA_Ala 36 CCT_Pro 12 TCT_Ser 11
ACA_Thr 9 GCG_Ala 4 CCA_Pro 15 TCA_Ser 9
ACG_Thr 2 GCT_Ala 15 CCG_Pro 2 TCG_Ser 3
AGC_Ser 10 GGC_Gly 16 CGC_Arg 0 TGC_Cys 10
AGT_Ser 0 GGT_Gly 0 CGT_Arg 6 TGT_Cys 0
AGA_Arg 7 GGA_Gly 11 CGA_Arg 6 TGG_Trp 9
AGG_Arg 5 GGG_Gly 6 CGG_Arg 3
ATC_Ile 0 GTC_Val 1 CTC_Leu 0 TTC_Phe 17
ATT_Ile 20 GTT_Val 13 CTT_Leu 10 TTT_Phe 0
ATA_Ile 4 GTA_Val 5 CTA_Leu 8 TTA_Leu 6
ATG_Met 26 GTG_Val 7 CTG_Leu 4 TTG_Leu 11
???_Undet* 9
AAG_Pseudo 1
ACA_Pseudo 1
AAC_Pseudo 1
AAA_Pseudo 1
ATG_Pseudo 1
GAC_Pseudo 3
GCA_Pseudo 6
TAT_Pseudo 1
TGC_Pseudo 1
* tRNA and tRNA pseudogenes of undetermined codon specificity
Supplementary Table 14. ncRNA loci predicted in the Peach v1.0 assembly.
ncRNA #Loci ncRNA #Loci ncRNA #Loci ncRNA #Loci
U1 18 snoR100 1 snoR32_R81 1 snoU35 5
U2 29 snoR101 10 snoR35 1 snoU36a 2
U3 16 snoR103 2 snoR38 4 snoU61 7
U4 7 snoR104 1 snoR4 6 snoU83 1
U5 21 snoR109 4 snoR41 1 snoZ101 3
U6 27 snoR11 9 snoR43 1 snoZ102_R77 2
U6atac 2 snoR111 1 snoR44_J54 2 snoZ103 5
U11 2 snoR113 1 snoR4a 4 snoZ105 2
U12 1 snoR114 1 snoR53Y 8 snoZ107_R87 3
U54 1 snoR116 1 snoR60 4 snoZ112 1
snR50 4 snoR117 1 snoR64 1 snoZ122 2
SNORD100 1 snoR118 3 snoR64a 3 snoZ13_snr52 5
SNORD14 9 snoR12 2 snoR66 1 snoZ152 7
SNORD15 4 snoR127 1 snoR69Y 3 snoZ155 1
SNORD18 3 snoR128 2 snoR71 181 snoZ157 8
SNORD24 1 snoR134 2 snoR72 2 snoZ159 4
SNORD25 4 snoR135 1 snoR72Y 1 snoZ161_228 1
SNORD27 17 snoR137 4 snoR74 3 snoZ162 1
SNORD33 12 snoR14 3 snoR77 2 snoZ165 1
SNORD34 7 snoR143 1 snoR77Y 1 snoZ178 21
SNORD36 2 snoR16 3 snoR79 3 snoZ185 3
SNORD43 2 snoR160 4 snoR80 2 snoZ196 4
SNORD46 5 snoR17 17 snoR83 3 snoZ199 2
SNORD59 11 snoR2 1 snoR86 1 snoZ223 4
SNORD61 1 snoR20a 1 snoR8a 2 snoZ247 32
SNORD70 20 snoR24 9 snoR97 1 snoZ266 2
SNORD96 2 snoR26 1 snoR98 2 snoZ267 1
snoF1_F2 1 snoR27 4 snoR99 1 snoZ278 5
snoJ33 9 snoR28 4 snoU18 1 snoZ279_R105_R108 2
snoMe28S-Am2634 2 snoR30 1 snoU19 2 snoZ43 2
snoMe28S-Am982 2 snoR31 4 snoU30 1 snoZ5 7
snoR1 2 snoR31_Z110_Z27 1 snoU31b 6 snosnR60_Z15 2
SRP_euk_arch 17
RNase_MRP 3
Supplementary Table 15. Tissues and genotypes used as sources of RNA for cDNA preparation and
reads and sequences (Gb) obtained.
Tissue (genotype) | Number of reads (millions) | Total Gb
Embryos and cotyledons (cv. Flavorcrest) | 39 | 2.2
Fruit bulk (cv. Imera, endo-, meso-, epicarp stages S1-2-3-4) | 42 | 2.4
Expanded leaves (cv. Lovell) | 42 | 2.2
Roots (op cv. Yumyeong) | 24 | 1.4
Pollen (cv. Orion) | 39 | 2.9
Apical leaves (cv. Earligold) | 43 | 3.2
Supplementary Table 16. Manually annotated genes. Genes expressed in fruit and gene models modified by experts are reported.
Gene family   General function   Total gene models examined   Gene models expressed in fruit   Gene models modified
Cell wall invertase (CWI) Sugar Metabolism 6 3 5
Vacuolar invertase (VI) Sugar Metabolism 2 2 1
Neutral invertase (NI) Sugar Metabolism 8 7 5
Sucrose synthase (SuSy) Sugar Metabolism 6 6 3
Sucrose phosphate synthase (SPS) Sugar Metabolism 4 4 2
Aldose-6-phosphate reductase (A6PR) Sugar Metabolism 2 1 0
Sorbitol dehydrogenase (SDH) Sugar Metabolism 7 4 3
Sorbitol transporter (SOT) Sugar Transport 10 3 2
Hexose transporter (HT) Sugar Transport 29 13 10
Sucrose transporter (SUT) Sugar Transport 3 2 3
Phytoene synthase (PSY) ABA/carotenoid Biosynthesis 3 2 3
Phytoene desaturase (PDS) ABA/carotenoid Biosynthesis 3 1 2
ζ-carotene desaturase (ZDS) ABA/carotenoid Biosynthesis 1 1 1
Lycopene ß-cyclase (LYBC) ABA/carotenoid Biosynthesis 3 3 3
ß-carotene hydroxylase (BCH) ABA/carotenoid Biosynthesis 1 1 1
Zeaxanthin epoxidase (ZEP) ABA/carotenoid Biosynthesis 3 2 2
9-cis-epoxycarotenoid dioxygenase (NCED) ABA/carotenoid Biosynthesis 3 3 2
Short-chain dehydrogenase reductase (SDR) ABA/carotenoid Biosynthesis 1 1 1
Abscisic aldehyde oxidase (AAO) ABA/carotenoid Biosynthesis 1 1 1
Cellulose synthase (CESA) Cellulose Biosynthesis 6 5 0
Cobra (COB) Cellulose Biosynthesis 2 2 1
Irregular xylem 3 (IRX3) Cellulose Biosynthesis 1 1 0
Irregular xylem 6 (IRX6) Cellulose Biosynthesis 1 1 0
α-Galacturonosyltransferases (GAUT) Pectin Biosynthesis 10 10 2
α-Galacturonosyltransferases like (GTLs) Pectin Biosynthesis 3 3 0
Rhamnogalacturonan xylosyltransferase 1 (RGXT1) Pectin Biosynthesis 1 1 0
Arabinan deficient 1 (ARAD1) Pectin Biosynthesis 1 1 0
β (1,4) glucan synthase (CSLC) Xyloglucan Biosynthesis 5 5 0
Xyloglucan xylosyltransferase 5 (XXT5) Xyloglucan Biosynthesis 1 1 0
Xyloglucan galactosyltransferase (MUR3) Xyloglucan Biosynthesis 1 1 0
Xyloglucan fucosyltransferase (MUR2) Xyloglucan Biosynthesis 1 1 0
β (1,4) xylan synthase (GUT1/2) Glucuronoxylan Biosynthesis 2 2 0
Rhamnosyl (1,3)-xylosyltransferase (FRA8) Glucuronoxylan Biosynthesis 1 1 0
Irregular xylem 8/Galacturonosyltransferase 12 (IRX8/GAUT12) Glucuronoxylan Biosynthesis 1 1 0
Galacturonosyltransferase-like 1 (PARVUS/GLZ1) Glucuronoxylan Biosynthesis 1 1 0
Glucuronosyltransferase 2 (GUX2) Glucuronoxylan Biosynthesis 1 1 0
Mannan synthase (CSLA) Mannan Biosynthesis 1 1 0
Mannan galactosyltransferases Galactomannan Biosynthesis 2 1 0
Pectinacetylesterase (PAE) Pectin Degradation 12 5 0
Polygalacturonase inhibitor (PGIP) Pectin Degradation 2 2 0
Polygalacturonase/polygalacturonase-like (PG/PG-like) Pectin Degradation 23 18 4
Pectin methyl esterase (PME) Pectin Degradation 16 7 6
Pectate lyase (PL) Pectin Degradation 15 11 4
Alcohol acyl transferase (AAT) Ester Metabolism 1 1 0
Alcohol dehydrogenase (ADH) Ester, Aldehyde, Alcohol Metabolism 23 19 2
Aldehyde dehydrogenase (ALDH) Ester, Aldehyde, Alcohol Metabolism 10 9 0
Carotenoid cleavage dioxygenase (CCD) Ionone Metabolism 3 1 0
Epoxide hydrolase (EH) Lactone Metabolism 12 10 3
Eugenol-synthase (EGS) Phenylpropanoid Metabolism 4 0
HAD-superfamily hydrolase-epoxide hydrolase (EH) Lactone Metabolism 4 4 0
Linalool-synthase (LIS) Monoterpenoid Metabolism 5 1 2
Lipoxygenase (LOX) Ester, Aldehyde, Alcohol Metabolism 15 13 4
Phenylacetaldehyde synthase (PAAS) Benzenoid Metabolism 3 2 0
Phloroglucinol O-methyltransferase (POMT) Phenylpropanoid Metabolism 1 1 0
S-adenosylmethionine-dependent methyltransferase (SAMT) Phenylpropanoid Metabolism 14 4 1
Type 1 MADS-box genes Transcription Factor 40 8 32
Type 2 MADS-box genes Transcription Factor 39 27 32
NPR1 (nonexpresser of PR gene 1) Pathogen Related 1 3 0
NPR3 (NPR1-like protein 3) Pathogen Related 3 1 0
PR1 (Pathogenesis-Related gene 1) Pathogen Related 22 10 0
PR2 (beta-1,3-glucanase) Pathogen Related 1 0 0
PR3 (class I chitinase) Pathogen Related 3 1 0
PR4 (chitin binding) Pathogen Related 4 2 0
PR5 (thaumatin) Pathogen Related 8 0 8
PR6 (proteinase_inhibitor) Pathogen Related 2 1 1
PR8 (endochitinase) Pathogen Related 0 0 0
PR10 (RNase) Pathogen Related 28 14 1
Allene oxide synthase (AOS) Jasmonic Acid Biosynthesis 3 1 1
Allene oxide cyclase (AOC) Jasmonic Acid Biosynthesis 4 2 3
12-oxophytodienoate reductase (OPR); Oxo-phytodienoic acid reductase 1 Jasmonic Acid Biosynthesis 11 6 2
Indole-3-acetic acid-amido synthetase GH3 Salicylic Acid Biosynthesis 0 0 0
5-enolpyruvylshikimate 3-phosphate synthase1 (EPSP) Salicylic Acid Biosynthesis 2 1 0
Catalase (CAT) ROS Related 4 2 2
Nitric-oxide synthase (NOS) ROS Related 2 1 1
Superoxide dismutase (SOD) ROS Related 4 2 2
1-aminocyclopropane-1-carboxylate oxidase (ACO) Ethylene biosynthesis 4 2 0
1-aminocyclopropane-1-carboxylate synthase (ACS) Ethylene biosynthesis 3 1 2
S-adenosylmethionine synthetase (SAM) Ethylene biosynthesis 12 6 9
Aspartate aminotransferase (ASP) Ethylene biosynthesis 4 2 4
Cold_acclimation (WCOR413-like_protein) Abiotic Stress Related 4 2 0
Heat shock 70 kDa protein cognate (HSP70) Abiotic Stress Related 18 9 4
Photolyase (PHR) Abiotic Stress Related 2 1 0
Temperature-induced lipocalin (TIL) Abiotic Stress Related 2 1 0
NADP-dependent isocitrate dehydrogenase (ICDH) Ripening Related 10 5 2
Pyruvate decarboxylase family protein (PDC) Ripening Related 2 1 0
Antiauxin-resistant 3 (AAR3) Auxin Related 2 1 1
Arabidopsis RAC like protein (ARAC1) Auxin Related 2 1 0
Auxin resistant 1 (AXR1) Auxin Related 2 1 0
Auxin response factor 16, putative (ARF16) Auxin Related 2 1 2
Auxin response factor 17, putative (ARF17) Auxin Related 2 1 0
Histone acetyltransferase 2 (HAT2) Auxin Related 1 0 1
Indole-3-acetic acid inducible 9 (IAA9); transcription factor, putative Auxin Related 2 1 2
Indole-3-acetic acid-amido synthetase GH3.6 (DFL1) Auxin Related 1 0 0
Molybdopterin biosynthesis protein, putative (B73) Auxin Related 2 1 0
MYB family transcription factor * Auxin Related 2 1 2
NAC domain protein 1 (NAC1)* Auxin Related 2 1 0
Nucleoside diphosphate kinase 2 (NDPK2) Auxin Related 2 1 2
PID serine/threonine protein kinase Auxin Related 2 1 1
Salt tolerance during germination 1 (STG1) Auxin Related 2 1 0
Sirtinol resistant 3 (SIR3) Auxin Related 2 1 0
Tetratricopeptide-repeat thioredoxin-like 3 (TTL3) Auxin Related 2 1 1
VH1-interacting kinase (VIK) Auxin Related 2 1 0
Arabidopsis histidine-containing phosphotransmitter 1 (AHP1) Cytokinin Related 3 1 0
Arabidopsis histidine-containing phosphotransmitter 4 (AHP4) Cytokinin Related 4 0 0
Arabidopsis histidine-containing phosphotransmitter 6 (AHP6) Cytokinin Related 1 0 0
Arabidopsis response regulator 1 (ARR1); sensor histidine kinase, putative Cytokinin Related 2 1 1
Arabidopsis response regulator 2 (ARR2); two-component sensor histidine kinase bacteria, putative Cytokinin Related 2 1 1
Arabidopsis response regulator 3 (ARR3) Cytokinin Related 2 1 1
Arabidopsis response regulator 6 (ARR6); two-component sensor protein histidine protein kinase, putative Cytokinin Related 6 1 1
Arabidopsis response regulator 7 (ARR7) Cytokinin Related 2 1 2
Arabidopsis response regulator 8 (ARR8); type-a response regulator Cytokinin Related 2 1 1
Arabidopsis response regulator 9 (ARR9); type-a response regulator Cytokinin Related 2 1 0
Arabidopsis response regulator 12 (ARR12) Cytokinin Related 2 1 0
Arabidopsis response regulator 14 (ARR14); putative B-type response regulator 14 Cytokinin Related 2 1 1
Arabidopsis response regulator 17 (ARR17); type-a response regulator Cytokinin Related 0 0 0
Arabidopsis response regulator 19 (ARR19) Cytokinin Related 2 1 1
Arabidopsis response regulator 21 (ARR21); two-component system sensor histidine kinase/response regulator, putative Cytokinin Related 1 0 1
Arabidopsis response regulator 24 (ARR24); sensory transduction histidine kinase bacterial, putative Cytokinin Related 2 1 0
Brevis radix (BRX) Cytokinin Related 2 1 0
Isopentenyltransferase 9 (IPT9) Cytokinin Related 2 1 0
putative SPINDLY protein (SPY) Cytokinin Related 2 1 0
Phenylalanine ammonia-lyase (PAL) Phenylpropanoid Metabolism 2 2 0
Trans-cinnamate 4-monooxygenase (C4H) Phenylpropanoid Metabolism 2 2 1
4-coumarate-CoA ligase (4CL) Phenylpropanoid Metabolism 3 3 3
Monooxygenase/ p-coumarate 3-hydroxylase (C3H) Cell Wall Biosynthesis (lignin) 5 4 2
Hydroxycinnamoyl-CoA shikimate/quinate hydroxyl-cinnamoyltransferase (HCT/HQT) Cell Wall Biosynthesis (lignin) 11 7 4
Cinnamoyl-CoA reductase (CCR) Cell Wall Biosynthesis (lignin) 2 2 2
Cinnamyl-alcohol dehydrogenase (CAD) Cell Wall Biosynthesis (lignin) 2 2 1
Ferulate 5-hydroxylase/ monooxygenase (F5H) Cell Wall Biosynthesis (lignin) 2 1 0
Caffeic acid/5-hydroxyferulic acid O-methyltransferase (COMT) Cell Wall Biosynthesis (lignin) 2 2 0
Caffeoyl-CoA O-methyltransferase (CCoAOMT) Cell Wall Biosynthesis (lignin) 4 1 0
Naringenin-chalcone synthase (CHS) Flavonoid Biosynthesis 3 3 1
Chalcone isomerase (CH) Flavonoid Biosynthesis 1 1 0
Naringenin 3-dioxygenase (F3H) Flavonoid Biosynthesis 1 1 0
Flavonoid 3'-monooxygenase (F3'H) Flavonoid Biosynthesis 1 1 0
Dihydrokaempferol 4-reductase (DFR) Flavonoid Biosynthesis 1 1 0
Anthocyanidin reductase (ANR) Flavonoid Biosynthesis 1 1 0
Leucoanthocyanidin 4- reductase (LAR) Flavonoid Biosynthesis 2 1 0
Leucocyanidin oxygenase (ANS) Flavonoid Biosynthesis 1 1 0
Flavonol synthase (FLS) Flavonoid Biosynthesis 4 0 1
Flavonol 3-O-glucosyltransferase (UFGT) Flavonoid Biosynthesis 6 3 1
Total 672 389 (57.9%) 223 (33.2%)
*Genes from larger families selected for annotation based on putative function
Supplementary Table 17. Comparison of manually annotated gene families among seven sequenced plant genomes. Pp = Prunus persica; Md =
Malus x domestica; Fv = Fragaria vesca; Vv = Vitis vinifera; Pt = Populus trichocarpa; Cp = Carica papaya; At = Arabidopsis thaliana
Gene family General function Pp Md Fv Vv Pt Cp At
Cell wall invertase (CWI) Sugar Metabolism 6 3 3 2 4 3 6
Vacuolar invertase (VI) Sugar Metabolism 2 2 2 2 2 3 2
Neutral invertase (NI) Sugar Metabolism 8 15 2 9 11 5 9
Sucrose synthase (SuSy) Sugar Metabolism 6 15 5 5 7 5 6
Sucrose phosphate synthase (SPS) Sugar Metabolism 4 7 4 4 5 3 4
Aldose-6-phosphate reductase (A6PR) Sugar Metabolism 2 11 1 1 1 1 2
Sorbitol dehydrogenase (SDH) Sugar Metabolism 7 6 2 3 2 2 3
Sorbitol transporter (SOT) Sugar Transport 10 20 3 5 4 6 6
Hexose transporter (HT) Sugar Transport 29 27 24 26 32 16 24
Sucrose transporter Sugar Transport 3 5 7 4 5 3 9
Phytoene synthase (PSY) ABA/Carotenoid Biosynthesis 3 5 3 3 5 2 1
Phytoene desaturase (PDS) ABA/Carotenoid Biosynthesis 3 1 1 2 2 1 1
z-carotene desaturase (ZDS) ABA/Carotenoid Biosynthesis 1 2 1 1 2 1 1
Lycopene ß-cyclase (LYBC) ABA/Carotenoid Biosynthesis 3 2 1 1 2 1 1
ß-carotene hydroxylase (BCH) ABA/Carotenoid Biosynthesis 1 4 2 2 2 1 2
Zeaxanthin epoxidase (ZEP) ABA/Carotenoid Biosynthesis 3 3 2 2 3 1 1
9-cis-epoxycarotenoid dioxygenase (NCED) ABA/Carotenoid Biosynthesis 3 4 3 4 5 2 5
Short-chain dehydrogenase reductase (SDR) ABA/Carotenoid Biosynthesis 1 2 1 1 1 1 1
Abscisic aldehyde oxidase (AAO) ABA/Carotenoid Biosynthesis 1 0 1 2 3 1 4
Cellulose synthase (CESA) Cellulose Biosynthesis 6 23 7 9 25 8 10
Cobra (COB) Cellulose Biosynthesis 2 10 5 3 4 2 3
Irregular xylem 3 (IRX3) Cellulose Biosynthesis 1 6 1 1 3 1 1
Irregular xylem 6 (IRX6) Cellulose Biosynthesis 1 7 3 3 3 2 1
-Galacturonosyltransferases (GAUT) Pectin Biosynthesis 10 23 8 8 15 9 11
-Galacturonosyltransferases Like (GTLs) Pectin Biosynthesis 3 8 2 5 8 2 9
Rhamnogalacturonan xylosyltransferase 1 (RGXT1) Pectin Biosynthesis 1 2 1 2 2 1 5
Arabinan deficient 1 (ARAD1) Pectin Biosynthesis 1 4 1 0 2 0 0
β (1,4) glucan synthase (CSLC) Xyloglucan Biosynthesis 5 13 6 5 8 3 6
Xyloglucan xylosyltransferase 5 (XXT5) Xyloglucan Biosynthesis 1 2 1 0 2 2 2
Xyloglucan galactosyltransferase (MUR3) Xyloglucan Biosynthesis 1 3 1 0 1 0 1
Xyloglucan fucosyltransferase (MUR2) Xyloglucan Biosynthesis 1 1 1 0 0 1 0
(1,4) xylan synthase (GUT1/2) Glucuronoxylan Biosynthesis 2 4 2 2 5 2 2
Rhamnosyl (1,3)-xylosyltransferase (FRA8) Glucuronoxylan Biosynthesis 1 0 1 0 1 1 0
Irregular xylem 8/Galacturonosyltransferase 12 (IRX8/GAUT12) Glucuronoxylan Biosynthesis 1 3 1 1 2 1 1
Galacturonosyltransferase –like 1 (PARVUS/GLZ1) Glucuronoxylan Biosynthesis 1 4 2 0 4 1 2
Glucuronosyltransferase 2 (GUX2) Glucuronoxylan Biosynthesis 1 3 1 0 1 1 0
Mannan synthase (CSLA) Mannan Biosynthesis 1 10 3 3 5 3 2
Mannan galactosyltransferases Galactomannan Biosynthesis 2 7 2 1 0 0 3
Pectinacetylesterase (PAE) Pectin Degradation 12 25 7 7 14 5 11
Polygalacturonase inhibitor (PGIP) Pectin Degradation 2 5 2 0 2 1 0
Polygalacturonase/polygalacturonase-like (PG/PG-like) Pectin Degradation 23 69 35 28 52 22 20
Pectin methyl esterase (PME) Pectin Degradation 16 39 16 14 23 12 7
Pectate lyase (PL) Pectin Degradation 15 52 13 11 20 9 21
Alcohol acyl transferase (AAT) Ester Metabolism 1 11 4 6 15 1 2
Alcohol dehydrogenase (ADH) Ester, Aldehyde, Alcohol Metabolism 23 47 26 37 46 20 28
Aldehyde dehydrogenase (ALDH) Ester, Aldehyde, Alcohol Metabolism 10 26 13 15 18 7 15
Carotenoid cleavage dioxygenase (CCD) Ionone Metabolism 3 7 3 3 7 3 3
Epoxide hydrolase (EH) Lactone Metabolism 12 16 10 10 14 6 7
Eugenol-synthase (EGS) Phenylpropanoid Metabolism 4 1 1 3 4 1 0
HAD-superfamily hydrolase epoxide hydrolase (EH) Lactone Metabolism 4 0 2 2 2 2 1
Linalool-synthase (LIS) Monoterpenoid Metabolism 5 12 2 10 5 1 2
Lipoxygenase (LOX) Ester, Aldehyde, Alcohol Metabolism 15 25 15 12 22 11 6
Phenylacetaldehyde synthase (PAAS) Benzenoid Metabolism 3 4 2 2 4 2 3
Phloroglucinol O-methyltransferase (POMT) Phenylpropanoid Metabolism 1 20 3 5 10 1 10
S-adenosylmethionine-dependent methyltransferase (SAMT) Phenylpropanoid Metabolism 14 33 9 11 13 2 6
MADS-box genes Transcription Factor 79 138 82 54 95 171 107
NPR1 (nonexpresser of PR gene 1) Pathogen Related 1 5 5 1 2 1 1
NPR3 (NPR1-like protein 3) Pathogen Related 3 0 0 2 2 0 1
PR1 (pathogenesis-related gene 1) Pathogen Related 22 4 11 6 3 3 1
PR2 (beta-1,3-glucanase) Pathogen Related 1 4 5 6 6 4 5
PR3 (class I chitinase) Pathogen Related 3 4 4 2 1 1 0
PR4 (chitin binding) Pathogen Related 4 4 3 4 2 2 1
PR5 (thaumatin) Pathogen Related 8 4 6 15 10 2 1
PR6 (proteinase_inhibitor) Pathogen Related 2 9 2 2 1 0 0
PR8 (endochitinase) Pathogen Related 0 25 8 8 9 1 1
PR10 (RNase) Pathogen Related 28 2 1 10 1 0 1
Nitric-oxide synthase (NOS) ROS Related 2 0 7 1 1 1 1
Superoxide dismutase [Cu/Zn] (SOD CSD) ROS Related 2 4 17 4 3 12 3
Superoxide dismutase [Mn] (SOD MSD) ROS Related 1 3 2 1 2 2 2
Superoxide dismutase [Fe] (SOD FSD) ROS Related 1 4 2 2 2 4 0
Catalase (CAT) ROS Related 4 0 2 3 3 15 3
Cold_acclimation (WCOR413-like_protein) Abiotic Stress Related 4 0 0 4 2 3 2
Photolyase (PHR) Abiotic Stress Related 2 0 0 6 5 11 0
Heat shock 70 kDa protein cognate (HSP70) Abiotic Stress Related 18 31 43 9 2 6 3
Temperature-induced lipocalin (TIL) Abiotic Stress Related 2 0 0 2 1 1 1
Pyruvate decarboxylase family protein (PDC) Ripening Related 2 15 6 5 5 10 6
NADP-dependent isocitrate dehydrogenase (ICDH) Ripening Related 10 6 8 3 5 5 4
Auxin resistant 1 (AXR1) Auxin Related 2 0 0 1 1 5 1
NAC domain protein 1 (NAC1) Auxin Related 2 0 0 0 0 2 1
Auxin response factor 16, putative (ARF16) Auxin Related 2 2 1 3 1 6 1
Auxin response factor 17, putative (ARF17) Auxin Related 2 2 6 1 1 2 1
Histone acetyltransferase 2 (HAT2) Auxin Related 1 0 2 0 2 1 1
Indole-3-acetic acid inducible 9 (IAA9); transcription factor, putative Auxin Related 2 0 1 1 2 1 1
Arabidopsis response regulator 1 (ARR1); sensor histidine kinase, putative Cytokinin Related 2 2 6 0 3 1 1
Arabidopsis response regulator 6 (ARR6); two-component sensor protein histidine protein kinase, putative Cytokinin Related 6 1 0 1 0 0 1
Arabidopsis response regulator 12 (ARR12) Cytokinin Related 2 0 3 6 11 5 1
Arabidopsis histidine-containing phosphotransmitter 1 (AHP1) Cytokinin Related 3 4 5 6 4 6 1
Arabidopsis histidine-containing phosphotransmitter 4 (AHP4) Cytokinin Related 4 1 3 3 2 6 1
Arabidopsis histidine-containing phosphotransmitter 6 (AHP6) Cytokinin Related 1 0 1 0 1 2 1
Allene oxide synthase (AOS) Jasmonic Acid Biosynthesis 3 6 7 3 1 5 1
Allene oxide cyclase (AOC) Jasmonic Acid Biosynthesis 4 6 4 1 1 2 4
12-oxophytodienoate reductase (OPR); Oxo-phytodienoic acid reductase 1 Jasmonic Acid Biosynthesis 11 33 11 6 1 15 6
Indole-3-acetic acid-amido synthetase GH3 Salicylic Acid Biosynthesis 0 6 9 2 3 3 5
5-enolpyruvylshikimate 3-phosphate synthase1 (EPSP) Salicylic Acid Biosynthesis 2 0 0 1 1 2 2
1-aminocyclopropane-1-carboxylate oxidase (ACO) Ethylene Biosynthesis 4 24 41 8 3 7 4
1-aminocyclopropane-1-carboxylate synthase (ACS) Ethylene Biosynthesis 3 6 8 4 8 11 5
S-adenosylmethionine synthetase (SAM) Ethylene Biosynthesis 12 8 9 2 1 6 3
Aspartate aminotransferase (ASP) Ethylene Biosynthesis 4 5 6 3 5 11 5
Phenylalanine ammonia-lyase (PAL) Phenylpropanoid Metabolism 2 6 2 14 5 3 4
Trans-cinnamate 4-monooxygenase (C4H) Phenylpropanoid Metabolism 2 2 2 1 2 2 1
4-coumarate-CoA ligase (4CL) Phenylpropanoid Metabolism 3 4 3 2 3 3 4
Monooxygenase/p-coumarate 3-hydroxylase (C3H) Cell Wall Biosynthesis (Lignin) 5 12 2 1 3 1 1
Hydroxycinnamoyl-CoA shikimate/quinate hydroxyl-cinnamoyltransferase (HCT/HQT) Cell Wall Biosynthesis (Lignin) 11 2 6 2 7 1 1
Cinnamoyl-CoA reductase (CCR) Cell Wall Biosynthesis (Lignin) 2 2 1 1 6 1 2
Cinnamyl-alcohol dehydrogenase (CAD) Cell Wall Biosynthesis (Lignin) 2 2 1 1 1 1 2
Ferulate 5-hydroxylase/monooxygenase (F5H) Cell Wall Biosynthesis (Lignin) 2 3 2 2 3 4 1
Caffeic acid/5-hydroxyferulic acid O-methyltransferase (COMT) Cell Wall Biosynthesis (Lignin) 2 6 1 2 2 1 1
Caffeoyl-CoA O-methyltransferase (CCoAOMT) Cell Wall Biosynthesis (Lignin) 4 2 1 2 2 1 1
Naringenin-chalcone synthase (CHS) Flavonoid Biosynthesis 3 5 2 12 6 2 1
Chalcone isomerase (CH) Flavonoid Biosynthesis 1 8 1 1 1 1 1
Naringenin 3-dioxygenase (F3H) Flavonoid Biosynthesis 1 0 1 1 2 1 1
Flavonoid 3'-monooxygenase (F3'H) Flavonoid Biosynthesis 1 2 1 1 1 1 1
Dihydrokaempferol 4-reductase (DFR) Flavonoid Biosynthesis 1 2 1 1 2 1 1
Anthocyanidin reductase (ANR) Flavonoid Biosynthesis 1 5 1 1 2 1 1
Leucoanthocyanidin 4- reductase (LAR) Flavonoid Biosynthesis 2 5 1 2 2 1 0
Leucocyanidin oxygenase (ANS) Flavonoid Biosynthesis 1 4 1 1 2 1 1
Flavonol synthase (FLS) Flavonoid Biosynthesis 4 4 2 2 4 1 1
Flavonol 3-O-glucosyltransferase (UFGT) Flavonoid Biosynthesis 6 3 7 4 3 1 1
Supplementary Table 18. Resequenced accessions, mean coverage (excluding regions not covered),
total number of SNPs detected, SNP frequency per kb in comparison to the reference sequence and
geographical origin.
Accession Mean coverage SNPs SNP frequency per 10³ bp Geographical origin
P. kansuensis 9.35 954,866 9.19 China
P. davidiana 20.44 1,675,062 14.10 China
P. mira 32.43 1,047,523 11.44 China
P. ferganensis 12.63 162,003 1.64 Fergana Valley
‘Bolero’ 22.80 213,410 2.61 Italy
‘Earligold’ 29.96 181,666 1.35 USA
F1 ‘Contender’ x ‘Ambra’ 17.59 231,861 1.75 Italy
‘GF305’ 19.02 114,446 0.74 France
IF7310828 14.30 198,632 1.44 Italy
PLov2-2N 84.44 6,363 0.03 USA
‘Oro A’ 26.41 228,103 1.72 Brazil
‘Quetta’ 12.46 196,284 1.28 Pakistan
‘Sahua Hong Pantao’ 14.69 452,952 2.79 Southern China
‘Shenzhou Mitao’ 12.30 344,779 2.32 Northern China
‘Yumyeong’ 22.55 393,114 2.43 Korea
Supplementary Table 19. SNP distribution by pseudomolecule and genomic fraction.
Pseudomolecule Length (bp) Total SNPs Intergenic SNPs CDS SNPs UTR SNPs Intron SNPs
1 46,877,626 158,873 125,532 12,428 1,753 19,160
2 26,807,724 165,302 130,514 13,891 1,407 19,490
3 22,025,550 86,941 67,268 7,525 946 11,202
4 30,528,727 166,214 131,199 13,365 1,416 20,234
5 18,502,877 71,160 55,121 5,600 924 9,515
6 28,902,582 104,850 82,362 8,717 1,168 12,603
7 22,790,193 99,996 78,972 7,960 979 12,085
8 21,829,753 100,021 77,268 9,337 915 12,501
Total 218,265,032 953,357 748,236 78,823 9,508 116,790
All data refer to positions that had a coverage allowing SNP calling in at least four accessions among
P. ferganensis, ‘Shenzhou Mitao’, ‘Yumyeong’, ‘Sahua Hong Pantao’, ‘Quetta’, ‘Bolero’, ‘Earligold’,
F1 ‘Contender’ x ‘Ambra’, ‘GF305’, IF7310828, ‘Oro A’.
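The internal consistency of Supplementary Table 19 can be checked with simple arithmetic: each row's total should equal the sum of its four genomic fractions, and each column should add up to the reported Total row. The sketch below does only that; the values are transcribed from the table, while the variable and function names are illustrative and not part of the original analysis.

```python
# Values transcribed from Supplementary Table 19.
# Column order: length (bp), total SNPs, intergenic, CDS, UTR, introns.
TABLE_19 = {
    1: (46_877_626, 158_873, 125_532, 12_428, 1_753, 19_160),
    2: (26_807_724, 165_302, 130_514, 13_891, 1_407, 19_490),
    3: (22_025_550, 86_941, 67_268, 7_525, 946, 11_202),
    4: (30_528_727, 166_214, 131_199, 13_365, 1_416, 20_234),
    5: (18_502_877, 71_160, 55_121, 5_600, 924, 9_515),
    6: (28_902_582, 104_850, 82_362, 8_717, 1_168, 12_603),
    7: (22_790_193, 99_996, 78_972, 7_960, 979, 12_085),
    8: (21_829_753, 100_021, 77_268, 9_337, 915, 12_501),
}
REPORTED_TOTAL = (218_265_032, 953_357, 748_236, 78_823, 9_508, 116_790)


def column_sums(table):
    """Sum every column over the eight pseudomolecules."""
    return tuple(sum(column) for column in zip(*table.values()))


def rows_are_consistent(table):
    """True if, in every row, total SNPs equals the sum of the four fractions."""
    return all(
        total == intergenic + cds + utr + introns
        for _length, total, intergenic, cds, utr, introns in table.values()
    )


if __name__ == "__main__":
    print("Column sums match Total row:", column_sums(TABLE_19) == REPORTED_TOTAL)
    print("Row fractions add up:", rows_are_consistent(TABLE_19))
```

Both checks pass for the values listed above, so the flattened table appears to have been transcribed without loss.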
Supplementary Table 20. Covered bases and estimates of nucleotide diversity (π) and the Watterson
estimator (θw) according to genomic fraction, chromosomal pseudomolecule and geographical origin.
Genomic fraction Covered bases (Mb) Total bases (%) Polymorphic sites π (bp⁻³) θw (bp⁻³)
Total 183.06 100.00 953,357
Intergenic 116.71 63.76 748,236
Introns 32.76 17.90 116,790
UTR 3.07 1.68 9,508
Coding (CDS) 30.55 16.69 78,823
Synonymous 7.03 3.84 33,388
Replacement 23.52 12.85 45,435
pseudomolecule 1 40.01 21.86 158,873
pseudomolecule 2 21.73 11.87 165,302
pseudomolecule 3 18.41 10.06 86,941
pseudomolecule 4 24.84 13.57 166,214
pseudomolecule 5 15.93 8.70 71,160
pseudomolecule 6 24.40 13.33 104,850
pseudomolecule 7 19.35 10.57 99,996
pseudomolecule 8 18.37 10.03 100,021
Eastern accessions − − 784,286 −
Western accessions − − 613,225 −
All data refer to positions that had a coverage allowing SNP calling in at least four accessions among P. ferganensis,
‘Shenzhou Mitao’, ‘Yumyeong’, ‘Sahua Hong Pantao’, ‘Quetta’, ‘Bolero’, ‘Earligold’, F1 ‘Contender’ x ‘Ambra’,
‘GF305’, IF7310828, ‘Oro A’. Eastern accessions include P. ferganensis, ‘Shenzhou Mitao’, ‘Yumyeong’ and ‘Sahua Hong
Pantao’; western accessions include ‘Bolero’, ‘Earligold’, F1 ‘Contender’ x ‘Ambra’, ‘GF305’, IF7310828, ‘Oro A’ and
PLov2-2N. ‘Quetta’ was excluded because its geographical origin and its phylogenetic relationships were contradictory.
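For readers unfamiliar with the Watterson estimator named in the caption, the following is a minimal sketch of the standard per-site definition, θw = S / (a_n · L), where S is the number of polymorphic sites, L the number of covered bases and a_n the sum of 1/i for i from 1 to n-1. The number of sampled chromosomes n used for this table is not stated in this section, so it is left as a parameter; the function names are illustrative and this is not necessarily the exact procedure used by the authors.

```python
def harmonic_coefficient(n_sequences: int) -> float:
    """a_n = 1/1 + 1/2 + ... + 1/(n-1), for n sampled sequences."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(polymorphic_sites: int, n_sequences: int, covered_bases: float) -> float:
    """Per-site Watterson estimator: S / (a_n * L)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return polymorphic_sites / (harmonic_coefficient(n_sequences) * covered_bases)


def percent_of_total(part_mb: float, total_mb: float) -> float:
    """Share of the covered bases, as in the 'Total bases (%)' column."""
    return 100.0 * part_mb / total_mb


if __name__ == "__main__":
    # Reproduces the intergenic share from the table (116.71 Mb of 183.06 Mb).
    print(round(percent_of_total(116.71, 183.06), 2))
    # Toy values only: 11 polymorphic sites, 4 sequences, 1,000 covered bases.
    print(watterson_theta(11, 4, 1000))
```

The percentage helper reproduces the "Total bases (%)" column from the covered-bases column; the toy call to `watterson_theta` uses made-up inputs and is not an estimate for peach.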
III. SUPPLEMENTARY FIGURES
Supplementary Figure 1. Dot plot of fosmid clone AC253544 on a region of scaffold_5. This alignment is
representative of 11 of the 13 fosmid clones.
Supplementary Figure 2. Dot plot of fosmid clone AC253548 on a region of scaffold_1. The alignment contains an insertion in the fosmid that is not present in the genome assembly.
Supplementary Figure 3. Dot plot of fosmid clone AC253545 on a region of scaffold_5. The clone does
not align to the genome across the repeats.
Supplementary Figure 4. Dot plot of fosmid clone AC253539 on a region of scaffold_1 composed of
degraded transposons.
Supplementary Figure 5. Map of the WGS scaffolds (Sc). The eight linkage groups (G1 to G8) of the Prunus reference map (T x E) and the
Peach v1.0 pseudomolecules (Pp) are represented. Only informative markers are shown. Scaffolds that are unmapped in v1.0 are underlined and
abbreviated as Scx v1. + indicates positive orientation; - negative orientation (scaffold needs to be flipped); 0 random orientation. The symbol §
after the scaffold name (e.g., Sc4_1§) denotes new scaffolds that originated from breaking.
[Figure: linkage groups G1 to G8 with their markers, each aligned to the corresponding pseudomolecule (Pp1 to Pp8) and its component scaffolds with orientations (+, -, 0); scale bars in cM and Mb.]
Supplementary Figure 6. Plots of genetic-by-physical distances. In each plot, physical distance along the
indicated pseudomolecules is on the horizontal axis (Mb), and genetic distance is on the vertical axis (in cM).
Dots show the locations of markers from the T x E, P x F and C x A linkage maps on the sequence. The vertical
bars indicate the putative position of the centromeric regions.
[Figure: eight panels, Pp1 to Pp8. Horizontal axis: physical distance (Mb); vertical axis: genetic distance (cM). Legend: TxE, CxA, PxF; vertical bar: putative centromere position.]
Supplementary Figure 7. Distribution of the nucleotide distances (Kimura two-parameter method)
calculated for the LTR pair of each complete LTR-RT element. The x-axis shows the nucleotide distances and
the y-axis the corresponding number of elements.
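As an illustration of the distance measure named in this caption, the sketch below computes the Kimura two-parameter distance between two aligned sequences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It assumes the two LTRs are already aligned and skips any column containing a gap or ambiguous base; the function name and these handling choices are assumptions, not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    # One transition in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For identical sequences the distance is zero, and it grows faster than the raw proportion of differences because the formula corrects for multiple substitutions at the same site.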
Supplementary Figure 8. N-J tree for the Ty1-copia elements. Reverse transcriptase amino acid
sequences were used. Bootstrap values were calculated on 1,000 replicates. Strawberry sequences are indicated
by red dots, peach sequences by yellow dots.
Supplementary Figure 9. N-J tree for the Ty3-gypsy elements. Reverse transcriptase amino acid
sequences were used. Bootstrap values were calculated on 1,000 replicates. Strawberry sequences are indicated
by red dots, peach sequences by yellow dots.
Supplementary Figure 10. UPGMA consensus tree based on amino acid sequences of sugar transporters
of Vitis¹²⁸, Arabidopsis¹²⁹ and peach. Bootstrap values for 1,000 replicates are indicated at each node, and
branches supported by fewer than 50% of bootstrap replicates were collapsed. Asterisks indicate transcripts expressed in
fruit. Information on gene expression derives from Illumina mRNASeq data
(http://services.appliedgenomics.org/fgb2/iga/prunus_public/gbrowse/prunus_public/) and from macroarray
expression profiling¹²⁸ for peach and grape, respectively.
Supplementary Figure 11. Phylogenetic tree of the epoxide hydrolases (EHs) of the seven genomes. Groups are defined as follows: group 1 red, group 2 green, group 3 blue.
The tree is midpoint rooted: the last common ancestor of the most distant leaves is the current root node.
Supplementary Figure 12. Phylogeny of Rosaceae as in Potter et al.⁵⁶. The phylogeny of the sub-family
Spiraeoideae, whose species can transport and accumulate polyols, is shown in blue. Expansions (or reductions) in
the gene number of some families are mapped and indicated with A, B, C, D.
A. expansion of SDH (sorbitol dehydrogenase) and SOT (sorbitol transporters). B. expansion of HCT/HQT
(hydroxycinnamoyl-CoA shikimate/quinate hydroxyl-cinnamoyltransferase), PR1 (pathogenesis-related gene 1) and
PR10 (RNase, pathogenesis-related). C. expansion of A6PR (aldose 6-P reductase), Cobra (cellulose
biosynthesis), CSL (mannan synthase) and phloroglucinol O-methyltransferase. D. reduction in the gene number of
the 1-aminocyclopropane-1-carboxylate oxidase (ACO) family.
Supplementary Figure 13. Distribution of nucleotide diversity in non-overlapping 50 kb
windows. Observed: observed distribution of nucleotide diversity. Expected: distribution of nucleotide
diversity expected under a Poisson distribution.
Supplementary Figure 14. A neighbor joining phylogenetic tree constructed using genome-wide SNP
data.
Supplementary Figure 15. Minor Allele Frequency (MAF) for SNPs in different genomic
compartments. MAFs at non-synonymous loci (replacement), synonymous loci (synonymous), untranslated
regions (UTR), intergenic regions (intergenic), introns (intron) are shown, grouped in bins of size 0.1.
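The binning described in this caption can be sketched as follows: the minor allele frequency is the smaller of the two allele frequencies at a site, and each value is assigned to one of five bins of width 0.1 between 0 and 0.5. Exact fractions are used so that values falling on a bin edge (for example 3/10) are not misplaced by floating-point rounding. Placing a MAF of exactly 0.5 in the last bin is an assumption; the caption does not say how edges were handled.

```python
from fractions import Fraction

BIN_WIDTH = Fraction(1, 10)
N_BINS = 5  # covers 0 to 0.5


def minor_allele_frequency(alt_count: int, total_alleles: int) -> Fraction:
    """Frequency of the rarer allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_count <= total_alleles:
        raise ValueError("invalid allele counts")
    freq = Fraction(alt_count, total_alleles)
    return min(freq, 1 - freq)


def maf_bin(maf: Fraction) -> tuple:
    """Return the (lower, upper) edges of the 0.1-wide bin containing maf."""
    index = min(int(maf / BIN_WIDTH), N_BINS - 1)  # 0.5 goes in the last bin
    lower = index * BIN_WIDTH
    return float(lower), float(lower + BIN_WIDTH)


if __name__ == "__main__":
    # 7 alternative alleles out of 10 sampled: MAF is 3/10.
    maf = minor_allele_frequency(7, 10)
    print(maf, maf_bin(maf))
```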
Supplementary Figure 16. Distribution of Tajima's D estimates for non-overlapping 50 kb windows.
Whole genome (Blue), in coding, synonymous sites (Red) and in coding, replacement sites (Black). Tajima’s D
values were grouped in bins of size 0.02.
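For reference, Tajima's D contrasts two estimates of the population mutation parameter in a window: the mean number of pairwise differences and the number of segregating sites S scaled by a_1. A minimal sketch of the standard formula is given below; it takes per-window summary statistics as inputs and is not the software used for the figure. The function and parameter names are illustrative.

```python
import math


def tajimas_d(n_sequences: int, segregating_sites: int, mean_pairwise_diff: float) -> float:
    """Tajima's D from n sequences, S segregating sites and the mean
    number of pairwise differences (all for the same window)."""
    n, s = n_sequences, segregating_sites
    if n < 4:
        raise ValueError("the variance term is zero for fewer than four sequences")
    if s < 1:
        raise ValueError("at least one segregating site is required")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Toy inputs: 10 sequences, 16 segregating sites, 3.888889 mean differences.
    print(tajimas_d(10, 16, 3.888889))
```

Negative values indicate an excess of low-frequency variants relative to the neutral expectation, positive values an excess of intermediate-frequency variants.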
Supplementary Figure 17. Mean LD levels as measured by pairwise r² for the whole genome (panels 1
and 2), for each pseudomolecule (panels 3 to 10) and for two specific genomic regions (panels 11 and 12).
The solid line represents the nonlinear regression of r² on the distance between sites in base pairs, based on the Hill
and Weir¹³⁰ expectation between adjacent sites. The dotted line represents the mean r² in 100 bp windows. The
box plots represent, for each 5 kb window, the median (black band), the 25th and 75th percentiles (red box)
and the lowest and highest data still within 1.5 IQR of the lower and upper quartiles, respectively (ends of whiskers). U = unlinked
sites. Box plots in panel 2 are drawn for window sizes of 100 bp each.
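The two quantities in this caption can be sketched in a few lines. The first function computes r² between two biallelic sites from phased 0/1 haplotypes; the second evaluates the commonly used form of the Hill and Weir expectation, E[r²] = [(10 + C) / ((2 + C)(11 + C))] · [1 + (3 + C)(12 + 12C + C²) / (n(2 + C)(11 + C))], with C = ρ·d, where ρ is the population recombination parameter per bp (the quantity fitted by the nonlinear regression), d the distance in bp and n the number of sampled chromosomes. The phased-haplotype input and all names are assumptions for illustration; the section does not describe how the authors computed r².

```python
def pairwise_r2(haplotypes):
    """r^2 between two biallelic sites from (allele_at_site1, allele_at_site2)
    pairs coded 0/1, one pair per sampled chromosome."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


def expected_r2(distance_bp: float, rho_per_bp: float, n_chromosomes: int) -> float:
    """Expected r^2 at a given distance (Hill and Weir form, finite sample)."""
    c = rho_per_bp * distance_bp
    base = (10 + c) / ((2 + c) * (11 + c))
    correction = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (
        n_chromosomes * (2 + c) * (11 + c)
    )
    return base * correction


if __name__ == "__main__":
    # Hypothetical rho and sample size, chosen only to show the decay with distance.
    for distance in (100, 1_000, 10_000):
        print(distance, expected_r2(distance, rho_per_bp=0.001, n_chromosomes=20))
```

In a regression such as the one shown by the solid line, `rho_per_bp` would be the free parameter estimated from the observed r² values; the value used here is purely hypothetical.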
Supplementary Figure 18. Duplicated and triplicated regions in the peach genome. Each line links duplicated
regions in peach. PC stands for Prunus chromosome. The seven colors represent the linkage groups of
the eudicot ancestor that existed prior to the hexaploidization. Peach genomic regions are colored by their
orthology to the grape genome. The lines are colored by paralogous region; when
paralogous regions have different ancestral origins, the order of precedence is the colors of TR1, TR2, TR3, TR4, TR5, TR6, TR7 and
grey. a-g) paralogous regions that originated from each linkage group of the pre-hexaploid ancestor. h)
paralogous regions that reside in sub-genomic regions with unidentified ancestral origin.
Supplementary Figure 19. Dot plots of the duplicated regions. a) peach genome b) grape genome. The
figures were generated using SynMap¹³¹ at the CoGe website, with BLASTZ¹³² and DAGchainer¹³³ as underlying
software. The seven sets of arrows (a) show some large triplicated regions in the peach genome.
[Figure: panel a, peach self-comparison with both axes labeled PC1 to PC8 and arrows marking triplicated regions TR1 to TR7; panel b, grape self-comparison with both axes labeled by chromosome number.]
[Figure: schematic with stages labeled triplication, chromosome breakage, grape-peach separation and more rearrangements in the peach genome; segments a, b1, b2 and c of region TR6 are marked.]
Supplementary Figure 20. Schematic diagram of an orthologous region in peach and grape. It shows
further genome rearrangements in each lineage after the shared paleo-hexaploidy event. PC stands for Prunus
chromosome and VC stands for Vitis chromosome.
Supplementary Figure 21. The orthology map between the grape and peach genomes. Each diagram (a to f) shows the
subchromosomal regions that originated from a single linkage group of the paleo-hexaploid ancestor. The
orthology map shown in Supplementary Figure 20 is not shown here. PC stands for Prunus chromosome and VC stands
for Vitis chromosome.
Supplementary Figure 22. Comparison between three paralogous regions in peach and their
orthologous regions in the grape and poplar genomes. pp, pt and vv stand for Prunus persica, Populus
trichocarpa and Vitis vinifera, respectively. a) Each paralogous region in peach is orthologous to a different
region in grape, suggesting that the triplication preceded the divergence of peach and grape. b) Each paralogous
region in peach matches different regions in poplar, also suggesting that the triplication predates the
peach/poplar separation. Each peach region matches two different regions in poplar, supporting a WGD
event in the poplar genome after the peach/poplar separation.