Evolutionary Genetics of Complex Genome
Transcript of Evolutionary Genetics of Complex Genome
Evolutionary Genetics of a Complex Plant Genome
Jeffrey Ross-Ibarra @jrossibarra • www.rilab.org
Dept. Plant Sciences • Center for Population Biology • Genome Center University of California Davis
https://commons.wikimedia.org/wiki/File:Diversity_of_plants_image_version_5.png
hard sweep
multiple mutations
standing variation
“soft” sweeps
how do genomes adapt?
M T G P H R L
GGTCGAC ATG ACT GGT CCA CAT CGA CTG TAG
M T D P H R L
GGTCGAC ATG ACT GAT CCA CAT CGA CTG TAG
structural change to protein
M T G P H R L
GGTAAAC ATG ACT GGT CCA CAT CGA CTG TAG
GG---AC ATG ACT GGT CCA CAT CGA CTG TAG
regulatory change to expression
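The contrast between the two kinds of change on these slides can be sketched in a few lines of Python. This is a minimal illustration, not anything shown in the talk: the codon table covers only the codons on the slide (standard genetic code), the function names `translate` and `classify` are hypothetical, and treating the 7-base upstream string as "regulatory" follows the slide's cartoon rather than any real annotation.

```python
# Codons that appear on the slide, translated with the standard genetic code.
CODON_TABLE = {
    "ATG": "M", "ACT": "T", "GGT": "G", "GAT": "D",
    "CCA": "P", "CAT": "H", "CGA": "R", "CTG": "L", "TAG": "*",
}


def translate(cds: str) -> str:
    """Translate a space-separated coding sequence, stopping at the stop codon."""
    protein = []
    for codon in cds.split():
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify(ref_upstream: str, ref_cds: str, alt_upstream: str, alt_cds: str) -> str:
    """Label a variant by whether it alters the protein, the upstream sequence, or both."""
    protein_changed = translate(ref_cds) != translate(alt_cds)
    upstream_changed = ref_upstream != alt_upstream
    if protein_changed and upstream_changed:
        return "both"
    if protein_changed:
        return "structural"
    if upstream_changed:
        return "regulatory"
    return "none"


REF_UPSTREAM = "GGTCGAC"
REF_CDS = "ATG ACT GGT CCA CAT CGA CTG TAG"

# Slide example 1: one base changes inside the coding sequence (GGT -> GAT).
print(classify(REF_UPSTREAM, REF_CDS, "GGTCGAC", "ATG ACT GAT CCA CAT CGA CTG TAG"))
# Slide example 2: the upstream sequence changes, the coding sequence does not.
print(classify(REF_UPSTREAM, REF_CDS, "GGTAAAC", REF_CDS))
```

The sketch ignores synonymous changes, which would alter the DNA without altering the protein; a fuller version would report those separately.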
Lowry & Willis 2010 PLoS Biology
Gaut and Ross-Ibarra 2008
Kew C-Value Database
Paris japonica: 150 Gb genome
Genlisea aurea: 63 Mb genome (photo: Michal Rubeš)
[Figure: distribution of log(genome size in Mb) across angiosperms, x axis from 1.5 to 4.5; angiosperm average 6400 Mb.]
[Figure: non-TE DNA and TE DNA per species, and genome size (Mb) against TE content (Mb), both axes 0 to 6,000; r = 0.99. Species: Arabidopsis thaliana, Arabidopsis lyrata, Brachypodium distachyon, papaya, rice, Lotus japonicus, black cottonwood, grapevine, cabbage, Medicago truncatula, sorghum, soybean, Levant cotton, maize, Aegilops speltoides, barley.]
Tenaillon et al. 2010 TIP
Suketoshi Taba
[Figure: genes and LTR retrotransposons along a 500-kb window in each genome, scale bar 50 kb. Maize (Zea mays), 2300 Mb genome: 44.5 Mb to 45 Mb. Arabidopsis (A. thaliana), 130 Mb genome: 7.4 Mb to 7.9 Mb.]
[Figure: Mb of DNA (log scale, 1 to 10000) in Arabidopsis and maize for the whole genome (135 Mb and 2,300 Mb), CDS, and intergenic open chromatin, the latter two together labelled "functional" DNA.]
[Figure: share of "functional" DNA that is intergenic vs. CDS in Arabidopsis and maize, y axis 0% to 100%; bar labels 81% / 19% and 93% / 7%.]
Sullivan et al. Cell Reports 2014; Rodgers-Melnick et al. PNAS 2016
Ne individuals, µ beneficial mutation rate per trait
bigger genome, larger mutation target, higher µ
predict that larger genomes adapt via standing variation, noncoding variants
selection from standing variation when 2Neµ > 1
larger % of µ should be noncoding
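The threshold on this slide is easy to put into numbers. The sketch below is an illustration under stated assumptions, not a calculation from the talk: the function names are hypothetical, the per-trait beneficial rate µ is modelled crudely as a per-base rate times a mutational target size, and the target sizes are invented purely to show how a larger target pushes 2Neµ past 1.

```python
def population_mutation_rate(ne: float, mu: float) -> float:
    """Return 2*Ne*mu, the population-scaled beneficial mutation rate per trait."""
    return 2.0 * ne * mu


def trait_mutation_rate(per_bp_rate: float, target_size_bp: float) -> float:
    """Assumed model: per-trait rate = per-base rate x number of bases that can yield the trait."""
    return per_bp_rate * target_size_bp


def predicts_standing_variation(ne: float, mu: float) -> bool:
    """Apply the slide's rule of thumb: standing variation is expected when 2*Ne*mu > 1."""
    return population_mutation_rate(ne, mu) > 1.0


# Illustrative values only (hypothetical target sizes).
NE = 150_000
PER_BP_RATE = 3e-8

for target_bp in (10, 1_000):
    mu = trait_mutation_rate(PER_BP_RATE, target_bp)
    theta = population_mutation_rate(NE, mu)
    print(target_bp, round(theta, 3), predicts_standing_variation(NE, mu))
```

With the same Ne and per-base rate, only the larger target crosses the threshold, which is the slide's argument for why bigger genomes with more potentially functional noncoding sequence should adapt more often from standing variation.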
maize, teosinte
[Figure: map of chromosomes 1 to 10; tb1 marked]
Briggs et al. 2007 Genetics
Studer et al. 2011 Nature Genetics; Vann et al. 2015 PeerJ
mutation rate (ref. 21), strongly suggesting that the Hopscotch insertion (and thus, the older Tourist as well) existed as standing genetic variation in the teosinte ancestor of maize. Thus, we conclude that the Hopscotch insertion likely predated domestication by more than 10,000 years and the Tourist insertion by an even greater amount of time.

We identified four fixed differences in the portion of the proximal and distal components of the control region that show evidence of selection. We used transient assays in maize leaf protoplasts to test all four differences for effects on gene expression. Maize and teosinte chromosomal segments for the portions of the proximal and distal components with these four differences were cloned into reporter constructs upstream of the minimal promoter of the cauliflower mosaic virus (mpCaMV), the firefly luciferase ORF and the nopaline synthase (NOS) terminator (Fig. 4). Each construct was assayed for luminescence after transformation by electroporation into maize protoplast. The constructs for the distal component contrast the effects of the Tourist insertion plus the single fixed nucleotide substitution that distinguish maize and teosinte. Both the maize and teosinte constructs for the distal component repressed luciferase expression relative to the minimal promoter alone. The maize construct with Tourist excised gave luciferase expression equivalent to the native maize and teosinte constructs and less expression than the minimal promoter alone. These results indicate that this segment is functionally important, acting as a repressor of luciferase expression and, by inference, of tb1 expression in vivo. However, we did not observe any difference between the maize and teosinte constructs as anticipated. One possible cause for the lack of differences in expression between the maize and teosinte constructs might be that additional proteins required to cause these differences are not present in maize leaf protoplast. Another possibility is that the factor affecting phenotype in the distal component lies in the unselected region between −64.8 and −69.5 kb, which is not included in the construct. Nevertheless, the results do indicate that the distal component has a functional element that acts as a repressor. The functional importance of this segment is supported by its low level of nucleotide diversity (Fig. 3a), suggesting a history of purifying selection.

The constructs for the proximal component of the control region contrast the effects of the Hopscotch insertion plus a single fixed nucleotide substitution that distinguish maize and teosinte. The construct with the maize sequence including Hopscotch increased expression of the luciferase reporter twofold relative to the teosinte construct for the proximal control region and the minimal promoter alone (Fig. 4). Luciferase expression was returned to the level of the teosinte construct and the minimal promoter construct by deleting the Hopscotch element from the full maize construct. These results indicate that the Hopscotch element enhances luciferase expression and, by
[Figure 3: (a) nucleotide diversity (y axis 0 to 0.06) for maize (M) and teosinte (T) across the control region from –67 kb to –58 kb, split into regions A to D, with HKA neutrality test P values (P = 0.95, P = 0.41, P = 0.04 and P 0.0001, relation symbol lost in extraction). The distal component with Tourist (408 bp) and the proximal component with Hopscotch (4,885 bp) are marked. (b) Minimum spanning tree showing the teosinte cluster haplotype and the maize cluster haplotype.]

Figure 3 Sequence diversity in maize and teosinte across the control region. (a) Nucleotide diversity across the tb1 upstream control region. Base-pair positions are relative to AGPv2 position 265,745,977 of the maize reference genome sequence. P values correspond to HKA neutrality tests for regions A–D, as defined by the dotted lines. Green shading signifies evidence of neutrality, and pink shading signifies regions of non-neutral evolution. Nucleotide diversity (π) for maize (yellow line) and teosinte (green line) were calculated using a 500-bp sliding window with a 25-bp step. The distal and proximal components of the control region with four fixed sequence differences between the most common maize haplotype and teosinte haplotype are shown below. (b) A minimum spanning tree for the control region with 16 diverse maize and 17 diverse teosinte sequences. Size of the circles for each haplotype group (yellow, maize; green, teosinte) is proportional to the number of individuals within that haplotype.
[Figure 4: transient assay constructs and their relative expression (x axis 0 to 2.0). Constructs: mpCaMV; distal control region T-dist, M-dist, ∆M-dist (Tourist); proximal control region T-prox, M-prox, ∆M-prox (Hopscotch). Each construct carries mpCaMV and luc.]

Figure 4 Constructs and corresponding normalized luciferase expression levels. Transient assays were performed in maize leaf protoplast. Each construct is drawn to scale. The construct backbone consists of the minimal promoter from the cauliflower mosaic virus (mpCaMV, gray box), luciferase ORF (luc, white box) and the nopaline synthase terminator (black box). Portions of the proximal and distal components of the control region (hatched boxes) from maize and teosinte were cloned into restriction sites upstream of the minimal promoter. “∆” denotes the excision of either the Tourist or Hopscotch element from the maize construct. Horizontal green bars show the normalized mean with s.e.m. for each construct.
Figure 2 Map of parviglumis Populations and Hopscotch allele frequency. Map showing the frequency of the Hopscotch allele in populations of parviglumis where we sampled more than 6 individuals. Size of circles reflects number of individuals sampled. The Balsas River is shown, as the Balsas River Basin is believed to be the center of domestication of maize.

as our independent trait for phenotyping analyses. SAS code used for analysis is available at http://dx.doi.org/10.6084/m9.figshare.1166630.

RESULTS
Genotyping for the Hopscotch insertion
The genotype at the Hopscotch insertion was confirmed with two PCRs for 837 individuals of the 1,100 screened (Table S1 and Table S2). Among the 247 maize landrace accessions genotyped, all but eight were homozygous for the presence of the insertion. Within our parviglumis and mexicana samples we found the Hopscotch insertion segregating in 37 (n = 86) and four (n = 17) populations, respectively, and at highest frequency within populations in the states of Jalisco, Colima, and Michoacan in central-western Mexico (Fig. 2). Using our Hopscotch genotyping, we calculated differentiation between populations (FST) and subspecies (FCT) for populations in which we sampled sixteen or more chromosomes. We found that FCT = 0, and levels of FST among populations within each subspecies (0.22) and among all populations (0.23) (Table 1) are similar to genome-wide estimates from previous studies (Pyhäjärvi, Hufford & Ross-Ibarra, 2013). Although we found large variation in Hopscotch allele frequency among our populations, BayEnv analysis did not indicate a correlation between the Hopscotch insertion and environmental variables (all Bayes Factors < 1).

Vann et al. (2015), PeerJ, DOI 10.7717/peerj.900
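For readers unfamiliar with FST, the idea of differentiation among populations at a single biallelic site such as the Hopscotch insertion can be sketched as below. This is not the hierarchical FST/FCT analysis reported in the excerpt: it is the simple textbook form (HT − HS)/HT with equal population weights and no sample-size correction, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(pop_freqs: list[float]) -> float:
    """Return (H_T - H_S) / H_T for one biallelic site, weighting populations equally."""
    if not pop_freqs:
        raise ValueError("need at least one population")
    mean_freq = sum(pop_freqs) / len(pop_freqs)
    h_total = expected_heterozygosity(mean_freq)
    if h_total == 0.0:
        return 0.0  # site is monomorphic overall, so there is nothing to partition
    h_within = sum(expected_heterozygosity(p) for p in pop_freqs) / len(pop_freqs)
    return (h_total - h_within) / h_total


# Hypothetical insertion frequencies in four populations.
print(round(fst([0.1, 0.4, 0.6, 0.9]), 3))
```

Identical frequencies in every population give 0, and populations fixed for alternative alleles give 1, which is the scale on which the reported values of 0.22 and 0.23 should be read.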
Wang et al. 2005 Nature; Wang et al. 2015 Genetics
[Figure: map of chromosomes 1 to 10; tga1 and tb1 marked]
Figure 1. Phenotypes. a. Maize ear showing the cob (cb) exposed at top. b. Teosinte ear with the rachis internode (in) and glume (gl) labeled. c. Teosinte ear from a plant with a maize allele of tga1 introgressed into it. d. Close-up of a single teosinte fruitcase. e. Close-up of a fruitcase from teosinte plant with a maize allele of tga1 introgressed into it. f. Ear of maize inbred W22 (Tga1-maize allele) with the cob exposed showing the small white glumes at the base. g. Ear of maize inbred W22:tga1 which carries the teosinte allele, showing enlarged (white) glumes. h. Ear of maize inbred W22 carrying the tga1-ems1 allele, showing enlarged glumes. For higher magnification copies of f–h see Supplementary Information.
Nature. Author manuscript; available in PMC 2006 May 23.
[Figure: map of chromosomes 1 to 10; gt1, tga1 and tb1 marked]
Wills et al. 2013 PLoS Genetics
teosinte, maize (photo: Clint Whipple, BYU)
[Figure: panels A and B, genotype classes T/T, M/T, M/M]
3’ UTR
5’ control region
hard sweep
M T D P H R L
GGTCGA ATG ACT GAT CCA CAT CGA CTG TAG
tga1 gt1 tb1
Multiple Mutations
Standing Variation
M T G P H R L
GGTAAA ATG ACT GGT CCA CAT CGA CTG TAG
Hufford et al. 2012 Nat. Gen.; Chia et al. 2012 Nat. Gen.
genomes: 13 teosinte, 23 maize
5-10% selected regions intergenic
whereas others are lost after domestication (Fig. 3B). It should be noted that many of these genes have unique coexpression edges in maize that are not observed in teosinte (Fig. S4B).

Expression data provide an opportunity to investigate further functional alterations to genes located within genomic regions that population genomic analyses identify as targets of selective

[Fig. 3, panels A to E. (A) Venn diagram of DE (n = 612), AEC (n = 1115) and Dom/Imp genes (n = 1761): 292 DE only, 230 DE and AEC, 750 AEC only, 44 DE and Dom/Imp, 46 in all three sets, 89 AEC and Dom/Imp, 1582 Dom/Imp only. (B) Teosinte network edges and maize network edges for GRMZM2G068436, GRMZM2G137947 and GRMZM2G375302. (C, D) XP-CLR plots, x axis in Mb. (E) Neighbor-joining tree with expression bar plot.]

Fig. 3. Analysis of genes with altered expression or conservation and targets of selection during improvement and/or domestication. (A) Venn diagram showing the overlap between DE genes, AEC genes, and the genes that occur in genomic regions that have evidence for selective sweeps during maize domestication or improvement (Dom/Imp genes). (B) Teosinte coexpression networks for three genes (GRMZM2G068436, GRMZM2G137947, and GRMZM2G375302). (Right) Edges that are maintained in maize coexpression networks are shown. Although the differentially expressed gene (red node) is highly connected in teosinte, most of these connections are lost in maize. However, some parts of the teosinte network are still conserved in maize. (C) Cross-population composite likelihood ratio test (XP-CLR) plot shows the evidence for a selective sweep that occurs on chromosome 9. The tick marks along the x axis represent genes, and the red tick mark indicates the gene (GRMZM2G448355) that was chosen as the candidate target of selection and is differentially expressed in maize and teosinte. The bar plot underneath the graph shows the expression levels of all maize (blue) and teosinte (red) samples. (D) XP-CLR plot for a large region on chromosome 5. The candidate target of selection is indicated in green and shows similar expression in maize and teosinte. Two other genes (red) exhibit DE. (E) Neighbor-joining tree shows the relationships among the haplotypes at GRMZM2G141858. (Right) Bar plot shows expression levels for each genotype; red bars indicate teosinte genotypes, and blue bars represent maize genotypes. At least one teosinte genotype (TIL15) contains the haplotype that has been selected in maize and has expression levels similar to maize genotypes.

Table 2. Genes in selected regions with evidence for DE or AEC

Gene list               No. genes selected   % up-regulated   Significance   % higher connected   % candidates
                        during dom/imp       in maize                        in maize
AEC and DE (n = 276)    46                   76               0.0002         41.3                 39.1
DE only (n = 336)       44                   61               0.0230         40.9                 22.7
AEC only (n = 839)      89                   54               0.1837         57.3                 32.6

dom, domestication; imp, improvement.

Swanson-Wagner et al., www.pnas.org/cgi/doi/10.1073/pnas.1201961109

Expression, Genealogy
teosinte, maize
• ~500 selected regions
• 11M shared vs 3000 fixed SNPs
• Candidates differentially expressed, decreased expression variation
selection on regulatory sequence, standing variation
Hufford et al. 2012 Nat. Gen. Swanson-Wagner et al. 2012 PNAS
Mexico lowland: 9,000 BP
Mexico highland: 6,000 BP
S. America lowland: 6,000 BP
S. America highland: 4,000 BP
Matsuoka et al. 2002; Piperno 2006; Perry et al. 2006; Piperno et al. 2009
Mexico (photo by Monthon Wachirasettakul)
Andes (photo by Matt Hufford)
[Figure: lowland vs. highland comparisons in South America (SA) and Mexico (MEX) for four traits: ear height, plant height, tassel branch number, days to anthesis.]
Beissinger et al. Unpublished
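[Figure 2: schematics of three demographic models (Model IA, Model IB, Model II). Populations shown: Mexico lowland, Mexico highland, mexicana, S. America lowland, S. America highland. Population sizes: $N_A$, $N_B$, $N_C$, $N_1$, $N_2$, $N_{2P}$, with $N_{mex}$ added in Model IB and $N_3$, $N_4$, $N_{4P}$ in Model II. Times: $t_D$, $t_E$, $t_F$, with $t_{mex}$ added in Model IB and $t_G$ in Model II.]

Models IA and IB:
$N_C = \alpha N_A$, $N_1 = \beta N_C$, $N_2 = (1 - \beta) N_C$, $N_{2P} = \gamma N_2$

Model II:
$N_C = \alpha N_A$, $N_1 = \beta_1 N_C$, $N_2 = (1 - \beta_1) N_C$, $N_3 = \beta_2 N_2$, $N_4 = (1 - \beta_2) N_2$, $N_{4P} = \gamma N_4$

Figure 2 Demographic models of maize low- and high-land populations. Parameters in bold were estimated in this study. See text for details.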
A HWE cut-off of $P < 0.005$ was used for each subpopulation due to our under-calling of heterozygotes. In total, we included 18,745 silent SNPs for the Mexican populations in Models IA and IB, 14,508 for the S. American populations in Model I and 11,305 for the Mexican lowland population and the S. American populations in Model II. We obtained similar results under more or less stringent thresholds for significance ($P < 0.05 \sim 0.0005$; data not shown), though the number of SNPs was very small at $P < 0.005$. Demographic parameters were inferred with the software ∂a∂i (Gutenkunst et al. 2009), which uses a diffusion method to calculate an expected JFD and evaluates the likelihood of the data using a multinomial assumption.

Model IA: This model is applied to the Mexican and S. American populations. We assume the ancestral diploid population representing parviglumis follows a standard Wright-Fisher model with constant size. The size of the ancestral population is denoted by $N_A$. At $t_D$ generations ago, the bottleneck event begins at domestication, and at $t_E$ generations ago, the bottleneck ends. The population size and duration of the bottleneck are denoted by $N_B$ and $t_B = t_D - t_E$, respectively. The population size recovers to $N_C = \alpha N_A$ in the lowlands. Then, the highland population is differentiated from the lowland population at $t_F$ generations ago. The size of the low- and highland populations at time $t_F$ is determined by a parameter $\beta$ such that the population is divided into $\beta N_C$ and $(1 - \beta) N_C$. We assume that the population size in the lowlands is constant but that the highland population experiences exponential expansion after divergence: its current population size is $\gamma$ times larger than that at $t_F$.

[Draft comment: isn’t this really a shrinking population in the lowlands, since $\beta N_C < N_C$? wouldn’t we want instead for lowlands to stay at $N_C$ and a new population branching off? how much do we worry about this? Reply: actually, our conclusion holds when I assumed the pop size of lowlands stays at $N_C$. However, the likelihood is a bit better in my original model.]

Model IB: We expand Model IA for the Mexican populations by incorporating admixture from the teosinte mexicana to the highland Mexican maize population.

[Draft comment: do we say ”Mexico population” or ”Mexican” (and thus ”South American”) ”population” throughout? as long as we’re consistent probably OK either way. Reply: vote to Mexican population second]

The time of differentiation between parviglumis and mexicana occurs at $t_{mex}$ generations ago. The mexicana population size is assumed to be constant at $N_{mex}$. At $t_F$ generations ago, the Mexican highland population is derived from admixture between the Mexican lowland population and a portion $P_{mex}$ from the teosinte mexicana.

Model II: The final model is for the Mexican lowland, S. American lowland and highland populations. This model was used for simulating SNPs with ascertainment bias (see below). At time $t_F$, the Mexican and S. American lowland populations are differentiated, and the sizes of populations after splitting are determined by $\beta_1$. At time $t_G$, S. American lowland and highland populations are differentiated, and the sizes of populations at this time are determined by $\beta_2$. As in Model IA, the S. American highland population is assumed to experience population growth with the parameter $\gamma$.

Estimates of a number of our model parameters were available from previous work. $N_A$ was set to 150,000 using estimates of the composite parameter $4 N_A \mu \sim 0.018$ from parviglumis (Eyre-Walker et al. 1998; Tenaillon et al. 2001, 2004; Wright et al. 2005; Ross-Ibarra et al. 2009) and an estimate of the mutation rate $\mu \sim 3 \times 10^{-8}$ (Clark et al. 2005) per site per generation. The severity of the domestication bottleneck is represented by $k = N_B / t_B$ (Eyre-Walker et al. 1998; Wright et al. 2005), and following Wright et al. (2005) we assumed $k = 2.45$ and $t_B$ = 1,000 generations. Taking into account archaeological evidence (Piperno et al. 2009), we assume $t_D$ = 9,000 and $t_E$ = 8,000. We further assumed $t_F$ = 6,000 for Mexican populations in Models IA and IB (Piperno 2006), $t_F$ = 4,000 for S. American populations in Model IA (Perry et al. 2006; Grobman et al. 2012), and $t_{mex}$ = 60,000, $N_{mex}$ = 160,000 (Ross-Ibarra et al. 2009), and $P_{mex} = 0.2$ (van Heerwaarden et al. 2011) for Model IB. For both Models IA and IB, we inferred three parameters ($\alpha$, $\beta$ and $\gamma$), and, for Model II, we fixed $t_F$ = 6,000 and $t_G$ = 4,000 (Piperno 2006; Perry et al. 2006; Grobman et al. 2012) and estimated the remaining four parameters ($\alpha$, $\beta_1$, $\beta_2$ and $\gamma$).

[Draft comment: tF for model II is listed as 4,000 and 6,000 above. 6,000 is the number that matches the lit best. is that what was used? if so, we should cite (Grobman et al. 2012). Reply: fixed]
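The fixed parameters quoted above follow from two short pieces of arithmetic, sketched here as a check. The function names are hypothetical; the input values are the ones stated in the text (4N_Aµ of about 0.018, µ of about 3 × 10⁻⁸, k = 2.45, t_D = 9,000, t_E = 8,000).

```python
def ancestral_size(theta: float, mu: float) -> float:
    """Solve theta = 4 * N_A * mu for N_A."""
    return theta / (4.0 * mu)


def bottleneck_duration(t_d: float, t_e: float) -> float:
    """Duration of the bottleneck, t_B = t_D - t_E (generations)."""
    return t_d - t_e


def bottleneck_size(k: float, t_b: float) -> float:
    """Solve k = N_B / t_B for N_B."""
    return k * t_b


n_a = ancestral_size(theta=0.018, mu=3e-8)
t_b = bottleneck_duration(t_d=9_000, t_e=8_000)
n_b = bottleneck_size(k=2.45, t_b=t_b)

print(f"N_A ~ {n_a:,.0f}")
print(f"t_B = {t_b:,.0f} generations")
print(f"N_B ~ {n_b:,.0f}")
```

The first line recovers the stated N_A of 150,000; the implied bottleneck size N_B is not given explicitly in the excerpt and is simply what k and t_B jointly imply.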
Differentiation between low- and highland populations

We used our inferred demographic model to generate a null distribution of FST. As implemented in ∂a∂i (Gutenkunst
Mexico Lowland
Mexico Highland
NA
NB
NC
N1 N2
N2P
tD tE
tF
NA
NB
NC
N1 N2
N2P
tD tE
tF
tmex
Nmex
NA
NB
NC
N1 N2
tD tE
tF
N3 N4
NC �ĮNA
N1 �ȕNC
N2 ����ȕ�NC
N2P� �ȖN2
NC �ĮNA
N1 �ȕNC
N2 ����ȕ�NC
N2P� �ȖN2
NC �ĮNA
N1 �ȕ1NC
N2 ����ȕ1�NC
N3 �ȕ2N2
N4 ����ȕ2�N2
N4P �ȖN4
tG
N4P Lowland Highland mexicana Mexico
Lowland SA
Lowland SA
Highland
Model IA Model IB Model II
Figure 2 Demographic models of maize low- and high-land populations. Parameters in bold were estimated inthis study. See text for details.
A HWE cut-off of P < 0.005 was used for each subpopu-lation due to our under-calling of heterozygotes. In total, weincluded 18,745 silent SNPs for the Mexican populations inModels IA and IB, 14,508 for the S. American populations inModel I and 11,305 for the Mexican lowland population andthe S. American populations in Model II. We obtained similarresults under more or less stringent thresholds for significance(P < 0.05 ⇠ 0.0005; data not shown), though the number ofSNPs was very small at P < 0.005. Demographic parameterswere inferred with the software �a�i (Gutenkunst et al. 2009),which uses a diffusion method to calculate an expected JFDand evaluates the likelihood of the data using a multinomialassumption.Model IA: This model is applied to the Mexican and S. Amer-ican populations. We assume the ancestral diploid popula-tion representing parviglumis follows a standard Wright-Fishermodel with constant size. The size of the ancestral popula-tion is denoted by NA. At tD generations ago, the bottleneckevent begins at domestication, and at tE generations ago, thebottleneck ends. The population size and duration of the bot-tleneck are denoted by NB and tB = tD � tE , respectively.The population size recovers to NC = ↵NA in the lowlands.Then, the highland population is differentiated from the low-land population at tF generations ago. The size of the low- andhighland populations at time tF is determined by a parameter� such that the population is divided by �NC and (1� �)NC .We assume that the population size in the lowlands is constantbut that the highland population experiences exponential ex-pansion after divergence: its current population size is � timeslarger than that at tF .isn’t this really a shrinking population in the lowlands, since �NC < NC ? wouldn’t
we want instead for lowlands to stay at NC and a new population branching off? how
much do we worry about this? actually, our conclusion holds when Iassumed the pop size of lowlands stays at NC . However, the
likelihood is a bit better in my original model.Model IB: We expand Model IA for the Mexican populationsby incorporating admixture from the teosinte mexicana to thehighland Mexican maize population. do we say ”Mexico population” or
”Mexican” (and thus ”South American”) ”population” throughout? as long as we’re
consistent probably OK either way. vote to Mexican population second
The time of differentiation between parviglumis and mexicanaoccurs at tmex generations ago. The mexicana population sizeis assumed to be constant at Nmex. At tF generations ago,the Mexican highland population is derived from admixturebetween the Mexican lowland population and a portion Pmex
from the teosinte mexicana .
Model II: The final model is for the Mexican lowland, S.American lowland and highland populations. This modelwas used for simulating SNPs with ascertainment bias (seebelow). At time tF , the Mexican and S. American lowlandpopulations are differentiated, and the sizes of populationsafter splitting are determined by �1. At time tG, S. Amer-ican lowland and highland populations are differentiated,and the sizes of populations at this time are determined by�2. As in Model IA, the S. American highland population isassumed to experience population growth with the parameter �.
Estimates of a number of our model parameters were available from previous work. NA was set to 150,000 using estimates of the composite parameter 4NAµ ≈ 0.018 from parviglumis (Eyre-Walker et al. 1998; Tenaillon et al. 2001, 2004; Wright et al. 2005; Ross-Ibarra et al. 2009) and an estimate of the mutation rate µ ≈ 3 × 10^−8 (Clark et al. 2005) per site per generation. The severity of the domestication bottleneck is represented by k = NB/tB (Eyre-Walker et al. 1998; Wright et al. 2005), and following Wright et al. (2005) we assumed k = 2.45 and tB = 1,000 generations. Taking into account archaeological evidence (Piperno et al. 2009), we assume tD = 9,000 and tE = 8,000. We further assumed tF = 6,000 for Mexican populations in Models IA and IB (Piperno 2006), tF = 4,000 for S. American populations in Model IA (Perry et al. 2006; Grobman et al. 2012), and tmex = 60,000, Nmex = 160,000 (Ross-Ibarra et al. 2009), and Pmex = 0.2 (van Heerwaarden et al. 2011) for Model IB. For both Models IA and IB, we inferred three parameters (α, β and γ), and, for Model II, we fixed tF = 6,000 and tG = 4,000 (Piperno 2006; Perry et al. 2006; Grobman et al. 2012) and estimated the remaining four parameters (α, β1, β2 and γ).
[Draft comment: tF for Model II is listed as 4,000 and 6,000 above. 6,000 is the number that matches the literature best. Is that what was used? If so, we should cite (Grobman et al. 2012). Fixed.]
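The fixed parameters above follow from simple arithmetic, which can be sketched as below. The values of 4NAµ, µ, k and tB are taken from the text; the function and variable names are illustrative only, and the bottleneck size NB = k × tB is derived here from the stated definition k = NB/tB rather than quoted from the source.

```python
# Solve the composite parameter theta = 4 * N_A * mu for the ancestral
# effective population size N_A, using the estimates quoted in the text.
theta = 0.018   # 4 * N_A * mu, estimated from parviglumis
mu = 3e-8       # mutation rate per site per generation


def ancestral_ne(theta: float, mu: float) -> float:
    """Return N solving theta = 4 * N * mu."""
    return theta / (4.0 * mu)


n_a = ancestral_ne(theta, mu)

# Bottleneck severity is defined as k = N_B / t_B, so N_B = k * t_B.
k = 2.45
t_b = 1000      # bottleneck duration in generations
n_b = k * t_b

print(round(n_a), round(n_b))
```

With these inputs NA comes out at about 150,000, matching the value stated in the text.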
Differentiation between low- and highland populations
We used our inferred demographic model to generate a null distribution of FST. As implemented in ∂a∂i (Gutenkunst …
Table 2 Inference of demographic parameters

Mexico           Model I      Model II
Likelihood       −5592.80     −4654.79
α                0.92         1.5
β                0.38         0.76
γ                1            1

South America    Model I      Model III
Likelihood       −3855.28     −8044.71
α                0.52         1.0
β                0.97         -
β1               -            0.64
β2               -            0.95
γ                88           54
Population structure
We performed a STRUCTURE analysis (Pritchard et al. 2000; Falush et al. 2003) of our landrace sample, varying the number of groups from K = 2 to 6 (Figure 1, Figure S3). Most landraces were assigned to groups consistent with a priori population definitions, but admixture between highland and lowland populations was evident at intermediate elevations (~1700 m). Consistent with previously described scenarios for maize diffusion (Piperno 2006), we find evidence of shared ancestry between lowland Mexican maize and both Mexican highland and S. American lowland populations. Pairwise FST among populations reveals low overall differentiation (Table 1), and the higher FST values observed in S. America are consistent with decreased admixture seen in STRUCTURE. Archaeological evidence supports a more recent colonization of the highlands in S. America (Piperno 2006; Perry et al. 2006; Grobman et al. 2012), suggesting that the observed differentiation may be the result of a stronger bottleneck during colonization of the S. American highlands.
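To make the pairwise FST comparison concrete, here is a minimal sketch of Wright's FST for a single biallelic SNP in two populations. It assumes equal weighting of the two populations and known allele frequencies; it is not necessarily the estimator used in the paper, and the frequencies in the usage lines are hypothetical.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's FST = (H_T - H_S) / H_T for one biallelic SNP.

    p1 and p2 are the frequencies of the same allele in two populations,
    which are weighted equally (an assumption of this sketch).
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)            # total expected heterozygosity
    if h_t == 0.0:
        return 0.0                               # monomorphic site: no differentiation
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    return (h_t - h_s) / h_t


# Hypothetical lowland and highland allele frequencies.
print(fst_two_pops(0.5, 0.5))
print(fst_two_pops(0.2, 0.8))
```

Identical frequencies give 0 and fixed differences give 1, so low genome-wide values correspond to the "low overall differentiation" described above.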
Population differentiation under inferred demography
To provide a null expectation for allele frequency differentiation, we used the joint site frequency distribution (JFD) of lowland and highland populations to estimate parameters of two demographic models using the maximum likelihood method implemented in ∂a∂i (Gutenkunst et al. 2009). All models incorporate a domestication bottleneck (Wright et al. 2005) and population differentiation between lowland and highland populations, but differ in their consideration of admixture and ascertainment bias (Figure 2; see Materials and Methods for details).
Estimated parameter values are listed in Table 2; while the observed and expected JFDs were quite similar for both models, residuals indicated an excess of rare variants in the observed JFDs in all cases (Figure 3). Under both models IA and IB, we found expansion in the highland population in Mexico to be unlikely, but a strong bottleneck followed by population expansion is supported in S. American maize in both models IA and II. The likelihood value of model IB was higher than the likelihood of model IA by 850 units of log-likelihood (Table 2), consistent with analyses suggesting that introgression from mexicana played a significant role during the spread of maize into the Mexican highlands (Hufford et al. 2013).
In addition to the parameters listed in Figure 2, we investigated the impact of varying the domestication bottleneck size (NB). Surprisingly, NB was estimated to be equal to NC, the population size at the end of the bottleneck, and the likelihood of NB < NC was much smaller than for alternative parameterizations (Table 2, S2). This result appears to contradict earlier …
Figure 3 Observed and expected joint distributions of minor allele frequencies in low- and highland populations in (A) Mexico and (B) S. America. Residuals are calculated as (model − data)/√model. [Panels: Observation, Expectation and Residual for Models IA and IB (Mexico) and Models IA and II (South America); axes: Lowlands and Highlands; density scale 10^−4 to 10^−1; residual scale −40 to 40.]
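The residual defined in the Figure 3 caption, (model − data)/√model, can be computed cell by cell over a joint frequency table. The sketch below uses plain Python lists and made-up counts; the function name and the choice to return None for cells with no model expectation are assumptions of this example, not part of ∂a∂i.

```python
import math


def poisson_residuals(model, data):
    """Return (model - data) / sqrt(model) for each cell of two 2D tables.

    Both tables must have the same shape. Cells whose model expectation is
    not positive are returned as None, since the residual is undefined there.
    """
    residuals = []
    for model_row, data_row in zip(model, data):
        residuals.append([
            (m - d) / math.sqrt(m) if m > 0 else None
            for m, d in zip(model_row, data_row)
        ])
    return residuals


# Hypothetical expected and observed SNP counts in a 2 x 2 corner of a JFD.
expected = [[100.0, 25.0], [16.0, 4.0]]
observed = [[110.0, 20.0], [16.0, 8.0]]
res = poisson_residuals(expected, observed)
print(res)
```

Under this sign convention a negative residual means the data exceed the model, so an excess of rare variants in the observed JFD appears as negative residuals in the low-frequency cells.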
[Figure: joint allele frequency density for Mexico, lowlands vs. highlands; observed and expected]
95 samples, ~100K SNPs
Takuno et al. 2015 Genetics
[Figure: −log p-value of Fst in S. America vs. −log p-value of Fst in Mexico; SNPs marked as shared, unique to S. America, or unique to Mexico]
Takuno et al. 2015 Genetics
39% 61%
Intergenic, Genic
19% 81%
Standing Variation, New mutation
Pyhäjärvi et al. GBE 2013
Pyhäjärvi et al. GBE 2013
environment allele frequency
Beissinger et al. 2016 Nature Plants (pending rev)
[Figure: nucleotide diversity vs. distance to nearest substitution (cM)]
hard sweeps in genes play minor role in Zea
Wallace et al. 2014 PLoS Genetics; Rodgers-Melnick et al. 2016 PNAS
GWAS candidate SNPs
Variance Partitioning
how to adapt: Zea mays
M T G P H R L
GGTAAA ATG ACT GGT CCA CAT CGA CTG TAG
noncoding/regulatory variation
multiple mutations
“soft” sweeps
standing variation
Sattath et al. 2011 PLoS Gen.; Williamson et al. 2014 PLoS Gen.; Hernandez et al. 2011 Science; Ross-Ibarra et al. 2009 Genetics
[Figure: diversity vs. distance from substitution, four panels]
20% nonsyn. adaptive    10% nonsyn. adaptive
50% nonsyn. adaptive    40% nonsyn. adaptive
µ ∝ 2,500 Mbp    µ ∝ 3,100 Mbp
µ ∝ 130 Mbp    µ ∝ 220 Mbp
Pyhäjärvi et al. GBE 2013
[Figure: enrichment (no to yes) for intergenic, synonymous and nonsynonymous SNPs]
larger genomes enriched in noncoding adaptive variants
[Figure: excess adaptive SNPs; enrichment axis running from intergenic to coding]
Hancock et al. 2011 Science; Fraser et al. 2013 Gen. Research
WHAT IS A TE?
Credit: Robert Martienssen, CSHL
tb1: Hopscotch (Doebley 2004; Studer et al., 2011)
ZmCCT: CACTA (Yang et al., 2013)
KNOTTED1 (kn1): Mu (Greene et al., 1994) http://pmb.berkeley.edu/sites/default/files/users/Knotted1%20mutant.jpg
Makarevitch et al. 2015 PLoS Genetics
new insertions activate expression
Makarevitch et al. 2014 bioRxiv
[Figure: Log2(stress/control) expression in lines with the TE insertion vs. lines without the TE insertion, for genes GRMZM2G071206, GRMZM2G400718, GRMZM2G102447, GRMZM2G108057 and GRMZM2G108149; TE presence (+/−) per gene in Oh43, B73 and Mo17]
[Figure: percent of conserved genes by TE family (alaw, dagaf, etug, flip, gyma, ipiki, jeli, joemon, naiba, nihep, odoj, pebi, raider, riiryl, ubel, uwum, Zm00346, Zm02117, Zm03238, Zm05382) under Salt, UV, Heat and Cold stress]
Fedoroff 2012, Wang and Dooner 2006
Homologous (loop) 34%; Nonhomologous 46%; No pairing 20%
Maguire 1966 Genetics
Fang et al. Genetics 2012; Pyhäjärvi et al. GBE 2013
Figure S4 LD in chromosome 9 among mexicana populations based on SNPs with minor allele frequency >0.1.
Inv9d, Inv9e
[Figure: inversion frequency (0.0 to 0.8) vs. elevation (0 to 2000 m) for Inv4n]
Inv1n
Lauter et al. 2004 Genetics
Inv4n
mexicana parviglumis
… Nielsen 2004; Nielsen et al. 2005; McVean 2007). However, the largest sweep identified in maize to date is only 1.1 Mb (Tian et al. 2009), and both the age of the inversion and common tests for departures from neutrality do not provide evidence of strong selection. Another alternative explanation would be the presence of strong negative interactions between distantly linked loci, potentially due to synthetic lethality (Boone et al. 2007). Such interactions should not generate extended patterns of elevated LD among intervening SNPs, as crossing over among haplotypes not carrying alleles involved in the negative interaction should not be affected. Both selective sweeps and negative interactions are inconsistent with the presence of only two major haplotypes in the Inv1n region and fail to explain the clinal variation in haplotype frequencies seen at Inv1n-I.
To our knowledge, the only prior evidence for Inv1n is a report of high LD and high FST from a much smaller sample of parviglumis (Hufford et al. 2012), but a number of other large inversions have been previously reported in mays and its wild relatives (Ting 1965, 1967, 1976; Maguire 1966; Kato 1975). These include an ~50-Mb inversion on the long arm of chromosome 3 in Z. luxurians (Ting 1965) and an ~35-Mb inversion that covers most of the short arm of chromosome 8 in both mays (McClintock 1960) and mexicana (Ting 1976). While some of these inversions were experimentally induced (McClintock 1931; Morgan 1950), several have also been identified in natural populations of multiple taxa (Kato 1975; Ting 1976).
One of the factors that may limit the geographic spread of large inversions is the potential fitness cost of crossing over. The frequency of chromosome loss is dependent on the inversion size and efficiency of synapsis over the inverted region (Burnham 1962; Maguire and Riess 1994; Lamb et al. 2007). When gene density is low, such as in pericentromeric regions, or there is a lack of continuous homology, chromosomes will often synapse in a nonhomologous manner without recombination (McClintock 1933). In maize, for example, an inversion on the long arm of chromosome 1 similar in size to Inv1n (19 cM) was seen to undergo homologous pairing in only about one-third of cases (Maguire 1966). Since Inv1n is located in a pericentromeric region with low gene density and covers a short genetic distance (2–13 cM), we anticipated that it would rarely pair and recombine with a noninverted chromosome. Our data are consistent with these arguments. We observed repressed recombination around Inv1n and no cytological evidence of crossing over in inversion heterozygotes. SNP data indicate no deviations from expected Hardy–Weinberg genotype frequencies at Inv1n, and we see no obvious evidence of effects on fertility. Given these observations, we suspect that inversion polymorphisms may be relatively common in natural plant populations, especially in regions of the genome with low recombination rates such as pericentromeres. Low recombination has also been offered as an explanation for the lack of underdominance in many pericentromeric inversions in Drosophila (Coyne et al. 1993). As dense genotyping becomes more cost effective, we predict that numerous common inversions will be identified in natural populations of Zea and other organisms.
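The Hardy–Weinberg check mentioned above can be illustrated with a standard chi-square goodness-of-fit statistic on genotype counts. This is a generic sketch, not the test reported by the authors, and the genotype counts in the usage lines are hypothetical (standard and inverted arrangements treated as two alleles).

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing genotype counts with Hardy-Weinberg
    expectations computed from the sample allele frequency (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)    # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts of standard/standard, standard/inverted, inverted/inverted plants.
print(hwe_chi_square(25, 50, 25))   # counts exactly at HWE proportions
print(hwe_chi_square(50, 0, 50))    # complete absence of heterozygotes
```

A statistic below about 3.84 (the 5% critical value for 1 degree of freedom) gives no evidence of deviation; a strong heterozygote deficit, as would be expected if inversion heterozygotes suffered a fitness cost, gives a large value.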
Origin and age of Inv1n
Our evidence suggests that Inv1n-I is the derived, inverted arrangement. Inv1n-I is not found in Tripsacum or Zea taxa except for parviglumis and mexicana (Figure 3C), and, unlike in Inv1n-S, all SNPs private to Inv1n-I are derived in …
Figure 5 (A) Bayes factors for correlation between allele frequencies and altitude in 33 natural parviglumis populations. Inv1n is indicated by red vertical lines. The 99th percentile of the distribution of Bayes factors is indicated by a horizontal dashed line. Chromosomes 1–10 are plotted in order and in different colors. (B) Association between all SNPs and culm diameter. SNPs significant at 5% FDR are above the dashed line.
Fang et al. Genetics 2012 Hufford et al. PLoS Genetics 2013
culm diameter
macrohairs, anthocyanin
Inv1n
Pyhäjärvi et al. GBE 2013
El Porvenir
Opopeo
Xochimilco
Puruandiro
Tenango del Aire
Ixtlan
Nabogame
Santa Clara
San Pedro
Allopatric
Inv4n
Fst high vs. low elevation maize
Hufford et al. PLoS Gen 2013
4% of B73
~8% absent
$\theta_\pi \sum_{i=1}^{n-1} \frac{1}{i} = S$
reference genome: ~70% low copy sequence
θπ: ~8% pairwise diff
1 − S: % pan-genome in ref
[Figure axes: % reads; unmapped reads]
Gore et al. 2009 Science; Chia et al. 2012 Nat Gen
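The relation on this slide, θπ multiplied by the sum of 1/i for i from 1 to n − 1 equals S, is the standard expectation linking pairwise diversity to the proportion of segregating sites in a sample of n sequences. A minimal sketch follows; the value θπ = 0.08 is taken from the slide, while the sample sizes are hypothetical and chosen only to show how S grows with n.

```python
def harmonic_sum(n: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1, for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating(theta_pi: float, n: int) -> float:
    """Expected proportion of segregating sites, S = theta_pi * a_n."""
    return theta_pi * harmonic_sum(n)


theta_pi = 0.08                      # ~8% pairwise differences (from the slide)
for n in (2, 4, 10):                 # hypothetical numbers of sampled lines
    s = expected_segregating(theta_pi, n)
    print(n, round(s, 4), round(1.0 - s, 4))
```

For n = 2 the sum is 1, so S equals θπ itself; as more lines are sampled, S rises and 1 − S falls accordingly.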
[Figure, panels A–D: percentage (0–100%) for traits Angle, Length, NLB, SLB and Width across variant classes (10kb RDV, Gene RDV, HapMap2 genic, HapMap2 intergenic, HapMap1 genic, HapMap1 intergenic); fold enrichment (0–3.5) of HapMapV1 and HapMapV2 SNPs by annotation (intergenic, intronic, 500bp upstream, 500bp downstream, 3' UTR, non-syn coding, 5' UTR, splice site, syn coding); −log10 p-value vs. position along Chr 1 (Mb)]
fold enrichment
Renny-Byfield et al. In Prep
Chr 6 (Mb)
NOR repeat array
Bilinski et al. In Prep
[Figure: 1C genome size (Gb, 2.50–3.75) for MH, ML, SAH, SAL, mexicana and parviglumis, grouped by altitude (highland, lowland)]
mixed model for selection on genome size
genome size = mean + slope (selection) × altitude + kinship + error
β1 < 0: 11 Mb decrease per 100 meters gained
Bilinski et al. In Prep
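The structure of the mixed model on this slide (genome size regressed on altitude, with kinship as a random effect) can be sketched as a generalized least squares fit. This is an illustration only and not the authors' analysis: the variance components are assumed known here, whereas a real analysis would estimate them (for example by REML), and the altitudes, kinship matrix and noiseless genome sizes below are made up to reproduce a slope of −0.11 Mb per meter (11 Mb per 100 m).

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists; b is an n x m list of lists (m right-hand sides).
    """
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + m):
                    aug[r][c] -= f * aug[col][c]
    return [[aug[i][n + j] / aug[i][i] for j in range(m)] for i in range(n)]


def gls_fit(altitude, genome_size, kinship, var_g, var_e):
    """Return (intercept, slope) under covariance V = var_g * K + var_e * I."""
    n = len(altitude)
    v = [[var_g * kinship[i][j] + (var_e if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # z holds V^-1 applied to the columns [1, altitude, genome_size].
    z = solve(v, [[1.0, altitude[i], genome_size[i]] for i in range(n)])
    s11 = sum(row[0] for row in z)
    s1x = sum(altitude[i] * z[i][0] for i in range(n))
    sxx = sum(altitude[i] * z[i][1] for i in range(n))
    s1y = sum(row[2] for row in z)
    sxy = sum(altitude[i] * z[i][2] for i in range(n))
    det = s11 * sxx - s1x * s1x                  # 2 x 2 normal equations
    return (sxx * s1y - s1x * sxy) / det, (s11 * sxy - s1x * s1y) / det


# Hypothetical data: six populations, related in pairs, genome size in Mb.
altitude = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0]
genome_size = [3000.0 - 0.11 * a for a in altitude]
kinship = [[1.0 if i == j else (0.5 if i // 2 == j // 2 else 0.0)
            for j in range(6)] for i in range(6)]
intercept, slope = gls_fit(altitude, genome_size, kinship, var_g=1.0, var_e=1.0)
print(round(intercept, 3), round(slope * 100, 3))   # Mb, and Mb per 100 m
```

The kinship term matters because related populations are not independent observations; weighting by the inverse covariance keeps shared ancestry from being mistaken for an effect of altitude.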
bp of knob
Rayburn et al. 1994 Plant Breeding; Francis et al. 2008 Ann. Bot.
… cycle time that did not exceed 20 h compared with a much greater spread of cycle times for the monocots. If DNA mass per se is the limiting factor for cell cycle time, we hypothesize that cycle times would be the same for dicots and monocots of comparable C-value. This is so even if the data for Scilla sibirica and Trillium grandiflorum are excluded. Indeed, if we ignore the marked discontinuity of the y-axis caused by their inclusion, then the nucleotypic effect is strong for all species regardless of phylogeny. To test the rigour of these hypotheses would require data to plug the gap between Trillium grandiflorum and the majority of C-value/cell cycle times analysed here.
Separate plots for diploids and polyploids show a strong nucleotypic effect on CCT in diploids (Fig. 3; Table 2). Removing the five diploid outliers (>25 pg) reduced the slope (b = 0.27) by approximately four-fold but the regression continued to be significant (P < 0.001). For the polyploids, a nucleotypic effect on CCT was also detected (Fig. 3; Table 2); however, removing the two polyploid outliers rendered the regression non-significant (y = 0.03x − 13.5). This confirms previous work in which the slope/rate of increase in CCT with increasing DNA was higher in diploids than in autopolyploids (Evans et al., 1972). With the exception of Scilla sibirica, CCT in polyploids is generally more buffered than in diploids (Fig. 3).
We acknowledge that some traditionally classified diploids are not necessarily so (see Soltis and Soltis, 1999). For example, there are strong arguments that Zea mays is actually an allotetraploid (2n = 4x = 20; Gaut and Doebley, 1997). However, in the data reported here we have assigned ploidy level as listed by the authors of the papers and reviews we have consulted.
The longest CCTs (>20 h) are exhibited by the perennials (Fig. 4). Indeed, the data for perennials overall had a nearly seven-fold steeper slope (b = 1.37) than a comparable regression for annuals (b = 0.20; Table 2). These data are consistent with findings of Bennett (1972) where the mean CCT in 19 annuals was significantly shorter than in eight obligate perennials. Where our analyses differ from Bennett (1972) is in relation to the broad range of CCTs shown by perennials compared with annuals (Fig. 4). However, in Fig. 4 the longer CCTs …
FIG. 3. DNA C-value (pg) and cell cycle time (h) in the root apical meristem of a range of diploid and polyploid angiosperms. See Table 2 for regression analyses.
FIG. 2. DNA C-value (pg) and cell cycle time (h) in the root apical meristem of a range of (A) eudicots and monocots (n = 110), and (B) eudicots (n = 60). See Table 2 for regression analyses.
TABLE 2. Regression analyses of all data presented in Figs. 2–4 together with the percentage variance accountedfor by the regression (R2), the level of probability (P) for each regression

                    Regression (y = bx + a)    R2       P      n
All measurements    y = 1.09x + 5.39           54.2     ***    110
Monocots            y = 1.29x + 2.44           58.7     ***    48
Eudicots            y = 0.32x + 10.2           15.4     ***    62
Diploids            y = 1.04x + 4.95           49.86    ***    86
Polyploids          y = 1.14x + 3.12           56.3     ***    24
Annuals             y = 0.20x + 10.7           19.9     ***    75
Perennials          y = 1.37x + 4.13           63.6     ***    35

*** P < 0.001; n, number of replicates.
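The regressions in Table 2 can be applied directly to predict cell cycle time (h) from DNA C-value (pg). The sketch below copies the slopes and intercepts from the table; the dictionary, the function name and the 5 pg C-value used in the example are illustrative assumptions, not values from the paper.

```python
# (slope b, intercept a) for y = b*x + a, with x = C-value (pg) and y = cell cycle time (h).
REGRESSIONS = {
    "All measurements": (1.09, 5.39),
    "Monocots": (1.29, 2.44),
    "Eudicots": (0.32, 10.2),
    "Diploids": (1.04, 4.95),
    "Polyploids": (1.14, 3.12),
    "Annuals": (0.20, 10.7),
    "Perennials": (1.37, 4.13),
}


def predicted_cct(group: str, c_value_pg: float) -> float:
    """Predicted cell cycle time in hours for a given group and C-value."""
    slope, intercept = REGRESSIONS[group]
    return slope * c_value_pg + intercept


# Hypothetical 5 pg genome under the monocot and annual regressions.
print(round(predicted_cct("Monocots", 5.0), 2))
print(round(predicted_cct("Annuals", 5.0), 2))

# Ratio of the perennial to the annual slope, described in the text as nearly seven-fold.
print(round(REGRESSIONS["Perennials"][0] / REGRESSIONS["Annuals"][0], 2))
```

Because every slope is positive, a smaller genome predicts a shorter cell cycle within each group, though the R2 values show that the strength of this relationship varies considerably between groups.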
[Figure: number of plants (0–30) vs. DNA (100–110) for late flowering and early flowering plants, cycle 0 and cycle 6]
smaller genome, faster development?
Bilinski et al. In Prep
• Adaptation in maize occurs from standing variation and targets regulatory variants
• Large genomes may have more targets, more standing variation, and more regulatory adaptation
• Adaptation in complex plant genomes likely involves many kinds of variation including transposable elements, inversions, copy number variation, and even genome size?
Evolutionary Genetics in a Complex Genome
Kew C-Value Database
photo by lady_lbrty
Acknowledgments
Maize Diversity Group: Peter Bradbury; Ed Buckler; John Doebley; Theresa Fulton; Sherry Flint-Garcia; Jim Holland; Sharon Mitchell; Qi Sun; Doreen Ware
Collaborators: CSI Davis; Nathan Springer
Lab Alumni: Tim Beissinger (USDA-ARS, Mizzou); Kate Crosby (Monsanto); Matt Hufford (Iowa State); Tanja Pyhäjärvi (Oulu); Shohei Takuno (Sokendai); Joost van Heerwaarden (Wageningen)