Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Thinking about domestication bottlenecks Jeffrey Ross-Ibarra www.rilab.org @jrossibarra rossibarra January 12, 2014


Some thoughts on bottlenecks and a bit of data from maize.


Page 1: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Thinking about domestication bottlenecks

Jeffrey Ross-Ibarra, www.rilab.org

@jrossibarra, rossibarra

January 12, 2014

Page 2: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Diversity lost in domestication bottlenecks

Ross-Ibarra et al. 2007

Page 3: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Bottleneck effects on the SFS

Page 4: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Bottlenecks can mimic selection

[Figure: π (0 to 25) and Tajima's D (−2 to 2) plotted along a 100 kb region]

- Bottlenecks affect mean π and D, but also inflate their variance
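To make the two statistics on this slide concrete, here is a minimal Python sketch that computes π, Watterson's θ, and Tajima's D (using the standard Tajima 1989 variance constants) for one region from the derived-allele count at each segregating site. It assumes unfolded counts and a fixed sample of n sequences at every site; it illustrates the textbook formulas and is not the pipeline behind the figure. The function name and the example regions are hypothetical.

```python
import math

def diversity_stats(derived_counts, n):
    """Return (pi, theta_W, Tajima's D) for one region.

    derived_counts: derived-allele count (1..n-1) at each segregating site
    n: number of sequences sampled (assumed the same at every site)
    """
    s = len(derived_counts)
    # pi: a site with i derived copies differs in i*(n-i) of the n(n-1)/2 pairs
    pi = sum(i * (n - i) for i in derived_counts) / (n * (n - 1) / 2)
    a1 = sum(1.0 / j for j in range(1, n))
    a2 = sum(1.0 / j ** 2 for j in range(1, n))
    theta_w = s / a1  # Watterson's estimator
    # Constants for the variance of (pi - theta_W), Tajima (1989)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    d = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return pi, theta_w, d

# Hypothetical regions from a sample of n = 10 sequences
rare_heavy = [1, 1, 1, 2, 1, 1, 3, 1]      # excess of rare variants
intermediate = [4, 5, 6, 5, 4, 5, 6, 5]    # excess of intermediate-frequency variants
print(diversity_stats(rare_heavy, 10))
print(diversity_stats(intermediate, 10))
```

Applying such a function to many windows (observed or simulated) is what produces distributions like the ones plotted here, so a shift in the mean and a wider spread across loci can both be read off directly.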

Page 5: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Duration and strength of bottleneck confounded? [1]

individuals. The tb1 likelihoods under these two values of Nb are 0.0004 and 0.085, respectively (fig. 2d), and an LR test indicates that these values are significantly different (LR = 10.72; P < 0.01). Hence, not surprisingly, the tb1 data do not fit the “neutral” bottleneck model, but neither do ts2 (LR = 4.97; P < 0.05), d8 (LR = 7.42; P < 0.01), and zagl1 (LR = 22.55; P < 0.001). No LR tests were significant with data from putatively neutral loci (fig. 2a; data not shown).

Second, we formalized differences between the multilocus model and selected genes by simulation. We simulated each selected locus under the multilocus model and then determined the probability that the observed S falls into the 95% confidence interval of the simulated distribution. Under both recombination estimates, we rejected the multilocus scenario for tb1 (p = 0.0002 and p = 0.0003, with 4Nc_hud87 and 4Nc_hud01, respectively), ts2 (p = 0.008 and p = 0.0196), d8 (p = 0.0021 and p = 0.0032), and zagl1 (p = 0.00 and p = 0.00), strongly suggesting that the bottleneck model alone does not account for the evolutionary history of these genes in maize. In contrast, none of the eight putatively neutral genes could be differentiated from the multilocus model by this method (data not shown).

Genetic Diversity and Recombination

Previous studies reported a positive and significant correlation between recombination rate, measured by either 4Nc_hud87 or Wall's estimator (Wall 2000), and nucleotide diversity (θ) in maize (r = 0.65, P = 0.007), based on 18 putatively neutral loci. However, this same correlation was not significant when recombination was measured either by 4Nc_hud01 or by a physical measure of recombination (R) (Tenaillon et al. 2002).

One of the questions we wanted to ask was whether the positive correlation observed in maize was also evident in parviglumis. Among seven neutrally evolving loci for which a 4N̂c_hud87 value could be determined (table 3), we found no significant correlation between 4N̂c_hud87 and θ in parviglumis (r = −0.07, p = 0.56). Similarly, θ in parviglumis is correlated with neither R (r = −0.116; p = 0.37) nor 4N̂c_hud01 (r = −0.25, p = 0.27). By contrast, the correlation between 4N̂c_hud87 and θ in this subset of seven loci was still high in maize (r = 0.58), but not significant (p = 0.32), probably reflecting a lack of power with a small sample.

Using simulation, we explored whether the population bottleneck could generate the positive correlation between 4N̂c_hud87 and θ in maize. For this purpose, we performed 10,000 coalescent simulations under the best conditions defined for each of the seven values of d (fig. 3a). For each condition, the correlation between 4Nc_simul and θ_simul was determined among the seven neutral loci for which 4Nc_hud87 could be estimated in parviglumis. We then compared the distribution of r based on simulation to the observed r (= 0.58) in maize. At best, only 2% of simulations (236 of 10,000) produced a correlation coefficient higher than the observed correlation. It is thus possible, but quite improbable, that bottleneck effects created the observed correlation between 4N̂c_hud87 and θ in maize.

FIG. 4. Likelihood ratio values based on the multilocus likelihood for seven loci (a) or eight loci (b, c) as a function of both bottleneck duration in generations (d) and population size during the bottleneck (Nb). Three recombination conditions were explored: (a) 4Nc_hud87, (b) 4Nc_hud01, and (c) no recombination. The fitting criterion was ±20% of S_maize.

Maize Domestication 1221


- Can estimate k = N_B/d
- Size and duration confounded

[1] Tenaillon et al. 2004 MBE
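One way to see why only k = N_B/d is identifiable: under a deliberately simplified model (a diploid Wright-Fisher population, no mutation, no post-bottleneck recovery; these are assumptions, not the coalescent model of Tenaillon et al.), heterozygosity after d generations at size N_B is H_d/H_0 = (1 − 1/(2N_B))^d ≈ exp(−d/(2N_B)) = exp(−1/(2k)). The sketch below checks that very different (N_B, d) pairs with the same k lose almost the same diversity; the function names are hypothetical.

```python
import math

def heterozygosity_retained(n_b: int, d: int) -> float:
    """Exact drift-only expectation H_d / H_0 = (1 - 1/(2 n_b)) ** d.

    Assumes a diploid Wright-Fisher population of constant size n_b for
    d generations, with no mutation (a simplification for illustration).
    """
    return (1.0 - 1.0 / (2.0 * n_b)) ** d

def retained_from_k(k: float) -> float:
    """Large-n_b approximation exp(-1 / (2k)), which depends only on k = n_b / d."""
    return math.exp(-1.0 / (2.0 * k))

# Three hypothetical bottlenecks sharing k = 0.5 but differing 10-fold in size and duration
for n_b, d in [(500, 1000), (1000, 2000), (5000, 10000)]:
    k = n_b / d
    print(f"N_B={n_b:>5} d={d:>6} k={k:.2f} "
          f"exact={heterozygosity_retained(n_b, d):.4f} approx={retained_from_k(k):.4f}")

# A stronger bottleneck (smaller k) removes far more diversity
print(f"k=0.1: {heterozygosity_retained(100, 1000):.4f}")
```

Under these assumptions, summaries that mainly reflect total drift, such as π, carry information about k but little about N_B and d separately.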

Page 6: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Duration and strength of bottleneck confounded? [1]


[Figure: histogram (count) of π (800 to 1300) across simulations, bottleneck size big vs. small]

[1] Tenaillon et al. 2004 MBE

Page 7: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Duration and strength of bottleneck confounded? [1]

[Figure: histogram (count) of π (800 to 1300) across simulations, constant vs. exponential (exp) model]


[1] Tenaillon et al. 2004 MBE

Page 8: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Duration and strength of bottleneck confounded?

[Figure: histograms (count) across simulations of Tajima's D (0.0 to 1.2), SNPs unique to the domesticate (1000 to 1400), and π (800 to 1300), each for bottleneck size big vs. small]

Page 9: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Bottleneck effects vary over time

Page 10: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Previous estimates of the maize bottleneck

had smaller S than S_maize. In this way, we estimated the expected value and the 95% lower confidence interval of θ_B, conditional on d, θ_A, θ_P, and S_maize. The 95% lower confidence interval of θ_B represents the minimum estimate of θ_B that is statistically consistent with the observed data.

The second model represents two populations, one of which has experienced a domestication bottleneck (Fig. 1). This model contains five parameters of interest: d, the duration of the bottleneck; t, the time the two populations diverged; θ_A, the population parameter of both the ancestral population and the nonbottlenecked population; θ_B, the population parameter during the bottleneck; and θ_P, the current θ of the population that experienced a bottleneck. This model assumes that divergence between populations was rapid, with no gene flow following population divergence. The expected value and lower 95% confidence interval of θ_B was estimated as in model I except that the number of polymorphisms shared between populations (R), rather than the number of segregating sites S, was compared between observed and simulated data. In short, the two coalescent models employ different summary statistics from DNA sequence data to make inferences about θ_B.

For both models, Tajima's D (26) was used as a measure of “goodness-of-fit” between the coalescent model and the observed data. In all simulations, D from observed data was compared with the distribution of D based on simulated data. If the observed D did not fall within the central 95% of the distribution of D, we concluded that the observed data did not fit the coalescent model, given the parameter values used for simulation. All simulations were based on 997 silent sites and sample sizes of 9 for the bottlenecked population (representing maize) and 8 for the nonbottlenecked population (representing the ancestor parviglumis).
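The goodness-of-fit rule in this paragraph (reject a parameter set if the observed D falls outside the central 95% of simulated D) can be expressed in a few lines. A minimal Python sketch, assuming the simulated D values are already available from some coalescent simulator; the evenly spaced list below is a hypothetical stand-in, and `quantile` and `fits_model` are illustrative names, not code from the paper.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

def fits_model(observed_d, simulated_d, level=0.95):
    """Return (fits, (lower, upper)): does observed D lie in the central `level` of simulated D?"""
    vals = sorted(simulated_d)
    tail = (1.0 - level) / 2.0
    lower = quantile(vals, tail)
    upper = quantile(vals, 1.0 - tail)
    return lower <= observed_d <= upper, (lower, upper)

# Hypothetical stand-in for D values from many simulations under one parameter set
sims = [i / 100 for i in range(-200, 201)]
print(fits_model(0.785, sims))   # maize D from Table 2
print(fits_model(1.95, sims))
```

Repeating this check over a grid of (d, θ_B) values gives the set of parameter combinations that remain consistent with the observed D.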

RESULTS

Summary of Adh1 Sequence Variation. Table 2 contains a summary of sequence variation found in the three Zea taxa. As measured by θ̂, Z. mays ssp. parviglumis is the most diverse of the three taxa at the Adh1 locus (Table 2). Furthermore, parviglumis sequences are distributed widely on the genealogy (Fig. 2). One subset of parviglumis sequences forms a monophyletic clade that is an outgroup to the remainder of the Zea Adh1 sequences. This group contains sequences 1b, 4, 5, and 7, and two of these sequences are identical (sequences 4 and 7). The remaining parviglumis sequences (haplotypes 1a, 2, 3, and 6) do not form a distinct clade. It should be noted that there is no clear relationship between genealogical relationships (as inferred from Fig. 2) and geographic origin. For example, sequences parv1a and parv1b came from the same individual, but one sequence is within the monophyletic outgroup clade and the other is not. It should also be noted that recombination is detectable among parviglumis Adh1 sequences (Table 2), and recombination may affect phylogenetic resolution. Nonetheless, the wide distribution of parviglumis alleles on the Adh1 genealogy, coupled with the observation that parviglumis is genetically diverse, is consistent with a historically large population segregating old alleles.

Sequence diversity at the Adh1 locus is consistent with a cultivar–progenitor relationship between maize and parviglumis, for three reasons. First, maize contains less sequence diversity than parviglumis (Table 2). However, the reduction in diversity is not severe: maize contains roughly 75% of the level of genetic diversity found in parviglumis at the Adh1 locus. Second, maize and parviglumis contain a relatively high number of shared polymorphisms (r = 35, of a total of 49 segregating sites in maize), suggesting a recent divergence between taxa. Finally, the Adh1 genealogy suggests that maize sequences represent a subset of parviglumis sequences (Fig. 2).

Patterns of sequence diversity in Z. luxurians differ considerably from those found in parviglumis. For example, Z. luxurians contains the least sequence variation at the Adh1 locus, with roughly 60% of the sequence variation found in maize (Table 2). Furthermore, phylogenetic reconstruction of Adh1 sequences indicates that Z. luxurians sequences form a highly supported monophyletic clade that is distinct from parviglumis and maize sequences (Fig. 2). Finally, Z. luxurians shares relatively few polymorphisms with either parviglumis (r = 11) or maize (r = 9).

Tests of Neutrality. If we are to make reliable demographic inferences about domestication bottlenecks from Adh1 data, it is important that genetic variation at the Adh1 locus is not affected by selection, either directly or indirectly (i.e., through linkage to selected loci). We used two methods to test for the

FIG. 1. Schematic representation of the coalescent models used in simulation. See text for details.

Table 2. Variation at the Adh1 locus

Taxa | n | m | S | θ̂ | θ̂/bp | D | r
parv | 8 | 993 | 63 (1) | 24.30 | 0.0245 | −0.241 | 4
maize | 9 | 997 | 49 (1) | 18.03 | 0.0181 | 0.785 | 3
lux | 7 | 998 | 26 (0) | 10.61 | 0.0106 | 0.258 | 3
all | 24 | 998 | 94 (2) | 25.17 | 0.0252 | 0.241 | 6

n, number of sequences; m, number of silent sites; S, number of segregating silent sites (with number of segregating replacement sites in parentheses); D, Tajima's D, based on silent sites; r, the minimum number of inferred recombination events among sequences.
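The “roughly 75%” and “roughly 60%” statements in the text follow directly from the θ̂/bp column of Table 2. A small Python check, with the values copied from the table and a hypothetical helper name:

```python
# θ̂ per bp (silent sites) copied from Table 2
theta_per_bp = {"parv": 0.0245, "maize": 0.0181, "lux": 0.0106, "all": 0.0252}

def relative_diversity(taxon: str, reference: str) -> float:
    """Diversity of `taxon` as a fraction of diversity in `reference`."""
    return theta_per_bp[taxon] / theta_per_bp[reference]

maize_vs_parv = relative_diversity("maize", "parv")
lux_vs_maize = relative_diversity("lux", "maize")
print(f"maize / parviglumis: {maize_vs_parv:.2f} (loss of {1 - maize_vs_parv:.0%})")
print(f"luxurians / maize:   {lux_vs_maize:.2f}")
```

Maize retains about 74% of parviglumis diversity at Adh1 and luxurians has about 59% of maize diversity, matching the rounded figures quoted in the text.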

FIG. 2. The neighbor-joining reconstruction of Adh1 sequences. Bootstrap values greater than 50% are given above nodes. Abbreviations are given in Table 1.

Evolution: Eyre-Walker et al. Proc. Natl. Acad. Sci. USA 95 (1998) 4443

- Single-locus adh1 estimates [2] reveal loss of diversity
- Refinement of bottleneck and test for selection [3]
- Analysis of ≈ 800 loci: N_B/N_A < 0.01 and 45% diversity loss [4]

[2] Eyre-Walker et al. 1998 PNAS
[3] Tenaillon et al. 2004 MBE
[4] Wright et al. 2005 Science

Page 11: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Genome resequencing: more diversity, exponential growth

[Figure: histogram (count) of π_MZ/π_TEO (0 to 3)]

- HapMap 2 data [5] show a mean loss of diversity of less than 20%

- PSMC [6] analysis of resequenced maize landraces estimates a bottleneck of N_B/N_A ≈ 0.2 and recent explosive growth [7]

[5] Hufford et al. 2012 Nat. Genet.
[6] Li & Durbin 2011 Nature
[7] Takuno et al., unpublished
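A genome-wide “loss of diversity” is 1 − π_MZ/π_TEO, but the number depends on how windows are combined. A minimal Python sketch contrasting the mean of per-window ratios with the ratio of summed π; the window values are hypothetical (not HapMap 2 data), and the source does not say which summary underlies the “< 20%” figure.

```python
def loss_of_diversity(pi_domesticate, pi_wild):
    """Two genome-wide summaries of diversity loss across windows.

    Returns (1 - mean of per-window ratios, 1 - ratio of summed pi).
    Windows with zero diversity in the wild relative are skipped.
    """
    pairs = [(m, t) for m, t in zip(pi_domesticate, pi_wild) if t > 0]
    ratios = [m / t for m, t in pairs]
    mean_of_ratios = sum(ratios) / len(ratios)
    ratio_of_sums = sum(m for m, _ in pairs) / sum(t for _, t in pairs)
    return 1.0 - mean_of_ratios, 1.0 - ratio_of_sums

# Hypothetical per-window pi values for maize (MZ) and teosinte (TEO)
pi_mz = [0.002, 0.009, 0.006]
pi_teo = [0.004, 0.010, 0.006]
print(loss_of_diversity(pi_mz, pi_teo))
```

The mean of ratios weights every window equally, while the ratio of sums is dominated by high-diversity windows, so the two can differ noticeably when loss varies along the genome.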

Page 12: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Selection and demography interact along genome

- Approximate Bayesian Computation (ABC) under a simple growth model estimates a stronger bottleneck in genic regions
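To illustrate only the accept/reject logic of rejection ABC, here is a sketch that swaps in a toy simulator in place of the coalescent growth model actually fitted: diversity retained after a drift-only bottleneck, exp(−1/(2k)), with hypothetical 5% multiplicative noise standing in for locus-to-locus variance. The prior bounds, tolerance, observed value, and function names are all assumptions.

```python
import math
import random

def toy_simulator(k, rng):
    """Hypothetical stand-in for a coalescent simulator: fraction of diversity
    retained after a bottleneck of strength k = N_B/d, plus multiplicative noise."""
    return math.exp(-1.0 / (2.0 * k)) * rng.gauss(1.0, 0.05)

def abc_rejection(observed, n_draws, epsilon, rng, prior=(0.05, 2.0)):
    """Rejection ABC: draw k from a uniform prior, simulate a summary statistic,
    and keep draws whose simulated value lies within epsilon of the observed one."""
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(*prior)
        if abs(toy_simulator(k, rng) - observed) < epsilon:
            accepted.append(k)
    return accepted

rng = random.Random(1)
# Hypothetical observation: the retention expected for k = 0.5
posterior = abc_rejection(observed=math.exp(-1.0), n_draws=20000, epsilon=0.01, rng=rng)
print(len(posterior), sum(posterior) / len(posterior))
```

The accepted k values approximate the posterior; a real analysis would replace the toy simulator with a coalescent model of bottleneck plus growth and use several summary statistics, possibly computed separately for genic and intergenic regions.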

Page 13: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Selection and demography interact along genome

[Figure: histogram (count) of Tajima's D (−2 to 4) in nongenic regions, maize vs. teosinte (teo)]

- Excess of new rare intergenic variants in maize [8]
- In genes, no excess of rare variants and 40% fewer unique SNPs
- Purifying selection slows recovery of diversity
- Estimates of the distribution of fitness effects (DFE) [9]: new nonsynonymous mutations are deleterious

[8] Hufford et al. 2012 Nat. Genet.
[9] Stoletzki & Eyre-Walker 2011 MBE

Page 14: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Selection and demography interact along genome


[Figure: histogram (count) of Tajima's D (−2 to 2) in genic regions, maize vs. teosinte (teo)]


Page 15: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Selection and demography interact along genome


Page 16: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Patterns of genetic load vary under different models

To further investigate the effect of genetic drift and natural selection on the number of segregating sites under population growth, we estimated at each generation the percentage of segregating sites that are not observed in the next generation (%S_lost). After a few generations of mutation accumulation, %S_lost becomes higher for the model with population growth, both for neutral and for deleterious loci (Figure 1B), implying that population growth increases not only the number of segregating sites, but also the rate at which they are lost. This phenomenon is explained by the larger fraction of singletons (Figure S2) and very rare variants in the growing population, which have a higher probability of loss (Figure 2).

Derived allele count of segregating sites

Each segregating site in the population can be categorized by the number of sequences that carry the derived allele. The average DAC per segregating site (see Materials and Methods) is a measure of the prevalence of those sites in the population. This measure is pertinent when comparing populations of different sizes since allele frequencies are difficult to interpret as the sample size (which is here the population size) increases in the population with growth. Furthermore, it is the allele count, rather than the allele frequency, that affects its probability of loss or transmission (File S1).

Our simulations reproduce a well-established effect of population growth (Slatkin and Hudson 1991; Wakeley 2008; Coventry et al. 2010; Keinan and Clark 2012; Nelson et al. 2012; Tennessen et al. 2012) by showing an increase in the proportion of singletons (sites with DAC = 1) (Figure S2). The proportion is further elevated at deleterious loci for both population models (Figure S2). To investigate the efficacy of purifying selection in a growing population free of the expected skew in the site frequency spectrum, we consider instead the DAC of lost segregating sites.

We computed the percentage of lost sites within each category of DAC, with %S_lost for DAC = k being the percentage of segregating sites with k derived alleles that are lost within one generation (Materials and Methods). In contrast to the increase of %S_lost when considered across all DACs (Figure 1B), we observe that within each DAC category, population growth decreases %S_lost (Figure 2A), both for neutral (36.7% to 30.4%) and for deleterious loci (44.1% to 39.7% for singletons). This differential direction (Figure 1B vs. Figure 2A) is due to the greater percentage of variants with low DAC in the growing population. For example, singletons represent 46.8% of all segregating sites under growth with neutral mutations and only 17.9% in the same scenario without growth (Figure S2). As such, the percentage of variants that are singleton and lost (without conditioning on being a singleton) from all segregating sites is higher in the growing population than in the scenario without growth (30.4% × 46.8% = 14.2% vs. 36.7% × 17.9% = 6.6%, respectively).
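The unconditional comparison at the end of this paragraph is a product of two percentages, P(lost and singleton) = P(lost | singleton) × P(singleton). A tiny Python check reproducing the 14.2% vs. 6.6% figures from the values quoted in the text (the helper name is hypothetical):

```python
def singleton_lost_share(pct_lost_given_singleton: float, pct_sites_singleton: float) -> float:
    """Percentage of ALL segregating sites that are singletons AND lost in one generation,
    i.e. P(lost | singleton) * P(singleton), with both inputs given in percent."""
    return pct_lost_given_singleton * pct_sites_singleton / 100.0

with_growth = singleton_lost_share(30.4, 46.8)      # neutral, with growth
without_growth = singleton_lost_share(36.7, 17.9)   # neutral, no growth
print(f"with growth: {with_growth:.1f}%  without growth: {without_growth:.1f}%")
```

Growth lowers the conditional loss rate but raises the share of singletons enough that the unconditional share of lost singletons roughly doubles.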

Figure 1 Population growth increases the number of segregating sites, but also the fraction of sites that are lost. (A) S, the number of segregating sites of the whole population (on a log scale); (B) %S_lost, the percentage of segregating sites lost from the population in a single simulated generation (Materials and Methods). Both panels present the two simulated demographic scenarios (with growth and with no growth) for each selection model (neutral or deleterious). Results are presented every 10 generations (corresponding to a single simulated generation) during the last 440 generations. Population growth increases both S and %S_lost. S is smaller for deleterious than for neutral mutations, while %S_lost is higher. Trends with time in the models without growth are due to the preceding population bottlenecks (Figure S3).

972 E. Gazave et al.

population improves the efficacy of natural selection, and deleterious sites are more readily eliminated.

Fitness effect of deleterious alleles in a growing population

Average fitness effect of a deleterious mutation: To go beyond the burden in the number of deleterious mutations and consider their effects, we compared the distribution of selection coefficients in the population models with and without growth. We computed the average fitness effect (selection coefficient) for derived alleles that are lost and derived alleles that are transmitted to the next generation by averaging the fitness effect of each allele weighted by its number of copies (Materials and Methods). As expected, in both demographic scenarios, lost sites are much more deleterious than sites that are transmitted (Figure S6). Interestingly, this phenomenon is more pronounced in a growing population, again pointing to the higher efficacy of selection in a larger population (Figure S6).

To obtain a snapshot of the fitness effect of all segregating variation (i.e., independently of whether sites are lost or transmitted to the following generation), we partitioned segregating sites into three categories corresponding to very deleterious, mildly deleterious, and nearly neutral (Materials and Methods). Considering the number of copies of each site, we measured the percentage of copies of derived alleles (%DA) that fall into each category. In the very deleterious category, %DA decreases progressively over time as the population grows (Figure S7). At the last generation of the simulation, %DA is significantly lower (by 8.4%) than in the model without growth (Figure 3). The effect of the population model on the other two categories is much smaller and nonsignificant (Figure S7 and Figure 3). The stronger effect of population growth on the most deleterious alleles is also visible in the site frequency spectrum (Figure S8). More generally, the selection coefficient s averaged across all derived allele copies (wDA) becomes less deleterious as the population grows (Figure 4). At the end of the simulation, an allele chosen randomly is 15.8% less deleterious in the population that has undergone growth (Figure 4). We note that the average selection coefficient also increases, although to a smaller extent, in the absence of growth (Figure 4). This is because the population model without growth is also not at equilibrium due to the preceding population bottlenecks. The role of the bottlenecks becomes evident in comparison to a model of a population that has been of constant size throughout history (File S1; Figure S9).
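The %DA partition and the copy-weighted mean selection coefficient described above can be sketched as follows. This is an illustration, not the authors' code: the category cut-offs are hypothetical (the actual ones are defined in the paper's Materials and Methods, which are not reproduced here), and selection coefficients are entered as positive fitness costs for simplicity.

```python
def summarize_derived_alleles(sites, very=1e-2, mild=1e-4):
    """Return (%DA per fitness category, copy-weighted mean fitness cost).

    sites: iterable of (cost, copies), where cost = |s| of the derived allele
    and copies = number of derived-allele copies in the population.
    very, mild: hypothetical cut-offs between the three categories.
    """
    totals = {"very deleterious": 0, "mildly deleterious": 0, "nearly neutral": 0}
    weighted_cost = 0.0
    n_copies = 0
    for cost, copies in sites:
        if cost >= very:
            key = "very deleterious"
        elif cost >= mild:
            key = "mildly deleterious"
        else:
            key = "nearly neutral"
        totals[key] += copies
        weighted_cost += cost * copies   # each copy contributes its allele's cost
        n_copies += copies
    if n_copies == 0:
        raise ValueError("no derived-allele copies")
    pct = {k: 100.0 * v / n_copies for k, v in totals.items()}  # sums to 100
    return pct, weighted_cost / n_copies

# Hypothetical segregating sites: (fitness cost, number of derived copies)
sites = [(0.05, 1), (0.001, 10), (0.00001, 89)]
print(summarize_derived_alleles(sites))
```

Because each site is weighted by its copy number, purging a few very deleterious but common alleles shifts both %DA and the weighted mean far more than removing many very deleterious singletons.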

Despite the accumulation of deleterious segregating sites in the growing population, we show a stronger increase in the average fitness effect of derived alleles in the growing population (Figure 4). This effect is particularly evident when considering the relative amount of very deleterious alleles. In the scenario with growth, for one copy of a very deleterious allele there are 130 copies of nearly neutral ones; without growth, the corresponding ratio is only 1 to 48 (Table S1A). The new mutations accumulated due to growth tend to be more deleterious due to their recency, but while less deleterious alleles increase in number of copies faster as the population grows, the very deleterious alleles are purged more effectively in this scenario.

Average number of mutations per chromosome: We next considered the burden of deleterious mutations as the number of mutations present in each of the 2Ne chromosomes in

Figure 3 Higher efficiency of natural selection in a growing population decreases the percentage of the most deleterious allele copies. Segregating sites are classified into three discrete categories of fitness effect. For each category, the percentage of derived alleles (%DA) is the sum of the number of copies of derived alleles observed across all the segregating sites in the category divided by the total number of derived alleles across all segregating sites in the population × 100 (%DA of the three categories sums up to 100). Data are shown for the last generation of the simulation, both with and without recent growth. Vertical bars denote ±SE based on 10,000 replicates. Population growth leads to a lower percentage of derived alleles in the most deleterious category.

974 E. Gazave et al.

- Growth may lead to a larger proportion of deleterious SNPs
- Growth decreases the mean effect size of deleterious SNPs [10]

[10] Gazave et al. 2013 Genetics

Page 17: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

More V_A due to deleterious alleles under growth

[Figure 5 (A, B): number of causal variants (50 to 250) under BN, BN+growth, and Old growth models]

Figure 5: The number of causal mutations in a sample of 1000 individuals from each simulated population. (A) A SNP's effect on the trait is correlated with its effect on fitness (τ = 0.5). Note that the population that experienced recent growth (green) has a higher number of causal mutations than the populations that did not recently expand (orange). (B) A SNP's effect on the trait is independent of its effect on fitness (τ = 0).

- Exponential growth leads to more rare causal variants [11], and these explain a larger proportion of V_A
- Standard GWAS has lower power to detect rare deleterious variants

[11] Lohmueller 2013 arXiv
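How much a variant contributes to V_A depends on both its frequency and its effect. Under the standard textbook assumptions (additive effects, Hardy-Weinberg proportions, no linkage disequilibrium), one biallelic locus contributes 2p(1 − p)a². A minimal Python sketch with hypothetical architectures, not Lohmueller's simulation:

```python
def additive_variance(p: float, a: float) -> float:
    """V_A from one biallelic locus with allele frequency p and additive effect a,
    assuming Hardy-Weinberg proportions: 2 p (1 - p) a^2."""
    return 2.0 * p * (1.0 - p) * a ** 2

def total_additive_variance(loci):
    """Sum over independent loci (no LD assumed), each given as (p, a)."""
    return sum(additive_variance(p, a) for p, a in loci)

# Hypothetical architectures
common_small = [(0.3, 0.1)] * 10     # ten common variants of small effect
rare_large = [(0.001, 1.0)] * 200    # many rare variants of larger effect
print(total_additive_variance(common_small), total_additive_variance(rare_large))
print(additive_variance(0.3, 0.1), additive_variance(0.001, 1.0))
```

In this toy case the rare class contributes more V_A in total, yet each rare locus explains less variance on its own than each common one, which is one intuition for why single-variant GWAS tests have less power for rare deleterious variants.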

Page 18: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Conclusions: thinking more about bottlenecks

- Bottlenecks affect genome-wide diversity and the SFS
- May lead to false positive signals of selection
- Bottlenecks and growth affect the fate of deleterious variants and the genetic architecture of quantitative traits
- In maize, genome-wide data suggest a weaker bottleneck and rapid population growth
- Interplay of selection and demography leads to different patterns in genic and intergenic regions

Page 19: Bottlenecks -- some ramblings and a bit of data from maize PAGXXII

Acknowledgements

People

Arun Durvasula Shohei Takuno Vince Buffalo

Funding