
Detecting selection using genome scans

Roger Butlin, University of Sheffield

Nielsen R (2005) Molecular signatures of natural selection. Annu. Rev. Genet. 39, 197–218.

What signatures does selection leave in the genome?

1. Population differentiation – today’s focus!
2. Frequency spectrum, e.g. Tajima’s D
3. Selective sweeps
4. Haplotype structure (linkage disequilibrium)
5. McDonald-Kreitman tests (or PAML over long time-scales)

From Nielsen (2005): frequency of derived allele in a sample of 20 alleles.

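Tajima’s D = (π - θW)/sd, where θW = S/a1 is the diversity estimate based on the number of segregating sites S; negative D indicates an excess of rare variants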
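A minimal sketch of the calculation, assuming the standard form of the statistic (Tajima 1989), in which S is scaled by a1 = 1 + 1/2 + ... + 1/(n-1) before it is compared with π. The function names `tajimas_d` and `pi_from_counts` are illustrative only and are not taken from any package.

```python
import math


def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S and
    mean pairwise differences pi (standard 1989 formulation)."""
    if n < 4 or S < 1:
        # The variance term is zero for n < 4, so D is undefined there.
        raise ValueError("need n >= 4 sequences and S >= 1 segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = S / a1                      # Watterson's estimator
    sd = math.sqrt(e1 * S + e2 * S * (S - 1))
    return (pi - theta_w) / sd


def pi_from_counts(n, derived_counts):
    """Mean pairwise differences from the derived-allele count at each site."""
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in derived_counts)


# Ten singleton sites in a sample of 20 alleles: an excess of rare variants.
d_rare = tajimas_d(20, 10, pi_from_counts(20, [1] * 10))
# Ten sites at intermediate frequency (10 of 20): the opposite skew.
d_mid = tajimas_d(20, 10, pi_from_counts(20, [10] * 10))
print(d_rare < 0, d_mid > 0)
```

With all variants rare, π falls below θW and D is negative; with variants at intermediate frequency, D is positive.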

Frequency distribution:

Selective sweep:

Extended haplotype homozygosity (Sabeti et al. 2002)
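A small sketch of how EHH can be computed, assuming the usual definition: among chromosomes carrying a given core allele, EHH at a given distance is the probability that two randomly chosen chromosomes are identical from the core out to that marker. Haplotypes are represented here as strings of '0'/'1', and the function name `ehh` is hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, distance):
    """EHH for carriers of core_allele, extending `distance` markers
    to the right of the core position."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    # Group carriers by their extended haplotype from the core outwards.
    classes = Counter(h[core_index:core_index + distance + 1] for h in carriers)
    # Fraction of chromosome pairs that share the same extended haplotype.
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)


haps = ["1000", "1000", "1001", "1011", "0111"]
for d in range(4):
    print(d, ehh(haps, 0, "1", d))
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break up the haplotype; a recently swept allele is expected to show a slower decay than alleles of similar frequency.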

McDonald-Kreitman and related tests

dN = replacement changes per replacement site
dS = silent changes per silent site

dN/dS = 1 - neutral

dN/dS < 1 - conserved (purifying selection)

dN/dS > 1 - adaptive evolution (positive selection)
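As a worked illustration of the ratio, the sketch below uses raw counts of changes and sites. This is a simplification: it ignores corrections for multiple hits, and in practice the ratio is compared with 1 by a statistical test rather than by simple inequality. The function names and the example counts are invented for illustration.

```python
def dn_ds(replacement_changes, replacement_sites, silent_changes, silent_sites):
    """dN/dS from raw counts (no multiple-hit correction)."""
    dn = replacement_changes / replacement_sites
    ds = silent_changes / silent_sites
    if ds == 0:
        raise ZeroDivisionError("dS is zero, so the ratio is undefined")
    return dn / ds


def interpret(ratio):
    """Map a dN/dS ratio onto the three categories listed above."""
    if ratio < 1:
        return "conserved (purifying selection)"
    if ratio > 1:
        return "adaptive evolution (positive selection)"
    return "neutral"


# Hypothetical gene: 10 replacement changes over 300 replacement sites,
# 20 silent changes over 100 silent sites.
ratio = dn_ds(10, 300, 20, 100)
print(round(ratio, 3), interpret(ratio))
```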

Selection on phenotypic traits:

QTL
Association analysis
Candidate genes

Genome scans (aka ‘Outlier analysis’)

‘H’

‘M’

Thornwick Bay

Littorina saxatilis – locally adapted morphs

What signatures of selection might we look for?

Signatures of selection:

Departure from HWE
Low diversity (selective sweep)
Frequency spectrum tests
High divergence
Elevated proportion of non-synonymous substitutions
LD

Figure: distribution of FST among loci (x-axis: FST; y-axis: number of loci) in three panels: neutral loci, stabilizing selection, local adaptation.

Charlesworth et al. 1997 (from Nosil et al. 2009)

A concrete example: adaptation to altitude in Rana temporaria (Bonin et al. 2006)

High – 2000m

Intermediate – 1000m

Low – 400m

190 individuals
392 AFLP bands

http://www.sluitertijd.org/imagebrowser/specialtheme_main.htm

Generating the expected distribution

DetSel – Vitalis et al. 2001

Figure: diagram of the DetSel population model, labelled with the parameters Ne, N0, N1, N2, t, t0 and μ.

F1, F2 – measures of the divergence of population 1 from population 2, and of population 2 from population 1

Dfdist – Beaumont & Nichols 1996

Figure: island-model diagram, with demes of size N connected by migration at rate m.

FST – symmetrical population differentiation, as a function of heterozygosity
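To make the two axes of this kind of plot concrete, here is a minimal sketch that computes total heterozygosity and FST for a single biallelic locus from population allele frequencies. It uses the simple (HT - HS)/HT form with equal population weights, which is a stand-in and not the estimator implemented in Dfdist; the function name is hypothetical.

```python
def fst_and_heterozygosity(freqs):
    """Return (HT, FST) for one biallelic locus.

    freqs: frequency of one allele in each population (equal weights assumed).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)                       # total heterozygosity
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k       # mean within-population
    if h_t == 0:
        raise ValueError("locus is monomorphic across all populations")
    return h_t, (h_t - h_s) / h_t


# Three illustrative loci in two populations.
for freqs in ([0.5, 0.5], [0.2, 0.8], [0.0, 1.0]):
    h_t, fst = fst_and_heterozygosity(freqs)
    print(freqs, round(h_t, 3), round(fst, 3))
```

In an outlier analysis, each locus's (heterozygosity, FST) point would then be compared with quantiles of a distribution simulated under neutrality.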

Does the structure/history matter?

Figure: DetSel and Dfdist outputs for the ‘Low 1’ vs ‘High 1’ comparison (plot labels: 95% CI; 95%, 50% and 5% quantiles).

http://mbe.oxfordjournals.org/content/vol23/issue4/images/large/molbiolevolmsj087f01_lw.jpeg

http://mbe.oxfordjournals.org/content/vol23/issue4/images/large/molbiolevolmsj087f02_lw.jpeg

                                                      DetSel  Dfdist  Both  Interpretation
Monomorphic in one population                         35      N/A           Unreliable outliers
Significant in one comparison                         14      29            False positives
Significant in comparisons involving one population   3       11            Local effects
Significant in at least 2 comparisons                 2       3       1     Adaptation to altitude

Significant in global comparison across altitudes: 6 (2 at 99%) – Adaptation to altitude

392 AFLPs, 12 pairwise comparisons across altitude or 3 altitude categories, 95% cut off
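A rough arithmetic check on why outliers seen in a single comparison are treated with caution. The sketch assumes that the 95% cut-off leaves 5% of neutral loci outside the envelope (it would be 2.5% if only the upper tail counts) and, for the second figure, that the 12 comparisons are independent, which they are not, since populations are reused. Function names are illustrative.

```python
def expected_false_positives(n_loci, tail_fraction):
    """Expected number of neutral loci beyond the cut-off in one comparison."""
    return n_loci * tail_fraction


def prob_outlier_at_least_once(tail_fraction, n_comparisons):
    """Chance that one neutral locus is flagged in at least one of several
    comparisons, under the (unrealistic) assumption of independence."""
    return 1.0 - (1.0 - tail_fraction) ** n_comparisons


print(expected_false_positives(392, 0.05))              # per pairwise comparison
print(round(prob_outlier_at_least_once(0.05, 12), 3))   # across 12 comparisons
```

Under these assumptions roughly 20 loci per comparison would exceed the cut-off by chance alone, which is why replication across comparisons carries more weight than a single significant result.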

8 loci
343 loci

http://mbe.oxfordjournals.org/content/vol23/issue4/images/large/molbiolevolmsj087f03_ht.jpeg

Outliers and selected traits

Coregonus clupeaformis (lake whitefish)

Rogers and Bernatchez (2007):
Dwarf x Normal cross, both backcrosses
Measure ‘adaptive’ traits (9)
QTL map (>400 AFLP plus microsatellites)
Homologous AFLP in 4 natural sympatric population pairs
Outlier analysis (forward simulation based on Winkle)

Backcross         Homologous AFLP   Outlier AFLP in homologous set*   Outlier within QTL (based on 1.5 LOD support)
Hybrid x Dwarf    180               19                                9 (3.6 expected, P=0.0015)
Hybrid x Normal   131               8                                 4 (0.5 expected, P=0.0002)

*Only 3 outliers shared between lakes


Nosil et al. 2009 review of 14 studies:

1. 0.5 – 26% outliers, most studies 5 – 10%
2. 1 – 5% outliers replicated in pair-wise comparisons
3. 25 – 100% of outliers specific to habitat comparisons
4. No consistent pattern for EST-associated loci
5. LD among outliers typically low

But many methodological differences between studies:
Population sampling
Marker type
Analysis type and options
Statistical cut-offs

Environmental correlations

SAM – Joost et al. 2007

IBA – Nosil et al. 2007

FST for each locus correlated with ‘adaptive distance’, controlling for geographic distance (partial Mantel test)
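A minimal sketch of a partial Mantel test, assuming the common formulation: correlate the FST matrix with the ‘adaptive distance’ matrix after partialling out geographic distance, and obtain a P value by permuting the populations of the FST matrix. The permutation scheme and all function names are assumptions for illustration, not the IBA implementation.

```python
import math
import random


def upper(m):
    """Upper-triangle entries of a square distance matrix, as a flat list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(fst, adaptive, geographic, n_perm=999, seed=0):
    """Return (partial r, one-tailed permutation P value)."""
    rng = random.Random(seed)
    y, z = upper(adaptive), upper(geographic)
    observed = partial_r(upper(fst), y, z)
    n = len(fst)
    hits = 1  # the observed value counts as one permutation
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the FST matrix together.
        permuted = [[fst[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if partial_r(upper(permuted), y, z) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

Each argument is a full symmetric matrix (a list of lists) with populations in the same order. Running this once per locus gives a per-locus test of association with habitat that is not explained by geography alone.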

Methodological improvements – Bayesian approaches

BayesFst – Beaumont & Balding 2004
Bayescan – Foll & Gaggiotti 2008


For each locus i and population j we have an FST measure, relative to the ‘ancestral’ population, Fij

Then decompose into locus and population components,

log(Fij/(1-Fij)) = αi + βj

αi is the locus effect – 0 under neutrality, positive under divergent selection, negative under balancing selection

βj is the population effect

Assuming a Dirichlet distribution of allele frequencies among subpopulations, αi and βj can be estimated by MCMC

In Bayescan, also explicitly test αi = 0
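The logit decomposition can be inverted to show what the locus effect does to Fij. The sketch below only evaluates the deterministic relationship log(Fij/(1-Fij)) = αi + βj; it does not perform the MCMC estimation, and the function names and example values are illustrative.

```python
import math


def logit(f):
    """log(F / (1 - F)) for 0 < F < 1."""
    return math.log(f / (1.0 - f))


def fst_from_effects(alpha_i, beta_j):
    """Invert the logit: Fij implied by locus effect alpha_i and population effect beta_j."""
    return 1.0 / (1.0 + math.exp(-(alpha_i + beta_j)))


beta = -2.0  # hypothetical population effect, giving a neutral Fij of about 0.12
for alpha in (-1.0, 0.0, 1.0):
    print(alpha, round(fst_from_effects(alpha, beta), 3))
```

A positive αi pushes Fij above the value set by the population effect alone, and a negative αi pulls it below, which is how the model separates locus-specific selection from shared demographic history.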

Apparently much greater power to detect balancing selection than FDIST
Lower false positive rate
Wider applicability

Methodological improvements – hierarchical structure

Arlequin – Excoffier et al. 2009

Circles – simulated STR data, grey – null distribution

Bayenv – Coop et al. 2010

Estimates variance-covariance matrix of allele frequencies then tests for correlations with environmental variables (or categories).

Software available at: http://www.eve.ucdavis.edu/gmcoop/Software/Bayenv/Bayenv.html

Multiple analyses? Candidate vs control? E.g. Shimada et al. 2010

Hohenlohe et al. 2010

Mäkinen et al. 2008

7 populations
3 marine, 4 freshwater

103 STR loci
Analysed by BayesFst (and LnRH)

5 under directional selection (3 in Eda locus)

15 under balancing selection

Used as a test case by Excoffier et al.
2 directional
3 balancing

Can we replicate these results?

Bayescan

Stickleback_allele.txt – input file
Output_fst.txt – view with R routine plot_Bayescan

Arlequin

Stickleback_data_standard.arp – IAM
Stickleback_data_repeat.arp – SMM

Run using Arlequin 3.5

Try hierarchical and island models, maybe different hierarchies

Sympatric speciation?

FST distribution as evidence of speciation with gene flow

Savolainen et al. (2006)

Howea - palms

Cf. Gavrilets and Vose (2007)
• few loci underlying key traits
• intermediate selection
• initial environmental effect on phenology