Detecting selection using genome scans
Roger Butlin, University of Sheffield
Nielsen R (2005) Molecular signatures of natural selection. Annu. Rev. Genet. 39, 197–218.
What signatures does selection leave in the genome?
1. Population differentiation – today’s focus!
2. Frequency spectrum, e.g. Tajima’s D
3. Selective sweeps
4. Haplotype structure (linkage disequilibrium)
5. McDonald-Kreitman tests (or PAML over long time-scales)
From Nielsen (2005): frequency of derived allele in a sample of 20 alleles.
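Tajima’s D = (π − S/a1)/sd, where π is the mean number of pairwise differences, S is the number of segregating sites and a1 = 1 + 1/2 + … + 1/(n−1) for n sequences. Negative D indicates an excess of rare variants.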
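A minimal sketch of the Tajima's D calculation, assuming the standard variance constants from Tajima (1989). The function names and the example site counts are illustrative and do not come from Nielsen (2005).

```python
import math

def pi_from_derived_counts(counts, n):
    """Mean number of pairwise differences, given the derived-allele
    count at each segregating site in a sample of n sequences."""
    pairs = n * (n - 1) / 2.0
    return sum(k * (n - k) for k in counts) / pairs

def tajimas_d(n, S, pi):
    """Tajima's D for n sequences, S segregating sites and
    pi = mean number of pairwise differences."""
    if S == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

if __name__ == "__main__":
    n = 20
    rare = [1] * 10     # hypothetical data: ten singleton sites
    common = [10] * 10  # hypothetical data: ten intermediate-frequency sites
    print(tajimas_d(n, len(rare), pi_from_derived_counts(rare, n)))
    print(tajimas_d(n, len(common), pi_from_derived_counts(common, n)))
```

With only singletons the statistic is negative (excess of rare variants); with only intermediate-frequency sites it is positive.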
Frequency distribution:
Selective sweep:
Extended haplotype homozygosity (Sabeti et al. 2002)
McDonald-Kreitman and related tests
dN = replacement changes per replacement site
dS = silent changes per silent site
dN/dS = 1 - neutral
dN/dS < 1 - conserved (purifying selection)
dN/dS > 1 - adaptive evolution (positive selection)
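As a worked illustration of the ratio, the sketch below computes dN/dS from raw counts and maps it to the three cases above. It assumes no correction for multiple hits and makes no statistical test; the counts and the tolerance `tol` are hypothetical.

```python
def dn_ds(rep_changes, rep_sites, silent_changes, silent_sites):
    """dN/dS from raw counts (no multiple-hit correction)."""
    dn = rep_changes / rep_sites        # replacement changes per replacement site
    ds = silent_changes / silent_sites  # silent changes per silent site
    return dn / ds

def interpret(ratio, tol=0.05):
    """Map a ratio to the slide's categories; tol is an arbitrary
    illustrative tolerance around 1, not a significance test."""
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    if ratio < 1.0:
        return "conserved (purifying selection)"
    return "adaptive evolution (positive selection)"

if __name__ == "__main__":
    # Hypothetical gene: 10 replacement changes at 700 replacement sites,
    # 30 silent changes at 300 silent sites.
    ratio = dn_ds(10, 700, 30, 300)
    print(ratio, interpret(ratio))
```

In practice the ratio would be estimated with a codon model (for example in PAML) and tested against the neutral expectation.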
Selection on phenotypic traits:
QTL
Association analysis
Candidate genes
Genome scans (aka ‘Outlier analysis’)
Littorina saxatilis – locally adapted morphs ‘H’ and ‘M’, Thornwick Bay
What signatures of selection might we look for?
Signatures of selection:
Departure from HWE
Low diversity (selective sweep)
Frequency spectrum tests
High divergence
Elevated proportion of non-synonymous substitutions
LD
[Figure: histograms of number of loci against FST, in three panels: neutral loci, stabilizing selection, local adaptation]
Charlesworth et al. 1997 (from Nosil et al. 2009)
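A minimal sketch of how a per-locus FST histogram like the panels above could be tabulated. It assumes biallelic loci, known population allele frequencies and equal population weights, and uses FST = (HT − HS)/HT. The frequencies in the example are invented.

```python
def fst_biallelic(freqs):
    """FST = (HT - HS) / HT for one biallelic locus, given the allele
    frequency in each population (equal weights assumed)."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                 # locus is monomorphic everywhere
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within populations
    return (h_t - h_s) / h_t

def histogram(values, n_bins=10):
    """Count values falling in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations at four loci.
    loci = [[0.50, 0.50], [0.45, 0.55], [0.20, 0.80], [0.05, 0.95]]
    fsts = [fst_biallelic(f) for f in loci]
    print(fsts)
    print(histogram(fsts))
```

Loci in the upper tail of such a histogram are the candidates for local adaptation; deciding how far out is "too far" needs a null distribution.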
A concrete example: adaptation to altitude in Rana temporaria (Bonin et al. 2006)
High – 2000 m
Intermediate – 1000 m
Low – 400 m
190 individuals
392 AFLP bands
Generating the expected distribution
DetSel – Vitalis et al. 2001
[Diagram: DetSel model with parameters Ne, N0, N1, N2, t, t0 and μ]
F1, F2 – measures of the divergence of population 1 from population 2, and of population 2 from population 1, respectively
Dfdist – Beaumont & Nichols 1996
[Diagram: island model, populations each of size N exchanging migrants at rate m]
FST – symmetrical population differentiation, as a function of heterozygosity
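The idea of generating an expected distribution can be sketched as below. This is not the Dfdist algorithm, which uses coalescent simulation: as a stand-in, subpopulation allele frequencies are drawn from a beta distribution with variance set by a target FST, a common approximation to the island model. The function names, the uniform prior on the ancestral frequency and all parameter values are assumptions.

```python
import random

def simulate_null(f_target, n_pops, n_loci, seed=1):
    """Return (heterozygosity, FST) pairs for neutral biallelic loci
    under a beta approximation to the island model."""
    rng = random.Random(seed)
    theta = (1.0 - f_target) / f_target
    results = []
    for _ in range(n_loci):
        p = rng.uniform(0.05, 0.95)  # assumed ancestral allele frequency
        freqs = [rng.betavariate(p * theta, (1 - p) * theta)
                 for _ in range(n_pops)]
        p_bar = sum(freqs) / n_pops
        h_t = 2 * p_bar * (1 - p_bar)
        if h_t == 0:
            continue                 # skip loci fixed in every population
        h_s = sum(2 * q * (1 - q) for q in freqs) / n_pops
        results.append((h_t, (h_t - h_s) / h_t))
    return results

def quantile(values, q):
    """Simple empirical quantile (no interpolation)."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]

if __name__ == "__main__":
    sims = simulate_null(f_target=0.1, n_pops=6, n_loci=5000)
    fsts = [f for _, f in sims]
    print("median:", quantile(fsts, 0.5))
    print("95% cut-off:", quantile(fsts, 0.95))
```

Observed loci whose FST exceeds the upper cut-off would be flagged as outliers. Dfdist takes the quantiles conditional on heterozygosity, which the returned pairs would allow by binning on the first element.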
Does the structure/history matter?
[Figure: ‘Low 1’ vs ‘High 1’ comparison analysed with DetSel (95% CI) and Dfdist (95%, 50% and 5% quantiles)]
Outlier category | DetSel | Dfdist | Both | Interpretation
Monomorphic in one population | 35 | N/A | | Unreliable outliers
Significant in one comparison | 14 | 29 | | False positives
Significant in comparisons involving one population | 3 | 11 | | Local effects
Significant in at least 2 comparisons | 2 | 3 | 1 | Adaptation to altitude
Significant in global comparison across altitudes | | 6 (2 at 99%) | | Adaptation to altitude
392 AFLPs, 12 pairwise comparisons across altitude or 3 altitude categories, 95% cut off
8 loci
343 loci
Outliers and selected traits
Coregonus clupeaformis (lake whitefish)
Rogers and Bernatchez (2007):
• Dwarf x Normal cross, both backcrosses
• Measure ‘adaptive’ traits (9)
• QTL map (>400 AFLP plus microsatellites)
• Homologous AFLP in 4 natural sympatric population pairs
• Outlier analysis (forward simulation based on Winkle)
Cross | Homologous AFLP | Outlier AFLP in homologous set* | Outlier within QTL (based on 1.5 LOD support)
Hybrid x Dwarf | 180 | 19 | 9 (3.6 expected, P=0.0015)
Hybrid x Normal | 131 | 8 | 4 (0.5 expected, P=0.0002)
*Only 3 outliers shared between lakes
Nosil et al. 2009 review of 14 studies:
1. 0.5–26% outliers, most studies 5–10%
2. 1–5% outliers replicated in pair-wise comparisons
3. 25–100% of outliers specific to habitat comparisons
4. No consistent pattern for EST-associated loci
5. LD among outliers typically low
But many methodological differences between studies:
• Population sampling
• Marker type
• Analysis type and options
• Statistical cut-offs
Environmental correlations
SAM – Joost et al. 2007
IBA – Nosil et al. 2007
FST for each locus correlated with ‘adaptive distance’, controlling for geographic distance (partial Mantel test)
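A generic partial Mantel test can be sketched as follows. This is not the IBA code of Nosil et al. It assumes square symmetric distance matrices, a Pearson partial correlation and a one-tailed permutation test that shuffles the population labels of the FST matrix. All names and the example data are hypothetical.

```python
import math
import random

def upper(m):
    """Upper-triangle entries of a square matrix as a flat list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def partial_mantel(fst, adaptive, geo, n_perm=999, seed=1):
    """Observed partial r and one-tailed permutation P-value."""
    rng = random.Random(seed)
    n = len(fst)
    y, z = upper(adaptive), upper(geo)
    observed = partial_r(upper(fst), y, z)
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # relabel populations in the FST matrix
        permuted = [[fst[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if partial_r(upper(permuted), y, z) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    coords = [0, 1, 2, 3, 4]   # hypothetical positions along a transect
    habitat = [0, 1, 0, 1, 1]  # hypothetical habitat codes
    n = len(coords)
    geo = [[abs(coords[i] - coords[j]) for j in range(n)] for i in range(n)]
    adp = [[abs(habitat[i] - habitat[j]) for j in range(n)] for i in range(n)]
    fst = [[0.02 * geo[i][j] + 0.1 * adp[i][j] for j in range(n)]
           for i in range(n)]
    print(partial_mantel(fst, adp, geo, n_perm=199))
```

In a genome scan this would be repeated for each locus, and loci with a strong partial correlation with adaptive distance would be the candidates.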
Methodological improvements – Bayesian approaches
BayesFst – Beaumont & Balding 2004
Bayescan – Foll & Gaggiotti 2008
[Diagram: populations diverging from a common ‘ancestral’ population]
For each locus i and population j we have an FST measure, relative to the ‘ancestral’ population, Fij
Then decompose into locus and population components,
log(Fij / (1 − Fij)) = αi + βj
αi is the locus effect: 0 = neutral, positive = divergent selection, negative = balancing selection
βj is the population effect
Assuming a Dirichlet distribution of allele frequencies among subpopulations, αi and βj can be estimated by MCMC
In Bayescan, also explicitly test αi = 0
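To make the decomposition concrete, the sketch below inverts the logistic relationship to recover Fij from a locus effect and a population effect. It only illustrates the algebra, not the MCMC estimation in BayesFst or Bayescan; the numerical values are hypothetical.

```python
import math

def logit(f):
    """log(F / (1 - F)) for 0 < F < 1."""
    return math.log(f / (1.0 - f))

def fst_from_effects(alpha, beta):
    """Invert log(F / (1 - F)) = alpha + beta to get F."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta)))

if __name__ == "__main__":
    beta = logit(0.1)  # hypothetical population effect: neutral FST of 0.1
    for alpha in (-1.5, 0.0, 1.5):  # hypothetical locus effects
        print(alpha, fst_from_effects(alpha, beta))
```

A locus with αi = 0 sits at the population's neutral FST; a positive αi pushes Fij above it and a negative αi pulls it below.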
Apparently much greater power to detect balancing selection than FDIST
Lower false positive rate
Wider applicability
Methodological improvements – hierarchical structure
Arlequin – Excoffier et al. 2009
Circles – simulated STR data, grey – null distribution
Bayenv – Coop et al. 2010
Estimates variance-covariance matrix of allele frequencies then tests for correlations with environmental variables (or categories).
Software available at: http://www.eve.ucdavis.edu/gmcoop/Software/Bayenv/Bayenv.html
Multiple analyses? Candidate vs control? E.g. Shimada et al. 2010
Hohenlohe et al. 2010
Mäkinen et al. 2008
7 populations: 3 marine, 4 freshwater
103 STR loci
Analysed by BayesFst (and LnRH)
5 under directional selection (3 in Eda locus)
15 under balancing selection
Used as a test case by Excoffier et al.: 2 directional, 3 balancing
Can we replicate these results?
Bayescan
Stickleback_allele.txt – input file
Output_fst.txt – view with R routine plot_Bayescan
Arlequin
Stickleback_data_standard.arp – IAM
Stickleback_data_repeat.arp – SMM
Run using Arlequin 3.5
Try hierarchical and island models, maybe different hierarchies
Sympatric speciation?
FST distribution as evidence of speciation with gene flow
Savolainen et al. (2006)
Howea - palms
Cf. Gavrilets and Vose (2007)
• few loci underlying key traits
• intermediate selection
• initial environmental effect on phenology