Population Genetics I. Bio5488 - 2016
![Page 2: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/2.jpg)
Why study population genetics?
• Functional inference
• Demographic inference:
  – The history of mankind is written in our DNA. We can learn about our species’ population size changes, migrations, etc.
• Complex disease:
  – What approaches for analysis make sense?
• Molecular biology:
  – Measure rates of biological processes like mutation and recombination, learn about gene regulation, speciation
• Sequence era: a framework for understanding these sequences.
• You will have your own genome sequence
![Page 3: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/3.jpg)
Outline for Part I and Part II
• Theory
  – Hardy-Weinberg
  – Forward models: Wright-Fisher model
  – Backward models: coalescent
• Data
  – Mutation, mutation rates
  – Global diversity, serial bottleneck model
  – Recombination, LD blocks, hotspots, PRDM9
  – Natural selection
![Page 4: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/4.jpg)
Hardy-Weinberg
• What is the fate of a neutral genetic variant at a biallelic locus in an infinite population?
• Udny Yule: individuals with dominant traits will increase in the population over time
• Hardy: Yule is wrong; expected genotype frequencies are simply the product of the underlying allele frequencies, assuming independence
![Page 5: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/5.jpg)
Hardy’s Argument: Generation 1

| | Males: A (100%) | Males: a (0%) |
|---|---|---|
| Females: A (0%) | AA (0%) | Aa (0%) |
| Females: a (100%) | Aa (100%) | aa (0%) |
![Page 6: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/6.jpg)
Hardy’s Argument: Generation 2

| | Males: A (50%) | Males: a (50%) |
|---|---|---|
| Females: A (50%) | AA (25%) | Aa (25%) |
| Females: a (50%) | Aa (25%) | aa (25%) |

$p^2 + 2pq + q^2 = 1$

$p = (2 \times 25 + 1 \times 50) / 200 = 0.5$, $q = 1 - p = 0.5$
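The Generation 2 arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 100-individual population (25 AA, 50 Aa, 25 aa) is an assumption that matches the percentages on the slide.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def allele_freq(n_AA, n_Aa, n_aa):
    """Frequency of A from genotype counts: each AA carries two copies, each Aa one."""
    return (2 * n_AA + n_Aa) / (2 * (n_AA + n_Aa + n_aa))


# Generation 2 of Hardy's argument, for an assumed 100 individuals.
p = allele_freq(25, 50, 25)
print(p, hardy_weinberg(p))  # 0.5 (0.25, 0.5, 0.25)
```

Feeding the resulting genotype frequencies back in returns the same p, which is the point of Hardy's argument: without drift or selection, allele frequencies do not change.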
![Page 7: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/7.jpg)
Gcbias.org
![Page 8: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/8.jpg)
Modern Synthesis
• Reconciliation of Mendelian genetics with observations of the Biometricians
• Reconciliation of Mendelian genetics with Darwinian evolution
R.A. Fisher, Sewall Wright, J.B.S. Haldane
![Page 9: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/9.jpg)
Wright-Fisher Model
Assumptions:
• Two-allele system
• N diploid individuals in each generation
• 2N gametes
• Random mating, no selection
• Discrete generations
[Figure: individuals in generation t contribute A and a gametes to a gamete pool, from which generation t + 1 is drawn.]
![Page 10: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/10.jpg)
Let’s play a round of this game
![Page 11: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/11.jpg)
The game is faster by computer
I = 400 A = 200 R = 100 G = 100
I = number of generations
A = population size (gametes)
G = count of the G allele
R = count of the R allele
![Page 12: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/12.jpg)
I = 400 A = 200 R = 100 G = 100
![Page 13: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/13.jpg)
I = 400 A = 200 R = 100 G = 100
![Page 14: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/14.jpg)
Let’s investigate this phenomenon
• Change Population Size
• Change allele frequencies
![Page 15: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/15.jpg)
I = 40 A = 20 R = 10 G = 10
![Page 16: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/16.jpg)
I = 40 A = 20 R = 10 G = 10
![Page 17: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/17.jpg)
I = 1000 A = 2000 R = 1000 G = 1000
![Page 18: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/18.jpg)
I = 1000 A = 2000 R = 1000 G = 1000
![Page 19: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/19.jpg)
I = 400 A = 200 R = 150 G = 50
![Page 20: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/20.jpg)
I = 400 A = 200 R = 150 G = 50
![Page 21: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/21.jpg)
I = 400 A = 200 R = 150 G = 50
![Page 22: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/22.jpg)
Can we deduce general rules?
• Larger population size = alleles stick around longer. Less susceptibility to “random walk”
• Probability of winning seems related to initial frequencies. At 50/50, each allele has a 50% chance of winning. Hypothesis: probability of winning is proportional to initial frequency.
• Hypothesis: One allele must always win.
![Page 23: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/23.jpg)
• Each generation, the new population is made by sampling with replacement from the previous generation
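[Figure: gametes sampled with replacement from the gamete pool form the genotypes of the next generation.]

Let:
$p_t$ = frequency of A among gametes in generation $t$
$p_{t+1}$ = frequency of A in the next generation
$n_{t+1}$ = count of A in the next generation

Then:

$$n_{t+1} \sim \mathrm{Binomial}(2N, p_t)$$

$$\Pr(n_{t+1} = m) = \binom{2N}{m} p_t^{m} (1 - p_t)^{2N - m}$$

$$E(p_{t+1}) = p_t, \qquad \mathrm{Var}(p_{t+1}) = \frac{p_t (1 - p_t)}{2N}$$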
Implications: sampling variance (“genetic drift”) depends on population size. Allele frequency is a random sequence of numbers: p1, p2, p3, … Eventually p = 1 or p = 0, and the allele stays “fixed” until a new mutation arises.
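The binomial sampling step above is short enough to simulate directly. The following is a minimal sketch, not the program used to draw the plots in the lecture; the function name and the `seed` parameter are assumptions, while `n_gametes` and `generations` correspond to the A and I settings shown on the simulation slides.

```python
import random


def wright_fisher(n_gametes, n_a, generations, seed=None):
    """Simulate neutral drift; return the count of allele A in each generation."""
    rng = random.Random(seed)
    counts = [n_a]
    for _ in range(generations):
        p = counts[-1] / n_gametes
        # Binomial draw: each of the 2N gametes is A with probability p.
        n_next = sum(rng.random() < p for _ in range(n_gametes))
        counts.append(n_next)
    return counts


# Small population (A = 20, starting at 10 copies) over I = 40 generations.
print(wright_fisher(20, 10, 40, seed=1))
```

Once the count reaches 0 or `n_gametes` it can no longer change, which is the fixation behaviour described in the implications above. Rerunning with a larger `n_gametes` shows how much slower the random walk becomes.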
![Page 24: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/24.jpg)
An important concept: Drift
• Drift – stochastic fluctuations in allele frequency due to random sampling in a finite population.
![Page 25: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/25.jpg)
Drift versus Darwin
• How can we add selection to our game?
• We need to account for dominant and recessive alleles!
![Page 26: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/26.jpg)
The Wright-Fisher Game v0.2
• Define a relative fitness for each possible individual:
  – Fitness RR = 1
  – Fitness RB = 1.1
  – Fitness BB = 2
• Modify the rules: pick an individual with probability proportional to the fitness of her genotype. A given BB individual is twice as likely to be picked as a given RR individual.
• Now choose one of her chromosomes and put it into the next generation.
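The modified rules can be sketched in Python as follows. The fitness values are the ones on the slide; everything else (the function names, representing a genotype as a two-letter string, and building each new individual from two independent fitness-weighted picks, which allows the same parent to be picked twice) is an assumption made for illustration.

```python
import random

# Relative fitnesses taken from the slide.
FITNESS = {"RR": 1.0, "RB": 1.1, "BB": 2.0}


def next_generation(pop, rng):
    """pop is a list of genotypes such as 'RB'; returns a new list of the same size."""
    weights = [FITNESS[g] for g in pop]
    new_pop = []
    for _ in range(len(pop)):
        # Pick two parents in proportion to fitness, then one chromosome from each.
        parents = rng.choices(pop, weights=weights, k=2)
        alleles = sorted((rng.choice(parent) for parent in parents), reverse=True)
        new_pop.append("".join(alleles))  # reverse sort spells heterozygotes 'RB'
    return new_pop


def b_frequency(pop):
    """Frequency of the B allele among all chromosomes in the population."""
    return sum(g.count("B") for g in pop) / (2 * len(pop))


rng = random.Random(0)
pop = ["RR"] * 49 + ["RB"]  # a single new B allele
for _ in range(100):
    pop = next_generation(pop, rng)
print(b_frequency(pop))
```

Running this with different seeds gives different outcomes: in some runs the single B copy is lost by chance despite its fitness advantage.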
![Page 27: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/27.jpg)
What relative fitness should we select?
• Conserved elements: <0.01% increase in fitness
![Page 28: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/28.jpg)
Drift versus Darwin
I = 100 A = 100 R = 99 G = 1 fG = 2*fR
![Page 29: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/29.jpg)
I = 100 A = 100 R = 99 G = 1 fG = 3*fR
![Page 30: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/30.jpg)
I = 100 A = 100 R = 99 G = 1 fG = 3*fR
![Page 31: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/31.jpg)
I = 100 A = 100 R = 99 G = 1 fG = 3*fR
![Page 32: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/32.jpg)
I = 100 A = 2000 R = 1999 G = 1 fG = 3*fR
![Page 33: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/33.jpg)
Some startling results!
• Survival of the ~~fittest~~ luckiest.
• Sometimes drift can overcome selection. Depends on allele frequency, population size.
• Most new advantageous mutations are not fixed!
![Page 34: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/34.jpg)
Mutation
• Infinite alleles model – Assumptions
![Page 35: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/35.jpg)
I = 5000 U = 0.0001
Start as homozygous for allele A
U = mutation rate
![Page 36: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/36.jpg)
Summary thus far
• Chance can play a large role in determining which polymorphisms are fixed in a population.
• The fittest don’t always survive.
• These findings are/were not obvious.
• They become (more) obvious with quantitative investigation.
• And we’ve only scratched the surface.
![Page 37: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/37.jpg)
Further explorations of this model
• To date our approach has been based on observations of simulations. But the model is simple, so an analytic approach may prove fruitful.
• Our hypotheses:
  – Can we prove them?
  – Can we quantify them?
• Let’s explore this hypothesis: one allele must always win.
![Page 38: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/38.jpg)
The Decay of Heterozygosity
• Define $G_t$, the homozygosity at generation t: the probability that two genomes picked from the population carry the same allele.
• Then the heterozygosity is $H_t = 1 - G_t$.
![Page 39: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/39.jpg)
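What is G0?
[Figure: Generation 0, a pool of R and B alleles.]
With $n_R$ and $n_B$ the numbers of R’s and B’s among the 2N copies:
1. Pick R then R: $\frac{n_R}{2N} \cdot \frac{n_R - 1}{2N - 1}$
2. Pick B then B: $\frac{n_B}{2N} \cdot \frac{n_B - 1}{2N - 1}$
G0 is the sum of these two probabilities.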
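A short sketch of this calculation, using exact fractions. The function name is hypothetical, and the counts (2 R and 6 B) are an assumption chosen only to give a concrete number.

```python
from fractions import Fraction


def homozygosity(counts):
    """Probability that two copies drawn without replacement carry the same allele."""
    total = sum(counts.values())
    return sum(Fraction(n, total) * Fraction(n - 1, total - 1) for n in counts.values())


g0 = homozygosity({"R": 2, "B": 6})  # assumed counts, for illustration only
print(g0, 1 - g0)  # 4/7 3/7 (homozygosity, heterozygosity)
```

The "pick R then R" and "pick B then B" terms are the two summands; a population fixed for one allele gives a homozygosity of exactly 1.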
![Page 40: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/40.jpg)
What is G1?
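• The two copies descend from the same copy in generation 0: probability = 1/2N
• The two copies descend from different copies that carry the same allele: probability = (1 - 1/2N) * G0
• So G1 = 1/2N + (1 - 1/2N) * G0
[Figure: two panels tracing a pair of copies in Generation 1 back to Generation 0.]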
![Page 41: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/41.jpg)
Proof of decay of heterozygosity
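One way to write out the argument is sketched below. It assumes the neutral Wright-Fisher model with 2N gene copies and starts from the two cases on the G1 slide: a pair of copies either shares a parent copy, or has two different parent copies that carry the same allele.

```latex
\begin{align*}
G_{t+1} &= \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G_t \\
H_{t+1} &= 1 - G_{t+1}
         = \left(1 - \frac{1}{2N}\right)\left(1 - G_t\right)
         = \left(1 - \frac{1}{2N}\right) H_t \\
H_t &= \left(1 - \frac{1}{2N}\right)^{t} H_0
\end{align*}
```

Because the factor $1 - 1/2N$ is less than 1, heterozygosity shrinks geometrically and goes to zero: one allele must eventually win.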
![Page 42: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/42.jpg)
What is the half life of H?
• $H_0 / 2 = H_0 \left(1 - \frac{1}{2N}\right)^t$
• $t \approx 2N \ln 2$
• $N = 10^4$: $t \approx 1.4 \times 10^4$ generations
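The half-life can be evaluated numerically to see how close the 2N ln 2 approximation is to the exact solution. This is a small sketch with hypothetical function names; `n_diploid` stands for N.

```python
import math


def half_life_exact(n_diploid):
    """Generations t solving (1 - 1/(2N))**t = 1/2."""
    return math.log(0.5) / math.log(1.0 - 1.0 / (2 * n_diploid))


def half_life_approx(n_diploid):
    """Large-N approximation t = 2N ln 2."""
    return 2 * n_diploid * math.log(2)


print(round(half_life_approx(10_000)))  # 13863
print(half_life_exact(10_000))
```

For N = 10^4 the two values differ by less than one generation, so the approximation is safe at realistic population sizes.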
![Page 43: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/43.jpg)
What does this mean?
• In a large population, eventually, every allele will have descended from a single allele in the founding population! All but 1 allele will have “died off”.
• Drift-Mutation-Selection balance.
![Page 44: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/44.jpg)
-Genealogical Analysis of all 131K Icelanders born after 1972
![Page 45: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/45.jpg)
Analysis of selection
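| Genotype | AA | Aa | aa | Total |
|---|---|---|---|---|
| Freq in generation t | $q^2$ | $2pq$ | $p^2$ | $1 = q^2 + 2pq + p^2$ |
| Fitness | $w_{11}$ | $w_{12}$ | $w_{22}$ | |
| Freq (after selection) | $q^2 w_{11}$ | $2pq\,w_{12}$ | $p^2 w_{22}$ | $\hat{w} = q^2 w_{11} + 2pq\,w_{12} + p^2 w_{22}$ |

“Recursion equations”:

$$p_{t+1} = \frac{p^2 w_{22} + pq\,w_{12}}{\hat{w}}, \qquad q_{t+1} = \frac{q^2 w_{11} + pq\,w_{12}}{\hat{w}}$$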
Assumptions in this example: no drift or mutation, discrete generations, random mating
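The recursion can be iterated numerically to follow an allele through time. The sketch below uses the slide's notation, in which p is the frequency of allele a (genotype aa has frequency p²) and q = 1 - p is the frequency of A; the function names and the fitness values in the usage line are assumptions for illustration.

```python
def selection_step(p, w11, w12, w22):
    """One generation of selection; p is the frequency of a, q = 1 - p that of A."""
    q = 1.0 - p
    w_mean = q * q * w11 + 2 * p * q * w12 + p * p * w22  # mean fitness
    return (p * p * w22 + p * q * w12) / w_mean


def trajectory(p0, w11, w12, w22, generations):
    """Allele frequency in generations 0..generations under selection alone."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(selection_step(freqs[-1], w11, w12, w22))
    return freqs


# Hypothetical fitnesses: aa is favoured, and a starts rare.
print(trajectory(0.01, 1.0, 1.1, 2.0, 50)[-1])
```

With equal fitnesses the step returns p unchanged, and because there is no drift in this model a favoured allele rises deterministically every generation.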
![Page 46: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/46.jpg)
Evolutionary dynamics in a simplex for a biallelic locus
Modified from Gokhale C. S., Traulsen A., PNAS 2010;107:5500-5504. ©2010 by National Academy of Sciences
[Figure: simplex with vertices AA, Aa, and aa.]
![Page 47: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/47.jpg)
Dynamics: Topics covered
• Selection (additive, balancing, frequency-dependent)
• Altruism, kin selection
• Structural variation (inversions)
• Multiple loci (recombination, epistatic selection)
• Population structure (island model, stepping stone model, isolation by distance, metapopulation models)
• Assortative mating
• Sex-specific effects (migration, selection)
• Variable environments, etc…
![Page 48: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/48.jpg)
Sampling with Replacement
• Some alleles pass on no copies to the next generation, while some pass on more than one.
[Figure: lineages of allele copies traced between Past and Present.]
![Page 49: Population Genetics I. Bio5488 - 2016](https://reader031.fdocuments.us/reader031/viewer/2022020622/61ee35eba482423f60390d43/html5/thumbnails/49.jpg)
The Coalescent Process
• “Backward in time process”
• Discovered by J.F.C. Kingman, F. Tajima, R. R. Hudson c. 1980
• DNA sequence diversity is shaped by genealogical history
• Genealogies are unobserved but can be estimated
• Conceptual framework for population genetic inference: mutation, recombination, demographic history
[Figure: genealogy of sampled sequences (ACTT; ACGT, ACGT, ACTT, ACTT, AGTT) with the mutations T, G, C, G marked on its branches.]
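The "backward in time" idea can be illustrated by tracing sampled lineages through the discrete Wright-Fisher model until they meet. This is a sketch of that ancestral process, not Kingman's continuous-time coalescent itself; the function name and parameter values are assumptions.

```python
import random


def coalescence_time(n_gametes, sample_size, rng):
    """Generations back until the sampled lineages share one common ancestor."""
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # Each lineage picks its parent copy uniformly at random, with replacement;
        # lineages that pick the same parent merge into one.
        parents = {rng.randrange(n_gametes) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


rng = random.Random(0)
times = [coalescence_time(200, 2, rng) for _ in range(2000)]
print(sum(times) / len(times))  # for a pair, expected to be near 2N = 200
```

Only the ancestors of the sample are tracked, never the whole population, which is why the backward view is convenient for inference from sampled sequences.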