Comparative computational biology
Positive selection
What is positive selection?
Positive selection is selection of a particular trait
- and the increased frequency of an allele in a population
Species B Species CSpecies A
Recurrent adaptation
Intraspecific levelPositive selection driving continuous adaptation to changes
in the environment
Woolhouse et al, 2002. Nat. Genet
Positive selection and recurrent selective sweeps
Positive selection and dynamics of polymorphisms
Allele frequencies in host-pathogen interactions
Host
Pathogen
Fixation of alleles
Species B Species CSpecies A
Diversifying positive selection
Interspecific levelPositive selection driving divergence
Why is it interesting to identify traits which
have undergone or are under positive selection?
FunctionEvolution
Environment……
How can we detect positive selection?
by comparing homologous sequences from
different individuals and identify an unexpectedly high proportion of
non-neutral mutational changes
Positive selection acts on beneficial mutations
…… which give rise to changes in the amino acid
sequences of proteins
Rate of synonymous mutations dS or KS
Rate of non-synonymous mutations dN or KA
To measure positive selection:
1 2 3 4 5
Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA
Pro Phe Ala Leu Val How many possible non-synonymous and
synonymous mutations??
We need to know the degeneracy of each codons to compute the number of possible synonymous mutations in codons (S):
ProlineCCUCCCCCACCG
Counted as S=0 (N=1)
A non-degenerate site (all possible nucleotides will result in an amino acid change):
A fourfold degenerate site (all possible nucleotides can be tolerated without an an amino acid change :
Counted as S=1 (N=0)Possible synonymous changes for proline:S = 0 + 0 + 1 = 1
We need to know the degeneracy of each codons to compute the number of possible synonymous mutations in codons (S):
PhenylalanineUUUUUC
Counted as S=0 (N=1)
A non-degenerate site (all possible nucleotides will result in an amino acid change):
A two-fold degenerate site (two possible nucleotides can be tolerated without an an amino acid change:
Counted as S=1/3 (N=2/3)Possible synonymous changes for phenylalanineS = 0 + 0 + 1/3 = 1/3
1. Proline S = 0 + 0 + 1 = 1
2. Phenylalanine S = 0 + 0 + 1/3 = 1/3
3. For Glycine S = 0 + 0 + 1 = 1, 4. Leucine for UUA, S = 1/3 + 0 + 1/3 = 2/3 for CUA, S = 1/3 + 0 + 1 = 4/3 Take the average of these: S = 1
5. Phenylalanine for UUU, S = 1/3 Valine, S = 1 Take average: S = 2/3 For the whole sequence: S = 1 + 1/3 + 1 + 1 + 2/3 = 4 N = total number of sites: 15 - 4 = 11
Counts of possible synonymous sites for each gene (S) 1 2 3
4 5Pro Phe Gly
Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA
Pro Phe Ala Leu Val
Counts of synonymous and non-synonymous changes for each gene (Sd and Nd)
1 2 3 4 5
Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA
Pro Phe Ala Leu Val
Calculate Sd and Nd for each codon. 1. Sd = 0, Nd = 0
2. Sd = 1, Nd = 0
3. Sd = 0, Nd = 1
4. Sd = 1, Nd = 0
5. this could happen in two ways UUU --> GUU --> GUANd = 1 Sd = 1 Route 1: Sd = 1, Nd = 1
UUU --> UUA --> GUANd = 1 Nd = 1 Route 2: Sd = 0, Nd = 2 Average: Sd = 0.5, Nd = 1.5
Total Sd = 2.5, total Nd = 2.5
1 2 3 4 5
Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA
Pro Phe Ala Leu Val Possible synonymous sites: 4
Possible non-synonymous sites: 11Synonymous changes: 2.5Non-synonymous changes: 2.5
Synonymous rate: KS=Sd / S = 2.5/4 = 0.625
Non-synonymous rate: KA= Nd / N = 2.5/11 = 0.227
Finally, we can compute the rates of syn and non-syn changes
Evaluating the effect of positive selection by computing
the RATIO of syn and non-syn changes
Neutral evolution
Purifying selection
Positive selection
f.ex. immune related genes
f.ex. housekeeping genes
f.ex. pseudogenes
Ks or dS
Ka or dN
KA or dN: rate of non-synonymous divergence
KS or dS: rate of synonymous divergence
KA>KS
KA<KS
KA=KS
1 2 3 4 5
Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA
Pro Phe Ala Leu Val Possible synonymous sites: 4
Possible non-synonymous sites: 11Synonymous changes: 2.5Non-synonymous changes: 2.5
Synonymous rate: KS=Sd / S = 2.5/4 = 0.625
Non-synonymous rate: KA= Nd / N = 2.5/11 = 0.227
Finally, we can compute the rates of syn and non-syn changes
Ka/Ks = 0.36
Species A Species B Species C
PN / Ps
Positive selection affects gene evolution during species divergence and during evolution of a population
Ka/Ks
Within species:
Non-synonymous proportion of polymorphisms: P N = Nd / N
Synonymous proportion of polymorphisms: PS = Sd / S
Species A Species B Species C
PN / Ps
Positive selection affects gene evolution during species divergence and during evolution of a population
Ka/Ks
We can use the two ratios Ka/Ks and Pn/Ps
to infer when selection has acted (or is acting)
Past selection (during speciation)
Present day selection (adaptation)
McDonald Kreitman (MK) test to contrast within and between species variation
Question: Are adaptive mutations in the alcohol dehydrogenase in Drosophila species a result of species divergence or current positive selection?
Repl: Nonsynonymous, Syn: SynonymousFixed: Substitution, Poly: Polymorphisms
Drosophila dataset alcohol dehydrogenase
The proportion of non-synonymous fixed differences between species much higher than the proportion of non-synonymous polymorphisms
MK test contrasts within and between species synonymous and non-synonymous differences
Contingency table can be tested by a G-test
Conclusion from MK-test:
Adh locus in Drosophila has accumulated
adaptive mutations (been under positive
selection) when the Drosophila species
diverged
One problem with the “counting methods”
Sometimes the signal of selection is not very strong
Positive selection on one or few particular codonsor in one particular branch
Evolutionary model to detect selection in particular codons or branches
Top Related