Comparative computational biology

31
Comparative computational biology Positive selection

description

Comparative computational biology. Positive selection. What is positive selection?. Positive selection is selection of a particular trait - and the increased frequency of an allele in a population. Intraspecific level Positive selection driving continuous adaptation to changes - PowerPoint PPT Presentation

Transcript of Comparative computational biology

Page 1: Comparative computational biology

Comparative computational biology

Positive selection

Page 2: Comparative computational biology

What is positive selection?

Page 3: Comparative computational biology

Positive selection is selection of a particular trait

- and the increased frequency of an allele in a population

Page 4: Comparative computational biology

Species B Species CSpecies A

Recurrent adaptation

Intraspecific levelPositive selection driving continuous adaptation to changes

in the environment

Page 5: Comparative computational biology

Woolhouse et al, 2002. Nat. Genet

Positive selection and recurrent selective sweeps

Positive selection and dynamics of polymorphisms

Allele frequencies in host-pathogen interactions

Host

Pathogen

Fixation of alleles

Page 6: Comparative computational biology

Species B Species CSpecies A

Diversifying positive selection

Interspecific levelPositive selection driving divergence

Page 7: Comparative computational biology
Page 8: Comparative computational biology

Why is it interesting to identify traits which

have undergone or are under positive selection?

FunctionEvolution

Environment……

Page 9: Comparative computational biology

How can we detect positive selection?

by comparing homologous sequences from

different individuals and identify an unexpectedly high proportion of

non-neutral mutational changes

Page 10: Comparative computational biology

Positive selection acts on beneficial mutations

Page 11: Comparative computational biology

…… which give rise to changes in the amino acid

sequences of proteins

Page 12: Comparative computational biology

Rate of synonymous mutations dS or KS

Rate of non-synonymous mutations dN or KA

To measure positive selection:

Page 13: Comparative computational biology

1 2 3 4 5

Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA

Pro Phe Ala Leu Val How many possible non-synonymous and

synonymous mutations??

Page 14: Comparative computational biology
Page 15: Comparative computational biology

We need to know the degeneracy of each codons to compute the number of possible synonymous mutations in codons (S):

ProlineCCUCCCCCACCG

Counted as S=0 (N=1)

A non-degenerate site (all possible nucleotides will result in an amino acid change):

A fourfold degenerate site (all possible nucleotides can be tolerated without an an amino acid change :

Counted as S=1 (N=0)Possible synonymous changes for proline:S = 0 + 0 + 1 = 1

Page 16: Comparative computational biology

We need to know the degeneracy of each codons to compute the number of possible synonymous mutations in codons (S):

PhenylalanineUUUUUC

Counted as S=0 (N=1)

A non-degenerate site (all possible nucleotides will result in an amino acid change):

A two-fold degenerate site (two possible nucleotides can be tolerated without an an amino acid change:

Counted as S=1/3 (N=2/3)Possible synonymous changes for phenylalanineS = 0 + 0 + 1/3 = 1/3

Page 17: Comparative computational biology

1. Proline S = 0 + 0 + 1 = 1

2. Phenylalanine S = 0 + 0 + 1/3 = 1/3

3. For Glycine S = 0 + 0 + 1 = 1, 4. Leucine for UUA, S = 1/3 + 0 + 1/3 = 2/3 for CUA, S = 1/3 + 0 + 1 = 4/3 Take the average of these: S = 1

5. Phenylalanine for UUU, S = 1/3 Valine, S = 1 Take average: S = 2/3 For the whole sequence: S = 1 + 1/3 + 1 + 1 + 2/3 = 4 N = total number of sites: 15 - 4 = 11

Counts of possible synonymous sites for each gene (S) 1 2 3

4 5Pro Phe Gly

Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA

Pro Phe Ala Leu Val

Page 18: Comparative computational biology

Counts of synonymous and non-synonymous changes for each gene (Sd and Nd)

1 2 3 4 5

Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA

Pro Phe Ala Leu Val

Calculate Sd and Nd for each codon. 1. Sd = 0, Nd = 0

2. Sd = 1, Nd = 0

3. Sd = 0, Nd = 1

4. Sd = 1, Nd = 0

5. this could happen in two ways UUU --> GUU --> GUANd = 1 Sd = 1 Route 1: Sd = 1, Nd = 1

UUU --> UUA --> GUANd = 1 Nd = 1 Route 2: Sd = 0, Nd = 2 Average: Sd = 0.5, Nd = 1.5

Total Sd = 2.5, total Nd = 2.5

Page 19: Comparative computational biology

1 2 3 4 5

Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA

Pro Phe Ala Leu Val Possible synonymous sites: 4

Possible non-synonymous sites: 11Synonymous changes: 2.5Non-synonymous changes: 2.5

Synonymous rate: KS=Sd / S = 2.5/4 = 0.625

Non-synonymous rate: KA= Nd / N = 2.5/11 = 0.227

Finally, we can compute the rates of syn and non-syn changes

Page 20: Comparative computational biology

Evaluating the effect of positive selection by computing

the RATIO of syn and non-syn changes

Neutral evolution

Purifying selection

Positive selection

f.ex. immune related genes

f.ex. housekeeping genes

f.ex. pseudogenes

Ks or dS

Ka or dN

KA or dN: rate of non-synonymous divergence

KS or dS: rate of synonymous divergence

KA>KS

KA<KS

KA=KS

Page 21: Comparative computational biology

1 2 3 4 5

Pro Phe Gly Leu PheSeq 1 CCC UUU GGG UUA UUUSeq 2 CCC UUC GAG CUA GUA

Pro Phe Ala Leu Val Possible synonymous sites: 4

Possible non-synonymous sites: 11Synonymous changes: 2.5Non-synonymous changes: 2.5

Synonymous rate: KS=Sd / S = 2.5/4 = 0.625

Non-synonymous rate: KA= Nd / N = 2.5/11 = 0.227

Finally, we can compute the rates of syn and non-syn changes

Ka/Ks = 0.36

Page 22: Comparative computational biology

Species A Species B Species C

PN / Ps

Positive selection affects gene evolution during species divergence and during evolution of a population

Ka/Ks

Within species:

Non-synonymous proportion of polymorphisms: P N = Nd / N

Synonymous proportion of polymorphisms: PS = Sd / S

Page 23: Comparative computational biology

Species A Species B Species C

PN / Ps

Positive selection affects gene evolution during species divergence and during evolution of a population

Ka/Ks

We can use the two ratios Ka/Ks and Pn/Ps

to infer when selection has acted (or is acting)

Past selection (during speciation)

Present day selection (adaptation)

Page 24: Comparative computational biology

McDonald Kreitman (MK) test to contrast within and between species variation

Page 25: Comparative computational biology

Question: Are adaptive mutations in the alcohol dehydrogenase in Drosophila species a result of species divergence or current positive selection?

Page 26: Comparative computational biology

Repl: Nonsynonymous, Syn: SynonymousFixed: Substitution, Poly: Polymorphisms

Drosophila dataset alcohol dehydrogenase

Page 27: Comparative computational biology

The proportion of non-synonymous fixed differences between species much higher than the proportion of non-synonymous polymorphisms

MK test contrasts within and between species synonymous and non-synonymous differences

Contingency table can be tested by a G-test

Page 28: Comparative computational biology

Conclusion from MK-test:

Adh locus in Drosophila has accumulated

adaptive mutations (been under positive

selection) when the Drosophila species

diverged

Page 29: Comparative computational biology
Page 30: Comparative computational biology

One problem with the “counting methods”

Sometimes the signal of selection is not very strong

Page 31: Comparative computational biology

Positive selection on one or few particular codonsor in one particular branch

Evolutionary model to detect selection in particular codons or branches