Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh


Drosophila Population Genetics

Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh


Why is intra-specific variability interesting?

A high degree of variability is obviously favourable, as freely giving the materials for selection to work on… Charles Darwin, The Origin of Species, Chap. 1.

Darwin was the first person to recognize clearly that evolutionary change over time is the result of processes acting on genetically controlled variability among individuals within a population, which eventually cause differences between ancestral and descendant populations.

Knowledge of the nature and causes of this variability is crucial for an understanding of the mechanisms of evolution, animal and plant breeding, and human genetic diseases.


Classical and quantitative genetic studies of variation

Classical genetics reveals the existence of discrete polymorphisms in natural populations, but is necessarily limited either to chromosomal rearrangements such as inversions that can be detected cytologically, or to conspicuous phenotypes such as eye colour or body colour (flies carrying certain eye-colour mutations such as cardinal can be found in natural populations).

Within a given species, only a handful of such polymorphisms can easily be detected. Relatively few cases of discrete polymorphisms affecting morphological traits are known.


A human inversion polymorphism

The classic polymorphism of Drosophila pseudoobscura


Quantitative genetics reveals the existence of ubiquitous genetic variation in metrical and meristic traits.

Most metric traits have a coefficient of variation (the ratio of the standard deviation to the mean) of 5-10%.

Measurements of the resemblances between relatives show that 20%-80% of the variance in such traits is typically due to genetic factors.

This type of variation is of great evolutionary, medical and economic significance, but measuring it does not tell us anything about the details of its genetic control (numbers of loci involved, frequencies of variant alleles, etc.).


Studies of concealed variability (revealed by inbreeding) indicate the existence of low-frequency recessive alleles, usually with deleterious effects, that are not normally detectable in a large random-mating population.

The results of close inbreeding (e.g. by brother-sister matings) are:

1. Reduced mean performance of a set of inbred lines, with respect to traits like survival, fertility and growth rate.

2. Increased variability among lines, sometimes involving abnormalities caused by single gene mutations.


While amply validating Darwin’s view that there is plenty of variation available for evolution to utilize, this evidence leaves two important questions unanswered:

(a) How much variation within a natural population is there at an average locus? Classical genetics provides no means of sampling loci at random from the genome, without respect to their functional importance or level of natural variability.

(b) To what extent does natural selection as opposed to mutation and/or genetic drift control the frequencies of allelic variants within populations? The classical genetics bias towards genes with conspicuous phenotypic effects means that strong selective forces are likely to be operating. Such genes might well be unrepresentative of the global picture.


Molecular genetics to the rescue

The solution to question (a) is to use the fact that genes correspond to stretches of DNA that code for proteins.

If either the protein sequence corresponding to a gene, or its DNA sequence, can be studied directly, then we can look at variation within the population without having to follow visible mutations, i.e. there is no need for prior knowledge of the existence of variation.

We can also look at variation in non-coding sequences.


Electrophoretic variation

The first steps were taken in the mid-1960s by Lewontin and Hubby, working in Chicago on the fruitfly Drosophila pseudoobscura, and by Harris in London, working on humans.

They used the technique of gel electrophoresis of proteins to screen populations for variants in a large number of soluble proteins controlled by independent loci, mostly enzymes with well-established metabolic roles. The proteins were chosen purely because they could be studied easily.


The results of the early electrophoretic surveys were startling: a large fraction (as high as 40%) of loci were found to be polymorphic (i.e. they exhibited one or more minority alleles with frequencies greater than 1%).

An average D. pseudoobscura individual was estimated to be heterozygous at 13% of the 24 protein loci that had been studied by 1974 i.e. a random individual sampled from the population would be expected to have distinct maternal and paternal alleles at 13% of its protein-coding loci.

Much lower levels of heterozygosity (or gene diversity: the chance that two randomly chosen copies of a gene are different) were found in mammals, and much higher levels in bacteria.
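As a minimal sketch of the gene diversity just defined, the chance that two randomly chosen copies of a gene differ can be computed from allele frequencies as one minus the sum of the squared frequencies. The function name and the example frequencies are hypothetical, chosen only for illustration; they are not data from the surveys described here.

```python
def gene_diversity(allele_freqs):
    """Chance that two randomly chosen gene copies carry different alleles.

    allele_freqs: allele frequencies at one locus (must sum to 1).
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that both copies are the same allele is sum(p_i^2).
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical locus: a common allele at 0.9 and a minority allele at 0.1.
print(round(gene_diversity([0.9, 0.1]), 3))
```

A locus with a single allele gives a diversity of zero, and diversity rises as the allele frequencies become more even.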


This work conclusively refuted the view that loci are only rarely polymorphic.

However, it raised more questions than it answered. In particular, there were several biases in the data. Only soluble proteins could easily be studied, and amino-acid changes that do not affect the mobility of proteins on gels are not detected by electrophoresis.

Similarly, any changes in the DNA that do not affect the protein sequence go undetected.


DNA sequence variation

The advent in the late 1970s of methods for cloning and sequencing of DNA meant that studies of natural variation could be carried out at the DNA level. This eliminates virtually all the possible biases in quantifying variability.

With the advent of PCR amplification for isolating specific regions of the DNA, and with relatively cheap automated sequencing, this is now the method most commonly used in surveys of variation.

Efforts are currently under way in D. melanogaster to scale this “resequencing” up to the whole genome level.


The pioneering work on directly comparing homologous DNA sequences sampled within a species was carried out by Martin Kreitman in Lewontin’s lab at Harvard in the early 1980s.

Kreitman sequenced 11 independent copies (alleles) of the Adh (alcohol dehydrogenase) gene of D. melanogaster, isolated from collections made around the world. He sequenced 2379 bases from each of these alleles, an heroic effort in those days.


His work succeeded in:

• Demonstrating a high level of variability at the level of individual nucleotide sites, a factor of ten or so higher than would have been expected from the typical level of heterozygosity for protein polymorphisms

• Showing that nearly all of this variability involved silent changes that did not affect protein sequences, i.e. the changes were either in regions that did not code for amino-acids or involved synonymous changes in codons.

• Showing that the only amino-acid polymorphism detected was the one already known to cause the difference between the fast (F) and slow (S) electrophoretic alleles of Adh.


Kreitman’s Adh Results

                              Intron 1   Coding Region   3' Non-Transcr.
% Silent Sites Segregating      1.7          6.7             0.6
No. Sites                       654          765             767

No non-silent substitutions were found (other than F/S): 39 would be expected if variability were the same as for silent sites.


These results demonstrate that the protein sequence is highly constrained by selection, i.e. most mutations affecting the amino-acid sequence of a protein cause selectively disadvantageous changes to its functioning, and are eliminated rapidly from the population.

Most variation that is detected in coding sequences (typically over 85% in Drosophila) thus involves synonymous variants. Non-coding region variation shows a similar level to synonymous variation.

These results suggest that most variation and evolution at the DNA level may be due to neutral or nearly neutral mutations, whose fate is controlled by genetic drift rather than selection, especially as much of the genome is non-coding, even in Drosophila.


How to measure DNA sequence variation

Allele 1   ATACTTAGCGTTGGCATCCTCGCGATTGAG

Allele 2   ATGCTTAGCGTTGGCATCCTAGCGATCGAG

Allele 3   ATGCTTGGCGTTGGCATCCTAGCGATCGAG


The nucleotide site diversity (π) for a given set of alleles sampled from a population is the frequency with which a randomly chosen pair of alleles differ at a given site.

It can be calculated from data on a sample of homologous DNA sequences, by determining the sum of the numbers of differences between all possible pairs of sequences.

The result is divided by the product of the number of pairs of sequences that were compared (this equals n(n-1)/2, if there are n independent alleles), and the number of bases studied.


In the example, n = 3, so n(n-1)/2 = 3.

The total number of pairwise differences between all 3 combinations of sequences is 1 + 3 + 4 = 8.

To get the pairwise diversity per site, we divide this by 3 times the number of sites, so that

π = 8/(3 × 30) = 0.089
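The calculation of pairwise nucleotide site diversity (written π here) can be sketched in a few lines of Python. This is only an illustration of the counting procedure described above, not a replacement for dedicated population genetics software; the function names are hypothetical. Allele 3 is assumed to be 30 bases long and to end in GAG, which is the reading that reproduces the stated 1 + 3 + 4 = 8 pairwise differences.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Number of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def nucleotide_diversity(seqs):
    """Sum of pairwise differences divided by (number of pairs x number of sites)."""
    n = len(seqs)
    n_pairs = n * (n - 1) // 2
    total = sum(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    return total / (n_pairs * len(seqs[0]))


# The three example alleles (allele 3 assumed to be 30 bases, see lead-in).
alleles = [
    "ATACTTAGCGTTGGCATCCTCGCGATTGAG",
    "ATGCTTAGCGTTGGCATCCTAGCGATCGAG",
    "ATGCTTGGCGTTGGCATCCTAGCGATCGAG",
]

print(round(nucleotide_diversity(alleles), 3))  # 0.089
```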


An alternative method of measuring variation is simply by counting the number of sites that are segregating in the sample, S.

By dividing S by the product of the number of bases in the sequence and the sum

a = 1 + 1/2 + 1/3 + ... + 1/(n-1)

we obtain a statistic called Watterson's θw.

If the population is at equilibrium and there is no selection, θw is expected to be similar in value to π.

In the example, we have S = 4, and a = 1 + 1/2 = 1.5. Hence:

θw = 4/(30 × 1.5) = 0.089
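The same example can be run through a short sketch of Watterson's estimator (written θw here): count the segregating sites S, then divide by the number of bases times a = 1 + 1/2 + ... + 1/(n-1). The function names are hypothetical, and allele 3 is again assumed to be 30 bases long and to end in GAG, the reading consistent with S = 4.

```python
def segregating_sites(seqs):
    """Number of alignment columns containing more than one nucleotide."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def watterson_theta(seqs):
    """S divided by (number of sites x a), with a = sum of 1/i for i = 1..n-1."""
    n = len(seqs)
    a = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (len(seqs[0]) * a)


# The three example alleles (allele 3 assumed to be 30 bases, see lead-in).
alleles = [
    "ATACTTAGCGTTGGCATCCTCGCGATTGAG",
    "ATGCTTAGCGTTGGCATCCTAGCGATCGAG",
    "ATGCTTGGCGTTGGCATCCTAGCGATCGAG",
]

print(segregating_sites(alleles))          # 4
print(round(watterson_theta(alleles), 3))  # 0.089
```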


Under the neutral theory of evolution, variability in DNA sequences reflects the balance between the input of new variants by mutation and their loss by random fluctuations in frequencies caused by finite population size (genetic drift).


Under this model, variant frequencies at a locus are always shifting around, but a statistical equilibrium will eventually be reached if population size stays constant.

The expected value of the pairwise diversity in the population is then given by:

π = 4Neμ

where μ is the neutral mutation rate per site, and Ne is the effective population size, which controls the rate of genetic drift.

The expected values of both π and θw are equal to 4Neμ.


Estimates of π have now been obtained from many different kinds of organisms, by sampling sets of homologous genes from natural populations and sequencing them.

Rough average values over many genes for silent nucleotide π are as follows:

• Escherichia coli (bacterium): 0.05

• Drosophila melanogaster (African): 0.02

• Homo sapiens: 0.001


Knowledge of μ enables us to estimate Ne from π.

For example, with μ = 4 × 10⁻⁹ and π = 0.02, we obtain Ne = π/(4μ) = 1.25 × 10⁶.

Drosophila effective population sizes are therefore very large.
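The arithmetic behind this estimate is a rearrangement of the equilibrium relation between diversity, effective population size and mutation rate: Ne equals the diversity divided by four times the mutation rate. A minimal sketch, with a hypothetical function name and the two values quoted above as inputs:

```python
def effective_population_size(diversity, mutation_rate):
    """Rearranged neutral equilibrium: Ne = diversity / (4 * mutation rate)."""
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return diversity / (4.0 * mutation_rate)


# Values quoted in the text: silent diversity 0.02, mutation rate 4e-9 per site.
ne = effective_population_size(0.02, 4e-9)
print(f"{ne:.3g}")  # 1.25e+06
```

The estimate scales inversely with the assumed mutation rate, so any uncertainty in that rate carries over directly to Ne.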


Detecting Selection

One of the major goals of evolutionary genetics is to understand to what extent selection, as opposed to neutral forces of mutation and genetic drift, controls variation and evolution in DNA and protein sequences.

The methods for doing this often involve combining data on sequence divergence between species with data on polymorphism within species.


Forms of selection

• Purifying selection, which acts to prevent the spread of deleterious mutations, e.g. those affecting the amino-acid sequences of proteins.

• Positive directional selection, which causes an adaptive mutation to spread through a species

• Balancing selection, which maintains alternative variants in the population

Directional and balancing selection are often collectively referred to as positive selection.


Use of sequence divergence data

The simplest situation is when we have two homologous (aligned) DNA sequences from a pair of related species.

For the purpose of discussion, assume that all evolutionary change occurs by nucleotide substitutions, i.e. the sequence differences are caused entirely by one nucleotide base changing into another by mutation.

This is usually the case for coding sequences, since insertions or deletions cause disruption of functionality.


[Figure: Species 1 and Species 2 are each separated from their common ancestor by a branch of length T.]

The total time separating a pair of sequences from the two species is 2T.


Neutral sequence evolution

Under neutral evolution, the divergence K (the number of substitutions per site separating the two sequences) is expected to be equal to the mutation rate (μ) times the total time separating the two sequences, i.e.

K = 2μT

The simplest way to understand this is to note that, under neutral evolution, the expected number of mutations that distinguish a pair of sequences is equal to the time separating them (2T) times the rate of mutation per unit time (μ).
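The neutral expectation that divergence equals twice the mutation rate times the time since the species split can be turned into a small calculator. The sketch below uses hypothetical function names and a hypothetical split time of one million generations; the mutation rate of 4e-9 per site is the value used elsewhere in these notes and is assumed here to be per generation. Multiple substitutions at the same site are ignored.

```python
def expected_neutral_divergence(mutation_rate, split_time):
    """Expected neutral divergence per site: K = 2 * mu * T."""
    return 2.0 * mutation_rate * split_time


def split_time_from_divergence(divergence, mutation_rate):
    """Invert K = 2 * mu * T to obtain T, in the time units of mu."""
    return divergence / (2.0 * mutation_rate)


mu = 4e-9          # assumed mutation rate per site per generation
t = 1_000_000      # hypothetical split time, in generations
k = expected_neutral_divergence(mu, t)
print(round(k, 4))
print(round(split_time_from_divergence(k, mu)))
```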


We compare K values for nucleotide sites where mutations can reasonably be assumed to be neutral or nearly neutral with K for sites where we wish to test for selection; larger than neutral K values indicate directional selection, and smaller than neutral K values indicate purifying selection.

Nonsynonymous sites are usually used as the candidates for selection, but there is increasing use of defined types of non-coding sequences.


Evidence for pervasive purifying selection

This comes from the fact that both K and π for nonsynonymous variants are nearly always much smaller than for synonymous and noncoding sites.


Statistics on diversity and divergence in D. miranda (species 1: 18 loci) and D. pseudoobscura (species 2: 14 loci)

Statistic   Mean
πA          .88 (.44/.4)
πS          .478 (.342/.626)
πA2         .26 (.24/.3)
πS2         2.73 (2.3/3.4)
KA          2.48 (.3/3.76)
KS          22.2 (9.9/24.8)

All values are percentages.

Divergence (K) is measured between D. miranda and D. affinis. (KS between D. miranda and D. pseudoobscura is 3.5%.)

L. Loewe et al. 2006 Genetics 172: 1079-1092.


Divergence of mel-sim introns

[Figure: divergence (y-axis, 0 to 0.35) plotted against length of intron in base pairs (x-axis, log scale, 10 to 100,000), shown separately for first introns and non-first introns.]

P. Haddrill et al. 2005 Genome Biol. 6: R67. 1-8.


Effects of deleterious mutations on fitness

• There are clearly a lot of deleterious mutations entering the population each generation, most of which will eventually be eliminated by selection

• While the mean level of variability is much lower for nonsynonymous than synonymous mutations, this could simply mean that all the deleterious ones are rapidly removed by selection, so that the amino-acid variants that we see segregating are in fact selectively neutral.


• It is a topic of current research to try and estimate the distribution of selection coefficients on deleterious amino-acid and silent variants in natural populations

• Estimates for amino-acid variants indicate a wide distribution, such that the mean selection coefficient against a heterozygous non-synonymous variant is of the order of 10⁻⁵

• Values for synonymous or silent variants are much smaller, of the order of 10⁻⁶.


Faster divergence in coding than non-coding sequences suggests positive selection

• In the OdsH gene of three Drosophila species, divergence in the homeodomain is highly significantly accelerated

• This directly suggests selection

[Figure: divergence (y-axis, 0 to 0.2) in the homeodomain, non-homeodomain and intron regions of OdsH, for the species comparisons mel-sim, mel-mau and sim-mau. Labels on the figure: distantly related species, closely related species, positive directional selection.]

C. Ting et al. 1998 Science 282:1501-1504


The McDonald-Kreitman test

• Compares non-synonymous and synonymous site divergence between species, and non-synonymous and synonymous site diversity within species, in the same gene

• If variants at both kinds of sites were neutral, the numbers of substitutions at the two kinds of sites between two species should be in the same ratio as the polymorphism within either species, assuming equilibrium between drift and mutation:

Neutral divergence = 2μT

Neutral diversity = 4Neμ


• If the ratio of non-synonymous variants to synonymous variants for differences between species is greater than the ratio for within-species variation, this suggests positive directional selection

• If the opposite is the case, either purifying selection or balancing selection is acting


Centromeric histone protein evolution

• Alignment of the Cid proteins of five melanogaster subgroup species with histone H3 proteins from D. melanogaster (2.3 million years divergence) with E. histolytica (> 1 billion years divergence)

– The most divergent histone H3 sequences have >75% identity to each other, whereas centromeric H3-like proteins are much more diverged (35–50% identical to histone H3).


Sliding window analysis of Cid

[Figure: 50-nucleotide (nt) window, in steps of 10 nt, using all sites. Curves: intraspecific polymorphism within D. simulans (π) and interspecific divergence (K); y-axis: π or K. Regions marked: N-terminal tail region (mostly non-synonymous substitutions) and C-terminal core (mostly synonymous substitutions).]
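A sliding window analysis of this kind can be sketched as follows: pairwise diversity is computed within a window of fixed width, and the window is then moved along the alignment by a fixed step. The defaults of 50 nt and 10 nt match the window and step stated for the figure; the function name and the toy sequences are hypothetical and are not the Cid data.

```python
from itertools import combinations


def sliding_window_diversity(seqs, window=50, step=10):
    """Return (window start, pairwise diversity per site) for each window."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    results = []
    for start in range(0, len(seqs[0]) - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        diffs = sum(
            sum(x != y for x, y in zip(a, b))
            for a, b in combinations(chunk, 2)
        )
        results.append((start, diffs / (n_pairs * window)))
    return results


# Toy alignment: two 70-base sequences differing only at the first site.
toy = ["A" * 70, "C" + "A" * 69]
for start, value in sliding_window_diversity(toy):
    print(start, value)
```

Applying the same function to one sequence from each of two species gives the windowed divergence rather than the within-species diversity.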


Evidence for adaptive evolution in D. melanogaster & simulans Cid

• Polymorphism was studied in D. melanogaster (15 strains) and D. simulans (8 strains), and divergence between them

• Non-synonymous: synonymous (N:S) ratios differ significantly (P < 0.0025)

– For divergence between the species = 18:10

– For pooled polymorphic sites within the two species = 9:28

• McDonald-Kreitman test for the D. melanogaster lineage (box): P < 0.006

             Fixed diffs   Polymorphic sites
Non-syn           8               0
Synonymous        4               9

H. Malik & S. Henikoff 2001 Genetics 157: 1293-1298
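A McDonald-Kreitman comparison reduces to a test of independence on a 2 x 2 table of counts. The sketch below applies a two-sided Fisher exact test, written out with the standard library so that no external package is assumed, to the D. melanogaster lineage counts (non-synonymous: 8 fixed, 0 polymorphic; synonymous: 4 fixed, 9 polymorphic). The source does not say which test produced the quoted P value, so the choice of Fisher's exact test is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)  # tolerance for float ties
    )


# Rows: non-synonymous, synonymous. Columns: fixed differences, polymorphic sites.
p_value = fisher_exact_two_sided(8, 0, 4, 9)
print(round(p_value, 4))  # 0.0046
```

The result is consistent with the reported P < 0.006 for this lineage.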


Using data on many different genes, methods have been developed to use the McDonald-Kreitman approach to estimate what fraction of amino-acid differences between D. melanogaster and D. simulans are caused by directional selection.

This fraction is of the order of 25%, a surprisingly high value.

N. Bierne & A. Eyre-Walker 2004 Mol. Biol. Evol. 21: 1350-1360.
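To give a feel for how such a fraction is obtained, the simplest estimator built on McDonald-Kreitman counts is α = 1 - (Ds × Pn)/(Dn × Ps), where Dn and Ds are the numbers of non-synonymous and synonymous fixed differences and Pn and Ps the corresponding numbers of polymorphic sites. This is a sketch of that simple ratio estimator only, not the multi-gene likelihood method of the paper cited above. The function name is hypothetical, and the single-gene example is for illustration and should not be compared with the genome-wide figure.

```python
def fraction_adaptive(dn, ds, pn, ps):
    """Simple MK-based estimate of the fraction of adaptive substitutions.

    alpha = 1 - (ds * pn) / (dn * ps); undefined if dn or ps is zero.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must both be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Illustration only: pooled Cid counts (divergence 18:10, polymorphism 9:28).
print(round(fraction_adaptive(dn=18, ds=10, pn=9, ps=28), 2))
```

When the non-synonymous to synonymous ratio is the same for divergence and polymorphism, as expected under neutrality, the estimate is zero.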


Indirect evidence for selection: selective sweeps

• After an advantageous mutation has spread through a population, the level of polymorphism will be reduced across the region (i.e. at closely linked neutral sites)

• This is because a unique selectively favourable mutation may arise at a site in a DNA sequence that is completely linked to a polymorphic variant segregating in a population

J. Maynard Smith & J. Haigh 1974 Genet. Res. 23: 23-35.


A selective sweep fixes variants linked to the selected site

It is a form of hitch-hiking:

• as the black (advantageous) variant increases in frequency in a population, it causes low diversity at closely linked sites in a sequence (white circles)


A recent selective sweep is detectable if the time since selective substitution is sufficiently small (around 0.25Ne generations), but there is a lot of noise


Indirect evidence for selection: statistics of variant frequency distributions

• It is also possible to work out the frequencies at which variants are expected to be found in equilibrium populations, under both neutrality and selection

– Under neutrality, most variants are expected to be quite rare

• If selection is operating on the sequence, it will affect the frequencies of variants in the sample

– This forms the basis for some tests for selection, and methods for estimating the intensity of selection.


• Assuming neutrality and equilibrium, the expected values of both π and θw are equal to 4Neμ

• If π ≠ θw, it suggests the possibility of selection

– If there are excess rare variants, compared with what is expected under neutrality, this suggests purifying selection

– Excess high frequency variants might suggest balancing selection or the presence of advantageous mutations spreading in the population

• BUT there are two problems

– We have to test whether the difference could be produced by chance

– The population may not have been constant in size, as assumed in the model, and so its demographic history may cause π ≠ θw
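One widely used statistic for comparing the two diversity estimates is Tajima's D, which takes the difference between the mean number of pairwise differences and the number of segregating sites divided by a = 1 + 1/2 + ... + 1/(n-1), and scales it by an estimate of its standard deviation. The sketch below follows the standard published formulas for D. The function name and the example numbers are hypothetical. Note that k is the mean number of pairwise differences per sequence (the per-site diversity multiplied by the number of sites), and that D is undefined for fewer than four sequences because the variance term is then zero.

```python
from math import sqrt


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean
    pairwise differences k (per sequence, not per site)."""
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    if s < 1:
        raise ValueError("no segregating sites: D is undefined")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of 10 sequences with 10 segregating sites.
print(tajimas_d(10, 10, 2.0) < 0)  # excess of rare variants: D is negative
print(tajimas_d(10, 10, 5.0) > 0)  # excess of common variants: D is positive
```

Whether a given value of D is unusual still has to be judged against the distribution expected under neutrality, and demographic history can shift D in the same way as selection.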


Statistical tests must be used!

• Things we estimate from a sample may look very different from the average that is expected

• Statistical tests are necessary to decide whether a sample could not have arisen by a process of neutral mutation and drift. Only if we can say this, can we conclude that something such as selection has affected the sequences.

• Neutrality is used as a null hypothesis


Fixed advantageous mutation: one haplotype selected, then new neutral variants occur

π < θw, Tajima's D < 0

Extreme bottleneck: one haplotype present, then new neutral variants occur

π < θw, negative Tajima's D

The spread of an advantageous mutation affects diversity very much like a bottleneck, but only in the region around the gene.


Evidence for a selective sweep on the neo-X chromosome of D. miranda

D. Bachtrog 2003 Nat. Genet. 34: 215-219.


Genome scans for selective sweeps

There is currently a lot of interest in using scans of variability across the genome, to look for patterns that suggest a recent selective sweep.

The hope is that this will lead to identification of the mutations that have been favoured by selection.


One subject of study is non-African populations of D. melanogaster and D. simulans, which are believed to have originated relatively recently (10,000 years ago??) from ancestral African populations.

They must have adapted to their new environments. It should be possible to see which regions of the genome show evidence of selective sweeps.

The problem is that they have also gone through bottlenecks of small population size, which have similar effects to sweeps, except that the effects are distributed over the whole genome.


B. Harr et al. (2002) Proc. Natl. Acad. Sci. USA 99, 12949-12954

Relative values of microsatellite (A) and sequence diversity (B) in non-African and African populations of D. melanogaster


Scan of 250 non-coding sequences, each approximately 500 bp, across the X chromosome of D. melanogaster

Q is the probability of getting as many as the observed number of polymorphisms in the European sample on a bottleneck model. Empty and filled circles indicate significantly negative or positive Tajima's D.

(L. Ometto et al. 2005 Mol. Biol. Evol. 22: 2119-2130)


Some recent research problems in my lab

• What is the typical magnitude of selection on mutations that alter codon usage?

• Are non-coding sequences evolving neutrally?


• The genetic code is degenerate; there are at least two codons for each amino-acid except methionine and tryptophan

• The 3rd coding position is often redundant, so that at least some changes in it frequently result in no change in the protein sequence



• It might be thought that synonymous changes would have no effect on fitness, so that such changes could be treated as selectively neutral

• If this is so, the frequency with which codons corresponding to a particular amino-acid are used should correspond to the frequencies with which they would be expected to be produced by randomly combining their constituent nucleotides

• It quickly became apparent in the early days of DNA sequencing that this was not the case, and that there is considerable codon usage bias in many species


• The proportion of codons in a gene that are preferred (major codons) provides an index of overall codon bias (major codon usage or MCU)

• A variant of this method, which has become popular with the advent of gene expression databases, identifies codons that are used more frequently in genes with high levels of expression

• These are often called optimal codons, and the frequency of optimal codons in a gene is known as Fop. This term is now often used for MCU
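A minimal sketch of how Fop could be computed for one gene. The set of optimal codons below is hypothetical, and skipping Met, Trp and stop codons (which have no synonymous alternative or do not encode an amino-acid) is an assumed convention rather than something stated on the slide.

```python
def fop(coding_seq, optimal_codons,
        ignored=("ATG", "TGG", "TAA", "TAG", "TGA")):
    """Frequency of optimal codons: optimal codons / codons counted.
    Codons in `ignored` are left out of the count (assumed convention)."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3          # drop any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counted = [c for c in codons if c not in ignored]
    if not counted:
        raise ValueError("no codons to count")
    return sum(c in optimal_codons for c in counted) / len(counted)

# Hypothetical optimal-codon set and toy sequence, for illustration only.
optimal = {"CTG", "GCC", "AAG"}
toy_gene = "ATGCTGGCCAAGTTAGCTTGA"   # ATG CTG GCC AAG TTA GCT TGA
print(fop(toy_gene, optimal))        # 3 optimal codons out of 5 counted
```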

Page 61: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• An important observation is that there is a general tendency for patterns of codon usage to be fairly consistent across different genes in the genome i.e. the same codons are preferred in different genes, although the level of bias varies considerably, and there are differences between species in the nature of the preferred codons

• General levels of codon usage are well-conserved evolutionarily

Page 62: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Gene        Fop (D. pseudoobscura)   Fop (D. melanogaster)

bcd         0.50                     0.52
Bruce       0.55                     0.56
ftz         0.60                     0.56
Gld         0.60                     0.57
hb          0.55                     0.50
hyd         0.21                     0.25
nop56       0.59                     0.64
rh1         0.57                     0.62
rp49        0.67                     0.75
sry-alpha   0.42                     0.57
T1          0.66                     0.63
Xdh         0.64                     0.61
ade3        0.48                     0.48
Adh         0.66                     0.69
Adh-dup     0.56                     0.57
amd         0.53                     0.55
Ddc         0.60                     0.62
dpp         0.44                     0.41
Eno         0.76                     0.77
Gpdh        0.48                     0.46
Lam         0.64                     0.64
smo         0.50                     0.52
Uro         0.64                     0.64
cyp1        0.66                     0.67
Annx        0.69                     0.65
Est-5B      0.42                     0.40
Gapdh2      0.33                     0.40
Hsp82       0.67                     0.70
scute       0.63                     0.61
sesB        0.66                     0.72
sisA        0.63                     0.56
Sod         0.69                     0.73
swallow     0.58                     0.55

Average     0.57                     0.58

Page 63: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

These facts suggest that the forces affecting the use of preferred codons mainly operate across the whole genome, rather than being specific for individual genes, although the magnitude of these forces varies considerably.

Page 64: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

The evolution of codon usage bias

• In most species there is substantial variation at synonymous nucleotide sites, even in genes with high levels of codon usage bias (of the order of 1-2% diversity per site in many Drosophila species)

• This means that any selection on codon usage must be weak in relation to other evolutionary factors, such as genetic drift and mutation.

• In order to understand codon bias, we need population genetic models that take all three factors into account

Page 65: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Modelling codon usage evolution

(the Li-Bulmer model)

The simplest model that can be made is for a random-mating population with a large number of independently evolving sites

Each site has two alternatives: preferred and unpreferred codons (A versus a)

Page 66: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Evolutionary forces

• Selection for preferred over unpreferred codons

• Mutation in either direction (preferred to unpreferred, and vice-versa).

• Genetic drift (random sampling of allele frequencies). Its effectiveness is inversely related to the effective population size (Ne)
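How the three forces combine can be sketched with the standard equilibrium result of the Li-Bulmer model, p = 1 / (1 + kappa * exp(-gamma)). The slides refer to this formula without writing it out, so the form used here, and the names gamma (scaled selection, 4 Ne times the selection coefficient) and kappa (mutation rate from preferred to unpreferred divided by the rate in the reverse direction), should be read as assumptions of this example.

```python
import math

def equilibrium_preferred_fraction(gamma, kappa):
    """Equilibrium fraction of sites carrying the preferred codon under
    selection, mutation and drift (Li-Bulmer form, assumed here).
    gamma: scaled selection intensity; kappa: mutational bias towards
    unpreferred codons."""
    return 1.0 / (1.0 + kappa * math.exp(-gamma))

# Illustrative values only: a two-fold mutational bias against preferred codons.
for gamma in (0.0, 1.0, 2.0, 4.0):
    p = equilibrium_preferred_fraction(gamma, kappa=2.0)
    print(f"gamma = {gamma:.1f}  ->  p = {p:.3f}")
```

With gamma = 0 the fraction is set by mutational bias alone; modest values of gamma are enough to shift it substantially, which is why weak selection can still produce strong codon bias.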

Page 67: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• Selection is less effective at preventing deleterious mutations from becoming polymorphic than at preventing them from spreading to fixation.

• It was suggested in 1995 by Hiroshi Akashi that this result could be used to test for present-day selection on codon usage

• This requires a species in which synonymous single nucleotide polymorphisms at numerous codons exist, and in which the ancestral state of each SNP can be inferred

Page 68: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• Polymorphic mutations can then be classified as preferred to unpreferred (P → U) or unpreferred to preferred (U → P)

• In addition, we need to identify fixed differences from a related species as P → U or U → P, to check whether codon bias is in evolutionary equilibrium. These differences are assumed to have accumulated in the two focal species since the split between them

Page 69: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• If codon usage is in equilibrium, the numbers of fixations in the two directions must be equal

• Since selection has less of an effect on polymorphic mutations than fixations, we thus expect a deficiency of U → P polymorphisms, and an excess of P → U polymorphisms

• Mutational bias and mutation rates do not affect these statistics, if codon usage is in equilibrium

Page 70: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

The species of choice

We have been using three Drosophila species for this purpose:

D. miranda is used for the polymorphism study

D. pseudoobscura is a very close relative (less than 4% silent site divergence from miranda)

D. affinis is a more distant outgroup species (about 23% silent site divergence from the other two)

Codons were classified as preferred (P) versus unpreferred (U), using Akashi’s codon usage table for D. pseudoobscura.

Page 71: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Multiple sequences from one species: polarising a polymorphism

[Figure: tree of Mir, Pse and Aff. The Mir sample carries both G and A (G,A); every other state shown is G.]

Page 72: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Multiple sequences from one species: polarising a fixation

[Figure: tree of Mir, Pse and Aff. The Mir sequences carry A; every other state shown is G.]

Page 77: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Polymorphism/divergence for codon usage changes for 18 X and autosomal genes

              P → U   U → P
Fixed           19      12
Polymorphic     37       6
rpd            1.95    0.50

Ratio of rpd values = 3.9

C. Bartolomé et al. 2005 Genetics 169: 1495-1507
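The arithmetic behind the table can be reproduced in a few lines. Here rpd is taken to be the number of polymorphisms divided by the number of fixed differences in each direction, which is what the tabulated values imply; the variable names are chosen for this example.

```python
# Counts from the table above (18 X-linked and autosomal genes).
counts = {
    "P->U": {"fixed": 19, "polymorphic": 37},
    "U->P": {"fixed": 12, "polymorphic": 6},
}

def rpd(polymorphic, fixed):
    """Ratio of polymorphic to fixed differences for one class of change."""
    return polymorphic / fixed

rpd_pu = rpd(**counts["P->U"])
rpd_up = rpd(**counts["U->P"])
ratio = rpd_pu / rpd_up

print(f"rpd (P->U) = {rpd_pu:.2f}")   # 1.95
print(f"rpd (U->P) = {rpd_up:.2f}")   # 0.50
print(f"ratio      = {ratio:.1f}")    # 3.9
```

A ratio well above 1 is the pattern expected if unpreferred codons are weakly selected against: they are relatively more common among polymorphisms than among fixed differences.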

Page 78: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

For a sample of n homologous sequences from the population, the expected fraction of P → U polymorphisms among both P → U and U → P polymorphisms is:

$$\frac{u p I_0}{u p I_0 + v (1 - p) I_1}$$

where:

I_0 is the probability that a P → U polymorphism is detected in the sample;

I_1 is the probability of detecting a U → P polymorphism;

p is the proportion of P codons in the sequence;

u and v are the mutation rates for P → U and U → P changes

Page 79: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

$$I_i = \int_{1/(2N)}^{1} \left\{1 - x^n - (1 - x)^n\right\} \phi_i(x)\, dx$$

$$\phi_0(x) \propto x^{-1} (1 - x)^{-1} (1 - e^{\gamma})^{-1} \left\{1 - e^{\gamma (1 - x)}\right\}$$

Page 80: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

If the Li-Bulmer formula for equilibrium p is substituted into this equation, we get the simple relation:

expected fraction of P → U polymorphisms = $I_0 / (I_0 + I_1 e^{-\gamma})$, i.e. the proportion of P → U polymorphisms depends only on $\gamma = 4N_e t$.

This allows us to use maximum likelihood to estimate the value of γ and its approximate 95% confidence limits.
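A rough numerical sketch of this estimation procedure follows. Several things in it are assumptions and not taken from the slides: the density for U → P mutations is taken to be the P → U density with the sign of gamma reversed and the same proportionality constant; the sample size n = 12 and N = 1,000,000 are placeholders; all genes are pooled into one binomial likelihood; and a simple grid search stands in for a proper optimiser. It is not expected to reproduce the published estimate.

```python
import math

def phi(x, gamma, deleterious):
    """Unnormalised density of mutant frequency x.
    deleterious=True is the P -> U density given above; deleterious=False is
    an ASSUMED U -> P density (same expression, sign of gamma reversed)."""
    g = gamma if deleterious else -gamma
    if g == 0.0:
        return 1.0 / x                      # neutral limit of the expression
    return math.expm1(g * (1.0 - x)) / (x * (1.0 - x) * math.expm1(g))

def detection_integral(gamma, n, deleterious, N=1_000_000, steps=2000):
    """Integral from 1/(2N) to 1 of {1 - x^n - (1-x)^n} phi(x) dx (midpoint rule)."""
    lo = 1.0 / (2 * N)
    h = (1.0 - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += (1.0 - x**n - (1.0 - x)**n) * phi(x, gamma, deleterious)
    return total * h

def expected_pu_fraction(gamma, n):
    """Expected fraction of P -> U polymorphisms: I0 / (I0 + I1 exp(-gamma))."""
    i0 = detection_integral(gamma, n, deleterious=True)
    i1 = detection_integral(gamma, n, deleterious=False)
    return i0 / (i0 + i1 * math.exp(-gamma))

def ml_gamma(n_pu, n_up, n, grid):
    """Grid-search maximum likelihood estimate of gamma from pooled counts."""
    def loglik(g):
        f = expected_pu_fraction(g, n)
        return n_pu * math.log(f) + n_up * math.log(1.0 - f)
    return max(grid, key=loglik)

grid = [0.1 * k for k in range(61)]          # gamma from 0.0 to 6.0
gamma_hat = ml_gamma(n_pu=37, n_up=6, n=12, grid=grid)
print("grid ML estimate of gamma:", round(gamma_hat, 1))
```

Support limits could be read off the same log-likelihood curve as the values of gamma at which it falls 2 units below its maximum.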

Page 81: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• For all 18 genes together, the maximum likelihood estimate of γ was 2.5 (2-unit support limits 1.5-3.8).

• This value is not significantly different from those obtained after dividing the dataset into two groups of genes with low bias (Fop < 0.60, γ = 2.6) and high bias (Fop > 0.63, γ = 2.2).

• This lack of an apparent difference may reflect the limited range of Fop values; the average Fop values for the low and high bias groups were 0.50 ± 0.024 and 0.66 ± 0.009, respectively.

Page 82: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• These results suggest that Net for mutations changing codon usage in D. miranda is between 0.38 and 0.96, with an ML value of 0.62

• Silent polymorphism data suggest an Ne of about 800,000 for miranda. The selection coefficient s is thus about 8 x 10^-7

• This is much lower than previous estimates of Net by Akashi and coworkers for simulans and pseudoobscura (around 1 or more)

• It agrees well with an estimate using the same approach for americana
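The conversion from the scaled estimate to a selection coefficient is a single division, sketched below with the figures quoted above. It assumes that the selection coefficient s is the same quantity as the t in Net; the dictionary keys are names chosen for this example.

```python
# Estimates of Ne*t quoted above (lower limit, ML value, upper limit).
ne_t = {"lower": 0.38, "ml": 0.62, "upper": 0.96}

# Effective population size suggested by silent polymorphism data.
ne = 800_000

# Selection coefficient implied by each value: s = (Ne*t) / Ne.
s = {label: value / ne for label, value in ne_t.items()}

for label, value in s.items():
    print(f"{label:>5}: s = {value:.2e}")
```

The ML value gives s of about 7.8 x 10^-7, which rounds to the 8 x 10^-7 quoted on the slide.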

Page 83: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

GC to AT changes

                          GC->AT   AT->GC
Coding       Fixed          30       12
             Polymorphic    48        4
             rpd           1.60     0.33     rc = 4.80

Non-coding   Fixed          16       22
             Polymorphic    13        9
             rpd           0.81     0.41     rnc = 1.99

Page 84: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• Similar methods to those applied to P and U codons can be applied to GC content at 3rd coding positions (GC3); to explain the observed mean value of 69% with the estimated level of selection requires a mutational bias of over 3-fold in favour of GC to AT mutations

• This predicts a GC content of 23% for non-coding sequences, if these are evolving neutrally, as opposed to an observed value of around 36%

• The implication is that non-coding sequences are subject to non-neutral evolution, despite our failure to detect it.
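The link between the 69%, the mutational bias and the 23% can be checked with a short calculation. It assumes the Li-Bulmer equilibrium in the form p = 1 / (1 + kappa * exp(-gamma)), with GC playing the role of the preferred state; kappa (the bias towards GC to AT mutations) and gamma are names used for this sketch, and the value of gamma it prints is only what these two percentages imply under that assumption, not an estimate reported in the talk.

```python
import math

neutral_gc = 0.23      # predicted GC content with no selection (gamma = 0)
observed_gc3 = 0.69    # observed mean GC content at 3rd coding positions

# With gamma = 0 the equilibrium is p = 1 / (1 + kappa), so:
kappa = (1.0 - neutral_gc) / neutral_gc

# Solving p = 1 / (1 + kappa * exp(-gamma)) for gamma at the observed GC3:
gamma = math.log(kappa * observed_gc3 / (1.0 - observed_gc3))

print(f"mutational bias kappa = {kappa:.2f}")
print(f"implied scaled selection gamma = {gamma:.2f}")
```

The implied bias is a little over 3, in line with the "over 3-fold" figure in the first bullet. The observed non-coding GC content of around 36% is well above the 23% that this bias alone would give.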

Page 85: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

[Figure labels: X, Autosome, Y, Neo-X, Neo-Y]

Formation of a neo-Y chromosome

Page 86: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

• The two autosomal copies in males segregate with the sex chromosomes in the first division of meiosis, in such a way that one always accompanies the X into a sperm, and the other accompanies the Y.

• The lack of crossing over in male Drosophila means that the neo-Y chromosome is immediately placed in a genetic environment that is identical to that of the true Y chromosome.

Page 88: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

From: Bachtrog & Charlesworth (2002) Nature 416: 323-326.

Page 89: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Relaxed selection on codon usage

Fixations were assigned to the neo-X and neo-Y branches, subsequent to the neo-X/neo-Y split

          Neo-X   Neo-Y
P → U       15      47
U → P        7       4

p = 0.014

Bartolomé and Charlesworth 2006 Genetics 174:2033-2044
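The slide does not say which test produced p = 0.014. As a sketch, a two-sided Fisher's exact test on the 2 x 2 table of fixations (an assumption about the method, written here with the standard library only) gives a value close to that figure.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]:
    sums the probabilities of all tables with the same margins that are
    no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Rows: P -> U and U -> P fixations; columns: neo-X and neo-Y branches.
p_value = fisher_exact_two_sided(15, 47, 7, 4)
print(f"p = {p_value:.3f}")
```

The excess of P → U fixations on the neo-Y branch is what drives the small p value.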

Page 90: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

Polymorphisms on the neo-X versus the neo-Y

On a Mantel-Haenszel test, there is a significant excess (p < 0.001) of non-synonymous relative to silent polymorphisms on the neo-Y compared with the neo-X, indicating a relaxation of purifying selection on the neo-Y.

Page 91: Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh.

ACKNOWLEDGEMENTS

• THE HARD EXPERIMENTAL WORK: Doris Bachtrog, Carolina Bartolomé, and Soojin Yi

• HELP WITH FLY-COLLECTING: Deborah Charlesworth

• PROVISION OF LAB FACILITIES ON COLLECTING TRIP: Dan Barbash, Chuck Langley

• IDENTIFICATION OF MIRANDA STRAINS: Doris Bachtrog

• TECHNICAL ASSISTANCE: Helen Borthwick and Helen Cowan

• MONEY: BBSRC, Royal Society

• THEODOSIUS DOBZHANSKY: for discovering D. miranda 71 years ago, and for the posthumous loan of his field microscope