Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN...

8
Copyright 0 1992 by the Genetics Society of America Evolutionary Consequences of DNA Mismatch Inhibited Repair Opportunity Wolfgang Stephan* and Charles H. Langley? *Department of Zoology, University of Maryland, College Park, Maryland 20742, and ?Center for Population Biology and Department of Genetics, University of Calijornia, Davis, Calijornia 95616 Manuscript received April 1, 1992 Accepted for publication June 6, 1992 ABSTRACT Double strand breaks (DSBs) are often repaired via homologous recombination. Recombinational repair processes are expected to be influenced by nucleotide heterozygosity through mismatch detection systems. Unrepaired DSBshave severe biological consequences and are often lethal. We show that natural selection due to inhibition of recombinational repair associated with polymorphisms could influence their molecular evolution. The main conclusionsfromthisanalysis are that, for increasing population size, mismatch detection leads to a limit on average heterozygosity of otherwise selectively neutral polymorphism, an excess of rare variants, and a slowing down of the rate of neutral molecular evolution. The first two resultssuggestthatmismatchdetection may account for the surprisingly narrow range of observed average heterozygosities, given the great variation in population size between species. T HE efficient repair of double strand breaks (DSBs) in yeast is greatly reduced when homol- ogous chromosomes are diverged in sequence (RES- NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity of a DSB can be expected to reduce the stability (and thus availability) of homolo- gousheteroduplexesthatmustbethesubstrateof DSB repair in cells in the GI stage of the cell cycle. While mismatches in heteroduplexes that survive re- combinational repair may be simply corrected, there is some evidence that the “correction”process can be more complicated (HASTINGS 1984; BORTS and HA- BER 1987). Mismatch detection processes have been shown to be sensitive to single mismatches (RADMAN 1988). This fact suggests to us that the efficacy of recombinational repair could be affected by the levels of nucleotide sequence heterozygosity (0.005 to 0.02; see AGUAD~, MIYASHITA and LANGLEY 1989; AQUADRO, LADO and NOON 1988) commonly found in most outbreeding organisms. If this were the case then the levels of nucleotide sequence polymorphism might actually be determined by natural selection associated with mismatch inhibited repair opportuni- ties. To investigate this hypothesis we have formulated a specific population genetics model based onthe repair of DSBs. Persuaded by the simplicity and eleganceof the neutral theory of molecular evolution, many biologists accepted that most of molecular polymorphisms within species and divergence between species is ade- quately explained by the simple compounding of two stochastic phenomena: mutation to selectively equiv- alent variants and random genetic drift (KIMURA Genetics 132: 567-574 (October, 1992) 1968). This hypothesis is the nominal theoretical basis for such practical empirical tools as the “molecular clock” and the identification of “conserved and thus possibly functionally important” sequences. The neu- tral theory survives as a scientific hypothesis despite repeated observationsof unexpectedly large variances in the rate of nucleotide substitutions (OHTA and KIMURA 197 1 ; GILLESPIE 1989), largely because there is no competing hypothesis of comparable generality and simplicity (GILLESPIE 1987). Our purpose here is not to provide an alternative hypothesis but to argue that natural selection associated with mismatch inhib- ited repair opportunities may be an important evolu- tionary process in determiningnaturallyoccurring molecular genetic polymorphism and divergence. It may also account for a vexing old inadequacy of the neutral theory, i.e., the inability to explain the narrow range of observed average heterozygosities (LEWON- TIN 1974). RAYSSIGUIER, THALER and RADMAN (1989) have presented evidence that this same process inhib- its exchange of genetic information between different species of bacteria and suggested that it may play a direct role in speciation. These considerations are beyond the scope of this paper. MODEL Our model assumes that three processes are acting in an outbreeding, diploid population. The first two are the familiar random genetic drift due to sampling of a finite number of random gametes each generation and mutation at individual sites to selectively equiva- lent alleles. We propose a third and novel force: phenotypic selection due toreduced viability and fer- tility associated with mismatch inhibited repair oppor-

Transcript of Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN...

Page 1: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

Copyright 0 1992 by the Genetics Society of America

Evolutionary Consequences of DNA Mismatch Inhibited Repair Opportunity

Wolfgang Stephan* and Charles H. Langley? *Department of Zoology, University of Maryland, College Park, Maryland 20742, and ?Center for Population Biology and

Department of Genetics, University of Calijornia, Davis, Calijornia 95616 Manuscript received April 1, 1992

Accepted for publication June 6 , 1992

ABSTRACT Double strand breaks (DSBs) are often repaired via homologous recombination. Recombinational

repair processes are expected to be influenced by nucleotide heterozygosity through mismatch detection systems. Unrepaired DSBs have severe biological consequences and are often lethal. We show that natural selection due to inhibition of recombinational repair associated with polymorphisms could influence their molecular evolution. The main conclusions from this analysis are that, for increasing population size, mismatch detection leads to a limit on average heterozygosity of otherwise selectively neutral polymorphism, an excess of rare variants, and a slowing down of the rate of neutral molecular evolution. The first two results suggest that mismatch detection may account for the surprisingly narrow range of observed average heterozygosities, given the great variation in population size between species.

T HE efficient repair of double strand breaks ( D S B s ) in yeast is greatly reduced when homol-

ogous chromosomes are diverged in sequence (RES- NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity of a DSB can be expected to reduce the stability (and thus availability) of homolo- gous heteroduplexes that must be the substrate of DSB repair in cells in the GI stage of the cell cycle. While mismatches in heteroduplexes that survive re- combinational repair may be simply corrected, there is some evidence that the “correction” process can be more complicated (HASTINGS 1984; BORTS and HA- BER 1987). Mismatch detection processes have been shown to be sensitive to single mismatches (RADMAN 1988). This fact suggests to us that the efficacy of recombinational repair could be affected by the levels of nucleotide sequence heterozygosity (0.005 to 0.02; see AGUAD~, MIYASHITA and LANGLEY 1989; AQUADRO, LADO and NOON 1988) commonly found in most outbreeding organisms. If this were the case then the levels of nucleotide sequence polymorphism might actually be determined by natural selection associated with mismatch inhibited repair opportuni- ties. To investigate this hypothesis we have formulated a specific population genetics model based on the repair of DSBs.

Persuaded by the simplicity and elegance of the neutral theory of molecular evolution, many biologists accepted that most of molecular polymorphisms within species and divergence between species is ade- quately explained by the simple compounding of two stochastic phenomena: mutation to selectively equiv- alent variants and random genetic drift (KIMURA

Genetics 132: 567-574 (October, 1992)

1968). This hypothesis is the nominal theoretical basis for such practical empirical tools as the “molecular clock” and the identification of “conserved and thus possibly functionally important” sequences. The neu- tral theory survives as a scientific hypothesis despite repeated observations of unexpectedly large variances in the rate of nucleotide substitutions (OHTA and KIMURA 197 1 ; GILLESPIE 1989), largely because there is no competing hypothesis of comparable generality and simplicity (GILLESPIE 1987). Our purpose here is not to provide an alternative hypothesis but to argue that natural selection associated with mismatch inhib- ited repair opportunities may be an important evolu- tionary process in determining naturally occurring molecular genetic polymorphism and divergence. It may also account for a vexing old inadequacy of the neutral theory, i .e. , the inability to explain the narrow range of observed average heterozygosities (LEWON- TIN 1974). RAYSSIGUIER, THALER and RADMAN (1 989) have presented evidence that this same process inhib- its exchange of genetic information between different species of bacteria and suggested that it may play a direct role in speciation. These considerations are beyond the scope of this paper.

MODEL

Our model assumes that three processes are acting in an outbreeding, diploid population. The first two are the familiar random genetic drift due to sampling of a finite number of random gametes each generation and mutation at individual sites to selectively equiva- lent alleles. We propose a third and novel force: phenotypic selection due to reduced viability and fer- tility associated with mismatch inhibited repair oppor-

Page 2: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

568 W. Stephan and C . H. Langley

3- DSB DSB

_I 4- r l l _ I

171 .II unstable heteroduplex G= stable heteroduplex J

/ -+ w)d - 3’

no rep- repair - A

FIGURE 1 .-Model of recombinational repair of a DSB. The initial step in the repair of a DSB is the searching of the free 3’ ends of the broken strands for sequence similarity on the homologous strands. Repair can proceed, if a stretch of perfect match, A, is found adjacent to the DSB and a stable heteroduplex is formed (right panel). If there is a mismatch within the heteroduplex, mismatch detection enzymes may destabilize the structure and prevent further steps in the repair of the DSB (left panel). +/- indicates a heterozygous nucleotide site.

tunities. Figure 1 depicts the basic mechanism leading to such selection associated with DSBs. In this simplest hypothesis a DSB “spontaneously” occurs in a cell at G I . While a DSB may be a very rare event, there are an enormous number of critical cells at risk in multi- cellular organisms. DSBs are likely to lead to chro- mosome loss, if not repaired correctly (RESNICK, SKAANILD and NILSSON-TILLGREN 1989). According to the models of RESNICK (1976) and SZOSTAK et al. (1983), the initial step in the repair of a DSB is the searching of the genome for homologous regions. This process is generally considered to involve the formation of heteroduplexes between the single- stranded ends at the break and a homologous region (Figure 1). When there are too many mismatches (perhaps even one) within a heteroduplex region, mismatch detection proteins could promote the dis- sociation of the strands, thus preventing any repair as in the case of homologous regions of Escherichia coli and Salmonella typhimurium (RAYSSICUIER, THALER and RADMAN 1989). Thus, we assume that adjacent to a site of a DSB there must be a length of sequence, A, that forms a stable heteroduplex in the homologous region in order for the recombinational repair process to proceed. If there are nucleotide sequence heterozy- gosities in the heteroduplex region, there must be an associated reduction in fitness due to mismatch inhib- ited repair opportunities. This reduction can be inter- preted as (a) the cumulative somatic and germ line effects of possible chromosome loss (if repair fails), (b) mutagenesis (if misrepair follows) or (c) even less efficient development, if the cell cycle is lengthened by repair (HARTWELL and SMITH 1985).

Analytical approximations: We consider a random mating diploid population of size N and assume that the pattern of DNA polymorphism is determined by (two-way) mutation, i .e., base changes occurring at a uniform rate per generation to developmentally and

ecologically equivalent states, random genetic drift, and natural selection associated with mismatch inhib- ited repair opportunities. T o describe the effects of natural selection, we make the following assumption. Consider an individual heterozygous at a particular nucleotide position. This individual is assumed to suffer a reduction in fitness (relative to homozygotes) due to mismatch inhibited repair opportunities asso- ciated with this heterozygosity. In general, the reduc- tion in fitness is a function of the density and fre- quency distribution of heterozygosities in the neigh- borhood of the site under consideration. We assume that the influence of these heterozygosities leads to an average reduction, s, in fitness due to a DSB which can be approximated by

2A-1

s = 2 SjH(1 - m-1. (1) i= I

Here, si describes the reduction in fitness due to a DSB adjacent to the heterozygous site under consid- eration, if the next heterozygous site is i nucleotides away (in either of the two directions). An explicit model of DSB repair for calculating si is given below. H is the average heterozygosity per nucleotide in the region around this heterozygous site, and H( 1 - H)”’ is the probability that the next heterozygous site is i nucleotides away from the reference site. The factor 2 takes into account that heterozygous sites on both sides of the reference site have to be considered. The basic assumptions underlying (1) are (i) that there is a sufficient level of meiotic recombination to assure that the heterozygosities at each site are identically distrib- uted and independent stochastic random variables, and (ii) that the distribution of heterozygosities can be described by the mean. The validity of these assump- tions is examined by Monte Carlo simulations (see below). It follows that, given H , s is constant and that the frequency distribution of a variant for each site

Page 3: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

Evolutionary Effects of DNA Repair 569

can be described by WRIGHT’S (1 937) formula:

Ax) = CxS-’(l - x)e”exp[-2ax(l - x)], (2)

where 8 = 4Np, a = 2Ns (a > 0), and C is the normalization constant.

s can be expressed in terms of genetic parameters using an explicit DSB-repair model. The basic as- sumption of this model is that the repair of a DSB requires a minimum amount of sequence identity, a stretch A of perfect match adjacent to the DSB (Figure 1). Consider a heterozygous nucleotide site. Let the reduction in fitness due to a DSB adjacent to this site be si, if the next heterozygous site is i nucleotides away, and let /3 be the rate of DSBs (per nucleotide) times the sum of fitness effects of each unrepaired break through the life of an individual. Then, si can be expressed by pi times the fraction of unrepairable DSBs. If i d A, no DSB can be repaired, because stable heteroduplexes cannot be formed (Figure 1). For A < i < 211 only the fraction 2(i - A)/ i can be repaired, while for i 3 2 8 repair is always possible. Hence,

si = { ;kA - i), A < i < 2 A (3) 1 s i c A

i 3 211.

Using (3), Equation 1 can be simplified to

2/3[(1 - H)” - 112 H

s = (4)

Furthermore, combining Equations 2 and 4 leads to an implicit equation for average heterozygosity

1

H = 2[x(1 - x)]f(x,H)dx. ( 5 )

In general, this equation can only be solved numer- ically. However, asymptotic approximations are pos- sible for small and large population sizes. If population size is small, such that a < 1 and 6 << 1, then the exponential function in (2) can be expanded into a Taylor series (up to the first-order term) and the integrals in Equation 5 can be expressed as beta func- tions B(. , . ) . Thus, the integral of xe(l - x)@ exp[-2ax (1 - x)] over the unit interval [0 , 11 is approximately B(6 + 1,6 + 1) - 2aB(6 + 2,6 + 2), and the normalizing constant can be approximated by C” = B(8,O) - 2&(6 + 1, 8 + 1). Because of 8 << 1, a straightforward calculation then leads to the small population size approximation

H z O(l - a/3), a d 1. ( 6 4

This equation shows an interesting consequence of the proposed mechanism. Although random genetic drift dominates for a G 1, average heterozygosity can be significantly different from its neutral equilibrium value 8. We obtain more insight into this effect by expressing a in terms of the model parameters N, p,

0.01 2.

0.01

g 0.008. c)

v) .- h

0.006. 3 2

2 0.0°4

al

0.002.

0. I 0

I I I I

500 1000 1500 2000

Population Size (2N) FIGURE 2.-Average heterozygosity, H , as a function of the

(haploid) population size, 2N. The straight solid line represents the prediction of the neutral theory, the smooth lower line is the result of the analytical approximation. The latter was obtained by solving Equation 5 by means of Newton’s procedure. In addition, the simulation results are shown (with standard errors). The fixed parameters are: fi = 0.01, p = 0.00001 17, A = 10, c = 0.1 and chromosome length = 64 nucleotides. Under selective neutrality heterozygosity per nucleotide site increases proportionally to pop- ulation size. When selection associated with mismatch inhibited repair opportunities is operating, H is an increasing function of N but levels off to become independent of population size.

p, and A, using Equation 4 in a simplified form. The right-hand side of (4) is approximately linear in H , if H A << 1. This assumption is probably realistic for many species. For instance, for Drosophila melano- gaster average nucleotide heterozygosity is around 0.005 (AGUAD~, MIYASHITA and LANGLEY 1989), and values of A are less than 100 bp for E. coli (SHEN and HUANG 1986) and S. cerevisiae (SUGAWARA and HABER 1992) and may be as low as between 10 and 20 bp (AYARES et al. 1986). With H A << 1, Equation 4 can be approximated by s = 2pA2H. Combining this for- mula with (6a) leads to

4NpA28 1 + 4/3NpA28‘

a =

The parameters for mutation and DSB repair proc- esses appear in this expression multiplicatively, and population size enters in a quadratic fashion. This indicates that population size has a strong effect on the reduction of heterozygosity below its neutral value, as N increases. As a consequence, H changes abruptly from the neutral domain, where H scales linear with N, to a domain in which H is no longer sensitive to an increase in population size (see below and Figure 2).

For large population sizes an approximation is also possible. This limit is independent of N. Perhaps the most elegant solution is based on Laplace’s method for asymptotic expansions of integrals (BLEISTEIN and

Page 4: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

570 W. Stephan and C . H. Langley

HANDELSMAN 1975, Chap. 5.1). Write the normalized constant as

c-1 = i”? 2[x(1 - x)”exp[4N@(x)] dx

and H as

H = C il’? 4 exp[4N+(x)] dx,

where @(x) = p In[x( 1 - x)] - sx( 1 - x). Then, noting that the maximum of @ in [0, 1/21 is approximately at x = I / / S and using formula (5.1.21) of BLEISTEIN and HANDELSMAN (1 975), we find asymptotically

H -. 2P S

(7a) is identical with the corresponding result which can be obtained from deterministic selection theory (CROW and KIMURA 1970, Chap. 6.2). Combining of (4) and (7a) leads to the following asymptotic formulas for average heterozygosity

or, because of A >> 1 ,

r H x A” v:.

Simulations: T o evaluate the adequacy of our ap- proximations a computer program was written. The basic structure of this program was as follows. The population each generation was represented as a 2N array (chromosomes) of 64 sites. Populations of larger arrays were investigated and found to give similar results (data not shown). Each site could assume any of the four states (0, 1, 2, 3 analogous to A, T, C, G). The creation of the next generation involved the sampling of 2N gametic chromosomes. The parental 2N chromosomes are randomly ordered into N pairs (zygotes). A zygote was sampled (with replacement). Its fitness was determined according to the rule in Equation 3, by summing over heterozygous sites and assuming the chromosome was circular. The chosen zygote’s fitness was compared to a pseudorandom number drawn from a uniform (0, 1) distribution. If its fitness was less a new zygote was sampled. If it was equal to or greater than the random number one gamete was sampled from this zygote after recombi- nation. A gamete was sampled from the zygote at random. This was done with one randomly distributed crossover if a (0, 1) uniform pseudorandom number was less than the recombination parameter, c. After the 2N new gametes were assembled a pseudo-Poisson random number of mutations (expected = 4Np times

number of sites) were distributed randomly among all the sites of all the 2N chromosomes. The mutation model was simple and independent of state, i e . , a mutated site became 0, 1, 2 or 3 with equal probability (=0.25). Thus there was an “effective” mutation only three quarters of the time. Through preliminary analysis (data not shown) it was determined that the number of segregating sites and the average number of site differences between chromosomes ( H ) are at stationarity. Thus the populations were sampled after 4N generations every 2N generations. Unless other- wise indicated results such as average number of seg- regating sites, heterozygosity ( H ) , and rate of diver- gence were estimated in this way. The numbers of generations the simulations were run for each set of parameters depended on the variability which is re- flect in the standard deviation (see Table 1).

NUMERICAL AND SIMULATION RESULTS

Average nucleotide heterozygosity: Table 1 shows simulation and numerical results for average hetero- zygosity as a function of different mutation rates, selection intensities, and recombination rates (fixed parameters: A = 10 and 2N = 1000). The most important aspect of these results is that the level of heterozygosity does only weakly depend on recombi- nation, if at all. This justifies assumption (i) leading to Equation 1. Furthermore, a comparison of numerical and simulation results shows that our analytical ap- proximations are reasonably good as long as selection is relatively weak (p c 0.001). For /3 = 0.001, approx- imations and simulations are in good agreement, whereas the neutral predictions are far off. However, for strong selection discrepancies become apparent. We have investigated these discrepancies further for varying population sizes. For relatively weak selection (p = 0.001), appreciable differences were not ob- served, even for 2N = 2000 (results not shown). However, for = 0.01 we found large differences between the numerical and simulation results, when N becomes large (Figure 2). Another interesting find- ing is that, in contrast to average heterozygosity, the mean number of segregating sites increases linearly with population size, at least in the parameter range examined (Figure 3; same parameter values as in Figure 2). These results indicate that the distribution of heterozygosities is skewed toward rare polymor- phisms. Thus, the DSB model predicts an excess of rare variants. Due to this skewness, an approximation of the distribution by the mean is not appropriate for strong selection.

The important aspects of the proposed model show up in both approximation and simulation results. For small population sizes average heterozygosity ap- proaches the neutral limit H = 0, ie., increases pro- portionally to population size. For larger N, average

Page 5: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

Evolutionary Effects of DNA Repair

TABLE 1

Average site heterozygosity as a function of mutation, selection and recombination

57 1

~ ~~~ ~~ ~ ~~

Recombination rate Selective

Mutation rate value (@) c = o c = 0.1

Low ( p = 0.0000 1 17) 0 (neutral) 0.0236 (0.0020) 0.0231 (0.0017)

0.00 1 0.0128 (0.0010) 0.01 39 (0.0009)

0.01 0.0071 (0.0006) 0.0074 (0.0006)

0.0224 0.0224

0.0122 0.0122

0.0038 0.0038

High ( p = 0.000075) 0 (neutral) 0.1377 (0.0078) 0.1257 (0.0032)

0.001 0.0382 (0.0015) 0.0424 (0.0015)

0.01 0.0138 (0.0005) 0.0126 (0.0005)

0.1250 0. I250

0.0360 0.0360

0.0094 0.0094

This table shows theoretical and simulation results for expected nucleotide heterozygosity. The standard deviations of the simulated values are given in parentheses. Standard deviation is calculated as the square root of the variance of the mean number of differences between chromosomes sampled every 2N generations. The theoretical values (in italics) were obtained by solving Equation 5. The neutral values were derived using a one-parameter Jukes-Cantor model. The fixed parameters are: 2N = 1000, A = 10, and chromosome length = 64 nucleotides.

FIGURE 3.-Mean number of segregating sites in the entire pop- ulation as a function of population size, 2N. The number of segre- gating sites was counted every 2N generations. The parameter values are the same as in Figure 2: @ = 0.01, p = 0.00001 17, A = 10, c = 0.1 and chromosome length = 64 nucleotides.

heterozygosity levels off to an asymptotic value which is independent of population size. The deterministic value of average heterozygosity is given by Equation 7b. Furthermore, our model predicts a relatively ab- rupt change of average heterozygosity from the neu- tral domain, where H is linear in N , to a domain in which H is no longer sensitive to an increase in pop- ulation size (see Equation 6b and Figure 2). Equation 6a and Figure 2 show that average heterozygosity is approximately one third below its neutral value, when a! is around 1, ie., when drift dominates the evolu- tionary process. Figure 2 also indicates that H becomes insensitive to population size between 2N = 300 and 500, which corresponds to a! values between 2.7 and 6.0 (these a! values were calculated using Equation 4 and the simulation results for H ) . This insensitivity to population size may, at least in part, explain why heterozygosities of most species fall within a very narrow range. It is also interesting to note that the

0.005

0 1 4 6 8 10 12 14 16

Match Length (A) FIGURE 4.-Average heterozygosity, H , as a function of the

match length, A. The smooth line shows the theoretical results obtained by solving Equation 5 numerically. The simulation results (with standard errors) are the boxes. The parameter values are: 2N = 500, B = 0.01, p = 0.0001 17, c = 0.1 and chromosome length = 64 nucleotides.

match length A has a stronger impact on the upper bound of average heterozygosity than the mutation rate P (see Equation 7c). The l/A-behavior of average heterozygosity was confirmed by the simulations (Fig- ure 4).

Rate of molecular evolution: The effect of selec- tion associated with mismatch inhibited repair oppor- tunities on the rate of molecular evolution, K, is shown in Figure 5 . For small population sizes, the deviation from the neutral limit k = p is not great. The rate of molecular evolution drops off with increasing popu- lation size. The simulations indicate that for large population sizes K approaches a level (>O) which is independent of N . The asymptotic value of K appears to be +/(A + 1). That means that only polymor-

Page 6: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

572 W. Stephan and C. H. Langley

B 6 1.0.. - 6N Generations

- 2N Generations - A 4N Generations .*

0 M

2 8.0" -0- ?'heoretical

b

L O "

.- - 0 6.0.- v)

2 2 2.0" d c2 0.0- - .- .- -

0.0 500 1000 1500 2000

Population Size (2N)

FIGURE 5.-Rate of molecular evolution as a function o f sample size, 2N. The number of substitutions was calculated every 2 N , 4 N , and 6N generations, respectively. The theoretical curve was ob- tained using Equations 4 and 8. The fixed parameters are chosen as in Figure 2: 6 = 0.01, = 0.0000117, A = 10, c = 0.1 and chromosome length = 64 nucleotides.

phisms that are more than the match length A apart from each other can drift to fixation. Thus, the fixa- tion process looks locally like a single-file diffusion; i e . in a chromosomal region of length A + 1, at any one time at most only one mutation can go to fixation. I t is interesting to compare the simulation results to the theory developed above. This can be done using the formula of LANDE (1979) for underdominant se- lection and Equation 4. Hence,

I

where erf is the error function. Figure 5 shows that this approach is only useful for small population sizes. Similar to average heterozygosity, the approximation results are systematically lower than the simulation results. For large populations, k drops down to 0 almost exponentially, since the error function in (8) approaches 1.

DISCUSSION

The ubiquity of mismatch inhibited repair oppor- tunities and the severity of the cost of failure to repair correctly DSBs and other damages to the DNA in both somatic and germline cells suggests to us that mutation and recombination may not be the only significant genetic mechanisms shaping standing mo- lecular population genetic variation and molecular evolution. The potential impact of mismatch detection may extend to control the accumulation and distri- bution of middle-repetitive, dispersed DNA sequences or transposable elements (RAYSSIGUIER, THALER and RADMAN 1989). If heterozygosities inhibit heterodu- plex formation between homologous unique sequence DNA, then exonucleases may degrade the ends of the DSB (Figure 6), until they reach a repetitive sequence

repetitive

DNA 3 I no repair

+

I Exonuclease 1 Degradation

I ectopic misrepair

nonhomologous pairing

"

FIGURE 6.-Interaction of mismatch detection and repetitive, dispersed DNA sequences in recombinational repair of DSBs. As in Figure 1, left panel, the initial heteroduplex is unstable, because mismatch detection enzymes responding to heterozygosity promote its dissociation. 3' Exonuclease may degrade the end, until a repet- itive sequence (hatched block) is exposed. This sequence may readily form a heteroduplex with one of the many other copies of the fimily located throughout the genome in rlorlllomologous sites. An (ectopic) exchange arising from this heteroduplex is likely to cause a severely deleterious chromosomal rearrangement. Only a single strand of each homolog is shown.

(RAY, MACHIN and STAHL 1989). These repetitive ends may more readily form a (nonhomologous) het- eroduplex with one of the many other members of the middle-repetitive DNA family and thus give rise to ectopic exchanges. The chromosomal re- arrangements produced by these asymmetric ex- changes are expected to be highly deleterious. This mechanism has been proposed as force that contains replicative transposable elements in outbreeding p o p ulations (LANGLEY et al. 1988). The natural selection attributable to the interaction between mismatch in- hibited repair opportunities associated with nucleotide sequence heterozygosity and increased levels of ec- topic exchange among transposable elements (or other middle repetitive disperse DNAs) might give rise to a negative correlation (among species) between the level of nucleotide heterozygosity (and thus also mismatch inhibited repair opportunities) and the den- sity of repetitive, dispersed DNA sequences. This hypothesis may explain the curious differences in mo-

Page 7: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

Evolutionary Effects of DNA Repair 573

lecular genetic variation between the two sibling spe- cies D. melanogaster (AGUAD~, MIYASHITA and LANG- LEY 1989) and Drosophila simulans (AQUADRO, LADO and NOON 1988). Nucleotide sequence heterozygosity (per bp) is higher in D. simulans (0.019 us. 0.005), while the density of middle-repetitive, dispersed DNAs (per kb) is greater in D. melanogaster (0.005 vs. 0.00 1) (CHARLESWORTH and LANGLEY 1989).

While there is considerable evidence that mismatch detection is a ubiquitous process that may inhibit DSB repair (RADMAN 1988), it is difficult to judge the fitness impact of individual heterozygosities. Variation in the rate of occurrence of DSBs and the detectability of different mismatches can be expected to make the impact of polymorphisms quite different. The analysis presented above indicates that @ must be substantially greater than the mutation rate. Little is known about the frequency of DNA damage requiring recombina- tional repair. But the recent results of N. A. NASSIF and W. R. ENGELS (personal communication) show that the rate of reversion of a P element-associated white mutant in heterozygous females is quantitatively repressed by increasing numbers of heterozygosities in the region flanking the site of insertion of the P element. The reversion process is thought to involve the excision of the P element leaving a DSB that is repaired by recombination with sister strand or the homologues (ENGELS et al. 1990; GLOOR et al. 199 1). These observations are certainly consistent with the assumption underlying the hypothesis advanced here. It remains to be demonstrated that any inhibition of reversion (perhaps repair) is associated with cell death or other detrimental effects. It should be noted that @ is assumed to sum these detrimental effects up over all cells and developmental stages of the zygote, so that the cumulative effect might be considerably larger than that of a mutation in a single germ cell lineage.

Our analysis indicates that the natural selection associated with mismatch inhibited repair opportuni- ties may dominate molecular evolution as population size increases. But two of our main quantitative pre- dictions (that selectively neutral molecular evolution will slow down and that most variants will be rare as N increases) are not obvious in the available data (KIMURA 1983; GILLESPIE 1991). This leads to the unavoidable dichotomy that either nucleotide heter- ozygosity has no influence on fitness through the selective consequences of mismatch inhibited repair opportunities in natural populations or that an ade- quate model should allow for a significant amount of positive natural selection acting on other phenotypic effects of many molecular variants segregating in nat- ural populations and substituted over geological time. How much natural selection would be required to bring this theory in line with observations and what

impact would relaxation of our various assumptions have are questions to be addressed. It is, however, clear that novel and potentially significant interactions between DNA repair processes and DNA sequence polymorphisms may represent a significant force in molecular evolution.

We thank C. F. AQUADRO, J. F. CROW, N. L. KAPLAN, M. RADMAN, M. A. RESNICK, M. SLATKIN, F. W. STAHL and J. B. WALSH for valuable comments on the manuscript, and N. A. NASSIF and W. R. ENGELS for permission to cite their unpublished results. This research was supported in part by a grant from the National Institutes of Health to W.S.

LITERATURE CITED

AGUAD~, M., N. MIYASHITA and C. H. LANGLEY, 1989 Reduced variation in the yellow-achaete-scute region in natural popula- tions of Drosophila melanogaster. Genetics 122: 607-615.

AQUADRO, C. F., K. M. LAW and W. A. NOON, 1988 The rosy region of Drosophila melanogaster and Drosophila simulans. 1. Contrasting levels of naturally occurring DNA restriction map variation and divergence. Genetics 1 1 9 875-888.

AYARES, D., L. CHEKURI, K.-Y. SONG and R. KUCHERLAPATI, 1986 Sequence homology requirements for intermolecular recombination in mammalian cells. Proc. Natl. Acad. Sci. USA

BLEISTEIN, N., and R. A. HANDELSMAN, 1975 Asymptotic Expan- sions of Integrals. Holt, Rinehart & Winston, New York.

BORTS, R. H., and J. E. HABER, 1987 Meiotic recombination in yeast: alteration by multiple heterozygosities. Science 237:

CHARLESWORTH, B., and C. H. LANGLEY, 1989 The population genetics of Drosophila transposable elements. Annu. Rev. Ge- net. 23: 251-287.

CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Burgess, Minneapolis.

ENGELS, W. R., D. M. JOHNSON-SCHLITZ, W. B. EGGLESTON and J. SVED, 1990 High-frequency P element loss in Drosophila is homolog dependent. Cell 62: 515-525.

GILLESPIE, J. H., 1987 Molecular evolution and the neutral allele theory. Oxf. Surv. Evol. Biol. 4: 10-37.

GILLESPIE, J. H., 1989 Lineage effects and the index of dispersion of molecular evolution. Mol. Biol. Evol. 6: 636-647.

GILLESPIE, J. H., 1991 The Causes of Molecular Evolution. Oxford University Press, Oxford.

GLOOR, G. B., N. A. NASSIF, D. M. JOHNSON-SCHLITZ, C. R. PRES- TON and W. R. ENGELS, 1991 Targeted gene replacement in Drosophila via P element-induced gap repair. Science 253:

HARTWELL, L. H., and D. SMITH, 1985 Altered fidelity of mitotic chromosome transmission in cell cycle mutants of S. cereuisiae. Genetics 1 1 0 381-395.

HASTINGS, P. J., 1984 Measurement of restoration and conver- sion: its meaning for the mismatch repair hypothesis of conver- sion. Cold Spring Harbor Symp. Quant. Biol. 4 9 49-53.

KIMURA, M., 1968 Evolutionary rate at the molecular level. Na- ture 217: 624-626.

KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, London.

LANDE, R., 1979 Effective deme sizes during long-term evolution estimated from rates of chromosomal rearrangements. Evolu- tion 33: 234-251.

LANGLEY, C. H., E. A. MONTGOMERY, R. R. HUDSON, N. L. KAPLAN and B. CHARLESWORTH, 1989 On the role of unequal ex- change in the containment of transposable element copy num- ber. Genet. Res. 52: 223-235.

8 6 5199-5203.

1459-1463.

1110-1117.

Page 8: Evolutionary Consequences of DNA Mismatch Inhibited Repair ...NICK, SKAANILD and NILSSON-TILLGREN 1989). Mis- matches arising from nucleotide sequence heterozy- gosity in the vicinity

574 W. Stephan and C. H. Langley

LEWONTIN, R. C., 1974 The Genetic Basis of Evolutionary Change. Columbia University Press, New York.

OHTA, T., and M. KIMURA, 1971 On the constancy of the evolu- tionary rate of cistrons. J. Mol. Evol. 1: 18-25.

RADMAN, M., 1988 Mismatch repair and genetic recombination, pp. 169-192 in Genetic Recombination, edited by R. KUCHER- LAPATI and G. R. SMITH. American Society for Microbiology, Washington, D.C.

RAY, A., N. MACHIN and F. W. STAHL, 1989 A double chain break stimulates triparental recombination in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 8 6 6225-6229.

RAYSSIGUIER, C., D. S. THALER and M. RADMAN, 1989 The barrier to recombination between Escherichia coli and Salmo- nella typhimurium is disrupted in mismatch-repair mutants. Nature 342: 396-40 1 .

RESNICK, M. A., 1976 The repair of double strand breaks in DNA: a model involving recombination. J. Theor. Biol. 5 9 97-106.

RESNICK, M. A., M. SKAANILD and T. NILSSON-TILLGREN, 1989 Lack of DNA homology in a pair of divergent chromosomes greatly sensitizes them to loss by DNA damage. Proc. Natl. Acad. Sci. USA 86: 2276-2280.

SHEN, P., and H. V. HUANG, 1986 Homologous recombination in Escherichia coli: dependence on substrate length and homology. Genetics 112: 441-457.

SUGAWARA, N., and J. E. HABER, 1992 Characterization of dou- ble-strand break-induced recombination: Homology require- ments and single-stranded DNA formation. Mol. Cell. Biol. 12:

SZOSTAK, J. W., T. L. ORR-WEAVER, R. J. ROTHSTEIN and F. W. STAHL, 1983 The double-strand break repair model for re- combination. Cell 33: 25-35.

WRIGHT, S., 1937 The distribution of gene frequencies in popu- lations. Proc. Natl. Acad. Sci. USA 23: 307-320.

563-575.

Communicating editor: M. SLATKIN