
NP Completeness of Kauffman’s N-k Model, a Tuneably Rugged Fitness Landscape

Edward D. Weinberger

SFI WORKING PAPER: 1996-02-003

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

www.santafe.edu

SANTA FE INSTITUTE

NP Completeness of Kauffman's N-k Model, a Tuneably Rugged Fitness Landscape

Edward D. Weinberger

Max-Planck-Institut für biophysikalische Chemie

Am Fassberg

Göttingen-Nikolausberg

Federal Republic of Germany

The concept of a "fitness landscape," a picturesque term for a mapping of the vertices of a finite graph to the real numbers, has arisen in several fields, including evolutionary theory. The computational complexity of two qualitatively similar versions of a particularly simple fitness landscape is shown to differ considerably. In one version, the question "Is the global optimum greater than a given value V?" is shown to be answerable in polynomial time by presenting an efficient algorithm that actually computes the optimum. The corresponding problem for the other version of the landscape is shown to be NP complete. The NP completeness of the latter problem leads to some speculations on why $P \neq NP$.

Introduction

The notion of an adaptive "landscape" representing the abstract "fitness" of various kinds of organisms in various contexts has been a fixture of evolutionary biology ever since it was proposed by Sewall Wright (1932). Although there are problems with a notion of "fitness" that is a property of an individual, independent of other individuals and the environment, the discoveries of molecular biology have significantly reinforced the power of this idea. We now understand, for example, the role of a discrete genomic "blueprint" in specifying the chemical constituents of enzymes, and we have a glimmering of how sensitive the fitness of the organism can be to variations in enzyme chemistry, so that it makes sense to identify the specific sequence of nucleotide bases in the genome as the argument of the fitness function. It has also become increasingly clear that the "design" of organisms involves a host of complex trade-offs, implying that there must inevitably be large numbers of local optima in such fitness landscapes. It was thus only a matter of time before the analogy between optimization (i.e. selection) on landscapes and combinatorial optimization problems appeared in the biological literature. One paper that discussed this analogy (Kauffman & Weinberger, 1989) also proposed a simple statistical model of a fitness landscape, the N-k model, that could be used as an aid to the qualitative understanding of more complex, if more realistic, models. The purpose of the present paper is to consider the computational complexity of optimization on landscapes generated with this simple model. We show, in particular, that the optimization problem for one version of the model can be solved in polynomial time, but that another, qualitatively very similar version of the model is NP complete. This result makes the N-k model an ideal candidate for investigating the nature of NP completeness.

One might think that optimization on N-k landscapes is NP complete because a knowledge of the fitness of a given configuration says little about the fitness of distant configurations, thus leading to the need for an exhaustive search over a large number of configurations. Part of this conjecture turns out to be true: we can identify a set of pairwise distant configurations whose number grows faster than any polynomial in N. However, when we compute the correlation in fitness between pairs of configurations in this set, the correlation is actually larger for the NP complete version of the model than for the polynomial time version, thus providing strong evidence that the rest of the conjecture must be false. We conclude the paper by noting differences between the two models in "tree width." Dress (1987) has shown that the global optimum in models where this quantity is $O(1)$ can be found in polynomial time. Our result, that the tree width of an NP complete problem is $O(N)$, suggests that the notions of NP completeness and tree width are more generally related.

Definition of the N-k Model

In nature, the argument of the fitness function ranges over the set of all possible sequences of the four nucleotide bases. Nature "evaluates" this function by translating these nucleic acid sequences into sequences of amino acids. It is only amino acid sequences (the enzymes and structural proteins of an organism) that have biological significance. Therefore, the simplest version of the N-k model ignores genetics and assigns a fitness to the sequences of amino acids directly. The N-k model makes two further simplifications: the first, that the amino acid sequence has a fixed length of N sites, and the second, that there are only two possible amino acids, rather than the full complement of twenty, that can occupy a given site of the sequence. This last assumption is justified by the fact that almost all of the properties of such sequences are determined by their three dimensional, folded structures, which, in turn, are determined by the chemical properties of the constituent amino acids. The most important of these properties is polarity: those amino acids in the sequence that are polar get pulled to the outside of the folded structure by chemical attraction to surrounding water molecules, and non-polar amino acids get pushed into the interior of the structure.

The assumption of two amino acids per site also dramatically simplifies the modelling task by reducing the argument of the fitness function to a bit string, which we denote $b$. The N-k model assigns a real valued "fitness" to $b$ by first assigning a real valued "fitness contribution," $f_i$, to the ith bit, $b_i$, in $b$. Each such assignment depends, not just on $i$ and the value of $b_i$, but also on $0 \le k < N$ other bits, which we call its "neighbors." The fitness contribution of each site is a random function, $f_i(s_i)$, of the substring, $s_i$, formed by the ith bit and its k neighbors. $f_i(s_i)$ is assigned by selecting an independent random variable from some distribution $p(x)$, such as the uniform or Gaussian distributions, for each of the $2^{k+1}$ possible values of $s_i$, thus generating a "fitness table" for the ith site. There is a different, independently generated table for each of the N sites. Then, given any string of N bits, the total fitness of the string, $F$, is defined as the average of the fitness contributions of each site, that is,

$$F(b) = \frac{1}{N} \sum_{i=1}^{N} f_i(s_i).$$

The use of a probability distribution in assigning the fitness contributions can be interpreted either as an admission of ignorance of the true nature of the complex couplings between the bits or as an attempt to capture the typical statistical properties of a wide class of landscapes with k interconnections per bit.
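The definition above can be made concrete with a short sketch. It is an illustration only: the names `make_fitness_tables` and `fitness`, the uniform distribution, the fixed seed, the 0-based site numbering, and the particular neighbor lists are all assumptions of this example, not part of the original formulation.

```python
import random
from itertools import product

def make_fitness_tables(n_sites, k, rng):
    """One table per site: each (k+1)-bit substring gets an independent random value."""
    return [
        {s: rng.random() for s in product((0, 1), repeat=k + 1)}
        for _ in range(n_sites)
    ]

def fitness(bits, neighbors, tables):
    """F(b) = (1/N) * sum_i f_i(s_i), with s_i = (b_i, then the k neighbors of site i)."""
    total = 0.0
    for i, table in enumerate(tables):
        s_i = (bits[i],) + tuple(bits[j] for j in neighbors[i])
        total += table[s_i]
    return total / len(bits)

rng = random.Random(0)   # fixed seed, only so that the sketch is reproducible
N, k = 8, 2
# Illustrative neighbor lists: the two sites next to site i on a ring (0-based).
neighbors = [[(i - 1) % N, (i + 1) % N] for i in range(N)]
tables = make_fitness_tables(N, k, rng)
b = [rng.randint(0, 1) for _ in range(N)]
F = fitness(b, neighbors, tables)
```

Evaluating a given string costs one table lookup and one addition per site, which is the $O(N)$ verification step used later to place the decision problem in NP.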

One other aspect of the N-k model must be specified, namely, the way in which the substrings, $s_i$, are chosen. The simplest, but not the only, way of choosing neighbors, at least for even k, is to use the k sites adjacent to site i, that is, the bits at sites $i - k/2$ thru $i + k/2$. As in the original formulation of the model, we introduce periodic boundary conditions to assign neighbors to sites i with $i \le k/2$ and $i > N - k/2$. In other words, we assume that the sites are arranged in a circle, such that site N is next to site 1. Under this assumption, if $k = 2$, site N has neighbors $N - 1$ and 1, site 1 has neighbors N and 2, and, more generally, site i has neighbors $(N + i - k/2) \bmod N, \ldots, (N + i + k/2) \bmod N$ for any even integer k, where a residue of 0 is read as site N. This assignment of the neighbors gives rise to a class of short range spin glasses. Alternatively, we could assign the neighbors by randomly selecting, for each site i, k other sites on the string to be used in forming the index to the ith fitness table, a definition that also makes sense for odd k. This assignment of neighbors makes the model similar to a long range, dilute spin glass. Rather surprisingly, local features of the landscape (the height of local optima, and the length of typical "up hill" walks through a series of fitter one mutant variants to these local optima) were remarkably insensitive to the details of how the $k+1$ bit substrings were chosen in computer simulations (Kauffman, Weinberger, & Perelson, 1988; Kauffman & Weinberger, 1989).
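The two ways of choosing neighbors described above might be sketched as follows. This is a minimal illustration with 0-based site indices; the function names are hypothetical, and drawing the random neighbors without repetition is an assumption of the sketch (the text only says "k other sites").

```python
import random

def adjacent_neighbors(n_sites, k):
    """k/2 sites on either side of site i, wrapping around the ring (even k only)."""
    if k % 2:
        raise ValueError("adjacent neighborhoods are defined here for even k only")
    half = k // 2
    return [
        [(i + offset) % n_sites for offset in range(-half, half + 1) if offset != 0]
        for i in range(n_sites)
    ]

def random_neighbors(n_sites, k, rng):
    """k distinct sites other than i, drawn uniformly at random for each site i."""
    return [
        rng.sample([j for j in range(n_sites) if j != i], k)
        for i in range(n_sites)
    ]

ring = adjacent_neighbors(6, 2)                      # [[5, 1], [0, 2], ..., [4, 0]]
scattered = random_neighbors(6, 3, random.Random(0)) # also works for odd k
```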

The N-k model affords a "tuneably rugged" fitness landscape, since tuning k alters the ruggedness of the landscape. For $k = 0$, each site is independent of all other sites. Either the bit value 0 or the bit value 1 is almost surely "fitter" than the other; hence, a single specific sequence comprised of the fitter bit value in each position is almost surely the single, global optimum in the fitness landscape. Any other string is sub-optimal, and lies on a connected walk via 1-mutant fitter variants to the global optimum by flipping bits from less fit to more fit values. The length of the walk is just the Hamming distance from the initial string to the global optimum. For a randomly chosen initial string, half of the bits will be in their less fit state; hence the expected walk length is just $N/2$. A transition to a one mutant neighbor (i.e. the flip of a single bit) typically alters fitness by an amount $O(1/N)$. In contrast, the fully connected N-k model yields a completely random fitness landscape. For this $k = N - 1$ case, the fitness contribution of each site depends on all of the other sites because the "context" of each of the $N - 1$ other bits is changed when even a single bit is flipped. In this case, therefore, the fitness of each N bit string is statistically independent of its neighbors. As was shown in Kauffman & Levin (1987), Weinberger (1988), Macken & Perelson (1989), and Weinberger (1991a), such random landscapes have very many local optima ($2^N/(N+1)$, on average), walks to optima are short ($O(\ln N)$, on average), and only a small fraction of local optima are accessible from any initial string. Thus adaptive walks vary dramatically as the ruggedness of the landscape varies.
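For the $k = 0$ case, the statement that the walk length equals the Hamming distance to the global optimum can be checked with a small sketch. The representation of each site's table as a pair `(f0, f1)` and the name `adaptive_walk_k0` are assumptions made only for this illustration.

```python
import random

def adaptive_walk_k0(bits, tables):
    """Flip one bit at a time, accepting only flips that raise that site's fitness."""
    bits = list(bits)
    steps = 0
    improved = True
    while improved:
        improved = False
        for i, (f0, f1) in enumerate(tables):
            better = 1 if f1 > f0 else 0
            if bits[i] != better:
                bits[i] = better      # a fitter one-mutant neighbor
                steps += 1
                improved = True
    return bits, steps

rng = random.Random(1)
N = 12
tables = [(rng.random(), rng.random()) for _ in range(N)]   # k = 0: two entries per site
start = [rng.randint(0, 1) for _ in range(N)]
optimum = [1 if f1 > f0 else 0 for f0, f1 in tables]
final, steps = adaptive_walk_k0(start, tables)
hamming = sum(s != o for s, o in zip(start, optimum))
```

With independent sites every accepted flip is permanent, so `steps` coincides with `hamming`; this is no longer true once k is positive, because a flip changes the context of other sites.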

The N-k decision problem

Having defined the model, we are now in a position to consider the following decision problem: Is the global optimum, $F_{MAX}$, in a given instance of an N-k landscape greater than some specified value V? In view of the work of Dress (1987), it is not surprising that this question can be answered for adjacent neighborhoods with periodic boundary conditions by the following simple dynamic programming algorithm that actually finds the globally maximal fitness. For simplicity, we present the algorithm for $k = 2$, leaving the trivial generalization to arbitrary k to the reader. Also, N is to be added or subtracted, as appropriate, to subscripts outside the range $[1, N]$ so that they assume values within that range.

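Let $f_i^{b_{i-1} b_i b_{i+1}}$ be the site fitness of the ith site, given the values of the bits $b_{i-1}$, $b_i$, and $b_{i+1}$, and let $F_i^{b_N b_1 | b_{i+1} b_{i+2}}$ be the maximum value of the sum

$$\sum_{j=1}^{i+1} f_j^{b_{j-1} b_j b_{j+1}}$$

over the values of $b_2, b_3, \ldots, b_i$, given the values of the bits $b_N$, $b_1$, $b_{i+1}$, and $b_{i+2}$. The algorithm then consists of the three phases

Initialization:

$$F_1^{b_N b_1 | b_2 b_3} = f_1^{b_N b_1 b_2} + f_2^{b_1 b_2 b_3}.$$

Continuation:

$$F_i^{b_N b_1 | b_{i+1} b_{i+2}} = \max_{b_i} \left[ F_{i-1}^{b_N b_1 | b_i b_{i+1}} + f_{i+1}^{b_i b_{i+1} b_{i+2}} \right]$$

for $1 < i \le N - 3$, which implicitly specifies a value for $b_i$ for $1 < i \le N - 3$ and each choice of $b_N$, $b_1$, $b_{i+1}$, and $b_{i+2}$. Use the value of $b_i$ thus specified for subsequent calculations.

Note that the formula still makes sense when $i = N - 2$ or $i = N - 1$, but there will be only 8 $F_{N-2}^{b_N b_1 | b_{N-1} b_N}$ values and only 4 $F_{N-1}^{b_N b_1 | b_N b_1}$ values instead of the 16 $F_i^{b_N b_1 | b_{i+1} b_{i+2}}$ values defined for $1 \le i \le N - 3$.

Termination:

$$F_{MAX} = \max_{b_N b_1} \left[ F_{N-1}^{b_N b_1 | b_N b_1} \right].$$

Because the $F_i$ are sums of site fitnesses, this maximum is N times the largest value of the average fitness $F(b)$ defined earlier; the comparison with V is made after dividing by N.

We have therefore proven

Theorem 1. The N-k decision problem with adjacent neighborhoods is solvable in $O(2^k N)$ steps, and is thus in P.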
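The three phases can be sketched in Python for $k = 2$ and checked against exhaustive search on small rings. This is a sketch under stated assumptions rather than the author's code: sites are numbered from 0, each table `f[j]` is assumed to be a dict keyed by the triple (left bit, own bit, right bit), and the function names are hypothetical. All 16 table entries are carried through every step; the entries that are inconsistent with the periodic boundary are simply never read at termination.

```python
import random
from itertools import product

BIT = (0, 1)

def max_total_fitness_adjacent_k2(f):
    """Maximum of sum_j f[j][(b[j-1], b[j], b[j+1])] on a ring of N >= 3 sites."""
    n = len(f)
    # Initialization: F_1[(bN, b1, b2, b3)] = f_1 + f_2 (site numbers as in the text).
    F = {
        (bN, b1, b2, b3): f[0][(bN, b1, b2)] + f[1][(b1, b2, b3)]
        for bN, b1, b2, b3 in product(BIT, repeat=4)
    }
    # Continuation: text index i = 2 .. N-1 adds site i+1, stored at f[i].
    for i in range(2, n):
        site = f[i]
        F = {
            (bN, b1, x, y): max(F[(bN, b1, bi, x)] + site[(bi, x, y)] for bi in BIT)
            for bN, b1, x, y in product(BIT, repeat=4)
        }
    # Termination: the trailing pair of bits must coincide with (bN, b1).
    return max(F[(bN, b1, bN, b1)] for bN, b1 in product(BIT, repeat=2))

def brute_force(f):
    """Exhaustive search over all 2^N strings, for comparison only."""
    n = len(f)
    return max(
        sum(f[j][(b[j - 1], b[j], b[(j + 1) % n])] for j in range(n))
        for b in product(BIT, repeat=n)
    )

def random_instance(n, rng):
    return [{t: rng.random() for t in product(BIT, repeat=3)} for _ in range(n)]

rng = random.Random(0)
instance = random_instance(7, rng)
best_sum = max_total_fitness_adjacent_k2(instance)
F_MAX = best_sum / len(instance)   # average fitness, as in the definition of F(b)
```

Each continuation step touches 16 table entries with two candidates apiece, so the work is linear in N for fixed k, in line with Theorem 1.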

However, the situation for random neighborhoods is quite different, as is shown by

Theorem 2. The N-k decision problem with random neighborhoods is NP complete for $k \ge 3$.

Proof. We note first that the single integer N characterizes the size of the problem completely if k is fixed: $2^{k+1} N$ real numbers are required to specify the fitness tables, and k integers per site, or $kN$ integers per instance of the problem, to specify the neighborhoods. Furthermore, it is merely a matter of a table lookup followed by an addition for each site, or $O(N)$ total work, to check that a proposed solution does, indeed, have a fitness greater than a given value. We conclude that the N-k decision problem is in NP.

We demonstrate NP completeness by showing that the N-k problem is polynomially equivalent to one of the best known NP complete problems, the 3SAT problem (Garey & Johnson, 1979). Given N boolean variables $b = (b_1, b_2, \ldots, b_N)$ and a list of M expressions, $E_i(b_{p_i}, b_{q_i}, b_{r_i})$, involving arbitrary triples of these variables and the operators AND, OR, and NOT, the 3SAT problem is to determine whether values exist for the $b_i$'s such that all of the expressions are satisfied (i.e. all evaluate to TRUE). If we can show that every such problem can be mapped into an N-k decision problem in polynomial time, we can conclude that the latter problem is "at least as hard" as the 3SAT problem, because our mapping, together with a polynomial time solution to the N-k decision problem, would provide a polynomial time solution to the 3SAT problem.

It is easiest to generate the N-k decision problem that corresponds to a given 3SAT problem when $N = M$ and $k = 3$. We first transform each expression $E_i(b_{p_i}, b_{q_i}, b_{r_i})$ into the equivalent expression

$$E'_i(b_i, b_{p_i}, b_{q_i}, b_{r_i}) = E_i(b_{p_i}, b_{q_i}, b_{r_i}) \text{ AND } (b_i \equiv b_i),$$

where the expression $a \equiv b$ is TRUE if and only if $a = b$. We then identify $b_i$ with the ith bit in the N-k configuration, and the other variables appearing in the ith expression with $b_i$'s "neighbors." We assign the fitness tables associated with $b_i$ as follows: a fitness table entry is assigned the value 1 if the corresponding $E'_i$ is TRUE for the specified values of the $b$'s, and 0 if it is FALSE. Clearly, the corresponding 3SAT problem is solved if we can determine whether the global maximum of the sum of the site fitnesses in the N-k landscape thus generated is greater than or equal to N, that is, whether the average fitness attains the value 1.

A trivial variation of the above mapping suffices when $M < N$. As before, we identify each SAT variable, $b_i$, with the ith bit in the N-k configuration, modify the M expressions such that $b_i$ appears in the ith expression for $1 \le i \le M$, and assign the fitness tables as described above. Sites $M+1, M+2, \ldots, N$ are assigned fitness tables in which every entry is given the value 1. Once more, the given SAT problem is solved if and only if the global maximum of the corresponding N-k problem is at least N.

The $M > N$ case can be handled by introducing additional variables $b_{N+1}, b_{N+2}, \ldots, b_M$, and proceeding as above, provided we lengthen the N bit string in the N-k model to M bits, and assign the neighborhoods as before.

The N-k decision problem for $k > 3$ is a fortiori NP complete because every 3SAT problem can be embedded in a $k > 3$ decision problem by defining the expressions

$$E''_i(b_i, \ldots, b_j, b_{p_i}, b_{q_i}, b_{r_i}) = E_i(b_{p_i}, b_{q_i}, b_{r_i}) \text{ AND } (b_i \equiv b_i) \text{ AND } \cdots \text{ AND } (b_j \equiv b_j),$$

where the bits $b_i, \ldots, b_j$ include the bit at the ith site and some arbitrary collection of $k - 3$ neighbors. The previously given mapping to the N-k problem can then be applied.
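The mapping used in the proof can be illustrated on toy instances. The sketch below makes several simplifying assumptions that are not in the text: each expression is taken to be a single OR of three literals over distinct variables, a literal is a hypothetical pair `(variable index, negated?)`, padding sites get arbitrary neighbors, and the maximum is found by brute force, which is only feasible for very small instances.

```python
from itertools import product

def sat_to_nk(n_vars, clauses):
    """Build (neighbors, tables) for an N-k instance with k = 3 from 3-literal clauses."""
    n_sites = max(n_vars, len(clauses))   # lengthen the string if M > N
    neighbors, tables = [], []
    for i in range(n_sites):
        if i < len(clauses):
            variables = [v for v, _ in clauses[i]]
            table = {}
            for own, *vals in product((0, 1), repeat=4):
                local = dict(zip(variables, vals))
                satisfied = any(local[v] != neg for v, neg in clauses[i])
                # The entry ignores the site's own bit, mirroring (b_i == b_i).
                table[(own, *vals)] = 1 if satisfied else 0
            neighbors.append(variables)
        else:
            # Padding site (M < N): every table entry is 1.
            neighbors.append([0, 1, 2])
            table = {key: 1 for key in product((0, 1), repeat=4)}
        tables.append(table)
    return neighbors, tables

def max_sum(neighbors, tables):
    """Largest sum of site fitnesses over all bit strings (exhaustive search)."""
    n = len(tables)
    return max(
        sum(tables[i][(b[i], *(b[j] for j in neighbors[i]))] for i in range(n))
        for b in product((0, 1), repeat=n)
    )

# (b0 OR b1 OR b2) AND (NOT b0 OR b1 OR b3): satisfiable, 4 variables, 2 clauses.
easy = [[(0, False), (1, False), (2, False)], [(0, True), (1, False), (3, False)]]
# All eight sign patterns on three variables: unsatisfiable, 3 variables, 8 clauses.
hard = [[(0, a), (1, b), (2, c)] for a, b, c in product((False, True), repeat=3)]

easy_sites = max(4, len(easy))
hard_sites = max(3, len(hard))
easy_max = max_sum(*sat_to_nk(4, easy))
hard_max = max_sum(*sat_to_nk(3, hard))
```

The instance is satisfiable exactly when the maximal sum reaches the number of sites; building the tables takes a constant amount of work per site, so the mapping itself is polynomial.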

Remark 1. So far as we know, the question of whether the $k = 2$ random model is NP complete is open.

Remark 2. An almost identical argument shows that the "P-T model," proposed by Pedro Tarazona, in which the fitness table entries for site i do not depend on the ith bit, but only on its k neighbors, is also NP complete for $k \ge 3$. The computational complexity of the $k = 2$ P-T model is also an open question, but we conjecture that it is polynomially equivalent to the polynomial time 2SAT problem, a variant of the SAT problem in which the M boolean expressions involve arbitrary pairs, rather than triples, of the boolean variables.

NP Completeness and Correlation

The foregoing is a clear example of Garey and Johnson's observation that seemingly trivial modifications to a combinatorial optimization problem can render it intractable. Previous analytical and numerical work suggests that many of the statistical properties of N-k landscapes for large, but fixed k are quite similar for adjacent and random neighborhoods. Weinberger (1991a) shows that the mean number of local optima, the distances between optima, and the expected fitness of a local optimum are, asymptotically for large k, the same in both cases. In Appendix I, we compute the exact correlation, $R(d)$, between pairs of points separated by a Hamming distance d, from which we can deduce the limiting behavior for $d, k \ll N$. For the random landscape, we obtain

$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{d(d-1)\,k(k+2)}{2N^2} + O\!\left(\left(\frac{dk}{N}\right)^{3}\right);$$

for the adjacent neighbor landscape, we obtain the rather similar expression

$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{d(d-1)\,k(k+1)}{2N^2} + O\!\left(\left(\frac{dk}{N}\right)^{3}\right).$$

These correlations, along with the common mean and variance of the fitnesses, completely characterize the multivariate Gaussian distribution to which the distribution of fitnesses converges as $N, k \to \infty$.
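A few numbers make the similarity of the two correlation functions easier to see. The sketch below evaluates second-order expansions of the form $1 - d(k+1)/N + d(d-1)k(k+c)/(2N^2)$, taking $c = 2$ for random neighborhoods and $c = 1$ for adjacent neighborhoods. Those two coefficients are assumptions of this sketch, chosen to agree with the statement that the random neighbor correlation is the larger of the two; the function names are hypothetical.

```python
def correlation_second_order(d, k, n_sites, c):
    """1 - d(k+1)/N + d(d-1)k(k+c)/(2N^2); valid only for d, k much smaller than N."""
    first = d * (k + 1) / n_sites
    second = d * (d - 1) * k * (k + c) / (2 * n_sites ** 2)
    return 1.0 - first + second

def r_random(d, k, n_sites):
    return correlation_second_order(d, k, n_sites, 2)   # assumed coefficient c = 2

def r_adjacent(d, k, n_sites):
    return correlation_second_order(d, k, n_sites, 1)   # assumed coefficient c = 1

N, k = 1000, 4
rows = [(d, r_random(d, k, N), r_adjacent(d, k, N)) for d in (1, 5, 20)]
gap_at_20 = r_random(20, k, N) - r_adjacent(20, k, N)    # d(d-1)k / (2 N^2)
```

Under these assumptions the two expressions agree to first order and differ only by $d(d-1)k/(2N^2)$, which is small whenever $d$ and $k$ are small compared with N.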

We might conjecture that a problem is intractable because the correlation between pairs of distant configurations decays so rapidly as their distance increases that their fitnesses are effectively independent. We then have the obvious, but suggestive

Lemma A. Every algorithm for finding the maximum (or minimum) of M completely arbitrary real numbers requires a number of steps at least proportional to M.

Proof. If such an algorithm required merely $o(M)$ steps, some of the numbers must necessarily be ignored by the algorithm. The answer given must then be independent of which (if any) of the ignored numbers is the maximum. Clearly, such an algorithm cannot be guaranteed to find the maximum in all cases.

Any hope of finding the maximum of the $2^N$ fitnesses assigned to the $2^N$ vertices in polynomial time must be based on the ability to rule out whole classes of fitnesses with a single operation, which, from the above lemma, is impossible if the fitnesses of the vertices are assigned completely independently and without any a priori knowledge about the assignment. On the face of things, it would seem that such problems cannot possibly lie in NP; however, we show that a similar problem is embedded in the problem of optimization on a random neighbor N-k landscape, for which the correlation between fitnesses of points separated by a Hamming distance of $N/2$ is

$$\tfrac{1}{2}\, e^{-k/2} \left[ 1 + O(k/N) \right]$$

(see Appendix I). Clearly, this quantity can be made as small as desired by choosing k sufficiently large and letting N tend to infinity. (In fact, the N-k decision problem remains in NP for $k = O(\log N)$.) Because the fitnesses are asymptotically jointly Gaussian, an asymptotically vanishing correlation between them implies that they are asymptotically statistically independent, and therefore a knowledge of one fitness tells us, asymptotically, nothing about the other. We now prove

Lemma B. For arbitrarily large N, there exists a set, $\Omega_N$, of $2N^{(\log_2 N + 1)/2}$ bit strings of length N such that the Hamming distance between any pair of strings is at least $N/4$.

The proof of Lemma B is based on

Lemma C. For arbitrarily large N, there exists a set, $\Gamma_N$, of $2N$ bit strings of length N such that the Hamming distance between any pair of strings is at least $N/2$.

Proof of Lemma C. We start by recursively constructing, for $N = 2^n$, $n \ge 2$, a set, $\Gamma_N$, of $2N$ strings of length N whose pairwise Hamming distances are at least $N/2$. Clearly,

$$\Gamma_4 = \{0000,\ 0011,\ 0101,\ 0110,\ 1001,\ 1010,\ 1100,\ 1111\}.$$

The 16 strings of length $N = 8$ in $\Gamma_8$ are obtained as follows: form 4 complementary pairs, $(t_i, \bar{t}_i)$, from the strings $t_i \in \Gamma_4$. Because the distance between distinct elements of $\Gamma_4$ is at least 2, the requisite $N = 8$ strings are $t_i \| t_i$, $t_i \| \bar{t}_i$, $\bar{t}_i \| t_i$, and $\bar{t}_i \| \bar{t}_i$, where the symbol "$\|$" denotes concatenation (simple juxtaposition). The pairwise distance between each of the 16 strings thus formed is at least $N/2$, either because the strings are formed from different t's, or because half of one string is the complement of the corresponding half of the other string. The resulting set is also a set of 8 pairs of complementary strings, so that the iteration can be repeated once again, and, in fact, arbitrarily often, each time doubling the number of strings. The resulting set of $2N$ strings is obviously not unique, because another set can be generated by complementing the bits at a fixed position in each of the strings in the first set.
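The doubling construction can be checked mechanically for small N. In the sketch below the base case is assumed to be the eight even-parity strings of length 4, which is one set with the stated properties; the names `gamma_set` and `min_distance` are hypothetical.

```python
def gamma_set(n_bits):
    """Strings (tuples of bits) from the doubling construction, n_bits = 4, 8, 16, ..."""
    if n_bits < 4 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two, at least 4")
    if n_bits == 4:
        # Assumed base case: the eight even-parity strings of length 4.
        return [
            tuple(int(c) for c in format(v, "04b"))
            for v in range(16)
            if bin(v).count("1") % 2 == 0
        ]
    result = []
    for t in gamma_set(n_bits // 2):
        t_bar = tuple(1 - x for x in t)
        result.append(t + t)        # t || t
        result.append(t + t_bar)    # t || complement of t
    return result

def min_distance(strings):
    """Smallest Hamming distance between two distinct members of the list."""
    return min(
        sum(x != y for x, y in zip(s, u))
        for idx, s in enumerate(strings)
        for u in strings[idx + 1:]
    )

summary = {n: (len(gamma_set(n)), min_distance(gamma_set(n))) for n in (4, 8, 16)}
```

Because the previous level is closed under complementation, looping over every `t` (rather than over pair representatives) already produces all four concatenations named in the proof.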

Remark. In the course of refereeing this paper, Prof. Andreas Dress proved my conjecture that $\Gamma_N$ has maximal size, in that at least one member of $\Gamma_N$ is fewer than $N/2$ bit flips away from every N bit string not in $\Gamma_N$. His proof begins by identifying the successive bits $b_i$, for $1 \le i \le N$, of the string S as the image of a mapping from the index, i, to the appropriate bit value. He then notes the equivalence of the integer $i - 1$ with its binary representation, the $n = \log_2 N$ bit string, which we write explicitly as $\beta_1 \beta_2 \cdots \beta_n$. The construction above shows that $\Gamma_N$ consists of precisely those N bit strings S whose ith bit, $b_i$, is given by

$$b_i = \alpha_0 + \sum_{j=1}^{n} \alpha_j \beta_j \pmod 2$$

for some choice of the bits $\alpha_0, \alpha_1, \ldots, \alpha_n$, and for all i between 1 and N. The strings thus generated are precisely the set of affine mappings from the n bit strings $\beta_1 \beta_2 \cdots \beta_n$ to single bits. We can thus re-establish that the strings in $\Gamma_N$ have pairwise Hamming distances of at least $N/2$: if the corresponding $\alpha$'s for two mappings differ at positions $p \ge 1$, the strings they produce will differ at indices i whose binary representations have the property that

$$\sum_{p} \beta_p = 1 \pmod 2,$$

the sum running over those positions p (with 0 in place of 1 on the right if $\alpha_0$ differs as well), and there are exactly $N/2$ such indices. If the $\alpha$'s differ only at position $p = 0$, the corresponding strings will be complementary.

The maximality of $\Gamma_N$ now follows from character theory. Identify $\Gamma_N$ with the irreducible characters of $F_2^n$, considered as a group, via the mapping

$$\chi_i = (-1)^{b_i} = (-1)^{\alpha_0 + \sum_{j=1}^{n} \alpha_j \beta_j},$$

where $\chi_i$ is the ith component of the vector $\chi$, whose components are indexed by the elements $\beta_1 \beta_2 \cdots \beta_n$ of $F_2^n$. For a given N bit string S, denote this mapping as $\chi(S)$. Now consider the inner product

$$\langle \chi(S), \chi(S') \rangle = \frac{1}{2^n} \sum_i \chi_i \chi'_i = \frac{1}{2^n} \left[ d(S, \bar{S}') - d(S, S') \right],$$

where $d(S, S')$ is the Hamming distance between S and S', and $\bar{S}'$ is the complement of S'. The fact that $\Gamma_N$ is closed under complementation guarantees that the Hamming distance between it and any N bit string S is at most $N/2$. If S could be exactly $N/2$ bit flips from every member of $\Gamma_N$, the difference $d(S, S') - d(S, \bar{S}')$ would be exactly zero for all $S' \in \Gamma_N$, and thus the inner product of $\chi(S)$ with all of the irreducible characters of $F_2^n$ would be zero. As is well known, this last condition is satisfied if and only if every component of $\chi(S)$ is zero, which is clearly absurd.

We now return to the

Proof of Lemma B. Clearly, the 16 strings of length 4 whose pairwise Hamming distances are at least 1 are the binary representations of the integers 0 thru 15. For larger values of N, we form the strings of $\Omega_N$ from two $N/2$ bit substrings, a prefix $\pi$ and a suffix $\sigma$. $\pi$ is an arbitrary member of $\Omega_{N/2}$; $\sigma$ is constructed from an arbitrary member $\gamma$ of $\Gamma_{N/2}$ by complementing those bit positions marked by 1's in $\pi$ (i.e. $\sigma = \pi \oplus \gamma$, where the addition is taken modulo 2).

To check that this construction does, indeed, meet the requirements stated above, we partition $\Omega_N$ into subsets of strings with the same prefix. Clearly, there is no problem with the strings in the same subset, because their suffixes, corresponding to different members of $\Gamma_{N/2}$, are mutually separated by at least $N/4$ bits. There is also no problem with strings in different subsets whose suffixes were derived from the same $\gamma$. Their prefixes, $\pi$ and $\pi'$, must differ in at least $N/8$ bit positions, so that their suffixes, $\pi \oplus \gamma$ and $\pi' \oplus \gamma$, must also differ in at least $N/8$ bit positions. There remains the case in which two strings in $\Omega_N$ have both different prefixes and different suffixes. Because they have different prefixes, the above construction guarantees that their prefixes differ in at least $N/8$ positions; we now show that the same applies to the suffixes, $\sigma$ and $\sigma'$. We have

$$d(\sigma, \sigma') = d(\pi \oplus \gamma,\ \pi' \oplus \gamma') = d(\pi \oplus \pi',\ \gamma \oplus \gamma').$$

For each N, $\Gamma_N \subset \Omega_N$, so that, in particular, $\gamma \oplus \gamma' \in \Omega_{N/2}$. It follows from the fact (proven below) that $\Omega_{N/2}$ is closed under $\oplus$, and the positive distance between $\pi \oplus \pi'$ and $\gamma \oplus \gamma'$, that they are distinct members of $\Omega_{N/2}$, and therefore differ in at least $N/8$ bit positions.

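We now verify that $\Omega_N$ is closed under $\oplus$, by induction. For $N = 4$, closure obtains trivially. For larger N, we assume closure for $\Omega_{N/2}$ and establish it for $\Omega_N$. Given two elements $S_1, S_2 \in \Omega_N$, which we write $\pi_1 \| (\pi_1 \oplus \gamma_1)$ and $\pi_2 \| (\pi_2 \oplus \gamma_2)$, where $\pi_1, \pi_2 \in \Omega_{N/2}$ and $\gamma_1, \gamma_2 \in \Gamma_{N/2}$, we have

$$S_1 \oplus S_2 = (\pi_1 \oplus \pi_2) \,\|\, \left[ (\pi_1 \oplus \gamma_1) \oplus (\pi_2 \oplus \gamma_2) \right].$$

Using the induction hypothesis, $\pi_1 \oplus \pi_2 \in \Omega_{N/2}$; using the commutativity and the associativity of $\oplus$, we write the suffix string as $(\pi_1 \oplus \pi_2) \oplus (\gamma_1 \oplus \gamma_2)$. We now observe that $\Gamma_{N/2}$ is also closed under $\oplus$, as is clear from their construction: all elements of $\Gamma_{N/2}$ are $N/2$ bit strings formed by choosing a single element $t \in \Gamma_4$ and forming arbitrary concatenations of t and its complement, $\bar{t}$. Thus, $\gamma_1 \oplus \gamma_2$ is the arbitrary concatenation of the four bit strings $t_1 \oplus t_2 = \bar{t}_1 \oplus \bar{t}_2$ and $t_1 \oplus \bar{t}_2 = \bar{t}_1 \oplus t_2$. From the fact that these last two strings form a complementary pair in $\Gamma_4$, we conclude that $\gamma_1 \oplus \gamma_2$ is also in $\Gamma_{N/2}$, allowing us to write

$$S_1 \oplus S_2 = \pi_3 \,\|\, (\pi_3 \oplus \gamma_3)$$

for some $\pi_3 \in \Omega_{N/2}$ and some $\gamma_3 \in \Gamma_{N/2}$, as required.

We now count the number of distinct strings, $D_N$, in $\Omega_N$. Because each string in $\Omega_N$ is generated by choosing one of the $D_{N/2}$ members of $\Omega_{N/2}$ and one of the N members of $\Gamma_{N/2}$, we have the recursion relation

$$D_N = N D_{N/2}.$$

Writing $N = 2^n$ and $G_n = D_{2^n}$, this relation becomes

$$G_n = 2^n G_{n-1}.$$

Given the initial condition $D_4 = G_2 = 16$, that was computed "by hand" above, we see that $G_n = 2^{n(n+1)/2 + 1}$, so that $D_N = 2N^{(\log_2 N + 1)/2}$, as claimed.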
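The prefix and suffix construction, the distance bound, and the count can all be verified by machine for small N. The sketch below stores bit strings as Python integers and assumes the even-parity strings of length 4 as the base case of the Lemma C construction; the function names are hypothetical.

```python
def gamma_ints(n_bits):
    """Doubling construction of Lemma C, with bit strings stored as integers."""
    if n_bits == 4:
        return [v for v in range(16) if bin(v).count("1") % 2 == 0]  # assumed base case
    h = n_bits // 2
    mask = (1 << h) - 1
    out = []
    for t in gamma_ints(h):
        out.append((t << h) | t)            # t || t
        out.append((t << h) | (t ^ mask))   # t || complement of t
    return out

def omega_ints(n_bits):
    """Prefix p from the half-length set, suffix p XOR g with g from gamma(N/2)."""
    if n_bits == 4:
        return list(range(16))
    h = n_bits // 2
    return [(p << h) | (p ^ g) for p in omega_ints(h) for g in gamma_ints(h)]

def min_distance(values):
    """Smallest Hamming distance between two distinct members of the list."""
    return min(
        bin(a ^ b).count("1")
        for i, a in enumerate(values)
        for b in values[i + 1:]
    )

def claimed_size(n_bits):
    n = n_bits.bit_length() - 1            # n = log2(N)
    return 2 ** (n * (n + 1) // 2 + 1)     # equals 2 * N ** ((log2(N) + 1) / 2)

omega_8 = omega_ints(8)
size_8, dist_8 = len(set(omega_8)), min_distance(omega_8)
```

For $N = 8$ this gives 128 distinct strings at pairwise distance at least 2, and for $N = 16$ it gives 2048 strings, matching the recursion $D_N = N D_{N/2}$.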

Remark. We conjecture that $\Omega_N$ is also maximal.

We state the conclusions of this section in the following

Summary. The NP complete random neighbor N-k optimization problem seems to be at least as hard as the problem of optimization over a sample of $D_N = 2N^{(\log_2 N + 1)/2}$ Gaussian random variables whose pairwise correlations can be made as small as desired, so that they are "arbitrarily close" to being pairwise statistically independent. Because $D_N$ grows faster than any polynomial in N, this result seems to suggest that $P \neq NP$.

Unfortunately, small correlations between fitnesses of distant points on the landscape are not, in themselves, sufficient to render a problem intractable. However small the correlations between the fitnesses of strings in $\Omega_N$ can be made in the random neighbor landscape, they can be made even smaller in the adjacent neighbor landscape, at least for sufficiently large landscapes. In fact, if we ignore the $O(1/N)$ correction terms in the correlation functions and write $\delta = d/N$, we have the relation

$$R_{adj} = (1 - \delta)^{k+1} \le R_{rand} = (1 - \delta)\, e^{-\delta k},$$

which follows immediately from the inequality $1 - \delta \le e^{-\delta}$ for $0 \le \delta \le 1$.

Discussion

The crucial difference between the adjacent and random landscapes seems to be the number of bits upon which each site fitness could depend, the "tree width" discussed by Dress (1987). For the adjacent neighbor case, we know a priori that each site depends only on the bit at that site and the k bits at adjacent sites, so that the tree width is $k + 1$. It is this a priori knowledge that makes possible the solution of the corresponding decision problem with a polynomial time dynamic programming algorithm. In contrast, which $k+1$ of the N bits each site fitness depends upon will vary from one instance of the random neighbor model to the other, so that the tree width is no longer $k + 1$, but $O(N)$. It appears, therefore, that the random neighbor problem is intractable, not only because of the small correlations between distant points on the landscape, but also because there is no effective way to partition the problem into smaller problems and thus use information available from nearby points to infer relationships between more distant points.

In Weinberger (1990), Weinberger (1991a), and Weinberger (1991b), we have argued for the importance of the N-k model and, more generally, the notion of statistically isotropic "AR(1) landscapes." In general, statistical isotropy implies that the correlation, $R(d)$, between the fitnesses of pairs of points depends only on the distance d between them; in the specific case of the AR(1) landscape, this correlation function assumes the form $R(d) = e^{-d/T}$, for some choice of the "correlation length," T. The exact calculations of the pair correlations for the random neighbor and adjacent neighbor N-k models presented in the Appendix show some disparity from the precise "AR(1)-ness" of the P-T model. However, this disparity is relatively minor when $d < T$, and it is the $R(d)$ values for $d < T$ that determine the local properties of the landscape (see the above cited papers for details). From these observations, we make two conjectures: that, in general, approximately AR(1) landscapes with unbounded tree width are NP-complete, but that these problems have statistical properties similar to the properties of "easy" optimization problems. In view of the ubiquity of AR(1) landscapes in optimization problems in computer design (Sorkin, 1988) and in RNA folding landscapes (Fontana et al., 1991), these conjectures deserve further study.

Finally, our results suggest that nature is solving a non-trivial optimization problem in the "design" of individual enzymes, at least if the relevant fitness landscape resembles an N-k or P-T landscape. As we saw above, the fitness of an enzyme depends crucially on its three dimensional structure, so that the random neighbor N-k landscape (i.e. the NP-complete one) is clearly more biologically accurate than the adjacent neighbor model. This observation is further evidence that there is much to be learned about optimization from the study of evolutionary strategies.

Acknowledgements

The author gratefully acknowledges the support of ONR Grant N[…]-K-[…] for the time period in which this work was begun and the support of a Max Planck Stipendium during its gestation. A discussion with Andreas Dress convinced the author that the optimization problem on adjacent landscapes could be solved in polynomial time via dynamic programming, and his subsequent encouragement resulted in the completion of this work, including the correction of some mistakes in a previous version and the proof that $\Gamma_N$ is maximal. The author would also like to acknowledge useful discussions with Stuart Kauffman on the general subject of evolution as a combinatorial optimization problem, and Pedro Tarazona for suggesting the P-T model and for the computation of both its correlation function and the correlation function of the random neighbor N-k model.

References

Dress� A� �� On the Computational Complexity of Composite Systems� Lecture Notes

in Physics� Vol� �� Fluctuations and Stochastic Phenomena in Condensed Matter� L�

Garrido �ed�� Springer� Berlin�

Fontana, W., Griesmacher, T., Schnabl, W., Stadler, P., and Schuster, P. (1991). Statistics of Landscapes Based on Free Energies, Replication and Degradation Rate Constants of RNA Secondary Structures. Monatshefte für Chemie, in press.

Garey, M. and Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco.

Kauffman, S. & Levin, S. (1987). Towards a General Theory of Adaptive Walks on Rugged Landscapes. Journal of Theoretical Biology 128: 11.

Kauffman, S., Weinberger, E., and Perelson, A. (1988). Maturation of the Immune Response Via Adaptive Walks On Affinity Landscapes. Theoretical Immunology, Part I, Santa Fe Institute Studies in the Sciences of Complexity, A. S. Perelson (ed.), Addison-Wesley, Reading, Ma.


Kauffman, S. & Weinberger, E. (1989). The N-k Model of Rugged Fitness Landscapes and Its Application to Maturation of the Immune Response. Journal of Theoretical Biology 141: 211.

Macken, C. and Perelson, A. (1989). Protein Evolution on Rugged Landscapes. Proceedings of the National Academy of Sciences 86: 6191.

Sorkin, G. (1988). Combinatorial Optimization, Simulated Annealing, and Fractals. IBM Research Report RC 13674, No. 61253.

Weinberger, E. (1988). A More Rigorous Derivation of Some Results on Rugged Fitness Landscapes. J. theor. Biol. 134: 125.

Weinberger, E. (1990). Correlated and Uncorrelated Fitness Landscapes and How to Tell the Difference. Biological Cybernetics 63: 325.

Weinberger, E. (1991a). Local Properties of Kauffman's N-k model, a Tuneably Rugged Energy Landscape. Physical Review A 44: 6399.

Weinberger, E. (1991b). Fourier and Taylor Series on Fitness Landscapes. Biological Cybernetics 65: 321.

Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding and selection in evolution. In: Proceedings 6th Congress on Genetics 1: 356.


Appendix

Calculation of the Correlation between Pairs of Fitnesses in the N-k Model

We want to compute

$$R(d) = \frac{E[f(a)f(b)] - \mu^2}{\sigma^2},$$

where $a$ and $b$ are bit strings separated by Hamming distance $d$, $f(a)$ and $f(b)$ are their respective fitnesses, $\mu$ is the common mean of these fitnesses, and $\sigma^2$ is their common variance. The expectation, $E$, is taken over the joint distribution of the random variables $f(a)$ and $f(b)$. Without loss of generality, we choose a distribution for the "site fitnesses" that has mean zero and variance 1. We then have $\mu = 0$, $\sigma^2 = 1/N$, and

$$R(d) = \frac{1}{N}\, E\left[ \left( \sum_{j=1}^{N} f_j(a) \right) \left( \sum_{j \notin C} f_j(a) + \sum_{j \in C} f_j(b) \right) \right],$$

where the notation is chosen to reflect the fact that a certain subset, $C$, of the site fitnesses change when bit string $a$ is changed to bit string $b$, but site fitnesses $f_j \notin C$, i.e. all of the others, remain the same. For pairs of sites $i \neq j$, the definition of the N-k model guarantees that site fitnesses $f_i(a)$ and $f_j(b)$ are independent, and thus uncorrelated. Similarly, $f_j(a)$ and $f_j(b)$ are either identical (because bit $j$ and its neighbors are identical in both $a$ and $b$) or they are independent. In any case, we conclude that

$$R(d) = \frac{1}{N}\, E\left[ \sum_{j \notin C} f_j^2(a) \right] = \Pr\{f_j \notin C\}.$$

In other words, the correlation between the fitnesses of two bit strings in the N-k model is exactly the probability that a randomly chosen site fitness is the same in the computation of $f(a)$ and $f(b)$.
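As a rough numerical illustration of this identity, the sketch below fixes one hypothetical random neighbor geometry and one pair of strings at Hamming distance $d$, draws many independent sets of site fitnesses, and compares the sample correlation of $f(a)$ and $f(b)$ with the fraction of site fitnesses whose input bits are untouched. The function name, the parameter values, and the standard normal site fitnesses are assumptions made for the example, not taken from the text; averaging the second returned number over geometries would give the probability described above.

```python
import random


def simulate_correlation(n=10, k=2, d=3, samples=20000, seed=0):
    """Return (sample correlation of f(a), f(b), fraction of unchanged sites)."""
    rng = random.Random(seed)
    # Assumed geometry: site j reads bit j plus k other randomly chosen bits.
    inputs = [[j] + rng.sample([i for i in range(n) if i != j], k)
              for j in range(n)]
    # Choose which d bits differ between the strings a and b.
    flipped = set(rng.sample(range(n), d))
    # A site fitness belongs to C exactly when one of its input bits was flipped.
    touched = [bool(flipped.intersection(site)) for site in inputs]
    unchanged = touched.count(False)

    fa, fb = [], []
    for _ in range(samples):
        total_a = total_b = 0.0
        for is_touched in touched:
            value_a = rng.gauss(0.0, 1.0)
            # A touched site reads a different table entry for b, which is an
            # independent draw; an untouched site reads the same entry.
            value_b = rng.gauss(0.0, 1.0) if is_touched else value_a
            total_a += value_a
            total_b += value_b
        fa.append(total_a / n)
        fb.append(total_b / n)

    mean_a = sum(fa) / samples
    mean_b = sum(fb) / samples
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb)) / samples
    var_a = sum((x - mean_a) ** 2 for x in fa) / samples
    var_b = sum((y - mean_b) ** 2 for y in fb) / samples
    return cov / (var_a * var_b) ** 0.5, unchanged / n


if __name__ == "__main__":
    estimate, fraction = simulate_correlation()
    print(f"sample correlation {estimate:.3f}, unchanged fraction {fraction:.3f}")
```

The two printed numbers should agree up to sampling noise, which shrinks roughly as one over the square root of `samples`.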

For the random neighbor model, the required probability is easily obtained. A site fitness is unchanged only if the bit at that site is not one of the $d$ bits that have been flipped and if it is not one of the $k$ neighbors of any flipped bit. The probability that a site satisfies the first condition is $1 - d/N$; the probability that a site satisfies the statistically independent second condition is $[1 - k/(N-1)]^d$. Thus, for the random neighbor model,

$$R(d) = \left(1 - \frac{d}{N}\right)\left(1 - \frac{k}{N-1}\right)^{d}.$$

For $k, d \ll N$, we have

$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{d(d-1)\,k(k+2)}{2N^2} + O\!\left(\left(\frac{dk}{N}\right)^{3}\right).$$
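The quadratic coefficient of the small $d$ expansion can be checked by expanding the two factors of $R(d) = (1 - d/N)\,[1 - k/(N-1)]^d$ separately. A sketch, keeping terms through order $1/N^2$ and using $k/(N-1) = k/N + k/N^2 + \cdots$:

$$\left(1 - \frac{k}{N-1}\right)^{d} = 1 - \frac{dk}{N} - \frac{dk}{N^2} + \frac{d(d-1)k^2}{2N^2} + \cdots$$

Multiplying by $1 - d/N$ contributes $-d/N$ at first order and $+d^2k/N^2$ at second order, so the $1/N^2$ terms combine to

$$\frac{d^2k - dk}{N^2} + \frac{d(d-1)k^2}{2N^2} = \frac{d(d-1)\,k(k+2)}{2N^2}.$$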

If $d = \alpha N$, where $\alpha = O(1)$ and $k \ll N$,

$$R(\alpha N) = (1-\alpha)\left(1 - \frac{k}{N-1}\right)^{\alpha N} = (1-\alpha)\, e^{-\alpha k}\left[1 + O(k/N)\right].$$
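The random neighbor expressions are easy to compare numerically. The following is a minimal sketch, with function names chosen here for illustration, that evaluates the exact product $(1 - d/N)[1 - k/(N-1)]^d$ alongside the small $d$ expansion $1 - d(k+1)/N + d(d-1)k(k+2)/(2N^2)$ and the large $d$ form $(1-\alpha)e^{-\alpha k}$.

```python
import math


def r_random_neighbor(d, n, k):
    """Exact expression from the text: (1 - d/N) * (1 - k/(N-1))**d."""
    return (1.0 - d / n) * (1.0 - k / (n - 1)) ** d


def r_small_d(d, n, k):
    """Second-order expansion, intended for d, k << N."""
    return 1.0 - d * (k + 1) / n + d * (d - 1) * k * (k + 2) / (2.0 * n ** 2)


def r_large_d(alpha, k):
    """Leading-order form for d = alpha * N with k << N."""
    return (1.0 - alpha) * math.exp(-alpha * k)


if __name__ == "__main__":
    n, k = 10_000, 4  # illustrative values only
    for d in (1, 5, 20):
        print(d, r_random_neighbor(d, n, k), r_small_d(d, n, k))
    d = n // 2
    print(d, r_random_neighbor(d, n, k), r_large_d(d / n, k))
```

For small `d` the first two columns should agree to many digits; at `d = n // 2` the exact value and the exponential form should differ only by a small relative amount.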

For the P-T model, in which neighborhoods are defined completely at random, the probability that a site is not affected by flipping a random bit is $1 - k/N$, independent of whether other bits have been flipped previously, provided they have not affected the given site. The probability that a site remains unaffected by $d$ such flips is thus $(1 - k/N)^d$. For purposes of comparison with the other models, we give the small and large $d$ approximations to $R(d)$ for the P-T model with $k+1$, rather than $k$, neighbors. The first of these approximations is

$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{d(d-1)(k+1)^2}{2N^2} + O\!\left(\left(\frac{dk}{N}\right)^{3}\right) \quad \text{for } d, k \ll N,$$

as is clear from a binomial expansion of $[1 - (k+1)/N]^d$. If $d = \alpha N$, where $\alpha = O(1)$ and $k \ll N$,

$$R(d) = e^{-\alpha(k+1)}\left[1 + O(k/N)\right].$$

The same derivation for the adjacent neighbor model begins by imagining that the $N$ site fitnesses are arranged in a circle, and that the flipped bits are represented by the integer vector $(n_1, n_2, \ldots, n_i, \ldots, n_d)$, where $1 \le n_i \le N$. Without loss of generality, we choose $n_1 = 1$ and $n_1 < n_2 < \cdots < n_d \le N$. Because we must make $d-1$ choices among the $N-1$ remaining bits, and because we cannot make the same choice twice, there are $\binom{N-1}{d-1}$ ways to make the required choices. However, if we constrain $n_{i+1} - n_i = l$, we need only make $d-2$ choices from $N-l-1$ bits.

The number of ways of making this second set of choices is $\binom{N-l-1}{d-2}$. The probability, $\pi_l$, that $n_{i+1} - n_i = l$ is then given by

$$\pi_l = \binom{N-l-1}{d-2} \bigg/ \binom{N-1}{d-1},$$

independent of $i$. If $l \le k$, then $l$ site fitnesses change as a result of flipping bit $n_{i+1}$; otherwise, $k+1$ site fitnesses change. It follows that the expected number, $E[|C|]$, of changed site fitnesses after moving a distance $d$ from the starting point is

$$E[|C|] = d\left\{ \sum_{l=1}^{k} l\,\pi_l + (k+1)\left[1 - \sum_{l=1}^{k}\pi_l\right] \right\} = d(k+1) - \frac{d}{\binom{N-1}{d-1}} \sum_{l=1}^{k} (k+1-l)\binom{N-l-1}{d-2}.$$

The probability that a randomly chosen site fitness doesn't change is then $1 - E[|C|]/N$, and

$$R(d) = 1 - \frac{E[|C|]}{N} = 1 - \frac{d(k+1)}{N} + \frac{d}{N\binom{N-1}{d-1}} \sum_{l=1}^{k}(k+1-l)\binom{N-l-1}{d-2}.$$
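The adjacent neighbor result can be evaluated exactly with integer binomial coefficients. The sketch below, whose function names are illustrative assumptions rather than anything defined in the paper, computes the gap probability $\binom{N-l-1}{d-2}/\binom{N-1}{d-1}$, the expected number of changed site fitnesses $E[|C|]$, and the resulting $R(d) = 1 - E[|C|]/N$. The case $d = 1$ is handled separately, since a single flipped bit changes exactly $k+1$ site fitnesses and has no neighboring flip to overlap with.

```python
from math import comb


def gap_probability(l, n, d):
    """Probability that two successive flipped bits are l sites apart (d >= 2)."""
    return comb(n - l - 1, d - 2) / comb(n - 1, d - 1)


def expected_changed(d, n, k):
    """E[|C|] for the adjacent neighbor model with d flipped bits on a ring."""
    if d < 2:
        # With at most one flipped bit there is no overlap to subtract.
        return d * (k + 1)
    # Sum over gaps l <= k of the overlap (k + 1 - l), weighted by the gap counts.
    overlap = sum((k + 1 - l) * comb(n - l - 1, d - 2) for l in range(1, k + 1))
    return d * (k + 1) - d * overlap / comb(n - 1, d - 1)


def r_adjacent(d, n, k):
    """Correlation R(d) = 1 - E[|C|]/N for the adjacent neighbor model."""
    return 1.0 - expected_changed(d, n, k) / n


if __name__ == "__main__":
    n, k = 2000, 3  # illustrative values only
    for d in (1, 10, 1000):
        print(d, r_adjacent(d, n, k), (1.0 - d / n) ** (k + 1))
```

As a sanity check, if the $d$ flipped bits form a uniformly random subset, the same probability can be counted directly as $\binom{N-k-1}{d}/\binom{N}{d}$, the chance that none of the $k+1$ adjacent bits read by a site is flipped; the two expressions should agree to rounding error.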

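For $d, k \ll N$, this expression can be written as

$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{d(d-1)}{N^2}\sum_{l=1}^{k}(k+1-l)\, \frac{\left(1-\frac{l+1}{N}\right)\left(1-\frac{l+2}{N}\right)\cdots\left(1-\frac{d+l-2}{N}\right)}{\left(1-\frac{1}{N}\right)\cdots\left(1-\frac{d-1}{N}\right)}$$

$$= 1 - \frac{d(k+1)}{N} + \frac{d(d-1)\,k(k+1)}{2N^2} + O\!\left(\left(\frac{dk}{N}\right)^{3}\right).$$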

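When $d = \alpha N$ and $k \ll N$,

$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{d(d-1)}{N^2}\sum_{l=1}^{k}(k+1-l)\,\frac{\left(1-\frac{d}{N}\right)\left(1-\frac{d+1}{N}\right)\cdots\left(1-\frac{d+l-2}{N}\right)}{\left(1-\frac{1}{N}\right)\cdots\left(1-\frac{l}{N}\right)}$$

$$= 1 - \alpha(k+1) + \alpha^2\sum_{l=1}^{k}(k+1-l)(1-\alpha)^{l-1} + O(k^2/N)$$

$$= (1-\alpha)^{k+1} + O(k^2/N).$$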
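The closed form $(1-\alpha)^{k+1}$ for the adjacent neighbor model at $d = \alpha N$ can be checked with a finite geometric sum. Writing $q = 1 - \alpha$ (a shorthand introduced here, not in the original),

$$\sum_{l=1}^{k}(k+1-l)\,q^{\,l-1} = \frac{k\alpha - q\,(1-q^{k})}{\alpha^{2}},$$

so that

$$1 - \alpha(k+1) + \alpha^{2}\sum_{l=1}^{k}(k+1-l)(1-\alpha)^{l-1} = 1 - \alpha(k+1) + k\alpha - (1-\alpha) + (1-\alpha)^{k+1} = (1-\alpha)^{k+1}.$$

This agrees with the intuition that a site fitness is unchanged only if none of the $k+1$ adjacent bits it reads has been flipped, each of which is unflipped with probability close to $1-\alpha$.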
