NP Completeness of Kauffman’s N-k Model, a Tuneably Rugged Fitness Landscape
Edward D. Weinberger
SFI WORKING PAPER: 1996-02-003
SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.
©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.
www.santafe.edu
SANTA FE INSTITUTE
Max-Planck-Institut für biophysikalische Chemie
Postfach ����
Am Fassberg
D�� Göttingen-Nikolausberg
Federal Republic of Germany
The concept of a "fitness landscape", a picturesque term for a mapping of the vertices of a
finite graph to the real numbers, has arisen in several fields, including evolutionary theory.
The computational complexities of two qualitatively similar versions of a particularly simple
fitness landscape are shown to differ considerably. In one version, the question "Is the global
optimum greater than a given value V?" is shown to be answerable in polynomial time by
presenting an efficient algorithm that actually computes the optimum. The corresponding
problem for the other version of the landscape is shown to be NP complete. The NP
completeness of the latter problem leads to some speculations on why P ≠ NP.
Introduction
The notion of an adaptive "landscape" representing the abstract "fitness" of various kinds
of organisms in various contexts has been a fixture of evolutionary biology ever since it was
proposed by Sewall Wright (1932). Although there are problems with a notion of "fitness"
that is a property of an individual, independent of other individuals and the environment,
the discoveries of molecular biology have significantly reinforced the power of this idea.
We now understand, for example, the role of a discrete genomic "blueprint" in specifying
the chemical constituents of enzymes, and we have a glimmering of how sensitive the fitness
of the organism can be to variations in enzyme chemistry, so that it makes sense to identify
the specific sequence of nucleotide bases in the genome as the argument of the fitness function.
It has also become increasingly clear that the "design" of organisms involves a host of
complex trade-offs, implying that there must inevitably be large numbers of local optima
in such fitness landscapes. It was thus only a matter of time before the analogy between
optimization (i.e. selection) on landscapes and combinatorial optimization problems appeared
in the biological literature. One paper that discussed this analogy (Kauffman &
Weinberger, 1989) also proposed a simple statistical model of a fitness landscape, the N-k
model, that could be used as an aid to the qualitative understanding of more complex, if
more realistic, models. The purpose of the present paper is to consider the computational
complexity of optimization on landscapes generated with this simple model. We show, in
particular, that the optimization problem for one version of the model can be solved in
polynomial time, but that another, qualitatively very similar version of the model is NP
complete. This result makes the N-k model an ideal candidate for investigating the nature
of NP completeness.
One might think that optimization on N-k landscapes is NP complete because a knowledge
of the fitness of a given configuration says little about the fitness of distant configurations,
thus leading to the need for an exhaustive search over a large number of configurations.
Part of this conjecture turns out to be true: we can identify a set of pairwise distant
configurations whose number grows faster than any polynomial in N. However, when
we compute the correlation in fitness between pairs of configurations in this set, the
correlation is actually larger for the NP complete version of the model than for the polynomial
time version, thus providing strong evidence that the rest of the conjecture must be false. We
conclude the paper by noting differences between the two models in "tree width". Dress
(����) has shown that the global optimum in models where this quantity is O(1) can be
found in polynomial time. Our result, that the tree width of an NP complete problem is
O(N), suggests that the notions of NP completeness and tree width are more generally
related.
Definition of the N-k Model
In nature, the domain of the fitness function is the set of all possible sequences of
the four nucleotide bases. Nature "evaluates" this function by translating these nucleic
sequences into sequences of amino acids. It is only amino acid sequences, the enzymes
and structural proteins of an organism, that have biological significance. Therefore, the
simplest version of the N-k model ignores genetics and assigns a fitness to the sequences of
amino acids directly. The N-k model makes two further simplifications: the first, that the
amino acid sequence has a fixed length of N sites, and the second, that there are only two
possible amino acids, rather than the full complement of twenty, that can occupy a given
site of the sequence. This last assumption is justified by the fact that almost all of the
properties of such sequences are determined by their three dimensional, folded structures,
which, in turn, are determined by the chemical properties of the constituent amino acids.
The most important of these properties is polarity: those amino acids in the sequence
that are polar get pulled to the outside of the folded structure by chemical attraction to
surrounding water molecules, and non-polar amino acids get pushed into the interior of
the structure.
The assumption of two amino acids per site also dramatically simplifies the modelling task
by reducing the argument of the fitness function to a bit string, which we denote b. The
N-k model assigns a real valued "fitness" to b by first assigning a real valued "fitness
contribution", $f_i$, to the ith bit, $b_i$, in b. Each such assignment depends, not just on i
and the value of $b_i$, but also on $0 \le k < N$ other bits, which we call its "neighbors". The
fitness contribution of each site is a random function, $f_i(s_i)$, of the substring, $s_i$, formed
by the ith bit and its k neighbors. $f_i(s_i)$ is assigned by selecting an independent random
variable from some distribution p(x), such as the uniform or Gaussian distributions, for
each of the $2^{k+1}$ possible values of $s_i$, thus generating a "fitness table" for the ith site.
There is a different, independently generated table for each of the N sites. Then, given
any string of N bits, the total fitness of the string, F, is defined as the average of the
fitness contributions of each site, that is,
$$F(b) = \frac{1}{N} \sum_{i=1}^{N} f_i(s_i).$$
The use of a probability distribution in assigning the fitness contributions can be
interpreted either as an admission of ignorance of the true nature of the complex couplings
between the bits or as an attempt to capture the typical statistical properties of a wide
class of landscapes with k interconnections per bit.
One other aspect of the N-k model must be specified, namely, the way in which the
substrings, $s_i$, are chosen. The simplest, but not the only, way of choosing neighbors,
at least for even k, is to use the k sites adjacent to site i, that is, the bits at sites $i - k/2$
thru $i + k/2$. As in the original formulation of the model, we introduce periodic boundary
conditions to assign neighbors to sites i with $i \le k/2$ and $i > N - k/2$. In other words,
we assume that the sites are arranged in a circle, such that site N is next to site 1. Under
this assumption, if k = 2, site N has neighbors N − 1 and 1, site 1 has neighbors N and 2,
and, more generally, site i has neighbors $(N + i - k/2) \bmod N, \ldots, (N + i + k/2) \bmod N$
for any even integer k (a result of 0 being read as site N). This assignment of the neighbors
gives rise to a class of short range spin glasses. Alternatively, we could assign the neighbors
by randomly selecting, for each site i, k other sites on the string to be used in forming the
index to the ith fitness table, a definition that also makes sense for odd k. This assignment
of neighbors makes the model similar to a long range, dilute spin glass. Rather surprisingly,
local features of the landscape (the height of local optima, and the length of typical "up hill"
walks through a series of fitter one mutant variants to these local optima) were remarkably
insensitive to the details of how the k + 1 bit substrings were chosen in computer simulations
(Kauffman, Weinberger, & Perelson, 1988; Kauffman & Weinberger, 1989).
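The definition of the model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the paper: the function names (`adjacent_neighbors`, `random_neighbors`, `make_tables`, `fitness`) are hypothetical, sites are numbered from 0 rather than 1, the uniform distribution stands in for p(x), and the bit order used to index a fitness table is an arbitrary choice.

```python
import random

def adjacent_neighbors(N, k):
    """k/2 sites on each side of every site, with periodic boundaries (k even)."""
    half = k // 2
    return [[(i + off) % N for off in range(-half, half + 1) if off != 0]
            for i in range(N)]

def random_neighbors(N, k, rng):
    """k distinct other sites chosen at random for every site."""
    return [rng.sample([j for j in range(N) if j != i], k) for i in range(N)]

def make_tables(N, k, rng):
    """One table of 2^(k+1) independent uniform draws per site."""
    return [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(N)]

def fitness(bits, neighbors, tables):
    """Average of the site fitness contributions f_i(s_i)."""
    total = 0.0
    for i, table in enumerate(tables):
        index = bits[i]                  # s_i starts with the site's own bit
        for j in neighbors[i]:           # and is extended by its k neighbors
            index = (index << 1) | bits[j]
        total += table[index]
    return total / len(bits)

rng = random.Random(0)
N, k = 8, 2
tables = make_tables(N, k, rng)
b = [rng.randint(0, 1) for _ in range(N)]
print(fitness(b, adjacent_neighbors(N, k), tables))
print(fitness(b, random_neighbors(N, k, rng), tables))
```

The same tables are evaluated under both neighborhood choices here only to keep the sketch short; in the model each choice comes with its own independently generated tables.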
The N-k model affords a "tuneably rugged" fitness landscape, since tuning k alters the
ruggedness of the landscape. For k = 0, each site is independent of all other sites. Either
the bit value 0 or the bit value 1 is almost surely "fitter" than the other; hence, a single
specific sequence comprised of the fitter bit value in each position is almost surely the
single, global optimum in the fitness landscape. Any other string is sub-optimal, and lies
on a connected walk via 1-mutant fitter variants to the global optimum by flipping bits
from less fit to more fit values. The length of the walk is just the Hamming distance from
the initial string to the global optimum. For a randomly chosen initial string, half of the
bits will, on average, be in their less fit state; hence the expected walk length is just N/2.
A transition to a one mutant neighbor (i.e. the flip of a single bit) typically alters fitness by
an amount O(1/N). In contrast, the fully connected N-k model yields a completely random
fitness landscape. For this k = N − 1 case, the fitness contribution of each site depends on
all of the other sites because the "context" of each of the N − 1 other bits is changed when
even a single bit is flipped. In this case, therefore, the fitness of each N bit string is
statistically independent of its neighbors. As was shown in Kauffman & Levin (1987), Weinberger
(1988), Macken & Perelson (1989) and Weinberger (1991a), such random landscapes have
very many local optima ($2^N/(N+1)$, on average), walks to optima are short ($O(\ln N)$, on
average), and only a small fraction of local optima are accessible from any initial string.
Thus adaptive walks vary dramatically as the ruggedness of the landscape varies.
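The k = 0 case lends itself to a short numerical illustration. The sketch below is not from the paper: the function name `adaptive_walk_k0` and the representation of each site's table as a pair (fitness of bit 0, fitness of bit 1) are assumptions made for the example. It flips bits to their fitter value one at a time and counts the flips, which equals the Hamming distance to the global optimum.

```python
import random

def adaptive_walk_k0(tables, bits):
    """Walk via fitter one-mutant neighbors on a k = 0 landscape.

    tables[i] is the pair (fitness of bit value 0, fitness of bit value 1)
    for site i. Returns the number of flips and the final string.
    """
    bits = list(bits)
    steps = 0
    improved = True
    while improved:
        improved = False
        for i, (f0, f1) in enumerate(tables):
            better = 1 if f1 > f0 else 0   # ties have probability zero for continuous p(x)
            if bits[i] != better:
                bits[i] = better           # move to a fitter one-mutant neighbor
                steps += 1
                improved = True
    return steps, bits

rng = random.Random(1)
N = 20
tables = [(rng.random(), rng.random()) for _ in range(N)]
lengths = [adaptive_walk_k0(tables, [rng.randint(0, 1) for _ in range(N)])[0]
           for _ in range(2000)]
print(sum(lengths) / len(lengths))   # should be close to N / 2 for random starts
```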
The N-k decision problem
Having defined the model, we are now in a position to consider the following decision
problem: Is the global optimum, $F_{MAX}$, in a given instance of an N-k landscape greater
than some specified value V? In view of the work of Dress (����), it is not surprising
that this question can be answered for adjacent neighborhoods with periodic boundary
conditions by the following simple dynamic programming algorithm that actually finds the
globally maximal fitness. For simplicity, we present the algorithm for k = 2, leaving the
trivial generalization to arbitrary k to the reader. Also, N is to be added or subtracted, as
appropriate, to subscripts outside the range 1, 2, ..., N so that they assume values within
that range.
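Let $f_i^{b_{i-1} b_i b_{i+1}}$ be the site fitness of the ith site, given the values of the bits $b_{i-1}$, $b_i$, and
$b_{i+1}$, and let $F_i^{b_N b_1 | b_{i+1} b_{i+2}}$ be the maximum value of the sum
$$\sum_{j=1}^{i+1} f_j^{b_{j-1} b_j b_{j+1}}$$
over the values of $b_2, b_3, \ldots, b_i$, given the values of the bits $b_N$, $b_1$, $b_{i+1}$, and $b_{i+2}$. The
algorithm then consists of the three phases

Initialization: $F_1^{b_N b_1 | b_2 b_3} = f_1^{b_N b_1 b_2} + f_2^{b_1 b_2 b_3}$.

Continuation: $F_i^{b_N b_1 | b_{i+1} b_{i+2}} = \max_{b_i} \left[ F_{i-1}^{b_N b_1 | b_i b_{i+1}} + f_{i+1}^{b_i b_{i+1} b_{i+2}} \right]$ for $2 \le i \le N - 3$,
which implicitly specifies a value for $b_i$ for $2 \le i \le N - 3$ and each
choice of $b_N$, $b_1$, $b_{i+1}$, and $b_{i+2}$. Use the value of $b_i$ thus specified for subsequent
calculations.

Note that the formula still makes sense when i = N − 2 or i = N − 1, but there
will be only 8 $F_{N-2}^{b_N b_1 | b_{N-1} b_N}$ values and only 4 $F_{N-1}^{b_N b_1 | b_N b_1}$ values instead of the 16
$F_i^{b_N b_1 | b_{i+1} b_{i+2}}$ values defined for $1 \le i \le N - 3$.

Termination: $F_{MAX} = \max_{b_N b_1} \left[ F_{N-1}^{b_N b_1 | b_N b_1} \right]$.

(The recursion accumulates the sum of the site fitnesses; since F was defined as their average,
this maximum is to be divided by N before it is compared with V.)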
We have therefore proven

Theorem 1 The N-k decision problem with adjacent neighborhoods is solvable in
$O(2^k N)$ steps, and is thus in P.
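The dynamic programming idea for k = 2 can be sketched as follows. This is a reorganized variant written for illustration rather than a transcription of the recursion above: it fixes the two boundary bits first and then sweeps once around the ring, carrying the best partial sum for each value of the two most recent bits. The names `idx`, `max_fitness_adjacent_k2` and `max_fitness_brute_force` are hypothetical, sites are numbered from 0, and each fitness table is assumed to be a list of 8 values indexed by the bit triple (left neighbor, site, right neighbor).

```python
import random
from itertools import product

def idx(x, y, z):
    """Table index for the bit triple (b_{i-1}, b_i, b_{i+1})."""
    return (x << 2) | (y << 1) | z

def max_fitness_adjacent_k2(tables):
    """Maximum average fitness on a ring of N >= 3 sites with k = 2."""
    N = len(tables)
    best = float("-inf")
    for last, first in product((0, 1), repeat=2):    # fix b_{N-1} and b_0
        # G[(x, y)]: best sum of f_0..f_m given b_m = x, b_{m+1} = y (m = 0 here)
        G = {(first, y): tables[0][idx(last, first, y)] for y in (0, 1)}
        for m in range(1, N - 1):
            new_G = {}
            for (x, y), value in G.items():          # x = b_{m-1}, y = b_m
                for z in (0, 1):                     # z = b_{m+1}
                    if m + 1 == N - 1 and z != last:
                        continue                     # b_{N-1} was fixed above
                    cand = value + tables[m][idx(x, y, z)]
                    if cand > new_G.get((y, z), float("-inf")):
                        new_G[(y, z)] = cand
            G = new_G
        for (x, y), value in G.items():              # close the ring with f_{N-1}
            best = max(best, value + tables[N - 1][idx(x, y, first)])
    return best / N

def max_fitness_brute_force(tables):
    """Exhaustive search over all 2^N strings, for checking small cases."""
    N = len(tables)
    return max(
        sum(tables[i][idx(b[i - 1], b[i], b[(i + 1) % N])] for i in range(N)) / N
        for b in product((0, 1), repeat=N)
    )

rng = random.Random(0)
tables = [[rng.random() for _ in range(8)] for _ in range(10)]
print(max_fitness_adjacent_k2(tables), max_fitness_brute_force(tables))
```

The sweep does a constant amount of work per site for fixed k, whereas the exhaustive check grows as $2^N$; the decision problem is then answered by comparing the returned maximum with V.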
However, the situation for random neighborhoods is quite different, as is shown by

Theorem 2 The N-k decision problem with random neighborhoods is NP complete for
$k \ge 3$.
Proof We note first that the single integer N characterizes the size of the problem
completely if k is fixed: $2^{k+1} N$ real numbers are required to specify the fitness tables, and k
integers per site, or kN integers per instance of the problem, to specify the neighborhoods.
Furthermore, it is merely a matter of a table lookup followed by an addition for each site,
or O(N) total work, to check that a proposed solution does, indeed, have a fitness greater
than a given value. We conclude that the N-k decision problem is in NP.
We demonstrate NP completeness by showing that the N-k problem is polynomially
equivalent to one of the best known NP complete problems, the SAT problem (Garey &
Johnson, 1979). Given N boolean variables $b = (b_1, b_2, \ldots, b_N)$ and a list of M expressions,
$E_i(b_{p_i}, b_{q_i}, b_{r_i})$, involving arbitrary triples of these variables and the operators AND, OR,
and NOT, the SAT problem is to determine whether values exist for the $b_i$'s such that
all of the expressions are satisfied (i.e. all evaluate to TRUE). If we can show that every
such problem can be mapped into an N-k decision problem in polynomial time, we can
conclude that the latter problem is "at least as hard" as the SAT problem, because our
mapping, together with a polynomial time solution to the N-k decision problem, would
provide a polynomial time solution to the SAT problem.
It is easiest to generate the N-k decision problem that corresponds to a given SAT problem
when N = M and k = 3. We first transform each expression $E_i(b_{p_i}, b_{q_i}, b_{r_i})$ into the
equivalent expression
$$E'_i(b_i, b_{p_i}, b_{q_i}, b_{r_i}) = E_i(b_{p_i}, b_{q_i}, b_{r_i}) \ \mathrm{AND}\ (b_i \equiv b_i),$$
where the expression $a \equiv b$ is TRUE if and only if a = b. We then identify $b_i$ with the ith
bit in the N-k configuration, and the other variables appearing in the ith expression with
$b_i$'s "neighbors". We assign the fitness tables associated with $b_i$ as follows: a fitness table
entry is assigned the value 1 if the corresponding $E'_i$ is TRUE for the specified values of
the b's, and 0 if it is FALSE. Clearly, the corresponding SAT problem is solved if we can
determine whether the global maximum of the sum of the site fitnesses in the N-k landscape
thus generated is greater than or equal to N (equivalently, whether the maximum of the
average F reaches 1).
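The N = M, k = 3 mapping can be illustrated on a toy instance. The sketch below is an assumption-laden example, not the paper's code: `sat_to_nk` and `max_total_fitness` are hypothetical names, variables are numbered from 0, each expression is supplied as a Python function of its three variables, and the maximum is found by exhaustive search purely so that the equivalence can be checked on a small case.

```python
from itertools import product

def sat_to_nk(expressions):
    """Map expressions[i] = (p, q, r, E) to neighbor lists and 0/1 fitness tables.

    Site i takes bits p, q, r as its neighbors; its table entry is 1 exactly
    when E(b_p, b_q, b_r) is TRUE. The factor (b_i == b_i) is always TRUE,
    so the entry does not depend on b_i itself.
    """
    neighbors, tables = [], []
    for p, q, r, E in expressions:
        neighbors.append((p, q, r))
        tables.append({
            (bi, bp, bq, br): 1 if E(bp, bq, br) else 0
            for bi, bp, bq, br in product((0, 1), repeat=4)
        })
    return neighbors, tables

def max_total_fitness(neighbors, tables):
    """Exhaustive maximum of the sum of site fitnesses (small N only)."""
    N = len(tables)
    return max(
        sum(tables[i][(b[i],) + tuple(b[j] for j in neighbors[i])] for i in range(N))
        for b in product((0, 1), repeat=N)
    )

satisfiable = [
    (1, 2, 3, lambda x, y, z: x or y or z),
    (0, 2, 3, lambda x, y, z: (not x) or y),
    (0, 1, 3, lambda x, y, z: x and not z),
    (0, 1, 2, lambda x, y, z: x or not y),
]
unsatisfiable = satisfiable[:2] + [
    (0, 1, 3, lambda x, y, z: x),        # requires b_0 = 1
    (0, 1, 2, lambda x, y, z: not x),    # requires b_0 = 0
]

for instance in (satisfiable, unsatisfiable):
    total = max_total_fitness(*sat_to_nk(instance))
    print(total, total >= len(instance))
```

The instance is satisfiable exactly when the maximal sum reaches N, which is the test the decision problem performs.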
A trivial variation of the above mapping suffices when M < N. As before, we identify each
SAT variable, $b_i$, with the ith bit in the N-k configuration, modify the M expressions
such that $b_i$ appears in the ith expression for $1 \le i \le M$ and assign the fitness tables as
described above. Sites M + 1, M + 2, ..., N are assigned fitness tables in which every entry
is given the value 1. Once more, the given SAT problem is solved if and only if the global
maximum of the corresponding N-k problem is at least N.
The M > N case can be handled by introducing additional variables $b_{N+1}, b_{N+2}, \ldots, b_M$, and
proceeding as above, provided we lengthen the N bit string in the N-k model to M bits,
and assign the neighborhoods as before.
The N-k decision problem for k > 3 is a fortiori NP complete because every SAT problem
can be embedded in a k > 3 decision problem by defining the expressions
$$E''_i(b_i, \ldots, b_j, b_{p_i}, b_{q_i}, b_{r_i}) = E_i(b_{p_i}, b_{q_i}, b_{r_i}) \ \mathrm{AND}\ (b_i \equiv b_i) \ \mathrm{AND} \cdots \mathrm{AND}\ (b_j \equiv b_j),$$
where the bits $b_i, \ldots, b_j$ include the bit at the ith site and some arbitrary collection of
k − 3 neighbors. The previously given mapping to the N-k problem can then be applied.
Remark 1. So far as we know, the question of whether the k = 2 random model is NP
complete is open.

Remark 2. An almost identical argument shows that the "P �T model", proposed by
Pedro Tarazona, in which the fitness table entries for site i do not depend on the ith bit,
but only on its k neighbors, is also NP complete for $k \ge 3$. The computational complexity
of the k = 2 P �T model is also an open question, but we conjecture that it is polynomially
equivalent to the polynomial time 2SAT problem, a variant of the SAT problem in which
the M boolean expressions involve arbitrary pairs, rather than triples, of the boolean
variables.
NP Completeness and Correlation
The foregoing is a clear example of Garey and Johnson's observation that seemingly trivial
modifications to a combinatorial optimization problem can render it intractable. Previous
analytical and numerical work suggests that many of the statistical properties of N-k
landscapes for large, but fixed k are quite similar for adjacent and random neighborhoods.
Weinberger (1991a) shows that the mean number of local optima, the distances between optima
and the expected fitness of a local optimum are, asymptotically for large k, the same in
both cases. In Appendix I, we compute the exact correlation, R(d), between pairs of points
separated by a Hamming distance d, from which we can deduce the limiting behavior for
$d, k \ll N$. For the random landscape, we obtain
R�d� � ��d�k � ��
N�d�d� ��k�k � ��
�N��O
��dkN
����
for the adjacent neighbor landscape, we obtain the rather similar expression
R�d� � ��d�k � ��
N�d�d� ��k�k � ��
�N��O
��dkN
����
These correlations, along with the common mean and variance of the fitnesses, completely
characterize the multivariate Gaussian distribution to which the distribution of fitnesses
converges as $N, k \to \infty$.
We might conjecture that a problem is intractable because the correlation between pairs
of distant configurations decays so rapidly as their distance increases that their fitnesses
are effectively independent. We then have the obvious, but suggestive

Lemma A Every algorithm for finding the maximum (or minimum) of M completely
arbitrary real numbers requires Ω(M) steps, that is, a number of steps at least
proportional to M.

Proof If such an algorithm required merely o(M) steps, some of the numbers must
necessarily be ignored by the algorithm. The answer given must then be independent of which
(if any) of the ignored numbers is the maximum. Clearly, such an algorithm cannot be
guaranteed to find the maximum in all cases.
Any hope of finding the maximum of the $2^N$ fitnesses assigned to the $2^N$ vertices in
polynomial time must be based on the ability to rule out whole classes of fitnesses with a
single operation, which, from the above lemma, is impossible if the fitnesses are assigned
completely independently and without any a priori knowledge about the assignment. On
the face of things, it would seem that such problems cannot possibly lie in NP; however,
we show that a similar problem is embedded in the problem of optimization on a random
neighbor N-k landscape, for which the correlation between fitnesses of points separated
by a Hamming distance of N/2 is $\frac{1}{2} e^{-k/2} \left[ 1 + O(k/N) \right]$ (see Appendix I). Clearly, this
quantity can be made as small as desired by choosing k sufficiently large and letting N tend
to infinity. (In fact, the N-k decision problem remains in NP for $k = O(\log N)$.) Because
the fitnesses are asymptotically jointly Gaussian, an asymptotically vanishing correlation
between them implies that they are asymptotically statistically independent, and therefore
a knowledge of one fitness tells us, asymptotically, nothing about the other. We now prove
Lemma B For arbitrarily large N, there exists a set, $\Omega_N$, of $2N^{(\log_2 N + 1)/2}$ bit strings
of length N such that the Hamming distance between any pair of strings is at least N/4.

The proof of Lemma B is based on

Lemma C For arbitrarily large N, there exists a set, $\omega_N$, of 2N bit strings of length N
such that the Hamming distance between any pair of strings is at least N/2.
Proof of Lemma C We start by recursively constructing, for $N = 2^n$, $n \ge 2$, a set, $\omega_N$, of
2N strings of length N whose pairwise Hamming distances are at least N/2. Clearly,
$$\omega_4 = \{0000, 1111, 0011, 1100, 0101, 1010, 0110, 1001\}.$$
The 16 strings of length N = 8 in $\omega_8$ are obtained as follows: form 4 complementary pairs,
$(t_i, \bar{t}_i)$, from the strings $t_i \in \omega_4$. Because the distance between distinct elements of $\omega_4$ is
at least 2, the requisite N = 8 strings are $t_i \| t_i$, $t_i \| \bar{t}_i$, $\bar{t}_i \| t_i$, and $\bar{t}_i \| \bar{t}_i$, where the symbol
"$\|$" denotes concatenation (simple juxtaposition). The pairwise distance between each of
the 16 strings thus formed is at least N/2 = 4, either because the strings are formed from
different t's, or because half of one string is the complement of the corresponding half of
the other string. The resulting set is also a set of 8 pairs of complementary strings, so that
the iteration can be repeated once again, and, in fact, arbitrarily often, each time doubling
the number of strings. The resulting set of 2N strings is obviously not unique, because
another set can be generated by complementing the bits at a fixed position in each of the
strings in the first set.
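The doubling construction in the proof of Lemma C is easy to check by machine for small N. The following sketch is illustrative only; the names `OMEGA_4`, `complement`, `omega` and `min_distance` are hypothetical, and bit strings are represented as Python strings of "0" and "1".

```python
from itertools import combinations

OMEGA_4 = {"0000", "1111", "0011", "1100", "0101", "1010", "0110", "1001"}

def complement(s):
    """Flip every bit of a bit string."""
    return "".join("1" if c == "0" else "0" for c in s)

def omega(N):
    """Recursive construction for N = 4, 8, 16, ... (N a power of two)."""
    if N == 4:
        return set(OMEGA_4)
    result = set()
    for t in omega(N // 2):
        # The complement of t is also in omega(N // 2), so the two strings that
        # begin with the complement are produced when the loop reaches it.
        result.add(t + t)
        result.add(t + complement(t))
    return result

def min_distance(strings):
    """Smallest pairwise Hamming distance in a collection of bit strings."""
    return min(sum(a != b for a, b in zip(s, t))
               for s, t in combinations(strings, 2))

for N in (4, 8, 16, 32):
    S = omega(N)
    print(N, len(S), min_distance(S))
```

For each N tried, the set should contain 2N strings with minimum pairwise distance N/2, as the lemma states.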
Remark. In the course of refereeing this paper, Prof. Andreas Dress proved my conjecture
that $\omega_N$ has maximal size, in that every N bit string not in $\omega_N$ is fewer than N/2 bit
flips away from at least one member of $\omega_N$. His proof begins by identifying the successive
bits $b_i$, for $1 \le i \le N$, of the string S as the image of a mapping from the index, i, to the
appropriate bit value. He then notes the equivalence of the integer i − 1 with its binary
representation, the $n = \log_2 N$ bit string, which we write explicitly as $\beta_1 \beta_2 \ldots \beta_n$. The
construction above shows that $\omega_N$ consists of precisely those N bit strings S whose ith
bit, $b_i$, is given by
$$\alpha_0 + \sum_{j=1}^{n} \alpha_j \beta_j$$
(with arithmetic modulo 2) for some choice of the bits $\alpha_0, \alpha_1, \ldots, \alpha_n$, and for all i between 1
and N. The strings thus generated are precisely the set of affine mappings from the n bit
strings $\beta_1 \beta_2 \ldots \beta_n$ to single bits. We can thus re-establish that the strings in $\omega_N$ have pairwise
Hamming distances of at least N/2. If the corresponding $\alpha$'s for two mappings differ at
position p, the strings they produce will differ at indices i whose binary representations have
the property that
$$\sum_p \beta_p = 1$$
(the sum being taken modulo 2 over the differing positions), and there are exactly N/2 such
indices. If the $\alpha$'s differ only at position p = 0, the corresponding strings will be complementary.
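The description of these strings as affine mappings can also be checked directly. The sketch below is a minimal illustration; the name `affine_strings` and the convention that the first binary digit is the most significant one are assumptions made for the example. It enumerates all choices of the n + 1 coefficient bits and builds the corresponding N bit strings.

```python
from itertools import combinations, product

def affine_strings(n):
    """All strings of length N = 2^n whose bits are an affine function (mod 2)
    of the binary digits of the position index."""
    N = 2 ** n
    strings = set()
    for alpha in product((0, 1), repeat=n + 1):     # alpha[0] is the constant term
        bits = []
        for index in range(N):                      # index plays the role of i - 1
            beta = [(index >> (n - 1 - j)) & 1 for j in range(n)]
            value = alpha[0] + sum(a * b for a, b in zip(alpha[1:], beta))
            bits.append(str(value % 2))
        strings.add("".join(bits))
    return strings

def min_distance(strings):
    """Smallest pairwise Hamming distance in a collection of bit strings."""
    return min(sum(a != b for a, b in zip(s, t))
               for s, t in combinations(strings, 2))

print(sorted(affine_strings(2)))
print(len(affine_strings(4)), min_distance(affine_strings(4)))
```

For n = 2 this reproduces the eight strings of length 4 listed in the proof of Lemma C, and for larger n it yields 2N strings at pairwise distance at least N/2.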
The maximality of $\omega_N$ now follows from character theory. Identify $\omega_N$ with the irreducible
characters of $F_2^n$, considered as a group, via the mapping
$$\chi_i = (-1)^{b_i} = (-1)^{\alpha_0 + \sum_{j=1}^{n} \alpha_j \beta_j},$$
where $\beta_j$ is the jth component of the vector $\beta \in F_2^n$. For a given N bit string S, denote
this mapping as $\chi(S)$. Now consider the inner product
$$\langle \chi(S), \chi(S') \rangle = \frac{1}{2^n} \sum_i \chi_i \chi'_i = \frac{1}{2^n} \left[ d(S, \bar{S}') - d(S, S') \right],$$
where $d(S, S')$ is the Hamming distance between S and S'. The fact that $\omega_N$ is closed
under complementation guarantees that the Hamming distance between it and any N bit
string S is at most N/2. If S could be exactly N/2 bit flips from every member of $\omega_N$,
the difference $d(S, \bar{S}') - d(S, S')$ would be exactly zero for all $S' \in \omega_N$, and thus the inner
product of $\chi(S)$ with all of the irreducible characters of $F_2^n$ would be zero. As is well
known, this last condition is satisfied if and only if every component of $\chi(S)$ is zero, which
is clearly absurd.
We now return to the

Proof of Lemma B Clearly, the 16 strings of length 4 whose pairwise Hamming distances
are at least 1 are the binary representations of the integers 0 thru 15. For larger values of
N, we form the strings of $\Omega_N$ from two N/2 bit substrings, a prefix $\pi$ and a suffix $\sigma$. $\pi$
is an arbitrary member of $\Omega_{N/2}$. $\sigma$ is constructed from an arbitrary member $\tau \in \omega_{N/2}$ by
complementing those bit positions marked by 1's in $\pi$ (i.e. $\sigma = \pi \oplus \tau$, where the addition
is taken modulo 2).
To check that this construction does, indeed, meet the requirements stated above, we
partition $\Omega_N$ into subsets of strings with the same prefix. Clearly, there is no problem with
the strings in the same subset, because their suffixes, corresponding to different members
of $\omega_{N/2}$, are mutually separated by at least N/4 bits. There is also no problem with
strings in different subsets whose suffixes were derived from the same $\tau$. Their prefixes,
$\pi$ and $\pi'$, must differ in at least N/8 bit positions, so that their suffixes, $\pi \oplus \tau$ and
$\pi' \oplus \tau$, must also differ in at least N/8 bit positions. There remains the case in which
two strings in $\Omega_N$ have both different prefixes and different suffixes. Because they have
different prefixes, the above construction guarantees that their prefixes differ in at least
N/8 positions; we now show that the same applies to the suffixes, $\sigma$ and $\sigma'$. We have
$$0 < d(\sigma, \sigma') = d(\pi \oplus \tau, \pi' \oplus \tau').$$
For each N, $\omega_N \subset \Omega_N$, so that, in particular, $\tau, \tau' \in \Omega_{N/2}$. It follows from the fact (proven
below) that $\Omega_{N/2}$ is closed under $\oplus$, and the positive distance between $\pi \oplus \tau$ and $\pi' \oplus \tau'$,
that they are distinct members of $\Omega_{N/2}$, and therefore differ in at least N/8 bit positions.
We now verify that $\Omega_N$ is closed under $\oplus$ by induction. For N = 4, closure obtains
trivially. For larger N, we assume closure for $\Omega_{N/2}$, and establish it for $\Omega_N$. Given two
elements $S_1, S_2 \in \Omega_N$, which we write $(\pi_1 \| \pi_1 \oplus \tau_1)$ and $(\pi_2 \| \pi_2 \oplus \tau_2)$, where $\pi_1, \pi_2 \in \Omega_{N/2}$ and
$\tau_1, \tau_2 \in \omega_{N/2}$, we have
$$S_1 \oplus S_2 = (\pi_1 \oplus \pi_2) \,\|\, \left[ (\pi_1 \oplus \tau_1) \oplus (\pi_2 \oplus \tau_2) \right].$$
Using the induction hypothesis, $\pi_1 \oplus \pi_2 \in \Omega_{N/2}$; using the commutativity and the
associativity of $\oplus$, we write the suffix string as $(\pi_1 \oplus \pi_2) \oplus (\tau_1 \oplus \tau_2)$. We now observe that
$\omega_{N/2}$ is also closed under $\oplus$, as is clear from its construction: all elements of $\omega_{N/2}$
are N/2 bit strings formed by choosing a single element $t \in \omega_4$ and forming arbitrary
concatenations of t and its complement, $\bar{t}$. Thus, $\tau_1 \oplus \tau_2$ is the arbitrary concatenation of
the four bit strings $t_1 \oplus t_2 = \bar{t}_1 \oplus \bar{t}_2$ and $t_1 \oplus \bar{t}_2 = \bar{t}_1 \oplus t_2$. From the fact that these last
two strings form a complementary pair in $\omega_4$, we conclude that $\tau_1 \oplus \tau_2$ is also in $\omega_{N/2}$,
allowing us to write
$$S_1 \oplus S_2 = \pi' \,\|\, (\pi' \oplus \tau')$$
for some $\pi' \in \Omega_{N/2}$ and some $\tau' \in \omega_{N/2}$, as required.
We now count the number of distinct strings, $D_N$, in $\Omega_N$. Because each string in $\Omega_N$ is
generated by choosing one of the $D_{N/2}$ members of $\Omega_{N/2}$ and one of the N members of
$\omega_{N/2}$, we have the recursion relation
$$D_N = N D_{N/2}.$$
Writing $N = 2^n$, and $G_n = D_{2^n}$, this relation becomes
$$G_n = 2^n G_{n-1}.$$
Given the initial condition $D_4 = G_2 = 16$, that was computed "by hand" above, we see
that $G_n = 2^{n(n+1)/2 + 1}$, so that $D_N = 2N^{(\log_2 N + 1)/2}$, as claimed.
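The prefix and suffix construction in the proof of Lemma B, and the count just derived, can be verified numerically for small N. The sketch below is illustrative and makes its own choices: bit strings are stored as Python integers, `small_set` builds the set of Lemma C, `large_set` builds the set of Lemma B, and `predicted_size` evaluates $2^{n(n+1)/2+1}$; all four names (including `min_distance`) are hypothetical.

```python
from itertools import combinations

def small_set(N):
    """The 2N strings of Lemma C, stored as N-bit integers (N = 4, 8, 16, ...)."""
    if N == 4:
        return {0b0000, 0b1111, 0b0011, 0b1100, 0b0101, 0b1010, 0b0110, 0b1001}
    half = N // 2
    mask = (1 << half) - 1
    out = set()
    for t in small_set(half):
        out.add((t << half) | t)             # t followed by t
        out.add((t << half) | (t ^ mask))    # t followed by its complement
    return out

def large_set(N):
    """The strings of Lemma B: prefix p, suffix p XOR t."""
    if N == 4:
        return set(range(16))                # all 16 strings of length 4
    half = N // 2
    return {(p << half) | (p ^ t)
            for p in large_set(half) for t in small_set(half)}

def min_distance(strings):
    """Smallest pairwise Hamming distance between the integers' bit patterns."""
    return min(bin(a ^ b).count("1") for a, b in combinations(strings, 2))

def predicted_size(N):
    n = N.bit_length() - 1                   # N = 2^n
    return 2 ** (n * (n + 1) // 2 + 1)

for N in (4, 8, 16):
    print(N, len(large_set(N)), predicted_size(N))
print(min_distance(large_set(8)))            # the lemma requires at least 8 / 4 = 2
```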
Remark. We conjecture that $\Omega_N$ is also maximal.
We state the conclusions of this section in the following

Summary The NP complete random neighbor N-k optimization problem seems to be
at least as hard as the problem of optimization over a sample of $D_N = 2N^{(\log_2 N + 1)/2}$
Gaussian random variables whose pairwise correlations can be made as small as desired,
so that they are "arbitrarily close" to being pairwise statistically independent. Because
$D_N$ grows faster than any polynomial in N, this result seems to suggest that P ≠ NP.

Unfortunately, small correlations between fitnesses of distant points on the landscape are
not, in themselves, sufficient to render a problem intractable. However small the correlations
between the fitnesses of strings in $\Omega_N$ can be made in the random neighbor landscape,
they can be made even smaller in the adjacent neighbor landscape, at least for sufficiently
large landscapes. In fact, if we ignore the O(1/N) correction terms in the correlation
functions, we have, for points separated by a Hamming distance of N/2, the relation
$$R_{adj} = \left(1 - \tfrac{1}{2}\right)^{k+1} < R_{rand} = \left(1 - \tfrac{1}{2}\right) e^{-k/2},$$
which follows immediately from the inequality $1 - \epsilon < e^{-\epsilon}$, for $0 < \epsilon < 1$.
Discussion
The crucial difference between the adjacent and random landscapes seems to be the number
of bits upon which each site fitness could depend, the "tree width" discussed by Dress
(����). For the adjacent neighbor case, we know a priori that each site depends only on
the bit at that site and the k bits at adjacent sites, so that the tree width is k + 1. It
is this a priori knowledge that makes possible the solution of the corresponding decision
problem with a polynomial time dynamic programming algorithm. In contrast, which k + 1
of the N bits each site fitness depends on will vary from one instance of the random neighbor
model to the other, so that the tree width is no longer k + 1, but O(N). It appears,
therefore, that the random neighbor problem is intractable, not only because of the small
correlations between distant points on the landscape, but also because there is no effective
way to partition the problem into smaller problems and thus use information available
from nearby points to infer relationships between more distant points.
In Weinberger (1990), Weinberger (1991a) and Weinberger (1991b), we have argued for
the importance of the N-k model and, more generally, the notion of statistically isotropic
"AR(1) landscapes". In general, statistical isotropy implies that the correlation, R(d),
between the fitnesses of pairs of points depends only on the distance d between them;
in the specific case of the AR(1) landscape, this correlation function assumes the form
$R(d) = e^{-d/T}$, for some choice of the "correlation length", T. The exact calculations of
the pair correlations for the random neighbor and adjacent neighbor N-k models presented
in the Appendix show some disparity from the precise "AR(1)-ness" of the P �T model.
However, this disparity is relatively minor when d < T, and it is R(d) values for d < T that
determine the local properties of the landscape (see the above cited papers for details).
From these observations, we make two conjectures: that, in general, approximately AR(1)
landscapes with unbounded tree width are NP-complete, and that these problems nevertheless
have statistical properties similar to the properties of "easy" optimization problems. In view
of the ubiquity of AR(1) landscapes in optimization problems in computer design (Sorkin, 1988)
and in RNA folding landscapes (Fontana et al., 1991), these conjectures deserve further
study.
Finally, our results suggest that nature is solving a non-trivial optimization problem in
the "design" of individual enzymes, at least if the relevant fitness landscape resembles
an N-k or P �T landscape. As we saw above, the fitness of an enzyme depends crucially
on its three dimensional structure, so that the random neighbor N-k landscape (i.e. the
NP-complete one) is clearly more biologically accurate than the adjacent neighbor model.
This observation is further evidence that there is much to be learned about optimization
from the study of evolutionary strategies.
Acknowledgements
The author gratefully acknowledges the support of ONR Grant N������K���� for the
time period in which this work was begun and the support of a Max Planck Stipendium
during its gestation. A discussion with Andreas Dress convinced the author that the
optimization problem on adjacent landscapes could be solved in polynomial time via dynamic
programming, and his subsequent encouragement resulted in the completion of this work,
including the correction of some mistakes in a previous version and the proof that $\omega_N$
is maximal. The author would also like to acknowledge useful discussions with Stuart
Kauffman on the general subject of evolution as a combinatorial optimization problem,
and Pedro Tarazona for suggesting the P �T model and for the computation of both its
correlation function and the correlation function of the random neighbor N-k model.
References
Dress� A� ������ �On the Computational Complexity of Composite Systems� Lecture Notes
in Physics� Vol� �� � Fluctuations and Stochastic Phenomena in Condensed Matter� L�
Garrido �ed��� Springer� Berlin�
Fontana, W., Griesmacher, T., Schnabl, W., Stadler, P., and Schuster, P. (1991). "Statistics
of Landscapes Based on Free Energies, Replication and Degradation Rate Constants
of RNA Secondary Structures," Monatshefte für Chemie, in press.
Garey, M. and Johnson, D. (1979). Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman, San Francisco.
Kauffman, S. & Levin, S. (1987). "Towards a General Theory of Adaptive Walks on
Rugged Landscapes," Journal of Theoretical Biology 128, 11-45.
Kauffman, S., Weinberger, E., and Perelson, A. (1988). "Maturation of the Immune
Response Via Adaptive Walks On Affinity Landscapes," Theoretical Immunology, Part I,
Santa Fe Institute Studies in the Sciences of Complexity, A. S. Perelson (ed.),
Addison-Wesley, Reading, Ma.
Kauffman, S. & Weinberger, E. (1989). "The N-k Model of Rugged Fitness Landscapes and
Its Application to Maturation of the Immune Response," Journal of Theoretical Biology
141, No. 2, 211.
Macken, C. and Perelson, A. (1989). "Protein Evolution on Rugged Landscapes,"
Proceedings of the National Academy of Sciences 86, 6191.
Sorkin, G. (1988). "Combinatorial Optimization, Simulated Annealing, and Fractals,"
IBM Research Report RC���� (No. �����).
Weinberger, E. (1988). "A More Rigorous Derivation of Some Results on Rugged Fitness
Landscapes," J. theor. Biol. 134, No. 1, 125-129.
Weinberger, E. (1990). "Correlated and Uncorrelated Fitness Landscapes and How to Tell
the Difference," Biological Cybernetics 63, No. 5, 325.
Weinberger, E. (1991a). "Local Properties of Kauffman's N-k model, a Tuneably Rugged
Energy Landscape," Physical Review A, 44, No. 10, 6399-6413.
Weinberger, E. (1991b). "Fourier and Taylor Series on Fitness Landscapes," Biological
Cybernetics, 65, 321.
Wright, S. (1932). "The roles of mutation, inbreeding, crossbreeding and selection in
evolution," In: Proceedings 6th Congress on Genetics 1:356.
Appendix
Calculation of the Correlation between Pairs of Fitnesses in the N-k Model
We want to compute
$$R(d) = \frac{E[f(a) f(b)] - \mu^2}{\sigma^2},$$
where $a$ and $b$ are bit strings separated by Hamming distance $d$, $f(a)$ and $f(b)$ are their
respective fitnesses, $\mu$ is the common mean of these fitnesses, and $\sigma^2$ is their common
variance. The expectation, $E$, is taken over the joint distribution of the random variables
$f(a)$ and $f(b)$. Without loss of generality, we choose a distribution for the "site fitnesses"
that has mean zero and variance 1. We then have $\mu = 0$, $\sigma^2 = 1/N$, and
$$R(d) = \frac{1}{N} E\left\{ \left[ \sum_{j=1}^{N} f_j(a) \right] \left[ \sum_{j \notin C} f_j(a) + \sum_{j \in C} f_j(b) \right] \right\},$$
where the notation is chosen to reflect the fact that a certain subset, $C$, of the site fitnesses
change when bit string $a$ is changed to bit string $b$, but site fitnesses $f_j \notin C$, i.e. all of the
others, remain the same. For pairs of sites $i \neq j$, the definition of the N-k model guarantees
that site fitnesses $f_i(a)$ and $f_j(b)$ are independent, and thus uncorrelated. Similarly, $f_j(a)$
and $f_j(b)$ are either identical (because bit $j$ and its neighbors are identical in both $a$ and
$b$) or they are independent. In any case, we conclude that
$$R(d) = \frac{1}{N} E\left[ \sum_{j \notin C} f_j^2(a) \right] = \Pr\{ j \notin C \}.$$
In other words, the correlation between the fitnesses of two bit strings in the N-k model is
exactly the probability that a randomly chosen site fitness is the same in the computation
of $f(a)$ and $f(b)$.
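The identity between the correlation and the probability of an unchanged site fitness can be checked by simulation. The Python sketch below is an illustration under stated assumptions, not code from the paper: it assumes the random neighbor variant, standard normal site fitnesses (mean 0, variance 1), and a hypothetical helper name `sample_pair_statistics`. It returns an estimate of $N\,E[f(a)f(b)]$ together with the average fraction of unchanged site fitnesses, and the two numbers should agree up to sampling noise.

```python
import random


def sample_pair_statistics(N, k, d, trials, seed=0):
    """Monte Carlo estimate of (R(d), Pr{j not in C}) over random landscapes.

    Assumptions (illustrative only): each site fitness depends on its own bit
    and on k other bits drawn at random, and site fitnesses are standard normal.
    """
    rng = random.Random(seed)
    product_sum = 0.0
    unchanged_sum = 0.0
    for _ in range(trials):
        flipped = set(rng.sample(range(N), d))  # the d bits where a and b differ
        fa = fb = 0.0
        unchanged = 0
        for j in range(N):
            others = [i for i in range(N) if i != j]
            inputs = [j] + rng.sample(others, k)  # bit j plus its k neighbors
            site_a = rng.gauss(0.0, 1.0)          # f_j(a)
            if flipped.isdisjoint(inputs):
                site_b = site_a                   # same inputs: identical value
                unchanged += 1
            else:
                site_b = rng.gauss(0.0, 1.0)      # changed inputs: independent draw
            fa += site_a / N
            fb += site_b / N
        product_sum += fa * fb
        unchanged_sum += unchanged / N
    # With mu = 0 and sigma^2 = 1/N, R(d) = N * E[f(a) f(b)].
    return N * product_sum / trials, unchanged_sum / trials
```

Drawing a fresh value whenever any input bit of a site has flipped stands in for looking up a different entry of that site's random fitness table, which is why no explicit table is stored here.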
For the random neighbor model, the required probability is easily obtained. A site fitness
is unchanged only if the bit at that site is not one of the $d$ bits that have been flipped and if
it is not one of the $k$ neighbors of any flipped bit. The probability that a site satisfies the
first condition is $1 - d/N$; the probability that a site satisfies the statistically independent
second condition is $[1 - k/(N-1)]^d$. Thus, for the random neighbor model,
$$R(d) = \left(1 - \frac{d}{N}\right) \left(1 - \frac{k}{N-1}\right)^{d}.$$
For $k, d \ll N$, we have
$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{(d-1)\,d\,k(k+2)}{2N^2} + O\!\left( \left( \frac{dk}{N} \right)^{3} \right).$$
If $d = \alpha N$, where $\alpha = O(1)$, and $k \ll N$,
$$R(\alpha N) = (1 - \alpha) \left(1 - \frac{k}{N-1}\right)^{\alpha N} = (1 - \alpha)\, e^{-\alpha k} \left[ 1 + O(k/N) \right].$$
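As a numerical sanity check on the random neighbor expressions, the sketch below evaluates the product $(1 - d/N)[1 - k/(N-1)]^d$ alongside its small-$d$ and large-$d$ approximations. The function names are hypothetical, and the second-order coefficient $k(k+2)$ is the one obtained by expanding that product directly; treat the code as an illustration under those assumptions rather than as part of the original paper.

```python
import math


def r_random_neighbor(d, k, N):
    """Product form from the text: (1 - d/N) * (1 - k/(N - 1))**d."""
    return (1 - d / N) * (1 - k / (N - 1)) ** d


def r_small_d(d, k, N):
    """Expansion through order 1/N**2, intended for k, d << N."""
    return 1 - d * (k + 1) / N + d * (d - 1) * k * (k + 2) / (2 * N**2)


def r_large_d(alpha, k):
    """Leading behaviour for d = alpha * N with k << N."""
    return (1 - alpha) * math.exp(-alpha * k)
```

For example, comparing `r_random_neighbor(5, 3, 10_000)` with `r_small_d(5, 3, 10_000)` shows how small the neglected third-order term is when $dk/N$ is small, while `r_random_neighbor(3000, 4, 10_000)` can be compared with `r_large_d(0.3, 4)` for the regime $d = \alpha N$.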
For the P-T model, in which neighborhoods are defined completely at random, the
probability that a site is not affected by flipping a random bit is $1 - k/N$, independent of whether
other bits have been flipped previously, provided they have not affected the given site. The
probability that a site remains unaffected by $d$ such flips is thus $(1 - k/N)^d$. For purposes
of comparison with the other models, we give the small and large $d$ approximations to $R(d)$
for the P-T model with $k + 1$, rather than $k$, neighbors. The first of these approximations
is
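$$R(d) = 1 - \frac{d(k+1)}{N} + \frac{d(d-1)(k+1)^2}{2N^2} + O\!\left( \left( \frac{dk}{N} \right)^{3} \right) \quad \text{for } d, k \ll N,$$
as is clear from a binomial expansion of $[1 - (k+1)/N]^d$. If $d = \alpha N$, where $\alpha = O(1)$
and $k \ll N$,
$$R(d) = e^{-\alpha(k+1)} \left[ 1 + O(k/N) \right].$$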
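The two P-T approximations can be compared against the exact power $[1 - (k+1)/N]^d$ in a few lines. This is a minimal sketch with hypothetical function names, assuming the $k + 1$ neighbor convention used in the text for comparison with the other models.

```python
import math


def r_pt(d, k, N):
    """P-T model with k + 1 neighbors: chance a site is untouched by d flips."""
    return (1 - (k + 1) / N) ** d


def r_pt_small_d(d, k, N):
    """Binomial expansion of r_pt through order 1/N**2, for d, k << N."""
    return 1 - d * (k + 1) / N + d * (d - 1) * (k + 1) ** 2 / (2 * N**2)


def r_pt_large_d(alpha, k):
    """Leading behaviour for d = alpha * N with k << N."""
    return math.exp(-alpha * (k + 1))
```

Calling `r_pt(5, 3, 10_000)` next to `r_pt_small_d(5, 3, 10_000)`, or `r_pt(3000, 4, 10_000)` next to `r_pt_large_d(0.3, 4)`, gives a quick feel for how tight each approximation is in its own regime.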
The same derivation for the adjacent neighbor model begins by imagining that the $N$ site
fitnesses are arranged in a circle, and that the flipped bits are represented by the integer
vector $(n_1, n_2, \ldots, n_i, \ldots, n_d)$, where $1 \le n_i \le N$. Without loss of generality, we choose
$n_1 = 1$ and $n_1 < n_2 < \cdots < n_d \le N$. Because we must make $d - 1$ choices among the
$N - 1$ remaining bits, and, because we cannot make the same choice twice, there are $\binom{N-1}{d-1}$
ways to make the required choices. However, if we constrain $n_{i+1} - n_i = l$, we need only
make $d - 2$ choices from $N - l - 1$ bits.
The number of ways of making this second set of choices is $\binom{N-l-1}{d-2}$. The probability, $\pi_l$,
that $n_{i+1} - n_i = l$ is then given by
$$\pi_l = \binom{N-l-1}{d-2} \Big/ \binom{N-1}{d-1},$$
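The counting argument for the gap probability can be confirmed by exhaustive enumeration on a small circle. The sketch below is illustrative only: the function names are hypothetical, the gap after the last flipped bit is taken to wrap around the circle back to $n_1$, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def gap_probability(l, N, d):
    """C(N-l-1, d-2) / C(N-1, d-1), the claimed probability of a gap of l (d >= 2)."""
    return Fraction(comb(N - l - 1, d - 2), comb(N - 1, d - 1))


def enumerated_gap_distribution(N, d, i=0):
    """Tally the gap after the i-th flipped bit over all choices with n_1 = 1."""
    counts = {}
    total = 0
    for rest in combinations(range(2, N + 1), d - 1):
        positions = (1,) + rest  # n_1 = 1 < n_2 < ... < n_d <= N
        # The last gap wraps around the circle back to n_1.
        nxt = positions[i + 1] if i + 1 < d else positions[0] + N
        gap = nxt - positions[i]
        counts[gap] = counts.get(gap, 0) + 1
        total += 1
    return {gap: Fraction(c, total) for gap, c in counts.items()}
```

Running `enumerated_gap_distribution(9, 4, i)` for each `i` in `range(4)` and comparing with `gap_probability(l, 9, 4)` for `l` from 1 to 6 illustrates the claim that the distribution does not depend on $i$.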
independent of $i$. If $l \le k$, then $l$ site fitnesses change as a result of flipping bit $n_{i+1}$;
otherwise, $k + 1$ site fitnesses change. It follows that the expected number, $E[|C|]$, of
changed site fitnesses after moving a distance $d$ from the starting point is
$$\begin{aligned}
E[|C|] &= d \left\{ \sum_{l=1}^{k} l\, \pi_l + (k+1) \left[ 1 - \sum_{l=1}^{k} \pi_l \right] \right\} \\
&= d(k+1) - \frac{d}{\binom{N-1}{d-1}} \sum_{l=1}^{k} (k+1-l) \binom{N-l-1}{d-2}.
\end{aligned}$$
The probability that a randomly chosen site fitness doesn't change is then $1 - E[|C|]/N$,
and
$$R(d) = 1 - \frac{E[|C|]}{N} = 1 - \frac{d(k+1)}{N} + \frac{d}{N \binom{N-1}{d-1}} \sum_{l=1}^{k} (k+1-l) \binom{N-l-1}{d-2}.$$
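The closed form for the expected number of changed site fitnesses can be compared with a brute-force average over every way of flipping $d$ bits on a small circle. The following sketch rests on assumptions not spelled out in this appendix: flipping bit $n$ is taken to change the $k + 1$ consecutive sites $n - k, \ldots, n$ (mod $N$), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def expected_changed_formula(N, k, d):
    """Closed form for the mean number of changed sites (needs d >= 2, k < N)."""
    tail = sum((k + 1 - l) * comb(N - l - 1, d - 2) for l in range(1, k + 1))
    return d * (k + 1) - Fraction(d * tail, comb(N - 1, d - 1))


def expected_changed_bruteforce(N, k, d):
    """Average number of changed sites over all sets of d flipped bits."""
    total = 0
    count = 0
    for flipped in combinations(range(N), d):
        # Assumption: bit n feeds the k + 1 consecutive sites n-k, ..., n.
        changed = {(n - offset) % N for n in flipped for offset in range(k + 1)}
        total += len(changed)
        count += 1
    return Fraction(total, count)


def r_adjacent(N, k, d):
    """Correlation as one minus the expected fraction of changed sites."""
    return 1 - expected_changed_formula(N, k, d) / N
```

Because overlapping windows are merged in a set, the brute-force count does not rely on the gap argument at all, so agreement between the two functions is an independent check of the formula.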
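For $d, k \ll N$, this expression can be written as
$$\begin{aligned}
R(d) &= 1 - \frac{d(k+1)}{N} + \frac{d(d-1)}{N^2} \sum_{l=1}^{k} (k+1-l)\, \frac{\left(1 - \frac{l+1}{N}\right) \left(1 - \frac{l+2}{N}\right) \cdots \left(1 - \frac{d+l-2}{N}\right)}{\left(1 - \frac{1}{N}\right) \cdots \left(1 - \frac{d-1}{N}\right)} \\
&= 1 - \frac{d(k+1)}{N} + \frac{d(d-1)\,k(k+1)}{2N^2} + O\!\left( \left( \frac{dk}{N} \right)^{3} \right).
\end{aligned}$$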
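When $d = \alpha N$ and $k \ll N$,
$$\begin{aligned}
R(d) &= 1 - \frac{d(k+1)}{N} + \frac{d(d-1)}{N^2} \sum_{l=1}^{k} (k+1-l)\, \frac{\left(1 - \frac{d}{N}\right) \left(1 - \frac{d+1}{N}\right) \cdots \left(1 - \frac{d+l-2}{N}\right)}{\left(1 - \frac{1}{N}\right) \cdots \left(1 - \frac{l}{N}\right)} \\
&= 1 - (k+1)\alpha + \alpha^2 \sum_{l=1}^{k} (k+1-l)(1-\alpha)^{l-1} + O(k^2/N) \\
&= (1-\alpha)^{k+1} + O(k^2/N).
\end{aligned}$$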
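Both adjacent neighbor approximations, and the last algebraic step that collapses the sum into a single power of $1 - \alpha$, can be checked numerically. The sketch below uses hypothetical function names and evaluates the binomial-coefficient form of $R(d)$ with exact rationals; it is meant as an illustration of the formulas in this appendix, not as the author's code.

```python
from fractions import Fraction
from math import comb


def r_adjacent_exact(d, k, N):
    """Binomial-coefficient form of R(d) for the adjacent model (d >= 2, k < N)."""
    tail = sum((k + 1 - l) * comb(N - l - 1, d - 2) for l in range(1, k + 1))
    return 1 - Fraction(d * (k + 1), N) + Fraction(d * tail, N * comb(N - 1, d - 1))


def r_adjacent_small_d(d, k, N):
    """Expansion through order 1/N**2, intended for d, k << N."""
    return 1 - d * (k + 1) / N + d * (d - 1) * k * (k + 1) / (2 * N**2)


def r_adjacent_large_d(alpha, k):
    """Leading behaviour for d = alpha * N with k << N."""
    return (1 - alpha) ** (k + 1)


def geometric_sum_form(alpha, k):
    """Intermediate expression 1 - (k+1)a + a^2 * sum (k+1-l) (1-a)^(l-1)."""
    series = sum((k + 1 - l) * (1 - alpha) ** (l - 1) for l in range(1, k + 1))
    return 1 - (k + 1) * alpha + alpha**2 * series
```

Passing a `Fraction` for `alpha` to `geometric_sum_form` and `r_adjacent_large_d` lets the identity between the sum and $(1-\alpha)^{k+1}$ be tested exactly rather than to floating-point tolerance.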