
April 23, 1999 11:2 WSPC/140-IJMPB 0083

International Journal of Modern Physics B, Vol. 13, No. 4 (1999) 325–361
© World Scientific Publishing Company

EXACTLY SOLVABLE MODEL OF PROTEIN FOLDING:

RUBIK’S MAGIC SNAKE MODEL

KAZUMOTO IGUCHI∗

70-3 Shinhari, Hari, Anan, Tokushima 774-0003, Japan

Received 14 December 1998

I study the conceptual framework of protein folding considering an exactly solvable model, the Rubik’s magic snake model. I discuss the mathematical representation of the model, the model Levinthal paradox, the non-unique compact folded structure, the function of the chain, the ground state energy, the commensurability between the folded structure and the potential sequence, the relationship between the unique ground state and broken symmetry in this model, and the dual model and the inflation of the magic snake chain, respectively.

PACS number(s): 36.20.-r, 87.10.+e, 87.15.By

1. Introduction

Denaturation of proteins has been of strong interest for both biological and the-

oretical physicists for a long time.1 A denatured protein can fold into a native

three-dimensional structure which is coded by the amino acid sequence in the pri-

mary structure of the protein.2 It has been a long standing problem to answer the

following questions:

(1) How does the protein find a pathway to the folded structure quite rapidly from

astronomically many possibilities of the structure? — Levinthal paradox 3;

(2) Is the native structure unique? — Unique ground state problem4;

(3) What is the relationship between the native folded structure and the amino

acid sequence? — The second genetic code problem.5

There have been mainly three approaches to study the protein folding (PF)

problem:

(a) One is the experimental determination of the PF.4,6 In this approach, sub-

millisecond laser pulse spectroscopy experiments6 have revealed the nature of

the relaxation process from unfolded to folded structures in the PF, where

∗E-mail: [email protected]


the intensity of pulse in the relaxation process is fitted by the Williams–Watts

function for describing a nonexponential decay.7

(b) The second is the so-called inverse PF problem.8 In this approach, it has been
conjectured that only of the order of a thousand structures are realized in
real PF in Nature and that the other structures are obtained by combining
these structures.

(c) The third is the computer-aided determination of the PF, on which many
theoretical investigations have recently appeared.9 In this approach, one mimics

a protein as a self-avoiding linear chain of heteropolymer placed on a three- (or

two-) dimensional cubic (or square) lattice. Then, given an amino acid sequence

with defining potential differences, one searches by computer to find the lowest

configurational energy out of many configurations of folded structures.

However, since computer power and time are limited, one is usually restricted
to an N × N × N (= N^3 sites, N = 3–5) cubic lattice for this purpose.
Therefore, even if one successfully obtains the lowest energy state of
the folded protein, one cannot so far answer the above questions from this
approach. Hence, one needs a rather different approach to understand the
conceptual framework of the PF problem, one that suffices to answer the
above questions at the conceptual level. In this paper, I would like to present such an alternative approach,

considering a toy model10 for the PF problem, part of which has been published

recently.11

It may be appropriate here to describe the motivation underlying the study of

a system of toy models. One would like, of course, to study the general PF problem

with any amino acid sequence. Such a program can be formally carried out.

It is, however, generally recognized that to draw any definite physical conclusions

from such a general program is very difficult. If one makes approximations on the

general problem in order to arrive at concrete results, one usually encounters the

great difficulty of defining and justifying the validity of the approximation made. I

therefore start instead from a concrete model, which is sufficiently simple so that

one might hope to be able to discuss the validity of the method of approach.

The organization of the paper is as follows. In Sec. 2, I will introduce the toy

model of Rubik’s magic snake chain for the PF and the mathematical representation

of the model using the four types of rotational operations for the conformation of

the model. In Sec. 3, I will discuss the model Levinthal paradox, which seems
important for discussing the Levinthal paradox in real proteins. In Sec. 4, I will

mathematically represent the compact folded structures. In Sec. 5, I will discuss

how to construct and classify the compact folded structures mathematically, using

a graph theoretical approach. In Sec. 6, I will discuss the function of the magic snake

chain. In Sec. 7, I will define the model Hamiltonian for the system. In Sec. 8, I will

present the ground state energy of the system. In Sec. 9, I will discuss the ground

state energy difference between the folded and the unfolded structures of the magic

snake chain. In Sec. 10, I will discuss the ground state energy difference between


the folded structures with different helicity of S = 0 and S = ±1. In Sec. 11, I

will discuss the commensurability between the potential sequence and the folded

structure. In Sec. 12, I will discuss the relationship between the appearance of the

lowest ground state and the concept of broken symmetry in the PF problem. In

Sec. 13, I will discuss important properties inherent in the geometry of the
magic snake model, such as the dual model and the inflation of the magic snake
chain. In Sec. 14, I will draw conclusions.

2. Toy Model of Protein Folding

Let us first introduce a toy model for understanding the conceptual framework of

PF. The reason why I study this model is described as follows.

In solid state physics, it is well known that a solid is constructed by
close-packing single building blocks (unit cells), and that the varieties of solids
arise from the symmetry of the unit cells.12 This concept has recently been extended
to nonperiodic solids, the so-called quasicrystals, where the building blocks are
a combination of several unit segments of different shapes and the varieties of
quasicrystals arise from the way the unit segments are piled up.13 Thus, the whole

macroscopic geometry of a solid is governed by the microscopic geometry of the unit

cell. In the sense that the native structure of a protein is governed by a sequence of

20 types of amino acids to make a close-packed compact structure, the PF problem

seems similar to that of solid state physics. One now would like to ask whether or

not such unit cells or segments which dominate the whole geometry of the folded

structure of proteins exist in the PF. This can be thought of as a kind of inverse
problem for the PF, because the normal problem is, given the amino acid
sequence of a protein, to determine what the folded structure is.

Is there any example for this? Yes, there are simple models which satisfy the

above condition. One is Rubik’s magic snake model.10 Another is Fuller’s
tensegrity model,14 of which there are many varieties. Although Fuller’s
tensegrity models are more realistic for the PF problem, I restrict
myself to the former in this paper, because even though the model is so
simple, it still contains many interesting unsolved problems.

Rubik’s magic snake10 is constructed from 24 triangular segments, each of
which has five faces: the top and bottom faces are right-angled
isosceles triangles, and the three side faces are two squares and one rectangle.
The square face of one segment is attached to the square face of the nearest
neighbor segment to make a chain of 24 triangular segments (Fig. 1), and each
segment can rotate about the attached face such that four rotation angles φ
define four configurations of two adjacent segments: cis (c) (no rotation,
φ = 0°), trans (t) (rotation of φ = 180°), right gauche (g+) (rotation of
φ = 90°) and left gauche (g−) (rotation of φ = 270°) positions, respectively
(Fig. 2).

I now find simple geometrical constraints as


Fig. 1. Rubik’s magic snake model. The upper figure is the folded structure with helicity of S = 0 seen from the three-fold symmetry axis and the lower figure the unfolded structure.

Fig. 2. Four types of configuration between two adjacent segments. There are cis (c) (no rotation, φ = 0°), trans (t) (rotation of φ = 180°), right gauche (g+) (rotation of φ = 90°) and left gauche (g−) (rotation of φ = 270°) positions, respectively.

c^4 = cccc = 1 (1)

which means that four successive operations of cis-configurations make a square

segment — a cycle or closed loop of period of four [Fig. 3(a)] and

g+g−g+g−g+g− = g−g+g−g+g−g+ = 1 (2)

which means that six alternating operations of the two types of gauche
configurations make a cycle of period six [Fig. 3(b)]. These can be regarded as defining
relations for the free group generated by strings of the four symbols c, t, g+ and g−.
Thus, the set of the four symbols forms an alphabet Λ for the folding problem
[i.e. Λ ≡ {c, t, g+, g−}].



Fig. 3. Defining relations for the free group constructed by Λ = {c, t, g±}. (a) The defining relation c^4 = 1. (b) The defining relation g+g−g+g−g+g− = g−g+g−g+g−g+ = 1.

3. Model Levinthal Paradox

I now encounter a problem very similar to the Levinthal paradox3 in this model.

Since there are four possibilities of configuration coded by the four symbols, c, t,

g+ and g−, at each attached face and there are 23 such rotational faces, the total

number of configurations of this magic snake is 4^23 ≈ 10^13. Here each
configuration is represented by a string (or word) of 23 letters of the four symbols such

as cctg+g−ttccttcg+g+g−g−cttctcg+g−, etc. Each string describes the sequence of

conformation of the chain. Do not confuse this with the amino acid sequence of the

chain constructed from the 20 amino acid codes. These are different from each other.

For example, I give some typical configurations: t^23 is a linear chain; (g+)^23 [(g−)^23]
the right (left) helix; t^10 cc t^11 a hairpin; t^6 g+g− t^6 g+g− t^6 g+ (≈ 1) a triangular loop.

Here ≈ means an equivalence relation.

The meaning of the equivalence relation is as follows.

t^6 g+g− t^6 g+g− t^6 g+g− = 1 (3)

means an exact closed loop without ends [Fig. 4(a)]. Here, the assignment of the
sequence of symbols is not unique, since one can also read the sequence backward
along the chain. Therefore, the reverse order of the sequence of
the symbols,

g−g+ t^6 g−g+ t^6 g−g+ t^6 = 1 (4)

also represents the same folded structure. Hence, all cyclic and anticyclic permu-

tations of the sequence of the symbols represent the same closed loop structure as


Fig. 4. Meaning of equivalence unity. (a) A closed loop structure of g+ t^6 g−g+ t^6 g−g+ t^6 g− = 1 (a triangular loop). (b) The equivalent closed loop structure with the ends in the chain, where g− is removed to place the ends. This structure is represented by g+ t^6 g−g+ t^6 g−g+ t^6 ≈ 1.

well. This is always valid for any closed loop structure represented by “= 1”. If

there is no confusion, I use only one of them for the sake of simplicity.
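The equivalence of closed-loop words under cyclic and anticyclic permutations can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the helper names `tokenize` and `same_closed_loop` are hypothetical, words are typed with ASCII `g+`/`g-`, and `t^6` is shorthand for six repeated symbols. It compares symbol strings only and does not model the geometry of the snake.

```python
import re

def tokenize(word):
    """Split a conformation word into symbols c, t, g+, g-; 't^6' repeats a symbol."""
    symbols = []
    pattern = r"(g[+-]|[ct])(?:\^(\d+))?"
    for symbol, power in re.findall(pattern, word.replace(" ", "")):
        symbols.extend([symbol] * (int(power) if power else 1))
    return symbols

def same_closed_loop(word_a, word_b):
    """True if word_b is a cyclic or anticyclic (reversed) permutation of word_a."""
    a, b = tokenize(word_a), tokenize(word_b)
    if len(a) != len(b):
        return False
    rotations = [a[k:] + a[:k] for k in range(len(a))]
    return b in rotations or b[::-1] in rotations

eq3 = "t^6 g+ g- t^6 g+ g- t^6 g+ g-"   # Eq. (3)
eq4 = "g- g+ t^6 g- g+ t^6 g- g+ t^6"   # Eq. (4), Eq. (3) read backward
print(len(tokenize(eq3)), same_closed_loop(eq3, eq4))  # prints: 24 True
```

Under these assumptions, Eqs. (3) and (4) are recognized as the same closed loop of 24 symbols.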

Let us consider the closed-loop case of the magic snake chain model. For exam-

ple, consider the case of Eq. (3). In this case, there are two end faces that meet at one

position in the closed loop of the magic snake model. Suppose that such end faces

meet at the position where the rotational conformation is represented by one of the

three symbols of t, g+ and g− (say, “g−”). This is mathematically represented by



removing one of the 24 symbols in the sequence. In the case of t^6 g+g− t^6 g+g− t^6 g+g−,
the last g− is removed. But the geometry of the chain is still a closed loop [Fig. 4(b)].
This is the meaning of the equivalence relation. Therefore, a string representing a
closed loop with ends is said to be equivalent to unity.

Suppose that I need one second to flip a segment each time. Then, to find
a compact folded structure I need 10^13 s ≈ 10^5 yr, which vastly exceeds my
lifetime. Therefore, if I search for the folded structure in this way, by statistical
random searching, there is no hope of meeting the desired structure in
my life. However, I can easily find a folded structure within a few minutes! This

situation seems very similar to the Levinthal paradox.3 Hence, I would like to call

this situation the model Levinthal paradox.

How can I find such a folded structure of the magic snake in a short time?
As one recognizes when one tries to solve the toy model, a guiding principle
for reaching the folded structure is to find a locally compact or close-packed structure.
First, I want to make a local part of the chain as small as possible, which is the
structure of period four [i.e. the c^4-structure, see Fig. 3(a)]. But this is impossible
because of the volume exclusion (i.e. self-avoidance) of the magic snake structure. Second, I want to
find the next smallest part, of period six, which is possible. The shape of this part is a
bit different from the closed loop of period six, g+g−g+g−g+g− = 1, but is locally close-packed and represented as
−g+g+g−g+g−g−− (Fig. 5), which can be regarded as an example of a short-ranged
local interaction of the chain.5 I continue the same procedure to find a final
folded structure with helicity S = 0 (the concept of helicity will be discussed
later):

g+g−g+g−g−g+g−g+g+g−g+g−g−g+g−g+g+g−g+g−g−g+g− ≈ 1 (S = 0) , (5)

which is obtained from the corresponding closed loop structure without ends of the

chain:

g+g−g+g−g−g+g−g+g+g−g+g−g−g+g−g+g+g−g+g−g−g+g−g+ = 1 (S = 0) , (6)

by removing the last symbol “g+” (Fig. 6).

Fig. 5. The locally compact structure. This is constructed by changing the closed loop structure of period six, g+g−g+g−g+g− = 1 (or g−g+g−g+g−g+ = 1) to −g−g−g+g−g+g+− (or −g+g+g−g+g−g−−).


Fig. 6. The compact folded structures with helicity of S = 0, ±1. There is one three-fold symmetry axis for the folded structure of S = 0 while there is no three-fold symmetry axis for the folded structure of S = ±1.

This procedure saves much time in finding a folded structure, as follows. I first
make a six-segment module and then make the other three such modules, successively.
Since the total number of configurations of the six-segment module is 4^6 = 4096,
the total number of configurations of the four modules of six segments is about
4096 × 4 ≈ 1.6 × 10^4. Therefore, the time to find the folded structure is of the order of
1.6 × 10^4 s ≈ 273 min ≈ 4.5 h. In this way, finding a locally close-packed structure is
very significant for the PF problem, and may resolve the Levinthal paradox in the
real PF problem.
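The two search-time estimates can be reproduced with a few lines of arithmetic. This is a sketch under the text's own assumption of one trial per second; the function name `search_time_seconds` and the choice of a Julian year are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an assumed convention

def search_time_seconds(n_symbols, n_units=1, seconds_per_trial=1.0):
    """Time to try all 4**n_symbols words for each of n_units independent units."""
    return n_units * 4 ** n_symbols * seconds_per_trial

brute_force = search_time_seconds(23)        # all 23 rotational faces at once
modular = search_time_seconds(6, n_units=4)  # four six-symbol modules in turn

print(f"brute force: {brute_force:.2e} s = {brute_force / SECONDS_PER_YEAR:.1e} yr")
print(f"modular:     {modular:.0f} s = {modular / 60:.0f} min = {modular / 3600:.2f} h")
```

The exact count is 4^23 ≈ 7 × 10^13, which the text rounds to 10^13; either figure corresponds to far more than a lifetime of one-second trials, whereas the modular search needs 4 × 4096 = 16384 trials, about 273 min.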

4. Configurations of the Folded Structure

Let us consider next whether or not the compact folded structure is unique. Con-

trary to the expectation in the real PF problem,1–6,8,9 the folded structure of

Eq. (5) is not unique, but there are many other possible configurations. This is

a consequence of the closed loop structure of the folded structure of Eq. (6). There

are two square end faces in the magic snake model. These meet each other at

the same position in the closed loop of Eq. (6), which can be regarded as an ex-

ample of long-ranged nonlocal interaction of the chain.5 Unless this closed loop

structure is a real closed loop gluing the end faces, there appear 24 possibilities to

place the meeting position of the pair of end faces in the closed loop. This is math-

ematically represented by considering all cyclic permutations of the string given by

Eq. (5) such as g−g+g−g+g−g−g+g−g+g+g−g+g−g−g+g−g+g+g−g+g−g−g+, etc.

Here, I have assumed that I always start reading the sequence from the first inter-

face between the first and second segments (i.e. the rotational face nearest to one

end) to the 23rd interface between the 23rd and 24th segments (i.e. the rotational

face nearest to the other end). Thus, the structure of Eq. (5) has 24 geometrically
equivalent degenerate configurations. This must be true in any kind of



closed loop structure of chains. However, this point is missing in the arguments in

the previous literature.1–6,8,9
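The 24 placements of the pair of end faces can be listed by cutting the closed loop of Eq. (6) at each joint in turn. The sketch below is only an illustration at the level of symbol strings: `+` and `-` stand for g+ and g−, the function name `open_chain_words` is hypothetical, and reading the remaining symbols starting just after the removed joint is an assumed convention. It does not attempt to decide which placements are geometrically distinct.

```python
# Eq. (6): the closed S = 0 loop, with '+' for g+ and '-' for g-.
LOOP_S0 = "+-+--+-+" * 3   # 24 symbols

def open_chain_words(loop):
    """Remove the symbol at each joint in turn; read the rest starting after the cut."""
    n = len(loop)
    return [loop[k + 1:] + loop[:k] for k in range(n)]

words = open_chain_words(LOOP_S0)
EQ5 = LOOP_S0[:-1]          # Eq. (5): Eq. (6) with the last g+ removed

print(len(words), EQ5 in words)  # prints: 24 True
```

Each entry is a 23-letter word of an open chain whose two end faces meet at the removed joint.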

Let us consider whether or not more folded structures exist. Since there is a

three-fold symmetry axis in the structure of Eq. (5), I can define the helicity (or

chirality) S such that the helicity of this folded structure is denoted by S = 0. With
the aid of this concept of helicity, I can find more folded structures with right-
(left-) handedness, which have no three-fold symmetry axis and to which I assign
helicity S = 1 (−1). These folded structures are given by

g+g−g+g−g−g+g+g−g−g+g−g+g+g−g+g−g−g+g+g−g−g+g− ≈ 1 (S = −1) , (7)

g−g+g−g+g+g−g−g+g+g−g+g−g−g+g−g+g+g−g−g+g+g−g+ ≈ 1 (S = 1) . (8)

Here there are 24 geometrically equivalent folded structures for each helicity

(Fig. 6). These are obtained from the closed loop structures of S = ±1:

g+g−g+g−g−g+g+g−g−g+g−g+g+g−g+g−g−g+g+g−g−g+g−g+ = 1 (S = −1) , (9)

g−g+g−g+g+g−g−g+g+g−g+g−g−g+g−g+g+g−g−g+g+g−g+g− = 1 (S = 1) , (10)

by removing the last symbol “g+(g−)”, respectively.

In this way, I find in total 24 geometrically equivalent degenerate folded
structures for each helicity in the magic snake model. All 24 structures for
each helicity are regarded as one structure once all positions of local and nonlocal
interactions of the chain are glued [see Eqs. (5), (9) and (10)]. Moreover, the
compactness of the folded structures for the different helicities is very similar.
Therefore, the compactness of all of these is the same at the level of classical mechanics.
Hence, I find in total 72 geometrically almost equivalent degenerate folded
structures.15

5. Constructing Folded Structures from Modules

There are very interesting aspects of the folded structures. Suppose that

there are four segments of period six, each of which is represented by either

g+g−g+g−g+g− = 1 or g−g+g−g+g−g+ = 1. These segments of period six can

be regarded as a model for modules or domains for the PF.16 So, I call the seg-

ments modules. If I put the four modules together to make one compact structure,

then this is geometrically very similar to the folded structures. The symmetry of

this compact structure is tetrahedral with four three-fold symmetry axes and there

are six positions where rectangular faces of one module are attached to those of

the other three modules. From this, I can construct the three types of the folded

structures with different helicities.

Mathematically, this is carried out as follows: Let us represent the structure of

four contact modules (i.e. an assembly or a cohesive structure of four modules) by


(g+g−g+g−g+g−) ∪ (g+g−g+g−g+g−)

∪ (g+g−g+g−g+g−) ∪ (g+g−g+g−g+g−) , (11)

where ∪ means disjoint union of the objects17 (Fig. 7). First, consider a pair of

modules of period six, which is attached to one another at one position, i.e.,

(g+g−g+g−g+g−) ∪ (g+g−g+g−g+g−) . (12)

If I pick a pair of adjacent attached rectangular faces in the modules and
exchange the positions of the connected rectangular faces, then this
operation produces a larger closed loop-like module of period 12 from two
smaller modules of period six (Fig. 8). This is mathematically nothing but a
connected sum ♯,17 represented by

(g+g−g+g−g+g−) ♯ (g+g−g+g−g+g−)

= g+g−g+g+g−g−g+g−g+g+g−g− = 1 , (13)

where all cyclic (and anticyclic) permutations of the symbols mean the same struc-

ture in the closed loops. Using this concept of the connected sum, if I do the same

Fig. 7. The cohesive assembly of four modules of period six. This is mathematically described as a disjoint union ∪ of the four modules.

Fig. 8. Connected sum of two modules. The case of two modules of period six is shown as an example. By the operation of the connected sum, the two separated modules are connected into a module of period 12. The left configuration is called “disconnected”, denoted by d, and the right one is called “connected”, denoted by c. Therefore, in this example, the connected sum is represented by an operation ♯ : d → c.



thing for one module of period 12 and one module of period six, then I get

(g+g−g+g+g−g−g+g−g+g+g−g−) ♯ (g+g−g+g−g+g−)

= g+g−g+g+g−g−g+g+g−g−g+g−g+g+g−g+g−g− = 1 , (14)

which represents a closed loop of period 18, and all cyclic (and anticyclic)
permutations of the symbols stand for the same structure. Taking the connected sum of
the loop modules of period 18 and of period six once again, I obtain

(g+g−g+g+g−g−g+g+g−g−g+g−g+g+g−g+g−g−) ♯ (g+g−g+g−g+g−)

= g+g−g+g−g−g+g−g+g+g−g+g−g−g+g−g+g+g−g+g−g−g+g−g+ = 1 , (15)

which is a closed loop structure of period 24 with helicity S = 0 and also all cyclic

(and anticyclic) permutations of the symbols mean the same structure. To make

the two end points I remove one of the symbols in the string of Eq. (15). Hence,

there appear 24 possibilities to locate the ends in the loop. Thus, I finally obtain

the folded structure of S = 0, represented by Eq. (5).

Let us similarly do the connected sum for the two loop modules of period 12.

Then, I get

(g+g−g+g+g−g−g+g−g+g+g−g−) ♯ (g+g−g+g+g−g−g+g−g+g+g−g−)

= g−g+g+g−g−g+g−g+g+g−g+g−g−g+g+g−g−g+g−g+g+g−g+g− = 1 , (16)

which is the closed loop structure with helicity of S = 1 and its conjugate represents

the one with helicity of S = −1, and all cyclic (and anticyclic) permutations of the

symbols stand for the same closed-loop folded structure. To make the two end

points I remove one of the symbols in the string of Eq. (16). Hence, there appear 24

possibilities to locate the ends in the closed loop. Thus, I finally obtain the folded

structure of S = ±1, represented by Eqs. (7) and (8) as discussed before.

In this way, the folded structures are obtained by using the concepts of mod-

ules and the connected sum. Here, the sequence of the connected-sum operations

represents an evolution for constructing the folded structure from the four small-

est modules of period six to one largest module of period 24, which is the magic

snake chain. Hence, the mathematical concept of the connected sum can describe

the evolution of the magic snake chain from smaller modules to a larger module.

This might give a hint to understand biological evolution of a protein from smaller

molecules to larger molecules.

The above mathematical construction of the folded structures can be also con-

sidered as follows: As described before in this section, if I consider the four modules

of period six, which are put together to make one cohesive structure, then there are

six positions where the 12 rectangular faces meet each other. Let us number
the six positions 1 through 6. Let us denote two configurations at each position

by connected (c) and disconnected (d). Here c (d) means that the configuration of

the two modules that sandwich one position is connected (disconnected) such that

the two modules are kept connected to make one module (disconnected to make


Fig. 9. The six positions of the connected sum in the folded structure. These are numbered from 1 through 6.

Fig. 10. Duality in the connected sum. dddddd denotes an assembly of four separate modules of period six, while cccccc denotes another assembly of four separate modules of period six. These are dual to each other.

two separate modules) (Fig. 9). Therefore, since there are two possibilities at each
position, the total number of configurations is 2^6 = 64.

Let us denote one configuration by u1u2u3u4u5u6, where uj = c, d. Now,
dddddd = d^6 represents the assembly of the four disconnected modules of period
six. Similarly, cccccc = c^6 represents the other assembly of four disconnected
modules of period six, which is dual to the original one (Fig. 10). This is due to
the duality of the geometrical conformation of the assembly of the four modules of
period six. I call this the duality in the connected sum. Because of this geometrical
duality, only 32 of these 64 possibilities are distinct. Now I find
the following:

(1) There is one (= 6C0) possibility to construct an assembly of the four
disconnected modules of period six, described as d^6.

(2) There are 6 (= 6C1) possibilities such as dddddc, where one module of period
12 and two modules of period six are assembled.

(3) There are three possibilities such as dddcdc, where two modules of period 12
are assembled. And there are 12 possibilities such as ddddcc, where one module
of period 18 and one module of period six are assembled.

(4) There are four possibilities of the closed-loop folded structure with S = 0 such
as dddccc, and there are three possibilities of the closed-loop folded structure
of S = 1 (−1) such as ddccdc.

Hence, in total there are 32 configurations. Thus, one can represent the folded
structure in terms of the language of the representation of c and d.
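The count of 32 can be reproduced by a small enumeration. The sketch below rests on two assumptions that go beyond the text: the four modules are treated as the vertices of a tetrahedron with the six contact positions as its edges (the vertex numbering is arbitrary and need not match Fig. 9), and the duality is taken to be the exchange of c and d at every position. Configurations are grouped by the number of c contacts per module, which cannot distinguish handedness.

```python
from collections import Counter
from itertools import combinations

# Assumption: modules are tetrahedron vertices 0..3, contact positions are its 6 edges.
POSITIONS = list(combinations(range(4), 2))

def degree_pattern(connected):
    """Sorted number of 'c' contacts per module for a set of connected positions."""
    degree = [0, 0, 0, 0]
    for a, b in connected:
        degree[a] += 1
        degree[b] += 1
    return tuple(sorted(degree, reverse=True))

def classes_up_to_duality():
    """Count c/d words with at most three c's, identifying each word with its dual."""
    counts, seen = Counter(), set()
    for k in range(4):
        for connected in combinations(POSITIONS, k):
            word = frozenset(connected)
            dual = frozenset(POSITIONS) - word   # exchange c <-> d everywhere
            pair = frozenset((word, dual))
            if pair in seen:
                continue
            seen.add(pair)
            label = degree_pattern(word)
            if k == 3:                            # the dual also has three c's
                label = max(label, degree_pattern(dual))
            counts[(k, label)] += 1
    return counts

counts = classes_up_to_duality()
for key in sorted(counts):
    print(key, counts[key])
print("total:", sum(counts.values()))
```

Under these assumptions the enumeration gives 1, 6, 3 + 12 and 4 + 6 classes for zero, one, two and three connected positions, 32 in total. The 4 and 6 match the four S = 0 possibilities and the 3 + 3 possibilities for S = ±1 of item (4), although this sketch does not establish which degree pattern corresponds to which helicity.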

6. Function of the Folded Structure

Let us consider the function of the folded structure of the magic snake chain. Amaz-

ingly, from its geometry, there is a cubic space with volume of two segments at the

center of the folded structure. Therefore, this shape of the folded structure is noth-

ing but a kind of shell which can contain something inside. Hence, the function of

the folded structure of the magic snake is a container. This function resembles that

of many real globular proteins such as cytochrome c in which a molecule like the

haem group is contained.6

There is another function that comes from duality of the folded structure. The

folded structure can be reversed by exchanging the role of the even and odd number

segments of the magic snake chain such that the reversed folded structure is identical

to the original one. In this sense, the folded structure is self-dual and I call this

duality the duality between the even and odd number segments.

I would like to note here the following geometrical nature of the folded structure.

Suppose that the total number of the segments of the magic snake chain is one or

two less than 24 such as 22 or 23, or it is one or two more than 24 such as 25 or 26.

The above argument in the previous section works for the former case as well, but

it does not work so well for the latter case. In the former case, the chain can fold

into the similar compact folded structure with lack of one or two segments, where

there appears a hole in the shell of the structure (Fig. 11). On the other hand, in the

latter case, the chain cannot fold into the similar compact folded structure due to

the volume exclusion and the geometry effect of the one or two residual segments

Fig. 11. The folded structure with the lack of a segment. If the magic snake chain is constructed from 23 segments, then it can fold into a compact folded structure. However, a hole appears in the shell of the folded structure.


Fig. 12. The folded structure with the residue of a segment. If the magic snake chain is constructed from 25 segments, then it cannot fold into a compact folded structure, but can fold into a partially compact folded structure. This is due to the volume exclusion of the residual segment.

(Fig. 12). In this sense, the number 24 is a critical number for constructing a closely

packed folded structure in the magic snake chain. It is exactly a magic number for

the folded structure. This point will be discussed later in Sec. 13.

7. Model Hamiltonian

Let us now consider the stability of the folded structure. To do so, one needs
to consider the Hamiltonian of the system. Following the arguments in the literature,9
the configurational energy of the chain is given by

$$ H_c = \sum_{i<j} \Delta(r_i - r_j)\, E_{\sigma_i \sigma_j} , \tag{17} $$

where $\Delta(r_i - r_j) = 1$ if $r_i$ and $r_j$ are adjoining positions but $i$ and $j$ are not adjacent
in position along the sequence, and $\Delta(r_i - r_j) = 0$ otherwise. The interaction energy
$E_{\sigma_i \sigma_j}$ depends on the types $\sigma_i$, $\sigma_j$ of the segments in contact. Therefore, the
energy of Eq. (17) takes into account only the local and nonlocal interactions along
the sequence of the chain.5 For example, if hydrophobic (H) and hydrophilic or
polar (P) segments are taken into account, then only three types of energies, $E_{HH}$,
$E_{HP}$, $E_{PP}$, are assigned.9
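As an illustration of Eq. (17), the following sketch evaluates the contact energy of a short H/P chain on a cubic lattice, in the spirit of the lattice models of Ref. 9 rather than of the magic snake geometry itself. The function name `contact_energy`, the energy values and the four-segment example are all hypothetical.

```python
from itertools import combinations

def contact_energy(positions, types, energy):
    """Eq. (17): sum E over pairs that adjoin in space but not along the sequence."""
    total = 0.0
    for i, j in combinations(range(len(positions)), 2):
        if j - i == 1:
            continue  # adjacent along the sequence: Delta = 0
        distance = sum(abs(a - b) for a, b in zip(positions[i], positions[j]))
        if distance == 1:  # adjoining lattice sites: Delta = 1
            total += energy[frozenset((types[i], types[j]))]
    return total

# Hypothetical interaction energies E_HH, E_HP, E_PP (arbitrary units).
ENERGY = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.0}

# Four segments folded into a square: only segments 0 and 3 are in contact.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(contact_energy(square, "HPPH", ENERGY))  # prints: -1.0
```

With these assumed values the single H–H contact between the first and last segments contributes E_HH, and a straight chain would give zero.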

There is another configurational energy, which comes from the rotational
conformation between adjacent segments. This is given by

$$ H_{rot} = \sum_i U_{\phi_{i,i+1}} , \tag{18} $$

where $U_{\phi_{i,i+1}}$ is the rotational energy with angle $\phi_{i,i+1}$ between adjacent segments.
For example, if the cis, trans, right gauche, and left gauche configurations are taken
into account, then only four energies, $U_c$, $U_t$, $U_{g+}$ and $U_{g-}$, are assigned.

When a protein is immersed in a medium such as water or oil, an interaction
energy emerges between the segments of the protein and the surrounding
medium. This has been believed to be very important for the origin of the PF.5
This energy is given by

$$ H_m = \sum_i s_{\sigma_i} h_{\sigma_i} , \tag{19} $$

where $h_{\sigma_i}$ represents the energy cost according to the type $\sigma_i$ of the segment
along the sequence of the chain, and $s_{\sigma_i} = 1$ if the segment of type $\sigma_i$ faces the
surrounding medium, $s_{\sigma_i} = 0$ otherwise. For example, if the H and P segments are
taken into account, then only two types of energies, $h_H$, $h_P$, are assigned.

One must also consider the quantum mechanical electronic energy of the system.
This comes from the energy of the electrons in the protein chain under a certain
conformation. It is given by the Hamiltonian

$$ H_{el} = \sum_{i,j} \left( t_{r_i,r_j}\, c_i^\dagger c_j + v_i \delta_{ij}\, c_i^\dagger c_i \right) , \tag{20} $$

where $t_{r_i,r_j}$ is the hopping integral between the segments at $r_i$ and $r_j$, such that
$t_{r_i,r_j} = 1$ if $r_i$ and $r_j$ are located in adjacent or adjoining segments along the
sequence of the chain and $t_{r_i,r_j} = 0$ otherwise, $v_j$ is the potential at segment $j$, and $c_j^\dagger$ is the
usual electron creation operator, which obeys the anticommutation relations.

Finally, the rotational energy of the protein should be taken into account if the
protein is regarded as a rigid body. This is given by

$$ H_R = \frac{1}{2} I_m \Omega^2 , \tag{21} $$

where $I_m$ is the moment of inertia of the protein and $\Omega$ the angular velocity. But
this term can usually be neglected as being very small or unimportant. Thus, the
total energy of the system is given by

$$ H_{tot} = H_c + H_{rot} + H_m + H_{el} + H_R . \tag{22} $$

However, in the standard arguments in the previous literature,9 only the energy of
Eq. (17) is taken into account for the PF.

8. Ground State Energy

The model Hamiltonian H can be defined over any conformation of the magic snake
model, represented by a string of 23 letters from c, t, g+, g−. For the unfolded linear
chain structure t^23, there is no contact energy of local and nonlocal interactions
of the chain. Suppose that the rectangular faces cost energy in a medium.
There are 24 such faces in the linear chain configuration. Hence, the total energy is
given by

$$ E^{\mathrm{unfold}} = 23\, U_t + \sum_{i=1}^{24} h_{\sigma_i} + E_{el}^{\mathrm{unfold}} + \frac{1}{2} I_m^{\mathrm{unfold}} \Omega^2 . \tag{23} $$


Here $E_{\rm el}^{\rm unfold}$ stands for the electronic energy of the linear chain (i.e. the total energy of the electrons filling the spectrum of the linear chain). It is given by

$$E_{\rm el}^{\rm unfold} = 2 \sum_{j,\,{\rm occupied}} E_j^{\rm unfold} , \tag{24}$$

where the factor 2 on the right hand side comes from spin degeneracy and $E_j^{\rm unfold}$ are the eigenvalues of the Schrödinger equation

$$H_{\rm el}^{\rm unfold} |\Psi_j\rangle = E_j^{\rm unfold} |\Psi_j\rangle , \tag{25}$$

with the potential arrangement under the linear chain configuration of the magic snake chain, where each eigenstate is expanded over the 24 segments as $|\Psi\rangle = \sum_{i=1}^{24} \Psi_i c_i^{\dagger} |0\rangle$. I note here that I do not take into account many-body effects such as electron–electron interactions in the chain, and I assume that there is one electron per segment (i.e. the half-filled case) for later purposes.
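The filling rule of Eqs. (24) and (25) can be sketched as follows: diagonalize the single-particle matrix, fill the lowest $N/2$ levels, and multiply by 2 for spin. The function name `electronic_energy` and the two-site test matrix are assumptions made for illustration.

```python
import numpy as np

def electronic_energy(h):
    """Half-filled total energy: 2 * (sum of the lowest N/2 eigenvalues of h)."""
    levels = np.linalg.eigvalsh(h)        # real eigenvalues in ascending order
    n_occupied = h.shape[0] // 2          # one electron per segment, two per level
    return 2.0 * levels[:n_occupied].sum()

# Toy check (an assumption): two segments joined by a hopping of -t with t = 1.
# The levels are -t and +t, so the two electrons both occupy -t.
h_dimer = np.array([[0.0, -1.0],
                    [-1.0, 0.0]])
e_dimer = electronic_energy(h_dimer)
print(e_dimer)
```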

For the folded structure of $S = 0$ [Eq. (5)], local and nonlocal contact interactions appear at seven positions: between the 1st and 24th, the 1st and 19th, the 3rd and 9th, the 5th and 23rd, the 7th and 13th, the 11th and 17th, and the 15th and 21st segments along the sequence of the chain. There are 12 rectangular faces facing outward. Hence, the total energy is given by

$$E^{\rm fold}_{S=0} = E_c^{S=0} + 23 U_g + \sum_{i={\rm even}} h_{\sigma_i} + E_{\rm el}^{S=0} + \frac{1}{2} I_m^{\rm fold} \Omega^2 , \tag{26}$$

$$E_c^{S=0} = E_{1,24} + E_{1,19} + E_{3,9} + E_{5,23} + E_{7,13} + E_{11,17} + E_{15,21} , \tag{27}$$

where I have assumed that $U_{g^\pm} = U_g$ and that the angular frequency $\Omega$ is the same as that of the unfolded structure. Notice here that $E_c$ mainly comes from the interactions between odd-numbered segments while $h_{\sigma_i}$ comes from even-numbered segments. $E_{\rm el}^{S=0}$ is the total electronic energy of the system, given by

$$E_{\rm el}^{S=0} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=0} , \tag{28}$$

where $E_j^{S=0}$ are the eigenvalues of the Schrödinger equation

$$H_{\rm el}^{S=0} |\Psi_j\rangle = E_j^{S=0} |\Psi_j\rangle , \tag{29}$$

with the potential arrangement under the compact folded structure of $S = 0$ of the magic snake chain.

Similarly, I can obtain the total energy $E^{\rm fold}_{S=\pm1}$ for the folded structures of $S = \pm1$. Here, the moment of inertia $I_m^{\rm fold}$ is also the same for these structures since the compact structures are all identical. I now find

$$E^{\rm fold}_{S=\pm1} = E_c^{S=\pm1} + 23 U_g + \sum_{i={\rm even}} h_{\sigma_i} + E_{\rm el}^{S=\pm1} + \frac{1}{2} I_m^{\rm fold} \Omega^2 , \tag{30}$$

$$E_c^{S=\pm1} = E_{1,24} + E_{1,13} + E_{3,9} + E_{5,23} + E_{7,19} + E_{11,17} + E_{15,21} , \tag{31}$$

which is due to the mirror symmetry of the folded structures with helicity $S = \pm1$. Here $E_{\rm el}^{S=\pm1}$ is the total electronic energy of the system, given by

$$E_{\rm el}^{S=\pm1} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=\pm1} , \tag{32}$$

where $E_j^{S=\pm1}$ are the eigenvalues of the Schrödinger equation

$$H_{\rm el}^{S=\pm1} |\Psi_j\rangle = E_j^{S=\pm1} |\Psi_j\rangle , \tag{33}$$

with the potential arrangement under the compact folded structure of $S = \pm1$ of the magic snake chain. I can, of course, do the same thing for any conformation of the magic snake.

9. Ground State Energy Difference Between the Unfolded and the Folded Structures

Let us consider which energy is lower: the ground state energy $E^{\rm unfold}$ of the unfolded structure or the ground state energy $E^{\rm fold}_{S=0,\pm1}$ of the folded structure. This can be decided only once the amino acid sequence, which defines the conformational interactions and potentials for the 24 segments along the chain, is assigned. For the sake of simplicity, I assume that the sequence is given by an alternating repetition of the H and P segments,

$${\rm HPHPHPHPHPHPHPHPHPHPHPHP} \tag{34}$$

where I read this sequence from left to right, numbering the segments 1 through 24.

In this case, Eqs. (23), (26) and (30) yield

$$E^{\rm unfold} = 23 U_t + 12 (h_{\rm P} + h_{\rm H}) + E_{\rm el}^{\rm unfold} + \frac{1}{2} I_m^{\rm unfold} \Omega^2 , \tag{35}$$

$$E^{\rm fold}_{S=0} = E_c^{S=0} + 23 U_g + 12 h_{\rm P} + E_{\rm el}^{S=0} + \frac{1}{2} I_m^{\rm fold} \Omega^2 , \tag{36}$$

$$E^{\rm fold}_{S=\pm1} = E_c^{S=\pm1} + 23 U_g + 12 h_{\rm P} + E_{\rm el}^{S=\pm1} + \frac{1}{2} I_m^{\rm fold} \Omega^2 . \tag{37}$$

In this case, I find that

$$\Delta E_c \equiv -E_c^{S=0,\pm1} > 0 , \tag{38}$$

$$\Delta E_{\rm el} \equiv E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=0,\pm1} > 0 , \tag{39}$$

$$\Delta I_m \equiv I_m^{\rm unfold} - I_m^{\rm fold} > 0 , \tag{40}$$

are always valid.18 Therefore, if I assume $U_t = U_g$ and $h_{\rm H} = -h_{\rm P} = h > 0$, then I conclude

$$\Delta E \equiv E^{\rm unfold} - E^{\rm fold}_{S=0,\pm1} = 12 h_{\rm H} + \Delta E_c + \Delta E_{\rm el} + \frac{1}{2} (\Delta I_m) \Omega^2 \gg 0 . \tag{41}$$


Hence, the ground state energy of the unfolded structure is much larger than the

ground state energy of the folded structures. Thus, the energy difference (i.e. the

energy gap) between the unfolded and the folded structures is mainly dominated

by the hydrophobic energy cost, the contact configurational energy, the electronic

ground state energy and the rigid body rotation energy of the system, as expected.
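The step from Eqs. (35)–(37) to Eq. (41) can be written out term by term. This is only a worked restatement of the subtraction under the stated assumption $U_t = U_g$; no new quantity is introduced.

```latex
\begin{align*}
E^{\rm unfold} - E^{\rm fold}_{S=0,\pm1}
 &= 23\,(U_t - U_g)
    + \bigl[\,12(h_{\rm P} + h_{\rm H}) - 12 h_{\rm P}\,\bigr]
    - E_c^{S=0,\pm1} \\
 &\quad + \bigl(E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=0,\pm1}\bigr)
    + \tfrac{1}{2}\bigl(I_m^{\rm unfold} - I_m^{\rm fold}\bigr)\Omega^2 \\
 &= 12 h_{\rm H} + \Delta E_c + \Delta E_{\rm el}
    + \tfrac{1}{2}(\Delta I_m)\Omega^2
    \qquad (U_t = U_g) .
\end{align*}
```

Each of the four remaining terms is positive by $h_{\rm H} = h > 0$ and Eqs. (38)–(40), which is why the sum is positive.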

10. Ground State Energy Difference Between the Folded Structures of S = 0 and S = ±1

Next, consider the energy difference between the folded structures of $S = 0$ and $S = \pm1$, which is defined by

$$\Delta E^{\rm fold} \equiv E^{S=\pm1} - E^{S=0} = E_c^{S=\pm1} + E_{\rm el}^{S=\pm1} - \left( E_c^{S=0} + E_{\rm el}^{S=0} \right) = \Delta E_c^{\rm fold} + \Delta E_{\rm el}^{\rm fold} , \tag{42}$$

where

$$\Delta E_c^{\rm fold} \equiv E_{1,13} + E_{7,19} - (E_{1,19} + E_{7,13}) , \tag{43}$$

$$\Delta E_{\rm el}^{\rm fold} \equiv E_{\rm el}^{S=\pm1} - E_{\rm el}^{S=0} . \tag{44}$$

For the magic snake model with the particular sequence of Eq. (34), it is natural to assume that

$$E_{1,19} = E_{3,9} = E_{5,23} = E_{7,13} = E_{11,17} = E_{15,21} = E_{1,13} = E_{7,19} \equiv E_{\rm HH} < 0 \tag{45}$$

since all these are interactions between segments of the same H type, except the interaction between the ends of the chain,

$$E_{1,24} \equiv E_{\rm HP} < 0 . \tag{46}$$

Hence, $\Delta E_c^{\rm fold} = 0$. This provides

$$\Delta E^{\rm fold} = \Delta E_{\rm el}^{\rm fold} . \tag{47}$$

To obtain this difference, I have to explicitly solve the Schrödinger equations of Eqs. (25), (29) and (33). To do so, I have to assign the on-site potentials $v_j$ and the hopping potentials $t_{r_i,r_j}$ explicitly, since a Schrödinger equation such as Eq. (29) provides the eigenequation

$$\sum_{i=1}^{24} t_{r_i,r_j} \Psi_i + v_j \Psi_j = E \Psi_j . \tag{48}$$

Let us first consider the case when there is no effect of the on-site potential:

$$v_j = 0 \quad (j = 1, \ldots, 24) . \tag{49}$$



This assumption means the following: although the amino acid sequence is given by Eq. (34), the electrons inside the chain feel neither the potential differences between the segments of the H and P types nor the rotational conformations between the segments strongly. Only the hydrophilic and hydrophobic energy costs play an important role in the energy of the system. This is a good starting point for seeing how the electronic energy enters the PF problem.

In the case of the unfolded chain structure [Eq. (25)], I assign the hopping potentials as

$$t_{j,j+1} = -t \quad (j = 1, \ldots, 23) , \tag{50}$$

otherwise $t_{r_i,r_j} = 0$, where $t > 0$ is the hopping potential (which is usually of the order of 0.1 to 1 eV). This reflects the geometry of the linear chain structure, in which there are no contact surfaces through which electrons could hop between adjacent segments. The solution of this case is very well known in quantum chemistry. The energy spectrum is given by

$$E_j^{\rm unfold} = -2t \cos\left( \frac{\pi j}{25} \right) \quad (j = 1, \ldots, 24) . \tag{51}$$

This provides the spectrum with the 24 eigenvalues:

$$\pm1.98423t,\ \pm1.93717t,\ \pm1.85955t,\ \pm1.75261t,\ \pm1.61803t,\ \pm1.45794t,\ \pm1.27485t,\ \pm1.07165t,\ \pm0.851559t,\ \pm0.618034t,\ \pm0.374763t,\ \pm0.125581t . \tag{52}$$

Therefore, the total electronic energy of the system is given by

$$E_{\rm el}^{\rm unfold} = 2 \sum_{j,\,{\rm occupied}} E_j^{\rm unfold} = -2 (1.98423 + 1.93717 + 1.85955 + 1.75261 + 1.61803 + 1.45794 + 1.27485 + 1.07165 + 0.851559 + 0.618034 + 0.374763 + 0.125581) t = -29.845t . \tag{53}$$
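As a cross-check, Eq. (51) can be compared against a direct numerical diagonalization of the open 24-site chain of Eq. (50). The sketch below assumes $t = 1$ and uses NumPy; variable names are illustrative. Evaluating Eq. (51) directly gives $1.27485t$ for $j = 7$ and a total of about $-29.85t$, close to the value quoted in Eq. (53).

```python
import numpy as np

n, t = 24, 1.0

# Closed form of Eq. (51), sorted in ascending order.
j = np.arange(1, n + 1)
analytic = np.sort(-2.0 * t * np.cos(np.pi * j / (n + 1)))

# Direct diagonalization of the nearest-neighbor chain of Eq. (50).
h = np.zeros((n, n))
idx = np.arange(n - 1)
h[idx, idx + 1] = -t
h[idx + 1, idx] = -t
numeric = np.linalg.eigvalsh(h)

# Half filling: 12 doubly occupied levels.
e_unfold = 2.0 * numeric[: n // 2].sum()
print(np.round(numeric[: n // 2], 5))
print(round(e_unfold, 3))
```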

In the case of the folded structure of $S = 0$, I assume the hopping potentials

$$t_{j,j+1} = -t \quad (j = 1, \ldots, 23) , \qquad t_{1,24} = t_{1,19} = t_{3,9} = t_{5,23} = t_{7,13} = t_{11,17} = t_{15,21} = -t , \tag{54}$$

otherwise $t_{r_i,r_j} = 0$, since the rectangular faces of the 1st and 19th, the 3rd and 9th, the 5th and 23rd, the 7th and 13th, the 11th and 17th, and the 15th and 21st segments, and the square faces of the 1st and 24th segments, are in contact with each other. Electrons may therefore hop through these contacting (adjacent) segments as well as between nearest neighbor segments; the contact pairs are those of Eq. (27). Now, I obtain the energy spectrum with the 24 eigenvalues of Eq. (29) as

$$-2.56155t,\ -2.1889t\,(2),\ -2t,\ -1.61803t\,(2),\ -1.30278t\,(2),\ -t,\ -0.45685t\,(2),\ 0,\ 0.45685t\,(2),\ 0.618034t\,(2),\ t\,(2),\ 1.56155t,\ 2t,\ 2.1889t\,(2),\ 2.30278t\,(2) \tag{55}$$


where (2) denotes double degeneracy of the level. This provides the total electronic energy:

$$E_{\rm el}^{S=0} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=0} = -2 (2.56155 + 2 \times 2.1889 + 2 + 2 \times 1.61803 + 2 \times 1.30278 + 1 + 2 \times 0.45685) t = -33.389t . \tag{56}$$
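The $S = 0$ spectrum can be recomputed from the bond list of Eq. (54). The following is a minimal sketch assuming $t = 1$ and $v_j = 0$; the helper name `hamiltonian` is hypothetical. Because every odd segment has two even neighbors and one odd contact while every even segment has two odd neighbors, the lowest level is exactly $-(1+\sqrt{17})t/2 \approx -2.56155t$, which the sketch checks.

```python
import numpy as np

def hamiltonian(n_sites, bonds, t=1.0):
    """Matrix with -t on every listed bond (1-based segment labels), v_j = 0."""
    h = np.zeros((n_sites, n_sites))
    for i, j in bonds:
        h[i - 1, j - 1] = -t
        h[j - 1, i - 1] = -t
    return h

chain = [(j, j + 1) for j in range(1, 24)]                       # Eq. (50)
contacts_s0 = [(1, 24), (1, 19), (3, 9), (5, 23),
               (7, 13), (11, 17), (15, 21)]                      # Eq. (54)

levels_s0 = np.linalg.eigvalsh(hamiltonian(24, chain + contacts_s0))
e_s0 = 2.0 * levels_s0[:12].sum()                                # half filling

print(np.round(levels_s0, 5))
print(round(e_s0, 3))
```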

Similarly, in the case of the folded structures of $S = \pm1$, I assign the hopping potentials

$$t_{j,j+1} = -t \quad (j = 1, \ldots, 23) , \qquad t_{1,24} = t_{1,13} = t_{3,9} = t_{5,23} = t_{7,19} = t_{11,17} = t_{15,21} = -t , \tag{57}$$

otherwise $t_{r_i,r_j} = 0$, since the rectangular faces of the 1st and 13th, the 3rd and 9th, the 5th and 23rd, the 7th and 19th, the 11th and 17th, and the 15th and 21st segments, and the square faces of the 1st and 24th segments, are in contact with each other. Electrons may therefore hop through these contacting (adjacent) segments as well as between nearest neighbor segments; the contact pairs are those of Eq. (31). Then I obtain the energy spectrum with the 24 eigenvalues for Eq. (33) as

$$-2.56155t,\ -2.1889t,\ -2.11491t,\ -2.08187t,\ -1.61803t\,(2),\ -1.33784t,\ -1.30278t,\ -t,\ -0.45685t,\ -0.268058t,\ 0,\ 0.254102t,\ 0.45685t,\ 0.618034t\,(2),\ 0.715828t,\ 1.53675t,\ 1.56155t,\ 1.86081t,\ 2t,\ 2.1889t,\ 2.30278t,\ 2.43519t . \tag{58}$$

This provides the total electronic energy:

$$E_{\rm el}^{S=\pm1} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=\pm1} = -2 (2.56155 + 2.1889 + 2.11491 + 2.08187 + 2 \times 1.61803 + 1.33784 + 1.30278 + 1 + 0.45685 + 0.268058) t = -33.098t . \tag{59}$$

The spectra of the above three cases are shown in Fig. 13.
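The comparison between the two helicities can be sketched numerically as follows, assuming $t = 1$ and $v_j = 0$. The two contact lists are those of Eqs. (54) and (57); the function name `total_energy` is hypothetical. Only the pairs (1,19), (7,13) versus (1,13), (7,19) differ between the two folds.

```python
import numpy as np

CHAIN = [(j, j + 1) for j in range(1, 24)]
CONTACTS_S0 = [(1, 24), (1, 19), (3, 9), (5, 23), (7, 13), (11, 17), (15, 21)]
CONTACTS_S1 = [(1, 24), (1, 13), (3, 9), (5, 23), (7, 19), (11, 17), (15, 21)]

def total_energy(contacts, t=1.0):
    """Half-filled electronic energy for the chain plus the given contact bonds."""
    h = np.zeros((24, 24))
    for i, j in CHAIN + contacts:
        h[i - 1, j - 1] = -t
        h[j - 1, i - 1] = -t
    levels = np.linalg.eigvalsh(h)
    return 2.0 * levels[:12].sum()

e_s0 = total_energy(CONTACTS_S0)
e_s1 = total_energy(CONTACTS_S1)
delta_fold = e_s1 - e_s0          # electronic energy gap between the two folds

print(round(e_s0, 3), round(e_s1, 3), round(delta_fold, 3))
```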

I now find the energy difference between the folded structures of $S = 0$ and $S = \pm1$:

$$\Delta E_{\rm el}^{\rm fold} = E_{\rm el}^{S=\pm1} - E_{\rm el}^{S=0} = -33.098t - (-33.389t) = 0.291t > 0 . \tag{60}$$

Hence, the total electronic energy $E_{\rm el}^{S=\pm1}$ of the folded structure of $S = \pm1$ is larger than that of the folded structure of $S = 0$, such that the lowest ground state energy of the system is realized in the folded structure with helicity $S = 0$. I would like to remark that the electronic energies of the folded structures of $S = 0, \pm1$ are 24-fold degenerate for each helicity, since there is no potential difference between the 24 configurations of the folded structures with each helicity. In the same way, I can calculate the energy difference between the unfolded and the folded structures of $S = 0, \pm1$, respectively:

$$\Delta E_{\rm el}^{S=0} = E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=0} = -29.845t - (-33.389t) = 3.544t , \tag{61}$$

$$\Delta E_{\rm el}^{S=\pm1} = E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=\pm1} = -29.845t - (-33.098t) = 3.253t . \tag{62}$$

Fig. 13. Electronic spectrum of the folded structures of $S = 0, \pm1$. The case of $v_j = 0$ and $t_{\rm end} = t_{\rm cont} = t = 1$ is shown.

Therefore, the electronic energy difference between the unfolded and the folded structures is much greater than that between the folded structures of $S = 0$ and $S = \pm1$. The former is about 10 times as large as (i.e. one order of magnitude larger than) the latter. If I use this result together with Eqs. (35)–(37), (45) and (46), I can rewrite Eq. (41) as

$$\Delta E \equiv E^{\rm unfold} - E^{\rm fold}_{S=0(\pm1)} = 12 h_{\rm H} - 6 E_{\rm HH} + 3.544t\ (3.253t) + \frac{1}{2} (\Delta I_m) \Omega^2 \approx o(h_{\rm H}) - o(E_{\rm HH}) + o(t) \gg 0 . \tag{63}$$

This shows that the hydrophobic interactions with the surrounding medium such as water, the contact configurational interactions between the hydrophobic segments, and the electronic energy of the system are all very important in the PF.

Second, as another example, let us consider the case when there is an effect of the on-site potential. I now adopt the on-site potential

$$v_j = t \ (-t) \tag{64}$$

for $j$ even (odd), where I have assigned the magnitude $t$ to $v_j$ and the segments of type P (H) are located on the even (odd) sites, for the sake of simplicity. This assumption means the following: according to the amino acid sequence given by Eq. (34), electrons inside the chain feel the effects of the potential differences between the segments of the H and P types, such that the potential value is $t$ ($-t$) on the segments of type P (H). In this case, there is a choice of sign in front of $t$ in Eq. (64). A different choice of the sign may cause a different energy spectrum, which is the problem of commensurability of the potential sequence with the folded structure. This will be discussed in the next section. On the other hand, I assume the same hopping potentials for each structure as defined before.

Now, in the same way as before, I can calculate the electronic spectrum for the unfolded and the folded structures of $S = 0, \pm1$, respectively. They are given as follows. For the unfolded structure, I obtain the electronic spectrum with the 24 eigenvalues for Eq. (25) as

$$\pm2.22197t,\ \pm2.18005t,\ \pm2.11138t,\ \pm2.01783t,\ \pm1.90211t,\ \pm1.76793t,\ \pm1.62026t,\ \pm1.46576t,\ \pm1.31345t,\ \pm1.17557t,\ \pm1.06792t,\ \pm1.00785t . \tag{65}$$

This provides the total electronic energy:

$$E_{\rm el}^{\rm unfold} = 2 \sum_{j,\,{\rm occupied}} E_j^{\rm unfold} = -2 (2.22197 + 2.18005 + 2.11138 + 2.01783 + 1.90211 + 1.76793 + 1.62026 + 1.46576 + 1.31345 + 1.17557 + 1.06792 + 1.00785) t = -39.70t . \tag{66}$$
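For the open chain with an alternating on-site potential of magnitude $t$, the levels pair up as $\pm\sqrt{t^2 + \varepsilon_j^2}$ with $\varepsilon_j = 2t\cos(\pi j/25)$, because the staggered potential anticommutes with the nearest-neighbor hopping. This closed form is a standard result added here for checking, not a formula stated in the text. The sketch below verifies it numerically with $t = 1$ and shows that the spectrum does not depend on the overall sign of the potential.

```python
import numpy as np

n, t = 24, 1.0
sites = np.arange(1, n + 1)

def chain_levels(sign):
    """Open chain with hopping -t and v_j = sign*t (even j), -sign*t (odd j)."""
    h = np.zeros((n, n))
    idx = np.arange(n - 1)
    h[idx, idx + 1] = -t
    h[idx + 1, idx] = -t
    h[np.diag_indices(n)] = np.where(sites % 2 == 0, sign * t, -sign * t)
    return np.linalg.eigvalsh(h)

eps = 2.0 * t * np.cos(np.pi * np.arange(1, n // 2 + 1) / (n + 1))
paired = np.sqrt(t ** 2 + eps ** 2)
analytic = np.sort(np.concatenate([-paired, paired]))

levels_plus = chain_levels(+1)     # sign choice of Eq. (64)
levels_minus = chain_levels(-1)    # opposite sign choice
e_unfold_v = 2.0 * levels_plus[: n // 2].sum()

print(np.round(paired, 5))
print(round(e_unfold_v, 2))
```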

For the folded structure with helicity $S = 0$, I obtain the electronic spectrum with the 24 eigenvalues for Eq. (29) as

$$-2.56155t,\ -2.2687t\,(2),\ -2t,\ -1.79129t\,(2),\ -1.61803t\,(2),\ -1.56155t,\ -1.12406t\,(2),\ -t,\ 0.618034t\,(2),\ 0.738758t\,(2),\ t,\ 1.56155t,\ 2t,\ 2.56155t,\ 2.654t\,(2),\ 2.79129t\,(2) , \tag{67}$$

where (2) denotes double degeneracy of the level. This provides the total electronic energy:

$$E_{\rm el}^{S=0} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=0} = -2 (2.56155 + 2 \times 2.2687 + 2 + 2 \times 1.79129 + 2 \times 1.61803 + 1.56155 + 2 \times 1.12406 + 1) t = -41.45t . \tag{68}$$

For the folded structure with helicity $S = \pm1$, I obtain the electronic spectrum with the 24 eigenvalues for Eq. (33) as

$$-2.56155t,\ -2.2687t,\ -2.16425t,\ -2.15309t,\ -1.79129t,\ -1.75844t,\ -1.61803t\,(2),\ -1.56155t,\ -1.12406t,\ -1.09608t,\ -t,\ 0.618034t\,(2),\ 0.738758t,\ 0.772866t,\ 0.90428t,\ 1.56155t,\ 2.21183t,\ 2.39138t,\ 2.56155t,\ 2.654t,\ 2.79129t,\ 2.8915t , \tag{69}$$

where (2) denotes double degeneracy of the level. This provides the total electronic energy:

$$E_{\rm el}^{S=\pm1} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=\pm1} = -2 (2.56155 + 2.2687 + 2.16425 + 2.15309 + 1.79129 + 1.75844 + 2 \times 1.61803 + 1.56155 + 1.12406 + 1.09608 + 1) t = -41.43t . \tag{70}$$

Fig. 14. Electronic spectrum of the folded structures of $S = 0, \pm1$. The case of $v_j = (-1)^j t$ and $t_{\rm end} = t_{\rm cont} = t = 1$ is shown.


From the above results, I find the energy difference between the folded structures of $S = 0$ and $S = \pm1$:

$$\Delta E_{\rm el}^{\rm fold} = E_{\rm el}^{S=\pm1} - E_{\rm el}^{S=0} = -41.43t - (-41.45t) = 0.02t > 0 . \tag{71}$$

This shows that in the present case of two types of on-site potential associated with the two types of segments, the energy difference becomes much smaller than that in the previous case of no potential difference. Hence, in this case, the electronic ground state energy of the folded structures comes closer to being 72-fold degenerate rather than 24-fold degenerate.

In the same way, I can calculate the energy difference between the unfolded structure and the folded structures of $S = 0, \pm1$, respectively:

$$\Delta E_{\rm el}^{S=0} = E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=0} = -39.70t - (-41.45t) = 1.75t , \tag{72}$$

$$\Delta E_{\rm el}^{S=\pm1} = E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=\pm1} = -39.70t - (-41.43t) = 1.73t , \tag{73}$$

which are of the same order. I now find from Eq. (41) that

$$\Delta E \equiv E^{\rm unfold} - E^{\rm fold}_{S=0(\pm1)} = 12 h_{\rm H} - 6 E_{\rm HH} + 1.75t\ (1.73t) + \frac{1}{2} (\Delta I_m) \Omega^2 \approx o(h_{\rm H}) - o(E_{\rm HH}) + o(t) \gg 0 . \tag{74}$$

Hence, again, I find that the hydrophobic interactions with the surrounding medium such as water, the contact configurational interactions between the hydrophobic segments and the electronic energy of the system are crucially important in the PF.

11. Commensurability Between the Potential Sequence and the Folded Structure

Let us consider the relationship between the potential sequence and the folded structure. As was discussed before, if one considers the magic snake chain as a classical object, then the ground state energy is highly degenerate, such that many similar folded structures can have the same ground state energy. However, if one considers the magic snake chain as a quantum object, then some of the degenerate ground state energies become lower than the others even if they come from the same folded structures.

To see this point, let us go back to the case with two types of potential. But now, I assume the opposite sign in front of the potential:

$$v_j = -t \ (t) \tag{75}$$

for $j$ even (odd), where I have assigned the magnitude $t$ to $v_j$ and the segments of type P (H) are located on the even (odd) sites. On the other hand, all the hopping potentials are kept the same as before.

This assumption means the following: according to the amino acid sequence given by Eq. (34), electrons inside the chain feel the effects of the potential differences between the segments of the H and P types, such that the potential value is $-t$ ($t$) on the segments of type P (H). Therefore, the accumulation of electrons in the chain is influenced by the sign of the on-site potential. If the sign is negative (positive) on the segments of type H (P), then electrons are attracted to (repelled from) the segments of type H (P), and vice versa. This difference in the distribution of electrons in the chain can cause a difference in the ground state energy.

In the same way, I can calculate the electronic spectrum for the unfolded and the folded structures of $S = 0, \pm1$, respectively. They are given as follows. For the unfolded structure, I obtain the same electronic spectrum with the 24 eigenvalues for Eq. (25) as Eq. (65), since in the linear chain structure the sign of the potential has no effect.

For the folded structure with helicity $S = 0$, I obtain the electronic spectrum with the 24 eigenvalues for Eq. (29) as

$$-3t,\ -2.654t\,(2),\ -2.56155t,\ -2.30278t\,(2),\ -1.30278t\,(2),\ -t,\ -0.738758t\,(2),\ 0,\ t,\ 1.12406t\,(2),\ 1.30278t\,(2),\ 1.56155t,\ 2t\,(2),\ 2.2687t\,(2),\ 2.30278t\,(2) , \tag{76}$$

where (2) denotes double degeneracy of the level. This provides the total electronic energy:

$$E_{\rm el}^{S=0} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=0} = -2 (3 + 2 \times 2.654 + 2.56155 + 2 \times 2.30278 + 2 \times 1.30278 + 1 + 2 \times 0.738758) t = -41.12t . \tag{77}$$

For the folded structure with helicity $S = \pm1$, I obtain the electronic spectrum with the 24 eigenvalues for Eq. (33) as

$$-3t,\ -2.654t,\ -2.61803t,\ -2.59562t,\ -2.30278t\,(2),\ -1.37824t,\ -1.30278t\,(2),\ -t,\ -0.738758t,\ -0.381966t,\ -0.34411t,\ t,\ 1.12406t,\ 1.17167t,\ 1.30278t\,(2),\ 2t\,(3),\ 2.2687t,\ 2.30278t,\ 2.44469t , \tag{78}$$

where (2) [(3)] denotes double (triple) degeneracy of the level. This provides the total electronic energy:

$$E_{\rm el}^{S=\pm1} = 2 \sum_{j,\,{\rm occupied}} E_j^{S=\pm1} = -2 (3 + 2.654 + 2.61803 + 2.59562 + 2 \times 2.30278 + 1.37824 + 2 \times 1.30278 + 1 + 0.738758 + 0.381966) t = -43.16t . \tag{79}$$


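From the above results, I find the energy difference between the folded structures of $S = 0$ and $S = \pm1$:

$$\Delta E_{\rm el}^{\rm fold} = E_{\rm el}^{S=\pm1} - E_{\rm el}^{S=0} = -43.16t - (-41.12t) = -2.04t < 0 . \tag{80}$$

The energy differences between the unfolded and the folded structures of $S = 0, \pm1$ are given by

$$\Delta E_{\rm el}^{S=0} = E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=0} = -39.70t - (-41.12t) = 1.42t , \tag{81}$$

$$\Delta E_{\rm el}^{S=\pm1} = E_{\rm el}^{\rm unfold} - E_{\rm el}^{S=\pm1} = -39.70t - (-43.16t) = 3.46t , \tag{82}$$

respectively. I now find from Eq. (41) that

$$\Delta E \equiv E^{\rm unfold} - E^{\rm fold}_{S=0(\pm1)} = 12 h_{\rm H} - 6 E_{\rm HH} + 1.42t\ (3.46t) + \frac{1}{2} (\Delta I_m) \Omega^2 \approx o(h_{\rm H}) - o(E_{\rm HH}) + o(t) \gg 0 . \tag{83}$$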

Hence, once again, I find that the hydrophobic interactions with the surrounding medium such as water, the contact configurational interactions between the hydrophobic segments and the electronic energy of the system are crucially important in the PF. This situation is schematically shown in Fig. 16, and it is frequently called the funnel structure of the ground state energy of the system in the literature (Ref. 9). However, as is shown in Fig. 16, it is more like a bucket structure since the structure has a bottom.

Fig. 15. Electronic spectrum of the folded structures of $S = 0, \pm1$. The case of $v_j = -(-1)^j t$ and $t_{\rm end} = t_{\rm cont} = t = 1$ is shown.

Fig. 16. The landscape of the ground state energy of the system. The horizontal axis represents the whole configuration space of the structures while the vertical axis represents the ground state energy of the system. The landscape is not like a funnel structure but more like a bucket structure.

Equation (80) shows that in the case of Eq. (75), the ground state energy of the folded structure with helicity $S = \pm1$ is lower than that of the folded structure with helicity $S = 0$. This is opposite to the former case of the on-site potential of Eq. (64). The difference is much larger than that in the previous cases of no potential difference and of the on-site potential of Eq. (64). Hence, in this case, the electronic ground state energy of the folded structures comes closer to being 24-fold degenerate rather than 72-fold degenerate. Thus, for $v_{\rm P} > 0 > v_{\rm H}$ ($v_{\rm P} < 0 < v_{\rm H}$), $E_{\rm el}^{S=0}$ is the lowest (highest), such that $E_{\rm el}^{S=0} < (>)\ E_{\rm el}^{S=\pm1}$. This is schematically shown in Fig. 17. I would like to note here that this tendency depends upon the choice of the sign of $t_{r_i r_j}$. In the above argument, I have used the condition $t_{r_i r_j} < 0$. However, if I use the condition $t_{r_i r_j} > 0$, then the situation would be opposite: for $v_{\rm P} > 0 > v_{\rm H}$ ($v_{\rm P} < 0 < v_{\rm H}$), $E_{\rm el}^{S=0}$ is the highest (lowest), such that $E_{\rm el}^{S=0} > (<)\ E_{\rm el}^{S=\pm1}$.

Fig. 17. Schematic diagram of the ground state energy. If $v_{\rm P} > 0 > v_{\rm H}$, then $E_{\rm el}^{S=0}$ is the lowest such that $E_{\rm el}^{S=0} < E_{\rm el}^{S=\pm1}$ (solid lines). On the other hand, if $v_{\rm P} < 0 < v_{\rm H}$, then $E_{\rm el}^{S=0} > E_{\rm el}^{S=\pm1}$ (gray lines). For convenience, the case of Eqs. (94)–(96) is also shown in this figure (weak gray lines). Thus, it is apparent that the ground state degeneracy changes and is removed as the potential sequence changes.

The above result shows that even if the ground state energy is highly degenerate at the level of classical mechanics, a more favorable structure with a lower ground state energy may emerge from the degenerate ground state at the level of quantum mechanics. Indeed, the above difference of the ground state energies between the two different signs of the on-site potential comes only from the electronic ground state energy. Therefore, it takes place only when the system is treated as a quantum object by quantum mechanics. This is what I mean by the word “commensurability” of the potential sequence with the folded structure. However, this point is totally absent in the previous literature (Refs. 4, 5 and 9).
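The dependence on the sign of the on-site potential can be explored with a short sketch. It assumes $t = 1$, 1-based segment labels with H on the odd sites, and the contact lists of Eqs. (54) and (57); the names `spectrum` and `results` are hypothetical. Two exact properties serve as checks: the levels must sum to $\sum_j v_j = 0$, and the odd/even block structure fixes the lowest level at $-3t$ when $v_{\rm H} = -t$ and at $-(1+\sqrt{17})t/2$ when $v_{\rm H} = +t$. Which printed list corresponds to which sign therefore depends on the indexing convention used (for instance 0-based versus 1-based site labels), and this is worth confirming before comparing totals; the 24 values as printed in Eq. (78) sum to about $-3.0t$ rather than 0, which suggests a misprint somewhere in that list.

```python
import numpy as np

CHAIN = [(j, j + 1) for j in range(1, 24)]
CONTACTS = {
    "S=0": [(1, 24), (1, 19), (3, 9), (5, 23), (7, 13), (11, 17), (15, 21)],
    "S=+-1": [(1, 24), (1, 13), (3, 9), (5, 23), (7, 19), (11, 17), (15, 21)],
}
SITES = np.arange(1, 25)

def spectrum(contacts, v_h, t=1.0):
    """Levels for hopping -t on all bonds, v = v_h on odd (H) and -v_h on even (P) sites."""
    h = np.zeros((24, 24))
    for i, j in CHAIN + contacts:
        h[i - 1, j - 1] = -t
        h[j - 1, i - 1] = -t
    h[np.diag_indices(24)] = np.where(SITES % 2 == 1, v_h, -v_h)
    return np.linalg.eigvalsh(h)

results = {}
for name, contacts in CONTACTS.items():
    for v_h in (-1.0, 1.0):
        levels = spectrum(contacts, v_h)
        results[(name, v_h)] = (levels, 2.0 * levels[:12].sum())
        print(name, "v_H =", v_h,
              "lowest level", round(levels[0], 5),
              "E_el", round(results[(name, v_h)][1], 2))
```

The printed totals can then be set against Eqs. (68), (70), (77) and (79); the sketch does not assert agreement with any of them.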

12. The Unique Ground State and Broken Symmetry

From the discussions in the previous sections, I conclude that there are mainly three types of sequence which are crucially important in the PF.

(1) The first type is the conformational sequence coded by the rotational configurations, $c$, $t$, $g^+$, $g^-$. This contributes to the energies $U_c$, $U_t$, $U_{g^+}$, $U_{g^-}$.

(2) The second type is the sequence coming from the effects of the side chains of a protein, such as the sequence of the hydrophilic and hydrophobic segments, H, P. This contributes to the energies $E_{\rm HH}$, $E_{\rm HP}$, $E_{\rm PP}$; $h_{\rm P}$, $h_{\rm H}$.

(3) The third is the potential sequence affecting electrons inside the chain, such as $v_j$, $t_{r_i,r_j}$. This contributes to the energies $t_c$, $t_t$, $t_{g^\pm}$, $t_{\rm cont}$; $v_{\rm P}$, $v_{\rm H}$, where $t_{\rm cont}$ means the electron hopping potential due to the short-ranged and long-ranged contact interactions between the contact segments along the chain.

All of these energies are related to the primary structure of the amino acid sequence. To show this point, I have worked out the particular case with the conformational sequence of the unfolded structure $t^{23}$ and the folded sequences of Eqs. (5), (7) and (9).

What can I do for other sequences? Of course, I can do the same thing in those cases, too. But a much more fascinating idea for this problem is to use the concept of broken symmetry. Suppose there is a sequence that is slightly different from the sequence of Eq. (34). This discrepancy may break the symmetry of the chain and yield a nondegenerate ground state, so that one of the folded structures is more favorable. For any sequence, it may likewise be possible to destroy the multiple-fold degeneracy of the ground state and produce a unique ground state of the magic snake chain. This situation would provide a hint for considering the relationship between the unique ground state of the system and the sequence of the chain. Hence, I conjecture that this is also true for the real PF problem, and in this sense the second genetic code problem is nothing more than the unique ground state problem in the real PF problem.

To investigate the above conjecture, let us first consider the on-site potential $v_j$ and the hopping potential $t_{r_i r_j}$. These potentials are related to the configurations of the two adjacent or nearest neighbor segments as well as the type of each segment in the magic snake chain. Therefore, they are parameterized, in general, as follows:

$$t_{r_i r_j} = \begin{cases} t_{i,i+1} \equiv -t(\phi_{i,i+1}) & \text{if } r_i, r_j \in {\rm NN} \\ t_{1,24} \equiv -t_{\rm end} & \text{if } r_i = 1,\ r_j = 24 \\ t_{i,i+n} \equiv -t_{\rm cont} & \text{if } r_i, r_j \in {\rm SR, LR} \\ 0 & \text{otherwise} \end{cases} , \tag{84}$$

$$v_j \equiv v_{\sigma_j} \tag{85}$$

where NN means nearest neighbor and SR (LR) the short-ranged (long-ranged) contact interaction through the segments along the chain, $\sigma_j$ is the type of the $j$th segment, and I assume that $t(\phi_{i,i+1}), t_{\rm end}, t_{\rm cont} > 0$. Furthermore, it is reasonable to assume that $t(\phi_{i,i+1})$ can be parameterized as

$$t(\phi_{i,i+1}) = t_0 + \delta t_r , \tag{86}$$

where $\delta t_r = \delta t_c, \delta t_t, \delta t_{g^\pm} \equiv \delta t_\pm$.

From the above parameter setting, Eq. (85) gives the sequence:

$$v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P}, v_{\rm H}, v_{\rm P} . \tag{87}$$

On the other hand, Eq. (84) gives the sequences of the hopping potential along the chain. For the unfolded structure $t^{23}$, I have

$$-t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t, -t , \tag{88}$$

where $t \equiv t_0 + \delta t_t$. And I have

$$-t_+, -t_-, -t_+, -t_-, -t_-, -t_+, -t_-, -t_+, -t_+, -t_-, -t_+, -t_-, -t_-, -t_+, -t_-, -t_+, -t_+, -t_-, -t_+, -t_-, -t_-, -t_+, -t_- , \tag{89}$$

for the folded structure of $S = 0$ and

$$-t_\pm, -t_\mp, -t_\pm, -t_\pm, -t_\mp, -t_\pm, -t_\pm, -t_\mp, -t_\mp, -t_\pm, -t_\mp, -t_\pm, -t_\pm, -t_\mp, -t_\pm, -t_\mp, -t_\mp, -t_\pm, -t_\pm, -t_\mp, -t_\mp, -t_\pm, -t_\mp , \tag{90}$$

for the folded structure of $S = \pm1$, respectively, where $t_\pm \equiv t_0 + \delta t_\pm = t + (\delta t_\pm - \delta t_t)$.

Let us calculate the electronic ground state energy for this case, assuming that $v_{\rm H} = -t$, $v_{\rm P} = t$, $t_+ = 1.5t$, $t_- = t$ and $t_{\rm cont} = t_{\rm end} = t$, where the hopping potentials through the contact faces are assumed to be the same as before. Now I obtain the 24 eigenvalues for the folded structures of $S = 0, \pm1$, respectively, as follows. For the folded structure of $S = 0$, I have

$$-3.41108t,\ -3.04527t,\ -2.9478t,\ -2.82609t,\ -2.48973t,\ -2.45298t,\ -1.7551t,\ -1.68918t,\ -1.32111t,\ -0.89807t,\ -0.89048t,\ -0.0523378t,\ 1.00115t,\ 1.10838t,\ 1.1307t,\ 1.53181t,\ 1.57999t,\ 1.87538t,\ 2.27708t,\ 2.42318t,\ 2.60062t,\ 2.60556t,\ 2.77775t,\ 2.86764t . \tag{91}$$

For the folded structure of $S = 1$, I have

$$-3.43742t,\ -3.06193t,\ -3.0121t,\ -2.8248t,\ -2.58789t,\ -2.41811t,\ -1.74706t,\ -1.69604t,\ -1.37906t,\ -1.00607t,\ -0.820767t,\ -0.0564129t,\ t,\ 1.10546t,\ 1.20365t,\ 1.49754t,\ 1.58527t,\ 1.84581t,\ 2.38611t,\ 2.40616t,\ 2.63656t,\ 2.7218t,\ 2.79472t,\ 2.86457t . \tag{92}$$

For the folded structure of $S = -1$, I have

$$-3.39715t,\ -3.0595t,\ -2.95476t,\ -2.79955t,\ -2.58069t,\ -2.40453t,\ -1.71997t,\ -1.66809t,\ -1.34467t,\ -0.955957t,\ -0.829761t,\ -0.0772168t,\ 1.00089t,\ 1.09728t,\ 1.21308t,\ 1.45491t,\ 1.59244t,\ 1.85655t,\ 2.31277t,\ 2.38584t,\ 2.59758t,\ 2.67628t,\ 2.79326t,\ 2.81099t . \tag{93}$$

From these, I obtain the electronic ground state energy:

$$E_{\rm el}^{S=0} = -47.58t , \tag{94}$$

$$E_{\rm el}^{S=1} = -41.41t , \tag{95}$$

$$E_{\rm el}^{S=-1} = -41.81t , \tag{96}$$

respectively. The spectra of the above three cases are shown in Fig. 18.

Thus, the degeneracy of the electronic ground state energies between the folded

structures of S = 1 and S = −1 is removed by the difference of the hopping

potential sequences. This is what I mean by the word “broken symmetry”, which

picks up one of the potential sequences to lower the ground state energy.
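The lifting of the $S = \pm1$ degeneracy by the hopping sequences can be sketched as follows. The bond patterns are transcribed from Eqs. (89) and (90) with `+` for $t_+$ and `-` for $t_-$; the parameter values $t_+ = 1.5$, $t_- = t_{\rm cont} = t_{\rm end} = 1$, $v_{\rm H} = -1$, $v_{\rm P} = 1$ follow the text, while the names `PATTERN`, `CONTACTS` and `electronic_energy` are hypothetical. The printed totals are offered for comparison with Eqs. (94)–(96); no agreement is asserted here.

```python
import numpy as np

# Bond types for the 23 chain bonds (bond k joins segments k and k+1).
PATTERN = {
    0: "+-+--+-++-+" + "--+-++-+--+-",      # Eq. (89)
    1: "+-++-++--+-+" + "+-+--++--+-",      # Eq. (90), upper signs
}
PATTERN[-1] = PATTERN[1].translate(str.maketrans("+-", "-+"))  # Eq. (90), lower signs

CONTACTS = {
    0: [(1, 24), (1, 19), (3, 9), (5, 23), (7, 13), (11, 17), (15, 21)],
    1: [(1, 24), (1, 13), (3, 9), (5, 23), (7, 19), (11, 17), (15, 21)],
}
CONTACTS[-1] = CONTACTS[1]

def electronic_energy(s, t_plus=1.5, t_minus=1.0, t_cont=1.0, v_h=-1.0, v_p=1.0):
    """Half-filled electronic energy for helicity s in {0, 1, -1}."""
    h = np.zeros((24, 24))
    for k, kind in enumerate(PATTERN[s], start=1):
        amp = t_plus if kind == "+" else t_minus
        h[k - 1, k] = -amp
        h[k, k - 1] = -amp
    for i, j in CONTACTS[s]:
        h[i - 1, j - 1] = -t_cont
        h[j - 1, i - 1] = -t_cont
    sites = np.arange(1, 25)
    h[np.diag_indices(24)] = np.where(sites % 2 == 1, v_h, v_p)
    return 2.0 * np.linalg.eigvalsh(h)[:12].sum()

energies = {s: electronic_energy(s) for s in (0, 1, -1)}
print({s: round(e, 2) for s, e in energies.items()})
```

In this transcription the $S = +1$ chain carries twelve $t_+$ bonds and the $S = -1$ chain eleven, which is the origin of the splitting between the two mirror-image folds.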

Let us further consider the effect of the broken symmetry. As was discussed so far, each ground state energy of the unfolded and the folded structures of $S = 0, \pm1$ is 24-fold degenerate, since the location of the end faces along the chain has no effect on the ground state energy. However, if one of the on-site or hopping potentials is affected by the location of the two end faces in the chain, then the ground state energies are no longer equal. To show that a different location of the end faces leads to a non-degenerate ground state energy, let us suppose that the two end faces are placed in between the first and last segments. In this case, it is natural to assume that the hopping potential between the 1st and 24th segments, $t_{\rm end} = t_{1,24}$, is changed such that

$$t_{\rm end} = t' . \tag{97}$$

Let us then calculate the ground state energy, assuming $t' = 0.2t$, for example. Now, I get the 24 eigenvalues for the folded structure of $S = 0$:

$$-3.39024t,\ -3.04662t,\ -2.87395t,\ -2.8226t,\ -2.52766t,\ -2.42333t,\ -1.76479t,\ -1.6342t,\ -1.16859t,\ -0.932509t,\ -0.888087t,\ -0.0980352t,\ 1.01262t,\ 1.08192t,\ 1.15696t,\ 1.49632t,\ 1.59128t,\ 1.89808t,\ 2.18623t,\ 2.30263t,\ 2.60062t,\ 2.62237t,\ 2.75385t,\ 2.86772t . \tag{98}$$

This provides the ground state energy:

$$E_{{\rm el},(1,24)}^{S=0} = -47.14t . \tag{99}$$

Fig. 18. Electronic spectrum of the folded structures of $S = 0, \pm1$. The case of $v_{\rm H} = -t$, $v_{\rm P} = t$, $t_+ = 1.5t$, $t_- = t$ and $t_{\rm end} = t_{\rm cont} = t = 1$ is shown.

On the other hand, if the two end faces are located in between the $i$th and the $(i+1)$th segments, then I assume that $t_{i,i+1} = -t'$ while the others are not changed, and so forth. Let us next calculate the ground state energy, assuming $t_{10,11} = t_{11,10} = -0.2t$, for example. Now, I get the 24 eigenvalues for the folded structure of $S = 0$:

$$-3.35032t,\ -3.08966t,\ -2.89371t,\ -2.78998t,\ -2.50062t,\ -2.45168t,\ -1.74959t,\ -1.59238t,\ -1.3443t,\ -0.919229t,\ -0.817707t,\ -0.0895456t,\ 1.01373t,\ 1.06853t,\ 1.13301t,\ 1.52641t,\ 1.62462t,\ 1.85335t,\ 2.21695t,\ 2.42356t,\ 2.55864t,\ 2.60954t,\ 2.75533t,\ 2.80506t . \tag{100}$$

This provides the ground state energy:

$$E_{{\rm el},(10,11)}^{S=0} = -47.06t . \tag{101}$$

The spectra of the above two cases are shown in Fig. 19.
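The effect of the location of the end faces can be sketched by weakening a single bond of the $S = 0$ fold and recomputing the half-filled energy. The bond pattern is transcribed from Eq. (89) and the parameters follow the text ($t_+ = 1.5$, $t_- = t_{\rm cont} = t_{\rm end} = 1$, $v_{\rm H} = -1$, $v_{\rm P} = 1$, weakened bond $0.2$); the function name `energy_with_defect` is hypothetical, and the printed values are for comparison with Eqs. (99) and (101) rather than asserted to match them.

```python
import numpy as np

PATTERN_S0 = "+-+--+-++-+" + "--+-++-+--+-"      # Eq. (89): '+' is t_+, '-' is t_-
CONTACTS_S0 = [(1, 24), (1, 19), (3, 9), (5, 23), (7, 13), (11, 17), (15, 21)]

def energy_with_defect(weak_bond=None, t_weak=0.2, t_plus=1.5, t_minus=1.0, t_cont=1.0):
    """Half-filled energy of the S = 0 fold with one bond amplitude set to t_weak."""
    bonds = {}
    for k, kind in enumerate(PATTERN_S0, start=1):
        bonds[(k, k + 1)] = t_plus if kind == "+" else t_minus
    for pair in CONTACTS_S0:
        bonds[pair] = t_cont
    if weak_bond is not None:
        bonds[weak_bond] = t_weak                # the end faces sit on this bond
    h = np.zeros((24, 24))
    for (i, j), amp in bonds.items():
        h[i - 1, j - 1] = -amp
        h[j - 1, i - 1] = -amp
    sites = np.arange(1, 25)
    h[np.diag_indices(24)] = np.where(sites % 2 == 1, -1.0, 1.0)   # v_H on odd sites
    return 2.0 * np.linalg.eigvalsh(h)[:12].sum()

e_reference = energy_with_defect()
e_end_1_24 = energy_with_defect(weak_bond=(1, 24))
e_end_10_11 = energy_with_defect(weak_bond=(10, 11))
print(round(e_reference, 2), round(e_end_1_24, 2), round(e_end_10_11, 2))
```

Repeating the call for every bond of the closed loop gives one energy per location of the end faces, which is how the 24-fold degeneracy discussed here can be examined.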

The above result shows that the different location of the ends in the closed

loop structure of the magic snake chain causes the different electronic ground state

energy, which means that the 24-fold ground state degeneracy can be removed by

the effect of the location of the end faces in the chain. This can be thought of as an

example of the pinning of a defect by the potential sequence or the broken symmetry

by a defect between the folded protein structure and the potential sequence where

I have regarded the location of the end faces as a defect. In this way, I would like to conclude that the second genetic code problem is related to the broken symmetry that reduces the degenerate ground state energy to the unique ground state energy of the system at the quantum mechanical level.


Fig. 19. Electronic spectrum of the folded structure of $S = 0$ with the effect of the location of the ends in the magic snake chain. (a) The end faces are located in between the 1st and the 24th segments. $t_{1,24} = t_{24,1} = 0.2t$ is used. (b) The end faces are located in between the 10th and the 11th segments. $t_{10,11} = t_{11,10} = 0.2t$ is used. Here $t = 1$ is assumed for both cases.

13. Discussion

In this section, I would like to remark on two further important features of the geometry of the magic snake chain model:

(1) There is a class of equivalent models, which can be regarded as a dual structure model of the magic snake chain.

(2) The magic snake model with 24 segments can be inflated to a model whose number of segments is a multiple of 24.

First, let us consider the case (1). In this case, there is another class of models in which the triangular segment that constructs the magic snake chain is replaced by a bent rod segment (Fig. 20). I call this model the magic rod model (Fig. 21). This looks more like the standard protein folding model using the self-avoiding random walk on a cubic lattice.9 However, in this magic rod model the rods are allowed to meet or attach to each other at one position in space, while in the standard lattice model the chain is not allowed to do so. This place of attachment corresponds to the position where two rectangular faces meet each other to make contact surfaces in the magic snake chain (Fig. 22). Since the faces of a segment in the magic snake chain turn out to be the points of a segment in the magic rod model, the magic rod model can be thought of as a dual model to the magic snake model.



Fig. 20. The correspondence between the magic snake model and the magic rod model. The triangular segment of the magic snake model corresponds to the bent rod segment of the magic rod model.

Fig. 21. The folded structure of the magic rod model. These are dual models in the sense that the faces of the segments in the magic snake model correspond to the points at the segments in the magic rod model.

Fig. 22. Relationship between the contact faces of the magic snake model and the contact rods of the magic rod model. The attachment or contact of the rod segments is allowed in the magic rod model, while it is forbidden in the standard protein folding model using the three-dimensional random walk on a cubic lattice.

Second, let us consider the case (2). The magic snake chain constructed from 24 segments can be made longer so that it consists of more segments. In particular, there is a class of magic numbers for which one can find the same type of folded structure of the magic snake chain. It is given by the numbers:

$N_m = 24 \times m$ for $m = 1, 3, 5, \ldots$ . (102)

I call this class of the magic snake chains the inflated magic snake chains.


Fig. 23. Inflation scheme of the magic snake chain model. If the unit segment is inflated to the unit segment with m (= 1, 3, 5, . . .) triangular segments, then the inflated magic snake chain realizes the same folded structure as that of the original magic snake chain.

Fig. 24. The compact folded structure of the inflated magic snake chain with 72 segments. The case of helicity S = 0 is shown.

For example, consider the case of m = 3. The inflated magic snake chain is constructed from 24 inflated segments, each of which consists of three segments (Fig. 23). Hence, the total number of segments in this magic snake chain is 72. The folded structure is drawn in Fig. 24 (Ref. 19). Then, I find that the total number of all configurations of this inflated magic snake chain is $4^{71} \approx 5.6 \times 10^{42}$. If I follow the argument of the model Levinthal paradox in Sec. 3, then I have to conclude that, generally speaking, it is impossible to find a compact folded structure in a practical time by random searching. This is true even with a high performance supercomputer. In fact, by this approach it is extremely difficult to find a folded structure, since even if the supercomputer runs as fast as $10^{-12}$ s per step (i.e. 1 teraflops), it needs $10^{-12} \times 10^{42} = 10^{30}$ s $\approx 10^{22}$ yr. Nevertheless, I can find such a folded structure without searching the whole configuration space. I can get the above class of folded structures once I inflate the system by the inflation scheme for the constituent segments of the chain, going from the smaller structure to the larger structures. This procedure saves a huge amount of time in finding a folded structure in the whole configuration space. Hence, I conjecture that this kind of inflation of the unit segments in the chain structure can be a clue to understanding the evolution of the protein structure or the protein architecture.
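The orders of magnitude in this estimate can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that each of the N − 1 joints between neighboring segments takes one of four orientations (an assumption chosen because it reproduces the count of 4 to the power 71 for 72 segments) and that a random search visits one configuration per step. The function names and the year length of 365.25 days are illustrative choices, not part of the model.

```python
# Order-of-magnitude check of the model Levinthal estimate for inflated chains.
# Assumption: each of the (N - 1) joints takes one of 4 orientations, which
# reproduces the count 4**71 quoted for the 72-segment chain.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # illustrative choice of year length


def magic_number(m):
    """Number of segments N_m = 24 * m of the inflated chain (m odd)."""
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    return 24 * m


def num_configurations(n_segments, states_per_joint=4):
    """Total number of chain configurations, states_per_joint**(N - 1)."""
    return states_per_joint ** (n_segments - 1)


def exhaustive_search_years(n_segments, seconds_per_step=1e-12):
    """Time in years to visit every configuration once at a fixed step time."""
    total_seconds = num_configurations(n_segments) * seconds_per_step
    return total_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for m in (1, 3, 5):
        n = magic_number(m)
        print(f"m = {m}: N = {n}, "
              f"configurations = {num_configurations(n):.2e}, "
              f"exhaustive search = {exhaustive_search_years(n):.1e} yr")
```

For m = 3 the sketch gives about 5.6 × 10^42 configurations. If the prefactor 5.6 is kept and an exact year length is used, the search time comes out of order 10^23 yr, somewhat above the rounded figure in the text, and the conclusion is unchanged.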

From this point of view, there seems to exist a formal analogy between natural language and the PF problem. Let us suppose that one writes a paper of $10^3$ letters. To do so by searching the whole configuration space of $26^{1000} \approx 10^{1415}$ letter sequences would be meaningless, since such a search can never be accomplished. Therefore, one never searches like this. Instead, one first uses a dictionary of words (i.e. a finite set of words), where words are constructed from a finite number of letters. Second, one combines the words to make a set of sentences. Third, one combines the sentences to make a set of paragraphs. Finally, one arranges the paragraphs to make a paper, and so forth.
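The size of the letter-by-letter search space follows from elementary arithmetic, since the base-10 logarithm of 26 to the power 1000 is 1000 times the logarithm of 26. The following is a minimal sketch of the check, assuming a 26-letter alphabet without spaces or punctuation; the function name is illustrative.

```python
import math


def log10_num_texts(n_letters, alphabet_size=26):
    """Base-10 logarithm of the number of letter strings of a given length."""
    return n_letters * math.log10(alphabet_size)


if __name__ == "__main__":
    exponent = log10_num_texts(1000)
    print(f"26**1000 is about 10**{exponent:.0f}")
    # Python integers are exact, so the digit count gives a cross-check.
    print("decimal digits:", len(str(26 ** 1000)))
```

The exponent evaluates to about 1415, which is far beyond the reach of any exhaustive search.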

In the PF problem, the situation is almost the same. Searching the entire configuration space of the structures would be a hopeless procedure. Instead, one first searches for structures of small segments constructed by a combination of the unit segments, which are modules (i.e. the secondary structure). Therefore, a module corresponds to a word in language. Second, one uses the modules to make a domain structure (i.e. the tertiary structure). Therefore, a domain corresponds to a sentence in language. Third, one combines the domains to make a three-dimensional structure (i.e. the quaternary structure). Therefore, a quaternary structure corresponds to a paragraph in language.

In this way, the formal analogy between language and the PF problem is established. This is shown in Fig. 25. Although the correspondence is only formal as a physical problem, the two are almost the same problem in the mathematical sense that an exhaustive search of the configuration space is never accomplished in either case. Thus, the above analogy is more than accidental. Hence, I speculate that this analogy will play an important role in understanding the real PF problem.

Fig. 25. Formal analogy between natural language and the protein folding problem.

14. Conclusion

In conclusion, I have discussed a toy model of Rubik’s magic snake in order to elucidate the conceptual framework of the PF problem. Even in this model, there are many interesting problems such as the model Levinthal paradox, the nonunique folded structure, constructing a folded structure from its modules, the function of the chain, and the dual model and inflation of the magic snake chain. I have introduced the model Hamiltonian to discuss the ground state energy of the system. The ground state is highly degenerate for the particular sequence of Eq. (34) as a consequence of the many possible commensurate arrangements, and this degeneracy is lifted by an arbitrary sequence, which breaks the symmetry and leads to a unique ground state energy. This type of argument may be useful to further investigate the intriguing nature of the PF.

Acknowledgments

I would like to thank Prof. Mitiko Go, Prof. Satoshi Takahashi and Prof. Chao Tang

for sending me their recent works. I also thank Prof. Mitiko Go and Prof. Satoshi

Takahashi for useful discussions and Kazuko Iguchi for continuous support and

encouragement. This work is partially supported by The Mitsubishi Foundation.

References

1. T. E. Creighton, Proteins (Freeman, New York, 1993); G. E. Schulz and R. H. Schirmer, Principles of Protein Structure (Springer-Verlag, New York, 1979).

2. C. J. Epstein, R. F. Goldberger and C. B. Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963); C. B. Anfinsen, Science 181, 223 (1973).

3. C. Levinthal, J. Chim. Phys. 65, 44 (1968).

4. H. Frauenfelder, K. Chu and R. Philipp, “Physics From Proteins” in Biologically Inspired Physics, ed. L. Peliti (Plenum Press, New York, 1991).

5. H. S. Chan and K. A. Dill, Physics Today 24 (1993), and references therein.

6. S. Takahashi, S.-R. Yeh, T. K. Das, C.-K. Chan, D. S. Gottfried and D. L. Rousseau, Nature Struc. Biology 4, 45 (1997); S.-R. Yeh, S. Takahashi, B. Fan and D. L. Rousseau, ibid. 4, 51 (1997); M. M. Millonas and D. A. Hanck, Phys. Rev. Lett. 80, 401 (1998).

7. G. Williams and D. C. Watts, Trans. Faraday Soc. 66, 80 (1970); M. F. Schlesinger and E. W. Montroll, Proc. Natl. Acad. Sci. USA 81, 1280 (1984).



8. J. U. Bowie, R. Luty and D. Eisenberg, Science 253, 164 (1991); M. Levitt and C. Chothia, Nature 261, 552 (1992); J. U. Bowie and D. Eisenberg, Curr. Opin. Struc. Biol. 3, 437 (1993); M. Wilmanns and D. Eisenberg, Proc. Natl. Acad. Sci. USA 90, 1379 (1993).

9. A. Sali, E. I. Shakhnovich and M. Karplus, Nature 369, 248 (1994); E. I. Shakhnovich, Phys. Rev. Lett. 72, 3907 (1994); P. G. Wolynes, J. N. Onuchic and D. Thirumalai, Science 267, 1619 (1995); J. Wang, J. Onuchic and P. Wolynes, Phys. Rev. Lett. 76, 4861 (1996); P.-A. Lindgard and H. Bohr, ibid. 77, 779 (1996); H. Li, R. Helling, C. Tang and N. S. Wingreen, Science 273, 666 (1996); H. Li, C. Tang and N. S. Wingreen, Phys. Rev. Lett. 79, 765 (1997); T. Haliloglu, I. Bahar and B. Erman, ibid. 79, 3090 (1997); H. J. Bussemaker, D. Thirumalai and J. K. Bhattacharjee, ibid. 79, 3530 (1997); E. D. Nelson, L. F. Teneyck and J. N. Onuchic, ibid. 79, 3534 (1997); C. Micheletti, F. Seno, A. Maritan and J. R. Banavar, ibid. 80, 2237 (1998).

10. This is a toy for kids, which was invented by Rubik, who once created the famous Rubik cube. E. Rubik, Magic Snake (Tsukuda Original, Tokyo, 1996). The latest version of this toy has recently become available: Poki Poki Magic Snake (Tsukuda Original, Tokyo, 1998).

11. K. Iguchi, Mod. Phys. Lett. B12, 499 (1998).

12. C. Kittel, Introduction to Solid State Physics, 7th edition (Wiley, New York, 1996); N. W. Ashcroft and N. D. Mermin, Solid State Physics (Saunders College, New York, 1976).

13. P. J. Steinhardt and S. Ostlund, The Physics of Quasicrystals (World Scientific, Singapore, 1987).

14. D. E. Ingber, Scientific American 278, 30 (1998). One of the simplest tensegrity models has been sold as a toy for babies (called Squwish) from the Boston Museum.

15. This statement is valid only if the system is regarded as a classical object. If one considers the quantum mechanical characters of the model such as the electronic spectrum and vibrations, then it becomes false, because the folded structure for each helicity provides different eigenvalues. This will be discussed later.

16. M. Go, Nature 290, 90 (1981); Proc. Natl. Acad. Sci. (USA) 80, 1964 (1983); M. Go and M. Nosaka, Cold Spring Harb. Symp. Quant. Biol. 52, 915 (1987); C. Titiger, S. Whyard and V. K. Walker, Nature 361, 470 (1993).

17. For example, see M. Nakahara, Geometry, Topology and Physics (Adam Hilger, New York, 1990).

18. The relation, $I^{\mathrm{unfold}}_m > I^{\mathrm{fold}}_m$, is obvious from the geometry of the unfolded and the folded structures of the magic snake chain. If I explicitly calculate them, then I get $I^{\mathrm{unfold}}_m = 330 I_0$ and $I^{\mathrm{fold}}_m \approx 10.8 I_0$ with $I_0 = ma^2$, where a is the side length of a square surface and m the mass of a segment. Hence, $I^{\mathrm{unfold}}_m > I^{\mathrm{fold}}_m$. The difference $\Delta E_{\mathrm{el}} \equiv E^{\mathrm{unfold}}_{\mathrm{el}} - E^{S=0,\pm 1}_{\mathrm{el}}$ will be discussed in the later sections.

19. I would like to comment here that the structure shown in Fig. 24 looks very similar to the three-dimensional structure of insulin, where both structures exhibit a single three-fold symmetry axis. This is a remarkable coincidence between the folded structure of Rubik’s magic snake model and a real globular protein structure. There is also a striking similarity between the former and the structure of cytochrome c. See Protein Data Bank (www.pdb.bnl.gov). Hence, this shows a kind of reality of our mathematical approach to the problem of the real PF. On the other hand, the standard lattice models of Ref. 9 have been unable to predict such physically meaningful three-dimensional structures.
