Some Mathematical Challenges from Life Sciences
Part III
Peter Schuster, Universität WienPeter F.Stadler, Universität Leipzig
Günter Wagner, Yale University, New Haven, CTAngela Stevens, Max-Planck-Institut für Mathematik in den
Naturwissenschaften, Leipzigand
Ivo L. Hofacker, Universität Wien
Oberwolfach, GE, 16.-21.11.2003
1. Mathematics and the life sciences in the 21st century
2. Selection dynamics
3. RNA evolution in silico and optimization of structure and properties
OCH2
OHO
O
PO
O
O
N1
OCH2
OHO
PO
O
O
N2
OCH 2
OHO
PO
O
O
N3
OCH2
OHO
PO
O
O
N4
N A U G Ck = , , ,
3' - end
5 ' - end
Na
Na
Na
Na
5 '-end 3 ’-endGCG GAU AUUCG CUUA AG UUG GG A G CUG AAG A AGGUC UUCGAUC A ACCAGCUC GAG C CCAGA UCUGG CUGUG CACAG
3 '-end
5 ’-end
70
60
50
4030
20
10
Definition of RNA structure
0
421 8 16
10
19
9
14
6
13
5
11
3
7
12
21
17
22
18
25
20
26
24
28
272315 29 30
31
B in a r y seq u en c es a re e n co d edb y th e ir d ec im a l e q u iv a len ts :
= 0 a n d = 1 , fo r ex a m p le ,
" 0 " 0 0 0 0 0 =
" 1 4 " 0 111 0 = ,
" 2 9 " 111 0 1 = , e tc .
,
C
C CC CC
C C
C
G
GGG
GGG G
M u ta n t c la ss
0
1
2
3
4
5
Sequence space of binary sequences of chain lenght n=5
Population and population support in sequence space: The master sequence
s p ac e
S eq uence
Con
cent
rati
on
M a s te r s e q u e n ce
P o p u la tio n s u p p o r t
M a s te r s e q u e n ce
Population and population support in sequence space: The quasi-species
spac e
S equ enc e
Con
cent
ratio
n
M aste r seq uen ce
P o p u la tio n su pp o rt
M aste r seq uen ce
The increase in RNA production rate during a serial transfer experiment
Decrease in m ean fitnessdue to quasispecies form ation
Sk I. = ( )fk f S k = ( )
S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs
Mapping from sequence space into structure space and into function
Folding of RNA sequences into secondary structures of minimal free energy, G0300
G
G
G
G
GG
G GG
G
GG
G
G
G
G
U
U
U
U
UU
U
U
U
UU
A
A
AA
AA
AA
A
A
A
A
U
C
C
CC
C
C
C
C
C
C
C
C
5 ’-en d 3 ’-en d
GGCGCGCCCGGCGCC
GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA
UGGUUACGCGUUGGGGUAACGAAGAUUCCGAGAGGAGUUUAGUGACUAGAGG
RNAStudio.lnk
The Hamming distance between structures in parentheses notation forms a metric
in structure space
H a m m in g d is ta n c e d (S ,S ) = H 1 2 4
d (S ,S ) = 0H 1 1
d (S ,S ) = d (S ,S )H H1 2 2 1
d (S ,S ) d (S ,S ) + d (S ,S )H H H1 3 1 2 2 3
( i)
( ii )
( ii i)
f0 f
f1
f2
f3
f4
f6
f5f7
Replication rate constant:
fk = / [ + dS (k)]
dS (k) = dH(Sk,S)
Evaluation of RNA secondary structures yields replication rate constants
The flowreactor as a device for studies of evolution in vitro and in silico
Replication rate constant:
fk = / [ + dS (k)]
dS (k) = dH(Sk,S)
Selection constraint:
# RNA molecules is controlled by the flow
Stock S olution Reaction M ixture
NNtN )(
spaceS equen ce
Con
cent
rati
on
M a s te r s e q u e n c e
M u ta n t c lo u d
“ O ff - th e -c lo u d ” m u ta tio n s
The molecular quasispecies in sequence space
S = ( )I
f S = ( )
S
f
I M
utat
ion
G e n o ty p e-P h e n o ty p e M a p p in g
E va lua t ion o f th e
P hen o ty pe
Q jI1
I2
I3
I4 I5
In
Q
f1
f2
f3
f4 f5
fn
I1
I2
I3
I4
I5
I
In + 1
f1
f2
f3
f4
f5
f
f n + 1
Q
Evolutionary dynamics including molecular phenotypes
In silico optimization in the flow reactor: Trajectory (biologists‘ view)
Tim e (arbitrary units)
Ave
rage
dis
tanc
e fr
om in
itial
str
uctu
re 5
0 -
d
S
500 750 1000 12502500
50
40
30
20
10
0
Evolutionary trajectory
In silico optimization in the flow reactor: Trajectory (physicists‘ view)
Tim e (arbitrary units)
Ave
rage
str
uctu
re d
ista
nce
to ta
rget
d
S
500 750 1000 12502500
50
40
30
20
10
0
Evolutionary trajectory
Movies of optimization trajectories over the AUGC and the GC alphabet
AUGC GC
Statistics of the lengths of trajectories from initial structure to target (AUGC-sequences)
R u n tim e o f tra je c to rie s
Fre
quen
cy
0 10 00 20 00 30 00 40 00 50 000
0 .0 5
0 .1
0 .1 5
0 .2
44A
ve
rag
e s
tru
ctu
re d
ista
nc
e t
o t
arg
et
d
S
Evolutionary trajectory
1250
10
0
44
42
40
38
36Relay steps
Nu
mb
er o
f rela
y s
tep
T ime
Endconformation of optimization
4443A
ve
rag
e s
tru
ctu
re d
ista
nc
e t
o t
arg
et
d
S
Evolutionary trajectory
1250
10
0
44
42
40
38
36Relay steps
Nu
mb
er o
f rela
y s
tep
T ime
Reconstruction of the last step 43 44
444342A
ve
rag
e s
tru
ctu
re d
ista
nc
e t
o t
arg
et
d
S
Evolutionary trajectory
1250
10
0
44
42
40
38
36Relay steps
Nu
mb
er o
f rela
y s
tep
T ime
Reconstruction of last-but-one step 42 43 ( 44)
Reconstruction of step 41 42 ( 43 44)
44434241A
ve
rag
e s
tru
ctu
re d
ista
nc
e t
o t
arg
et
d
S
Evolutionary trajectory
1250
10
0
44
42
40
38
36Relay steps
Nu
mb
er o
f rela
y s
tep
T ime
Reconstruction of step 40 41 ( 42 43 44)
4443424140A
ve
rag
e s
tru
ctu
re d
ista
nc
e t
o t
arg
et
d
S
Evolutionary trajectory
1250
10
0
44
42
40
38
36Relay steps
Nu
mb
er o
f rela
y s
tep
T ime
444342414039
Evolutionary process
Reconstruction
Av
era
ge
str
uc
ture
dis
tan
ce
to
ta
rge
t
dS
Evolutionary trajectory
1250
10
0
44
42
40
38
36Relay steps
Nu
mb
er o
f rela
y s
tep
T ime
Reconstruction of the relay series
Change in RNA sequences during the final five relay steps 39 44
Transition inducing point mutations Neutral point mutations
In silico optimization in the flow reactor: Trajectory and relay steps
Tim e (arbitrary units)
Ave
rage
str
uctu
re d
ista
nce
to ta
rget
d
S
500 750 1000 12502500
50
40
30
20
10
0
Evolutionary trajectory
Relay steps
S kS j
kP
kP
P
Birth-and-death process with immigration
NN-10 1 2 3 4 5 6 7 8 9 10
x
(x) = x + ( -x) N (x) = x
T1,0 T0,1
Tim e t
Par
ticle
nu
mbe
r
(t)
X
0
2
4
6
8
10
12
Calculation of transition probabilities by means of a birth-and-death process with immigration
1008
1214
Tim e (arbitrary units)
Av
era
ge
str
uc
ture
dis
tan
ce
to
ta
rge
t
dS
5002500
20
10
Uninterrupted presence
Evolutionary trajectory
Nu
mb
er o
f relay
ste
p
Neutral genotype evolution during phenotypic stasis
Neutral point mutationsTransition inducing point mutations
28 neutral point mutations during a long quasi-stationary epoch
18
19
20
21
26
28
29 31
A random sequence of minor or continuous transitions in the relay series
Tim e (arbitrary units)
750 1000 1250Av
era
ge
str
uc
ture
dis
tan
ce
to
ta
rget
d
S
30
20
10
Uninterrupted presence
Evolutionary trajectory
35
30
25
20 Nu
mb
er o
f relay
step
A random sequence of minor or continuous transitions in the relay series
18
192527
202224
2123
2630
28
29 31
Tim e (arbitrary units)
750 1000 1250Ave
rag
e st
ruct
ure
dis
tan
ce t
o t
arg
et
dS
30
20
10
Uninterrupted presence
Evolutionary trajectory
35
30
25
20 Nu
mb
er of relay step
A random sequence of minor or continuous transitions in the relay series
100
101
102
103
104
105
Rank
10-6
10-5
10-4
10-3
10-2
10-1
Fre
quen
cy o
f occ
urre
nce
5 '-E nd
3 '-E nd
7 0
6 0
5 0
4 03 0
2 0
1 01 0
25
Probability of occurrence of different structures in the mutational neighborhood of tRNAphe
Rare neighbors
Main transitions
Frequent neighbors
Minor transitions
In silico optimization in the flow reactor: Main transitions
M ain transitionsRelay steps
Tim e (arbitrary units)
Ave
rage
str
uctu
re d
ista
nce
to ta
rget
d
S
500 750 1000 12502500
50
40
30
20
10
0
Evolutionary trajectory
00 09 31 44
Three important steps in the formation of the tRNA clover leaf from a randomly chosen initial structure corresponding to three main transitions.
Statistics of the numbers of transitions from initial structure to target (AUGC-sequences)
N u m b er o f tra ns ition s
Freq
uenc
y
0 20 40 60 80 1000
0 .05
0 .1
0 .15
0 .2
0 .25
0 .3
A ll tran sitio n s
M ain tra n sit ion s
Alphabet Runtime Transitions Main transitions No. of runs
AUGC 385.6 22.5 12.6 1017 GUC 448.9 30.5 16.5 611 GC 2188.3 40.0 20.6 107
Statistics of trajectories and relay series (mean values of log-normal distributions)
S 1(j)
S k(j)
S 2(j)
S 3(j)
S m( j )
k
k
k k
k
P
P
P P
P
P
Transition probabilities determining the presence of phenotype Sk(j) in the population
S 1(j)
S k(j)
S 2(j)
S 3(j)
S m( j )
k
k
k k
k
P
P
P P
P
P
N = sa t(j)
p . . < >l (j)
1
Statistics of evolutionary trajectories
Population size
N
Number of replications
< n >rep
Number of transitions
< n >tr
Number of main transitions
< n >dtr
The number of main transitions or evolutionary innovations is constant.
1008
1214
Tim e (arbitrary units)
Av
era
ge
str
uc
ture
dis
tan
ce
to
ta
rge
t
dS
5002500
20
10
Uninterrupted presence
Evolutionary trajectory
Nu
mb
er o
f relay
ste
p
Neutral genotype evolution during phenotypic stasis
Neutral point mutationsTransition inducing point mutations
28 neutral point mutations during a long quasi-stationary epoch
Variation in genotype space during optimization of phenotypes
Mean Hamming distance within the population and drift velocity of the population center in sequence space.
Spread of population in sequence space during a quasistationary epoch: t = 150
Spread of population in sequence space during a quasistationary epoch: t = 170
Spread of population in sequence space during a quasistationary epoch: t = 200
Spread of population in sequence space during a quasistationary epoch: t = 350
Spread of population in sequence space during a quasistationary epoch: t = 500
Spread of population in sequence space during a quasistationary epoch: t = 650
Spread of population in sequence space during a quasistationary epoch: t = 820
Spread of population in sequence space during a quasistationary epoch: t = 825
Spread of population in sequence space during a quasistationary epoch: t = 830
Spread of population in sequence space during a quasistationary epoch: t = 835
Spread of population in sequence space during a quasistationary epoch: t = 840
Spread of population in sequence space during a quasistationary epoch: t = 845
Spread of population in sequence space during a quasistationary epoch: t = 850
Spread of population in sequence space during a quasistationary epoch: t = 855
Sk I. = ( )fk f S k = ( )
S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs
Mapping from sequence space into structure space and into function
Sk I. = ( )fk f S k = ( )
S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs
Sk I. = ( )fk f S k = ( )
S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs
The pre-image of the structure Sk in sequence space is the neutral network Gk
Neutral networks are sets of sequences forming the same structure.
Gk is the pre-image of the structure Sk in sequence space:
Gk = -1(Sk) {Ij | (Ij) = Sk}
The set is converted into a graph by connecting all sequences of Hamming distance one.
Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA
sequences of a given chain length. This number, N=4n , becomes very large with increasing length, and is prohibitive for numerical computations.
Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
Mean degree of neutrality and connectivity of neutral networks
j = 2 7 = 0 .4 4 4 ,/1 2 k = (k )j
| |G k
c r = 1 - -1 ( 1 )/ -
k c r . . . .>
k c r . . . .<
n e tw o rk is c o n n e c te dG k
n e tw o rk is c o n n e c te dn o tG k
C o n n e c tiv i ty th re sh o ld :
A lp h a b e t s iz e : = 4 AUG C
G S Sk k k= ( ) | ( ) = -1 I Ij j
c r
2 0 .5
3 0 .4 2 3
4 0 .3 7 0
GC,AU
GUC,AUG
AUGC
A connected neutral network
G ian t C om po n en t
A multi-component neutral network
5 '-E n d5 '-E n d 5 '-E n d5 '-E n d
3 '-E n d3 '-E n d 3 '-E n d3 '-E n d
7070 7070
60606060
5050 5050
4040 4040 3030 3030
2020 20
20
1010 1010
A lp h a b e t D eg ree o f n eu tra lity
AU
AUG
AUGC
UGC
GC
- -
- -
0 .2 7 5 0 .0 6 4
0 .2 6 3 0 .0 7 1
0 .0 5 2 0 .0 3 3
- -
0 .2 1 7 0 .0 5 1
0 .2 7 9 0 .0 6 3
0 .2 5 7 0 .0 7 0
0 .0 5 7 0 .0 3 4
0 .0 7 3 0 .0 3 2
0 .2 0 1 0 .0 5 6
0 .3 1 3 0 .0 5 8
0 .2 5 0 0 .0 6 4
0 .0 6 8 0 .0 3 4
Degree of neutrality of cloverleaf RNA secondary structures over different alphabets
Reference for postulation and in silico verification of neutral networks
Stru ctu re
CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG
C o m p a tib le seq u en ceStru ctu re
5 ’-en d
3 ’-en d
CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG
G
G
G
G
G
G
G
C
C
C
C G
G
G
G
C
C
C
C
C
C
C
U A
U
U
G
U
AA
A
A
U
C o m p a tib le seq u en ceStru ctu re
5 ’-en d
3 ’-en d
CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG
G
G
C
C
C
C G
G
G
G
C
C
G
G
G
G
G
C
C
C
C
C
U A
U
U
G
U
AA
A
A
U
C o m p a tib le seq u en ceStru ctu re
5 ’-en d
3 ’-en d
B a se p a ir s : AU , UAG C , CGG U , UG
S in g le n u c le o tid e s : A U G C, , ,
The compatible set Ck of a structure Sk consists of all sequences which form
Sk as its minimum free energy structure (the neutral network Gk) or one of its
suboptimal structures.
G kN e u tra l N e tw o rk
S truc tu re S k
G k C k
C o m p a tib le S e t C k
The intersection of two compatible sets is always non empty: C0 C1
S tru c tu re S 0
S tru c tu re S 1
Reference for the definition of the intersection and the proof of the intersection theorem
A sequence at the intersection of two neutral networks is compatible with both structures
CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG
3 ’ -e n d
Min
imum
fre
e en
ergy
con
form
atio
n S
0
Sub
opti
mal
con
form
atio
n S 1
G
G
G
G
G
G
G
G
G
G
G
GC
C
C
C
U
U
U
U
C
C
C
C
C
C
U
A
A
A
A
A
C
G
G
G
G
G
G
C
C
C
C
U
U
G
G
G
G
G
C
C
C
C
C
C
C
U
U
AA
A
A
A
U
G
5.10
5.9
02
8
1415
1817
23
19
2722
38
45
25
3633
3940
4341
3.30
7.40
5
3
7
4
109
6
1312
3.1
0
11
21
20
16
2829
26
3032
424644
24
353437
49
31
4748
S 0S 1
b asin '1 '
lo n g liv in gm e tastab le s tru c tu re
b as in '0 '
m in im u m free en erg y s tru c tu re
Barrier tree for two long living structures
A ribozyme switch
E.A.Schultes, D.B.Bartel, Science 289 (2000), 448-452
Two ribozymes of chain lengths n = 88 nucleotides: An artificial ligase (A) and a natural cleavage ribozyme of hepatitis--virus (B)
The sequence at the intersection:
An RNA molecules which is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
Examples of smooth landscapes on Earth
Massif Central
Mount Fuji
Examples of rugged landscapes on Earth
Dolomites
Bryce Canyon
G en o ty p e S p ac e
Fitn
ess
Sta r t o f W a lk
E n d o f W a lk
Evolutionary optimization in absence of neutral paths in sequence space
Evolutionary optimization including neutral paths in sequence space
G e n o ty p e S p a ce
Fitn
ess
Sta r t o f W a lk
E n d o f W a lk
R an d o m D rif t P e rio d s
A d ap tiv e P erio d s
Example of a landscape on Earth with ‘neutral’ ridges and plateaus
Grand Canyon
Neutral ridges and plateus
Top Related