Princeton University
IMA, Minneapolis, January 2008
I.G. Kevrekidis, with C. W. Gear, G. Hummer, R. Coifman and several other good people
Department of Chemical Engineering, PACM & Mathematics, Princeton University, Princeton, NJ 08544
Computational Experiments in Coarse Graining Atomistic Simulations
(Equation-Free & Variable-Free Framework for Complex/Multiscale Systems)
SIAM– July, 2004
Clustering and stirring in a plankton model
Young, Roberts and Stuhne, Nature 2001
Dynamics of System with convection
Simulation Method
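• Random birth and death with equal probabilities: $\lambda = \mu$.
• Brownian motion: $x_k(t) \to x_k'(t) = x_k(t) + \delta x_k(t)$, with $\langle \delta x_k^2 \rangle = 2 D \tau$.
• Advective stirring ($\phi$ and $\theta$ are random phases):
  $x_k(t+\tau) = x_k'(t) + \frac{U\tau}{2} \cos[k\, y_k'(t) + \phi(t)]$
  $y_k(t+\tau) = y_k'(t) + \frac{U\tau}{2} \cos[k\, x_k'(t) + \theta(t)]$
• IC: 20000 particles randomly placed in a 1 × 1 box.
• Analytical equation for G(r):
  $\frac{\partial G}{\partial t} = 2D\,\frac{1}{r}\frac{\partial}{\partial r}\left(r \frac{\partial G}{\partial r}\right) + \gamma\,\frac{1}{r}\frac{\partial}{\partial r}\left(r^3 \frac{\partial G}{\partial r}\right) + 2(\lambda - \mu)\,G + 2\lambda C\,\delta(r)$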
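The simulation method listed on this slide can be sketched as one update cycle per particle ensemble. This is a minimal illustration, not the authors' code: the function name `bug_cycle`, all parameter values, the periodic unit box, and the choice to evaluate both stirring displacements at the post-Brownian positions are assumptions.

```python
import numpy as np

def bug_cycle(x, y, rng, p=0.5, D=1e-5, U=0.1, tau=1.0, k=2 * np.pi):
    """One cycle of birth/death, Brownian motion and random-phase stirring."""
    # Birth/death: each particle divides with probability p, dies with the
    # same probability p, and is otherwise unchanged (requires p <= 0.5).
    u = rng.random(x.size)
    copies = np.where(u < p, 2, np.where(u < 2 * p, 0, 1))
    x, y = np.repeat(x, copies), np.repeat(y, copies)

    # Brownian motion: Gaussian kicks with variance 2*D*tau per coordinate.
    sigma = np.sqrt(2 * D * tau)
    x = x + sigma * rng.standard_normal(x.size)
    y = y + sigma * rng.standard_normal(y.size)

    # Stirring: sinusoidal displacements with fresh random phases each cycle.
    phi, theta = rng.uniform(0, 2 * np.pi, size=2)
    x_new = x + 0.5 * U * tau * np.cos(k * y + phi)
    y_new = y + 0.5 * U * tau * np.cos(k * x + theta)

    # Assumed periodic unit box.
    return np.mod(x_new, 1.0), np.mod(y_new, 1.0)

rng = np.random.default_rng(0)
x, y = rng.random(20000), rng.random(20000)
for _ in range(10):
    x, y = bug_cycle(x, y, rng)
```

Because offspring are placed on top of their parent and only diffuse apart slowly, repeated cycles tend to produce the clustering that the pair correlation G(r) measures.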
Stirring by a random field (color = y)
Dynamics of System with convection
Projective Integration: From t=2,3,4,5 to 10
RESTRICTION: a many-to-one mapping from a high-dimensional description (such as a collection of particles in a Monte Carlo simulation) to a low-dimensional description (such as a finite element approximation to the distribution of the particles).
LIFTING: a one-to-many mapping from low- to high-dimensional descriptions.
We do the step-by-step simulation in the high-dimensional description.
We do the macroscopic tasks in the low-dimensional description.
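The lift / simulate / restrict / project loop can be sketched as follows. This is a toy illustration under stated assumptions: the "microscopic code" is a hypothetical stand-in (noisy particles relaxing toward zero), the coarse variable is the ensemble mean, and the names `lift`, `restrict`, `micro_step` and `coarse_projective_step` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def lift(u, n=20000):
    """LIFTING: one coarse value -> an ensemble of particles with that mean."""
    particles = 0.1 * rng.standard_normal(n)
    return particles - particles.mean() + u

def restrict(particles):
    """RESTRICTION: ensemble of particles -> one coarse value (the mean)."""
    return float(particles.mean())

def micro_step(particles, dt):
    """Hypothetical microscopic simulator: relaxation toward 0 plus noise."""
    noise = np.sqrt(2 * 0.01 * dt) * rng.standard_normal(particles.size)
    return particles - particles * dt + noise

def coarse_projective_step(u, dt=0.01, n_inner=10, jump=0.2):
    """Short micro burst, estimate d(coarse)/dt, then take a large projective step."""
    particles = lift(u)
    history = [restrict(particles)]
    for _ in range(n_inner):
        particles = micro_step(particles, dt)
        history.append(restrict(particles))
    t = dt * np.arange(len(history))
    slope = np.polyfit(t[-5:], history[-5:], 1)[0]  # estimated coarse time derivative
    return history[-1] + jump * slope, t[-1] + jump

u, time = 1.0, 0.0
for _ in range(5):
    u, elapsed = coarse_projective_step(u)
    time += elapsed
```

Here the coarse equation (du/dt = -u) is never written down in the projective step; it is only used implicitly through the micro simulator, which is the point of the approach.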
So, the main points:
• You have a “microscopic code”
• Somebody tells you what are good coarse variable(s)
• Somebody tells you what KIND of equation this variable satisfies (deterministic, stochastic…) but NOT what the equation looks like.
• Then you can use the IDEA that such an equation exists and closes to accelerate the simulation / extraction of information.
• KEY POINT – make micro states consistent with macro states
Application to Micelle Formation
• Hydrophobic tail (T)
• Hydrophilic head (H)
Lattice Model (Larson et al., 1985)
• Surfactant = chain of H and T beads (H4T4)
• No explicit solvent
• Hydrophobic-hydrophilic interactions modeled as direct interaction between H and T beads
• Pairwise interactions with nearest sites:
  $E = \sum_{\langle i,j \rangle} \epsilon_{ij}$, with $\epsilon_{TT} = -2$ and $\epsilon_{HT} = \epsilon_{HH} = 0$
Snapshot of Micellar System
T = 7.0, µ = - 47.40
Kinetic Approach to Rare Events (Hummer and Kevrekidis, JCP 118, 10762 (2003))
• Evolution of coarse variables is slow
  – micelle birth/death are rare events
• Reconstruction of free energy surface:
  – long equilibrium simulation
  – series of short nonequilibrium simulations
Reconstruction of Free Energy Surface
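$\frac{d\langle \xi \rangle}{dt} = -\frac{1}{\gamma(\xi)} \frac{\partial G}{\partial \xi}$
$\frac{d\,\mathrm{var}(\xi)}{dt} = \frac{2 k_B T}{\gamma(\xi)}$
$\gamma(\xi)\,\dot{\xi} = -\frac{\partial G}{\partial \xi} + F(t)$
Obtain G(ξ) and γ(ξ) from short-scale nonequilibrium simulations, where ξ is the coarse variable and γ(ξ) its friction coefficient.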
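One way to read this slide: the growth rate of the variance over a short burst gives the friction, and the drift of the mean then gives the local slope of the free energy. The sketch below assumes the relations d⟨ξ⟩/dt = -(1/γ) ∂G/∂ξ and d var(ξ)/dt = 2 k_B T/γ; the simulator `burst` is a hypothetical stand-in (overdamped Langevin dynamics in a harmonic G), and all names and parameter values are illustrative.

```python
import numpy as np

def burst(xi0, n_rep, n_steps, dt, rng, kT=1.0, gamma=2.0, kappa=3.0):
    """Hypothetical micro simulator: overdamped Langevin in G = kappa*xi^2/2."""
    xi = np.full(n_rep, xi0, dtype=float)
    traj = [xi.copy()]
    for _ in range(n_steps):
        kick = np.sqrt(2 * kT / gamma * dt) * rng.standard_normal(n_rep)
        xi = xi - kappa * xi / gamma * dt + kick
        traj.append(xi.copy())
    return np.array(traj)  # shape (n_steps + 1, n_rep)

def estimate_gamma_and_slope(traj, dt, kT=1.0):
    """Fit the short-time growth of mean and variance across replicas."""
    t = dt * np.arange(traj.shape[0])
    mean_rate = np.polyfit(t, traj.mean(axis=1), 1)[0]
    var_rate = np.polyfit(t, traj.var(axis=1), 1)[0]
    gamma = 2 * kT / var_rate      # from d var/dt = 2 kT / gamma
    dG = -gamma * mean_rate        # from d<xi>/dt = -(1/gamma) dG/dxi
    return gamma, dG

rng = np.random.default_rng(0)
traj = burst(xi0=1.0, n_rep=20000, n_steps=20, dt=1e-3, rng=rng)
gamma_est, dG_est = estimate_gamma_and_slope(traj, dt=1e-3)
```

Repeating the burst at a grid of initial values and integrating the estimated slopes would give the free energy curve up to a constant.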
Reconstructed free energy curve
Results: Micelle Destruction Rate
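Kramers’ formula:
$k = \frac{1}{2\pi\,\gamma(\xi_{saddle})} \left[ G''(\xi_{min})\,|G''(\xi_{saddle})| \right]^{1/2} e^{-\Delta G / k_B T}$, with $\Delta G = G(\xi_{saddle}) - G(\xi_{min})$
• Result of nonequilibrium simulation: $k = 5.58 \times 10^{-9}$
• Equilibrium result: $k = 7.70 \times 10^{-9}$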
• CPU time required: less than 7% of equilibrium simulation
• Extension to multi-dimensional systems (Hummer and Kevrekidis, 2003)
– Chapman-Kolmogorov equation
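As a worked illustration of Kramers' formula in its standard high-friction form, the sketch below evaluates the rate for a hypothetical double-well free energy with constant friction. The function names, the double-well shape and the parameter values are assumptions chosen for the example; they are not the micelle free energy from the slides.

```python
import numpy as np

def second_derivative(G, x, h=1e-4):
    """Central finite-difference estimate of G''(x)."""
    return (G(x + h) - 2 * G(x) + G(x - h)) / h**2

def kramers_rate(G, x_min, x_saddle, gamma, kT):
    """k = sqrt(G''(min) |G''(saddle)|) / (2 pi gamma) * exp(-dG / kT)."""
    curv_min = second_derivative(G, x_min)
    curv_saddle = abs(second_derivative(G, x_saddle))
    barrier = G(x_saddle) - G(x_min)
    prefactor = np.sqrt(curv_min * curv_saddle) / (2 * np.pi * gamma)
    return prefactor * np.exp(-barrier / kT)

height = 5.0
G = lambda x: height * (x**2 - 1) ** 2   # minima at x = -1, +1; saddle at x = 0
k = kramers_rate(G, x_min=-1.0, x_saddle=0.0, gamma=1.0, kT=1.0)
```

Only the free energy at the minimum and the saddle, its curvature there, and the friction at the saddle enter, which is why short bursts near those points can suffice.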
Reverse Projective Integration: a sequence of outer integration steps backward, based on forward steps + estimation.
We are studying the accuracy and stability of these methods.
Reverse Integration: a little forward, and then a lot backward!
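A minimal sketch of "a little forward, and then a lot backward", assuming a deterministic scalar test problem dx/dt = -x in place of the coarse dynamics. The function name `reverse_projective_step` and the step sizes are illustrative choices.

```python
import numpy as np

def reverse_projective_step(x, f, dt=1e-3, n_inner=5, back=0.055):
    """A few forward Euler steps, then one long extrapolation backward in time."""
    history = [x]
    for _ in range(n_inner):
        history.append(history[-1] + dt * f(history[-1]))
    slope = (history[-1] - history[-2]) / dt   # time derivative estimated from forward steps
    x_new = history[-1] - back * slope         # project backward over a time 'back'
    return x_new, n_inner * dt - back          # net elapsed time (negative)

f = lambda x: -x
x, t = 1.0, 0.0
for _ in range(20):
    x, elapsed = reverse_projective_step(x, f)
    t += elapsed
```

Each cycle only ever runs the simulator forward, yet the net time is negative, so the trajectory climbs away from the stable state it would otherwise relax to.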
Reverse coarse integration from both sides
Reconstructed free energy curve
Details of Multidimensional Dynamics
Small clusters: 2d dynamics
Larger clusters: 1d dynamics
Multidimensional Dynamics (2nd variable = E)
Alanine Dipeptide in 700 TIP3P waters
w/ Gerhard Hummer, NIDDK / J. Chem. Phys. 03
[Figure panels: the waters; the dipeptide and the Ramachandran plot]
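[Figure: G(ψ) for ψ between −180 and 180, marking ψ0, ψ1, ⟨ψ⟩ and var⟨ψ⟩]
$\frac{d\langle\psi\rangle}{dt} = -\frac{D(\psi)}{k_B T}\,\frac{\partial G(\psi)}{\partial \psi}$
$\frac{d\,\mathrm{var}(\psi)}{dt} = 2 D(\psi)$
1. Start with constrained MD
2. Let 50 configurations free
3. Estimate d/dt of average
4. Perform projective step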
Alanine Dipeptide Energy Landscape
G. Hummer and I. G. Kevrekidis, Coarse molecular dynamics of a peptide fragment: Free energy, kinetics, and long-time dynamics computations, J. Chem. Phys. 118 (2003).
Protocols for Coarse MD (CMD) using Reverse Ring Integration
Initialize ring nodes
MD simulations run FORWARD in time
Step ring BACKWARD in energy
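Fokker-Planck Equation (2D) for distribution P(x1, x2)
2D FPE:
$\frac{\partial P}{\partial t} = \left[ -\frac{\partial}{\partial x_1} v_1(x_1,x_2) + \frac{\partial^2}{\partial x_1^2} D_{11}(x_1,x_2) + \frac{\partial^2}{\partial x_1 \partial x_2} D_{12}(x_1,x_2) - \frac{\partial}{\partial x_2} v_2(x_1,x_2) + \frac{\partial^2}{\partial x_2 \partial x_1} D_{21}(x_1,x_2) + \frac{\partial^2}{\partial x_2^2} D_{22}(x_1,x_2) \right] P$
Probability currents:
$\frac{\partial P(x_1,x_2)}{\partial t} = -\frac{\partial S_1}{\partial x_1} - \frac{\partial S_2}{\partial x_2}$
$S_i = v_i P - \frac{\partial}{\partial x_1}\left(D_{i1} P\right) - \frac{\partial}{\partial x_2}\left(D_{i2} P\right)$
Here $v_i$ are the drift coefficients and $D_{ij}$ the diffusion coefficients.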
Drift and Diffusion Coefficients
MD simulations run FORWARD in time from the ring node ICs
• multiple replicas per node
• compute drift (v) and diffusion (D) coefficients
• use (v, D) estimates to check for existence of a potential (Φ) and compute potential values at each node
[Figure: P over the dihedral angles]
Ring Stepping in Generalized Potential
Goal: Compute potential associated with ring nodes (ring “height” above x1-x2 plane)
[Figure: rings at Φ and Φ + ΔΦ above the (x1, x2) plane]
Right-handed α-helical minimum
Potential Conditions I
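Case I: Diffusion matrix is proportional to unit matrix: $D_{ij} = D\,\delta_{ij}$ (D = scalar)
Probability current: $S_i = P \left[ v_i - D\,\frac{\partial \ln P}{\partial x_i} \right]$
Probability current vanishes: $v_i = D\,\frac{\partial \ln P}{\partial x_i} = -D\,\frac{\partial \Phi}{\partial x_i}$
Conditions for existence of generalized potential Φ: $\frac{\partial v_1}{\partial x_2} = \frac{\partial v_2}{\partial x_1}$
Potential difference from the drift coefficient, along the path $(x_1^a, x_2^a) \to (x_1^b, x_2^a) \to (x_1^b, x_2^b)$:
$\Phi(x_1^b, x_2^b) - \Phi(x_1^a, x_2^a) = -\int_{x_1^a}^{x_1^b} D^{-1}\, v_1(x_1, x_2^a)\,dx_1 - \int_{x_2^a}^{x_2^b} D^{-1}\, v_2(x_1^b, x_2)\,dx_2$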
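The potential conditions for the scalar-diffusion case can be checked numerically along the following lines. This is a sketch under assumptions: the drift field `drift` is a made-up example derived from the quadratic potential Φ = x1² + x1·x2 + 2·x2², the helper names are hypothetical, and the path is the two-leg path (first along x1, then along x2).

```python
import numpy as np

D = 0.5  # assumed scalar diffusivity

def drift(x1, x2):
    """Example drift v = -D grad(Phi) for Phi = x1^2 + x1*x2 + 2*x2^2."""
    return -D * (2 * x1 + x2), -D * (x1 + 4 * x2)

def potential_exists(v, x1, x2, h=1e-5, tol=1e-6):
    """Check the condition dv1/dx2 == dv2/dx1 at one point by central differences."""
    dv1_dx2 = (v(x1, x2 + h)[0] - v(x1, x2 - h)[0]) / (2 * h)
    dv2_dx1 = (v(x1 + h, x2)[1] - v(x1 - h, x2)[1]) / (2 * h)
    return abs(dv1_dx2 - dv2_dx1) < tol

def trapezoid(values, s):
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(s)))

def potential_difference(v, a, b, D, n=201):
    """Phi(b) - Phi(a) along (a1, a2) -> (b1, a2) -> (b1, b2)."""
    s1 = np.linspace(a[0], b[0], n)
    leg1 = trapezoid(v(s1, a[1])[0], s1)   # integrate v1 along x1 at x2 = a2
    s2 = np.linspace(a[1], b[1], n)
    leg2 = trapezoid(v(b[0], s2)[1], s2)   # integrate v2 along x2 at x1 = b1
    return -(leg1 + leg2) / D

ok = potential_exists(drift, 0.3, -0.7)
delta_phi = potential_difference(drift, a=(0.0, 0.0), b=(1.0, 2.0), D=D)
```

In a ring-stepping setting the drift values would come from estimates at the ring nodes rather than from a formula, and the same quadrature would give the node "heights".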
CMD Exploration of Alanine Dipeptide using drift coefficients
Reverse ring integration stagnates at saddle points on the coarse free energy landscape
Ring nodes: short bursts of MD simulation initialized at each node in the ring
Data analysis of forward-in-time MD provides gradient information
• small step FORWARD in time
• large step BACKWARD in energy
ONLY drifts are estimated here: the ring step size is scaled by the unknown diffusivity D, so the nodal “heights” are unknown
Src homology 3 (SH3) domain
Small Modular Domain 55-75 amino acids long
Characteristic fold consisting of five or six β-strands arranged as two tightly packed anti-parallel β sheets
[Figure: SH3 structure colored blue (N-terminus) to red (C-terminus); labeled features: distal loop, n-Src loop, diverging turn, N-terminus, C-terminus]
Protein modeled using off-lattice Cα representation associated with simplified minimally frustrated Hamiltonian
P. Das, S. Matysiak, and C. Clementi, Proc. Natl. Acad. Sci. USA 102, 10141 (2005).
[Figure: axes are fraction of native contacts formed and fraction of non-native contacts formed; regions labeled FOLDED, UNFOLDED, TRANSITION STATE]
Protein folding is intrinsically low-dimensional
“Collective” (coarse, slow) coordinates fully describe long-time system dynamics
Reverse Ring Integration and MD: Coarse MD (CMD)
Initialize ring nodes
MD simulations run FORWARD in time
Step ring BACKWARD in time
Protein conformations “live” in high-dimensional space described by Cartesian coordinates
Free energy landscape exploration using coarse reverse integration “backward-in-time” initialized near base of wells
[Figure: reconstructed folding free energy surface of SH3; axes are fraction of native contacts formed and fraction of non-native contacts formed]
0.05 ps MD forward, 0.2 ps backward time step
Transition state identification using CMD
So, again, the main points
• Somebody needs to tell you what the coarse variables are
• And what TYPE of equation they satisfy
• And then you can use this information
to bias the simulations “intelligently”
accelerating the extraction of information
(also need hypothesis testing).
and now for something completely different: Little stars ! (well…. think fishes)
Fish Schooling Models
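Initial State: Position, Direction, Speed $c_i(t),\ v_i(t),\ s_i$
Compute Desired Direction
Zone of Deflection ($R_{ij} < \alpha$):
$d_i(t+\Delta t) = -\sum_{j \ne i} \frac{c_j(t) - c_i(t)}{|c_j(t) - c_i(t)|}$
Zone of Attraction ($R_{ij} < \rho$):
$d_i(t+\Delta t) = \sum_{j \ne i} \frac{c_j(t) - c_i(t)}{|c_j(t) - c_i(t)|} + \sum_{j=1} \frac{v_j(t)}{|v_j(t)|}$
Normalize: $\hat{d}_i(t+\Delta t) = d_i(t+\Delta t) / |d_i(t+\Delta t)|$
Update Direction for INFORMED Individuals ONLY (UNINFORMED individuals keep $\hat{d}_i$):
$d_i'(t+\Delta t) = \frac{\hat{d}_i(t+\Delta t) + \omega\, g_i}{|\hat{d}_i(t+\Delta t) + \omega\, g_i|}$
Update Positions:
$c_i(t+\Delta t) = c_i(t) + v_i(t+\Delta t)\, s_i\, \Delta t$
Couzin, Krause, Franks & Levin (2005), Nature 433, 513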
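The update rules of the schooling model can be sketched in a few lines. This is a simplified illustration, not the published implementation: it omits any turning-rate limit or noise, assumes a single common preferred direction `g` and speed `s`, and the names `unit` and `school_step` and all parameter values are hypothetical.

```python
import numpy as np

def unit(a):
    """Normalize vectors along the last axis; zero vectors stay zero."""
    norm = np.linalg.norm(a, axis=-1, keepdims=True)
    return np.divide(a, norm, out=np.zeros_like(a), where=norm > 0)

def school_step(c, v, informed, g, s=1.0, dt=0.1, alpha=1.0, rho=6.0, omega=0.5):
    """One update of positions c and unit directions v (float arrays of shape (n, 2))."""
    n = len(c)
    d = np.zeros_like(c)
    for i in range(n):
        others = np.arange(n) != i
        r = c[others] - c[i]
        dist = np.linalg.norm(r, axis=1)
        if np.any(dist < alpha):
            # Zone of deflection: turn away from neighbours that are too close.
            d[i] = -unit(r[dist < alpha]).sum(axis=0)
        else:
            # Zone of attraction: move toward and align with neighbours within rho.
            near = dist < rho
            d[i] = unit(r[near]).sum(axis=0) + unit(v[others][near]).sum(axis=0) + unit(v[i])
        if not np.any(d[i]):
            d[i] = v[i]  # no social information: keep the current direction
    d_hat = unit(d)
    # Informed individuals balance the social direction with the preferred direction g.
    d_hat[informed] = unit(d_hat[informed] + omega * g)
    return c + d_hat * s * dt, d_hat

rng = np.random.default_rng(0)
c = rng.uniform(0.0, 3.0, size=(10, 2))
v = unit(rng.standard_normal((10, 2)))
informed = np.arange(10) < 3
c_new, v_new = school_step(c, v, informed, g=np.array([1.0, 0.0]))
```

Iterating `school_step` and recording, for example, the position of the informed individuals relative to the group centroid gives the kind of coarse observable discussed on the following slides.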
STUCK (STICK STATES)
• INFORMED individual close to front of group, away from centroid
• Reaction coordinate typically around a value of about 0.5
[Figure: stuck configuration, with the informed direction marked]
SLIP (SLIP STATES)
• INFORMED individual close to group centroid
• Wider range of reaction coordinate values for slip: 0 to 0.35
[Figure: slip configuration, with the informed direction marked]
Effective Fokker-Planck Equation
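FPE:
$\frac{\partial P(r,t)}{\partial t} = \left[ -\frac{\partial}{\partial r}\, v(r) + \frac{\partial^2}{\partial r^2}\, D(r) \right] P(r,t)$
Drift coefficient: $v(r) = \frac{\langle \Delta r(\Delta t, r_0) \rangle}{\Delta t}$; diffusion coefficient: $D(r) = \frac{1}{2}\,\frac{\langle [\Delta r(\Delta t, r_0)]^2 \rangle}{\Delta t}$, where $\Delta r(\Delta t, r_0)$ is the displacement over a short time $\Delta t$ of a trajectory started at $r_0 = r$.
Potential:
$\frac{\Phi(r)}{k_B T} = -\int_{0}^{r} \frac{v(r')}{D(r')}\,dr' + \ln D(r) + \mathrm{Const}$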
Coarse Free Energy Calculation
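Estimate Drift and Diffusion coefficients numerically from simulation “bursts” initialized at $X_0$:
$v(X_0) = \frac{\langle \Delta X(\Delta t, X_0) \rangle}{\Delta t}$
$D(X_0) = \frac{1}{2}\,\frac{\langle [\Delta X(\Delta t, X_0)]^2 \rangle}{\Delta t}$
$\frac{\Phi(X)}{k_B T} = -\int_{0}^{X} \frac{v(X')}{D(X')}\,dX' + \ln D(X) + K$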
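The coarse free energy calculation can be sketched end to end: run bursts from a grid of initial values, estimate drift and diffusion from the displacements, then integrate v/D and add ln D. The micro simulator `bursts` below is a hypothetical stand-in (a linear SDE with known drift and diffusion), the function names are illustrative, and the diffusion estimate uses the variance of the displacements rather than the raw second moment to reduce bias at finite burst length.

```python
import numpy as np

def bursts(X0, n_rep, n_steps, dt, rng, kappa=1.0, D=0.5):
    """Hypothetical micro simulator: dX = -kappa*X dt + sqrt(2 D) dW, many replicas per node."""
    X = np.repeat(X0[:, None], n_rep, axis=1).astype(float)
    for _ in range(n_steps):
        X += -kappa * X * dt + np.sqrt(2 * D * dt) * rng.standard_normal(X.shape)
    return X

def drift_diffusion(X0, X_end, burst_time):
    """Drift and diffusion estimates from burst displacements at each node."""
    dX = X_end - X0[:, None]
    v = dX.mean(axis=1) / burst_time
    D = dX.var(axis=1) / (2 * burst_time)
    return v, D

def effective_potential(X, v, D):
    """Phi/kT = -int v/D dX' + ln D, up to an additive constant (trapezoid rule)."""
    integrand = v / D
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(X)
    integral = np.concatenate(([0.0], np.cumsum(steps)))
    return -integral + np.log(D)

rng = np.random.default_rng(0)
X0 = np.linspace(-2.0, 2.0, 41)
X_end = bursts(X0, n_rep=20000, n_steps=10, dt=1e-3, rng=rng)
v, D = drift_diffusion(X0, X_end, burst_time=10 * 1e-3)
beta_phi = effective_potential(X0, v, D)
beta_phi -= beta_phi[20]   # fix the constant so that Phi(0) = 0
```

For this stand-in model the result should approximate X², since v/D = -2X and D is constant.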
Kopelevich, Panagiotopoulos & Kevrekidis, J. Chem. Phys. 122 (2005)
Hummer & Kevrekidis, J. Chem. Phys. 118 (2003)
Improved estimates using Maximum Likelihood Estimation (MLE):
Y. Aït-Sahalia, Maximum Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approximation Approach, Econometrica 70 (2002).
Energy Landscape – Fish Swarming Problem
[Figure: energy landscape with regions labeled SLIP, STUCK, SLIP; annotations: CENTROID, informed direction]
Using the computer to select variables
Rationale:
[Figure: Lake Carnegie, Princeton, NJ]
Straight-line distance between locations is NOT representative of actual transition difficulty/distance
Using the computer to select variables
Rationale:
[Figure: Lake Carnegie, Princeton, NJ, showing the straight-line distance and the curved transition distance]
Actual transition difficulty is represented by the curved path
Using the computer to select good variables
Rationale:
[Figure: Lake Carnegie, Princeton, NJ]
Straight-line distance IS representative of actual transition difficulty/distance in small LOCAL patches
Patch size is related to problem “geography”
Using the computer to select variables
Rationale:
[Figure: Lake Carnegie, Princeton, NJ; 3D dataset (X, Y, Z) with a 2D manifold, showing the Euclidean distance from a selected datapoint]
Euclidean distance in input space may be a weak indicator of INTRINSIC similarity of datapoints
Geodesic distance is good for this dataset
Dataset as Weighted Graph
[Figure: graph with vertices 1, 2, 3 joined by edges with weights W12, W23]
$W_{ij} = \exp\left( -\frac{\|x_i - x_j\|^2}{t} \right)$, with parameter t
Multiple random walks through simulation data, all initialized at the same point
Unequal separation (Euclidean distance) between the IC and the limits of the random walk
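N datapoints $\{x_i\}$
Parameter ε: local neighborhood size
Compute N×N “neighborhood” matrix K: $K_{i,j} = \exp\left[ -\left( \frac{\|x_i - x_j\|}{\varepsilon} \right)^2 \right]$
Compute diagonal normalization matrix D: $D_{i,i} = \sum_{j=1}^{N} K_{i,j}$
Compute Markovian matrix M: $M = D^{-1} K$
Require: eigenvalues λ and eigenvectors Φ of M:
$M\Phi_1 = \lambda_1 \Phi_1$ (top), $M\Phi_2 = \lambda_2 \Phi_2$ (2nd), …, $M\Phi_N = \lambda_N \Phi_N$ (Nth), with $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_N$
A few eigenvalues/eigenvectors provide meaningful information on dataset geometry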
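The steps above (kernel matrix, row normalization, eigendecomposition) can be sketched with NumPy as follows. This is a minimal illustration under assumptions: the kernel is taken as exp[-(‖x_i - x_j‖/ε)²], the test dataset is a half circle embedded in 3D, and the function name `diffusion_map` and the value of ε are hypothetical. The eigenproblem for M = D⁻¹K is solved through the similar symmetric matrix D^{-1/2} K D^{-1/2} for numerical convenience.

```python
import numpy as np

def diffusion_map(points, eps, n_coords=2):
    """Return the leading eigenvalues and eigenvectors of M = D^-1 K."""
    diff = points[:, None, :] - points[None, :, :]
    K = np.exp(-(np.linalg.norm(diff, axis=2) / eps) ** 2)   # neighborhood matrix
    d = K.sum(axis=1)                                        # diagonal of D
    # M is similar to the symmetric matrix S = D^-1/2 K D^-1/2, so they share eigenvalues.
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]                          # sort from top eigenvalue down
    evals = evals[order]
    phi = evecs[:, order] / np.sqrt(d)[:, None]              # eigenvectors of M itself
    return evals[: n_coords + 1], phi[:, : n_coords + 1]

# Example data: N points on a half circle (a 1D manifold in 3D).
theta = np.linspace(0.0, np.pi, 200)
points = np.column_stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
evals, phi = diffusion_map(points, eps=0.3)
```

The top eigenvalue is 1 with a constant eigenvector and carries no geometric information; for this dataset the second eigenvector varies monotonically along the arc, so it acts as a one-dimensional coordinate for the data.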
Diffusion Maps
Dataset in x, y, z (N datapoints) → eigencomputation → Dataset Diffusion Map (N datapoints)
$x_i = (x_i, y_i, z_i),\ i = 1, \dots, N \;\mapsto\; (\Phi_{2,i}, \Phi_{3,i}),\ i = 1, \dots, N$
R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, PNAS 102 (2005).
B. Nadler, S. Lafon, R. Coifman, and I. G. Kevrekidis, Diffusion maps, spectral clustering and reaction coordinates of dynamical systems, Appl. Comput. Harmon. Anal. 21 (2006).
Diffusion Map (Φ2, Φ3)
ABSOLUTE Coordinates: report absolute distance of all uninformed individuals to informed individual to DMAP routine
SIGNED Coordinates: report (signed) distance of all uninformed individuals to informed individual to DMAP routine
[Figure: diffusion map coordinates Φ2, Φ3 and the reaction coordinate, with STICK and SLIP states marked in each panel]
[Figure: MAN vs. MACHINE comparison, for ABSOLUTE and SIGNED coordinates]
So, again, the same simple theme
• If there is some reason to believe that there exist slow, effective dynamics in some smart collective variables
• Then this can be used to accelerate some features of the computation
• Tools for data-based detection of coarse variables…
• MAIN DIFFICULTY: finding physical initial conditions consistent with desired coarse initial conditions