© 2007 by Jordi Cohen. All rights reserved.
GAS MIGRATION INSIDE PROTEINS: MECHANISM, CHARACTERIZATION, AND APPLICATIONS
BY
JORDI COHEN
B.Sc., McGill University, 1998
M.Sc., Simon Fraser University, 2001
DISSERTATION
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Physics
in the Graduate College of the University of Illinois at Urbana-Champaign, 2007
Urbana, Illinois
Abstract
Gas migration inside proteins is a little-studied yet very important topic for many classes of proteins
such as globins, oxygenases, and oxidases, which store oxygen gas or use it for enzymatic purposes.
One reason why this process has not received prominent attention in recent years is the
difficulty of identifying the pathways taken by oxygen or other gases diffusing inside proteins.
This difficulty arises because, unlike typical ligand channels, gas pathways are not visible in
a protein’s static structure. This thesis addresses these difficulties by tackling many of the issues
important for finding, understanding, and manipulating gas migration pathways inside proteins.
First, a molecular dynamics methodology called locally-enhanced sampling and a novel
volumetric oxygen accessibility map method are applied to the hydrogenase enzyme. Together they
demonstrate that gas molecules make their way not through static
channels, but through well-defined “pathways”, which are completely defined by the details of a
protein’s thermal motion. This work is then followed up with the development of a new method,
called implicit ligand sampling, which makes it possible for the first time to completely identify and
energetically characterize every oxygen pathway inside any protein of known structure merely from the
protein’s equilibrium dynamics. The protein dynamics, in this case, are collected through 10 ns-long
molecular dynamics simulations in the absence of internal gas ligands. Implicit ligand sampling is
then applied to and validated on the well-studied myoglobin oxygen-storage protein.
Finally, if one is to engineer oxygen pathways inside proteins, it is not enough to simply know
where such pathways are located, it is also important to understand how these pathways are
correlated with protein structure. For this reason, oxygen pathways were computed for a large
number of proteins from both the globin and copper-containing amine oxidase protein families.
It is found, surprisingly, that the locations of oxygen pathways are not conserved within protein
families, and do not correlate at all with the proteins’ tertiary folds. However, a
statistically-significant correlation was found between the proximity of certain residue types and
protein oxygen accessibility.
Acknowledgments
First of all, particular thanks go to my adviser, Klaus Schulten, for introducing me to the field,
showing me the way, for his exceptional help and support throughout, and also for showing me
that physics can be used to meaningfully improve the world. None of this work would have existed
without my collaborators. Paul King introduced me to hydrogenase, which was the motivation
behind the many pages ahead. Paul King, Kwiseon Kim, Chris Chang, and Maria Ghirardi have all
inspired this work in too many ways to enumerate, and have been remarkably welcoming hosts
during my visits to Colorado. Further thanks go to my collaborators Ken Olsen, James Knapp, Bill
Royer, Michael Seibert, Carrie Wilmot, and Bryan Johnson for showing me new directions, giving
me ideas, and motivating me, to John Stone for tolerating my hacking into the TCBG software,
and to Emad Tajkhorshid.
I also wish to especially thank the members of the TCBG group, who have made my stay in the
middle of the cornfields particularly enjoyable. Elizabeth Villa has been an amazing friend,
the kind that doesn’t exist and is simply always there. Alek Aksimentiev, and his wife Angela
Peregud, have been a second family to me. Emma Falck, for being such an enjoyable person, and
for going out of her way for me many times. Yi Wang for keeping me nourished with chocolate
at all times, and for being a great office mate. Barry Isralewitz for reminding me every day that
humor and wit rule the world, no matter what. Rosemary Braun, for being the kind and caring
model human being I try to be, and for all the delicious croissants. Finally, I cannot forget Justin
Gullingsrud, Leo Trabuco, Eric Lee, JC Gumbart, Mu Gao, Markus Dittrich, Amy Shih, Eduardo
Cruz-Chu, Alexander Balaef, Tim Isgro, Marcos Sotomayor, and Fatemeh Khalili-Araghi.
I also want to particularly thank Miriam Wodrich, for her wide open heart, for coming straight
out of a fairy tale, and for making my life a joy. And I definitely want to express my immense
appreciation and thanks to all the friends I have made in Illinois, to all the great friends abroad
that have kept me going, and to my family for their patience and support.
This work was supported by the National Institutes of Health grants PHS-5-P41-RR05969 and
1R01GM60946-01, the National Science Foundation grant SCI04-38712, and by the Department of
Energy’s Hydrogen Fuel Cells and Infrastructure Technologies Program. Supercomputer time was
provided by the National Center for Supercomputing Applications and the Pittsburgh
Supercomputing Center via the National Resources Allocation Committee grant MCA93S028.
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Chapter 1 Introduction
Chapter 2 Mechanism of gas migration inside [FeFe]-hydrogenase
    2.1 Introduction
    2.2 Methods
    2.3 Results
    2.4 Conclusion
Chapter 3 Imaging the migration pathways for gas ligands inside myoglobin
    3.1 Introduction
    3.2 Methods
    3.3 Results
    3.4 Discussion
Chapter 4 Effects of protein architecture and sequence on gas migration pathways
    4.1 Introduction
    4.2 Methods
    4.3 Results
    4.4 Conclusion
Chapter 5 Conclusion and outlook
Appendix A Mechanism of anionic conduction across ClC chloride channels
    A.1 Introduction
    A.2 Methods
    A.3 Results
    A.4 Conclusion
References
Author’s Biography
List of Tables
2.1 Proportion of hH2 exits by pathway.
3.1 Xenon Binding Site Free Energies.
3.2 Ligand Solvation Energies.
4.1 List of penta-coordinated monomeric globins.
List of Figures
1.1 The H2 production reaction.
1.2 Designing an O2-tolerant hydrogenase.
2.1 Structure of CpI hydrogenase.
2.2 Simulations of hH2 diffusion.
2.3 Simulations of O2 diffusion.
2.4 Network of internal gas pathways.
2.5 Comparison of TC-LES- and VSAM-predicted gas pathways.
2.6 Comparison of average and maximum VSAMs.
3.1 Probability of occurrence of the protein states.
3.2 Predicted and actual Xe binding sites for sperm whale Mb.
3.3 Implicit ligand PMF for CO inside sperm whale Mb.
3.4 Amino acids whose substitutions affect O2 or CO migration.
3.5 PMF profiles experienced by ligands exiting Mb.
3.6 Comparison of the implicit ligand PMF maps in Mbs of different species.
4.1 O2 PMF maps for various monomeric globins.
4.2 Aligned structure of 10 monomeric globins.
4.3 Comparisons of O2 PMF surfaces for similar monomeric globins.
4.4 Residue types favoring O2 pathway formation.
5.1 Gas pathways and barriers in AQP1 aquaporin.
5.2 Central pore gas diffusion in AQP1 aquaporin.
A.1 Membrane view of the ClC dimer.
A.2 Top and side view of the ClC simulation system.
A.3 Superposition of the wild-type and mutant structures.
A.4 Potential of mean force for Cl− conduction.
A.5 The ClC pore.
A.6 Sequence of events during conduction.
A.7 PMF profiles for Cl− conduction.
A.8 Interaction energy of a Cl− with the ClC pore.
A.9 Water double-file inside ClC.
List of Abbreviations
ClC family of chloride channels/transporters.
CpI name of hydrogenase I from Clostridium pasteurianum.
deoxyMb/deoxyHb deoxy-myoglobin (heme is unliganded, often implies that a water molecule
lies in the myoglobin distal pocket).
DP distal pocket (cavity in Mb containing the heme binding site).
FEP free-energy perturbation.
Hb hemoglobin.
hH2 “heavy” hydrogen gas (with an atomic mass identical to that of oxygen gas).
Lb leghemoglobin.
LES locally-enhanced sampling.
Mb myoglobin.
MD molecular dynamics.
metMb metmyoglobin (heme iron is in the unreactive ferric state).
oxyMb/oxyHb oxy-myoglobin (heme is bound to O2).
PDB protein databank (database and/or file format).
PMF potential of mean force.
POPE palmitoyl-oleoyl-phosphoethanolamine (common cell membrane lipid).
TC-LES temperature-controlled locally-enhanced sampling.
VSAM volumetric solvent accessibility map.
Chapter 1
Introduction
The hydrogenase problem
With the world’s oil reserves dwindling, the development of viable alternative fuels is
becoming an urgent priority. Hydrogen gas (H2), which is non-polluting and can be produced
renewably, is a promising alternative to gasoline. One method of producing H2 that is currently
under development relies on the unicellular green alga Chlamydomonas reinhardtii. Because
Chlamydomonas has the natural ability to couple photosynthetic water oxidation to the generation
of H2 through the [FeFe]-hydrogenase enzyme [113, 162] (see Fig. 1.1), it could be used to produce
H2 commercially [15, 65, 103]. Such a means of H2 production would be affordable and efficient,
requiring only water and sunlight, with up to 10% of incident sunlight energy being converted.
Figure 1.1: The H2 production reaction. The hydrogenase enzyme (shown in ribbons) generates
H2 from hydrogen ions (H+) and electrons (e−) produced from photosynthetic water oxidation. O2
inactivates this reaction by binding to the H-cluster’s active site irreversibly.
While [FeFe]-hydrogenase has much potential as a source of H2, it unfortunately does not
operate under high concentrations of oxygen gas (O2), such as those found in ambient air. In fact, a
single O2 molecule can irreversibly bind to hydrogenase’s buried active site, inhibiting its activity,
and leading to the enzyme’s degradation. The inability of hydrogenase to work under aerobic
conditions prevents it from being a cost-effective means of producing H2. If one could engineer
hydrogenase by mutation so as to prevent O2 molecules from accessing its active site, it would
make the enzyme more tolerant to O2 and increase its usefulness as a means of producing H2. The
ideal outcome, depicted in Fig. 1.2, would be a set of hydrogenase mutations that
blocks the passage of O2 to the active site while still allowing the H2 product to escape.
Figure 1.2: Designing an O2-tolerant hydrogenase. Hydrogenase is easily deactivated by O2
molecules that reach its active site by permeating across the protein matrix. Given that H2,
due to its smaller size, is possibly less restricted in its motion than O2, it may be feasible to block
O2 by the selective mutation of O2-pathway-lining residues (O2-blocking mutations schematically
depicted as large red X’s), while still allowing the H2 product to exit the protein.
The gas migration rates for O2 and H2 inside hydrogenase can, of course, be affected by point
mutations of hydrogenase. Recent studies have shown that it is possible to completely block O2 passage
across cytochrome c oxidase [139], as well as to make a previously O2-proof [NiFe]-hydrogenase
accessible to O2 [25]. These results make an encouraging case that the gas accessibility of proteins
can be altered, even dramatically, by minor mutations in the
protein core. That this goal is achievable, at least in principle, is a foregone conclusion. The
question of how to reach this goal is, on the other hand, currently unaddressed and unanswered by
contemporary science.
Much of the work in this thesis was inspired by the desire to create a mutant of hydrogenase
which would, put in not-so-humble terms, solve the world’s energy problems. When work on this
research began, the realization dawned that very little was known about how to proceed with
such an endeavor. Not only were there no existing guidelines on how to reliably alter protein
function through targeted mutations, but even the basic mechanisms by which, and the routes
through which, O2 molecules could reach the active site inside any protein were not well
known. After all, gas molecules are small and do not need large dedicated channels inside proteins
to reach their target. As a result, gas channels cannot be “seen” by merely looking at a protein’s
structure, further compounding the problem. The principal goal of this thesis is therefore to build
a foundation of knowledge regarding gas migration inside proteins. The questions that we will
focus on are: By what mechanism do gas molecules migrate inside proteins to reach specific targets
inside them? Can the location of such pathways be predicted and their features characterized, and
how? Finally, how would one choose protein mutations that would alter the behavior of known
gas migration pathways to reach pre-determined goals (such as sealing off the pathways or creating
new pathways)? The following chapters aim to shed light on these questions.
Overview
While the chapters of this thesis are not all devoted to the hydrogenase enzyme (which is still the
main inspiration for this work), they are all concerned with one overarching endeavor: to understand
how gas migration pathways form, evolve, and are used in proteins. Such knowledge is very
important if one is to follow a targeted approach to hydrogenase engineering rather than relying
solely on trial and error.
Chapter 2 solves the long-standing problem of how gas ligands migrate inside proteins, and
introduces a method for locating the pathways taken by gases migrating inside proteins. A [FeFe]-
hydrogenase enzyme from the bacterium Clostridium pasteurianum is used as an example. It is
found that while hydrogenase does not possess any static channels for the passage of O2 and H2, it
instead allows for gas ligands to migrate by means of transiently-forming cavities that appear inside
the protein due to thermal fluctuations. The locations of these temporary cavities become connected
over time and allow gas ligands to travel inside the protein along pre-determined pathways. Given
the ephemeral nature of these “pathways”, and given the fact that they are often not seen in static
crystal structures, we will avoid using the terms “channel” and “tunnel” often used in the literature
to designate them, since the imagery associated with such concepts is no longer accurate in light
of our discovery. It is also found that O2 uses only a subset of the pathways used by H2, such that
it would be possible, in principle, to block access to O2 inside hydrogenase while still allowing H2
to escape, as proposed earlier.
The hydrogenase gas migration study laid the groundwork for understanding gas migration
inside proteins, and revealed the importance of fluctuating cavities and protein dynamics for the
migration of gases inside proteins. One deficiency of the methodologies introduced in Chapter 2 is
that, while all the O2 pathways can be located, they provide no information about the energy
barriers along these pathways. Chapter 3 addresses this issue by introducing the implicit ligand
sampling method, which is closely related to the free energy perturbation method. Implicit ligand
sampling can compute the free energy of placing a gas ligand everywhere inside a protein with
relatively little computation (over 10^6 times faster than using traditional methods, effectively
making this computation possible in the first place). The end result is a complete map of all the energy
barriers for gas permeation inside the protein. The muscle O2 storage protein myoglobin, rather
than hydrogenase, was used as a test case because its kinetic properties have been extensively
characterized experimentally.
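The free energy perturbation idea mentioned above can be illustrated with the generic exponential-average estimator, ΔG = −kBT ln⟨exp(−ΔE/kBT)⟩. The sketch below is not the implicit ligand sampling implementation developed in Chapter 3. It only assumes that ligand-protein interaction energies ΔE have already been evaluated at one position for a set of protein snapshots. The function name, the energy units (kcal/mol), and the sample values are hypothetical.

```python
import numpy as np

def fep_free_energy(delta_e, kT):
    """Exponential-average (FEP-style) estimate: -kT * ln < exp(-dE / kT) >.

    delta_e : interaction energies of a trial ligand, one per snapshot (assumed input)
    kT      : thermal energy in the same units as delta_e
    """
    x = -np.asarray(delta_e, dtype=float) / kT
    m = x.max()  # subtract the maximum before exponentiating, for numerical stability
    return -kT * (m + np.log(np.mean(np.exp(x - m))))

# Hypothetical interaction energies (kcal/mol) at one grid position, over four snapshots.
kT_300K = 0.0019872 * 300.0  # Boltzmann constant in kcal/(mol K) times 300 K
print(fep_free_energy([1.2, 0.4, 3.0, 0.9], kT_300K))
```

Because the average is exponential, the estimate is dominated by the snapshots in which the trial ligand fits most favorably, which is consistent with the role of transient cavities described above.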
By this point, much has been discovered about how to identify and characterize gas pathways
inside proteins. However, the ultimate goal of this research is the engineering of hydrogenase, and
a mutational strategy that will solve this problem is still missing. Rather than take a haphazard
approach and test random mutations along the discovered hydrogenase O2 pathways, Chapter 4
investigates how such pathways are formed in naturally-occurring proteins, and how they relate to
the protein’s structure and its amino acid sequence. In order to do this, implicit ligand sampling
maps are computed for 16 proteins from the globin and copper-containing amine oxidase
superfamilies, such that the pathways can be compared across a range of proteins with similar structures.
To our surprise, and against conventional wisdom, the discovered pathways did not correlate with
the proteins’ architectures at all. However, a very definite and reproducible correlation between
specific residue types and O2 channels is found, which can be used as a guideline to alter or engineer
oxygen pathways in proteins such as hydrogenase.
While this thesis focused on oxygen pathways mostly in hydrogenase and globin proteins, the
implicit ligand sampling method has a much broader applicability. Chapter 5 concludes the thesis
by providing a brief overview of various applications of this work over the past few years.
Finally, Appendix A details the conduction mechanism of Cl− across ClC, a membrane-spanning
Cl− transporter. While, in principle, there should be strong similarities between hydrogenase
and ClC, notably that in both cases a small ligand must make its way across the protein, in practice
these two problems must be approached very differently. Unlike weakly-interacting O2 migrating
inside proteins, Cl− ions interact with the protein very strongly and distort the protein while
permeating across it. Furthermore, ClC must be selective for Cl− without binding it so tightly
that the ion remains trapped in the protein, a challenge that ClC
overcomes in ingenious ways. The methodologies used in the Appendix to answer these questions
pose an interesting contrast with those used to study gas migration.
Chapter 2
Mechanism of gas migration inside [FeFe]-hydrogenase
Here, we report on a computational investigation of the passive transport of H2 and O2 between the
external solution and the hydrogen-producing active site of the CpI [FeFe]-hydrogenase structure
from Clostridium pasteurianum. Two distinct methodologies for studying gas access are discussed
and applied to the case of CpI: (1) temperature-controlled locally enhanced sampling, and (2)
volumetric solvent accessibility maps, providing consistent results. Both methodologies confirm
the existence and function of a previously hypothesized pathway and reveal a second major gas
pathway which had not been detected by previous analyses of CpI’s static crystal structure. We
describe two completely different modes of intra-protein transport for H2 and O2, which in our
model are differentiated only by their size. We present strong evidence that supports the hypothesis
that small hydrophobic molecules diffusing inside CpI, such as H2 and O2, take advantage of pre-
existing packing defects in the protein. We show how and why hydrophobic particles of a certain
size travel in the protein along predetermined pathways, which are not visible in the protein’s static
structure. We also show that one can efficiently predict the gas-accessible areas of any protein,
based on the assumption that gas passage requires the formation of spontaneous cavities. (This
chapter is based on work published in Cohen et al. [32, 33].)
2.1 Introduction
Hydrogenases are enzymes that catalyze reversibly the oxidation or production of molecular
hydrogen according to the reaction
$$2e^- + 2\mathrm{H}^+ \rightleftharpoons \mathrm{H}_2. \tag{2.1}$$
Various hydrogenases can be found in a wide variety of unicellular organisms and usually come in
one of two flavors: [NiFe]-hydrogenases, which are usually associated with H2 uptake, or “iron-only”
[FeFe]-hydrogenases [4], which are generally involved in H2 production. In a majority of
microorganisms, [FeFe]-hydrogenases [113] function in anaerobic metabolism to oxidize overly reduced
electron carriers. Much of the recent scientific interest in [FeFe]-hydrogenases, however, concerns
a different role entirely: their H2 production properties offer the promise
of a means for affordable large-scale production of H2 as a source of renewable energy [65, 103].
In this chapter, we investigate the structural properties of [FeFe]-hydrogenase CpI from
Clostridium pasteurianum. Hydrogen production in CpI happens at the H-cluster, a metallic cluster bound
to and embedded inside the CpI protein matrix, and is achieved by the reduction of H+ ions from
the external solution using electrons acquired from a reduced carrier such as ferredoxin [4, 125].
The H+ ions (or “protons”) probably reach the H-cluster by means of a putative, but as yet
unverified, proton pathway contained in the protein [126]. The electrons are transferred to the embedded
H-cluster through a series of accessory iron-sulfur clusters aligned in a chain between the H-cluster
and one end of CpI. The CpI enzyme is displayed along with its embedded iron-sulfur clusters and
H-cluster in Fig. 2.1.
Figure 2.1: Structure of CpI hydrogenase, showing the enzyme with its embedded H-cluster and
iron-sulfur clusters. Also shown are the residues lining the two principal gas pathways connecting
the external solution with the central H-cluster.
In addition to electron and H+ transport, CpI must also allow gas transport to and from its
active site. On one hand, it must allow the H2 product to exit the protein, and on the other hand,
it must also allow small gas molecules such as O2 and CO to penetrate through the enzyme and
reach the H-cluster, which can be irreversibly deactivated upon their binding. While O2-mediated
deactivation of hydrogenases is in some cases beneficial for the host organism, it severely limits
the practicality of using hydrogenase to produce H2 as a carrier of consumable energy [55]. A few
studies have investigated gas access in hydrogenases using either molecular dynamics sampling of
the possible paths for H2 permeation [33, 109], hydrophobic cavity searches on static structures [109,
113, 114] or crystallography of xenon-saturated structures [109]. Although some progress has been
made, no method has been able to comprehensively predict all of the gas pathways in hydrogenases
or in any other protein. Even X-ray crystallography of proteins in the presence of xenon, whose
predictions are reliable, yields only an incomplete picture.
While investigating the possible pathways taken by H2 and O2 across CpI hydrogenase, we
noticed a significant difference between the dynamics of H2 and of O2 gas permeation through the
protein matrix, which is caused only by differences in ligand size. This behavior prompted us to
examine the effect of the protein’s internal dynamics in regulating gas access to its buried active
site. For this purpose, we introduce a new approach for mapping transient cavities based on the
dynamics of the lone protein (i.e. without the gas ligand) extracted from computer simulation and
compare the resulting maps to trajectories of the intra-protein gas migration process. An almost
perfect match between the two computations prompts us to suggest that the protein-wide pathways
taken by hydrophobic gases in CpI hydrogenase, and possibly in any gas transport protein, are fully
determined by density fluctuations within the protein. Furthermore, for the case of hydrogenase,
two major gas transport pathways are fully characterized by the protein’s motion at the nanosecond
time-scale, despite the fact that the actual diffusion of gases such as dioxygen may take much longer.
In addition to the physical insight gained, the new cavity mapping method holds the promise of
being able to predict protein-wide gas transport pathways inside macromolecules. Our results
provide insights into the gas transport mechanisms for H2 and O2 in CpI hydrogenase, as well as
a new way of looking at gas transport pathways, which is immediately applicable to other proteins
and structures.
2.2 Methods
2.2.1 Volumetric Solvent Accessibility Maps
We introduce volumetric solvent accessibility maps (VSAMs) as a means of representing the solvent
accessible surface of a protein. Given a set of hard atomic spheres representing a macromolecule,
existing methods typically calculate a closed polyhedron that represents the boundary between
space that is penetrable and impenetrable to a model solvent molecule represented by a hard sphere
of specified radius (e.g., Richards’ Smooth Molecular Surface [131] and Connolly’s method [36]).
While traditional techniques can produce almost exact results, there are significant advantages to
be gained in representing solvent maps as a volumetric data set (i.e., a 3D grid of scalar values).
VSAMs can contain information about not just one size of solvent molecule, but about all sizes
simultaneously, enabling the interactive visualization of how the intermolecular cavities change as
a function of probe radius. Most importantly, however, many VSAMs can be combined together
using average or maximum rules. When computed over a set of trajectory frames, rather than for
a static structure, they can reveal information that cannot be obtained from conventional methods,
as presented in this chapter. However, this comes at the cost of limited resolution and precision,
and of increased computer memory requirements.
In practice, the solvent accessibility of a macromolecule is stored in a 3D volumetric grid which
overlaps all or part of the macromolecule in coordinate space. The value of each voxel (i.e., grid
point) of the grid is set to be the radius of the largest possible solvent sphere that (i) contains
the spatial coordinates of the center of that voxel, and (ii) does not overlap with any of the van
der Waals spheres that represent the molecule’s atoms. Voxels whose coordinates lie inside the
molecule’s van der Waals spheres are set to zero, since no solvent can be present there. Voxels in
small interstices will have very small radius values, whereas voxels in large cavities or outside the
protein can have very large radius values. To retrieve a close approximation of the conventional
solvent accessible surface, one needs only to display the isocontours of the VSAMs for a given value
corresponding to the desired probe radius (e.g., a 1.4 Å radius for a water solvent). All
the voxels with values smaller than the probe radius will lie outside the contour, and those with
larger values will lie inside, because if a voxel can be contained in a large solvent sphere, it will
automatically be accessible to smaller solvent spheres.
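The voxel rule described above can be sketched as a brute-force computation. This is a minimal illustration and not the implementation used in this work: the function name, the argument layout, and the restriction of candidate probe centers to the voxel centers themselves are all assumptions, and the quadratic cost makes it suitable only for small grids.

```python
import numpy as np

def compute_vsam(atom_xyz, atom_radii, origin, spacing, shape):
    """Brute-force VSAM sketch: largest probe radius that can contain each voxel center.

    Assumption: probe spheres are centered only on voxel centers, so the
    result is a grid-resolution approximation of the rule in the text.
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_radii = np.asarray(atom_radii, dtype=float)
    axes = [origin[k] + spacing * np.arange(shape[k]) for k in range(3)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

    # Clearance: radius of the largest probe centered on a voxel that does not
    # overlap any van der Waals sphere (zero inside or on the surface of an atom).
    dist = np.linalg.norm(centers[:, None, :] - atom_xyz[None, :, :], axis=-1)
    clearance = np.clip((dist - atom_radii[None, :]).min(axis=1), 0.0, None)

    # Every voxel covered by a probe inherits that probe's radius if it is larger
    # than the value already stored there.
    values = np.zeros(len(centers))
    for c in np.nonzero(clearance > 0.0)[0]:
        covered = np.linalg.norm(centers - centers[c], axis=1) <= clearance[c]
        values[covered] = np.maximum(values[covered], clearance[c])
    return values.reshape(shape)

# Toy system: a single atom of radius 1.0 at the origin, on a 7 x 7 x 7 grid.
vsam = compute_vsam([[0.0, 0.0, 0.0]], [1.0], origin=(-3.0, -3.0, -3.0),
                    spacing=1.0, shape=(7, 7, 7))
water_accessible = vsam >= 1.4  # voxels inside the isocontour for a 1.4 Å probe
```

Thresholding the grid at a given probe radius, as in the last line, gives the voxel set enclosed by the corresponding isocontour. Voxels inside an atom keep the value zero because no non-overlapping probe can contain them.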
We now report on average and maximum VSAMs, calculated from a set of frames from MD
simulation trajectories. One must imagine that a separate VSAM is calculated for each simulation
frame, that all the volumetric grids are aligned and of the same size, and that for all frames, the
macromolecule has been repositioned such that its Cα atoms are aligned using a best fit. The
average solvent map is then simply a map in which the voxel values have been averaged over all
frames, and provides a description of the average size of the solvent which can reach a specific
position in space over the course of the trajectory. The maximum solvent map, then, contains at
each voxel the maximum value encountered during the course of the trajectory. The maximum map
describes the maximum solvent size that can ever be spontaneously placed inside the macromolecule
during any frame of the trajectory. If a pathway is only transiently open, or if its different sections
are open at different times, then the maximum solvent map will show what the pathway would
look like in its maximally opened state (which may never be encountered during the course of an
equilibrium simulation), and represents the areas in space that are ever accessible to a ghost solvent
sphere which does not disrupt the protein.
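The average and maximum rules can be sketched in a few lines, assuming the per-frame maps have already been computed on identical, aligned grids as described above. The function name and the four-voxel toy "pathway" are hypothetical and serve only to show how the two rules differ.

```python
import numpy as np

def combine_vsams(per_frame_maps):
    """Return (average map, maximum map) from per-frame VSAMs on identical grids."""
    stack = np.stack([np.asarray(m, dtype=float) for m in per_frame_maps], axis=0)
    return stack.mean(axis=0), stack.max(axis=0)

# Toy pathway of four voxels whose two halves are open in different frames.
frame_a = np.array([1.6, 1.6, 0.2, 0.2])
frame_b = np.array([0.2, 0.2, 1.6, 1.6])
average_map, maximum_map = combine_vsams([frame_a, frame_b])

probe_radius = 1.4
print(maximum_map >= probe_radius)  # the whole pathway appears open
print(average_map >= probe_radius)  # no voxel is open on average
```

In this toy case the maximum map shows the pathway in its maximally opened state even though no single frame has all four voxels open at once, which is the behavior described in the text.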
2.2.2 Temperature-controlled Locally Enhanced Sampling (TC-LES)
LES defines an algorithm for simulating multiple copies of a certain number of particles (in this
case, gas ligands), which interact in a mean-field way, with a single copy of the environment (here,
the protein). One of the features of LES, as described by [47], is that the effective temperature of
the replicated particles’ dynamics (with N replicas) is increased by a factor of N . This enhanced
temperature, while often used as a means to cross barriers, also has disadvantages. For one, the
resulting trajectories can be completely unphysical (e.g., assuming 10 replicated copies, the behavior
of a particle at 3000 K is very different from that of one at 300 K), and the copies’ traveling speed
and energy landscape are dramatically altered compared to reality. Because of this, the extra factor
of N in the temperature of the replicated particles effectively limits the number of copies possible. If
one could control the dynamics of the replicated particles such that they all act like 300 K particles,
then it would become practical to use many more than 10 simultaneous copies and to gain a greater
amount of information from a single simulation (we used 1,000 copies for the present case).
In LES, the kinetic energy K and potential energy V are defined as follows:
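\[
K = \frac{1}{2} \sum_{a \in A} m_a \dot{\mathbf{q}}_a^2 + \frac{1}{2N} \sum_{i=1}^{N} \sum_{x \in X} m_x \dot{\mathbf{q}}_{x,i}^2 \tag{2.2}
\]
\[
V = V_{AA}(\mathbf{q}_A) + \frac{1}{N} \sum_{i=1}^{N} \left[ V_{XX}(\mathbf{q}_{X,i}) + V_{AX}(\mathbf{q}_A, \mathbf{q}_{X,i}) \right] \tag{2.3}
\]
where $\mathbf{q}_n$, $\dot{\mathbf{q}}_n$ and $m_n$ are the position, velocity and mass, respectively, of particle n. A and X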
represent the sets of unreplicated and replicated particles, respectively, and N is the number of
LES copies.
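
To make the 1/N weighting in the kinetic energy of Eq. (2.2) concrete, the following sketch evaluates it for arrays of masses and velocities. It is an illustration only: the function name, the array layout (copies along the first axis) and the toy numbers are assumptions, not part of the modified NAMD code.

```python
import numpy as np

def les_kinetic_energy(m_a, v_a, m_x, v_x):
    """Kinetic energy of Eq. (2.2).

    m_a: (n_A,) masses of unreplicated particles; v_a: (n_A, 3) velocities.
    m_x: (n_X,) masses of replicated particles; v_x: (N, n_X, 3) velocities,
    one slice per LES copy (assumed layout).
    """
    n_copies = v_x.shape[0]
    k_unreplicated = 0.5 * np.sum(m_a[:, None] * v_a**2)
    k_replicated = 0.5 / n_copies * np.sum(m_x[None, :, None] * v_x**2)
    return k_unreplicated + k_replicated

# Toy system in arbitrary units: one unreplicated particle, one particle with two copies.
m_a = np.array([2.0])
v_a = np.array([[1.0, 0.0, 0.0]])
m_x = np.array([4.0])
v_x = np.array([[[1.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]])
k_total = les_kinetic_energy(m_a, v_a, m_x, v_x)
```

With identical velocities in both copies, the replicated term reduces to the kinetic energy of a single copy, so each copy contributes with weight 1/N.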
As outlined in [152], equations (2.2) and (2.3) describe a non-Newtonian system (Newton’s
second law is not satisfied), with the consequence that, at equilibrium, they describe a system in
which each replicated particle will acquire a kinetic energy that is N times greater than that of an
unreplicated particle (for an LES particle, $\langle K_{\mathrm{LES}} \rangle = \frac{3}{2} N k_B T$). While their formal temperatures
(defined according to the zeroth law of thermodynamics) are the same, the “effective” temperature
(defined from the average kinetic energy per particle $\langle K \rangle = \frac{3}{2} k_B T$) of the LES particles is larger
by a factor of N . A commonly used method for dealing with the divergent kinetic energy of the
replicated particles has been to increase the mass of these particles by a factor of N [134, 152]. This
slows down the copied particles so that their motion can be calculated efficiently, but does nothing
to address the divergent kinetic energy itself or to avoid the resulting reduction in energy barriers
for the copied particles. The increased ligand energy results in significantly skewed simulations,
and one is unable to reproduce results from a limited set of single copy simulations, such as a
preference for certain cavities and pathways, by using straight LES with as few as 10 ligand copies
(results not shown). To overcome these problems, improvements to the LES algorithm have been
suggested and tested by [159]. Comparison of the original LES and its variant with single-copy
dynamics highlights the shortcomings of straight LES, as well as the success of the LES variant, in
reproducing the correct dynamics [160].
In our own attempt to improve the LES method, we sample a constant temperature ensemble
by coupling the actual protein and the replicated gas ligands to different Langevin heat baths. The
idea of controlling the replicated particles’ temperature was originally suggested in [152] and also
applied in [40]. For the regular particles, we use a temperature target of TA = 310 K, while for
the N replicated gas particles, the target must be set to TA/N or lower, such that the resulting
“effective” temperature of the replicated particles, as measured in the simulation, is close to 310 K.
As a result, the average kinetic energy is the same for all particles and the sampling of phase
space corresponds to a 310 K constant-temperature ensemble. Maintaining parts of the system
at two different formal temperatures keeps the system out of equilibrium, and as such the LES
copies will have a tendency to heat up, whereas the protein matrix will want to cool down. In
practice, we found that in order to keep the LES copies at a stable temperature, a larger Langevin
friction coefficient was needed for the replicated gas particles (since they are surrounded by a much
larger bath of protein). We used a Langevin damping term of 5 ps−1 for the unreplicated particles,
and 10 ps−1 for the replicated particles. The resulting temperature distribution of the TC-LES-
replicated particles is broader than that of the unreplicated system (we measure a normalized
standard deviation for the temperature fluctuations of 10.0 K for the replicated particles and of
1.2 K for the equivalent non-LES simulation). Using TC-LES, the only approximation made is that
instead of feeling one gas molecule, the environment feels a delocalized cloud of gas. Finally, in
order to drastically speed up the simulation, we ignored the contribution of the gas molecule to the
system’s charge distribution when computing PME electrostatics. This gives exact results in the
present case, since our models for O2 and H2 contain no partial charges. The increased sampling is
thus achieved predominantly by increasing the number of copies (reducing entropic barriers), and
not by altering the energy landscape as is done in the original LES method.
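
The relation between the heat-bath target and the effective temperature can be sketched as follows. This is a minimal illustration under the factor-of-N scaling stated above. The function names are hypothetical, and the value returned for the bath is only the upper bound TA/N: in practice the text notes that the target may need to be set lower, since the copies tend to heat up.

```python
def replica_bath_temperature(target_effective_temperature, n_copies):
    """Upper bound (T_A / N) for the Langevin bath target of the replicated particles."""
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return target_effective_temperature / n_copies

def effective_temperature(formal_temperature, n_copies):
    """Effective temperature of LES copies held at a given formal temperature."""
    return formal_temperature * n_copies

bath_1000 = replica_bath_temperature(310.0, 1000)  # 1,000 copies, 310 K effective target
unchecked_10 = effective_temperature(300.0, 10)    # straight LES with 10 copies
```

For the 1,000 copies used here, the bath target is at most about 0.31 K, whereas straight LES with 10 copies at 300 K corresponds to the 3000 K effective temperature mentioned earlier.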
2.2.3 Simulation Parameters
O2 and H2 gas access was investigated by all-atom molecular dynamics simulations of the diffusion
of O2 and H2 molecules inside CpI, originating at the active site. We used a model for the gas
molecules that did not include any partial charges, and for which O2 and H2 differed only in their
Lennard-Jones parameters and bond spring constant and length. As stated before, we used a heavy
version of dihydrogen (hH2) instead of H2, so that we could compare the diffusion properties of O2
and H2 based on their size differences alone.
Our model of hydrogenase was based on the X-ray structure of CpI [FeFe]-hydrogenase [126]
(PDB accession code 1feh). A series of atoms in the active site, or H-cluster, were missing from the
Protein Data Bank structure, and have been modeled here as a di(thiomethyl)amine bridge between
the two H-cluster sulfur atoms, as suggested by later studies [51, 113]. The partial charges for
the rest of the H-cluster were based on a density functional theory calculation on the 2- oxidation
state [157], and individual charges were tweaked by ±0.02 e to guarantee the system’s charge
neutrality. The structure was embedded in a water box, resulting in a 57,000-atom system consisting
of 9,000 hydrogenase atoms, 16,000 water molecules and 15 sodium ions to cancel the excess integer
charge.
The system was then equilibrated at a constant temperature (310 K) and pressure (1 atm) for a
duration of 1 ns. The last frame of this equilibration was used as a starting point for all subsequent
simulations. Aside from the initial equilibration, all simulations were performed at constant volume
and temperature (310 K). In all cases, periodic boundary conditions were used. Temperature was
regulated within the TC-LES approach by using Langevin dynamics with damping constants of
5 ps−1 for unreplicated atoms and 10 ps−1 for replicated gas atoms, respectively. Multiple time
stepping was used, with integration time steps of 1 fs, 2 fs and 4 fs, respectively, for bonded, non-
bonded and long-range electrostatic interactions (a non-bonded time step of 1 fs was used for the
case of hH2 TC-LES simulations because of energy stability issues when replicated hH2 molecules
enter the water solution environment). Particle Mesh Ewald with a grid resolution of better than 1 Å
was used for long-range electrostatics, and all other non-bonded interactions were calculated using
a cutoff of 12 Å. The CHARMM22 force-field [99, 100] was employed for all protein interactions,
and simulations were performed using the NAMD [84] molecular dynamics software, modified by
the present authors to allow for TC-LES.
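
As a quick consistency check on the numbers quoted in this section, the sketch below recomputes the nominal system size and the multiple-time-stepping ratios. The counts are the rounded values from the text; the assumption of three atoms per water molecule and all variable names are ours, so the result is only approximate.

```python
# Rounded counts quoted in the text.
protein_atoms = 9_000
water_molecules = 16_000
sodium_ions = 15
atoms_per_water = 3  # assumption: three-site water model

total_atoms = protein_atoms + water_molecules * atoms_per_water + sodium_ions

# Integration time steps (fs) for the standard (non-hH2) simulations.
timesteps_fs = {"bonded": 1, "non_bonded": 2, "long_range_electrostatics": 4}

# How many base (bonded) steps elapse between evaluations of each force class.
steps_between_evaluations = {
    name: step // timesteps_fs["bonded"] for name, step in timesteps_fs.items()
}
```

The nominal total comes out close to the 57,000 atoms quoted above, and the non-bonded and long-range electrostatic forces are evaluated every 2 and 4 base steps, respectively.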
2.3 Results
We now describe in detail MD trajectories of H2 and O2 gas migration inside CpI. We then compare
our results with a dynamic mapping of the protein cavities in the absence of gas, and find a strong
correlation between locations of the protein’s natural cavities and of the diffusing gas molecules.
2.3.1 Simulations of Gas Diffusion Reveal Different Transport Mechanisms for
H2 and O2
While the diffusion of O2 or H2 molecules inside a protein is not a particularly slow process (a
transport event can take from 100 ps to hundreds of ns), it is a stochastic process, and cannot be
completely described by simply examining a few nanosecond-long trajectories. In order to drastically
speed up the exploration of O2 and H2 entry/exit pathways, it becomes necessary to use
certain approximations. One such approximation, known as locally enhanced sampling (LES) [47],
and based on the time-dependent Hartree approximation [62], allows a certain subset of particles in
the simulated system to be replicated many times, where each replicated subset is simulated
independently. In such a scheme, each set of replicated particles interacts with a common environment
consisting of the unreplicated particles, but the replicated copies do not interact with each other
at all. In the present chapter, we make use of a variant of LES: temperature-controlled locally
enhanced sampling (TC-LES), which is described in the Experimental Procedures section.
We have run separate TC-LES simulations for H2 and O2, using 1,000 copies of the diffusing
gas molecule, in order to determine the pathways taken by H2 and O2 while transiting between the
active site of CpI and the external solution. In our simulations, we have used a heavy version of H2,
“heavy dihydrogen” (hH2), in which the hydrogen atoms’ masses were set to that of oxygen. The
reason for this is that we were mainly interested in investigating the accessibility of the protein
to gas molecules, as a function of the gas molecule’s size. Since we use the same mass for both
O2 and hH2, we remove the variation in behavior between the two gases which is caused by their
difference in mass (and consequently velocity) and instead focus purely on the variation in behavior
which is due to their size difference. Changing the mass of the diffusing gas does not affect the
kinetic energy, momentum, or energy profile perceived by the gas or experienced by the protein.
The system will explore the same gas-protein conformational space as it would otherwise, except
that the actual velocities of the hH2 molecules will be slowed down with respect to H2. However,
the rate of diffusion of hH2 is not expected to be significantly different from that of real H2, due
to friction effects. The larger mass also circumvents the use of much smaller simulation time
steps, which are necessary when dealing with the very sharp Lennard-Jones potentials of replicated
“light” H2.
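
The scaling behind this argument can be written out explicitly. As a sketch, assuming classical equipartition at temperature $T$ and atomic masses of roughly 1.008 u for hydrogen and 15.999 u for oxygen (standard values, not taken from the text):

```latex
\langle K \rangle = \tfrac{3}{2} k_B T \quad \text{(independent of mass)}, \qquad
v_{\mathrm{rms}} = \sqrt{\frac{3 k_B T}{m}}, \qquad
\frac{v_{\mathrm{rms}}(\mathrm{hH_2})}{v_{\mathrm{rms}}(\mathrm{H_2})}
  = \sqrt{\frac{m_{\mathrm{H}}}{m_{\mathrm{O}}}}
  \approx \sqrt{\frac{1.008}{15.999}} \approx 0.25
```

Under these assumptions, the thermal velocities of hH2 are about four times lower than those of real H2, while the mean kinetic energy per molecule is unchanged.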
Simulation   Pathway A   Pathway B   Other pathway   Total exited
#1           21%         22%         4%              47%
#2           72%         4%          3%              79%
#3           10%         3%          6%              19%
#4           4%          25%         7%              36%

Table 2.1: Table showing the percentage (rounded) of the total replicated hH2 molecules which have exited CpI during each of four different 4 ns simulations of the diffusion of hH2 starting at the H-cluster binding site, sorted by pathways. Pathways “A”, “B” and “other” refer to the previously suggested (A), the newly-discovered exit pathway (B) discussed above, or neither (other). “Total” refers to the total percent of molecules that found their way out of the protein during the 4 ns. A molecule is considered to have exited when one of its atoms comes into contact with a water molecule located outside of the protein.
For the case of H2, we performed four simulations of 4 ns each, in which the hH2 molecules were
initially placed at the active site (at which hydrogen production takes place), based on the location
of the active-site-bound carbon monoxide in the X-ray structure of CO-saturated CpI by Lemon
and Peters [94]. In all cases, hH2 exited predominantly through two major migration pathways,
the first one (pathway A) having been previously proposed as a H2-channel candidate [113, 125],
and the second one (pathway B) being newly discovered [33]. Both pathways meet at a large cavity
right next to the H-cluster binding site. Aside from the main cavity and the two pathways, the
hH2 molecules from all the simulations taken together consistently explored similar regions of the
protein, as displayed in Fig. 2.2. Despite the fact that hH2 exited simultaneously through both
pathways in each simulation and that the shape of these pathways explored by hH2 was the same
for all simulations, the proportion of hH2 exiting through one pathway or the other and the exit
rates of hH2 out of the protein varied significantly from one simulation to the next, as detailed in
Table 2.1. In all cases, the average hH2 time of first exit out of the protein was very fast and on
the order of nanoseconds (with our simulation suggesting that the first hH2 molecules will have
found the exit within 200 ps and that roughly half will have exited after 4 ns). We expect that real
H2 would exhibit exit times very similar to those of hH2.
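
The variability between runs can be read directly off Table 2.1. The short sketch below only re-tabulates the published percentages; the variable names are ours, and the derived quantities (mean total and share exiting through pathway A) are simple arithmetic on the rounded table values rather than additional results.

```python
# Percentages transcribed from Table 2.1: (pathway A, pathway B, other pathway).
exits_percent = {
    1: (21, 22, 4),
    2: (72, 4, 3),
    3: (10, 3, 6),
    4: (4, 25, 7),
}

# Total exited per simulation, and its mean over the four runs.
total_exited = {sim: sum(parts) for sim, parts in exits_percent.items()}
mean_total = sum(total_exited.values()) / len(total_exited)

# Fraction of the exited molecules that left through pathway A in each run.
share_via_a = {sim: parts[0] / total_exited[sim] for sim, parts in exits_percent.items()}
```

The totals reproduce the last column of the table, and their mean of about 45% is consistent with roughly half of the molecules having exited after 4 ns. The share leaving through pathway A ranges from about one tenth (simulation #4) to about nine tenths (simulation #2).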
For the case of O2 migration from the H-cluster, five independent 3.5–4 ns simulations were
performed, some of which are represented in Fig. 2.3. In these simulations, 1,000 TC-LES copies
of O2 were placed at the H-cluster binding site and allowed to diffuse. In only one of these five
simulations did we observe O2 to leave the central cavity through the newly discovered pathway B
Figure 2.2: Four 4 ns TC-LES simulations of 1,000 copies of hH2 diffusing out from the H-cluster. Each independent simulation is shown in a different color, highlighting the fact that the space explored by hH2 was consistent between simulations. Frames taken from every 50 ps of the trajectories of the hH2 molecules located inside the protein are superimposed as a cloud. Shown in licorice are the iron-sulfur clusters and H-cluster, as well as the residues that line the two major exit pathways.
(see Fig. 2.3 (blue)). In the four other cases, O2 remained in the central cavity near the binding site
(Fig. 2.3 (red)) for the duration of the simulation. Since the other pathway (pathway A) through
which we observed hH2 to diffuse appears to be a narrow but unambiguously hydrophobic channel
in the crystal structure, it is suspected that O2’s failure to exit through it in our simulations is
simply due to lack of sufficient sampling of the protein’s degrees of freedom. With this in mind,
additional TC-LES simulations of O2 were performed, using as starting positions various locations
where large densities of hH2 were observed in the previous hH2 diffusion simulations. When we
placed O2 inside the originally-proposed hydrophobic channel (pathway A), one cavity away from
the central cavity, we were able to successfully observe O2 migration along that channel. A fraction
of the O2 molecules placed in this cavity even diffused inward to the central cavity and back in
one of the three 3.5 ns simulations performed (one is shown in Fig. 2.3 (green)). In no case was
O2 observed to completely exit the protein and partition into the water solution. We suspect that
the hydrophobicity of O2 causes it to prefer the protein environment to that of the water solution;
however, the influence of our O2 model parameters and of the TC-LES dynamics might also be
playing a role. The rate of diffusion of O2 inside hydrogenase appears to be entirely determined by
the protein’s dynamical fluctuations. O2 was observed to be able to diffuse to the surface of the
protein very quickly (in as little as 1.2 ns) when the protein explored a set of ideal conformations.
However, in most simulations we did not observe any significant travel of O2 (in four instances
out of five), reflecting the fact that typical protein conformations are usually unfavorable to O2
migration.
For O2, we simulated the reverse of the natural process of O2 migration from the bulk solvent
toward the active site. This was done because, at the beginning of our investigation, it was not
known where O2 could enter the protein, and the active site was the only location of CpI which was a
priori known with certainty to be accessible to O2, based on previous studies of O2 inactivation [4].
However, since the transport mechanism of O2 inside CpI is almost undoubtedly passive, it does not
matter in which direction we simulate the migration, unless there is a strong overall energy gradient
between the outside and inside. In our simulations, we have observed back-and-forth motion of
O2 and H2 suggesting that the energy profile is approximately flat between solvent and active site
(excluding the intermediary energy barriers). The degree of flatness of the free energy profile of
the gas diffusing through the protein, as well as the accessibility of the identified pathway exits to
an O2 molecule entering CpI from the external solution, of course, still needs to be confirmed by
more detailed studies.
From our TC-LES results, we see that both O2 and hH2 can diffuse across the protein and exit
through two common pathways. However, we noticed that hH2 can penetrate a broader region of the
protein, and on shorter time scales, whereas O2 in our simulations was strictly limited to the above-
mentioned pathways. Aside from exploring very similar regions of CpI, the TC-LES trajectories for
O2 were qualitatively very different from those of H2 in terms of the collective dynamics. While the
different copies of hH2 spread out with time into a diffuse cloud covering the whole protein-water
Figure 2.3: Three representative TC-LES simulations of 1,000 copies of O2 diffusing out from the H-cluster or from the middle of a previously identified H2-channel. Each independent simulation is shown in a different color and one can see that, contrary to H2 diffusion, the O2 molecules move collectively through the same pathway for a given simulation, though they may employ different pathways for different independent simulations. Overall, the set of pathways explored by O2 matches the dominant pathways explored by hH2. Snapshots were taken every 50 ps. The representation of the protein is the same as in Fig. 2.2. Dotted arrows represent possible exits based on the proximity of the external solution.
system, the O2 molecules typically clustered together as a single cloud (which occasionally could
also split into more clouds on a ∼3–4 ns timescale). The fact that the O2 molecules cluster cannot
simply be attributed to the fact that they all experience similar conditions: they all have different
initial velocities, Langevin random forces and interactions with the protein, and, in addition, the
dramatic clustering behavior is not observed at all for the smaller hH2 molecules. Instead, the
behavior of the collective O2 motion suggests that O2 does not diffuse in and out of CpI through
a permanent channel, contrary to what was previously assumed. As will be shown later, O2 fills
up small cavities inside the protein which are themselves dynamic. Through the protein’s natural
motion, combined with the disruptive effect of the O2, these cavities dynamically fluctuate in size
and in their connections with neighboring cavities at certain favorable locations. The transport
of O2 seems to be guided much more by the protein’s random peristaltic motion than by simple
diffusion through a static complex medium [111]. For H2, on the other hand, the protein appears
much more porous and, at any given time, there are many more cavities and partial channels that
are accessible to H2 than to O2. Because, in our simulations, O2 and H2 have the
same mass, the differences in behavior between the two gases are not due to differences in diffusion
speeds but are solely caused by their different Lennard-Jones parameters.
The major problem encountered with TC-LES was insufficient sampling, despite the 1,000-
fold increase in sampling as compared to regular MD, especially since the effect of the single
protein trajectory on ligand diffusion appears to be of significant importance. Taken together,
the TC-LES simulations do in fact confirm the existence of a new gas transport pathway and the
calculated trajectories are both realistic and consistent. But for the case of O2, we had difficulty
reproducing the same pathways from one simulation run to another. While we observed several
very likely permeation events from the active site to the external solution, we could not determine
with certainty whether there exist other pathways through which O2 could exit on the simulated
timescale, using TC-LES alone. To obtain a picture of the pathway topology for H2 and O2 inside
CpI, we clearly need a better method. We believe that the maximum volumetric solvent accessibility
map (VSAM) method described in the Experimental Procedures section provides an excellent tool
to acquire the needed information.
2.3.2 Predicting Hydrophobic Gas Accessibility from the Equilibrium
Dynamics of the Protein Alone
In the previous section, we saw that O2 molecules permeating inside CpI moved as if they were
trapped in small dynamic pockets of empty space. Almost every copy of O2 followed exactly
the same trajectory in a given TC-LES simulation, yet these trajectories varied widely from one
simulation to the next. Since TC-LES has a single copy of the protein interacting with many
replicated gas molecules, our results suggest that transient conformations of the protein have a
huge impact on the pathways taken by gas molecules diffusing inside it. In this section, we specify
and confirm this hypothesis by monitoring the transient cavities that are intrinsically present
inside CpI in the absence of gas. We map out, by means of VSAMs, which regions of the entire
protein would be accessible to a “ghost” solvent of a given radius, assuming that the solvent does
not interact with the protein: it can only “occupy” free space if such space is ever spontaneously
available in the protein. Surprisingly, we find that the set of possible trajectories, taken by both O2
and H2 gas (and very likely by any other spherically shaped hydrophobic ligand), can be predicted
remarkably well. These results complement other related investigations of the influence of protein
conformations or mutations on ligand diffusion inside the myoglobin distal pocket [27, 67], [NiFe]-
hydrogenase [25, 155], catalase [6], and cytochrome-c oxidase [139]. Our study differs from most of
these previous studies in the methodologies that we have used (TC-LES and VSAMs), in the
significantly longer timescales and larger areas of the protein probed in many cases,
and in the fact that we looked at the protein’s accessible volume in the absence of ligands.
We have calculated a static 3D map of the largest ghost solvent spheres that could be placed at
any given time inside CpI, based on a 2 ns equilibrium simulation, according to the VSAM protocol.
VSAMs calculated based on either the first or second ns of the computed equilibration trajectory
showed little variation and strong reproducibility, as opposed to the TC-LES trajectories. Fig. 2.4a
shows the isosurface contour representing the area accessible to a solvent of radius 1.35 Å (which
characterizes the van der Waals radius of an H2 molecule) along with the TC-LES trajectories of hH2
diffusion. Visually, the isosurface accurately describes the regions of space that had been explored
by hH2 during the four 4 ns TC-LES simulations of hH2 inside CpI, even though the VSAM was
calculated based on a trajectory that did not contain any gas molecules. Almost every predicted
cavity throughout the CpI structure was observed to have been visited by H2, and almost all areas
visited by H2 corresponded to regions where cavities had been predicted, including areas away from
where the H2 diffusion originated as well as, surprisingly, internal cavities disconnected from the
surface. The same excellent match was observed for the case of O2 and the 1.6 Å iso-value contour
of the same VSAM (Fig. 2.4b), though in our TC-LES simulations, the O2 molecules only had time
to explore the cavities directly adjacent to the H-cluster active site, so the comparison was only
performed there. A comparison of the area of the protein accessible to H2 and O2-sized particles
is shown in Fig. 2.4c, and one can see that both pathways A and B are clearly identified. There
were very few exceptions in which the TC-LES simulations did not match the cavity predictions:
(1) exactly at the binding site where the 1,000 copies were placed, no cavity was predicted
(but gas was observed there during TC-LES because this was the starting position), (2) for the
case of O2 there was one single region of disagreement, which corresponds to a region occupied
by water during equilibration, but in which O2 managed to go during the TC-LES simulation (if
we consider the space occupied by water to be accessible to O2, then O2 was never observed in
any other unpredicted cavity), and (3) hH2 was occasionally observed in regions not predicted by
the 1.35 Å contour (but this happened for less than 1% of all hH2 positions explored). Fig. 2.5
shows the cumulative occupancy of O2 and H2 based on the value of the underlying maximum
VSAM grid points in the region around the active site. The figure displays definite thresholds for
the occupancy of H2 and O2 as a function of predicted maximum radius of solvent, below which
no TC-LES simulated gas has been found to go. This shows that the maximum VSAM correctly
predicts the accessibility of both H2 and O2 (in the sense that gas does not enter regions not
predicted; the converse appears to be true according to visual inspection for the case of H2 and
cannot be proven for the case of O2 due to lack of sampling). Only 0.9% of the TC-LES hH2
molecules were found in hollow regions with maximum predicted radii below 1.35 Å, and only 0.1%
of the O2 was found in regions with a radius of less than 1.6 Å.
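
The occupancy analysis described in this paragraph can be sketched as follows, assuming the maximum-VSAM value has already been interpolated at every explored ligand position and collected into a one-dimensional array. The function names and the four toy values are hypothetical; they are not the simulation data.

```python
import numpy as np

def cumulative_occupancy(sampled_radii, bin_edges):
    """Normalized cumulative histogram of VSAM values at the explored positions.

    Assumes bin_edges span all sampled values, so the last entry equals 1.
    """
    sampled_radii = np.asarray(sampled_radii)
    counts, _ = np.histogram(sampled_radii, bins=bin_edges)
    return np.cumsum(counts) / sampled_radii.size

def fraction_below(sampled_radii, threshold):
    """Fraction of explored positions whose predicted maximum radius is below threshold."""
    return float(np.mean(np.asarray(sampled_radii) < threshold))

# Toy values in angstroms, for illustration only.
toy_radii = [1.0, 1.5, 1.7, 2.0]
toy_cumulative = cumulative_occupancy(toy_radii, [0.0, 1.35, 1.6, 3.0])
toy_fraction = fraction_below(toy_radii, 1.35)
```

Applied to the real trajectories, `fraction_below` with thresholds of 1.35 Å and 1.6 Å would correspond to the 0.9% and 0.1% figures quoted above for hH2 and O2.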
It is important to realize that the gas-transport pathways described above could not be identified
by simple analysis of static crystal structures. For the case of pathway A, a solvent-accessible
surface for a solvent the size of H2 (radius ∼1.35 Å) can indeed be detected this way, though it
becomes disconnected if one includes equilibrated hydrogen atoms or uses larger probe molecules
(such as O2 with a radius ∼1.6 Å). Pathway B, however, could not be detected for either H2 or O2
using this type of analysis. If one compares the iso-value contours of our maximum and average
VSAMs, for H2- and O2-sized solvent particles, one can understand the difference in dynamics
observed between O2 and hH2 diffusion during the TC-LES simulations. The VSAM containing
the average value of the solvent molecule does not exhibit any channels of sufficient size to allow
O2 to access the active site. Only very few cavities along the two main diffusion pathways are
large enough to accommodate an O2 molecule on average, but these do not form a continuous
channel from the external solution to the active site (see Fig. 2.6). Only for the case of H2 is one
Figure 2.4: Comparison of the surface delimiting the maximum VSAM predicted from the equilibrium simulation of CpI in the absence of gas for a particle the size of (a) H2 and (b) O2, along with the locations explored by the centers of the hH2 or O2 atoms, respectively, during the TC-LES simulations. (c) A slice through the computed gas-accessible surfaces for O2 (light gray internal volume) and H2 (dark gray internal volume), highlighting the differences between the two.
of the two major diffusion pathways at least partially visible in either the average or instantaneous
VSAMs (a continuous channel is in fact observed for a ∼1 Å-radius probe, which is smaller than
H2). This is the “hydrogen channel” that was originally proposed from an analysis of the X-ray
crystal structure of CpI [126]. As suggested by our TC-LES simulations, it does appear that O2
moves from cavity to cavity as the cavities fluctuate into existence inside the protein, and there is
no permanent “channel” to speak of. For the case of H2, the same also holds, but H2 is sufficiently
smaller in size, as compared to O2, such that more open space is accessible to it at any given time.
The instantaneous H2-sized cavities connect in more places as well as more often, allowing for
easier diffusion. A quick analysis of the probabilities at which different parts of the pathways are
large enough to accommodate gas molecules reveals that, over the course of the 2 ns equilibration,
most regions of pathways A and B were open 5-8% of the time for O2- and 30-35% of the time for
H2-sized particles, and each pathway had one constricted region which was only open for about 2%
of the time for O2 and 20% of the time for H2, thus limiting the rate of exit of the gas.
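
The "open" probabilities quoted here can be estimated from the per-frame VSAMs as the fraction of frames in which a voxel admits a probe of a given radius. The sketch below is one plausible way to do this, not necessarily the analysis actually performed; the function name and the toy frames are hypothetical.

```python
import numpy as np

def open_fraction(frame_maps, probe_radius):
    """Per-voxel fraction of frames in which the VSAM value is at least probe_radius."""
    stack = np.stack(frame_maps, axis=0)  # shape: (n_frames, ...grid shape...)
    return (stack >= probe_radius).mean(axis=0)

# Toy example: two voxels along a pathway, four frames, values in angstroms.
toy_frames = [
    np.array([2.0, 0.5]),
    np.array([2.0, 1.0]),
    np.array([0.5, 1.0]),
    np.array([2.0, 1.7]),
]
open_o2 = open_fraction(toy_frames, 1.6)  # O2-sized probe
bottleneck = float(open_o2.min())         # most constricted voxel of the toy pathway
```

In this toy case the first voxel is open 75% of the time and the second only 25% of the time, so the second plays the role of the constricted region that limits the rate of exit.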
Figure 2.5: Comparison of TC-LES- and VSAM-predicted gas pathways. Histogram of the values of the VSAM maximum predicted probe radius for each of the positions explored by both O2 and hH2 in the TC-LES simulations. The abscissae represent the interpolated value of a 0.5 Å resolution 3-D maximum VSAM grid at the location of the center of each O2 or H2 atom from the TC-LES simulations, and the ordinates indicate the number of times (normalized and cumulative) that these values have been explored by O2 or hH2.
Finally, we wish to comment on the approximations made in the maximum VSAM method.
We have shown that we can predict with excellent accuracy where both H2 and O2 gas can go in a
hydrogenase protein, based solely on an analysis of the space accessible inside the protein during an
equilibrium simulation in the absence of gas. This statement appears to imply that gas and protein
do not interact strongly. However, other studies of gas transport in protein cavities have suggested
that the presence of a gas can strongly influence the internal conformations of the protein near the
gas [18]. To test this suggestion, we have performed our VSAM analysis on the trajectories which
did contain O2 and hH2 and we clearly see that the proteins that contain gas exhibit significantly
larger cavities (not shown) compared to the gas-less trajectories. What the present chapter intends
to demonstrate is not that the gas diffuses as if it were a non-interacting ghost particle, but rather
that, though the gas can strongly bias the openness and shape of nearby cavities, it does not create
new cavities that would not spontaneously appear by themselves inside the protein. The presence
of gas does not create new diffusion pathways. The gas molecules merely insert themselves into
Figure 2.6: Average and maximum cavities for the case of O2-sized particles (figure labels: H-cluster, pathway A, pathway B). The outer contour represents the average VSAM. A solid slice through the surface shows which areas of the average VSAMs are accessible to the gas, including the two main diffusion pathways (dark gray), and which areas are excluded (light gray) according to the maximum VSAM. The dotted circles indicate the only two discernible O2-sized cavities in the average VSAM (as well as in the static crystal structure).
packing defects that arise spontaneously with or without gas in the protein and then alter the
defects’ sizes and “open” probabilities. In this respect, the lone protein approximation is a very
good indicator of what areas of the protein are accessible to hydrophobic gases.
2.4 Conclusion
There has been a steadily increasing body of evidence suggesting that packing defects play a major
role in gas transport inside many proteins [6, 18, 22, 23, 27, 67]. Our results further confirm previous
indications that a permanent channel is not needed to allow gas from a protein’s exterior to reach a
buried active site. Transient cavities, arising from the protein’s natural equilibrium dynamic motion
at the nanosecond timescale, can define predetermined pathways for hydrophobic gas transport.
Such observations have been stated before, but we show for the first time that the location of the
pathways taken by diffusing hydrophobic gases (in this case, H2 and O2) can be fully described
on a protein-wide scale, by simply analyzing the motion of the protein in the absence of internally
diffusing gas, and that the presence of the gas is not absolutely needed to open or activate these
pathways. We do not expect this to be the case for polar ligands: strong protein-ligand electrostatic
interactions might make accessible pathways that would otherwise remain tightly shut during the
protein’s equilibrium motion [88].
Comparing all of our TC-LES simulations for a given gas molecule (O2 or H2), we notice that
even though we could not reproduce the same trajectories and gas exit rates from one multi-
nanosecond run to another, all our runs had in common the fact that they were exploring the same
maximum cavity predicted by our VSAM, which itself was reproducible with very good agreement
between runs at the nanosecond timescale. This observation highlights the important possibility
that all the necessary protein conformations that enable gas permeation across CpI can occur
at the nanosecond time scale. Though the essential dynamics needed for the understanding of
gas permeation in globular proteins (namely the formation of transient cavities) occur on a short
timescale, results obtained by simulating the diffusion of individual particles were never observed
to converge during that time scale. This is due to the fact that, if we ignore for now the effects
of the gas-protein interactions, gas diffusion, as probed by MD, relies on the temporal and spatial
coupling of three simple random processes, namely, the transient formation of cavities, the transient
formation of passages between these cavities, and the random hopping of gas molecules across these
passages. Combined together, these three effects give the appearance of a very complex gas diffusion
process that cannot be fully characterized using multi-nanosecond MD diffusion studies alone. Our
results demonstrate that the very slow diffusion of O2 and H2 inside CpI can be characterized by
sampling the dynamics of a protein on a much shorter timescale. We cannot exclude the effect of
rare protein conformations not sampled in a 1 ns run, on gas diffusion, but we can assume that their
effect is probably a very minor one, since in order to be effective, these rare protein conformations
must also coincide with the presence of O2 molecules at just the right place and time.
Chapter 3
Imaging the migration pathways for gas ligands inside myoglobin
Myoglobin (Mb) is perhaps the most studied protein, experimentally and theoretically. Despite
the wealth of known details regarding the gas migration processes inside Mb, there exists no fully
conclusive picture of these pathways. We address this deficiency by presenting a complete map of all
the gas migration pathways inside Mb for small gas ligands (O2, NO, CO and Xe). To accomplish
this, we introduce a computational approach for studying gas migration, which we call implicit
ligand sampling. Rather than simulating actual gas migration events, we infer the location of gas
migration pathways based on a free-energy perturbation approach applied to simulations of Mb’s
dynamical fluctuations at equilibrium in the absence of ligand. The method provides complete
3-D maps of the potential of mean force of gas ligand placement anywhere inside a protein-solvent
system. From such free energy maps, we identify every gas docking site, as well as the pathways
between these sites, to the heme, and to the external solution. Our maps match previously known features
of these pathways in Mb, but also point to the existence of additional exits from the protein matrix
in regions that are not easily probed by experiment. We also compare the pathway maps of Mb
for different gas ligands and for different animal species. (This chapter is based on work published
in Cohen et al. [31].)
3.1 Introduction
Myoglobin (Mb), the first protein to be resolved at the atomic level [85], is a relatively small
(approximately 150 amino-acids) globular protein, found mainly in heart and skeletal muscles of
numerous animal species [22, 56, 57, 169]. Its active site, the heme, binds small gas ligands such
as molecular oxygen (O2), carbon monoxide (CO), nitric oxide (NO) and cyanide (CN−), making
this protein an important participant in the intra-cellular transport and storage of gases, particularly
of O2. In addition to facilitating the oxygen transport from the cell membrane to the cell’s
mitochondria, Mb is now believed to also play important roles in oxidative phosphorylation [169]
and in the scavenging of NO [54, 61, 68, 104].
The heme is buried inside Mb, which protects it from the aqueous environment, and thus is
not directly accessible to ligands in solution. Because gas ligands must find their way to the heme
by diffusing inside Mb’s protein matrix, Mb has long been a prime candidate for the study of
gas migration inside proteins. Numerous experiments have investigated this process inside Mb.
A popular experimental measurement is the timescale of the geminate recombination of the Mb
moiety with its gas ligand (O2, CO, NO), in which the ligand dissociates from the heme upon
flash-photolysis [10, 66], wanders inside the protein for tens to hundreds of nanoseconds, and then
rebinds to the heme [24, 41, 67, 115, 120, 133, 145, 146]. By measuring the timescale distribution
of the recombination process, and the rate at which the ligand escapes the molecule instead of
rebinding, one gains insight into the internal network of gas migration pathways inside Mb, and
into the size of the energy barriers along these pathways.
Early on, Mb was found to bind xenon gas (Xe) and structures of Mb in the presence of
bound Xe pointed to cavities between which small gas ligands could potentially hop [156]. Early
simulations of the gas migration process, although constrained to short timescales and distances
from the heme, nevertheless revealed the relevance of the Xe cavities, as well as the importance
of the protein’s motion in allowing gas ligands to migrate between them [27, 47]. In the last few
years, experiments and simulations have covered considerable new ground. Time-resolved X-ray
crystallography of photolyzed Mb-CO geminate complexes provided movies of the evolution of the
average CO distribution as a function of time after photolysis [20, 77, 143, 144, 164, 165]. Long-
timescale simulations (greater than 80 ns) of the migration of CO or NO inside Mb reproduced some
of these results [17, 18, 117], and shed more light on the locations of the ligand-accessible regions
inside the Mb, as well as on how these regions are connected. The general picture emerging from
experiment and simulation is that Mb has, in its interior, several regions (“cavities”) favorable
for gas molecules to reside. These regions are identified as Xe binding sites observed by X-ray
crystallography or as empty space in static X-ray structures. The gas ligands hop from one cavity
to another via an unspecified mechanism, but of which it is generally agreed that the protein’s
thermal fluctuations play a role [32, 52, 93]. The location and properties of the connections between
the internal Mb cavities as well as of the exit pathways from Mb have not been fully characterized.
In the present work, we address the migration of small gas ligands inside proteins, using Mb
in particular, from a protein-wide perspective. To accomplish this, we introduce a computational
method, which we call implicit ligand sampling, which computes the potential of mean force (PMF)
corresponding to the placement of a given small gas ligand such as O2, CO, etc., everywhere inside
a protein. The PMF that we calculate describes the Gibbs free energy cost of having a particle
located at a given position, integrated over all the other degrees of freedom of the system, except
for the ligand’s position, and is the quantity that indicates which areas of the protein are accessible
to the ligand and at what free energy cost. The implicit ligand sampling method for computing a
monoatomic or diatomic gas ligand’s PMF inside a protein (see Methods) relies on the fact that
gas ligands are small and interact weakly with the protein matrix [32]. Because of this, we can
analyze the protein dynamics in the absence of the ligand and treat the ligand’s presence as a weak
perturbation, and yet still produce accurate results. This approach may seem surprising, but the
absence of ligands in the simulation is in fact beneficial because the protein’s migration pathways
can now be sampled at every point in space simultaneously, thus generating much better statistics,
in most cases, than what would be obtained if one were to follow the trajectory of a single ligand.
When applied to Mb, implicit ligand sampling provides a complete 3-D map of the favorable
regions and migration pathways for a small gas ligand inside the protein. We devote the rest of
the chapter to describing these pathways. Our maps of the migration pathways inside Mb that
are located near the heme match prior experimental and computational evidence for O2 and CO
well. We also convincingly find that Mb has more than one exit to and from the heme binding site,
that the network of cavities may have an influence in tuning the different migration properties of
various gas ligands, and that general features of the Mb migration pathways are conserved across
species.
3.2 Methods
3.2.1 Implicit ligand sampling: theory
Here, we derive an expression for the implicit ligand PMF. The implicit ligand PMF corresponds
to the estimated free energy of placing a gas ligand anywhere inside a protein and its immediate
environment, calculated from an equilibrium simulation of the protein in the absence of the ligand.
In order to keep the derivation simple, we examine and discuss the case in which the ligand is a
point particle. For the general case, however, we must also take into account the ligand’s internal
degrees of freedom (e.g., for the case of a diatomic molecule, bond length and orientation). The
derivation of the general case is presented separately in the Appendix.
The PMF W(r) for the ligand, which, in our case, represents the Gibbs free energy cost of
placing the ligand at a specific position r, is directly related to the probability ρ(r) of finding the
ligand at that position, and is defined as [135]
\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\rho(\mathbf{r})}{\rho_o}\right], \tag{3.1}
\]
where ρo is an arbitrary normalization factor.
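As a minimal illustration of Eq. 3.1, the following Python sketch converts a probability density into a PMF. The function name, the unit choice (kcal/mol) and the default temperature of 300 K are assumptions made for this example only.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pmf_from_density(rho, rho_0, temperature=300.0):
    """Eq. 3.1: W = -k_B T ln(rho / rho_0), returned in kcal/mol."""
    if rho <= 0.0:
        # A position that is never visited has an infinite free energy cost.
        return math.inf
    return -K_B * temperature * math.log(rho / rho_0)
```

A density equal to the normalization factor gives a PMF of zero, and doubling the density lowers the PMF by kBT ln 2.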
At constant temperature (T ) and pressure (P ), the probability density distribution of the ligand
ρ(r) can be expressed as:
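\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \int d^3p' \int d^3r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}')+PV]} \, \delta^3(\mathbf{r}'-\mathbf{r})}{\int dV \int d^{3N}p \int d^{3N}q \int d^3p' \int d^3r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}')+PV]}}, \tag{3.2}
\]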
where $\int d^{3N}p \int d^{3N}q$ refers to the integration over all degrees of freedom of the protein reference
system (which includes the surrounding solvent), and where $\int d^3p' \int d^3r'$ is the integration over
the ligand’s degrees of freedom; $H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}')$ is the Hamiltonian for the protein-ligand system, V
is the volume enclosing the system, and we define $\beta = (k_B T)^{-1}$.
When calculating the PMF of a ligand from a MD simulation, the probability density ρ(r) is
usually measured directly from a trajectory of the ligand motion, often with the help of sampling
enhancement techniques such as umbrella sampling [71, 135] and/or locally-enhanced sampling [47].
Because a lot of sampling is needed to get an accurate PMF, the ligand is often artificially con-
strained to a restricted region of space since the thorough exploration of an entire protein by a
ligand is not possible during the timescales currently accessible to MD simulations. Since we are
interested in characterizing the PMF for ligand diffusion everywhere inside a protein, and not just
along a restricted path, this causes a problem. We overcome this limitation by using an implicit
ligand: we treat the ligand as a small perturbation of the lone protein dynamics. A previous study
of gas migration inside CpI hydrogenase [32] demonstrated that the pathways taken by O2 and H2
inside CpI could be accurately predicted from the protein’s equilibrium dynamics in the absence
of the ligand. This suggests that the perturbation approach is sensible for the case of gas ligands.
We will now derive the PMF of ligand migration by treating the effect of the ligand as a pertur-
bation to a reference ensemble of protein states which contains no ligand. Under the presence of a
ligand with no internal degrees of freedom, the Hamiltonian for the protein reference system (Ho)
will be shifted by an amount equal to the protein-ligand interaction energy ∆E(r) and the ligand’s
kinetic energy K(p′) (the latter will eventually cancel out and disappear from our formulations).
The full Hamiltonian can now be expressed in terms of that of the reference protein as:
\[
H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}') = H_o(\mathbf{p},\mathbf{q}) + \Delta E(\mathbf{q},\mathbf{r}') + K(\mathbf{p}'). \tag{3.3}
\]
Inserting the perturbed Hamiltonian (Eq. 3.3) into the expression for the ligand probability
density (Eq. 3.2), we get:
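\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, e^{-\beta\Delta E(\mathbf{q},\mathbf{r})} \int d^3p' \, e^{-\beta K(\mathbf{p}')}}{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \int d^3r' \, e^{-\beta\Delta E(\mathbf{q},\mathbf{r}')} \int d^3p' \, e^{-\beta K(\mathbf{p}')}}. \tag{3.4}
\]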
We now wish to express our result in terms of an isobaric-isothermal ensemble (NPT ) average
over all states of the protein reference ensemble. In the reference protein NPT ensemble, the
average of any general observable A(r) is defined as:
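\[
\left\langle A(\mathbf{r}) \right\rangle_{NPT} = \frac{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, A(\mathbf{p},\mathbf{q},\mathbf{r})}{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]}}. \tag{3.5}
\]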
Then, using the definition for the isobaric isothermal ensemble average (Eq. 3.5), the ligand
probability distribution (Eq. 3.4) becomes:
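\[
\rho(\mathbf{r}) = \frac{\left\langle e^{-\beta\Delta E(\mathbf{r})} \right\rangle_{NPT}}{\left\langle \int d^3r' \, e^{-\beta\Delta E(\mathbf{r}')} \right\rangle_{NPT}}. \tag{3.6}
\]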
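The step from Eq. 3.4 to Eq. 3.6 can be spelled out as follows. The ligand momentum integral depends neither on r nor on the protein coordinates, so it factors out of the numerator and the denominator of Eq. 3.4 and cancels. Dividing both by the partition function of the reference system, written here as $Z_o$ (a symbol introduced only for this remark), turns each of them into an average of the form of Eq. 3.5:

\[
\rho(\mathbf{r}) = \frac{Z_o^{-1} \int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, e^{-\beta\Delta E(\mathbf{q},\mathbf{r})}}{Z_o^{-1} \int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \int d^3r' \, e^{-\beta\Delta E(\mathbf{q},\mathbf{r}')}},
\qquad
Z_o = \int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]}.
\]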
The denominator in Eq. 3.6 is simply a constant, which we will now refer to as λ. Inserting the
ligand probability density (Eq. 3.6) into our definition for the PMF (Eq. 3.1), we obtain:
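\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle e^{-\beta\Delta E(\mathbf{r})} \right\rangle_{NPT}}{\rho_o \lambda}\right]. \tag{3.7}
\]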
For convenience, we impose that our PMF be zero when the ligand is in vacuum (defined as a
region for which ∆E(q, r) = 0 always holds). This condition will be satisfied by setting ρo = 1/λ:
\[
W(\mathbf{r}) = -k_B T \ln \left\langle e^{-\beta\Delta E(\mathbf{r})} \right\rangle_{NPT}. \tag{3.8}
\]
When computing the PMF for a diatomic gas such as O2, CO or NO, we must also take into
account the internal degrees of freedom of the ligand, in addition to those of its center of mass.
In our analysis, we approximate the diatomic bond length to be fixed, and we are only interested
in accounting for the ligand’s orientational degrees of freedom (which we denote as Ω). For this
particular case, and following the more general derivation found in the Appendix, the expression
for the PMF becomes (see Eq. 3.18):
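\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle \int d\Omega \; e^{-\beta\Delta E(\mathbf{r},\Omega)} \right\rangle_{NPT}}{\int d\Omega}\right], \tag{3.9}
\]
where $\int d\Omega$ is the integration of unity over all internal degrees of freedom.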
This formulation is equivalent to that used in the 1-step free energy perturbation (FEP) method
(e.g., see [14, 89]). Traditionally, FEP techniques are used to determine the free energy difference
between two similar states of a system. In that case, it is common to use a series of artificial
intermediate states in order to increase the accuracy of the FEP method. In the present case, since
our perturbation is very small, the 1-step FEP method already provides good results. We take
advantage of this fact by calculating not just one free energy difference, but a huge number of such
differences spatially distributed over the entire protein. This is possible because all that is needed
in order to perform the calculation is a trajectory of the unperturbed protein reference state.
In principle, the analytical form for the implicit ligand PMF (Eq. 3.8) is exact because the
integration is performed over all possible states. In practice, the validity of the implicit ligand PMF
is not guaranteed when the thermal average ⟨...⟩NPT is replaced by a sum over a finite number of
states, such as is the case for MD or Monte Carlo simulations. In this case, only a restricted set
of states probable according to the reference energy function Ho(p,q) is actually sampled. The
states that are probable according to the protein-ligand energy function H(p,q, r), which is what
we require, may be undersampled or not sampled at all. If the perturbation introduced by the
ligand is small, the two distributions will have significant overlap (see Fig. 3.1), and the computation
of the implicit ligand PMF is possible by simply re-weighting the states of the protein reference
simulation according to Eq. 3.9. If the perturbation caused by the ligand is large, then the overlap
between the two distributions will be small and the protein states relevant for the protein-ligand
system may not be sampled in the reference simulation. As we will see, for the specific case of
small gas ligands, the perturbation can, in many cases, be considered to be small enough for the
implicit ligand analysis to work.
We now express the implicit ligand PMF (Eq. 3.9) as an average over a finite number M of
protein states taken from a simulation. If we use C different equally-probable orientations of the
ligand, the final PMF will be given by
\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\sum_{m=1}^{M} \sum_{k=1}^{C} e^{-\beta\Delta E(\mathbf{q}_m,\mathbf{r},\Omega_k)}}{M C}\right]. \tag{3.10}
\]
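The finite-sample average of Eq. 3.10 at a single grid position can be sketched in Python as below. This is only an illustration, not the implementation used for the maps in this chapter: the function name, the nested-list input layout and the 300 K default are assumptions. The sum is evaluated with the most favorable energy factored out, so that the exponentials cannot overflow and sterically clashing states (very large positive energies) simply contribute zero.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def implicit_ligand_pmf(energies, temperature=300.0):
    """Eq. 3.10 at one grid position.

    energies: list of M lists (one per protein snapshot), each holding the C
    protein-ligand interaction energies (kcal/mol) of the sampled orientations.
    Returns W(r) in kcal/mol.
    """
    kt = K_B * temperature
    flat = [e for snapshot in energies for e in snapshot]  # all M*C terms
    e_min = min(flat)
    # Every exponent below is <= 0, so exp() can underflow to 0 but never overflow.
    total = sum(math.exp(-(e - e_min) / kt) for e in flat)
    return e_min - kt * math.log(total / len(flat))
```

If every sampled energy is zero (ligand in vacuum), the function returns zero, consistent with the normalization chosen for Eq. 3.8.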
Figure 3.1: Schematic diagram showing the probability of occurrence of all the protein states, during a simulation of the protein reference system (solid line), or those desired in order to get a proper PMF for the protein-ligand system (dashed line). The introduction of a ligand inside the protein at a given position perturbs the energies of all the protein states from the reference ensemble, and consequently alters their probability of occurrence.
In order to gain a better understanding of the applicability of the implicit ligand sampling
method, we can estimate the maximum error on our free energy measurements. The PMF calculated
by means of Eq. 3.10, like most other free energy calculations, suffers from the fact that it can be
significantly influenced by rare events. The error caused by the undersampling of rare events may
be estimated by calculating the change in PMF that such an event would cause. To do this, we
assume conservative values for the frequency and ligand interaction energies of such events. For
the frequency, we assume that if a maximally favorable event was not sampled in M states (from
the simulation), then such events will on average occur less than once every M + 1 states. We also
assume that for this state, the protein-ligand interaction energy will be an optimal value ∆Emin,
which is independent of the ligand’s internal degrees of freedom. In practice, we choose ∆Emin to be
location independent, and we compute it by measuring the average interaction between the ligand
and its environment during a simulation using an explicit copy of the ligand and its environment.
Neglecting the effect of allosteric and/or conformational changes whose timescales are greater than
those sampled, the maximum lower error due to undersampling (undersampling will always cause
the PMF to be overestimated) can thus be estimated as:
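\[
\Delta W_-(\mathbf{r}) = -k_B T \ln\left[\frac{\sum_{m=1}^{M} \sum_{k=1}^{C} e^{-\beta\Delta E(\mathbf{q}_m,\mathbf{r},\Omega_k)} + C \, e^{-\beta\Delta E_{\min}}}{(M+1) \, C}\right] - W(\mathbf{r}). \tag{3.11}
\]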
For large values of the number of independent samples M , this becomes:
\[
\Delta W_-(\mathbf{r}) = -k_B T \ln\left[1 + \frac{e^{\beta[W(\mathbf{r})-\Delta E_{\min}]}}{M}\right]. \tag{3.12}
\]
The error estimate provided by Eq. 3.12 can be used to test the suitability of the implicit ligand
analysis to various ligands. If a ligand interacts strongly with the protein (e.g., the Cl−-protein interaction
has been measured to be as strong as ∆Emin = −150 kcal/mol in ClC chloride channels [34]),
then the error on Eq. 3.10 will be gigantic, and the method will fail. Similarly, if the ligand is not
very small (e.g., ATP, glycerol, etc.) then the measured PMF will be very large for all simulated
reference protein states as compared to ∆Emin, and the error on the PMF will again be huge.
For small gas ligands, we have estimated the values of ∆Emin by measuring the average energy
during short equilibrium simulations of explicit ligands in a water box. A uniform water box gives
excellent statistics and, from our observations and expectations, the very mobile water molecules
provide very favorable ligand interaction energies, which in turn will result in an error on the PMF
which can be used as a conservative estimate of that inside the protein. We measured the gas-water
average interaction energies by placing one copy of the explicit gas ligand in a 30 Å × 30 Å × 30 Å
water box. The gas-water system was then simulated under NPT conditions (300 K; 1 atm) for
500 ps, during which the gas-water interaction energies were measured every 1 ps. The interaction
energies were found to be converged for the last 450 ps of simulation, over which the interaction
energy was averaged. This procedure returned values of ∆Emin = -3.2, -3.7, -4.1 and -5.6 kcal/mol
for O2, CO, NO and Xe, respectively (with a standard deviation of 0.7–0.8 kcal/mol for all four
ligands, and a negligible error). For the case of O2, this would imply that the lower error on the
PMF due to undersampling using 5,000 independent snapshots would be less than 0.1 kcal/mol for
a measured PMF of −1 kcal/mol, 0.5 kcal/mol for a PMF of 2 kcal/mol, and 3.1 kcal/mol for a PMF of 5 kcal/mol. On top of this,
we add an additional uncertainty due to the variation in sampling, estimated from the variation
in the PMF across different points in space for a 5 ns water box simulation, which we evaluated to
be ±0.2 kcal/mol for O2, ±0.3 kcal/mol for CO and NO, and ±0.8 kcal/mol for Xe (trends in the
energy profile over large regions of space can be identified with a much better accuracy than this
because the actual error at each point in space acts independently).
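The undersampling estimate of Eq. 3.12 is simple enough to evaluate directly. The Python sketch below does so for the O2 values quoted above (∆Emin = −3.2 kcal/mol, 5,000 snapshots); the function name and the assumed temperature of 300 K are choices made for this illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def undersampling_error(pmf, e_min, n_samples, temperature=300.0):
    """Eq. 3.12: maximum lower error (kcal/mol, always <= 0) on a measured PMF."""
    kt = K_B * temperature
    # log1p keeps precision when the correction term is much smaller than 1.
    return -kt * math.log1p(math.exp((pmf - e_min) / kt) / n_samples)

# O2: Delta E_min = -3.2 kcal/mol, 5,000 independent snapshots
for w in (-1.0, 2.0, 5.0):
    print(w, undersampling_error(w, -3.2, 5000))
```

Under these assumptions the magnitudes come out close to the figures given in the text, and they grow rapidly with the measured PMF, which is why high-energy regions of the maps carry the largest uncertainty.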
3.2.2 Implicit ligand sampling: computational implementation
In practice, we compute the PMF using Eq. 3.10 for each possible ligand location on a regularly-spaced
grid (with a spacing of 1 Å), and for many different ligand orientations. The ligand interaction
energy ∆E(q, r) is computed using a Lennard-Jones potential, truncated at 12 Å, using
the van der Waals parameters taken from the CHARMM22 force-field along with realistic bond
lengths (O2: εO = −0.12 kcal/mol, Rmin,O/2 = 1.7 Å, lbond = 1.12 Å; CO: εC = −0.11 kcal/mol,
Rmin,C/2 = 2.1 Å, lbond = 1.1 Å; NO: εN = −0.20 kcal/mol, Rmin,N/2 = 1.85 Å, lbond = 1.15 Å).
The inclusion of charges in the ligand was found to slow down the computation to intractable levels.
An implicit ligand sampling analysis was performed using both explicitly dipolar and uncharged
ligands for selected test cases and it was found that the effect of the electrostatic dipoles of NO,
CO and O2 is negligible. Quantum mechanics calculations have determined partial charges to be
less than 0.025e for all ligands considered in this study, and the solvation energy calculated using
the implicit ligand sampling with ligand partial charges of 0.025e varied by less than 0.05 kBT from the
values in Table 3.2 for all cases. This also held true for the energies measured for a small number of
frames of the Mb dynamics using both dipolar and uncharged models of O2; the error introduced
by the dipole was typically lower than 0.05 kBT . Because of this, the maps published herein were
computed using completely neutral ligands. The ligands’ quadrupole moments were not accounted
for in this study.
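For reference, a pairwise Lennard-Jones term in the CHARMM convention (negative stored well depths combined geometrically, Rmin/2 values added) can be sketched in Python as follows. The plain cutoff without a switching function and the parameters of the protein atom are assumptions of this sketch; only the O2 oxygen parameters and the 12 Å truncation are taken from the text.

```python
import math

CUTOFF = 12.0  # truncation distance in Angstrom, as stated in the text

def lj_energy(r, eps_i, rmin_half_i, eps_j, rmin_half_j, cutoff=CUTOFF):
    """Lennard-Jones energy (kcal/mol) between two atoms a distance r (Angstrom) apart.

    E = eps_ij * [(Rmin_ij / r)^12 - 2 (Rmin_ij / r)^6], with
    eps_ij = sqrt(eps_i * eps_j) and Rmin_ij = Rmin_i/2 + Rmin_j/2.
    A hard cutoff (no switching function) is an assumption of this sketch.
    """
    if r >= cutoff:
        return 0.0
    eps_ij = math.sqrt(eps_i * eps_j)    # positive well depth
    rmin_ij = rmin_half_i + rmin_half_j  # separation at the energy minimum
    ratio6 = (rmin_ij / r) ** 6
    return eps_ij * (ratio6 * ratio6 - 2.0 * ratio6)

# Oxygen atom of O2 (parameters quoted above) against a hypothetical protein atom
O_EPS, O_RMIN_HALF = -0.12, 1.7
ATOM_EPS, ATOM_RMIN_HALF = -0.11, 2.0  # illustrative values only
```

The interaction energy ∆E of a ligand placement is then the sum of such terms over all ligand atoms and all protein and solvent atoms within the cutoff.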
The parameter set for Xe (εXe = −0.494 kcal/mol, Rmin,Xe/2 = 2.24 Å) was picked from many
choices in the literature, and provided good agreement with Xe solvation and Xe-Mb binding
energies. The actual values of the PMF measured for Xe are sensitive to the particular choice
of Xe parameters, due to Xe’s large size; however, other Xe parameters lead to identical binding
site locations and exhibit the same general behavior, but the actual energies measured can differ
in magnitude (Xe parameters that use small radii tend to exhibit much smaller barriers between
binding sites).
Within each grid cube, we calculated the energies for 2³ (8) equally spaced positions for diatomic
ligands (e.g., O2) and 3³ (27) positions for monoatomic ligands (e.g., Xe), providing much better
statistics (i.e., a much narrower distribution of energies for the same averaged value) for each grid
cube. For the case of O2, 50 randomly-chosen orientations of the ligand were evaluated at each
location. Furthermore, this was repeated for 5,000 trajectory snapshots (sampled at each ps), as
we found that this amount of sampling provided a satisfactory accuracy. In order to speed up
the calculation, the interaction between atoms located further than 5.5 Å apart was calculated only
once per grid cube, per trajectory snapshot, while the interaction energy below 5.5 Å was calculated
for all 2³ or 3³ points inside each grid cube; this approximation was shown to amount to less than
a 0.05 kBT maximum error, while reducing the total computation time for each O2 PMF map to
practical levels. In the end, for each grid point, 50 × 2³ = 400 energy calculations were performed
per trajectory snapshot for the diatomic ligands and 27 for monoatomic Xe. The value at each grid
point then represents the PMF of having an O2 molecule located within a 1 Å³ cube centered at
that point. The implicit ligand sampling algorithm is included and distributed as part of the open
source VMD 1.8.4 software package [78] (in VMD’s volmap command).
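The sampling bookkeeping described above (2³ sub-positions times 50 orientations per grid cube for a diatomic ligand, 3³ sub-positions for Xe) can be sketched in Python as follows. The cell-centered placement of the sub-positions, the function names and the fixed random seed are assumptions of this sketch; the volmap implementation may place its samples differently.

```python
import math
import random

def subgrid_offsets(n_per_axis, spacing=1.0):
    """Equally spaced sample positions inside one grid cube, relative to its center."""
    step = spacing / n_per_axis
    axis = [(i + 0.5) * step - spacing / 2.0 for i in range(n_per_axis)]
    return [(x, y, z) for x in axis for y in axis for z in axis]

def random_orientations(count, rng):
    """Unit vectors drawn uniformly on the sphere (normalized Gaussian triples)."""
    vectors = []
    while len(vectors) < count:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # guard against a (vanishingly unlikely) zero vector
            vectors.append(tuple(c / norm for c in v))
    return vectors

rng = random.Random(0)
# Diatomic ligand: 8 sub-positions, each with its own 50 random bond orientations
o2_samples = [(p, o) for p in subgrid_offsets(2) for o in random_orientations(50, rng)]
# Monoatomic ligand: 27 sub-positions, no orientation needed
xe_samples = subgrid_offsets(3)
```

Each entry of `o2_samples` corresponds to one energy evaluation per trajectory snapshot, which reproduces the count of 400 evaluations per grid point given in the text.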
3.2.3 PMF for ligands with internal degrees of freedom (optional)
When calculating the implicit ligand PMF for the case of diatomic (or more complex) ligands, we
must also take into account the internal degrees of freedom of the ligand, such as its orientation,
bond length, etc. In the following derivation, we will treat these generalized degrees of freedom
separately from those of the rest of the protein-ligand system. In our notation, r will refer to the
ligand’s center of mass, p′ will refer to the ligand’s momentum degree of freedom, and Ω will denote
all of the ligand’s remaining generalized degrees of freedom (i.e., those in addition to
its center-of-mass degrees of freedom).
When including the ligand’s internal degrees of freedom, the expression for the ligand’s proba-
bility density (Eq. 3.2) becomes:
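\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \int dp' \int d\Omega \int d^3r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}',\Omega)+PV]} \, \delta^3(\mathbf{r}'-\mathbf{r})}{\int dV \int d^{3N}p \int d^{3N}q \int dp' \int d\Omega \int d^3r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}',\Omega)+PV]}}. \tag{3.13}
\]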
When adding the ligand, the Hamiltonian for the protein reference system (Ho) will again be
shifted by an amount equal to the protein-ligand interaction energy ∆E(q, r,Ω) and kinetic energy
K(p′), but also by the ligand’s internal potential energy U(Ω):
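\[
H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}',\Omega) = H_o(\mathbf{p},\mathbf{q}) + \Delta E(\mathbf{q},\mathbf{r}',\Omega) + U(\Omega) + K(\mathbf{p}'). \tag{3.14}
\]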
Inserting the perturbed Hamiltonian (Eq. 3.14) into the expression for the ligand probability
density (Eq. 3.13), we get:
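\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \int d\Omega \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, e^{-\beta[\Delta E(\mathbf{q},\mathbf{r},\Omega)+U(\Omega)]} \int dp' \, e^{-\beta K(\mathbf{p}')}}{\int dV \int d^{3N}p \int d^{3N}q \int d\Omega \int d^3r' \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, e^{-\beta[\Delta E(\mathbf{q},\mathbf{r}',\Omega)+U(\Omega)]} \int dp' \, e^{-\beta K(\mathbf{p}')}}. \tag{3.15}
\]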
Using the definition for the isobaric isothermal ensemble average (Eq. 3.5), the ligand probability
distribution becomes:
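\[
\rho(\mathbf{r}) = \frac{\left\langle \int d\Omega \; e^{-\beta[\Delta E(\mathbf{r},\Omega)+U(\Omega)]} \right\rangle_{NPT}}{\left\langle \int d^3r' \int d\Omega \; e^{-\beta[\Delta E(\mathbf{r}',\Omega)+U(\Omega)]} \right\rangle_{NPT}}. \tag{3.16}
\]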
We now insert our expression for the ligand probability density (Eq. 3.16) into the definition of
the PMF (Eq. 3.1) and, just as we did for Eq. 3.8, we also impose that our PMF be zero when the
ligand is in vacuum (defined when ∆E(q, r,Ω) = 0). We then obtain:
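\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle \int d\Omega \; e^{-\beta[\Delta E(\mathbf{r},\Omega)+U(\Omega)]} \right\rangle_{NPT}}{\left\langle \int d\Omega \; e^{-\beta U(\Omega)} \right\rangle_{NPT}}\right]. \tag{3.17}
\]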
For the case of diatomic ligands, we have chosen to keep the bond lengths fixed, such that the
only internal degrees of freedom Ω remaining are those that specify the orientation of the ligand.
In this case, the ligand’s internal energy U(Ω) is a constant, such that all the terms that contain
it in Eq. 3.17 cancel out. The expression for the PMF (Eq. 3.17) then takes on the simplified form
used in our analysis:
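\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle \int d\Omega \; e^{-\beta\Delta E(\mathbf{r},\Omega)} \right\rangle_{NPT}}{\int d\Omega}\right]. \tag{3.18}
\]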
3.2.4 MD protocol and parameters
The dynamic trajectories of the proteins were computed by all-atom molecular dynamics (MD)
simulations, using the CHARMM27 force-field [99], the NAMD molecular dynamics program [127]
and the NAMD-G job submission and automation software [69]. Each Mb structure was embedded
into a water box and the resulting 20,000-30,000 atom systems were simulated using periodic
boundary conditions. Particle Mesh Ewald with a grid resolution of better than 1 Å was used for
long-range electrostatics, and all other non-bonded interactions were calculated using a cut-off of
12 Å. All simulations were carried out at constant temperature of 300 K and constant pressure of
1 atm. Temperature and pressure were controlled using Langevin dynamics with damping constant
of 5 ps⁻¹ and a Nosé-Hoover Langevin piston with period of 100 fs and decay rate of 50 fs. The
integration timesteps were 1 fs, 2 fs and 4 fs for bonded, non-bonded and long-range electrostatic
interactions, respectively. Every system was initially equilibrated for 1 ns, after which the MD run
was extended for 5 ns, with static snapshots taken every 1 ps for analysis. Displacements of the
whole structure during the simulations were discounted by using a best fit alignment on the Cα
atoms. The implicit ligand sampling analysis was then performed on these trajectories.
3.3 Results
In the following section, we investigate the properties of the gas migration pathways inside Mb,
based on the free energy profiles calculated from our implicit ligand sampling method (see Methods).
We show that the computed 3-D maps of the PMFs for various ligands in Mb, which we will refer
to as implicit ligand PMF maps, match known experimental facts wherever the comparison can be
performed. In addition, our method makes predictions that are difficult to measure experimentally,
such as the existence and precise locations of additional gas diffusion pathways inside Mb that are
situated away from the heme.
3.3.1 Xe binding sites
X-ray crystallography of Mb in the presence of high-pressure Xe gas has been used to locate ligand
docking pockets that potentially accommodate small ligands such as O2, CO or NO [96, 156]. For
the most part, the locations of Xe binding sites match small static cavities that consist of empty
space in the Mb crystal structure. However, the correspondence between empty space and Xe bind-
ing sites is not precise, since an empty space search finds many cavities which aren’t Xe binding
sites, provides no criterion for deciding a priori which cavities lodge Xe, and in most cases does
not pinpoint a specific location for the trapped Xe. The existence of atomic structures of Mb with
and without bound Xe provides an ideal test of our PMF calculation method. Fig. 3.2 shows the
location of Xe binding sites in the sperm whale Mb D122N mutant (PDB accession code: 1J52),
juxtaposed with the locations of minimum free energy computed from implicit ligand sampling on
a 5 ns equilibrium simulation of the D122N mutant without Xe (PDB accession code: 2MBW). In all
cases, the experimentally measured locations of the Xe binding site have been successfully pinpointed
to well within the 1 Å resolution of the PMF maps, except for the case of Xe3 (within 2 Å)
which corresponds to a location occupied during our simulation by two water molecules present in
the crystal structure, one of which is actually completely displaced by Xe in the crystal structure
under Xe pressure (the binding site was predicted nevertheless based on the fluctuations of the
water molecule positions). The free energies of Xe at the binding sites estimated by the implicit
ligand sampling method (using the Xe force-field parameters described in Methods) and of the ex-
perimentally measured Xe occupancies are shown in Table 3.1. The exact experimental values differ
from the computed ones by 0.5 to 1.3 kcal/mol, most probably due to our choice of Xe parameters;
the relative differences in PMF for the various binding site are nevertheless well reproduced.
Figure 3.2: Predicted and actual Xe binding sites for the sperm whale Mb D122N mutant (shown inribbon representation with the heme drawn as licorice). The predictions, shown as red iso-surfacesrepresenting the areas where the Xe PMF is lower than -4.9 kcal/mol (points on this surface have anerror of ±0.8 kcal/mol), are based on a 4 ns equilibrium simulation of a Xe-less Mb structure (PDBaccession 2MBW). The four experimental Xe locations, represented by labeled circles, are taken froma structure of the same protein under 7 atm Xe pressure (1J52) [96].
binding site    theoretical Xe PMF    experimental Xe PMF
Xe1             -6.4                  -5.1
Xe2             -5.2                  -4.5
Xe3             -5.1                  -4.6
Xe4             -5.5                  -4.4
Table 3.1: Predicted and experimentally measured free energies for the four Xe binding sites (as labeled in Fig. 3.2) in the sperm whale Mb D122N mutant, in units of kcal/mol. The theoretical PMF corresponds to the minimum PMF measured in the vicinity of the binding site and the experimental PMF is calculated from the crystal Xe occupancy at the given experimental Xe pressure, using the approximate formula $\mathrm{PMF}_{\mathrm{Xe}} = -k_B T \ln\left[\frac{(\text{Xe occupancy})/1\,\text{Å}^3}{P_{\mathrm{Xe}}/k_B T}\right]$, where $P_{\mathrm{Xe}}$ is the experimental Xe pressure (7 atm), and the Xe occupancy is provided for each Xe binding site in the 1J52 PDB structure.
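The occupancy-to-PMF conversion in the caption of Table 3.1 can be sketched numerically as follows. This is only an illustration under stated assumptions: the temperature of 300 K is assumed (the caption does not give one), the function name is hypothetical, and the occupancies in the loop are placeholder values, not those stored in the 1J52 structure.

```python
import math

# Physical constants.
K_B_J_PER_K = 1.380649e-23      # Boltzmann constant, J/K
R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol K)
ATM_TO_PA = 101325.0            # 1 atm in Pa
A3_PER_M3 = 1.0e30              # cubic angstroms per cubic meter


def xe_pmf_from_occupancy(occupancy, pressure_atm=7.0, temperature_k=300.0):
    """Return the PMF (kcal/mol) implied by a crystallographic Xe occupancy.

    temperature_k is an assumed value; the source caption does not state it.
    """
    # Ideal-gas number density P/(kB*T), in molecules per cubic angstrom.
    gas_density = pressure_atm * ATM_TO_PA / (K_B_J_PER_K * temperature_k) / A3_PER_M3
    # The occupancy is treated as the probability of finding Xe in a 1 A^3 volume.
    site_density = occupancy / 1.0
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(site_density / gas_density)


if __name__ == "__main__":
    # Placeholder occupancies for illustration only (not the 1J52 values).
    for occ in (1.0, 0.5, 0.25):
        print(f"occupancy {occ:.2f}: PMF = {xe_pmf_from_occupancy(occ):.2f} kcal/mol")
```

Under these assumptions a fully occupied site corresponds to a PMF of roughly -5 kcal/mol, which is the order of magnitude of the experimental column in Table 3.1.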
Experimentally determined Xe binding sites are often used to infer the location of gas diffusion
pathways. As we will argue later, the validity of this strategy is limited because the behavior of Xe
in proteins is quite different from that of smaller gas molecules such as O2, NO and CO, although the
results of such an approach are still meaningful. In any case, the prediction of Xe binding sites
provides a successful test case for our implicit ligand PMF calculations.
3.3.2 CO migration pathways
Implicit ligand PMF maps for CO inside sperm whale Mb were computed and are shown in Fig. 3.3.
The PMF maps clearly show CO-accessible cavities inside Mb, as well as their connectivity and
the height of the energy barriers between them. The four Xe binding sites and the distal pocket,
all arranged in a loop around the heme, can be clearly identified in the PMF map. Additional
cavities near the heme that have been identified by simulation as participating in the migration of
CO around the heme [17, 18] are also distinctly present in the PMF map. These results are also
in good visual agreement with a picosecond-resolution X-ray crystallography movie of the CO
migration [144, 164]. Furthermore, one can observe an energy minimum at the exact location (Xe1
cavity) of a crystallized CO in the L29W Mb mutant [142, 143].
In addition to the distal cavity and Xe binding sites, the PMF map for CO migration reveals
additional cavities and O2 pathways that lead outside of Mb (see Fig. 3.3), suggesting that the distal
Figure 3.3: Implicit ligand PMF for CO inside sperm whale Mb, based on a 5 ns equilibrium simulation of the 1DUK PDB structure, shown from four views looking towards the heme (a-d). The three energy iso-surfaces represent PMF values of -1.5 kcal/mol (red), 1 kcal/mol (blue cavities), and 5 kcal/mol (green). The empty white space corresponds to regions of measured PMF above 5 kcal/mol; the zero energy value corresponds to the ligand in vacuum. Practically speaking, the red surfaces show gas docking sites, the inner blue surfaces show the areas inside the protein that are more favorable to CO than the external aqueous solution, and the green surfaces highlight the regions of lowest energy barriers between the various cavities. The low energy barrier exits according to the displayed PMF map are indicated by red lines and circles, and dashed indicators mean that the exit is in the back. The errors on points lying on the three PMF isosurfaces are ±0.3, +0.3/-0.4 and +0.3/-3.6 kcal/mol for red, blue, and green, respectively. The Mb’s static surface is represented in white-inside-blue-outside color and the heme is displayed with its bound proximal histidine.
pocket may not be the only entrance/exit for gas ligands. We find three obvious exit pathways for
CO (defined as low barrier CO pathways that reach the solvent but do not necessarily continue
into it) in the implicit ligand PMF map of sperm whale Mb: the short distal pathway (gated by
His64), and two separate sets of exits from Mb at the far end away from the heme. In addition, we
observe three additional minor exits with higher energy barriers, one of which is a direct connection
from the Xe2 binding site to the exterior. Unfortunately, there is little direct supporting experimental
evidence, since the pathways far away from the heme cannot be seen using time-resolved X-ray
experiments monitoring the migration of gas ligands in Mb after their photolysis from the heme.
This is because the gas ligand’s average density becomes very diffuse by the time it reaches these
pathways after photolysis, and also because these extra pathways do not appear to contain strongly
attractive docking regions where a significant gas ligand density could be experimentally observed
(represented by the lack of red surfaces in the bottom of Fig. 3.3).
Geminate rebinding rates of the gas ligand inside Mb are usually interpreted using a four-state
model in which the gas ligand can, in turn, be in the external solution, inside the Mb distal
pocket, inside a system of internal cavities, or bound to the heme’s iron center (e.g., see [29]).
Despite experimental evidence pointing to possible ligand escape to the external solution by two
separate pathways – directly from the distal pocket and through the secondary cavity network [29]
– ligand escape has often been interpreted as occurring solely through the distal pathway (gated
by His64) [143, 145]. This has resulted in the popular view that Mb has a network of cavities
surrounding the heme, separated from the exterior by a single pathway [57].
This view of Mb having only one exit located at the distal pocket, however, besides being at
odds with our PMF map which reveals multiple exit points between the external solution and the
interior cavities of Mb for gas ligands, is also at odds with other studies. A simulation of CO
escape in Mb has identified a number of alternate exit pathways [47], though an increased CO
kinetic energy caused by the methodology used in that study may have influenced this observation.
A simulation performed by Bossa et al. [17] also suggests that some of the large cavities inside
Mb can be temporarily directly accessible from the external solution. Huang and Boxer [75] have
experimentally measured the geminate recombination parameters for a large library of
about 1,500 single-amino-acid Mb mutants, revealing that many mutations far away from the
heme and Xe-binding sites resulted in altered ligand migration behavior, suggesting that there may
be multiple access routes for the ligand between the Mb exterior and the Mb internal cavities.
3.3.3 Correlation with point mutations affecting gas ligand migration
Performing random mutagenesis on sperm whale Mb, Huang and Boxer [75] found a number of
residues whose substitution by another amino acid led to a substantial change in the geminate
recombination rates of Mb and O2 or CO, after testing roughly half of all possible mutations. These
“important” residues are shown in Fig. 3.4, along with the proposed pathways for O2 calculated from
our implicit ligand PMF analysis. We have classified the residues that affect gas ligand transport
into four groups, depending on their placement with respect to our calculated maps. Most residues
identified experimentally were also attributed important roles according to our theoretical analysis.
The first group (yellow in Fig. 3.4) comprises the amino acids that form the commonly known
distal pathway (Leu29, Phe33, Phe43, Phe46, His64, and Val68). The distal pathway is well known
from numerous studies of Mb, and our PMF maps also suggest that this pathway is the most
favorable and the shortest one for gas ligands to reach (or to escape from) the heme. The residues
forming the distal pocket are generally found to be highly conserved in Mb, in addition to strongly
influencing the recombination kinetics. Indeed, these residues are responsible for coordinating the
ligand before and while it binds to the heme, and they are responsible for the binding affinities of
various ligands to the heme [97, 118, 122, 149]. The second group (red in Fig. 3.4) consists of residues that
line putative exits from Mb’s interior, as defined by our PMF maps (Arg45, Thr67, and Leu137).
Mutation of any of these residues will affect the ability of gas ligands to enter or exit the interior
cavity network of Mb. The third group (blue in Fig. 3.4) is composed of amino acids with a small
profile that line a constriction between two cavities and also of bulky amino acids that directly
block the passage between two nearby ligand-accessible regions (Trp14, His24, Gln26, Ile30, Leu61,
Leu69, Ile99, Ile107, Ser108, Phe138, and Tyr146). We expect that mutating such residues would,
in general, cause a measurable change in internal migration rates since the cavity network topology
would be affected. The fourth group of residues (green in Fig. 3.4) shows no substantial
correlation between residue location and the PMF map (Lys16, Ala19, Lys34, His36, Asp44, Lys56,
Ala71, Gln91, Ala144, and Lys145). All are found on the periphery of Mb, pointing towards the
Figure 3.4: Amino acids whose substitutions significantly affect O2 or CO migration properties during geminate rebinding in Mb, as determined by Huang and Boxer [75]. The heme is drawn with the attached proximal His93. Residues forming the commonly recognized distal pathway are shown in yellow. The amino acids that are found at the exits from the Mb interior, according to the PMF maps, are shown in red. Small amino acids that line a constriction between cavities, and large amino acids which directly block passages between neighboring ligand-accessible areas, are colored in blue. Residues that were shown to affect ligand migration properties, but do not have any visible influence on the gas pathway according to the PMF maps are colored in green (some of these residues do cap alpha-helices and may play a structural role). The location of the gas migration pathways is drawn schematically, with light gray and dark gray, respectively, representing likely and highly likely regions for the ligand. Thick dashed lines indicate the exits that go out of the plane of the figure towards the viewer; thin dashed lines correspond to the exits behind the plane. Red arrows have been added to indicate the exits from Mb, dashed arrows represent exits behind the plane. All residues except those lining the distal pocket (yellow) are labeled.
external solvent, and while some of these residues appear to be structurally important (such as
charged surface residues), it is not clear from our results why and how the remaining residues would
affect geminate rebinding rates. It is possible that these residues have an indirect influence; for
example, their presence may be critical for Mb to fold properly.
As pointed out by Huang and Boxer [75], many of the important residues are found far from
both the heme and the distal pathway, which suggests that CO or O2 may use other pathways
in addition to the distal one to enter and exit Mb. Our implicit ligand PMF maps exhibit additional
exit pathways which are fully compatible with Huang and Boxer’s assessment.
3.3.4 O2, NO and CO share similar pathways to and from the binding pocket
We performed the implicit ligand analysis for O2, NO, CO, and Xe. To check that our ligand
parameters could reproduce real-world properties, and thus provide valid conclusions, we first used
the implicit ligand method to measure the ligands’ solvation energies. We accomplished this by
performing the implicit ligand analysis, using O2, NO, CO and Xe, on a 5 ns simulated trajectory
of a box of water. The PMF at each gridpoint in the entire water box was then properly averaged
(i.e., the ligand PMF was converted to and from its associated ligand occupancy probability, which
was the quantity used for the averaging) in order to compute a single free energy of solvation for
each ligand in water. Our calculated solvation energies were compared to experimental ones, and
the results are listed in Table 3.2. While the calculated energies are all slightly larger than the
experimental ones (by 5–30%), the relative differences between the ligands follow the correct trends
and are all respectably close to experiment.
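The "proper averaging" described above, in which the PMF at each grid point is converted to an occupancy probability, averaged, and converted back, can be sketched as follows. This is a minimal illustration rather than the analysis code used for Table 3.2; the function name and the value of kB*T (about 0.596 kcal/mol near 300 K) are assumptions.

```python
import math

KT_KCAL_PER_MOL = 0.596  # assumed kB*T near 300 K, in kcal/mol


def average_pmf(pmf_values, kt=KT_KCAL_PER_MOL):
    """Combine per-grid-point PMF values (kcal/mol) into one free energy.

    Each PMF is converted to a Boltzmann factor (proportional to the occupancy
    probability), the factors are averaged, and the mean is converted back.
    """
    values = list(pmf_values)
    shift = min(values)  # subtract the minimum so the exponentials cannot overflow
    mean_boltzmann = sum(math.exp(-(v - shift) / kt) for v in values) / len(values)
    return shift - kt * math.log(mean_boltzmann)


if __name__ == "__main__":
    # Hypothetical grid values: the result is weighted towards the low-PMF points
    # and is therefore lower than the plain arithmetic mean of the PMF.
    grid = [1.5, 2.0, 2.5, 6.0]
    print(average_pmf(grid), sum(grid) / len(grid))
```

Averaging the probabilities rather than the PMF values themselves is what makes the result a free energy: grid points where the ligand overlaps with water molecules contribute almost nothing, instead of dominating the mean.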
We then computed implicit ligand PMF maps for O2, NO, CO and Xe inside sperm whale Mb
using the same equilibrium simulation of Mb for each analysis. Any observed variation between the
different maps is thus caused solely by the intrinsic properties of the different ligands (which differ
here only by their van der Waals parameters), and not by statistical variations since the protein
trajectory is identical for each ligand. Generally speaking, the PMF maps for all the ligands have
very similar cavity and pathway locations, but different absolute energy values.
Fig. 3.5a-c shows the PMF values at those points on our maps that lie on paths that were
computed to minimize the height of the energy barriers for O2, NO, CO and Xe between the heme
ligand    ∆Gexp    ∆Gtheo
Xe        1.04     1.25 ± 0.04
NO        1.53     1.60 ± 0.01
O2        1.78     1.97 ± 0.02
CO        1.94     2.54 ± 0.02
Table 3.2: Comparison of the free energies of solvation in units of kcal/mol for different gas molecules measured from experiment and from the implicit ligand PMF analysis. The experimental values of the solvation energy at 20 °C are taken from those compiled in Scharlin et al. [140]. Theoretical values are obtained by properly averaging the ligand PMF calculated for a 5 ns simulation (5,000 frames) of a 40×40×40 Å³ water box at 300 K and 1 atm. Quoted errors, which are small because of the huge amount of sampling, represent the statistical variance on the calculated PMF, and do not account for the choice of the force-field parameters for the water and ligands.
binding site to the three most likely exits identified by our maps. The actual paths taken through Mb
are displayed in Fig. 3.5d. It must be noted that the PMF values that we quote are the PMFs of
having a gas molecule present in a cubic box of 1 Å side length, centered at the grid point where the
PMF is measured. The detailed PMF along a path, which is what we show, is defined differently
from the PMF of “being in a specific cavity” or of “being in the solvent”, since in the latter cases
the probability of being at every grid point within the specified cavity or in the solvent must be
summed, and the result depends on the total size of the given cavity or of the accessible solvent.
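One way to obtain a path that minimizes the height of the energy barriers between two end-points on a PMF grid is a Dijkstra-style search in which the cost of a path is its highest PMF value rather than its length. The sketch below is an assumption about how such a search could be done, not the procedure actually used for Fig. 3.5; the function name, the nested-list grid layout, and the 6-neighbor connectivity are all illustrative choices.

```python
import heapq

# The six face-sharing neighbors of a grid point (assumed connectivity).
NEIGHBOR_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def lowest_barrier_path(pmf, start, goal):
    """Return (barrier, path) where barrier is the smallest achievable maximum
    PMF along any grid path from start to goal, and path is one such path.

    pmf is a nested list indexed as pmf[x][y][z]; start and goal are (x, y, z).
    """
    nx, ny, nz = len(pmf), len(pmf[0]), len(pmf[0][0])

    def value(point):
        return pmf[point[0]][point[1]][point[2]]

    best = {start: value(start)}   # lowest known barrier needed to reach each point
    parent = {start: None}
    heap = [(value(start), start)]
    while heap:
        barrier, node = heapq.heappop(heap)
        if node == goal:
            break
        if barrier > best[node]:
            continue  # stale heap entry
        for dx, dy, dz in NEIGHBOR_STEPS:
            nb = (node[0] + dx, node[1] + dy, node[2] + dz)
            if not (0 <= nb[0] < nx and 0 <= nb[1] < ny and 0 <= nb[2] < nz):
                continue
            nb_barrier = max(barrier, value(nb))
            if nb_barrier < best.get(nb, float("inf")):
                best[nb] = nb_barrier
                parent[nb] = node
                heapq.heappush(heap, (nb_barrier, nb))

    # Walk back from the goal to recover the path.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return best[goal], path


if __name__ == "__main__":
    # Toy 3x3x1 grid: a wall at y = 1 with a low pass (PMF = 2) at x = 2.
    toy = [[[0], [9], [0]], [[0], [9], [0]], [[0], [2], [0]]]
    print(lowest_barrier_path(toy, (0, 0, 0), (0, 2, 0)))
```

Note that minimizing only the highest barrier does not fix the path uniquely away from the bottleneck, so a practical implementation would likely add a secondary criterion (for example path length) to choose among equivalent paths.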
For the case of O2, the energy barrier to enter Mb is very low – only a few kBT above the
computed solvation energy of O2. Not surprisingly, of all the ligands we have investigated, O2 has
the smallest energy difference between its highest barriers and most attractive cavities. We evaluate
the Gibbs free energy difference between the distal pocket’s most attractive region and the lowest
barrier to be crossed for O2 to exit through the distal pathway to be about 6 kcal/mol. This result
is consistent with theoretical and experimental estimates of the same barrier, 6.4 kcal/mol [90]
and 7.5 kcal/mol [29], respectively. This implies that O2 is the ligand that can enter, exit and move
around Mb with the least hindrance of all gases studied, as Mb’s role in storing and
transporting O2 would suggest.
As compared to O2, NO exhibits a stronger attraction to the Mb cavities by roughly 1 kcal/mol
(i.e., all else being equal, NO is about 7 times more likely than O2 to be in a given Mb cavity,
Figure 3.5: PMF profiles experienced by ligands exiting Mb along (a) the distal pathway and (b,c) the two other most favorable exit paths between the heme binding site and the external solution. The path profiles were determined by finding the path, between two pre-defined end-points (one at the heme binding site, and one near an Mb exit), that exhibits the smallest energy barriers. The values of the PMF at each point along these paths are then plotted as a function of the ligands’ distance from the heme binding site. The procedure is repeated for O2, NO, CO and Xe ligands, using the same end-points for a given exit. The solvation energies of each ligand in water, as given in Table 3.2, are represented as horizontal dashed lines, and the location of the distal pocket (DP) and Xe binding site Xe4 are indicated. (d) The actual points along the three paths in relation to the Mb PMF maps are plotted in green (for the distal path a), red (path b) and yellow (path c).
such as the distal pocket); however, the absolute height of the largest energy barriers between these
cavities as well as to the external solution is at roughly the same level as for O2, which translates
into higher relative barriers due to NO’s lower solvation energy. Our results suggest that sperm
whale Mb would keep NO trapped in its internal cavities, which surround the heme, longer than it
keeps O2. These results are relevant because NO is known to harmfully deactivate cytochrome-c
oxidase and recent studies suggest that oxy-Mb plays a role in scavenging stray NO from the cell,
which it then deactivates by reaction with its bound O2 ligand to produce nitrate (NO3−) [54]. It
has been suggested [21] that the cavities in Mb could act as hosting stations for NO, and act to
increase its chance of collision with heme-bound O2 by keeping it inside of Mb longer. This latter
hypothesis is well supported by our results.
In our modeling, of all the diatomic gas ligands, CO interacts the least favorably with Mb. CO
is less attracted to Mb’s cavities than O2 by about 0.5–1 kcal/mol. CO also experiences significantly
higher energy barriers (by roughly 3–5 kcal/mol as compared to O2) between internal cavities as
well as to the external solution. CO is toxic for Mb as well as for other proteins which are at
the receiving end of Mb’s O2 transport queue, such as respiratory cytochromes and cytochrome
oxidase. It appears that Mb is protected from CO by high energy barriers, which would reduce
the rate of CO intake (versus O2 intake), when Mb finds itself at the high concentration end of
the intracellular O2 and CO gradients. Our PMF profiles indicate that whereas the exit through
the distal pathway appears to be the most favorable one for O2 and NO, the variation in absolute
energy barriers between the different exits is less pronounced for CO (this conclusion comes with
the caveat that the error on large values of the PMF can be important, thus affecting the barriers
that we measured for CO). In any case, the increased availability of multiple exits from Mb for
CO as compared to O2 may have a functional role. Notably, the existence of multiple exits lends
support for the hypothesis by Radding and Phillips [130] that Mb protects itself from CO poisoning
through a kinetic proof-reading mechanism by preferentially allowing proportionally more CO than
O2 to exit Mb from the heme through the cavity network, thereby ensuring that only 4–7% (with
a relaxation time of about 180 ns) of photolyzed CO rebinds to the heme, as opposed to 27–42%
(with a relaxation time of about 55 ns) for O2.
We can compare the PMF profiles of Fig. 3.5 to the various experimental rates and estimates
of equilibrium constants and energy barriers for ligand migration in sperm whale Mb. Despite
the variations in methodology and results between studies, our results are generally consistent
with other measurements. Olson [119] estimates the escape barrier height for CO migration
between the distal pocket and the solvent to be about 4 kcal/mol. Our analysis estimates a barrier
of 7.5 kcal/mol (with an error of +0.4/-3.6 kcal/mol), which meets the experimental value at the
lower end of our error range. We expect our high barriers to always be overestimated and believe this to
be the case here. Rohlfs et al. [133] have estimated indirect rates for the solvent to distal pocket
migration (hereafter referred to as kX→B) and the solvent to distal pocket equilibrium constant (KX→B)
for O2, CO and NO. The experimental estimates for the equilibrium constants are 0.72 ± 0.25,
0.22 ± 0.12, and 0.07 M−1 for NO, O2, and CO respectively (the CO value being a very rough
estimate with no associated error). The ordering of these occupation probabilities and the reduction
by a factor of three as one goes from NO to O2 to CO match the sequential changes in the PMF
of roughly 1 kBT in going from NO to O2 to CO, as seen in the distal pocket (see Fig. 3.5b,c).
For the kX→B ligand entrance rates, we expect CO to enter Mb at much slower rates than O2 and
NO. Rohlfs et al. estimate all three rates to be nearly identical, with the CO rate, however, having an
error of over ±300%. We note here that the experimental results are derived quantities and
thus the errors are large, making it hard to conclude that the agreement is definitive. In theory,
it would be possible, through computation, to estimate theoretical effective transport rates for the
ligand migration, based on our PMF maps (as opposed to qualitatively inferring trends from the
energy profiles).
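The correspondence between a factor-of-three change in the equilibrium constant and a PMF change of roughly 1 kBT follows from the Boltzmann relation between occupancy and free energy. The short check below uses the equilibrium constants quoted above; the function name is hypothetical and the calculation ignores the experimental uncertainties.

```python
import math

# Solvent-to-distal-pocket equilibrium constants (M^-1) as quoted in the text
# from Rohlfs et al.; the CO value is described there as a very rough estimate.
EQUILIBRIUM_CONSTANTS = {"NO": 0.72, "O2": 0.22, "CO": 0.07}


def pmf_shift_in_kt(k_from, k_to):
    """PMF change, in units of kB*T, implied by going from k_from to k_to."""
    return -math.log(k_to / k_from)


if __name__ == "__main__":
    for a, b in (("NO", "O2"), ("O2", "CO")):
        shift = pmf_shift_in_kt(EQUILIBRIUM_CONSTANTS[a], EQUILIBRIUM_CONSTANTS[b])
        print(f"{a} -> {b}: PMF change of {shift:.2f} kBT")
```

Both steps come out slightly above 1 kBT, which is the sense in which the experimental ordering and the computed distal-pocket PMF values agree.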
Banushkina and Meuwly [11] measure a barrier of 7.8 kcal/mol from Xe4 to the distal pocket
for the CO migration in wild-type sperm whale myoglobin (and 4.3 kcal/mol for the reverse migra-
tion) using umbrella sampling. We estimate these same barriers to be about 4.5 and 3.5 kcal/mol
respectively. Bossa et al. [18] measure a symmetrical PMF barrier of about 2.6 kcal/mol from Xe4
to the distal pocket for CO, inferred from a long simulation (in essence, umbrella sampling with
a flat umbrella potential) of which about 3 ns is spent by CO at the barrier. All three method-
ologies are different and have different strengths, and for this specific case, we lean towards the
values provided by Bossa and ourselves as providing the more accurate theoretical results. The
implicit ligand sampling analysis is based on a larger number of independent samples obtained at
every point in space (e.g., 5,000 ps × 400 conformers per point in space, a tenth of which can be
considered independent), as compared to the other methods which use a relatively low number of
independent samples per coordinate point at the barrier (e.g., 50–100 ps × 1 conformer per reaction
coordinate increment for the umbrella sampling), especially given that the sampling is spread over
many values of the reaction coordinate. On the other hand, in implicit ligand sampling, there is
little guarantee that large energy barriers will be sampled accurately, because the ligand’s influence
on the protein is absent from the simulation, and this results in overestimated energy barriers. However, when a properly conducted
umbrella sampling analysis is compared to an implicit ligand sampling analysis and the
latter yields a lower final free energy, then the implicit ligand sampling result is almost certainly more correct
given the much larger number of independent conformations sampled per point in space versus an
umbrella sampling approach. When the implicit ligand approach yields a higher free energy (with
a large error), then it is possible that it did not sample the right protein conformations, and the
umbrella sampling may be more representative, as could be the case for the Bossa et al. results.
One must be aware, however, that the two methods do not measure the same quantity. The implicit
ligand sampling measures the PMF at every point, whereas umbrella sampling measures the PMF
of an area of space delimited by the area explored by the ligand during the simulation, projected
onto a pre-defined reaction coordinate.
While Xe has no relevant biological function, it is frequently used in X-ray crystallography
as a probe to identify the locations of cavities which may be involved in gas ligand migration.
Furthermore, it has been observed in mammalian Mbs, that the amino acids forming the Xe
binding sites are much more conserved than other amino acids [57]. For this reason, PMF profiles
for Xe are relevant because they provide an interpretation for Mb structures obtained under high
Xe pressure conditions. Since Xe interacts strongly with Mb and is also very large, its behavior
differs from that of small diatomic gases. In our PMF profiles, this translates simply into lower
binding energies for Xe in the Mb cavity sites and higher barriers between these cavities as well
as to the external solution, as compared to small diatomic gases. Very important, however, is the
observation that the location of Xe binding sites correlates very well with the regions of the protein
that are most attractive to O2, NO, and CO. In this respect, the Xe binding sites observed in X-ray
crystals do, in fact, truly indicate docking regions for diatomic gas molecules. Gas ligands do not,
however, solely diffuse in proteins by means of cavities accessible to Xe, and the presence of such
cavities does not automatically imply that diatomic gases must transit through them, nor does
their absence indicate that a favorable pathway for gas ligands does not exist. Xe cavities merely
indicate the regions in which there is a high probability of finding gas molecules, and more often
than not, these cavities will reside along the pathways taken by gas ligands to reach the heme.
Xe’s large size and strong interaction with the protein imply that, of all the ligands that we
have examined, the Xe PMF is the least accurate. However, the excellent match between predicted
and observed Xe binding sites for Mb (see Fig. 3.2 and Table 3.1) gives legitimacy to our Xe PMF
curves. It must be noted, though, that while the fact that we observe large energy barriers for Xe in
Fig. 3.5a-c is to be believed, the actual maximum height of these barriers is inevitably overestimated
by a significant amount in our calculations (for reasons detailed in the Methods section).
We have seen that the PMF profiles of various ligands inside Mb are in qualitative agreement
with Mb’s function. It remains to be seen whether this agreement is coincidental, or whether Mb’s
structure and dynamics are finely tuned by evolution to provide ideal energy profiles for different
ligands. A full study on the general properties of O2, NO, CO, and Xe migration in many different
proteins needs to be performed before this question can be accurately resolved.
3.3.5 Gas ligand pathways across species
The atomic structure of Mb has been solved for different animal species, and in order to compare
these, we have computed implicit ligand PMF maps for sperm whale (PDB accession code 1DUK),
pig (1MWD), horse (1AZI), Asian elephant (1EMY), yellowfin tuna (1MYT), and sea hare (1MBA) based
on 4.6–5.0 ns equilibrium simulations of the above systems. The implicit O2 PMF maps for sperm
whale, pig, tuna, and sea hare Mbs are compared in Fig. 3.6.
The similarities between our calculated implicit ligand PMF maps for the various Mbs reflect
the evolutionary distance between species. Fig. 3.6a highlights the strong similarities between the
location of the O2 migration pathways inside pig and sperm whale Mbs. The implicit ligand PMF
maps for horse and the Asian elephant (not shown) demonstrate the same degree of resemblance to
sperm whale Mb as is exhibited by pig Mb. As the evolutionary distance between species increases,
the migration pathways look more and more different, as we show for the cases of yellowfin tuna
Figure 3.6: Comparison of the implicit ligand PMF maps in Mbs of different species. The implicit ligand PMF map of O2 for (a) pig, (b) yellowfin tuna, and (c) sea hare Mbs (red) are compared with that for the sperm whale Mb (blue). The iso-surfaces are drawn using a PMF value of 1.8 kcal/mol (points on these contours have an error of +0.2/-0.4 kcal/mol). The sperm whale Mb’s heme with the connected proximal histidine is shown along with the protein’s external surface (black). This figure was created by Anton Arkhipov.
(fish) Mb (see Fig. 3.6b) and sea hare (mollusk) Mb (see Fig. 3.6c), the latter being the least similar
to whale Mb in terms of migration pathways.
Despite the obvious differences, the O2 PMF maps for the Mbs of the various species share some
common features. First, all three Mbs shown in Fig. 3.6 appear to be quite “open” to O2, in that
they all display many regions in their interior that are favorable to O2. This contrasts with what
is seen in the example of CpI hydrogenase, which only allows O2 in a very limited region of its
interior [32]. Second, all three PMF maps feature a pronounced distal cavity (to the right of the
heme in Fig. 3.6) which is connected to the Mb exterior by a short pathway (out of the page towards
the reader in the figure). In all three cases, the Xe binding sites of sperm whale Mb correspond
to favorable cavities (the residues lining the Xe binding sites and the distal cavity are, in fact,
more conserved than other residues across mammalian species [57]). Finally, all three Mbs exhibit
potential exits from the binding pocket other than through the distal pathway, which suggests that
gas ligands can enter and leave Mb’s interior in many ways for all Mbs.
3.4 Discussion
We have described and applied a method to compute the PMF (which is related to the probabil-
ity of occupation) for the passive migration of small gas ligands inside Mb using a perturbative
framework. Our results are important for two reasons. First, they provide a complete and direct
determination of all the gas pathways in Mb. This complete picture of gas pathways can be used to
determine which residues are involved in gas transport without resorting to per-residue mutations.
They also provide a clear interpretation of experimental geminate recombination results, which
otherwise involve guesswork and/or numerous years of careful follow-up experiments in order to
be understood correctly. The fact that our observations are direct and detailed means that they
have strong predictive power over the effect of residue mutations as well as over the locations of
gas pathways and Xe binding sites in any other protein of known structure, irrespective of whether
that protein is suitable to be studied by traditional experimental methods such as the monitoring
of gas migration events after flash-photolysis. Secondly, they demonstrate unequivocally that the
short-timescale random thermal motion of the protein matrix and its environment alone defines
reproducible and well-defined gas transport pathways inside proteins. In our model, the protein’s
thermal fluctuations are calculated explicitly without resorting to any assumption besides those
inherent in the CHARMM molecular dynamics force-field, which was parametrized to empirically
reproduce short timescale thermal fluctuations, and thus is particularly valid for the present appli-
cation.
The implicit ligand sampling method produces results that have very low errors when the PMF
values are low (high-probability regions), and large errors when the PMF is very large (inaccessible
regions), making it very suitable for the detection of gas migration pathways inside proteins, and to
a lesser but still significant extent, for the measurement of all free energy barrier heights along these
pathways. The approach works because gas ligands, being small and apolar, interact very weakly
with the protein, and thus, do not promote significant conformational changes in the protein.
Because of this, there is a strong overlap between the distribution of protein states in the lone
protein and protein with ligand ensemble, and the former can thus be used to calculate properties
of the latter. Although there is always an amount of uncertainty arising from molecular dynamics
simulation, due to short timescale sampling and empirical force-field models, we believe that our
specific analysis presents a convincing case despite these caveats.
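The overlap argument above is the usual justification for a free-energy perturbation estimate. As a sketch in generic notation (the symbols ΔE, r and the subscript 0 are assumptions made here, and the exact expression used in this work, including any averaging over ligand orientations, is not reproduced in this section), the PMF of placing a ligand at position r can be written as an average over the ligand-free protein ensemble:

```latex
G(\mathbf{r}) = -k_B T \,\ln \left\langle \exp\!\left(-\frac{\Delta E(\mathbf{r})}{k_B T}\right) \right\rangle_{0}
```

Here the angle brackets with subscript 0 denote an average over protein conformations sampled without the ligand, and ΔE(r) is the ligand-protein interaction energy for a ligand placed at r. The average is dominated by conformations with low ΔE, which are plentiful in accessible regions and rare at high barriers; this is in line with the small errors at low PMF and the large errors at high PMF noted above.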
On the biological side, our results have important ramifications regarding the general mechanism
by which gas ligands are transported inside the protein matrix. Numerous hypotheses have been
brought forth over the years to describe gas transport inside proteins. The first studies assumed
that gas ligands diffused through small permanent channels [156]. Other studies have suggested
that gas ligands enter proteins directly, as if the protein were simply a more viscous medium [26]. The
currently emerging view for many proteins is that, rather than diffusing along permanent channels,
gas ligands can migrate through bulky regions of the protein, guided by the proteins’ internal
thermal motion [17, 23, 27, 32, 47]. Our results suggest that this is the case, and furthermore
that the pathways taken by the gas ligands are not randomly distributed in the protein, but that
they are, in fact, located in well defined regions that can be identified by examining the protein’s
thermal fluctuations. The simple fact that we detected pathways that match known data implies
that a small ligand can diffuse in and out of Mb solely due to the protein’s thermal fluctuations at the
nanosecond time scale, even though the timescale of the total ligand migration can be much longer.
Cavities inside the protein matrix, such as Mb’s xenon binding sites, appear to play a prominent
role in accommodating gas ligands inside Mb. Interestingly, such cavities are sometimes barely
present in other proteins that still exhibit thermally-defined gas ligand pathways that stretch over
long distances, such as in CpI hydrogenase [32]. Cavities would appear to create favorable docking
sites for the ligand, but are not necessary to account for the ligand’s mobility as it migrates inside
the protein matrix. Our analysis suggests that cavities could perhaps also play a role in the gas
ligand selectivity of the protein.
Finally, we wish to mention other systems, besides Mb, where the study of the migration of
small gas ligands inside the protein matrix is important. Oxygen sensitivity is a highly relevant
issue for hydrogenases, enzymes that produce or break down hydrogen gas. Their sulfur-metal active
sites can usually also bind O2. Recent developments aim at harnessing the hydrogen-producing
power of hydrogenases for biotechnological purposes, but for this to be practical, the sensitivity of
hydrogenases to O2 must be repressed. Buhrke et al. [25] have found that the [NiFe]-hydrogenase of
Ralstonia eutropha H16, which is usually resistant to O2, can be made sensitive to O2 by a mutation
of residues located along a putative channel leading to the active site. This study suggests that
the protein matrix of this hydrogenase may play an important role in regulating the access of its
active site to O2 (along with the O2 sensitivity being regulated by its affinity to the active site
and its environment). Another example involves O2 migration inside cytochrome c oxidase from
R. sphaeroides. It was shown [139] that a single point-mutation inside the protein is enough to
block O2 access. There are many examples of proteins which use small gas ligands as a substrate
or ligand and in many cases, the gas ligand must reach a buried region of the protein. The above
examples demonstrate the relevance of studying gas ligand migration inside proteins and underscore
the importance of being able to identify gas migration pathways that are not readily visible in the
protein’s static structure.
Chapter 4
Effects of protein architecture and sequence on gas migration pathways.
While networks of O2 pathways have already been characterized for a small number of proteins,
the general properties and locations of these pathways have not been compared across different
proteins. In this study, maps for the O2 pathways inside twelve different monomeric globins have
been computed. It is found, despite the conserved tertiary structure fold of the studied globins,
that the shape and topology of the O2 pathway networks exhibit a surprisingly large variability
between different globins, except when two globins are very closely related. The locations of the
O2 pathways are, however, found to be correlated with a protein’s local residue composition, and
the same correlation is observed for two independent sets of protein families: monomeric globins
and copper-containing amine oxidases. These results have implications for protein-engineering
applications involving modifications of gas pathways in proteins. (This chapter is based on work
published in Cohen and Schulten [35].)
4.1 Introduction
For many classes of proteins, enzymatic reaction with, or binding to, gas molecules is an essential
component of their function. Such proteins often bind gas molecules by means of buried active sites
consisting of metal ions or metal-containing compounds. Gas molecules such as O2 must make their
way across the protein’s interior to reach these active sites. In the majority of cases, permanent gas
channels can neither be detected nor are found present in the protein’s static structure; instead,
the migrating gas molecules take advantage of transient cavities that occur inside the protein due
to thermal fluctuations [27]. By monitoring the occurrence of these transient cavities as they
occur over time, recent methodological advances such as volumetric gas accessibility maps [32] and
implicit ligand sampling [31] have made it possible to comprehensively map and describe networks
of gas migration pathways inside proteins.
For a large number of proteins that interact with O2, finding the location of O2 migration
pathways has important implications. For example, locating the O2 pathways in oxygenases and
O2-consuming oxidases provides important clues regarding their enzymatic activity and operating
mechanism. Also, the elucidation of the O2 pathways in hydrogenases is helping current efforts
aiming to block O2 access to the hydrogenase active sites, which would, in turn, make these
proteins useful for commercial hydrogen gas production [15, 65, 103]. To this date, complete
maps of O2 pathway networks have only been computed for a small number of proteins, including
CpI hydrogenase [32], sperm-whale myoglobin (Mb) [31], and AQP1 aquaporin [167]. As more
proteins are visualized in terms of their O2 migration pathways, one will gain a better grasp of how
gases are transported inside proteins, discern patterns of how such pathways are conserved across
protein families, and develop rules of thumb for quickly identifying these pathways. In this chapter,
we address the question of O2 pathway conservation within a given protein fold by computing and
comparing maps of the O2 pathway networks across a range of proteins from the globin superfamily.
Globins are a large and ancient family of proteins for which all members, with few exceptions,
share an exceptionally well-conserved tertiary structure: the globin fold. At the heart of this
fold lies a prosthetic group – the heme – which is universally used by globins to reversibly bind
to, and temporarily hold, O2 and other gas ligands. The present investigation focuses on three
globin subgroups: monomeric hemoglobins (Hbs), which transport O2 throughout entire organisms,
Mbs, which store and transport O2 within muscle cells, and leghemoglobins (Lbs), which store,
transport, and/or scavenge O2 to maintain a population of symbiotic bacteroids in the root nodules
of symbiotic plants [7].
In addition to binding O2, many globins have other physiological functions. For example, while
the role of Mb has been long-considered to be well-established [22, 56, 57, 169], recent studies
are showing that Mb is also involved in secondary roles [60, 169] such as the scavenging and
inactivation of nitric oxide (NO) [54] and a weak peroxidase activity [166]. Many invertebrate Hbs
also possess a number of interesting characteristics, such as the ability to react with sulfide [168],
and the ability to tune their affinities to O2 depending on their environment. By forming multimeric
assemblies [122, 137, 168], many Hb monomers can bind gases such as O2 and CO2 cooperatively
by making their affinity for O2 depend on the O2-occupancy of the neighboring monomers, either
through quaternary conformational changes and/or through multimeric association/dissociation.
It is clear that despite the well-studied nature of the globin family, there remains a large number
of globin properties relating to their structure that are still poorly understood, and in many cases,
not even known.
In this chapter, we focus on the O2-transporting role of the globin protein matrix. A globin’s
main function is performed by its heme which binds gas ligands for extended periods of time.
One could consequently regard the conserved protein fold surrounding the heme as merely a shell.
Nevertheless, this “shell” provides important functionality. First, the protein shell protects the
heme from oxidizing into an inactive ferric state, which would happen if the heme were to float
freely in solution. Second, the protein component strongly modulates the environment of the
heme-bound ligand and thus influences its binding affinity, its binding rates, and the globin’s relative
selectivity for various ligands. Finally, the protein matrix provides cavities and pathways for gas
ligands to travel from the exterior solution to the heme and vice versa. Since all globins have an
identical heme, the protein shell is what determines a globin’s specific role and properties. When
one considers the large variety of globin behaviors and properties, the importance of the protein
component becomes obvious.
A large body of work, both experimental and theoretical, has focused on finding gas ligand
pathways inside globins, particularly sperm whale Mb and certain Hbs. The main tools for such
studies are x-ray crystallography in the presence of xenon [156], x-ray crystallography of inter-
mediate states [24], time-resolved x-ray crystallography [20, 143, 144, 164, 165], spectroscopy of
the geminate recombination process [10, 41, 67, 115, 120, 133, 145, 146], and molecular dynamics
simulation [18, 28, 47, 77, 117]. For the experimental work in particular, the effects of many point
mutations on O2 transport rates inside globins have been investigated in detail. In almost all cases,
however, only overall rates of O2 association/dissociation are accessible, and, only occasionally,
pathways are mapped in very localized and restricted regions of the protein. Here, we inspect
the entire set of pathways inside a broad set of monomeric penta-coordinated globins for which
structural data is available.
4.2 Methods
Simulations were run for a selected set of monomeric globins of known structure, taken from the
Protein Databank (PDB). In every case, a deoxy form of the globin was created from the PDB
coordinates, and any water or gas ligand present in the distal pocket (DP) was removed. For the cases in which
only the ferric state of the globin is available (whether unbound or bound to small compounds),
the coordinates of the ferric heme were used as starting points, but the hemes were modeled using
parameters for the ferrous state.
For every globin, the simulation system was built from the PDB coordinates by adding hydrogen
atoms, binding the globin’s proximal histidine to the heme (here, the heme is penta-coordinated in
all cases), and by picking an appropriate titration state for every histidine based on its immediate
environment. The Dowser water-placement program [171] was then used to internally solvate the
globin atomic structures, though rarely resulting in the placement of additional water molecules.
The heme–protein complex was then solvated using a water box whose sides exceed those of the
protein by at least 20 Å in all dimensions. 50 mM of NaCl was then added, adjusting the relative
concentrations of Na+ and Cl− to make the whole system chargeless.
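The ion bookkeeping implied by this step can be sketched as follows. This is a minimal illustration under stated assumptions: the salt concentration is computed from the full box volume (ignoring the volume excluded by the protein), neutrality is reached by adding extra counter-ions of a single species, and the function name `ion_counts`, the box size, and the solute charge are hypothetical.

```python
AVOGADRO = 6.02214076e23       # particles per mole
LITERS_PER_A3 = 1.0e-27        # one cubic angstrom expressed in liters

def ion_counts(box_volume_a3, solute_charge, salt_molar=0.050):
    """Return (n_sodium, n_chloride) for a neutral box at the target salt level."""
    # number of NaCl pairs that gives the requested bulk concentration
    n_pairs = round(salt_molar * AVOGADRO * box_volume_a3 * LITERS_PER_A3)
    n_na, n_cl = n_pairs, n_pairs
    # add counter-ions of the opposite sign to cancel the solute charge
    if solute_charge < 0:
        n_na += -solute_charge
    else:
        n_cl += solute_charge
    return n_na, n_cl

# Hypothetical example: an 80 x 80 x 80 angstrom box and a solute charge of -2 e.
n_na, n_cl = ion_counts(80.0 ** 3, -2)
```

At 50 mM, a box of this size holds only about fifteen ion pairs, so a net protein charge of a few elementary charges shifts the Na+/Cl− ratio noticeably.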
The equilibration protocol used two pre-equilibration stages: an initial 30 ps simulation stage
had the protein and heme fixed, and allowed the solvent to relax at constant temperature (300 K)
and pressure (1 atm); a 50 ps stage then allowed both protein side chains and solvent to equilibrate
while constraining the protein backbone. The entire protein-solvent systems were then equilibrated
for another 950 ps. Finally, an additional 10 ns of simulation at the same NPT conditions was
performed for analysis. The 10 ns simulations were processed using the implicit ligand sampling
method [31] included in the VMD visualization program [78], resulting in a 3D potential of mean
force (PMF) map of the complete network of O2 migration pathways for each globin.
All simulations were performed using the molecular dynamics program NAMD [127] in combi-
nation with the NAMD-G simulation automation engine [69]. Simulation parameters were taken
from the CHARMM22 force-field [99]. Particle Mesh Ewald, with a resolution of at least 1 Å, was
used everywhere for long-range electrostatics. Langevin dynamics and a Langevin piston were used
to maintain constant temperature and pressure, respectively. Finally, integration timesteps of 1, 2,
and 4 fs, respectively, were used for bonded, non-bonded and long-range electrostatics interactions.
globin                               species                      PDB code

Myoglobins
  sperm whale Mb                     Physeter catodon             1A6M, 1A6N [163]
  sperm whale Mb (YQR mutant)        Physeter catodon             1MYZ [20]
  horse heart Mb                     Equus caballus               1WLA [102]
  sea hare Mb                        Aplysia limacina             1MBA [16]

Invertebrate hemoglobins
  pig roundworm Hb domain I          Ascaris suum                 1ASH [170]
  trematode Hb                       Paramphistomum epiclitum     1H97 [124]
  marine bloodworm Hb component III  Glycera dibranchiata         1JF3 [121]
  midge HbIII                        Chironomus thummi thummi     1ECO [150]
  clam HbI                           Lucina pectinata             1FLP [132]

Leghemoglobins
  yellow lupin Lb II                 Lupinus luteus               1GDJ [73]
  soybean Lb A                       Glycine max                  1BIN [72]
Table 4.1: List of penta-coordinated monomeric globins investigated in this study.
4.3 Results
4.3.1 O2 pathways in monomeric globins
While the O2 pathways and cavities in Mbs (particularly sperm whale Mb) have been studied
extensively, those in most other globins remain unknown. We have investigated the networks of
O2 pathways in a broad set of penta-coordinated monomeric globins, listed in Table 4.1, for which
atomic coordinates are available. For each protein, we calculated the O2 PMF map according to
the protocol described in the Methods section. This provided the locations and energy barriers of
the complete network of O2 pathways inside every simulated globin.
Myoglobins. We have computed the O2 PMFs for sperm whale, horse heart, and sea hare Mb.
In the case of sperm whale Mb, we also looked at two additional variants: deoxy-Mb, in which the
distal pocket (DP) contains a water molecule, and the sperm whale (L29Y, H64Q, T67R) “YQR-
Mb” mutant, designed to mimic the slower association/dissociation rates observed in the Ascaris
nematode Hb [5].
The PMF maps for sperm whale oxy-Mb, deoxy-Mb, and YQR-Mb were computed in part
to test the reproducibility of the implicit ligand sampling approach, and in part to observe the
magnitude of the changes caused by the presence of water in the DP and by point mutations. The
O2 PMF maps for sperm whale oxy-Mb and deoxy-Mb matched particularly well: every favorable
O2 holding region (red, in Fig. 4.1a,b) and the O2 pathway interconnections (blue, in Fig. 4.1a,b)
exhibit an excellent correspondence between the two maps, both in shape and in size, as they
should. The YQR-Mb mutant, also, exhibits strong similarities with the other sperm whale Mb,
except for the shape of the DP, as expected, which is where the three “YQR” point mutations are
located. The variation in shape of the O2 pathways near the Xe1 binding site in YQR-Mb is due
to the presence of a crystal water molecule at that location which is not present in the other Mbs.
Also, a reduction in the size of the pathways far away from the heme (bottom of Mb in Fig. 4.1c)
appears to be due to statistical variations in the presence of water molecules inside all Mbs near
those locations over the course of the simulations. All in all, excluding the effect of trapped water
molecules inside the proteins, the maps for all three sperm whale Mbs were the most similar to each
other of all globin maps, and the details of these maps were remarkably well-reproduced between
each other (as well as with the independently-computed CO PMF map for sperm whale computed
in Cohen et al. [31]). These results lend further credibility to the reproducibility of the implicit
ligand sampling approach for mapping gas migration pathways.
When comparing Mb O2 PMF maps across species, we again see a good agreement between
sperm whale Mb and horse Mb as we did between the various sperm whale Mbs, reflecting the fact
that these globins are all, to a practical extent, almost the same protein. The comparison becomes
interesting, however, when one looks at the O2 PMF for sea hare Mb (Fig. 4.1d). Despite the
very strong similarities in both function and structure between sperm whale and sea hare Mb, the
location of O2 pathways is, surprisingly, very different for these two Mbs.
The mapped O2 pathways provide new insights into the behavior of Mbs. A closer examination
of the YQR-Mb O2 PMF maps reveals that its DP is much more unfavorable to O2 than the DP of
sperm whale oxy-Mb, namely, by approximately 3 kcal/mol. Paradoxically, the shape and energy
features of the DP do not resemble those of the Ascaris roundworm Hb, which served as a template
for YQR-Mb. According to the O2 maps, YQR-Mb’s low association/dissociation constants are
due to a DP which is unfavorable to O2, resulting in a lower probability of O2 occupation in the
DP and lower chance of binding to the heme, rather than having higher energy barriers to reach
the DP, given that the latter is not observed here.
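The link between a PMF difference and the occupation probability invoked here is a Boltzmann factor. The short sketch below evaluates it for the 3 kcal/mol difference quoted above at the simulation temperature of 300 K; the function name `relative_occupancy` is an illustrative assumption.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def relative_occupancy(delta_g, temperature=300.0):
    """Occupancy of a site relative to a reference site that is delta_g lower in free energy."""
    return math.exp(-delta_g / (KB * temperature))

# A distal pocket that is 3 kcal/mol less favorable to O2 than the reference pocket.
ratio = relative_occupancy(3.0)
```

A 3 kcal/mol penalty corresponds to an equilibrium O2 occupancy that is lower by roughly two orders of magnitude, all else being equal.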
Figure 4.1: O2 PMF maps for various monomeric globins. Shown are the 0 kcal/mol (red) and 1.6 kcal/mol (blue) O2 free energy contours, along with the four sperm whale Mb xenon binding sites as green spheres. The globins are: sperm whale (a) oxyMb, (b) deoxyMb and (c) YQR mutant Mb, (d) horse and (e) sea hare Mbs, (f) soy and (i) lupin Lbs, (g) roundworm, (h) trematode, (j) bloodworm, (k) clam and (l) midge Hbs. The Xe binding sites of sperm whale Mb are shown as green spheres and the proteins’ α-helices are displayed as black lines.
In particular, we find it interesting that the minima of the 3D energy maps occur at the sperm
whale Mb’s DP, both in the presence (deoxy-Mb) and absence (oxy-Mb) of a water molecule inside
it, implying that water does not prevent O2 from reaching the DP. The deoxy-Mb DP is, however,
measured to be less favorable to O2 by 3 kcal/mol. Surprisingly, we did not observe any opening of
the distal channel (defined as the pathways going through the swinging His64 “gate”) during any
of the 10 ns simulations of sperm whale Mb (which we extended to 25 ns for the case of oxy-Mb) or
of horse Mb, resulting in the absence of this pathway in the maps presented in this chapter. The
distal pathway was observed in a previous PMF map of Mb [31]; however, the initial structure of Mb
used in that study contained crystal deformations that may have contributed to this discrepancy
in the distal pathway behavior. Other computational studies have, however, reported observing
the spontaneous swinging of the distal histidine “gate” in Mb at 10–100 ns timescales [17].
Invertebrate monomeric hemoglobins. When we extend the comparison to include the O2
pathways for various monomeric invertebrate Hbs, we note a surprising observation. Fig. 4.1
illustrates the O2 pathways for the 12 simulated globins. Aside from a prominent DP, the various
monomeric globins exhibit O2 pathway and cavity locations which are completely different from one
Hb to another. These variations are significant and reproducible and cannot at all be attributed
to errors in the evaluation of the PMF. Since the globin fold is well conserved amongst the studied
globins, and their tertiary structure is near-identical (see Fig. 4.2), our results suggest that the
locations of O2 pathways (which in general are the same as those for CO and NO [31]) are not
determined by the protein’s secondary and tertiary structures, which are conserved.
A second notable observation arising from the O2 PMF maps for the invertebrate monomeric
Hbs is the large number of exits and pathways present in each globin. Our results suggest that
multiple exits and an overall porousness to O2 might be the norm in globins. This is surprising
and contrasts with the common assumption of a single O2 entryway in many kinetic models of
O2 migration in Mb [143, 145], as well as with the O2 PMF maps of the other proteins for which
O2 pathways have been mapped: CpI hydrogenase, which was found to be largely impermeable to
O2, except along two well-defined and very localized pathways [32], and AQP1 aquaporin [167] for
which O2 pathways are only found at the interface of the protein’s constituent monomers.
Figure 4.2: The structures of 10 monomeric globins are aligned and superimposed, demonstrating the very strong conservation of their secondary structure globin fold. The structures are sperm whale (blue), horse (green) and sea hare (cyan) Mbs, soy (black) and lupin (white) Lbs, roundworm (yellow), trematode (red) and bloodworm (orange) Hbs, clam (pink), and midge (purple) Hbs. The Xe binding sites of sperm whale Mb are shown as green spheres and the proteins’ α-helices are displayed as black lines.
Leghemoglobins. We have studied two Lbs, from lupin and soy, both exhibiting very similar O2
PMF maps (see Fig. 4.3b). The O2 PMF maps for both Lbs show them to be mostly inaccessible
to O2, except for a very short and direct exit between the DP and the external solution. The main
exit seen in both lupin and soy Lbs (exit, in lupin Lb: Ala37, Leu43, His106, Val109, and in soy
Lb: Pro38, Leu43, Gln101, Val104), is the same as the one reported in lupin Lb by Czerminski and
Elber [40] using locally-enhanced sampling simulation. Despite the presence of a distal histidine in
Lb at the same location as the gating histidine in Mbs, the distal pathway is not present at all in
the Lb O2 PMF maps (the distal pathway was in fact observed here for every Mb, even though it
was in some cases seen to be blocked by a closed histidine “gate”).
When compared with the other monomeric globins from Fig. 4.1, the fact that both Lbs have
a single, and conserved, dominant exit next to the heme is an unusual feature. The only possible
other exit, according to the O2 PMF maps, is a much less probable secondary exit that is still
located right next to the primary exit. This is in stark contrast to all the other globins in this
study, which are all very porous to O2 and all possess multiple exits for O2. Both soybean and
Figure 4.3: Comparisons of the 1.6 kcal/mol O2 PMF surfaces for similar monomeric globins. The globins are: (a) sperm whale Mb (blue) vs. horse heart Mb (red), and (b) lupin (blue) and soy Lb (red).
lupin Lbs must transport O2 to symbiotic Rhizobium bacteroids inside root nodules in the plant,
while simultaneously ensuring that as little as possible of this O2 reaches the Rhizobia’s nitrogenase
enzymes, which are essential to the plant host, and which are intolerant to O2. By having a unique
exit, the possibility is raised that this exit could be blocked during transport and/or that Lb could
deliver O2 to a precise target while ensuring minimal O2 leakage. Our results do not prove such a
conclusion, nevertheless, they do show that Lbs have a peculiar O2 pathway arrangement which is
compatible with the possibility of their role in sequestering O2 in a way that is not realized by any
other globin examined in this study.
4.3.2 Specific residues promoting O2 pathways
From the comparison of the various globin O2 pathway maps performed in this study, it is evident
that O2 pathways are not determined by a protein’s tertiary structure; instead, O2 pathways are
found to correlate with a protein’s residue composition. While there is no guarantee that specific
residue types have any individual effect on the location of O2 pathways in the protein, we find
that certain residues, on average, are more likely than others to be found near O2 pathways. By
collecting statistical information regarding the predisposition of residue types to form or not form
O2 pathways, one can guide future efforts to manipulate O2 pathways inside proteins [25, 63, 139].
Fig. 4.4 shows the proportion of residues that lie next to an O2 pathway, sorted by residue
type. The analysis was done for two different sets of globular proteins: a set of nine monomeric
globins, and a set of three different copper-containing amine oxidases from Hansenula polymor-
pha [83], Pichia pastoris (PDB: 1N9E) [43] and Arthrobacter globiformis (PDB: 1W6G). “Core”
residues (Fig. 4.4a–c) are distinguished from “surface” residues (Fig. 4.4d–f), based on whether the
residues’ side chains (or entire residue for the cases of Gly and Ala) are in contact with the external
solution. The proportion of total protein residues which also happen to line the O2 pathways was
computed, for each type of residue, by counting the number of residues for which any atom of
their side chains (including hydrogen atoms) in the crystal structure was located within 2.5 Å of a
gridpoint of the O2 PMF map for which the PMF is lower than a given threshold (taken as -2, 0
and 2 kcal/mol in this study). As can be seen, the propensity of given residue types to be near an
O2 pocket is loosely correlated with its hydrophobicity. Large flexible hydrophobic residues and
those possessing aromatic rings are the most often seen near O2 pathways, indicating that the large
size and mobility of these residues most likely promote the formation of cavities, rather than fill
them up.
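The counting procedure described above can be sketched with a brute-force distance test. This is an illustration only: the function name `pathway_propensity`, the input layout (one array of side-chain coordinates per residue), and the toy coordinates are assumptions, and a real analysis would read the atoms from the crystal structure and the grid from the computed PMF map.

```python
import numpy as np
from collections import defaultdict

def pathway_propensity(residues, grid_xyz, grid_pmf, threshold, cutoff=2.5):
    """Fraction of residues of each type lying next to a low-PMF region.

    residues: iterable of (residue_type, side_chain_xyz) pairs, where
              side_chain_xyz is an (n_atoms, 3) array in angstroms.
    grid_xyz: (n_points, 3) array of PMF grid point coordinates in angstroms.
    grid_pmf: (n_points,) array of PMF values in kcal/mol.
    """
    # keep only the grid points whose PMF is below the chosen threshold
    favorable = np.asarray(grid_xyz, float)[np.asarray(grid_pmf, float) < threshold]
    near = defaultdict(int)
    total = defaultdict(int)
    for res_type, xyz in residues:
        total[res_type] += 1
        if favorable.shape[0] == 0:
            continue
        # all atom-to-gridpoint distances for this residue
        diff = np.asarray(xyz, float)[:, None, :] - favorable[None, :, :]
        if (np.linalg.norm(diff, axis=2) <= cutoff).any():
            near[res_type] += 1
    return {t: near[t] / total[t] for t in total}

# Toy input (hypothetical coordinates): two grid points and three residues.
grid_xyz = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
grid_pmf = [-2.5, 3.0]
residues = [
    ("LEU", [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
    ("LEU", [[20.0, 0.0, 0.0]]),
    ("ASP", [[10.0, 1.0, 0.0]]),
]
at_zero = pathway_propensity(residues, grid_xyz, grid_pmf, threshold=0.0)
at_four = pathway_propensity(residues, grid_xyz, grid_pmf, threshold=4.0)
```

Running the same function once per threshold and once per residue class (core or surface) would yield the kind of per-residue-type proportions plotted in Fig. 4.4.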
The copper-containing amine oxidases studied here exhibit fewer O2 cavities and pathways, on
average, than the more loosely packed and smaller monomeric globins. Nevertheless, there is
a very high level of correlation between the two protein families in the relative propensity of
different residue types to be near an O2 pathway (see Fig. 4.4). For example, very favorable
holding regions (PMF < -2 kcal/mol) for O2 inside the protein are predominantly (and in similar
proportions) lined with Trp, Ile, Leu, Phe and Met residues in both protein families (Fig. 4.4c).
Especially surprising is the correlation between O2 favorable areas outside the protein and
surface residue types. Fig. 4.4e shows that surface residues have nearly identical effects on O2
binding sites in the exterior of the protein for both globins and copper-containing amine oxidases,
suggesting that the affinity to O2 for both a protein’s interior and surface can be tuned by means
of point mutations. Fig. 4.4d shows the surface residues which are near O2 regions having an
affinity of better than 2 kcal/mol (close to the solvation energy of O2 in water). If any residue
type is not at 100% in this graph, this means that it is statistically repelling O2. For both globins
and copper-containing amine oxidases, Asp is an outlier in Fig. 4.4d, meaning that it repels O2 in
Figure 4.4: Percentage of residues of a given type whose side chains (or entire residue for the case of Gly and Ala) are located within 2.5 Å of a region of the implicit ligand O2 PMF map where the PMF is less than (a,d) 2 kcal/mol, (b,e) 0 kcal/mol, or (c,f) -2 kcal/mol. The data was collected separately over a set of nine monomeric globins (excluding whale deoxy-Mb, YQR-Mb, and soy Lb, which are redundant), shown in black, and a set of three copper amine oxidases, shown in gray. Residues are treated separately based on whether they (a–c) comprise the hydrophobic core of the protein or (d–f) are in contact with the external solvent. The area of each data point is proportional to the number of total residues of a given type and location (core/surface) found across the set of proteins used for that analysis.
solution. Interestingly, it was also found in studies of gas permeation in aquaporin [76, 167], that
there exists a distinct and unexplained barrier to O2 permeation at a specific location inside an
otherwise uniform water channel. At that location, this water channel is surrounded by four Asp
residues. The natural tendency, observed here, of Asp to repel O2 on the surface of proteins would
explain this result.
4.3.3 Significance of the O2 pathway networks
There has been, over the course of many decades, a large body of work dedicated to understanding
and characterizing the O2 migration kinetics inside a small number of representative globins. Our
results clearly show that the shapes of the networks of O2 pathways inside globins vary greatly from
one globin to the next, and that whatever conclusion can be experimentally drawn for one specific
globin is likely to be only applicable to that specific globin. One might even wonder if the actual
location of pathways and cavities inside a globin bears much relevance to its function. As long as
the global properties of the O2 pathways match the desired function of the protein, there may well
be no incentive for the protein to conserve or even tune O2 pathway locations, especially given that
most globins appear to possess a large number of these pathways. To the present authors, it is not
clear that O2 pathways are critically important. A discussion of the role of O2 pathways should
therefore start with what they are not.
O2 pathway networks do not affect O2 binding affinities. A differentiating property of individual
globins is their affinity to gas ligands such as O2. Maintaining fine-tuned affinities to O2 is crucial
to organism function. For example, most Hbs undergo conformational changes to vary their O2
affinity through cooperative binding: they require high affinities near O2 sources such as the
lungs along with a decreased affinity in low O2 environments. Similarly, vertebrate Mb, as well as
secondary Hbs in invertebrates, require a relatively high O2 affinity in order to uptake O2 from
the primary Hb carrier. Globin O2 affinity, however, cannot be a property of the O2 pathway
network. Instead, it can only depend on the free energy of O2 binding at the heme, which is almost
exclusively influenced by interaction with the residues located in the DP, as evidenced by the high
sensitivity of globin-gas affinities to mutations of DP residues [97, 118, 149]. The pathways taken
by O2 to reach the DP are themselves not expected to bear an influence on globin affinity to O2.
The properties of pathways in the protein matrix could, in theory, affect the O2 on/off rates
between the heme and the exterior. Given a fixed O2 affinity (which is related to the ratio of
the on and off rates), altering the energy barriers along, and shapes of, the O2 pathways could
slow or hasten O2’s migration speeds. While most globins bind to O2 for short times (with a 1–
100 ms half-life), the Hbs of the Paramphistomum trematode and Ascaris roundworm both display
exceptionally long O2 binding half-lives (21 s and 175 s, respectively) [168]. Despite the fact that
both the roundworm and trematode Hbs exhibit the two highest free energy barriers for O2 exit of all
the globins studied here, the barriers that we measure are too low to explain by themselves the 4–5
orders of magnitude difference in O2 binding times for these proteins compared to that of the average
globin. Assuming an Arrhenius process, such a difference in binding times would require an energy
barrier of at least 9 kBT, which would have been readily measured by the implicit ligand sampling
calculation. Both roundworm and trematode rely mainly on exceptionally strong O2-binding at
the heme to accomplish their long binding times, and not on the O2 pathway network. In practice,
the effect of the O2 pathway shapes and locations thus appears to be of minor importance relative
to the effect of the bound ligand’s environment at the heme, even for extreme cases.
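The barrier estimate in this paragraph follows from inverting the Arrhenius relation: if two escape times differ only through a barrier term exp(dE/kBT), then dE/kBT equals the natural logarithm of their ratio. The sketch below evaluates this for ratios of four and five orders of magnitude; the function name `barrier_in_kt` and the assumption of equal prefactors are illustrative.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def barrier_in_kt(time_ratio):
    """Extra barrier, in units of kBT, needed to lengthen a residence time by time_ratio."""
    return math.log(time_ratio)

low = barrier_in_kt(1e4)    # four orders of magnitude
high = barrier_in_kt(1e5)   # five orders of magnitude

# The same barriers in kcal/mol at 300 K.
low_kcal = low * KB * 300.0
high_kcal = high * KB * 300.0
```

Four orders of magnitude give slightly more than 9 kBT and five give about 11.5 kBT, consistent with the lower bound quoted in the text.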
The different pathways in globins may however have important roles not yet appreciated or
understood. For one, these pathways may have possible enzymatic functions such as kinetic proof-
reading as a means to increase the selectivity of Mb binding to various gases as suggested by
Radding and Phillips [130], or as a means to promote the NO to NO3− reaction catalyzed by some
oxyHbs and oxyMbs [23]. Most obviously, such pathways would provide many ways for O2 to enter
and escape globins, and many entrances in the globin surface would also increase the capture (and
release) rates of gas molecules by globins. Ever since the hypothesis by Perutz and Matthews [123]
that O2 enters Mb and some Hbs through a conserved swinging histidine gate, this pathway has
often been considered as the dominant pathway for O2 to enter these globins. This hypothesis has
remained popular because the His gate is and has been the only visible O2 pathway in static crystal
structures of Mbs/Hbs. However, when thermal fluctuations and protein dynamics are accounted
for, numerous other possible O2 pathways are revealed, and it is likely that the His gate is just one
pathway amongst many [31, 47, 75]. In all likelihood, the swinging of the His gate (as opposed to
the direct interaction of the gating His with the bound ligand in the DP) is not a critical factor in
the regulation of O2 entry or exit from the protein. We furthermore postulate that the conserved
His gate controls a water channel, which could allow, e.g., for the escape of the NO3− product
from Mb. At this point, the roles of such pathways in Mb and other globins, including the
histidine gate, are still partially speculative, but in light of new developments in the localization of
the O2 pathways in globins, it is now appropriate to carefully reconsider the assumptions made in
the past.
4.4 Conclusion
While recent years have seen accelerating progress, first in identifying major gas-holding
packing defects [23, 156], and later in producing complete maps of O2 pathways inside many proteins [6,
17, 31, 32, 77, 109], a more fundamental understanding of how these pathways arise in general has
been lacking. In the present study, we have characterized the network of O2 pathways inside a large
range of proteins from the globin superfamily. Although our results are reproducible
and similar for very closely related proteins, we find a complete lack of conservation of the location,
topology, or sizes of the O2 pathway networks from one individual monomeric globin to the next,
despite the similar folds of the proteins. On one hand, this suggests that the specific details
and locations of O2 pathways in proteins do not matter much for the protein’s function, as long
as the pathways are present and provide adequate transport to gas molecules across the protein
matrix. On the other hand, this implies that while these pathways are rather independent of the
protein’s tertiary structural features, they may actually be dependent on the specific composition
of residues inside the protein. This hypothesis was tested and it was found that the propensity
of certain residues to be adjacent or not to O2 pathways is well-reproduced across two protein
families: monomeric globins and copper-containing amine oxidases. Such results can be used
to plan gas migration pathway-altering mutations inside proteins, by substituting residues which
have a predisposition to create O2 favorable regions with those that do not, and vice versa. The
correlation between residue types and O2 access is clear. Whether these correlations can be used
directly to plan or predict the effect of point mutations on O2 accessibility inside proteins, and the
effect of blocking O2 pathways on gas migration rates, remains to be tested.
Chapter 5
Conclusion and outlook
The development of methods for describing and understanding gas migration pathways has many
immediate applications beyond hydrogenase. Many important families of proteins such as oxy-
genases, oxidases, and globins, for example, must interact with O2 and/or other gas ligands to
perform their function. The ability to map O2 pathways in these proteins is of great interest in
understanding how they function.
For example, implicit gas ligand sampling has been applied to the problem of gas conduction
across the tetrameric AQP1 aquaporin water channel. A number of experimental studies [37, 112,
128] have reported that the expression or addition of AQP1 in reconstituted lipid membranes and
in the CO2-impermeable Xenopus oocytes (a huge unicellular egg cell) resulted in a measurable
increase in the membranes’ permeabilities to CO2. Figure 5.1a displays the match between the
implicit ligand sampling map and the positions of O2 collected from a 26 ns simulation of explicit
O2 diffusing across aquaporin, performed by Wang et al. [167]. Because the explicit O2 simulation
started with 100 O2 molecules placed in the solution, regions
inside the protein which are favorable to O2 but which are protected from the external solution
by large energy barriers are not sampled sufficiently by the explicit O2 simulation, but are easily
discovered using implicit ligand sampling. Both Figs. 5.1a and 5.1b show the location of the three
most favorable gas pathways across the aquaporin, and Fig. 5.1c shows the computed effective free
energy barriers for O2 passage along each of these three pathways, as compared to a pure POPE
lipid bilayer. It should be noted, however, that POPE lipid bilayers are relatively permeable to
O2/CO2, whereas the experimental studies cited above used less-permeable membranes. Despite
being longer, the explicit O2 simulations did not collect enough statistics to provide the potential
of mean force along all three of these pathways, unlike the implicit ligand sampling analysis. And
even though the results from the implicit ligand sampling analysis generally mirrored the results
found from the explicit O2 simulations [76, 167], the very favorable “side” pathways were not even
detected using any of the explicit O2 simulations because they are open only very ephemerally,
even though they have the overall lowest energy barriers to O2 permeation.
The two O2 studies have both pointed to the aquaporin “central pore” as a very likely channel
for the passage of CO2 and O2. Where the implicit ligand method proved to be the most useful,
however, was in identifying the exact nature of the main barrier along this pore. A previous computational study by
Hub and de Groot [76] had found this same barrier and had mistakenly attributed it to a narrow
hydrophobic constriction at the entrance of the central pore. Implicit ligand sampling revealed
that this was not the case: the barrier was located in the middle of a wide water channel above
this constriction, in a region where four aspartic acid residues (Asp 50) meet (see Fig. 5.2a). The
specific reason why the wide water channel blocks O2 passage at the precise location of Asp 50
is not well understood, since the water molecules are just as mobile there as anywhere else. It was
found, however, that the averaged density of water at that specific point was slightly higher than
that of the bulk water, as shown in Fig. 5.2b. It is probably not a coincidence that aspartic acid,
responsible for the O2 barrier here, was also identified as the residue type most likely to repel O2
in water, according to Fig. 4.4d (Chapter 4). In fact, various mutations of this aspartic acid residue
across the four monomers resulted in a dramatic decrease in the central pore barrier, as measured
using implicit ligand sampling (Yi Wang, private communication).
Further work using implicit ligand sampling on other systems has also been performed and
published. One study aimed to find how O2 makes its way to the catalytic site of a copper
amine oxidase from Hansenula polymorpha, using an approach combining implicit ligand sampling
with the x-ray determination of the crystal structure in the presence of xenon [83]. Another set of
studies, both experimental and theoretical, is investigating the effect of hydrogenase mutations
on its O2 accessibility [63, 64, 87]. More studies are currently in the works.
As with all discoveries, it is exciting to see how new methodologies and new ways of looking
at a problem lead to yet new discoveries. The study of gas migration inside proteins has certainly
always been relevant to the function of many classes of proteins, but until now, has been very
limited in its scope due to a lack of theoretical and experimental tools and of basic knowledge to
address the problem effectively. The present thesis provides many of these tools and much of this
Figure 5.1: (a) Top view of the AQP1 tetramer, displaying the locations of the three potential O2/CO2 channels, which are (A) the central pore, (B) the water pores, and (C) the side pores. The 0 kcal/mol O2 PMF surface is displayed in yellow for all three cases. (b) The implicit ligand 0.6 kcal/mol O2 PMF isosurfaces (mesh), superimposed on top of the explicit O2 from a 26 ns simulation, for comparison. Both datasets were symmetrized over the four identical AQP1 subunits. (c) The O2 PMFs for O2 migration along all three potential channels, and through a POPE bilayer, projected along the z-axis. All PMFs assume that AQP1 is maximally packed, with 1 AQP1 per 50 nm² of bilayer. The upper bound errors on the PMF are +0.25 kcal/mol, and the lower bound errors are -0.25, -0.6, and -1.7 kcal/mol respectively for PMF values below 4, 6, and 8 kcal/mol. This figure was generated from material contained in [167], and (c) was created by Yi Wang.
Figure 5.2: (a) The 0 kcal/mol (yellow) and 3 kcal/mol (mesh) isosurfaces are displayed atop the cumulative positions of explicit O2 molecules (red lines) from a 26 ns simulation. An arrow points to the location of a large energy barrier surrounded by four Asp 50 residues. This barrier corresponds to a region of denser water than average. (b) The regions of space which are occupied by water more than 75% of the time are highlighted in blue. Such a region is found precisely at the location of the barrier. The profile of the protein around the central pore is sketched using a simplified outline. This figure was taken from [167], and (b) was created by Yi Wang.
knowledge by outlining a likely mechanism for gas permeation inside proteins along with a means to
detect and alter the gas migration pathways. Long-standing questions regarding gas migration in
proteins, especially those concerning how O2 and CO enter proteins, can now finally be addressed
with confidence. But because the field is so wide and so new, relatively little is still known about
the general properties of gas migration pathways in proteins, and a lot is yet to be learned. It is
hoped that the work presented in this thesis will be of invaluable help to all those who are currently
pursuing related research.
Appendix A
Mechanism of anionic conduction across ClC chloride channels
Up until now, we have investigated methods for measuring free energy profiles for the special case
of weakly-interacting ligands migrating inside proteins. In more typical scenarios, the ligands of
interest are either bulky or charged, and interact strongly with the protein. When this is the case,
the ligands cannot be treated implicitly as has been done in earlier chapters, and other free-energy
sampling techniques must be used. The ClC chloride transporter is a good example of such a
system.
ClC chloride transporters are voltage-gated transmembrane proteins which have been associ-
ated with a wide range of regulatory roles in vertebrates. To accomplish their function, they allow
small inorganic anions to efficiently pass through, while excluding passage to all other particles.
Understanding the conduction mechanism of ClC has been the subject of many experimental inves-
tigations, but until now, the detailed dynamic mechanism was not known despite the availability
of crystallographic structures. We investigate Cl− conduction by means of an all-atom molecular
dynamics simulation of the ClC transporters in a membrane environment. Based on our simulation
results, we propose a “king of the hill” mechanism for permeation, in which a lone ion bound to
the center of the ClC pore is pushed out by a second ion which enters the pore and takes its place.
While the energy required to extract the single central ion from the pore is enormous, by resorting
to this two-ion process, the largest free energy barrier for conduction is reduced to 4 kcal/mol.
At the narrowest part of the pore, residues Tyr 445 and Ser 107 stabilize the central ion. There,
the bound ion blocks the pore, disrupting the formation of a continuous water file that could leak
protons and possibly preventing the passage of uncharged solutes. (This chapter is based on work
published in Cohen and Schulten [34], Tajkhorshid et al. [153].)
A.1 Introduction
ClC chloride transporters were discovered by C. Miller in 1982 while investigating the Torpedo ray
electroplax membrane [105]. Since then, various members of the ClC family have been isolated in a
wide variety of organisms, ranging from animals and plants to yeast and almost all bacteria except
for a few species with small genomes. All ClCs have in common a selectivity for small inorganic
anions (e.g., Cl−, NO3−, Br−, I−, SCN−, and some larger hydrophobic anions), though they tend to
discriminate rather poorly between these different anions. Despite their poor inter-anion selectivity,
ClCs are called “chloride-channels” and “chloride-transporters” because Cl− is the only inorganic
anion with a significant presence at physiological conditions.
Many roles for ClCs have been identified in higher organisms: they play vital cellular functions,
such as the regulation of blood pressure, of cell volume, of organelle pH, and of membrane excitabil-
ity [48, 80, 101, 161]. In prokaryotes, however, the ClCs’ various roles are only now emerging from
obscurity. First, Iyer et al. found that ClC is essential for E. coli to survive extreme acid shock [79].
Then, in an unexpected discovery, Accardi et al. concluded that the E. coli ClC was not
a passive channel, as had been assumed for all ClCs, but in fact behaved like an active Cl−–H+
antiporter [1, 2]. This finding is particularly interesting because most eukaryotic ClC homologues
are known to be passive channels, and the E. coli ClC has conduction rates (<0.2 pS) that
are intermediate between typical rates observed in channels and those observed in transporters,
prompting the suggestion that the evolutionary distance between channel and transporter proteins
is less than previously thought.
While most electrophysiological measurements have been performed on eukaryotic ClC homo-
logues [49, 50], such as the Torpedo ray ClC-0 and the human homologues ClC-1 and ClC-2, there
are believed to be many similarities between the core structures of prokaryotic and eukaryotic
ClCs [44, 106]. All ClCs are believed to share a “double-barrelled” architecture, in which each
protein consists of two identical monomers, each monomer consisting of two heterogeneous but
structurally-similar segments arranged in an anti-parallel fashion, and each monomer containing
its own independent water-filled pore [44, 105, 108] (Fig. A.1a). ClCs share a very high sequence
similarity for the selectivity filter: the central region of the pore that has been found to coordinate
the permeating Cl− ions according to x-ray crystallographic structures [44, 50]. On the other hand,
bacterial and eukaryotic ClCs differ considerably in size. Bacterial ClCs are much smaller (typi-
cally, 395-492 residues), whereas eukaryotic homologues are longer (687-988 residues), with most of
the extra residues lying in cytoplasmic dangling ends or in the periplasmic regions responsible for
regulation and gating functions. And while most, if not all, eukaryotic ClCs are voltage-gated in
some way or another [49, 129], it is still unclear whether the recently characterized bacterial
E. coli ClC possesses a voltage-gating mechanism at all.
Figure A.1: (a) View of the ClC dimer showing the broken helix architecture and the position of the Cl− ions in the crystal structure. Each monomer and pair of ions is displayed in a different color. (b) Vertical cross-section of the solvent-accessible surface of the ClC protein embedded in a lipid bilayer. The simulated model comprises 97,000 atoms. In the narrowest part of the protein, where the Cl− ions permeate, the residues that define the selectivity filter are shown.
Although individual macroscopic properties of ion channels have been studied extensively [74],
until recently, very little was known with certainty about their inner workings. The discovery of the
KcsA potassium channel structure [42] provided the first high-resolution structure of a channel that
was specifically selective for small ions. This discovery sparked a round of fruitful computational
experiments [9, 12, 13, 110, 147] that revisited long-standing assumptions about ion permeation
through ion channels and transporters.
A second generation of ion channel and transporter structures later emerged,
detailing the atomic structures of a bacterial ClC chloride transporter (at the time believed to be
a channel) in a closed [44] and in a constitutively open form [45], and of the calcium-gated MthK [81]
and voltage-gated KvAP [82] potassium channels. These new structures came as much needed
data points in the ion channel structure landscape, and provided the opportunity to build a new
framework of simulations and studies on which to base revised microscopic theories of ion
channels [86, 116, 136]. The discovery of the ClC structures made possible the computational study of
the ClC conduction mechanism [19, 38, 39, 107]. The present chapter details a computational study
of a bacterial ClC transporter that reveals the main energy landscape regulating the conduction of
Cl−.
A.2 Methods
A.2.1 Simulation system
Our simulations were based on a published x-ray structure of ClC from Salmonella serovar typhimurium
(stClC) at 3.0 Å resolution (PDB accession code: 1KPL) [44]. The protein was then
placed in a POPE membrane and solvated with water. 24 Cl− and 2 Na+ ions were placed at ran-
dom positions in the water in order to neutralize the protein charge with a total ion concentration
of 100 mM. The resulting 97,000 atom system was then equilibrated for 5 ns in the NPT ensemble,
with a 2 fs integration time step, periodic boundary conditions and PME electrostatics, using the
NAMD2 molecular dynamics software [84] and the CHARMM22 force-fields [99, 141] for the en-
ergy parameters for lipids, protein and ions, and the TIP3P model for water molecules. During the
equilibration, the membrane relaxed to a state of hydrophobic match in which it became thinner
near the protein than away from it. The Cl− ions redistributed themselves near the very charged
cytoplasmic side of the transporter. Top and side views of the final system after equilibration are
shown in Fig. A.2.
Figure A.2: Top and side view of the ClC simulation system showing the POPE membrane and ions. The reconstructed N-terminus is highlighted in black and lipid tails have been simplified.
A.2.2 Opening the pore
Each pore of the wild type ClC crystal structures is obstructed by a highly conserved glutamic
acid residue (Glu 148), which is bent so that its negatively charged carboxyl head is bound to a
region of the pore next to the periplasmic exit. In order to investigate the effect of this residue,
we ran interactive molecular dynamics (IMD) simulations [70, 151] in which Cl− ions were pulled
past Glu 148 to ascertain that the pore was indeed blocked and that the passage of Cl− across the
channel was not possible without displacing the glutamate side chain. While we can be certain that
the opening of the fast gate in ClC involves a change to the conformation of Glu 148, we cannot
exclude the possibility of other conformational changes. Recent indirect evidence suggests that the
open ClC-0 channel exhibits additional changes at its cytoplasmic mouth, unaccounted for in the
present study [3, 158]. Nevertheless, we set out to modify the conformation of Glu 148 on the basis
that this alteration was necessary and sufficient for unblocking ClC. Accordingly, we created an
open conformation of the ClC pore by using IMD to pull Glu 148’s carboxyl group out of the pore
and into the channel’s periplasmic vestibule.
Recent electrophysiological measurements performed on the Torpedo ray ClC-0 show that both
the substitution of the pore-blocking glutamate (Glu 166 in ClC-0) with small non-charged residues
(E166G, E166A, E166V or E166Q) and the protonation of this glutamate, strongly reduced the
voltage dependence of the fast gate, allowing the pore to remain open for a wider range of condi-
tions [45, 58]. Similar mutation studies reached identical conclusions for ClC-4 (E224A) and ClC-5
(E211A) [58]. In addition, crystal structures of E. coli ClC (electro-physiological measurements
are not currently possible on the native protein) mutants (E148A, E148Q) showed them to be
virtually identical to the wild type structures except that the mutant pores were unobstructed by
the Glu 148 side chain. Further supporting the validity of our assumptions about the open pore
conformation and of the “molecular surgery” we performed, is the finding that in the E148Q mu-
tant, the neutrally charged glutamine residue, which has the same atomic geometry as glutamate,
has its side chain sticking out of the pore [45], suggesting that Glu 148 might do the same under
favorable conditions.
A superposition of the structures of the closed ClC transporter (PDB: 1KPL), of our manually
opened transporter after equilibration, and of the E. coli constitutively open E148Q mutant
(PDB: 1OTU) [45] is shown in Fig. A.3. Since the beginning of our study predates the publication
of the constitutively open pore crystal structures, these later structures were not used as a starting
point; however, our modified “open” structure matches these new crystal structures closely.
It must be said that there is currently no certainty about the exact nature of the ClC fast gate.
The mechanism suggested by Dutzler et al. [45] is not without problems: for ClC-0, we note the
conflicting studies mentioned above [3, 158]; we also note that this mechanism does not account
for the measured fast gate charge of 0.92–2.2 e for ClC [45, 53, 98]; finally we note that it fails
to explain the reported effect of external Cl− ions on the fast gate [30, 129]. Nevertheless, the
Figure A.3: Superposition of the structures for the 1KPL closed transporter (dark gray), of the 1OTU constitutively open channel E148Q mutant (orange) and of our manually opened transporter after equilibration (atom-based coloring).
mentioned studies, complemented by Lin and Chen [95], are all either indicative of, or compatible
with, putative conformational changes at or outside the cytoplasmic mouth of the transporter. In
the present study, we seek to shed some light of our own on these matters. We demonstrate that
anionic conduction is possible and probable across the pore of bacterial ClC as observed in current
crystal structures, once the pore-blocking glutamate has been displaced. Furthermore, we find that
there exists ample space for the permeating Cl− ions to move in the pore. In fact, it is probable
that an open conformation of ClC that would exhibit a notably wider internal pore could lead to a
dramatic loss in selectivity and is as such unlikely. It is a credible possibility, and compatible with
the experimental data so far amassed, that the internal pore structure of ClC is not significantly
affected by the opening and closing of the fast gate.
A.2.3 Molecular alterations
The crystal structure (PDB: 1KPL) is missing an entire N-terminal segment for one of the two
monomers (chain A). We have partially modeled this segment (residues 12 to 32) by duplicating
it from the other monomer (chain B) and splicing it into chain A. This reconstituted segment,
highlighted in black in Fig. A.2, dangles underneath the monomer opposite to the one to which it
belongs and binds with that monomer’s C-terminus, possibly contributing to the dimer’s stability.
At the end of the 5 ns equilibration, the two ClC pores, which are devoid of crystallized water
molecules in the published crystal structure, did not acquire any water molecules from the bulk
solution. Neither did any of the crystal structure Cl− ions budge from their binding site at the
center of each ClC pore. To remedy this situation, we placed water molecules in single file across
each pore. We also placed an additional Cl− at Glu 148’s binding site in each pore. This new
configuration was then equilibrated for 0.5 ns and used as a starting point for all further simulations.
A.2.4 Mapping the potential of mean force
In order to reconstruct the potential of mean force (PMF) of permeation through ClC, we have
employed umbrella sampling [71, 135]. In our study, the PMF describes the overall free energy
profile experienced by two simultaneous Cl− ions traversing either of the ClC’s two pores in the
absence of an external electric field, as a function of their respective positions along the pore (z-
axis), averaged over all other degrees of freedom. In each of the sampling simulations, two Cl−
ions per ClC pore were separately tethered by means of virtual 1-D springs acting along the z-axis
with spring constants of 15 kcal/mol/Å². This was done for both pores simultaneously, which are
located so far apart (40 Å) that the correlation between the configurations of the ions in each pore
is negligible. The tethering points for the umbrella potentials were never distributed more than 1 Å
apart for any ion, requiring 92 simulations of 370 ps each to sample the range of motion of the ions
during their conduction through the pore. Starting configurations for the different simulations were
created by translating the permeating ions to the z-coordinate minima of the tethering (umbrella)
potentials. The energy of the ions and water molecules in the pore was then minimized (keeping
all the other atoms fixed) so that the ions could reposition themselves laterally in the pore. The
ions were then repositioned to the correct initial z-coordinate and the pore water molecules were
equilibrated for 10 ps (keeping everything else fixed), after which the system was equilibrated for
an extra 70 ps. This procedure ensured that the sudden artificial displacement of the ions in the
pore, at the beginning of each simulation, did not unnecessarily perturb the protein. The spatial
distributions of the ions along the z-axis for each simulation were then combined using the weighted
histogram analysis method [91] applied in 2-D in order to obtain the full two-ion distribution of
Cl−, which was then inverted to obtain the PMF.
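The way umbrella windows are merged and inverted into a PMF can be illustrated with a minimal one-dimensional version of the weighted histogram analysis method. This is only a sketch under stated assumptions, not the code used in this study: the analysis described above is two-dimensional (one coordinate per ion), whereas the example treats a single coordinate, and the function name `wham_1d`, its arguments, and the value KT = 0.596 kcal/mol (roughly 300 K) are choices made here for illustration.

```python
import math

KT = 0.596  # kcal/mol; assumed thermal energy near 300 K

def wham_1d(samples, centers, k_spring, z_min, z_max, n_bins, n_iter=2000):
    """Merge biased 1-D umbrella windows into a single PMF in kcal/mol."""
    width = (z_max - z_min) / n_bins
    mids = [z_min + (b + 0.5) * width for b in range(n_bins)]
    pooled = [0] * n_bins          # histogram counts summed over all windows
    totals = []                    # in-range sample count of each window
    for window in samples:
        kept = 0
        for z in window:
            b = int((z - z_min) // width)
            if 0 <= b < n_bins:
                pooled[b] += 1
                kept += 1
        totals.append(kept)
    # Boltzmann factor of each window's harmonic tether at each bin center.
    boltz = [[math.exp(-0.5 * k_spring * (m - c) ** 2 / KT) for m in mids]
             for c in centers]
    f = [0.0] * len(centers)       # per-window free-energy offsets
    for _ in range(n_iter):
        # Unbiased probability estimate given the current offsets.
        weight = [n * math.exp(fi / KT) for n, fi in zip(totals, f)]
        prob = [pooled[b] / sum(w * row[b] for w, row in zip(weight, boltz))
                for b in range(n_bins)]
        # Offsets recomputed from that estimate (self-consistency step).
        f = [-KT * math.log(sum(p * x for p, x in zip(prob, row)))
             for row in boltz]
        f = [fi - f[0] for fi in f]    # fix the arbitrary additive constant
    # Boltzmann inversion of the unbiased distribution gives the PMF.
    pmf = [-KT * math.log(p) if p > 0 else math.inf for p in prob]
    lowest = min(pmf)
    return mids, [v - lowest for v in pmf]
```

Here `samples` holds one list of z positions per window, `centers` the tethering points, and `k_spring` the spring constant (15 kcal/mol/Å² in the simulations above). The fixed iteration count stands in for a proper convergence test, and the bins must lie within the range covered by the windows.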
As is often the case with atomistic MD simulations, the studied events occur at natural time
scales which cannot be directly probed using available computer power. Taking the example of
ClC-0, a typical pore current of 0.5 pA [30] corresponds to one elementary charge exiting the pore
every 200–300 ns. This includes the time for an ion to diffuse to the pore’s mouth as well as the
time it takes for an ion (not necessarily the same one) to subsequently exit on the other side. Since
we are only interested in the “permeation” phase, the conservative 300 ns time scale is a high upper
bound estimate of the time needed to observe one conduction event. Nevertheless, even assuming
the shortest event times, an equilibrium simulation would only provide a tiny number of complete
conduction events, if any at all. Umbrella sampling allows us to sample high energy states with a
much higher probability than is possible with equilibrium trajectories. With sufficient sampling,
this methodology permits us to accurately determine the energy barriers as well as the physical
pathway taken by the ions as they permeate through the pore. The accelerated sampling does
come with restrictions: non-equilibrium dynamic trajectories cannot be directly observed. And
while we provide a detailed statistical description of permeation across the ClC pore, the limitation
of our method is that we cannot directly resolve the concerted motions of individual residues, water
molecules and permeant ions as they occur during a conduction event.
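The conversion between a single-pore current and the mean interval between conducted charges is a one-line calculation. The sketch below assumes that each permeating ion carries exactly one elementary charge and uses the SI value of e; the function name is illustrative.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs

def mean_time_per_charge(current_amperes):
    """Average time in seconds between elementary charges crossing the pore."""
    return ELEMENTARY_CHARGE / current_amperes

interval = mean_time_per_charge(0.5e-12)  # 0.5 pA, the ClC-0 current quoted above
print(f"{interval * 1e9:.0f} ns per elementary charge")
```

For 0.5 pA this gives roughly 320 ns per charge, far longer than the individual sampling simulations.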
A.2.5 Computing a slow process in little time
Our goal is to characterize the way in which Cl− passes through the ClC transporter under ex-
tremely favorable conditions: open gates, and no proton-coupling to slow the dynamics. For this,
the energy profile of Cl− ions going through the selectivity filter will be measured. When one
wishes to understand a long timescale process using computer simulation, MD trajectories of the
atomic motions, by themselves, are of limited interest. In the present case, we know that a Cl− ion
will conduct across E. coli ClC, in a favorable electrochemical gradient, within roughly one mil-
lisecond [1]. This timescale is out of the reach of contemporary atomic-level simulations running
on the best currently available hardware. But even if we could simulate the complete translocation
of a Cl− ion across one of ClC’s pores, the computed trajectory would represent only a single event
(or a handful at most). As we know, nature allows many paths between two end states, all with
different probabilities. Computing just one of these paths leaves us with no information about its
statistical relevance and prevents us from reaching any meaningful conclusion about the studied
mechanism.
Fortunately, one does not require a full description of the atomic trajectories in order to under-
stand the translocation process. What is really needed is the energy profile – the energy mountains
and valleys – experienced by anions as they conduct through ClC. A complete description of the
free energy profile would provide us with all the statistical information about the ion translocation
process, since the free energy for a given ion state (i.e. the set of positions along the transporter
occupied by ions at a given instant) can be interpreted as a measure of the probability of occurrence
of that ion state. Now, in order to compute the energy profile for a conduction event, we will in
fact need to carry out calculations of simulation trajectories. There are, however, two superfluous
aspects of real-time simulation that can be taken advantage of in order to allow the calculation
of the relevant energy profile using minimal computational requirements: (1) not all degrees of
freedom are important to the problem at hand and (2) not all parts of a trajectory are sampled
equally by equilibrium simulations. In fact, most slow processes spend the overwhelming majority
of the time in a small favorable region of phase space. If the system’s time evolution can be biased
such that states with high energy are sampled as often as the more favorable low energy states, we
could calculate the energy profile of the system projected along a chosen reaction coordinate much
faster than it would take to perform a full equilibrium simulation.
The potential of mean force (PMF) is the desired quantity that describes the overall free energy
profile experienced by the system as it evolves along one or more reaction coordinates, averaged
over all other degrees of freedom. In our case, this means that the PMF will describe an energy as
a function of the position of ions in the transporter, averaged over all possible conformations that
the ClC transporter itself can take in order to accommodate the permeant ions.
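In symbols, and using notation introduced here purely for illustration (W for the PMF, ρ for the equilibrium two-ion distribution, ẑ1 and ẑ2 for the instantaneous positions of the two ions along the pore axis, and C for an arbitrary constant), the standard relation between the PMF and the distribution of the chosen coordinates reads:

```latex
W(z_1, z_2) = -k_B T \,\ln \rho(z_1, z_2) + C,
\qquad
\rho(z_1, z_2) = \left\langle \delta\!\left(z_1 - \hat{z}_1\right)\,
                 \delta\!\left(z_2 - \hat{z}_2\right) \right\rangle
```

The angle brackets denote the equilibrium average over all remaining degrees of freedom, including the conformations of the transporter.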
A.3 Results
A.3.1 Energetics of Selectivity and Conduction
In this section, we describe the energetics involved in ion conduction. First, we introduce a detailed
calculation of the free energy profile that governs the coordinated conduction of two simultaneous
Cl− ions in the pore in the absence of an external field. This reveals the respective motions of the
two ions as they permeate through the pore, and explains the anionic binding sites in ClC. We
then analyze the interaction energies between the individual ions and the major constituents of the
ClC transporter, providing evidence that confirms some common assumptions and dispels others.
Potential of mean force
Our map of the potential is shown for each pore (chains A and B) in Fig. A.4. The maps describe
the energetics involved in the transition between an initial state I and a final state II. In state I, a
first Cl− is bound to the transporter’s central binding site (determined from the crystal structure)
and a second Cl− is positioned in the transporter’s cytoplasmic entrance; in state II, the first
Cl− is positioned in the periplasmic exit and the second Cl− is bound in the central binding site.
This process effectively describes a conduction event, since states I and II share identical pore
configurations except for the additional Cl−, which sits at the cytoplasmic entrance in state I and
at the periplasmic exit in state II. For a new translocation to occur, one simply has to wait for a third ion to
diffuse from the bulk solution into either of the pore’s entrances, after which the transition I→II or
II→I can repeat. In the figure, distances along the z-axis (which approximately follows the pore)
are measured with respect to the center of the lipid membrane, with positive values toward the
periplasm. For the naming of the ClC binding sites, we follow the nomenclature of Dutzler et al.
[45]: Scen is the central binding site coordinated primarily by Ser 107 and Tyr 445, Sint is at the
cytoplasmic end of the pore and Sext is the outer binding site on the other side, for which Cl−
competes with Glu 148. The location of the Cl− binding sites as well as the z-axis coordinates are
indicated for reference in Fig. A.5.
In order to test the consistency and validity of our results, we have measured the PMF for
conduction in both monomers of the ClC dimer. Apart from a reconstructed N-terminus (for
Figure A.4: Potential of mean force for both pores (chains A and B) of ClC as a function of the positions along the z-axis of the top and bottom Cl− ions. Contours represent slices of 1 kcal/mol at a spatial resolution of 0.15 Å. Red indicates a low energy while blue denotes high energy. The gray background denotes areas not sampled and the black contour represents energies above a threshold. The positions of the three Cl− binding sites from the x-ray structure of Dutzler et al. [45] are identified by straight lines. The minimum energy path, corresponding to the most likely conduction pathway, is shown for each pore as a thick black line.
Figure A.5: The ClC pore, showing the positions of the permeating Cl− ions (small balls), the backbone amide hydrogens involved in permeation (large spheres) and the related non-helical backbone (tube), the two pore-lining polar residues Tyr 445 and Ser 107 and the two charged residues Glu 148 and Arg 147 possibly involved in the fast gate. The locations of the three crystal binding sites as well as the z-axis coordinates are displayed for reference (with z = 0 indicating the middle of the lipid bilayer).
chain A of the 1KPL PDB structure), the two monomers have the same spatial structure. We
therefore expect that the variations between the PMFs calculated for the two different pores are
caused, for the most part, by the different distribution of microstates being sampled, rather than
by macroscopic conformational differences between the two pores. There was one exception, in
which a different behavior was observed between the two monomers: in chain A, when the two ions
were kept close together (≈ 4 Å apart) at one specific location, the top ion was temporarily pushed
sideways in the pore to maintain its distance from the bottom ion, disrupting the PMF for chain
A at that location (located at ztop ≈ −1.5 Å and zbottom ≈ −6 Å); this was not observed anywhere
else. In describing the PMF of ClC, we will only concern ourselves with features common to both
pores.
Looking at the two PMFs, we notice similar characteristics. First of all, the PMFs share a
similar two-ion pathway (shown as a black line) for ion translocation across the pore. From the
PMF maps, we can hypothesize a probable sequence of events describing ion permeation across
the transporter that proceeds in a semi-stepwise manner, as shown in Fig. A.6. Initially, a Cl−
(Cl1) is bound to Scen and a second Cl− (Cl2) enters the pore from the cytoplasm until it reaches
the general area of Sint. At this point, Cl2 stays put and repels Cl1 while the latter attempts to
overcome a barrier located between Scen and Sext. Once this barrier has been crossed, Cl2 can inch
closer to the central binding site to a location about 2.5 Å below it (which we call S−). Cl1 and
Cl2 then move simultaneously and gradually toward their intermediate destinations of 1.5 Å above
Sext (which we call S+) and Scen, respectively. With Cl2 now tightly bound to the crystal binding
site Scen, Cl1 is free to exit the pore into the periplasm.
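One way to extract such a pathway from a gridded two-ion PMF is sketched below. This is an illustrative assumption rather than the procedure used to produce Fig. A.4: the sketch treats the PMF as a 2D grid indexed by the two ion positions and uses a Dijkstra-like search to find the path whose highest energy is as low as possible. The function name, the grid layout, and the use of None for unsampled cells are hypothetical.

```python
import heapq


def min_barrier_path(grid, start, goal):
    """Find a path on a 2D energy grid that minimizes the highest energy visited.

    grid  : list of rows of energies (kcal/mol); None marks an unsampled cell
    start : (row, col) of the initial state, assumed to be a sampled cell
    goal  : (row, col) of the final state
    Moves are restricted to the four nearest neighbors.
    Returns (path, barrier) or (None, None) if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: grid[start[0]][start[1]]}  # lowest known barrier to each cell
    parent = {start: None}
    heap = [(best[start], start)]
    while heap:
        barrier, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if barrier > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                new_barrier = max(barrier, grid[nr][nc])
                if new_barrier < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_barrier
                    parent[(nr, nc)] = cell
                    heapq.heappush(heap, (new_barrier, (nr, nc)))
    if goal not in parent:
        return None, None
    path, cell = [], goal
    while cell is not None:  # walk back from goal to start
        path.append(cell)
        cell = parent[cell]
    return path[::-1], best[goal]
```

Minimizing the highest point along the path is only one possible definition of a minimum energy path; minimizing the summed energy is another, and the two can give different routes on a rugged surface.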
For both pores, the second Cl− enters or exits while the first Cl− is tightly bound to Scen.
We refer to this as a “king of the hill” mechanism, in which an ion is ultimately always
left to dominate the central region of the pore. Also, in both cases, the PMF exhibits minima in
regions that correspond to two of the crystal structure binding sites being simultaneously occupied
(located at the intersections of the straight lines in Fig. A.4): Sint and Scen can be occupied
simultaneously and so can Sext and Sint. On the other hand, the simultaneous occupation of
Sext and Scen is energetically unfavorable compared to the other possibilities (the appearance of a
minimum of the potential of mean force at that location for chain A corresponds to the top ion
Figure A.6: Sequence of steps that occur during a conduction event in which an ion is moved from the cytoplasmic side to the periplasmic side of the transporter. The schematic locations of the crystal structure binding sites (defined in the text) are shown as lines.
being pushed to the side, as described earlier). Simultaneous occupation of these two sites had been
speculated by Dutzler et al. [45], despite the fact that the sites are only 4 Å apart, on the basis of
the observation that in some crystal structures, the authors observe simultaneous occupation of a
Cl− at Scen and of a charged oxygen from Glu 148’s carboxylic head group at Sext. Instead, we
observe two nearby stable intermediate states which involve alternate binding locations for either
ion: one where Sext and S− are occupied, and one where Scen and S+ are occupied. It comes as no
surprise that the location of S+ coincides perfectly with the location of Glu 148’s other carboxylic
oxygen, according to the crystal structures in which this residue is present. Occupancy of S+ and
S− has not been observed to date by x-ray diffraction.
Following the permeating Cl− ions along the most probable path, i.e., the path of minimum free
energy, we measure similar energy profiles for both pores. Fig. A.7 shows the PMF profile along
this path. Between the two pore entrances, the PMF profile is relatively flat (with 1–2 kcal/mol
fluctuations), permitting fast permeation. The main barrier in the profile occurs when a Cl−
moves between Sint and Scen, with a height of 3–4 kcal/mol (or 4–4.5 ± 2 kcal/mol between lowest
and highest energy in the pore). This barrier seems to be caused mainly by a lack of exposed
backbone amides for Cl− to interact with in that specific area of the pore. The overall energy
barriers are consistent with those determined by a simulation of the KcsA potassium channel [13],
in which a maximum energy barrier of 2–3 kcal/mol was measured for K+ permeation (with three
simultaneous ions in the pore).
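To put barrier heights of this size in perspective, the short calculation below evaluates the Boltzmann factor exp(−ΔG/kB T) for a given barrier. It is a rough sketch that assumes a temperature of 300 K, which is not stated in this section, and it ignores any kinetic prefactor, so it indicates relative crossing probabilities rather than rates. The function name is hypothetical.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_factor(barrier_kcal_per_mol, temperature=300.0):
    """Relative probability of reaching the top of a free energy barrier."""
    return math.exp(-barrier_kcal_per_mol / (KB * temperature))


if __name__ == "__main__":
    for barrier in (1.0, 2.0, 3.0, 4.0):
        print(f"{barrier:.1f} kcal/mol -> {boltzmann_factor(barrier):.2e}")
```

At 300 K, kB T is about 0.6 kcal/mol, so a 3–4 kcal/mol barrier corresponds to roughly 5 to 7 kB T.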
Figure A.7: PMF profiles along the minimum energy pathway for both pores (chains A and B) with the locations of the two permeating Cl− ions indicated for each local minimum. The curve follows the PMF along the black lines of Fig. A.4 and reports the minimum value of the PMF measured between this path and the two parallel paths representing the cases where the two ions are displaced toward and away from each other by 0.15 Å each. This protocol generates a fairly accurate description of the minimum energy pathway except for a small region (between 10 and 12 Å along the path) of chain A where an optimal solution could not be reliably found. The free energy is measured with respect to a base configuration in which one Cl− is bound to Scen and the other Cl− has been exchanged with a water molecule in bulk solution.
Permeation pathway
By stitching together all of our local sampling simulations along the most probable path, we obtain
a picture of the physical pathway taken by Cl− as it crosses the transporter. The trajectory of the
top-most ion as it was moved up and down the transporter joins with that of the bottom-most ion,
resulting in the continuous trajectory shown in Fig. A.5. While only a full trajectory capturing
entire Cl− permeation events can provide the true sequence of events during conduction, an analysis
of the interaction energies between the transporter and the ions, using the stitched trajectories,
can reveal information about the role of the pore residues during permeation.
Fig. A.8 shows the electrostatic and van der Waals interaction energies between the Cl− ions
and their environment as a function of their position along the pore. The energies were averaged
over all four permeating ions and no appreciable variation was observed between the curves for the
different ions and pores. The weakening of electrostatic interactions by the polarization of water
molecules in the pore was not taken into account. The interaction energy of the ions with their
surroundings can be decomposed into separate contributions from the various components of the
protein, giving insight into the roles of these components in tuning channel energetics.
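A decomposition of this kind can be sketched as follows. The snippet is a simplified illustration, not the analysis code used for Fig. A.8: it sums a vacuum Coulomb term and a Lennard-Jones term between one ion and every atom within a plain distance cutoff, and groups the result by a component label. The function name, the tuple layout, and the Lennard-Jones pair parameters eps and rmin are hypothetical; a production analysis would take charges and parameters from the force field and may smooth the cutoff.

```python
import math
from collections import defaultdict

COULOMB = 332.0636  # Coulomb constant in kcal Å/(mol e^2), no dielectric screening


def ion_interaction_by_group(ion_pos, ion_charge, atoms, cutoff=16.0):
    """Electrostatic and van der Waals energy of one ion with labeled atom groups.

    atoms : iterable of (group, (x, y, z), charge, eps, rmin), where eps (kcal/mol)
            and rmin (Å) are assumed ion-atom Lennard-Jones pair parameters
    Returns {group: (electrostatic, vdw)} in kcal/mol for atoms within `cutoff` Å.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for group, pos, charge, eps, rmin in atoms:
        r = math.dist(ion_pos, pos)
        if r == 0.0 or r > cutoff:
            continue  # skip the ion itself and atoms beyond the cutoff
        totals[group][0] += COULOMB * ion_charge * charge / r
        ratio6 = (rmin / r) ** 6
        totals[group][1] += eps * (ratio6 * ratio6 - 2.0 * ratio6)
    return {group: tuple(values) for group, values in totals.items()}
```

Calling the function once per trajectory frame and averaging the per-group values as a function of the ion's z coordinate gives curves comparable in spirit to those described here, with labels such as "pore" and "bulk protein" chosen by the user.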
In Fig. A.8a, the energy contribution of the pore-lining residues is compared with that from
the rest of the protein excluding the pore (referred to as the bulk protein). While the pore residues
dominate the interaction with Cl−, the bulk protein still contributes a significant fraction of the
attractive interaction with Cl−. The bulk protein’s purpose appears to be to provide a barrier-less
and energetically favorable background for negatively-charged particles present in the transporter’s
pore.
Fig. A.8b shows the interaction energies between Cl− and the pore’s backbone and non-polar
residues (including glycine) as well as with the pore’s four polar and charged residues, as a function
of position along the pore. We note that the backbone and non-polar residues account for virtually
the entirety of the total integrated interaction energy with the permeating Cl−. The backbone and
non-polar residues of the pore provide a flat basin of attraction for anions, while the polar and
charged residues modulate the Cl− energy’s position dependence. Indeed, aside from the central
binding pocket in which Cl− is coordinated by polar residues and the periplasmic exit in which
charged residues form a putative gate, the transporter pore is lined in its entirety with non-polar,
Figure A.8: Interaction energy of a permeating Cl− with the various constituents of (a) the ClC transporter and (b) the pore region, calculated using a cutoff distance of 16 Å for non-bonded interactions. The standard deviation for the energy is ± 5 kcal/mol for pore-lining residues, ± 15 kcal/mol for water and ± 2 kcal/mol for bulk protein.
non-charged residues.
It is intriguing to note that the pore’s two conserved polar residues are present at the same
location and define the pore’s strongest Cl− binding site (Scen). The role of these two residues,
Ser 107 and Tyr 445, is open to speculation, but it is clear that they are not by themselves respon-
sible for the ClC transporter’s anion over cation selectivity since their interaction energy with Cl−
is not significant compared to the energy due to the strong electrical polarization of the protein.
We believe that the most compelling reason for the existence of these residues is to keep an anion
permanently in the pore in order to prevent such events as the formation of a proton-carrying
continuous water file stretching across the transporter or the passage of hydrophobic anions [138].
Indeed, Ser 107 and Tyr 445 provide an abrupt and significant narrowing of the pore simultaneously
with a very strong binding site for anions.
A.3.2 ClC Architecture
Non-helical backbone and protein polarization
It has been echoed throughout the literature on ClC that the broken helix architecture stabilizes
Cl− through α helix electrostatic dipole interactions. While the α helix dipoles certainly contribute
to a favorable environment for Cl−, it is not certain that their role is fundamental in determining
the preferred Cl− binding sites [8]. The interaction energy between Cl− and helices F and N
(which both have their positive ends pointing toward Sext and have been credited for creating a
favorable binding location), excluding the interaction with the pore-lining helix-capping residues, is
shown in Fig. A.8a. This energy does not constitute a particularly prominent feature of the energy
profile controlling Cl− conduction and does not explain the transporter’s intricate broken-helix
architecture.
We believe that the broken-helix architecture stems from the need to expose the
backbone’s amide groups to the permeant ions. While the conventional picture of a membrane
protein is that of a bundle of parallel α helices, other structurally known transporter proteins also
exhibit a “broken helix” conformation. Notable examples are the potassium channels, the only
other ion channels of known structure, and the aquaporin family of water channels [59, 154]. In
these other channels, it is the protein’s backbone carbonyl groups which are exposed to the pore,
as opposed to backbone amide groups for the case of ClC. It must be noted that the amide–Cl−
interaction is by itself electrostatically unfavorable, but that the interaction between Cl− and the
total dipole moment of an amino acid’s backbone, when its amide group points toward Cl−, is
favorable over a region of a few Å. In all cases, the proteins seem to favor the presence of a
non-helical secondary structure in the pore region, making the backbone available for interaction with
solutes. The lack of a stable secondary structure would presumably require that the non-helical
segments be held in place at their ends by α helices solidly anchored inside the protein. A cursory
look at the location of the non-helical segments in ClC reveals that, with few exceptions, these
are all either at the surface of the protein and act as connecting loops, or are concentrated near
the pore. Further examination of the conserved genetic sequence shows that all three pore-lining
5-peptide segments end with a helix-breaking proline and are rich in small flexible hydrophobic
residues [44].
Given the small size of the pore region compared to the great size and complexity of the scaffold
that supports its non-helical structure (if one can consider the ClC protein as a scaffold), there
must be a strong incentive for channel proteins to expose their naked backbone. We believe that
there is. The discoveries of the first ion channel structures elicited surprise because of the absence
of significant charge in their pore regions, leading to the suggestion that strong charges would
be problematic because, although the channels would be very selective, the strong electrostatic
interactions could prevent the solute from unbinding from the pore. With this in mind, if we
consider that a channel’s idealized function in the absence of any bias is to provide a flat free energy
potential for solutes, mimicking the bulk solution, then the ideal channel (from a permeation point
of view) would have a continuous line of charge along which ions could glide. The important idea
here is that of a flat potential energy surface: the ion should not get particularly attached to any
location in the pore. In this respect, the backbone dipole moments provide a ladder of closely-
spaced, but weak, identical interactions. Such a configuration confers a much higher mobility to
Cl− than would a few isolated strong charges.
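The contrast between a ladder of weak interactions and a few strong charges can be illustrated with a toy calculation. The model below is purely hypothetical and is not derived from the ClC simulations: attractive sites sit on a line 2 Å from the pore axis, each contributes a bare 1/r term in arbitrary units, and the two arrangements carry the same total strength. The roughness of the resulting potential along the axis is taken as the difference between its highest and lowest values.

```python
import math


def potential(z, sites, offset):
    """Attractive potential at axial position z from (position, strength) sites
    located `offset` away from the pore axis (arbitrary energy units)."""
    return -sum(s / math.sqrt((z - p) ** 2 + offset ** 2) for p, s in sites)


def roughness(sites, offset, z_min, z_max, steps=200):
    """Difference between the highest and lowest potential on [z_min, z_max]."""
    values = [
        potential(z_min + (z_max - z_min) * i / steps, sites, offset)
        for i in range(steps + 1)
    ]
    return max(values) - min(values)


# 21 weak sites spaced 1 Å apart versus 3 strong sites with the same total strength
ladder = [(float(p), 1.0) for p in range(21)]
isolated = [(3.0, 7.0), (10.0, 7.0), (17.0, 7.0)]

if __name__ == "__main__":
    print("ladder roughness:  ", roughness(ladder, 2.0, 5.0, 15.0))
    print("isolated roughness:", roughness(isolated, 2.0, 5.0, 15.0))
```

In this toy model the closely spaced weak sites give a markedly flatter profile over the central part of the line than the three strong sites, in line with the qualitative argument above.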
The role of the protein’s electrical polarization is not to be neglected, however. As we have
seen, most of the pore-ion attraction is caused by either backbone atoms or by non-polar residues.
The same is true of the interactions between the permeating Cl− ions and the channel as a whole
(except at the cytoplasmic end of the pore, where the interactions between Cl− and the positively
charged (+28 e) cytoplasmic side are important). At first, the role of the backbone atoms and
non-polar residues might seem strange. However, if the protein relied on polar residues to create
favorable basins of attraction for anions, there would always be the danger that if the side chains
of these residues were mobile enough, they would reorient themselves to be attractive to cations.
Furthermore, polar residues interact strongly with external electrical fields, and their collective
polarization might disrupt the field experienced by the permeating ions. Conversely, an external
field would disrupt their interactions with the permeating ions, affecting the selective behavior of
the channel. On the other hand, backbone atoms and non-polar residues do not react strongly
to external electrical fields or the presence of ions. In that sense, they provide a reliable “frozen”
interaction with the permeating ions, since their polarization is dependent more on the channel
architecture and on local interactions with neighboring residues, and less on external electrical
fields. These “weak” dipolar interactions become quite substantial when one accounts for the sheer
number of non-polar residues involved.
Interrupted water file geometry
The water geometry that we observed in the ClC pore did not conform to the single water file picture
expected of narrow channels. After equilibration, we found that the ClC pore instead encouraged
an interrupted double-file geometry of water molecules around the Cl− ions. In Fig. A.9, we have
superimposed the locations of the water oxygens in and around the ClC pore for all local sampling
simulations along the permeation pathway. One sees clear evidence that, on average, the water
molecules are localized into two distinct files. One of these files is always present and also carries
the permeating Cl−. The second file, only observed in the immediate proximity of Cl−, reflects
the observed fact that in most of the pore, permeating Cl− ions remained partially hydrated (one
water molecule above and below and up to three on the outer side).
Figure A.9: Overlay of the positions of the water molecules (blue) and Cl− (orange) for all the local sampling simulations along the permeation pathway showing the water double-file.
This partial hydration shell did not follow the permeating anions all the way, however. A
continuous water file would create many problems such as loss of selectivity against larger anions
(if the pore were wide throughout) and the conduction of protons if a bridge of connected water
molecules were to span the pore. Instead, the double-file is broken at the constriction at Scen, to
which a Cl− is quasi-permanently bound. An obvious advantage of the broken double-file geometry
is that the pore can offer an environment for permeating anions that is similar to that of the bulk
solution outside the pore, thereby reducing the free energy cost of removing the anion’s water shell
as it enters the pore from either end. The only exception is at the central binding site Scen where
the ion-pore attraction is very strong but there the ion can more easily part with its solvation shell.
Since the double water file is so advantageous, one may wonder why it is not observed in the
potassium channel. Theoretical studies on generalized channels suggest that while electrostatic
effects contribute to selectivity between ions of different charge, the discrimination between ions of
the same charge and valence can be controlled by the pore’s radius and size fluctuations [92], energetic
contributions that arise in addition to the dehydration energies at the pore entrance. These con-
siderations are important for cation channels where strong selectivity is crucial, but in ClC there
is little need to discriminate between different inorganic anions. Therefore one may imagine that,
unlike for the potassium channel, the ClC pore geometry attempts to optimize permeation rather
than inter-anion selectivity.
Multi-ion conduction
The pore’s central binding site binds to Cl− very tightly through the side chain hydroxyl groups of
Tyr 445 and Ser 107 as well as through the backbone dipole moments of Phe 357 and Ile 356, such
that a pore configuration devoid of anions is not possible in ClC. To dislodge the
central Cl−, its binding energy needs to be considerably reduced. The presence of additional anions
in the pore is thus required so that the transporter–Cl− attraction and the Cl−–Cl− repulsion can
balance each other. Our results confirm what had been previously suggested for ClC [129] and has
already been established for the K+ channel [13]: conduction across the transporter requires more
than one ion in the pore. In addition, our PMF describes a permeation process for two simultaneous
Cl− ions in the ClC pore, establishing that this number may be sufficient for Cl− conduction
in ClC.
With both ion channels/transporters of known structure (ClC and KcsA) exhibiting multi-ion
permeation, we can ask whether multi-ion permeation is an ion channel necessity or merely a
coincidence. On one hand, if the aim is to ensure that permeation can occur along a path with
a flat energy profile, then increasing the dimensionality of the PMF (i.e., adding essential degrees
of freedom) makes it easier to get around barriers. On the other hand, it is conceivable that one
could engineer an ion channel with single ion occupancy by relegating the degrees of freedom of the
extra ions to internal components of the channel. Multiple occupancy would in that case not be
a necessity. There is, however, a problem with single occupancy pores: if the single ion is allowed
to exit the channel, then the possibility arises that a continuous file of water across the channel
connects both sides of the cell membrane, possibly allowing for proton conduction to the detriment
of the cell.
A.4 Conclusion
We have mapped the energetics involved in the conduction of a pair of Cl− ions across the ClC
transporter. The result suggests that ion dynamics in the pore follows a “king of the hill” mecha-
nism, in which two Cl− ions compete over an energetically favorable central location in the pore.
This strategy would ensure that an ion is always left inside the pore to block it. During a conduc-
tion event, the ion configurations in the pore appear to evolve through a succession of four stable
states. The positions of the ions in these states coincide with the locations of three binding sites
observed by x-ray crystallography. In addition, we observe stable intermediate states in which the
ions are located at two novel locations, S− and S+.
Inspection of the interaction energies between the ClC transporter and the permeating Cl− ions
reveals the importance of the protein’s overall polarization in making the pore attractive to anions.
Indeed, backbone and non-polar residues account for a large majority of the attraction between
Cl− and the transporter. Our calculations do not a priori support the common assumption that
polar residues and specific α helix dipole interactions play a pivotal role in assuring the pore’s
anion over cation selectivity. Instead, we suggest that the main role for the ClC’s broken helix
structure is to provide the pore region with a non-helical backbone structure, allowing anions to
interact favorably with the exposed backbone’s electric dipole moment. This type of interaction
has the advantage that it is rather evenly distributed along the pore and prevents Cl− from getting
stuck in deep energy wells. We also suggest a novel role for the polar residues Ser 107 and Tyr 445
that is consistent with the nature of their measured interaction with Cl−: they ensure that the
pore remains blocked by an anion at all times, preventing the formation of a continuous water file.
These residues likely also contribute to the size-selectivity of the pore and prevent the passage of
hydrophobic particles, and this will likely be resolved by further experimental and theoretical work
investigating selectivity [13, 50, 148] in ClCs.
Compared to the other ion channel family of known structure, the K+ channels, the ClC narrow
pore region is longer and wider. Whereas KcsA has a very symmetrical pore and tight interactions
with the permeating K+ ions, the ClC pore is irregular and accommodates partially solvated anions
through most of its length. The consequence of these two diverging architectures is that ClC is
much less selective between ion species of the same charge than is KcsA, in harmony with a lack of
evolutionary pressure in that direction. On the other hand, both channels share a similar peculiar
feature: they both exhibit a broken helix architecture resulting in their pores being lined almost
exclusively with a non-helical backbone structure. This pore architecture has been previously
shown to be a crucial ingredient for the efficient conduction and selectivity of ions in K+ channels
and here we observe the same for ClC. Based on these discoveries, one should expect to observe
the exposed backbone architecture in the pores of the many ion channel and transporter structures
yet to come.
References
[1] Accardi, A., L. Kolmakova-Partensky, C. Williams, and C. Miller. 2004. Ionic currents mediated by a prokaryotic homologue of CLC Cl− channels. Journal of General Physiology 123:109–119.
[2] Accardi, A. and C. Miller. 2004. Secondary active transport mediated by a prokaryotic homologue of ClC Cl− channels. Nature 427:803–807.
[3] Accardi, A. and M. Pusch. 2003. Conformational changes in the pore of CLC-0. Journal of General Physiology 122:277–293.
[4] Adams, M. W. M. 1990. The structure and function of iron-hydrogenase. Biochimica et Biophysica Acta 1020:115–145.
[5] Allocatelli, C. T., F. Cutruzzola, A. Brancaccio, B. Vallone, and M. Brunori. 1994. Engineering Ascaris hemoglobin oxygen affinity in sperm whale myoglobin: role of tyrosine B10. FEBS Letters 352:63–66.
[6] Amara, P., P. Andreoletti, H. M. Jouve, and M. J. Field. 2001. Ligand diffusion in the catalase from Proteus mirabilis: A molecular dynamics study. Protein Science 10:1927–1935.
[7] Appleby, C. A. 1984. Leghemoglobin and rhizobium respiration. Ann. Rev. Plant Physiol. 35:443–478.
[8] Aqvist, J., H. Luecke, F. A. Quiocho, and A. Warshel. 1991. Dipoles localized at helix termini of protein stabilize charges. Proceedings of the National Academy of Sciences, USA 88:2026–2030.
[9] Aqvist, J. and V. Luzhkov. 2000. Ion permeation mechanism of the potassium channel. Nature 404:881–884.
[10] Austin, R. H., K. W. Beeson, L. Eisenstein, H. Frauenfelder, and I. C. Gunsalus. 1975. Dynamics of ligand binding to myoglobin. Biochemistry 14:5355–5373.
[11] Banushkina, P. and M. Meuwly. 2005. Free-energy barriers in MbCO rebinding. Journal of Physical Chemistry B 109:16911–16917.
[12] Berneche, S. and B. Roux. 2000. Molecular dynamics of the KcsA K+ channel in a bilayer membrane. Biophysical Journal 78:2900–2917.
[13] Berneche, S. and B. Roux. 2001. Energetics of ion conduction through the K+ channel. Nature 414:73–77.
[14] Beveridge, D. L. and F. M. DiCapua. 1989. Free energy via molecular simulation: Applications to chemical and biological systems. Annual Review of Biophysics and Biophysical Chemistry 18:431–492.
[15] Boichenko, V. A., E. Greenbaum, and M. Seibert. 2004. Hydrogen production by photosynthetic microorganisms. In Photoconversion of Solar Energy: Molecular to Global Photosynthesis. M. D. Archer and J. Barber, editors. Imperial College Press, London. 397–452.
[16] Bolognesi, M., S. Onesti, G. Gatti, A. Coda, P. Ascenzi, and M. Brunori. 1989. Aplysia limacina myoglobin. Crystallographic analysis at 1.6 Å resolution. Journal of Molecular Biology 205:529–544.
[17] Bossa, C., A. Amadei, I. Daidone, M. Anselmi, B. Vallone, M. Brunori, and A. D. Nola. 2005. Molecular dynamics simulation of sperm whale myoglobin: Effects of mutations and trapped CO on the structure and dynamics of cavities. Biophysical Journal 89:465–474.
[18] Bossa, C., M. Anselmi, D. Roccatano, A. Amadei, B. Vallone, M. Brunori, and A. D. Nola. 2004. Extended molecular dynamics simulation of the carbon monoxide migration in sperm whale myoglobin. Biophysical Journal 86:3855–3862.
[19] Bostick, D. L. and M. L. Berkowitz. 2004. Exterior site occupancy infers chloride-induced proton gating in a prokaryotic homolog of the ClC chloride channel. Biophysical Journal 87:1686–1696.
[20] Bourgeois, D., B. Vallone, F. Schotte, A. Arcovito, A. E. Miele, G. Sciara, M. Wulff, P. Anfinrud, and M. Brunori. 2003. Complex landscape of protein structural dynamics unveiled by nanosecond Laue crystallography. Proceedings of the National Academy of Sciences, USA 100:8704–8709.
[21] Brunori, M. 2001. Nitric oxide moves myoglobin centre stage. Trends in Biochemical Sciences 26:209–210.
[22] Brunori, M., D. Bourgeois, and B. Vallone. 2004. The structural dynamics of myoglobin. Journal of Structural Biology 147:223–234.
[23] Brunori, M. and Q. H. Gibson. 2001. Cavities and packing defects in the structural dynamics of myoglobin. EMBO Reports 2:676–679.
[24] Brunori, M., B. Vallone, F. Cutruzzola, C. Travaglini-Allocatelli, J. Berendzen, K. Chu, R. M. Sweet, and I. Schlichting. 2000. The role of cavities in protein dynamics: Crystal structure of a photolytic intermediate of a mutant myoglobin. Proceedings of the National Academy of Sciences, USA 97:2058–2063.
[25] Buhrke, T., O. Lenz, N. Krauss, and B. Friedrich. 2005. Oxygen tolerance of the H2-sensing [NiFe] hydrogenase from Ralstonia eutropha H16 is based on limited access of oxygen to the active site. Journal of Biological Chemistry 280:23791–23796.
[26] Calhoun, D. B., J. M. Vanderkooi, G. V. Woodrow 3rd, and S. W. Englander. 1983. Penetration of dioxygen into proteins studied by quenching of phosphorescence and fluorescence. Biochemistry 22:1526–1532.
[27] Carlson, M. L., R. M. Regan, and Q. H. Gibson. 1996. Distal cavity fluctuations in myoglobin: Protein motion and ligand diffusion. Biochemistry 35:1125–1136.
[28] Case, D. A. and M. Karplus. 1979. Ligands binding to heme proteins. Journal of Molecular Biology 132:353–368.
[29] Chatfield, M. D., K. N. Walda, and D. Magde. 1990. Activation parameters for ligand escape from myoglobin proteins at room temperature. Journal of the American Chemical Society 112:4680–4687.
[30] Chen, T.-Y. and C. Miller. 1996. Nonequilibrium gating and voltage-dependence of the ClC-0 Cl− channel. Journal of General Physiology 108:237–250.
[31] Cohen, J., A. Arkhipov, R. Braun, and K. Schulten. 2006. Imaging the migration pathways for O2, CO, NO, and Xe inside myoglobin. Biophysical Journal 91:1844–1857.
[32] Cohen, J., K. Kim, P. King, M. Seibert, and K. Schulten. 2005a. Finding gas diffusion pathways in proteins: Application to O2 and H2 transport in CpI [FeFe]-hydrogenase and the role of packing defects. Structure 13:1321–1329.
[33] Cohen, J., K. Kim, M. Posewitz, M. L. Ghirardi, K. Schulten, M. Seibert, and P. King. 2005b. Molecular dynamics and experimental investigation of H2 and O2 diffusion in [Fe]-hydrogenase. Biochemical Society Transactions 33:80–82.
[34] Cohen, J. and K. Schulten. 2004. Mechanism of anionic conduction across ClC. Biophysical Journal 86:836–845.
[35] Cohen, J. and K. Schulten. 2007. O2 migration pathways in monomeric globins are determined by residue composition, not tertiary structure. Submitted.
[36] Connolly, M. L. 1983. Solvent-accessible surfaces of proteins and nucleic acids. Science 221:709–713.
[37] Cooper, G. and W. Boron. 1998. Effect of PCMBS on CO2 permeability of Xenopus Oocytes expressing aquaporin 1 or its C189S mutant. 275:C1481–C1486.
[38] Corry, B. and S. Chung. 2005. Influence of protein flexibility on the electrostatic energy landscape in gramicidin A. European Biophysics Journal 34:208–216.
[39] Corry, B., M. O’Mara, and S.-H. Chung. 2004. Conduction mechanisms of chloride ions in ClC-type channels. Biophysical Journal 86:846–860.
[40] Czerminski, R. and R. Elber. 1991. Computational studies of ligand diffusion in globins: I. leghemoglobin. PROTEINS: Structure, Function, and Genetics 10:70–80.
[41] Dantsker, D., C. Roche, U. Samuni, G. Blouin, J. S. Olson, and J. M. Friedman. 2005. The position 68(E11) side chain in myoglobin regulates ligand capture, bond formation with heme iron, and internal movement into the xenon cavities. Journal of Biological Chemistry 280:38740–38755.
[42] Doyle, D. A., J. M. Cabral, R. A. Pfuetzer, A. Kuo, J. M. Gulbis, S. L. Cohen, B. T. Chait, and R. MacKinnon. 1998. The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 280:69–77.
[43] Duff, A., A. E. Cohen, P. J. Ellis, J. A. Kuchar, D. B. Langley, E. M. Shepard, D. M. Dooley, H. C. Freeman, and J. M. Guss. 2003. The crystal structure of Pichia pastoris lysyl oxidase. Biochemistry 42:15148–15157.
[44] Dutzler, R., E. B. Campbell, M. Cadene, B. T. Chait, and R. MacKinnon. 2002. X-ray structure of a ClC chloride channel at 3.0 Å reveals the molecular basis of anion selectivity. Nature 415:287–294.
[45] Dutzler, R., E. B. Campbell, and R. MacKinnon. 2003. Gating the selectivity filter in ClC chloride channels. Science 300:108–112.
[46] Eargle, J. and Z. Luthey-Schulten. 2006. Visualizing the dual space of biological molecules. 30:219–226.
[47] Elber, R. and M. Karplus. 1990. Enhanced sampling in molecular dynamics: Use of the time-dependent Hartree approximation for a simulation of carbon monoxide diffusion through myoglobin. Journal of the American Chemical Society 112:9161–9175.
[48] Estevez, R. and T. J. Jentsch. 2002. ClC chloride channels: correlating structure with function. Current Opinion in Structural Biology 12:531–539.
[49] Fahlke, C. 2001. Ion permeation and selectivity in ClC-type chloride channels. American Journal of Physiology – Renal Physiology 280:F748–F757.
[50] Fahlke, C., H. Yu, C. L. Beck, T. R. Rhodes, and A. L. George, Jr. 1997. Pore-forming segments in voltage-gated chloride channels. Nature 390:529–532.
[51] Fan, H.-J. and M. B. Hall. 2001. A capable bridging ligand for Fe-only hydrogenase: Density functional calculations of a low-energy route for heterolytic cleavage and formation of dihydrogen. Journal of the American Chemical Society 123:3828–3829.
[52] Feher, V. A., E. P. Baldwin, and F. W. Dahlquist. 1996. Access of ligand to cavities within the core of a protein is rapid. Nature Structural Biology 3:516–521.
[53] Ferroni, S., C. Marchini, M. Nobile, and C. Rapisarda. 1997. Characterization of an inwardly rectifying chloride conductance expressed by cultured rat cortical astrocytes. Glia 21:217–227.
[54] Flogel, U., M. W. Merx, A. Godecke, U. K. M. Decking, and J. Schrader. 2001. Myoglobin: A scavenger of bioactive NO. Proceedings of the National Academy of Sciences, USA 98:735–740.
[55] Flynn, T., M. L. Ghirardi, and M. Seibert. 2002. Accumulation of O2-tolerant phenotypes in H2-producing strains of Chlamydomonas reinhardtii by sequential applications of chemical mutagenesis and selection. International Journal of Hydrogen Energy 27:1421–1430.
[56] Frauenfelder, H., B. H. McMahon, R. H. Austin, K. Chu, and J. T. Groves. 2001. The role of structure, energy landscape, dynamics, and allostery in the enzymatic function of myoglobin. Proceedings of the National Academy of Sciences, USA 98:2370–2374.
[57] Frauenfelder, H., B. H. McMahon, and P. W. Fenimore. 2003. Myoglobin: The hydrogen atom of biology and a paradigm of complexity. Proceedings of the National Academy of Sciences, USA 100:8615–8617.
[58] Friedrich, T., T. Breiderhoff, and T. J. Jentsch. 1999. Mutational analysis demonstrates that ClC-4 and ClC-5 directly mediate plasma membrane currents. Journal of Biological Chemistry 274:896–902.
[59] Fu, D., A. Libson, L. J. W. Miercke, C. Weitzman, P. Nollert, J. Krucinski, and R. M. Stroud. 2000. Structure of a glycerol conducting channel and the basis for its selectivity. Science 290:481–486.
[60] Garry, D. J., S. B. Kanatous, and P. P. A. Mammen. 2003. Emerging roles for myoglobin in the heart. Trends in Cardiovascular Medicine 13:111–116.
[61] Garry, D. J., A. Meeson, Z. Yan, and R. S. Williams. 2000. Life without myoglobin. Cellular and Molecular Life Sciences 57:896–898.
[62] Gerber, R., V. Buch, and M. Ratner. 1982. Time-dependent self-consistent field approximation for intramolecular energy transfer. I. Formulation and application to dissociation of van der Waals molecules. Journal of Chemical Physics 94:3022–3030.
[63] Ghirardi, M. L., J. Cohen, P. King, K. Schulten, K. Kim, and M. Seibert. 2006. [FeFe]-hydrogenases and photobiological hydrogen production. In Solar Hydrogen and Nanotechnology. L. Vayssieres, editor, volume 6340 of Proceedings of the Society of Photo-Optical Instrumentation Engineers, 253–258.
[64] Ghirardi, M. L., P. W. King, M. C. Posewitz, P. C. Maness, A. Fedorov, K. Kim, J. Cohen, K. Schulten, and M. Seibert. 2005. Approaches to developing biological H2-photoproducing organisms and processes. Biochemical Society Transactions 33:70–72.
[65] Ghirardi, M. L., L. Zhang, J. W. Lee, T. Flynn, M. Seibert, E. Greenbaum, and A. Melis. 2000. Microalgae: A green source of renewable H2. Trends in Biotechnology 18:506–511.
[66] Gibson, Q. H. and S. Ainsworth. 1957. Photosensitivity of haem compounds. Nature 180:1416–1417.
[67] Gibson, Q. H., R. Regan, R. Elber, J. S. Olson, and T. E. Carver. 1992. Distal pocket residues affect picosecond ligand recombination in myoglobin. Journal of Biological Chemistry 267:22022–22034.
[68] Giuffre, A., E. Forte, M. Brunori, and P. Sarti. 2005. Nitric oxide, cytochrome c oxidase and myoglobin: Competition and reaction pathways. FEBS Letters 579:2528–2532.
[69] Gower, M., J. Cohen, J. Phillips, R. Kufrin, and K. Schulten. 2006. Managing biomolecular simulations in a grid environment with NAMD-G. In Proceedings of the 2006 TeraGrid Conference. In press.
[70] Grayson, P., E. Tajkhorshid, and K. Schulten. 2003. Mechanisms of selectivity in channels and enzymes studied with interactive molecular dynamics. Biophysical Journal 85:36–48.
[71] Gullingsrud, J., R. Braun, and K. Schulten. 1999. Reconstructing potentials of mean force through time series analysis of steered molecular dynamics simulations. Journal of Computational Physics 151:190–211.
[72] Hargrove, M., J. Barry, E. Brucker, M. Berry, G. Phillips, Jr., J. Olson, R. Arredondo-Peter, J. Dean, R. Klucas, and G. Sarath. 1997. Characterization of recombinant soybean leghemoglobin a and apolar distal histidine mutants. Journal of Molecular Biology 266:1032–1042.
[73] Harutyunyan, H. E., T. N. Safonova, I. P. Kuranova, A. N. Popov, A. V. Teplyakov, G. V. Obmolova, A. A. Rusakov, B. K. Vainshtein, G. G. Dodson, J. C. Wilson, and M. F. Perutz. 1995. The structure of deoxy- and oxy-leghaemoglobin from lupin. Journal of Molecular Biology 251:104–115.
[74] Hille, B. 1992. Ionic channels of excitable membranes. Sinauer Associates, Sunderland, MA, second edition.
[75] Huang, X. and S. G. Boxer. 1994. Discovery of new ligand binding pathways in myoglobin by random mutagenesis. Nature Structural Biology 1:226–229.
[76] Hub, J. S. and B. L. de Groot. 2006. Does CO2 permeate through Aquaporin-1? Biophysical Journal 91:842–848.
[77] Hummer, G., F. Schotte, and P. A. Anfinrud. 2004. Unveiling functional protein motions with picosecond x-ray crystallography and molecular dynamics simulations. Proceedings of the National Academy of Sciences, USA 101:15330–15334.
[78] Humphrey, W., A. Dalke, and K. Schulten. 1996. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics 14:33–38.
[79] Iyer, R., T. M. Iverson, A. Accardi, and C. Miller. 2002. A biological role for prokaryotic ClC chloride channels. Nature 419:715–718.
[80] Jentsch, T. J., T. Friedrich, A. Schriever, and H. Yamada. 1999. The ClC chloride channel family. Pflugers Archiv – European Journal of Physiology 437:783–795.
[81] Jiang, Y., A. Lee, J. Chen, M. Cadene, B. T. Chait, and R. MacKinnon. 2002. Crystal structure and mechanism of a calcium-gated potassium channel. Nature 417:515–522.
[82] Jiang, Y., A. Lee, J. Chen, M. Cadene, B. T. Chait, and R. MacKinnon. 2003. X-ray structure of a voltage-dependent K+ channel. Nature 423:33–41.
[83] Johnson, B. J., J. Cohen, R. W. Welford, A. R. Pearson, K. Schulten, J. P. Klinman, and C. M. Wilmot. 2007. Lessons on substrate specificity: The crystal structure of Hansenula polymorpha copper-containing amine oxidase in complex with xenon. Nature Chemical Biology. In preparation.
[84] Kale, L., R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, and K. Schulten. 1999. NAMD2: Greater scalability for parallel molecular dynamics. Journal of Computational Physics 151:283–312.
[85] Kendrew, J. C., R. E. Dickerson, B. E. Strandberg, R. G. Hart, D. R. Davies, D. C. Phillips, and V. C. Shore. 1960. Structure of myoglobin: A three-dimensional Fourier synthesis at 2 Angstrom resolution. Nature 185:422–427.
[86] Khalili-Araghi, F., E. Tajkhorshid, and K. Schulten. 2006. Dynamics of K+ ion conduction through Kv1.2. Biophysical Journal 91:L72–L74.
[87] King, P. W., D. Svedruzic, J. Cohen, K. Schulten, M. Seibert, and M. L. Ghirardi. 2006. Structural and functional investigations of biological catalysts for optimization of solar-driven, H2 production systems. In Solar Hydrogen and Nanotechnology. L. Vayssieres, editor, volume 6340 of Proceedings of the Society of Photo-Optical Instrumentation Engineers, 259–267.
[88] Kocher, J.-P., M. Prevost, S. J. Wodak, and B. Lee. 1996. Properties of the protein matrix revealed by the free energy of cavity formation. Structure 4:1517–1529.
[89] Kollman, P. 1993. Free energy calculations: Applications to chemical and biochemical phenomena. Chemical Reviews 93:2395–2417.
[90] Kottalam, J. and D. A. Case. 1988. Dynamics of ligand escape from the heme pocket of myoglobin. Journal of the American Chemical Society 110:7690–7697.
[91] Kumar, S., D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg. 1992. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. Journal of Computational Chemistry 13:1011–1021.
[92] Laio, A. and V. Torre. 1999. Physical Origin of Selectivity in Ionic Channels of Biological Membranes. Biophysical Journal 76:129–148.
[93] Lakowicz, J. and G. Weber. 1973. Quenching of fluorescence by oxygen. Probe for structural fluctuations in macromolecules. Biochemistry 12:4161–4170.
[94] Lemon, B. J. and J. W. Peters. 1999. Binding of exogenously added carbon monoxide at the active site of the iron-only hydrogenase (CpI) from Clostridium pasteurianum. Biochemistry 38:12969–12973.
[95] Lin, C.-W. and T.-Y. Chen. 2003. Probing the pore of ClC-0 by substituted cysteine accessibility method using methane thiosulfonate reagents. Journal of General Physiology 122:147–159.
[96] Liong, E. C. 1999. Structural and functional analysis of proximal pocket mutants of sperm whale myoglobin. Ph.D. thesis, Rice University, Houston, TX.
[97] Liong, E. C., Y. Dou, E. E. Scott, J. S. Olson, and G. N. Phillips. 2001. Waterproofing the heme pocket. Journal of Biological Chemistry 276:9093–9100.
[98] Ludewig, U., T. J. Jentsch, and M. Pusch. 1997. Inward rectification in ClC-0 chloride channels caused by mutations in several protein regions. Journal of General Physiology 110:165–171.
[99] MacKerell, Jr., A., D. Bashford, M. Bellott, R. L. Dunbrack, Jr., J. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, I. W. E. Reiher, B. Roux, M. Schlenkrich, J. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B 102:3586–3616.
[100] MacKerell, Jr., A. D., D. Bashford, M. Bellott, J. R. L. Dunbrack, J. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, B. Roux, M. Schlenkrich, J. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. 1992. Self-consistent parameterization of biomolecules for molecular modeling and condensed phase simulations. FASEB Journal 6:A143–A143.
[101] Maduke, M., C. Miller, and J. A. Mindell. 2000. A decade of ClC chloride channels: structure, mechanism, and many unsettled questions. Annual Review of Biophysics and Biomolecular Structure 29:411–438.
[102] Maurus, R., C. Overall, R. Bogumil, Y. Luo, A. Mauk, M. Smith, and G. Brayer. 1997. A myoglobin variant with a polar substitution in a conserved hydrophobic cluster in the heme binding pocket. Biochimica et Biophysica Acta 1341:1–13.
[103] Mertens, R. and A. Liese. 2004. Biotechnological applications of hydrogenases. Current Opinion in Biotechnology 15:343–348.
[104] Merx, M. W., A. Godecke, U. Flogel, and J. Schrader. 2005. Oxygen supply and nitric oxide scavenging by myoglobin contribute to exercise endurance and cardiac function. FASEB Journal 19:1015–1017.
[105] Miller, C. 1982. Open-state substructure of single chloride channels from Torpedo electroplax. Philosophical Transactions of the Royal Society of London B. (Biological Sciences) 299:401–411.
[106] Miller, C. 2003. Reading eukaryotic function through prokaryotic spectacles. Journal of General Physiology 122:129–131.
[107] Miloshevsky, G. V. and P. C. Jordan. 2004. Anion pathway and potential energy profiles along curvilinear bacterial ClC Cl− pores: Electrostatic effects of charged residues. Biophysical Journal 86:825–835.
[108] Mindell, J. A., M. Maduke, C. Miller, and N. Grigorieff. 2001. Projection structure of a ClC-type chloride channel at 6.5 Å resolution. Nature 409:219–223.
[109] Montet, Y., P. Amara, A. Volbeda, X. Vernede, E. C. Hatchikian, M. J. Field, M. Frey, and J. C. Fontecilla-Camps. 1997. Gas access to the active site of Ni-Fe hydrogenase probed by x-ray crystallography and molecular dynamics. Nature Structural Biology 4:523–526.
[110] Morais-Cabral, J. H., Y. Zhou, and R. MacKinnon. 2001. Energetic optimization of ion conduction rate by the K+ selectivity filter. Nature 414:37–41.
[111] Nadler, W. and D. L. Stein. 1996. Reaction-diffusion description of biological transport processes in general dimension. Journal of Chemical Physics 104:1918–1936.
[112] Nakhoul, N., B. Davis, M. Romero, and W. Boron. 1998. Effect of expressing the water channel aquaporin-1 on the CO2 permeability of Xenopus oocytes. 274:C543–548.
[113] Nicolet, Y., C. Cavazza, and J. C. Fontecilla-Camps. 2002. Fe-only hydrogenases: structure, function and evolution. Journal of Inorganic Biochemistry 91:1–8.
[114] Nicolet, Y., C. Piras, P. Legrand, C. E. Hatchikian, and J. C. Fontecilla-Camps. 1999. Desulfovibrio desulfuricans iron hydrogenase: the structure shows unusual coordination to an active site Fe binuclear center. Structure 7:13–23.
[115] Nienhaus, K., P. Deng, J. M. Kriegl, and G. U. Nienhaus. 2003. Structural dynamics of myoglobin: Effect of internal cavities on ligand migration and binding. Biochemistry 42:9647–9658.
[116] Noskov, S., S. Berneche, and B. Roux. 2004. Control of ion selectivity in potassium channels by electrostatic and dynamic properties of carbonyl ligands. Nature 431:830–834.
[117] Nutt, D. R. and M. Meuwly. 2004. CO migration in native and mutant myoglobin: Atomistic simulations for the understanding of protein function. Proceedings of the National Academy of Sciences, USA 101:5998–6002.
[118] Olson, J. S. and G. N. Phillips, Jr. 1997. Myoglobin discriminates between O2, NO, and CO by electrostatic interactions with the bound ligand. Journal of Biological Inorganic Chemistry 2:544–552.
[119] Olson, W. K. 1996. Simulating DNA at low resolution. Current Opinion in Structural Biology 6:242–256.
[120] Ostermann, A., R. Waschipky, F. G. Parak, and G. U. Nienhaus. 2000. Ligand binding and conformational motions in myoglobin. Nature 404:205–208.
[121] Park, H. J., C. Yang, N. Treff, J. D. Satterlee, and C. Kang. 2002. Crystal structures of unligated and CN-ligated Glycera dibranchiata monomer ferric hemoglobin components III and IV. PROTEINS: Structure, Function, and Genetics 49:49–60.
[122] Perutz, M. F. 1979. Regulation of oxygen affinity of hemoglobin: Influence of structure of the globin on the heme iron. Annual Review of Biochemistry 48:327–386.
[123] Perutz, M. F. and F. S. Mathews. 1966. An X-ray study of azide methaemoglobin. Journal of Molecular Biology 21:199–202.
[124] Pesce, A., S. Dewilde, L. Kiger, M. Milani, P. Ascenzi, M. C. Marden, M. L. V. Hauwaert, J. Vanfleteren, L. Moens, and M. Bolognesi. 2001. Very high resolution structure of a trematode hemoglobin displaying a TyrB10-TyrE7 heme distal residue pair and high oxygen affinity. Journal of Molecular Biology 309:1153–1164.
[125] Peters, J. W. 1999. Structure and mechanism of iron-only hydrogenases. Current Opinion in Structural Biology 9:670–676.
[126] Peters, J. W., W. N. Lanzilotta, B. J. Lemon, and L. C. Seefeldt. 1998. X-ray crystal structure of the Fe-only hydrogenase (CpI) from Clostridium pasteurianum to 1.8 angstrom resolution. Science 282:1853–1858.
[127] Phillips, J. C., R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten. 2005. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26:1781–1802.
[128] Prasad, G. V. T., L. A. Coury, F. Finn, and M. L. Zeidel. 1998. Reconstituted aquaporin 1 water channels transport CO2 across membranes. Journal of Biological Chemistry 273:33123–33126.
[129] Pusch, M., U. Ludewig, A. Rehfeldt, and T. J. Jentsch. 1995. Gating of the voltage-dependent chloride channel ClC-0 by the permeant anion. Nature 373:527–531.
[130] Radding, W. and G. N. Phillips, Jr. 2004. Kinetic proofreading by the cavity system of myoglobin: protection from poisoning. BioEssays 26:422–433.
[131] Richards, F. M. 1977. Areas, volumes, packing, and protein structure. Annual Review of Biophysics and Bioengineering 6:151–176.
[132] Rizzi, M., J. B. Wittenberg, A. Coda, M. Fasano, P. Ascenzi, and M. Bolognesi. 1994. Structure of the sulfide-reactive hemoglobin from the clam Lucina pectinata. Crystallographic analysis at 1.5 Å resolution. Journal of Molecular Biology 244:86–99.
[133] Rohlfs, R. J., J. S. Olson, and Q. H. Gibson. 1988. A comparison of the geminate recombination kinetics of several monomeric heme proteins. Journal of Biological Chemistry 263:1803–1813.
[134] Roitberg, A. and R. Elber. 1991. Modeling side chains in peptides and proteins: Application of the locally enhanced sampling technique and the simulated annealing methods to find minimum energy conformations. Journal of Chemical Physics 95:9277–9287.
[135] Roux, B. 1995. The calculation of the potential of mean force using computer simulations. Computer Physics Communications 91:275–282.
[136] Roux, B. 1999. Statistical Mechanical Equilibrium Theory of Selective Ion Channels. Biophysical Journal 77:139–153.
[137] Royer, Jr., W. E., H. Zhu, T. A. Gorr, J. F. Flores, and J. E. Knapp. 2005. Allosteric hemoglobin assembly: diversity and similarity. Journal of Biological Chemistry 39:27477–27480.
[138] Rychkov, G. Y., M. Pusch, M. L. Roberts, T. J. Jentsch, and A. H. Bretag. 1998. Permeation and block of the skeletal muscle chloride channel, ClC-1, by foreign anions. Journal of General Physiology 111:653–665.
[139] Salomonsson, L., A. Lee, R. B. Gennis, and P. Brzezinski. 2004. A single-amino-acid lid renders a gas-tight compartment within a membrane-bound transporter. Proceedings of the National Academy of Sciences, USA 101:11617–11621.
[140] Scharlin, P., R. Battino, E. Silla, I. Tunon, and J. L. Pascual-Ahuir. 1998. Solubility of gases in water: Correlation between solubility and the number of water molecules in the first solvation shell. Pure and Applied Chemistry 70:1895–1904.
[141] Schlenkrich, M., J. Brickmann, A. D. MacKerell Jr., and M. Karplus. 1996. Empirical potential energy function for phospholipids: Criteria for parameter optimization and applications. In Biological Membranes: A Molecular Perspective from Computation and Experiment. K. M. Merz and B. Roux, editors. Birkhauser, Boston. 31–81.
[142] Schlichting, I. and K. Chu. 2000. Trapping intermediates in the crystal: ligand binding to myoglobin. Current Opinion in Structural Biology 10:744–752.
[143] Schmidt, M., K. Nienhaus, R. Pahl, A. Krasselt, S. Anderson, F. Parak, G. U. Nienhaus, and V. Srajer. 2005. Ligand migration pathway and protein dynamics in myoglobin: A time-resolved crystallographic study on L29W MbCO. Proceedings of the National Academy of Sciences, USA 102:11704–11709.
[144] Schotte, F., M. Lim, T. A. Jackson, A. V. Smirnov, J. Soman, J. S. Olson, G. N. Phillips, Jr., M. Wulff, and P. A. Anfinrud. 2003. Watching a protein as it functions with 150-ps time-resolved X-ray crystallography. Science 300:1944–1947.
[145] Scott, E. E. and Q. H. Gibson. 1997. Ligand migration in sperm whale myoglobin. Biochemistry 36:11909–11917.
[146] Scott, E. E., Q. H. Gibson, and J. S. Olson. 2001. Mapping the pathways for O2 entry into and exit from myoglobin. Journal of Biological Chemistry 276:5177–5188.
[147] Shrivastava, I. H. and M. S. P. Sansom. 2000. Simulations of ion permeation through a potassium channel: Molecular dynamics of KcsA in a phospholipid bilayer. Biophysical Journal 78:557–570.
[148] Shrivastava, I. H., D. P. Tieleman, P. C. Biggin, and M. S. P. Sansom. 2002. K+ versus Na+ in a K channel selectivity filter: a simulation study. Biophysical Journal 83:633–645.
[149] Springer, B. A., S. G. Sligar, J. S. Olson, and G. N. Phillips, Jr. 1994. Mechanisms of ligand recognition in myoglobin. Chemical Reviews 94:699–714.
[150] Steigemann, W. and E. Weber. 1979. Structure of erythrocruorin in different ligand states refined at 1.4 Å resolution. Journal of Molecular Biology 127:309–338.
[151] Stone, J., J. Gullingsrud, P. Grayson, and K. Schulten. 2001. A system for interactive molecular dynamics simulation. In 2001 ACM Symposium on Interactive 3D Graphics. J. F. Hughes and C. H. Sequin, editors, 191–194, New York. ACM SIGGRAPH.
[152] Straub, J. E. and M. Karplus. 1991. Energy equipartitioning in the classical time-dependent Hartree approximation. Journal of Chemical Physics 94:6737–6739.
[153] Tajkhorshid, E., J. Cohen, A. Aksimentiev, M. Sotomayor, and K. Schulten. 2005. Towards understanding membrane channels. In Bacterial ion channels and their eukaryotic homologues. B. Martinac and A. Kubalski, editors. ASM Press, Washington, DC. 153–190.
[154] Tajkhorshid, E., P. Nollert, M. Ø. Jensen, L. J. W. Miercke, J. O’Connell, R. M. Stroud, and K. Schulten. 2002. Control of the selectivity of the aquaporin water channel family by global orientational tuning. Science 296:525–530.
[155] Teixeira, V. H., A. M. Baptista, and C. M. Soares. 2006. Pathways of H2 toward the active site of [NiFe]-hydrogenase. Biophysical Journal 91:2035–2045.
[156] Tilton, R. F., I. D. Kuntz, and G. A. Petsko. 1984. Cavities in proteins: Structure of a metmyoglobin-xenon complex solved to 1.9 Å. Biochemistry 23:2849–2857.
[157] Torres, R. A., T. Lovell, L. Noodleman, and D. A. Case. 2003. Density functional and reduction potential calculations of Fe4S4 clusters. Journal of the American Chemical Society 125:1923–1936.
[158] Traverso, S., L. Elia, and M. Pusch. 2003. Gating competence of constitutively open ClC-0 mutants revealed by the interaction with a small organic inhibitor. Journal of General Physiology 122:295–306.
[159] Ulitsky, A. and R. Elber. 1993. The thermal equilibrium aspects of the time dependent Hartree and the locally enhanced sampling approximations: Formal properties, a correction, and computational examples for rare gas clusters. Journal of Physical Chemistry 98:3380–3388.
[160] Ulitsky, A. and R. Elber. 1994. Application of the locally enhanced sampling (LES) and a mean field with a binary collision correction (cLES) to the simulation of Ar diffusion and NO recombination in myoglobin. Journal of Physical Chemistry 98:1034–1043.
[161] Valverde, M. A. 1999. ClC channels: leaving the dark ages on the verge of a new millennium. Current Opinion in Cell Biology 11:509–516.
[162] Vignais, P. M., B. Billoud, and J. Meyer. 2001. Classification and phylogeny of hydrogenases. FEMS Microbiology Reviews 25:455–501.
[163] Vojtechovsky, J., K. Chu, J. Berendzen, R. Sweet, and I. Schlichting. 1999. Crystal structures of myoglobin-ligand complexes at near-atomic resolution. Biophysical Journal 77:2153–2164.
[164] Srajer, V., Z. Ren, T. Y. Teng, M. Schmidt, T. Ursby, D. Bourgeois, C. Pradervand, W. Schildkamp, M. Wulff, and K. Moffat. 2001. Protein conformational relaxation and ligand migration in myoglobin: a nanosecond to millisecond molecular movie from time-resolved Laue X-ray diffraction. Biochemistry 40:13802–13815.
[165] Srajer, V., T. Y. Teng, T. Ursby, C. Pradervand, Z. Ren, S. Adachi, W. Schildkamp, D. Bourgeois, M. Wulff, and K. Moffat. 1996. Photolysis of the carbon monoxide complex of myoglobin: nanosecond time-resolved crystallography. Science 274:1726–1729.
[166] Wan, L., M. B. Twitchett, L. D. Eltis, A. G. Mauk, and M. Smith. 1998. In vitro evolution of horse heart myoglobin to increase peroxidase activity. Proceedings of the National Academy of Sciences, USA 95:12825–12831.
[167] Wang, Y., J. Cohen, W. Boron, K. Schulten, and E. Tajkhorshid. 2006. Exploring gas permeability of cellular membranes and membrane channels with molecular dynamics. Journal of Structural Biology. In press.
[168] Weber, R. E. and S. N. Vinogradov. 2001. Non-vertebrate hemoglobins: functions and molecular adaptations. Physiological Reviews 81:569–627.
[169] Wittenberg, J. B. and B. A. Wittenberg. 2003. Myoglobin function reassessed. Journal of Experimental Biology 206:2011–2020.
[170] Yang, J., A. P. Kloek, D. E. Goldberg, and F. S. Mathews. 1995. The structure of Ascaris hemoglobin domain I at 2.2 Å resolution: molecular features of oxygen avidity. Proceedings of the National Academy of Sciences, USA 92:4224–4228.
[171] Zhang, L. and J. Hermans. 1996. Hydrophilicity of cavities in proteins. PROTEINS: Structure, Function, and Genetics 24:433–438.
Author’s Biography
Jordi Cohen was born in Montreal, Canada, on August 11, 1977. He completed a B.Sc. in Physics
from McGill University and an M.Sc. in Physics from Simon Fraser University. As a graduate
student in the Physics Department at the University of Illinois at Urbana-Champaign, he studied
theoretical biophysics under the direction of Klaus Schulten.
Publications
1. Johnson, B. J., J. Cohen, R. W. Welford, A. R. Pearson, K. Schulten, J. P. Klinman, and
C. M. Wilmot. 2007. Lessons on substrate specificity: The crystal structure of Hansenula
polymorpha copper-containing amine oxidase in complex with xenon. In preparation.
2. Cohen, J. and K. J. Schulten. 2007. O2 migration pathways in monomeric globins are
determined by residue composition, not tertiary structure. Submitted.
3. Gower, M., J. Cohen, J. Phillips, R. Kufrin, and K. Schulten. 2006. Managing biomolecular
simulations in a grid environment with NAMD-G. In Proceedings of the 2006 TeraGrid
Conference. In press.
4. Wang, Y., J. Cohen, W. Boron, K. Schulten, and E. Tajkhorshid. 2006. Exploring gas
permeability of cellular membranes and membrane channels with molecular dynamics. Journal
of Structural Biology. In press.
5. Cohen, J., A. Arkhipov, R. Braun, and K. Schulten. 2006. Imaging the migration pathways
for O2, CO, NO, and Xe inside myoglobin. Biophysical Journal 91:1844–1857.
6. Ghirardi, M. L., J. Cohen, P. King, K. Schulten, K. Kim, and M. Seibert. 2006. [FeFe]-
hydrogenases and photobiological hydrogen production. In Solar Hydrogen and Nanotechnology.
L. Vayssieres, editor, volume 6340 of Proceedings of the Society of Photo-Optical
Instrumentation Engineers, 253–258.
7. King, P. W., D. Svedruzic, J. Cohen, K. Schulten, M. Seibert, and M. L. Ghirardi. 2006.
Structural and functional investigations of biological catalysts for optimization of solar-driven,
H2 production systems. In Solar Hydrogen and Nanotechnology. L. Vayssieres, editor, volume
6340 of Proceedings of the Society of Photo-Optical Instrumentation Engineers, 259–267.
8. Cohen, J., K. Kim, P. King, M. Seibert, and K. Schulten. 2005a. Finding gas diffusion
pathways in proteins: Application to O2 and H2 transport in CpI [FeFe]-hydrogenase and the
role of packing defects. Structure 13:1321–1329.
9. Cohen, J., K. Kim, M. Posewitz, M. L. Ghirardi, K. Schulten, M. Seibert, and P. King.
2005b. Molecular dynamics and experimental investigation of H2 and O2 diffusion in [Fe]-
hydrogenase. Biochemical Society Transactions 33:80–82.
10. Ghirardi, M. L., P. W. King, M. C. Posewitz, P. C. Maness, A. Fedorov, K. Kim, J. Cohen,
K. Schulten, and M. Seibert. 2005. Approaches to developing biological H2-photoproducing
organisms and processes. Biochemical Society Transactions 33:70–72.
11. Tajkhorshid, E., J. Cohen, A. Aksimentiev, M. Sotomayor, and K. Schulten. 2005. Towards
understanding membrane channels. In Bacterial ion channels and their eukaryotic
homologues. B. Martinac and A. Kubalski, editors. ASM Press, Washington, DC. 153–190.
12. Cohen, J. and K. Schulten. 2004. Mechanism of anionic conduction across ClC. Biophysical
Journal 86:836–845.