© 2007 by Jordi Cohen. All rights reserved.


GAS MIGRATION INSIDE PROTEINS: MECHANISM, CHARACTERIZATION, AND APPLICATIONS

BY

JORDI COHEN

B.Sc., McGill University, 1998

M.Sc., Simon Fraser University, 2001

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Physics

in the Graduate College of the University of Illinois at Urbana-Champaign, 2007

Urbana, Illinois


Abstract

Gas migration inside proteins is a little-studied yet very important topic for many classes of proteins

such as globins, oxygenases, and oxidases, which store oxygen gas or use it for enzymatic purposes.

One reason this process has not received prominent attention in recent years is the

difficulty of identifying the pathways taken by oxygen or other gases diffusing inside proteins.

The reason for this difficulty is that, unlike typical ligand channels, gas pathways are not visible in

a protein’s static structure. This thesis rectifies these difficulties, by addressing many of the issues

important for finding, understanding, and manipulating gas migration pathways inside proteins.

First, it is found and convincingly demonstrated, through the use of a molecular dynamics

methodology called locally-enhanced sampling and a novel volumetric oxygen accessibility map

method, applied to the hydrogenase enzyme, that gas molecules make their way not through static

channels, but through well-defined “pathways”, which are completely defined by the details of a

protein’s thermal motion. This work is then followed up with the development of a new method,

called implicit ligand sampling, which makes it possible, for the first time, to completely identify and

energetically characterize every oxygen pathway inside any protein of known structure merely from the

protein’s equilibrium dynamics. The protein dynamics, in this case, are collected through 10 ns-long

molecular dynamics simulations in the absence of internal gas ligands. Implicit ligand sampling is

then applied to and validated on the well-studied myoglobin oxygen-storage protein.

Finally, if one is to engineer oxygen pathways inside proteins, it is not enough to simply know

where such pathways are located; it is also important to understand how these pathways are

correlated with protein structure. For this reason, oxygen pathways were computed for a large

number of proteins from both the globin and copper-containing amine oxidase protein families.

It is found, surprisingly, that the locations of oxygen pathways are not conserved within protein

families, and do not correlate at all with the proteins’ tertiary folds. However, a statistically-


significant correlation was found between the proximity of certain residue types and protein oxygen

accessibility.


Acknowledgments

First of all, particular thanks go to my adviser, Klaus Schulten, for introducing me to the field,

showing me the way, for his exceptional help and support throughout, and also for showing me

that physics can be used to meaningfully improve the world. None of this work would have existed

without my collaborators. Paul King introduced me to hydrogenase, which was the motivation

behind the many pages ahead. Paul King, Kwiseon Kim, Chris Chang, and Maria Ghirardi all

have inspired this work in too many ways to enumerate, and have been remarkably welcoming hosts

during my visits to Colorado. Further thanks go to my collaborators Ken Olsen, James Knapp, Bill

Royer, Michael Seibert, Carrie Wilmot, and Bryan Johnson for showing me new directions, giving

me ideas, and motivating me, to John Stone for tolerating my hacking into the TCBG software,

and to Emad Tajkhorshid.

I also wish to especially thank the members of the TCBG group who have made my stay in the

middle of the cornfields particularly enjoyable. Elizabeth Villa has been an amazing friend,

the kind that doesn’t exist and is simply always there. Alek Aksimentiev, and his wife Angela

Peregud, have been a second family to me. Emma Falck, for being such an enjoyable person, and

for going out of her way for me many times. Yi Wang for keeping me nourished with chocolate

at all times, and for being a great office mate. Barry Isralewitz for reminding me every day that

humor and wit rule the world, no matter what. Rosemary Braun, for being the kind and caring

model human being I try to be, and for all the delicious croissants. Finally, I cannot forget Justin

Gullingsrud, Leo Trabuco, Eric Lee, JC Gumbart, Mu Gao, Markus Dittrich, Amy Shih, Eduardo

Cruz-Chu, Alexander Balaef, Tim Isgro, Marcos Sotomayor, and Fatemeh Khalili-Araghi.

I also want to particularly thank Miriam Wodrich, for her wide open heart, for coming straight

out of a fairy tale, and for making my life a joy. And I definitely want to express my immense

appreciation and thanks to all the friends I have made in Illinois, to all the great friends abroad


that have kept me going, and to my family for their patience and support.

This work was supported by the National Institutes of Health grants PHS-5-P41-RR05969 and

1R01GM60946-01, the National Science Foundation grant SCI04-38712, and by the Department of

Energy’s Hydrogen Fuel Cells and Infrastructure Technologies Program. Supercomputer time was

provided by the National Center for Supercomputing Applications and the Pittsburgh Supercom-

puting Center via the National Resources Allocation Committee grant MCA93S028.


Table of Contents

List of Tables

List of Figures

List of Abbreviations

Chapter 1 Introduction

Chapter 2 Mechanism of gas migration inside [FeFe]-hydrogenase
2.1 Introduction
2.2 Methods
2.3 Results
2.4 Conclusion

Chapter 3 Imaging the migration pathways for gas ligands inside myoglobin
3.1 Introduction
3.2 Methods
3.3 Results
3.4 Discussion

Chapter 4 Effects of protein architecture and sequence on gas migration pathways
4.1 Introduction
4.2 Methods
4.3 Results
4.4 Conclusion

Chapter 5 Conclusion and outlook

Appendix A Mechanism of anionic conduction across ClC chloride channels
A.1 Introduction
A.2 Methods
A.3 Results
A.4 Conclusion

References

Author’s Biography


List of Tables

2.1 Proportion of hH2 exits by pathway.

3.1 Xenon Binding Site Free Energies.
3.2 Ligand Solvation Energies.

4.1 List of penta-coordinated monomeric globins.


List of Figures

1.1 The H2 production reaction.
1.2 Designing an O2-tolerant hydrogenase.

2.1 Structure of CpI hydrogenase.
2.2 Simulations of hH2 diffusion.
2.3 Simulations of O2 diffusion.
2.4 Network of internal gas pathways.
2.5 Comparison of TC-LES- and VSAM-predicted gas pathways.
2.6 Comparison of average and maximum VSAMs.

3.1 Probability of occurrence of the protein states.
3.2 Predicted and actual Xe binding sites for sperm whale Mb.
3.3 Implicit ligand PMF for CO inside sperm whale Mb.
3.4 Amino acids whose substitutions affect O2 or CO migration.
3.5 PMF profiles experienced by ligands exiting Mb.
3.6 Comparison of the implicit ligand PMF maps in Mbs of different species.

4.1 O2 PMF maps for various monomeric globins.
4.2 Aligned structure of 10 monomeric globins.
4.3 Comparisons of O2 PMF surfaces for similar monomeric globins.
4.4 Residue types favoring O2 pathway formation.

5.1 Gas pathways and barriers in AQP1 aquaporin.
5.2 Central pore gas diffusion in AQP1 aquaporin.

A.1 Membrane view of the ClC dimer.
A.2 Top and side view of the ClC simulation system.
A.3 Superposition of the wild-type and mutant structures.
A.4 Potential of mean force for Cl− conduction.
A.5 The ClC pore.
A.6 Sequence of events during conduction.
A.7 PMF profiles for Cl− conduction.
A.8 Interaction energy of a Cl− with the ClC pore.
A.9 Water double-file inside ClC.


List of Abbreviations

ClC family of chloride channels/transporters.

CpI name of hydrogenase I from Clostridium pasteurianum.

deoxyMb/deoxyHb deoxy-myoglobin (heme is unliganded, often implies that a water molecule

lies in the myoglobin distal pocket).

DP distal pocket (cavity in Mb containing the heme binding site).

FEP free-energy perturbation.

Hb hemoglobin.

hH2 “heavy” hydrogen gas (with an atomic mass identical to that of oxygen gas).

Lb leghemoglobin.

LES locally-enhanced sampling.

Mb myoglobin.

MD molecular dynamics.

metMb metmyoglobin (heme iron is in the unreactive ferric state).

oxyMb/oxyHb oxy-myoglobin (heme is bound to O2).

PDB protein databank (database and/or file format).

PMF potential of mean force.

POPE palmitoyl-oleoyl-phosphoethanolamine (common cell membrane lipid).


TC-LES temperature-controlled locally-enhanced sampling.

VSAM volumetric solvent accessibility map.


Chapter 1

Introduction

The hydrogenase problem

With the world’s oil reserves dwindling, the development of viable alternative energy fuels is be-

coming an urgent priority. Hydrogen gas (H2), which boasts zero pollution and can be produced

renewably, is a promising alternative to gasoline. One method of producing H2 that is currently

under development relies on the unicellular green alga Chlamydomonas reinhardtii. Because

Chlamydomonas has the natural ability to couple photosynthetic water oxidation to the generation

of H2 through the [FeFe]-hydrogenase enzyme [113, 162] (see Fig. 1.1), it could be used to produce

H2 commercially [15, 65, 103]. Such a means of H2 production would be affordable and efficient,

requiring only water and sunlight, with up to 10% of incident sunlight energy being converted.

[Figure labels: 2e−, 2H+, H2, H-cluster]

Figure 1.1: The H2 production reaction. The hydrogenase enzyme (shown in ribbons) generates H2 from hydrogen ions (H+) and electrons (e−) produced from photosynthetic water oxidation. O2 inactivates this reaction by binding to the H-cluster’s active site irreversibly.


While [FeFe]-hydrogenase has much potential as a source of H2, it unfortunately does not

operate under high concentrations of oxygen gas (O2), such as those found in ambient air. In fact, a

single O2 molecule can irreversibly bind to hydrogenase’s buried active site, inhibiting its activity,

and leading to the enzyme’s degradation. The inability of hydrogenase to work under aerobic

conditions prevents it from being a cost-effective means of producing H2. If one could engineer

hydrogenase by mutation such as to prevent O2 molecules from accessing its active site, it would

make the enzyme more tolerant to O2 and increase its usefulness as a means of producing H2. The

ideal outcome, depicted in Fig. 1.2, would be a set of hydrogenase mutations that

block the passage of O2 to the active site while still allowing the H2 product to escape.

Figure 1.2: Designing an O2-tolerant hydrogenase. Hydrogenase is easily deactivated by O2 molecules that reach its active site by permeating across the protein matrix. Given that H2, due to its smaller size, is possibly less restricted in its motion than O2, it may be feasible to block O2 by the selective mutation of O2-pathway-lining residues (O2-blocking mutations schematically depicted as large red X’s), while still allowing the H2 product to exit the protein.

The gas migration rates for O2 and H2 inside hydrogenase can, of course, be affected by point

mutations of hydrogenase. Recent studies showing that it is possible to completely block O2 passage

across cytochrome c oxidase [139] as well as to make a [NiFe]-hydrogenase accessible to O2 that was

previously O2-proof [25], make an encouraging case that the gas accessibility of proteins can indeed

be affected, even dramatically, by minor mutations in the

protein core. That this goal is achievable, at least in principle, is a foregone conclusion. The

question of how to reach this goal is, on the other hand, currently unaddressed and unanswered by

contemporary science.

Much of the work in this thesis was inspired by the desire to create a mutant of hydrogenase

which would, put in not-so-humble terms, solve the world’s energy problems. When work on this

research began, the realization dawned that very little was known about how to proceed with

such an endeavor. Not only were there no existing guidelines on how to reliably alter protein

function through targeted mutations, but even the basic mechanisms by which, and the

routes through which, O2 molecules could reach the active site inside any protein were not well

known. After all, gas molecules are small and do not need large dedicated channels inside proteins

to reach their target. As a result, gas channels cannot be “seen” by merely looking at a protein’s

structure, further compounding the problem. As such, the principal goal of this thesis is to build

a foundation of knowledge regarding gas migration inside proteins. The questions that we will

focus on are: By what mechanism do gas molecules migrate inside proteins to reach specific targets

inside them? Can the location of such pathways be predicted and their features characterized, and

how? Finally, how would one choose protein mutations that would alter the behavior of known

gas migration pathways to reach pre-determined goals (such as sealing off the pathways or creating

new pathways)? The following chapters aim to shed light on these questions.

Overview

While the chapters of this thesis are not all devoted to the hydrogenase enzyme (which is still the

main inspiration for this work), they are all concerned with one overarching endeavor: to understand

how gas migration pathways form, evolve, and are used in proteins. Such knowledge is very

important if one is to follow a targeted approach to hydrogenase engineering rather than relying

solely on trial and error.

Chapter 2 solves the long-standing problem of how gas ligands migrate inside proteins, and

introduces a method for locating the pathways taken by gases migrating inside proteins. A [FeFe]-

hydrogenase enzyme from the bacterium Clostridium pasteurianum is used as an example. It is

found that while hydrogenase does not possess any static channels for the passage of O2 and H2, it


instead allows for gas ligands to migrate by means of transiently-forming cavities that appear inside

the protein due to thermal fluctuations. The locations of these temporary cavities become connected

over time and allow gas ligands to travel inside the protein along pre-determined pathways. Given

the ephemeral nature of these “pathways”, and given the fact that they are often not seen in static

crystal structures, we will avoid using the terms “channel” and “tunnel” often used in the literature

to designate them, since the imagery associated with such concepts is no longer accurate in light

of our discovery. It is also found that O2 uses only a subset of the pathways used by H2, such that

it would be possible, in principle, to block access to O2 inside hydrogenase while still allowing H2

to escape, as proposed earlier.

The hydrogenase gas migration study laid the groundwork for understanding gas migration

inside proteins, and revealed the importance of fluctuating cavities and protein dynamics for the

migration of gases inside proteins. One deficiency of the methodologies introduced in Chapter 2 was

that, while all the O2 pathways can be located, information about the energy barriers along these

pathways was not made available. Chapter 3 addresses this issue by introducing the implicit ligand

sampling method, which is closely related to the free energy perturbation method. Implicit ligand

sampling can compute the free energy of placing a gas ligand everywhere inside a protein with

relatively little computation (over 10^6 times faster than using traditional methods, effectively

making this computation even possible). The end result is a complete map of all the energy

barriers for gas permeation inside the protein. The muscle O2 storage protein myoglobin, rather

than hydrogenase, was used as a test case because its kinetic properties have been extensively

characterized experimentally.
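As a rough illustration of the free-energy-perturbation idea to which implicit ligand sampling is related, the sketch below turns a set of ligand insertion energies into a free energy value at each grid point using the standard exponential average, G = −kT ln⟨exp(−ΔE/kT)⟩. The function name, the array layout, and the assumption that the interaction energies have already been computed for each sampled protein conformation are all hypothetical choices made for illustration; the method itself is defined in Chapter 3.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pmf_from_insertion_energies(delta_e, temperature=300.0):
    """Free energy map G = -kT ln <exp(-dE/kT)>, averaged over axis 0.

    delta_e: array of shape (n_samples, ...) holding hypothetical
    ligand-protein interaction energies (kcal/mol) for a ligand placed at
    each grid point in each sampled protein conformation.
    """
    kt = KB * temperature
    x = -np.asarray(delta_e, dtype=float) / kt
    # Shift by the per-voxel maximum so the exponentials cannot overflow.
    shift = x.max(axis=0)
    log_mean = shift + np.log(np.exp(x - shift).mean(axis=0))
    return -kt * log_mean
```

Because the average is taken over conformations of the ligand-free protein, a single favorable conformation can dominate the result at a given grid point, which is why rarely open cavities can still appear as accessible in such a map.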

By this point, much has been discovered about how to identify and characterize gas pathways

inside proteins. However, the ultimate goal of this research is the engineering of hydrogenase, and

a mutational strategy that will solve this problem is still missing. Rather than take a haphazard

approach and test random mutations along the discovered hydrogenase O2 pathways, Chapter 4

investigates how such pathways are formed in naturally-occurring proteins, and how they relate to

the protein’s structure and its amino acid sequence. In order to do this, implicit ligand sampling

maps are computed for 16 proteins from the globin and copper-containing amine oxidase superfamilies,

such that the pathways can be compared across a range of proteins with similar structures.


To our surprise, and against conventional wisdom, the discovered pathways did not correlate with

the proteins’ architectures at all. However, a very definite and reproducible correlation between

specific residue types and O2 channels is found, which can be used as a guideline to alter or engineer

oxygen pathways in proteins such as hydrogenase.

While this thesis focused on oxygen pathways mostly in hydrogenase and globin proteins, the

implicit ligand sampling method has a much broader applicability. Chapter 5 concludes the thesis

by providing a brief overview of various applications of this work over past years.

Finally, Appendix A details the conduction mechanism of Cl− across ClC, a membrane-spanning

Cl− transporter. While, in principle, there should be strong similarities between hydrogenase

and ClC (notably, in both cases a small ligand must make its way across the protein), in practice

these two problems must be approached very differently. Unlike weakly-interacting O2 migrating

inside proteins, Cl− ions interact with the protein very strongly and distort the protein while per-

meating across it. Furthermore, being simultaneously selective to Cl− while making sure not to

bind it too tightly such that it does not stay long in the protein presents a challenge that ClC

overcomes in ingenious ways. The methodologies used in the Appendix to answer these questions

pose an interesting contrast with those used to study gas migration.


Chapter 2

Mechanism of gas migration inside [FeFe]-hydrogenase

Here, we report on a computational investigation of the passive transport of H2 and O2 between the

external solution and the hydrogen-producing active site of the CpI [FeFe]-hydrogenase structure

from Clostridium pasteurianum. Two distinct methodologies for studying gas access are discussed

and applied to the case of CpI: (1) temperature-controlled locally enhanced sampling, and (2)

volumetric solvent accessibility maps, providing consistent results. Both methodologies confirm

the existence and function of a previously hypothesized pathway and reveal a second major gas

pathway which had not been detected by previous analyses of CpI’s static crystal structure. We

describe two completely different modes of intra-protein transport for H2 and O2, which in our

model are differentiated only by their size. We present strong evidence that supports the hypothesis

that small hydrophobic molecules diffusing inside CpI, such as H2 and O2, take advantage of pre-

existing packing defects in the protein. We show how and why hydrophobic particles of a certain

size travel in the protein along predetermined pathways, which are not visible in the protein’s static

structure. We also show that one can efficiently predict the gas-accessible areas of any protein,

based on the assumption that gas passage requires the formation of spontaneous cavities. (This

chapter is based on work published in Cohen et al. [32, 33].)

2.1 Introduction

Hydrogenases are enzymes that reversibly catalyze the oxidation or production of molecular

hydrogen according to the reaction

$$ 2e^- + 2\mathrm{H}^+ \rightleftharpoons \mathrm{H}_2. \tag{2.1} $$

Various hydrogenases can be found in a wide variety of unicellular organisms and usually come in

one of two flavors: [NiFe]-hydrogenases, which are usually associated with H2 uptake, or “iron-only”


[FeFe]-hydrogenases [4], which are generally involved in H2 production. In a majority of microor-

ganisms, [FeFe]-hydrogenases [113] function in anaerobic metabolism to oxidize overly reduced

electron carriers. Much of the recent scientific interest in [FeFe]-hydrogenases, however, concerns

a different role entirely: the H2 production properties of [FeFe]-hydrogenases offer the promise

of a means for affordable large-scale production of H2 as a source of renewable energy [65, 103].

In this chapter, we investigate the structural properties of [FeFe]-hydrogenase CpI from Clostrid-

ium pasteurianum. Hydrogen production in CpI happens at the H-cluster, a metallic cluster bound

to and embedded inside the CpI protein matrix, and is achieved by the reduction of H+ ions from

the external solution using electrons acquired from a reduced carrier such as ferredoxin [4, 125].

The H+ ions (or “protons”) probably reach the H-cluster by means of a putative, but as yet unverified,

proton pathway contained in the protein [126]. The electrons are transferred to the embedded

H-cluster through a series of accessory iron-sulfur clusters aligned in a chain between the H-cluster

and one end of CpI. The CpI enzyme is displayed along with its embedded iron-sulfur clusters and

H-cluster in Fig. 2.1.

Figure 2.1: Structure of CpI hydrogenase, showing the enzyme with its embedded H-cluster and iron-sulfur clusters. Also shown are the residues lining the two principal gas pathways connecting the external solution with the central H-cluster.

In addition to electron and H+ transport, CpI must also allow gas transport to and from its


active site. On one hand, it must allow the H2 product to exit the protein, and on the other hand,

it must also allow small gas molecules such as O2 and CO to penetrate through the enzyme and

reach the H-cluster, which can be irreversibly deactivated upon their binding. While O2-mediated

deactivation of hydrogenases is in some cases beneficial for the host organism, it severely limits

the practicality of using hydrogenase to produce H2 as a carrier of consumable energy [55]. A few

studies have investigated gas access in hydrogenases using either molecular dynamics sampling of

the possible paths for H2 permeation [33, 109], hydrophobic cavity searches on static structures [109,

113, 114] or crystallography of xenon-saturated structures [109]. Although some progress has been

made, aside from the certain but incomplete predictions from X-ray crystallography on proteins

in the presence of xenon, no other method has been able to comprehensively predict all of the gas

pathways in hydrogenases or in any other protein.

While investigating the possible pathways taken by H2 and O2 across CpI hydrogenase, we

noticed a significant difference between the dynamics of H2 and of O2 gas permeation through the

protein matrix, which is caused only by differences in ligand size. This behavior prompted us to

examine the effect of the protein’s internal dynamics in regulating gas access to its buried active

site. For this purpose, we introduce a new approach for mapping transient cavities based on the

dynamics of the lone protein (i.e. without the gas ligand) extracted from computer simulation and

compare the resulting maps to trajectories of the intra-protein gas migration process. An almost

perfect match between the two computations prompts us to suggest that the protein-wide pathways

taken by hydrophobic gases in CpI hydrogenase, and possibly in any gas transport protein, are fully

determined by density fluctuations within the protein. Furthermore, for the case of hydrogenase,

two major gas transport pathways are fully characterized by the protein’s motion at the nanosecond

time-scale, despite the fact that the actual diffusion of gases such as dioxygen may take much longer.

In addition to the physical insight gained, the new cavity mapping method holds the promise of

being able to predict protein-wide gas transport pathways inside macromolecules. Our results

provide insights into the gas transport mechanisms for H2 and O2 in CpI hydrogenase, as well as

a new way of looking at gas transport pathways, which is immediately applicable to other proteins

and structures.


2.2 Methods

2.2.1 Volumetric Solvent Accessibility Maps

We introduce volumetric solvent accessibility maps (VSAMs) as a means of representing the solvent

accessible surface of a protein. Given a set of hard atomic spheres representing a macromolecule,

existing methods typically calculate a closed polyhedron that represents the boundary between

space that is penetrable and impenetrable to a model solvent molecule represented by a hard sphere

of specified radius (e.g., Richards’ Smooth Molecular Surface [131] and Connolly’s method [36]).

While traditional techniques can produce almost exact results, there are significant advantages to

be gained in representing solvent maps as a volumetric data set (i.e., a 3D grid of scalar values).

VSAMs can contain information about not just one size of solvent molecule, but about all sizes

simultaneously, enabling the interactive visualization of how the intermolecular cavities change as

a function of probe radius. Most importantly, however, many VSAMs can be combined together

using average or maximum rules. When performed over a set of trajectory frames, rather than for

a static structure, they can reveal information that cannot be obtained from conventional methods,

such as that presented in this chapter. However, this comes at the cost of limited resolution and precision,

and of increased computer memory requirements.

In practice, the solvent accessibility of a macromolecule is stored in a 3D volumetric grid which

overlaps all or part of the macromolecule in coordinate space. The value of each voxel (i.e., grid

point) of the grid is set to be the radius of the largest possible solvent sphere that (i) contains

the spatial coordinates of the center of that voxel, and (ii) does not overlap with any of the van

der Waals spheres that represent the molecule’s atoms. Voxels whose coordinates lie inside the

molecule’s van der Waals spheres are set to zero, since no solvent can be present there. Voxels in

small interstices will have very small radius values, whereas voxels in large cavities or outside the

protein can have very large radius values. To retrieve a close approximation of the conventional

solvent accessible surface, one needs only to display the isocontours of the VSAMs for a given value

corresponding to the desired probe radius (e.g., a 1.4 Å radius for a water solvent). All

the voxels with values smaller than the probe radius will lie outside the contour, and those with

larger values will lie inside, because if a voxel can be contained in a large solvent sphere, it will

automatically be accessible to smaller solvent spheres.
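The voxel rule described above can be sketched in a few lines of NumPy. This is a minimal brute-force illustration under stated assumptions, not the implementation used in this work: the function name `vsam_values`, the restriction of candidate sphere centers to grid points, and the one-atom toy system are all hypothetical choices made for the example.

```python
import numpy as np

def vsam_values(atom_xyz, atom_radii, grid_xyz):
    """Brute-force VSAM sketch (hypothetical helper, O(G^2) memory).

    For every grid point, return the radius of the largest probe sphere that
    contains the point and overlaps no atomic van der Waals sphere.
    Assumption: candidate sphere centers are restricted to the grid points.
    """
    # Clearance of each grid point: distance to the nearest atom surface,
    # clipped at zero for points that lie inside an atom.
    d_atoms = np.linalg.norm(grid_xyz[:, None, :] - atom_xyz[None, :, :], axis=2)
    clearance = np.clip((d_atoms - atom_radii[None, :]).min(axis=1), 0.0, None)

    # A sphere centered on grid point p with radius clearance[p] covers voxel v
    # when |v - p| <= clearance[p]; keep the largest such radius for each v.
    d_grid = np.linalg.norm(grid_xyz[:, None, :] - grid_xyz[None, :, :], axis=2)
    covers = d_grid <= clearance[:, None]
    return np.where(covers, clearance[:, None], 0.0).max(axis=0)

# Toy system: a single atom of radius 1 at the origin, grid points along x.
atoms = np.array([[0.0, 0.0, 0.0]])
radii = np.array([1.0])
xs = np.arange(0.0, 4.5, 0.5)
grid = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=1)

values = vsam_values(atoms, radii, grid)
accessible = values >= 1.4  # isocontour test for a 1.4 Å (water-sized) probe
```

Thresholding the stored radii at a different value gives the accessible region for a different probe size without recomputing the map, which is the property exploited in the text.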

We now report on average and maximum VSAMs, calculated from a set of frames from MD

simulation trajectories. One must imagine that a separate VSAM is calculated for each simulation

frame, that all the volumetric grids are aligned and of the same size, and that for all frames, the

macromolecule has been repositioned such that its Cα atoms are aligned using a best fit. The

average solvent map is then simply a map in which the voxel values have been averaged over all

frames, and provides a description of the average size of the solvent which can reach a specific

position in space over the course of the trajectory. The maximum solvent map, then, contains at

each voxel the maximum value encountered during the course of the trajectory. The maximum map

describes the maximum solvent size that can ever be spontaneously placed inside the macromolecule

during any frame of the trajectory. If a pathway is only transiently open, or if its different sections

are open at different times, then the maximum solvent map will show what the pathway would

look like in its maximally opened state (which may never be encountered during the course of an

equilibrium simulation), and represents the areas in space that are ever accessible to a ghost solvent

sphere which does not disrupt the protein.
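The average and maximum rules amount to a voxel-wise reduction over the stack of per-frame maps. The sketch below assumes that the frames have already been aligned and sampled on identical grids; the function name `combine_vsams` and the synthetic data are illustrative only.

```python
import numpy as np

def combine_vsams(frame_maps):
    """Combine per-frame VSAM grids of shape (frames, nx, ny, nz).

    Returns the voxel-wise average map and the voxel-wise maximum map.
    Assumes every frame was aligned beforehand and uses the same grid.
    """
    stack = np.asarray(frame_maps, dtype=float)
    return stack.mean(axis=0), stack.max(axis=0)

# Synthetic example: one voxel that opens to 1.6 Å in only one frame out of four.
frames = np.zeros((4, 2, 2, 2))
frames[:, 0, 0, 0] = [0.2, 0.4, 1.6, 0.2]
avg_map, max_map = combine_vsams(frames)
```

In this toy case the transiently open voxel lies inside the 1.6 Å isocontour of the maximum map but not of the average map, which is the distinction drawn above between the two kinds of maps.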

2.2.2 Temperature-controlled Locally Enhanced Sampling (TC-LES)

LES defines an algorithm for simulating multiple copies of a certain number of particles (in this

case, gas ligands), which interact in a mean-field way, with a single copy of the environment (here,

the protein). One of the features of LES, as described by [47], is that the effective temperature of

the replicated particles’ dynamics (with N replicas) is increased by a factor of N . This enhanced

temperature, while often used as a means to cross barriers, also has disadvantages. For one, the

resulting trajectories can be completely unphysical (e.g., assuming 10 replicated copies, the behavior

of a particle at 3000 K is very different from that of one at 300 K) and the copies’ traveling speed

and energy landscape are dramatically altered compared to reality. Because of this, the extra factor

of N in the temperature of the replicated particles effectively limits the number of copies possible. If

one could control the dynamics of the replicated particles such that they all act like 300 K particles,

then it would become practical to use many more than 10 simultaneous copies and to gain a greater

amount of information from a single simulation (we used 1,000 copies for the present case).

In LES, the kinetic energy K and potential energy V are defined as follows:

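$$K = \frac{1}{2}\sum_{a\in A} m_a \dot{\mathbf{q}}_a^2 + \frac{1}{2N}\sum_{i=1}^{N}\sum_{x\in X} m_x \dot{\mathbf{q}}_{x,i}^2 \tag{2.2}$$

$$V = V_{AA}(\mathbf{q}_A) + \frac{1}{N}\sum_{i=1}^{N}\left[V_{XX}(\mathbf{q}_{X,i}) + V_{AX}(\mathbf{q}_A, \mathbf{q}_{X,i})\right] \tag{2.3}$$

where $\mathbf{q}_n$, $\dot{\mathbf{q}}_n$ and $m_n$ are the position, velocity and mass, respectively, of particle n. A and X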

represent the sets of unreplicated and replicated particles, respectively, and N is the number of

LES copies.

As outlined in [152], equations (2.2) and (2.3) describe a non-Newtonian system (Newton’s

second law is not satisfied), with the consequence that, at equilibrium, they describe a system in

which each replicated particle will acquire a kinetic energy that is N times greater than that of an

unreplicated particle (for an LES particle, $\langle K_{\mathrm{LES}}\rangle = \frac{3}{2} N k_B T$). While their formal temperatures

(defined according to the zeroth law of thermodynamics) are the same, the “effective” temperature

(defined from the average kinetic energy per particle $\langle K\rangle = \frac{3}{2} k_B T$) of the LES particles is larger

by a factor of N . A commonly used method for dealing with the divergent kinetic energy of the

replicated particles has been to increase the mass of these particles by a factor of N [134, 152]. This

slows down the copied particles so that their motion can be calculated efficiently, but does nothing

to address the divergent kinetic energy itself or to avoid the resulting reduction in energy barriers

for the copied particles. The increased ligand energy results in significantly skewed simulations,

and one is unable to reproduce results from a limited set of single copy simulations, such as a

preference for certain cavities and pathways, by using straight LES with as few as 10 ligand copies

(results not shown). To overcome these problems, improvements to the LES algorithm have been

suggested and tested by [159]. Comparison of the original LES and its variant with single-copy

dynamics highlights the shortcomings of straight LES, as well as the success of the LES variant, in

reproducing the correct dynamics [160].
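The factor of N in the kinetic energy of a replicated particle can be made explicit with a short equipartition argument. The steps below are a sketch that assumes canonical sampling of the Hamiltonian built from equations (2.2) and (2.3); the momentum symbol $\mathbf{p}_{x,i}$ is introduced here for illustration and does not appear in the original formulation.

```latex
% Canonical momentum of copy i of replicated particle x, from Eq. (2.2):
\mathbf{p}_{x,i} = \frac{\partial K}{\partial \dot{\mathbf{q}}_{x,i}}
                 = \frac{m_x}{N}\,\dot{\mathbf{q}}_{x,i}

% Its contribution to the kinetic energy, written in terms of the momentum:
\frac{1}{2N}\, m_x \dot{\mathbf{q}}_{x,i}^2 = \frac{N\,\mathbf{p}_{x,i}^2}{2 m_x}

% Equipartition for a term quadratic in the three momentum components:
\left\langle \frac{1}{2N}\, m_x \dot{\mathbf{q}}_{x,i}^2 \right\rangle = \frac{3}{2} k_B T

% Hence the ordinary kinetic energy of a single copy is N times larger:
\langle K_{\mathrm{LES}} \rangle
  = \left\langle \frac{1}{2}\, m_x \dot{\mathbf{q}}_{x,i}^2 \right\rangle
  = \frac{3}{2} N k_B T
```

Rescaling the mass by N leaves the last line unchanged, since equipartition fixes the mean kinetic energy independently of the mass; only the velocities are reduced.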

In our own attempt to improve the LES method, we sample a constant temperature ensemble

by coupling the actual protein and the replicated gas ligands to different Langevin heat baths. The

idea of controlling the replicated particles’ temperature was originally suggested in [152] and also

applied in [40]. For the regular particles, we use a temperature target of TA = 310 K, while for

the N replicated gas particles, the target must be set to TA/N or lower, such that the resulting

“effective” temperature of the replicated particles, as measured in the simulation, is close to 310 K.

As a result, the average kinetic energy is the same for all particles and the sampling of phase

space corresponds to a 310 K constant-temperature ensemble. Maintaining parts of the system

at two different formal temperatures keeps the system out of equilibrium, and as such the LES

copies will have a tendency to heat up, whereas the protein matrix will want to cool down. In

practice, we found that in order to keep the LES copies at a stable temperature, a larger Langevin

friction coefficient was needed for the replicated gas particles (since they are surrounded by a much

larger bath of protein). We used a Langevin damping term of 5 ps⁻¹ for the unreplicated particles,

and 10 ps⁻¹ for the replicated particles. The resulting temperature distribution of the

TC-LES-replicated particles is broader than that of the unreplicated system (we measure a normalized

standard deviation for the temperature fluctuations of 10.0 K for the replicated particles and of

1.2 K for the equivalent non-LES simulation). Using TC-LES, the only approximation made is that

instead of feeling one gas molecule, the environment feels a delocalized cloud of gas. Finally, in

order to drastically speed up the simulation, we ignored the contribution of the gas molecule to the

system’s charge distribution when computing PME electrostatics. This gives exact results in the

present case, since our models for O2 and H2 contain no partial charges. The increased sampling is

thus achieved predominantly by increasing the number of copies (reducing entropic barriers), and

not by altering the energy landscape as is done in the original LES method.
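The relation between the formal bath temperature and the effective temperature of the copies can be illustrated with a small calculation. The helper names below are hypothetical and the snippet only restates the T_A/N rule given in the text; it is not a thermostat implementation.

```python
def les_bath_target(env_temperature, n_copies):
    """Upper bound on the formal bath temperature of the replicated particles (T_A / N)."""
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return env_temperature / n_copies

def effective_temperature(formal_temperature, n_copies):
    """Effective temperature, defined from the mean kinetic energy per copy."""
    return formal_temperature * n_copies

# Values quoted in the text: T_A = 310 K with 1,000 copies of the gas molecule.
target = les_bath_target(310.0, 1000)            # formal target in kelvin
# Straight LES without temperature control: 10 copies coupled at 300 K.
uncontrolled = effective_temperature(300.0, 10)  # effective temperature in kelvin
```

With 1,000 copies the formal target is a fraction of a kelvin, which is consistent with the need, noted above, for a stronger Langevin coupling on the replicated particles to hold them at that target.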

2.2.3 Simulation Parameters

O2 and H2 gas access was investigated by all-atom molecular dynamics simulations of the diffusion

of O2 and H2 molecules inside CpI, originating at the active site. We used a model for the gas

molecules that did not include any partial charges, and for which O2 and H2 differed only in their

Lennard-Jones parameters and bond spring constant and length. As stated before, we used a heavy

version of dihydrogen (hH2) instead of H2, so that we could compare the diffusion properties of O2

and H2 based on their size differences alone.

Our model of hydrogenase was based on the X-ray structure of CpI [FeFe]-hydrogenase [126]

(PDB accession code 1feh). A series of atoms in the active site, or H-cluster, were missing from the

Protein Database structure, and have been modeled here as a di(thiomethyl)amine bridge between

the two H-cluster sulfur atoms, as suggested by later studies [51, 113]. The partial charges for

the rest of the H-cluster were based on a density functional theory calculation on the 2- oxidation

state [157], and individual charges were tweaked by ±0.02 e to guarantee the system’s charge

neutrality. The structure was embedded in a water box, resulting in a 57,000-atom system consisting

of 9,000 hydrogenase atoms, 16,000 water molecules and 15 sodium ions to cancel the excess integer

charge.

The system was then equilibrated at a constant temperature (310 K) and pressure (1 atm) for a

duration of 1 ns. The last frame of this equilibration was used as a starting point for all subsequent

simulations. Aside from the initial equilibration, all simulations were performed at constant volume

and temperature (310 K). In all cases, periodic boundary conditions were used. Temperature was

regulated within the TC-LES approach by using Langevin dynamics with damping constants of

5 ps⁻¹ for unreplicated atoms and 10 ps⁻¹ for replicated gas atoms, respectively. Multiple time

stepping was used, with integration time steps of 1 fs, 2 fs and 4 fs, respectively, for bonded, non-

bonded and long-range electrostatic interactions (a non-bonded time step of 1 fs was used for the

case of hH2 TC-LES simulations because of energy stability issues when replicated hH2 molecules

enter the water solution environment). Particle Mesh Ewald with a grid resolution of better than 1 Å

was used for long-range electrostatics, and all other non-bonded interactions were calculated using

a cutoff of 12 Å. The CHARMM22 force-field [99, 100] was employed for all protein interactions,

and simulations were performed using the NAMD [84] molecular dynamics software, modified by

the present authors to allow for TC-LES.
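As a quick consistency check on the figures quoted in this section, the system size and the multiple-time-stepping ratios can be tallied as follows. The three-atoms-per-water count is an assumption (a three-site water model is not stated explicitly above), and all inputs are the rounded values from the text.

```python
# Composition quoted in the text (rounded values).
protein_atoms = 9_000
water_molecules = 16_000
sodium_ions = 15
atoms_per_water = 3  # assumption: three-site water model

total_atoms = protein_atoms + water_molecules * atoms_per_water + sodium_ions

# Multiple time stepping: outer steps expressed in units of the 1 fs inner step.
bonded_fs, nonbonded_fs, long_range_fs = 1, 2, 4
nonbonded_every = nonbonded_fs // bonded_fs
long_range_every = long_range_fs // bonded_fs
```

Under that assumption the tally agrees with the quoted 57,000-atom system to within rounding.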

2.3 Results

We now describe in detail MD trajectories of H2 and O2 gas migration inside CpI. We then compare

our results with a dynamic mapping of the protein cavities in the absence of gas, and find a strong

correlation between locations of the protein’s natural cavities and of the diffusing gas molecules.

2.3.1 Simulations of Gas Diffusion Reveal Different Transport Mechanisms for H2 and O2

While the diffusion of O2 or H2 molecules inside a protein is not a particularly slow process (a

transport event can take from 100 ps to hundreds of ns), it is a stochastic process, and cannot be

completely described by simply examining a few nanosecond-long trajectories. In order to

drastically speed up the exploration of O2 and H2 entry/exit pathways, it becomes necessary to use

certain approximations. One such approximation, known as locally enhanced sampling (LES) [47],

and based on the time-dependent Hartree approximation [62], allows a certain subset of particles in

the simulated system to be replicated many times, where each replicated subset is simulated

independently. In such a scheme, each set of replicated particles interacts with a common environment

consisting of the unreplicated particles, but the replicated copies do not interact with each other

at all. In the present chapter, we make use of a variant of LES: temperature-controlled locally

enhanced sampling (TC-LES), which is described in the Experimental Procedures section.

We have run separate TC-LES simulations for H2 and O2, using 1,000 copies of the diffusing

gas molecule, in order to determine the pathways taken by H2 and O2 while transiting between the

active site of CpI and the external solution. In our simulations, we have used a heavy version of H2,

“heavy dihydrogen” (hH2), in which the hydrogen atoms’ masses were set to that of oxygen. The

reason for this is that we were mainly interested in investigating the accessibility of the protein

to gas molecules as a function of the gas molecule’s size. Since we use the same mass for both

O2 and hH2, we remove the variation in behavior between the two gases which is caused by their

difference in mass (and consequently velocity) and instead focus purely on the variation in behavior

which is due to their size difference. Changing the mass of the diffusing gas does not affect the

kinetic energy, momentum, or energy profile perceived by the gas or experienced by the protein.

The system will explore the same gas-protein conformational space as it would otherwise, except

that the actual velocities of the hH2 molecules will be slowed down with respect to H2. However,

the rate of diffusion of hH2 is not expected to be significantly different from that of real H2, due

to friction effects. The larger mass also circumvents the need for much smaller simulation time

steps, which are necessary when dealing with the very sharp Lennard-Jones potentials of replicated

“light” H2.

Simulation   Pathway A   Pathway B   Other pathway   Total exited
#1           21%         22%         4%              47%
#2           72%         4%          3%              79%
#3           10%         3%          6%              19%
#4           4%          25%         7%              36%

Table 2.1: Table showing the percentage (rounded) of the total replicated hH2 molecules which have exited CpI during each of four different 4 ns simulations of the diffusion of hH2 starting at the H-cluster binding site, sorted by pathways. Pathways “A”, “B” and “other” refer to the previously suggested (A), the newly-discovered exit pathway (B) discussed above, or neither (other). “Total” refers to the total percent of molecules that found their way out of the protein during the 4 ns. A molecule is considered to have exited when one of its atoms comes into contact with a water molecule located outside of the protein.

For the case of H2, we performed four simulations of 4 ns each, in which the hH2 molecules were

initially placed at the active site (at which hydrogen production takes place), based on the location

of the active-site-bound carbon monoxide in the X-ray structure of CO-saturated CpI by Lemon

and Peters [94]. In all cases, hH2 exited predominantly through two major migration pathways,

the first one (pathway A) having been previously proposed as a H2-channel candidate [113, 125],

and the second one (pathway B) being newly discovered [33]. Both pathways meet at a large cavity

right next to the H-cluster binding site. Aside from the main cavity and the two pathways, the

hH2 molecules from all the simulations taken together consistently explored similar regions of the

protein, as displayed in Fig. 2.2. Despite the fact that hH2 exited simultaneously through both

pathways in each simulation and that the shape of these pathways explored by hH2 was the same

for all simulations, the proportion of hH2 exiting through one pathway or the other and the exit

rates of hH2 out of the protein varied significantly from one simulation to the next, as detailed in

Table 2.1. In all cases, the average hH2 time of first exit out of the protein was very fast and on

the order of nanoseconds (with our simulation suggesting that the first hH2 molecules will have

found the exit within 200 ps and that roughly half will have exited after 4 ns). We expect that real

H2 would exhibit exit times very similar to those of hH2.
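The run-to-run variability reported in Table 2.1 can be summarized with a few lines of arithmetic. The numbers below are transcribed from the table; the variable names and the particular summary statistics (mean, range, share of pathway A) are choices made here for illustration.

```python
# Percentages transcribed from Table 2.1: (pathway A, pathway B, other, total exited).
table_2_1 = {
    "#1": (21, 22, 4, 47),
    "#2": (72, 4, 3, 79),
    "#3": (10, 3, 6, 19),
    "#4": (4, 25, 7, 36),
}

# Each total should equal the sum of the three pathway columns.
consistent = all(a + b + other == total for a, b, other, total in table_2_1.values())

# Mean and range of the fraction of copies that exited within 4 ns.
totals = [row[3] for row in table_2_1.values()]
mean_total = sum(totals) / len(totals)
spread = max(totals) - min(totals)

# Share of all exits that went through pathway A in each run.
share_a = {run: row[0] / row[3] for run, row in table_2_1.items()}
```

The mean of the four totals is a little under one half, in line with the statement that roughly half of the copies have exited after 4 ns, while the pathway A share ranges from about one tenth to about nine tenths depending on the run.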

For the case of O2 migration from the H-cluster, five independent 3.5–4 ns simulations were

performed, some of which are represented in Fig. 2.3. In these simulations, 1,000 TC-LES copies

of O2 were placed at the H-cluster binding site and allowed to diffuse. In only one of these five

simulations did we observe O2 to leave the central cavity through the newly discovered pathway B

Figure 2.2: Four 4 ns TC-LES simulations of 1,000 copies of hH2 diffusing out from the H-cluster. Each independent simulation is shown in a different color, highlighting the fact that the space explored by hH2 was consistent between simulations. Frames taken from every 50 ps of the trajectories of the hH2 molecules located inside the protein are superimposed as a cloud. Shown in licorice are the iron-sulfur clusters and H-cluster, as well as the residues that line the two major exit pathways.

(see Fig. 2.3 (blue)). In the four other cases, O2 remained in the central cavity near the binding site

(Fig. 2.3 (red)) for the duration of the simulation. Since the other pathway (pathway A) through

which we observed hH2 to diffuse appears to be a narrow but unambiguously hydrophobic channel

in the crystal structure, it is suspected that O2’s failure to exit through it in our simulations is

simply due to lack of sufficient sampling of the protein’s degrees of freedom. With this in mind,

additional TC-LES simulations of O2 were performed, using as starting positions various locations

where large densities of hH2 were observed in the previous hH2 diffusion simulations. When we

placed O2 inside the originally-proposed hydrophobic channel (pathway A), one cavity away from

the central cavity, we were able to successfully observe O2 migration along that channel. A fraction

of the O2 molecules placed in this cavity even diffused inward to the central cavity and back in

one of the three 3.5 ns simulations performed (one is shown in Fig. 2.3 (green)). In no case was

O2 observed to completely exit the protein and partition into the water solution. We suspect that

the hydrophobicity of O2 causes it to prefer the protein environment to that of the water solution;

however, the influence of our O2 model parameters and of the TC-LES dynamics might also be

playing a role. The rate of diffusion of O2 inside hydrogenase appears to be entirely determined by

the protein’s dynamical fluctuations. O2 was observed to be able to diffuse to the surface of the

protein very quickly (in as little as 1.2 ns) when the protein explored a set of ideal conformations.

However, in most simulations we did not observe any significant travel of O2 (in four instances

out of five), reflecting the fact that typical protein conformations are usually unfavorable to O2

migration.

For O2, we simulated the reverse of the natural process of O2 migration from the bulk solvent

toward the active site. This was done because, at the beginning of our investigation, it was not

known where O2 could enter the protein, and the active site was the only location of CpI which was a

priori known with certainty to be accessible to O2, based on previous studies of O2 inactivation [4].

However, since the transport mechanism of O2 inside CpI is almost undoubtedly passive, it does not

matter in which direction we simulate the migration, unless there is a strong overall energy gradient

between the outside and inside. In our simulations, we have observed back-and-forth motion of

O2 and H2 suggesting that the energy profile is approximately flat between solvent and active site

(excluding the intermediary energy barriers). The degree of flatness of the free energy profile of

the gas diffusing through the protein, as well as the accessibility of the identified pathway exits to

an O2 molecule entering CpI from the external solution, of course, still needs to be confirmed by

more detailed studies.

From our TC-LES results, we see that both O2 and hH2 can diffuse across the protein and exit

through two common pathways. However, we noticed that hH2 can penetrate a broader region of the

protein, and on shorter time scales, whereas O2 in our simulations was strictly limited to the above-

mentioned pathways. Aside from exploring very similar regions of CpI, the TC-LES trajectories for

O2 were qualitatively very different from those of H2 in terms of the collective dynamics. While the

different copies of hH2 spread out with time into a diffuse cloud covering the whole protein-water

Figure 2.3: Three representative TC-LES simulations of 1,000 copies of O2 diffusing out from the H-cluster or from the middle of a previously identified H2-channel. Each independent simulation is shown in a different color and one can see that, contrary to H2 diffusion, the O2 molecules move collectively through the same pathway for a given simulation, though they may employ different pathways for different independent simulations. Overall, the set of pathways explored by O2 matches the dominant pathways explored by hH2. Snapshots were taken every 50 ps. The representation of the protein is the same as in Fig. 2.2. Dotted arrows represent possible exits based on the proximity of the external solution.

system, the O2 molecules typically clustered together as a single cloud (which occasionally could

also split into more clouds on a ∼3–4 ns timescale). The fact that the O2 molecules cluster cannot

simply be attributed to the fact that they all experience similar conditions: they all have different

initial velocities, Langevin random forces and interactions with the protein, and, in addition, the

dramatic clustering behavior is not observed at all for the smaller hH2 molecules. Instead, the

behavior of the collective O2 motion suggests that O2 does not diffuse in and out of CpI through

a permanent channel, contrary to what was previously assumed. As will be shown later, O2 fills

up small cavities inside the protein which are themselves dynamic. Through the protein’s natural

motion, combined with the disruptive effect of the O2, these cavities dynamically fluctuate in size

and in their connections with neighboring cavities at certain favorable locations. The transport

of O2 seems to be guided much more by the protein’s random peristaltic motion than by simple

diffusion through a static complex medium [111]. For H2 on the other hand, the protein appears

much more porous and, at any given time, there are many more cavities and partial channels that

are accessible to H2 than to O2. Because, in our simulations, O2 and hH2 have the

same mass, the differences in behavior between the two gases are not due to differences in diffusion

speeds but are caused solely by their different Lennard-Jones parameters.

The major problem encountered with TC-LES was insufficient sampling, despite the 1,000-

fold increase in sampling as compared to regular MD, especially since the effect of the single

protein trajectory on ligand diffusion appears to be of significant importance. Taken together,

the TC-LES simulations do in fact confirm the existence of a new gas transport pathway and the

calculated trajectories are both realistic and consistent. But for the case of O2, we had difficulty

reproducing the same pathways from one simulation run to another. While we observed several

very likely permeation events from the active site to the external solution, we could not determine

with certainty whether there exist other pathways through which O2 could exit on the simulated

timescale, using TC-LES alone. To obtain a picture of the pathway topology for H2 and O2 inside

CpI, we clearly need a better method. We believe that the maximum volumetric solvent accessibility

map (VSAM) method described in the Experimental Procedures section provides an excellent tool

to acquire the needed information.

2.3.2 Predicting Hydrophobic Gas Accessibility from the Equilibrium Dynamics of the Protein Alone

In the previous section, we saw that O2 molecules permeating inside CpI moved as if they were

trapped in small dynamic pockets of empty space. Almost every copy of O2 followed exactly

the same trajectory in a given TC-LES simulation, yet these trajectories varied widely from one

simulation to the next. Since TC-LES has a single copy of the protein interacting with many

replicated gas molecules, our results suggest that transient conformations of the protein have a

huge impact on the pathways taken by gas molecules diffusing inside it. In this section, we specify

and confirm this hypothesis by monitoring the transient cavities that are intrinsically present

inside CpI in the absence of gas. We map out, by means of VSAMs, which regions of the entire

protein would be accessible to a “ghost” solvent of a given radius, assuming that the solvent does

not interact with the protein: it can only “occupy” free space if such space is ever spontaneously

available in the protein. Surprisingly, we find that the set of possible trajectories, taken by both O2

and H2 gas (and very likely by any other spherically shaped hydrophobic ligand), can be predicted

remarkably well. These results complement other related investigations of the influence of protein

conformations or mutations on ligand diffusion inside the myoglobin distal pocket [27, 67], [NiFe]-

hydrogenase [25, 155], catalase [6], and cytochrome-c oxidase [139]. Our results differ from most of

these previous studies by the methodologies that we have used (TC-LES and VSAMs), as well as

by the longer timescales and larger areas of the protein probed (significantly so in many cases),

and by the fact that we looked at the protein’s accessible volume in the absence of ligands.

We have calculated a static 3D map of the largest ghost solvent spheres that could be placed at

any given time inside CpI, based on a 2 ns equilibrium simulation, according to the VSAM protocol.

VSAMs calculated based on either the first or second ns of the computed equilibration trajectory

showed little variation and strong reproducibility, as opposed to the TC-LES trajectories. Fig. 2.4a

shows the isosurface contour representing the area accessible to a solvent of radius 1.35 Å (which

characterizes the van der Waals radius of an H2 molecule) along with the TC-LES trajectories of hH2

diffusion. Visually, the isosurface accurately describes the regions of space that had been explored

by hH2 during the four 4 ns TC-LES simulations of hH2 inside CpI, even though the VSAM was

calculated based on a trajectory that did not contain any gas molecules. Almost every predicted

cavity throughout the CpI structure was observed to have been visited by H2, and almost all areas

visited by H2 corresponded to regions where cavities had been predicted, including areas away from

where the H2 diffusion originated as well as, surprisingly, internal cavities disconnected from the

surface. The same excellent match was observed for the case of O2 and the 1.6 Å iso-value contour

of the same VSAM (Fig. 2.4b), though in our TC-LES simulations, the O2 molecules only had time

to explore the cavities directly adjacent to the H-cluster active site, so the comparison was only

performed there. A comparison of the area of the protein accessible to H2- and O2-sized particles

is shown in Fig. 2.4c, and one can see that both pathways A and B are clearly identified. There

were very few exceptions in which the TC-LES simulations did not match the cavity predictions:

(1) exactly at the binding site where the 1,000 copies were placed, no cavity was predicted there

(but gas was observed there during TC-LES because this was the starting position), (2) for the

case of O2 there was one single region of disagreement, which corresponds to a region occupied

by water during equilibration, but in which O2 managed to go during the TC-LES simulation (if

we consider the space occupied by water to be accessible to O2, then O2 was never observed in

any other unpredicted cavity), and (3) hH2 was occasionally observed in regions not predicted by

the 1.35 Å contour (but this happened for less than 1% of all hH2 positions explored). Fig. 2.5

shows the cumulative occupancy of O2 and H2 based on the value of the underlying maximum

VSAM grid points in the region around the active site. The figure displays definite thresholds for

the occupancy of H2 and O2 as a function of predicted maximum radius of solvent, below which

no TC-LES simulated gas has been found to go. This shows that the maximum VSAM correctly

predicts the accessibility of both H2 and O2 (in the sense that gas does not enter regions not

predicted; the converse appears to be true according to visual inspection for the case of H2, but

cannot be proven for the case of O2 due to lack of sampling). Only 0.9% of the TC-LES hH2

molecules were found in hollow regions with maximum predicted radii below 1.35 Å, and only 0.1%

of the O2 molecules were found in regions with a radius of less than 1.6 Å.
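The comparison between ligand positions and the maximum VSAM can be expressed as a lookup of the map value at each visited position, followed by a cumulative count. The sketch below is an illustration on synthetic data: the function name `fraction_below` is hypothetical, and it uses a nearest-voxel lookup as a simplification where an interpolated value could be used instead.

```python
import numpy as np

def fraction_below(max_vsam, origin, spacing, positions, radius):
    """Fraction of ligand positions located in voxels whose maximum-VSAM
    value is below `radius`.

    Simplification: nearest-voxel lookup instead of interpolation.
    """
    idx = np.rint((np.asarray(positions) - origin) / spacing).astype(int)
    idx = np.clip(idx, 0, np.array(max_vsam.shape) - 1)
    values = max_vsam[idx[:, 0], idx[:, 1], idx[:, 2]]
    return float(np.mean(values < radius))

# Synthetic 0.5 Å grid: an open slab (x index 0 and 1) next to a tight region.
grid = np.full((4, 4, 4), 0.5)
grid[:2] = 2.0
origin = np.zeros(3)
positions = np.array([
    [0.1, 0.0, 0.0],  # open slab
    [0.4, 0.5, 0.5],  # open slab
    [0.6, 1.0, 1.0],  # open slab
    [1.5, 1.0, 1.0],  # tight region
])
frac = fraction_below(grid, origin, 0.5, positions, radius=1.6)
```

Evaluating the same function over a range of `radius` values yields a cumulative occupancy curve of the kind described above.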

It is important to realize that the gas-transport pathways described above could not be identified

by simple analysis of static crystal structures. For the case of pathway A, a solvent-accessible

surface for a solvent the size of H2 (radius ∼1.35 Å) can indeed be detected this way, though it

becomes disconnected if one includes equilibrated hydrogen atoms or uses larger probe molecules

(such as O2 with a radius ∼1.6 Å). Pathway B, however, could not be detected for either H2 or O2

using this type of analysis. If one compares the iso-value contours of our maximum and average

VSAMs, for H2- and O2-sized solvent particles, one can understand the difference in dynamics

observed between O2 and hH2 diffusion during the TC-LES simulations. The VSAM containing

the average value of the solvent molecule does not exhibit any channels of sufficient size to allow

O2 to access the active site. Only very few cavities along the two main diffusion pathways are

large enough to accommodate an O2 molecule on average, but these do not form a continuous

channel from the external solution to the active site (see Fig. 2.6). Only for the case of H2 is one

Figure 2.4: Comparison of the surface delimiting the maximum VSAM predicted from the equilibrium simulation of CpI in the absence of gas for a particle the size of (a) H2 and (b) O2, along with the locations explored by the centers of the hH2 or O2 atoms, respectively, during the TC-LES simulations. (c) A slice through the computed gas-accessible surfaces for O2 (light gray internal volume) and H2 (dark gray internal volume), highlighting the differences between the two.

of the two major diffusion pathways at least partially visible in either the average or instantaneous

VSAMs (a continuous channel is in fact observed for a ∼1 Å-radius probe, which is smaller than

H2). This is the “hydrogen channel” that was originally proposed from an analysis of the X-ray

crystal structure of CpI [126]. As suggested by our TC-LES simulations, it does appear that O2

moves from cavity to cavity as the cavities fluctuate into existence inside the protein, and there is

no permanent “channel” to speak of. For the case of H2, the same also holds, but H2 is sufficiently

smaller in size, as compared to O2, such that more open space is accessible to it at any given time.

The instantaneous H2-sized cavities connect in more places as well as more often, allowing for

easier diffusion. A quick analysis of the probabilities with which different parts of the pathways are

large enough to accommodate gas molecules reveals that, over the course of the 2 ns equilibration,

most regions of pathways A and B were open 5–8% of the time for O2- and 30–35% of the time for

H2-sized particles, and each pathway had one constricted region which was only open for about 2%

of the time for O2 and 20% of the time for H2, thus limiting the rate of exit of the gas.
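An open probability of this kind can be estimated by counting the frames in which a region admits a probe of a given radius. The sketch below uses synthetic numbers, not data from this work; the function name `open_fraction` and the criterion that a region is open when at least one of its voxels reaches the probe radius are assumptions made for the example.

```python
import numpy as np

def open_fraction(region_values, probe_radius):
    """Fraction of frames in which a region admits a probe of the given radius.

    `region_values` has shape (frames, voxels). Assumed criterion: a region is
    open in a frame when at least one voxel has a VSAM value >= probe_radius.
    """
    per_frame_max = np.asarray(region_values, dtype=float).max(axis=1)
    return float(np.mean(per_frame_max >= probe_radius))

# Synthetic per-frame values (Å) for a two-voxel constriction over ten frames.
constriction = np.array([
    [1.0, 1.1], [1.4, 1.2], [1.7, 1.0], [0.9, 1.0], [1.36, 1.1],
    [1.0, 1.0], [1.2, 1.3], [1.1, 1.0], [1.0, 0.8], [1.2, 1.1],
])
p_h2 = open_fraction(constriction, 1.35)  # H2-sized probe
p_o2 = open_fraction(constriction, 1.6)   # O2-sized probe
```

Because the larger probe is admitted in a subset of the frames that admit the smaller one, the open fraction can only decrease as the probe radius grows.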

Figure 2.5: Comparison of TC-LES- and VSAM-predicted gas pathways. Histogram of the values of the VSAM maximum predicted probe radius for each of the positions explored by both O2 and hH2 in the TC-LES simulations. The abscissae represent the interpolated value of a 0.5 Å resolution 3-D maximum VSAM grid at the location of the center of each O2 or H2 atom from the TC-LES simulations, and the ordinates indicate the number of times (normalized and cumulative) that these values have been explored by O2 or hH2.

Finally, we wish to comment on the approximations made in the maximum VSAM method.

We have shown that we can predict with excellent accuracy where both H2 and O2 gas can go in a

hydrogenase protein, based solely on an analysis of the space accessible inside the protein during an

equilibrium simulation in the absence of gas. This statement appears to imply that gas and protein

do not interact strongly. However, other studies of gas transport in protein cavities have suggested

that the presence of a gas can strongly influence the internal conformations of the protein near the

gas [18]. To test this suggestion, we have performed our VSAM analysis on the trajectories which

did contain O2 and hH2 and we clearly see that the proteins that contain gas exhibit significantly

larger cavities (not shown) compared to the gas-less trajectories. What the present chapter intends

to demonstrate is not that the gas diffuses as if it were a non-interacting ghost particle, but rather

that, though the gas can strongly bias the openness and shape of nearby cavities, it does not create

new cavities that would not spontaneously appear by themselves inside the protein. The presence

of gas does not create new diffusion pathways. The gas molecules merely insert themselves into


Figure 2.6: Average and maximum cavities for the case of O2-sized particles (labels in the figure: H-cluster, pathway A, pathway B). The outer contour represents the average VSAM. A solid slice through the surface shows which areas of the average VSAMs are accessible to the gas, including the two main diffusion pathways (dark gray), and which areas are excluded (light gray) according to the maximum VSAM. The dotted circles indicate the only two discernible O2-sized cavities in the average VSAM (as well as in the static crystal structure).

packing defects that arise spontaneously with or without gas in the protein and then alter the

defects’ sizes and “open” probabilities. In this respect, the lone protein approximation is a very

good indicator of what areas of the protein are accessible to hydrophobic gases.

2.4 Conclusion

There has been a steadily increasing body of evidence suggesting that packing defects play a major

role in gas transport inside many proteins [6, 18, 22, 23, 27, 67]. Our results further confirm previous

indications that a permanent channel is not needed to allow gas from a protein’s exterior to reach a

buried active site. Transient cavities, arising from the protein’s natural equilibrium dynamic motion

at the nanosecond timescale, can define predetermined pathways for hydrophobic gas transport.

Such observations have been stated before, but we show for the first time that the location of the


pathways taken by diffusing hydrophobic gases (in this case, H2 and O2) can be fully described

on a protein-wide scale, by simply analyzing the motion of the protein in the absence of internally

diffusing gas, and that the presence of the gas is not absolutely needed to open or activate these

pathways. We do not expect this to be the case for polar ligands: strong protein-ligand electrostatic

interactions might make accessible pathways that would otherwise remain tightly shut during the

protein’s equilibrium motion [88].

Comparing all of our TC-LES simulations for a given gas molecule (O2 or H2), we notice that

even though we could not reproduce the same trajectories and gas exit rates from one multi-

nanosecond run to another, all our runs had in common the fact that they were exploring the same

maximum cavity predicted by our VSAM, which itself was reproducible with very good agreement

between runs at the nanosecond timescale. This observation highlights the important possibility

that all the necessary protein conformations that enable gas permeation across CpI can occur

at the nanosecond time scale. Though the essential dynamics needed for the understanding of

gas permeation in globular proteins (namely the formation of transient cavities) occur on a short

timescale, results obtained by simulating the diffusion of individual particles were never observed

to converge during that time scale. This is due to the fact that, if we ignore for now the effects

of the gas-protein interactions, gas diffusion, as probed by MD, relies on the temporal and spatial

coupling of three simple random processes, namely, the transient formation of cavities, the transient

formation of passages between these cavities, and the random hopping of gas molecules across these

passages. Combined together, these three effects give the appearance of a very complex gas diffusion

process that cannot be fully characterized using multi-nanosecond MD diffusion studies alone. Our

results demonstrate that the very slow diffusion of O2 and H2 inside CpI can be characterized by

sampling the dynamics of a protein on a much shorter timescale. We cannot exclude the effect of

rare protein conformations not sampled in a 1 ns run, on gas diffusion, but we can assume that their

effect is probably a very minor one, since in order to be effective, these rare protein conformations

must also coincide with the presence of O2 molecules at just the right place and time.


Chapter 3

Imaging the migration pathways for gas ligands inside myoglobin

Myoglobin (Mb) is perhaps the most studied protein, experimentally and theoretically. Despite

the wealth of known details regarding the gas migration processes inside Mb, there exists no fully

conclusive picture of these pathways. We address this deficiency by presenting a complete map of all

the gas migration pathways inside Mb for small gas ligands (O2, NO, CO and Xe). To accomplish

this, we introduce a computational approach for studying gas migration, which we call implicit

ligand sampling. Rather than simulating actual gas migration events, we infer the location of gas

migration pathways based on a free-energy perturbation approach applied to simulations of Mb’s

dynamical fluctuations at equilibrium in the absence of ligand. The method provides complete

3-D maps of the potential of mean force of gas ligand placement anywhere inside a protein-solvent

system. From such free energy maps, we identify every gas docking site, as well as the pathways between

these sites, to the heme, and to the external solution. Our maps match previously known features

of these pathways in Mb, but also point to the existence of additional exits from the protein matrix

in regions that are not easily probed by experiment. We also compare the pathway maps of Mb

for different gas ligands and for different animal species. (This chapter is based on work published

in Cohen et al. [31].)

3.1 Introduction

Myoglobin (Mb), the first protein to be resolved at the atomic level [85], is a relatively small

(approximately 150 amino-acids) globular protein, found mainly in heart and skeletal muscles of

numerous animal species [22, 56, 57, 169]. Its active site, the heme, binds small gas ligands such

as molecular oxygen (O2), carbon monoxide (CO), nitric oxide (NO) and cyanide (CN−), making

this protein an important participant in the intra-cellular transport and storage of gases, particularly

of O2. In addition to facilitating the oxygen transport from the cell membrane to the cell’s

mitochondria, Mb is now believed to also play important roles in oxidative phosphorylation [169]

and in the scavenging of NO [54, 61, 68, 104].

The heme is buried inside Mb, which protects it from the aqueous environment, and thus is

not directly accessible to ligands in solution. Because gas ligands must find their way to the heme

by diffusing inside Mb’s protein matrix, Mb has long been a prime candidate for the study of

gas migration inside proteins. Numerous experiments have investigated this process inside Mb.

A popular experimental measurement is the timescale of the geminate recombination of the Mb

moiety with its gas ligand (O2, CO, NO), in which the ligand dissociates from the heme upon

flash-photolysis [10, 66], wanders inside the protein for tens to hundreds of nanoseconds, and then

rebinds to the heme [24, 41, 67, 115, 120, 133, 145, 146]. By measuring the timescale distribution

of the recombination process, and the rate at which the ligand escapes the molecule instead of

rebinding, one gains insight into the internal network of gas migration pathways inside Mb, and

into the size of the energy barriers along these pathways.

Early on, Mb was found to bind xenon gas (Xe) and structures of Mb in the presence of

bound Xe pointed to cavities between which small gas ligands could potentially hop [156]. Early

simulations of the gas migration process, although constrained to short timescales and distances

from the heme, nevertheless revealed the relevance of the Xe cavities, as well as the importance

of the protein’s motion in allowing gas ligands to migrate between them [27, 47]. In the last few

years, experiments and simulations have covered considerable new ground. Time-resolved X-ray

crystallography of photolyzed Mb-CO geminate complexes provided movies of the evolution of the

average CO distribution as a function of time after photolysis [20, 77, 143, 144, 164, 165]. Long-

timescale simulations (greater than 80 ns) of the migration of CO or NO inside Mb reproduced some

of these results [17, 18, 117], and shed more light on the locations of the ligand-accessible regions

inside Mb, as well as on how these regions are connected. The general picture emerging from

experiment and simulation is that Mb has, in its interior, several regions (“cavities”) in which it is

favorable for gas molecules to reside. These regions are identified as Xe binding sites observed by X-ray

crystallography or as empty space in static X-ray structures. The gas ligands hop from one cavity

to another via a mechanism that has not been fully specified, although it is generally agreed that the protein’s

thermal fluctuations play a role [32, 52, 93]. The location and properties of the connections between

the internal Mb cavities as well as of the exit pathways from Mb have not been fully characterized.

In the present work, we address the migration of small gas ligands inside proteins, using Mb

in particular, from a protein-wide perspective. To accomplish this, we introduce a computational

method, implicit ligand sampling, which computes the potential of mean force (PMF)

corresponding to the placement of a given small gas ligand such as O2, CO, etc., everywhere inside

a protein. The PMF that we calculate describes the Gibbs free energy cost of having a particle

located at a given position, integrated over all the other degrees of freedom of the system, except

for the ligand’s position, and is the quantity that indicates which areas of the protein are accessible

to the ligand and at what free energy cost. The implicit ligand sampling method for computing a

monoatomic or diatomic gas ligand’s PMF inside a protein (see Methods) relies on the fact that

gas ligands are small and interact weakly with the protein matrix [32]. Because of this, we can

analyze the protein dynamics in the absence of the ligand and treat the ligand’s presence as a weak

perturbation, and yet still produce accurate results. This approach may seem surprising, but the

absence of ligands in the simulation is in fact beneficial because the protein’s migration pathways

can now be sampled at every point in space simultaneously, thus generating much better statistics,

in most cases, than what would be obtained if one were to follow the trajectory of a single ligand.

When applied to Mb, implicit ligand sampling provides a complete 3-D map of the favorable

regions and migration pathways for a small gas ligand inside the protein. We devote the rest of

the chapter to describing these pathways. Our maps of the migration pathways inside Mb that

are located near the heme match prior experimental and computational evidence for O2 and CO

well. We also find that Mb has more than one exit to and from the heme binding site,

that the network of cavities may have an influence in tuning the different migration properties of

various gas ligands, and that general features of the Mb migration pathways are conserved across

species.


3.2 Methods

3.2.1 Implicit ligand sampling: theory

Here, we derive an expression for the implicit ligand PMF. The implicit ligand PMF corresponds

to the estimated free energy of placing a gas ligand anywhere inside a protein and its immediate

environment, calculated from an equilibrium simulation of the protein in the absence of the ligand.

In order to keep the derivation simple, we examine and discuss the case in which the ligand is a

point particle. For the general case, however, we must also take into account the ligand’s internal

degrees of freedom (e.g., for the case of a diatomic molecule, bond length and orientation). The

derivation of the general case is presented separately in the Appendix.

The PMF W(r) for the ligand, which, in our case, represents the Gibbs free energy cost of

placing the ligand at a specific position r, is directly related to the probability ρ(r) of finding the

ligand at that position, and is defined as [135]

\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\rho(\mathbf{r})}{\rho_o}\right], \tag{3.1}
\]

where ρo is an arbitrary normalization factor.

At constant temperature (T ) and pressure (P ), the probability density distribution of the ligand

ρ(r) can be expressed as:

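\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \int d^{3}p' \int d^{3}r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}')+PV]} \, \delta^{3}(\mathbf{r}'-\mathbf{r})}{\int dV \int d^{3N}p \int d^{3N}q \int d^{3}p' \int d^{3}r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}')+PV]}}, \tag{3.2}
\]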

where $\int d^{3N}p \int d^{3N}q$ refers to the integration over all degrees of freedom of the protein reference system (which includes the surrounding solvent), and where $\int d^{3}p' \int d^{3}r'$ is the integration over the ligand’s degrees of freedom; $H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}')$ is the Hamiltonian for the protein-ligand system, $V$ is the volume enclosing the system, and we define $\beta = (k_B T)^{-1}$.

When calculating the PMF of a ligand from a MD simulation, the probability density ρ(r) is

usually measured directly from a trajectory of the ligand motion, often with the help of sampling

enhancement techniques such as umbrella sampling [71, 135] and/or locally-enhanced sampling [47].


Because a lot of sampling is needed to get an accurate PMF, the ligand is often artificially constrained

to a restricted region of space since the thorough exploration of an entire protein by a

ligand is not possible during the timescales currently accessible to MD simulations. Since we are

interested in characterizing the PMF for ligand diffusion everywhere inside a protein, and not just

along a restricted path, this causes a problem. We overcome this limitation by using an implicit

ligand: we treat the ligand as a small perturbation of the lone protein dynamics. A previous study

of gas migration inside CpI hydrogenase [32] demonstrated that the pathways taken by O2 and H2

inside CpI could be accurately predicted from the protein’s equilibrium dynamics in the absence

of the ligand. This suggests that the perturbation approach is sensible for the case of gas ligands.

We will now derive the PMF of ligand migration by treating the effect of the ligand as a perturbation

to a reference ensemble of protein states which contains no ligand. In the presence of a

ligand with no internal degrees of freedom, the Hamiltonian for the protein reference system (Ho)

will be shifted by an amount equal to the protein-ligand interaction energy ∆E(r) and the ligand’s

kinetic energy K(p′) (the latter will eventually cancel out and disappear from our formulations).

The full Hamiltonian can now be expressed in terms of that of the reference protein as:

H(p,q,p′, r′) = Ho(p,q) + ∆E(q, r′) + K(p′). (3.3)

Inserting the perturbed Hamiltonian (Eq. 3.3) into the expression for the ligand probability

density (Eq. 3.2), we get:

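\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, e^{-\beta \Delta E(\mathbf{q},\mathbf{r})} \int d^{3}p' \, e^{-\beta K(\mathbf{p}')}}{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \int d^{3}r' \, e^{-\beta \Delta E(\mathbf{q},\mathbf{r}')} \int d^{3}p' \, e^{-\beta K(\mathbf{p}')}}. \tag{3.4}
\]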

We now wish to express our result in terms of an isobaric-isothermal ensemble (NPT ) average

over all states of the protein reference ensemble. In the reference protein NPT ensemble, the

average of any general observable A(r) is defined as:

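\[
\left\langle A(\mathbf{r}) \right\rangle_{NPT} = \frac{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, A(\mathbf{p},\mathbf{q},\mathbf{r})}{\int dV \int d^{3N}p \int d^{3N}q \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]}}. \tag{3.5}
\]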


Then, using the definition for the isobaric isothermal ensemble average, the ligand probability

distribution (Eq. 3.2) becomes:

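\[
\rho(\mathbf{r}) = \frac{\left\langle e^{-\beta \Delta E(\mathbf{r})} \right\rangle_{NPT}}{\left\langle \int d^{3}r' \, e^{-\beta \Delta E(\mathbf{r}')} \right\rangle_{NPT}}. \tag{3.6}
\]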

The denominator in Eq. 3.6 is simply a constant, which we will now refer to as λ. Inserting the

ligand probability density (Eq. 3.6) into our definition for the PMF (Eq. 3.1), we obtain:

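\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle e^{-\beta \Delta E(\mathbf{r})} \right\rangle_{NPT}}{\rho_o \lambda}\right]. \tag{3.7}
\]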

For convenience, we impose that our PMF be zero when the ligand is in vacuum (defined as a

region for which ∆E(q, r) = 0 always holds). This condition will be satisfied by setting ρo = 1/λ:

\[
W(\mathbf{r}) = -k_B T \ln \left\langle e^{-\beta \Delta E(\mathbf{r})} \right\rangle_{NPT}. \tag{3.8}
\]

When computing the PMF for a diatomic gas such as O2, CO or NO, we must also take into

account the internal degrees of freedom of the ligand, in addition to those of its center of mass.

In our analysis, we approximate the diatomic bond length to be fixed, and we are only interested

in accounting for the ligand’s orientational degrees of freedom (which we denote as Ω). For this

particular case, and following the more general derivation found in the Appendix, the expression

for the PMF becomes (see Eq. 3.18):

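\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle \int d\Omega \, e^{-\beta \Delta E(\mathbf{r},\Omega)} \right\rangle_{NPT}}{\int d\Omega}\right], \tag{3.9}
\]

where $\int d\Omega$ is the integration of unity over all internal degrees of freedom.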

This formulation is equivalent to that used in the 1-step free energy perturbation (FEP) method

(e.g., see [14, 89]). Traditionally, FEP techniques are used to determine the free energy difference

between two similar states of a system. In that case, it is common to use a series of artificial

intermediate states in order to increase the accuracy of the FEP method. In the present case, since


our perturbation is very small, the 1-step FEP method already provides good results. We take

advantage of this fact by calculating not just one free energy difference, but a huge number of such

differences spatially distributed over the entire protein. This is possible because all that is needed

in order to perform the calculation is a trajectory of the unperturbed protein reference state.

In principle, the analytical form for the implicit ligand PMF (Eq. 3.8) is exact because the

integration is performed over all possible states. In practice, the validity of the implicit ligand PMF

is not guaranteed when the thermal average $\langle \ldots \rangle_{NPT}$ is replaced by a sum over a finite number of

states, such as is the case for MD or Monte Carlo simulations. In this case, only a restricted set

of states probable according to the reference energy function Ho(p,q) is actually sampled. The

states that are probable according to the protein-ligand energy function H(p,q, r), which is what

we require, may be undersampled or not sampled at all. If the perturbation introduced by the

ligand is small, the two distributions will have significant overlap (see Fig. 3.1), and the computation

of the implicit ligand PMF is possible by simply re-weighting the states of the protein reference

simulation according to Eq. 3.9. If the perturbation caused by the ligand is large, then the overlap

between the two distributions will be small and the protein states relevant for the protein-ligand

system may not be sampled in the reference simulation. As we will see, for the specific case of

small gas ligands, the perturbation can, in many cases, be considered to be small enough for the

implicit ligand analysis to work.

We now express the implicit ligand PMF (Eq. 3.9) as an average over a finite number M of

protein states taken from a simulation. If we use C different equally-probable orientations of the

ligand, the final PMF will be given by

\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\sum_{m=1}^{M} \sum_{k=1}^{C} e^{-\beta \Delta E(\mathbf{q}_m,\mathbf{r},\Omega_k)}}{MC}\right]. \tag{3.10}
\]
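The finite-sample average of Eq. 3.10 can be sketched for a single grid point as follows. This is a minimal illustration and not the implementation used in this work: the function name and the array layout (one row per protein snapshot, one column per ligand orientation) are assumptions, as is the default temperature of 300 K. The largest exponent is factored out before exponentiating, a standard numerical precaution that is not described in the text.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def implicit_ligand_pmf(delta_e, temperature=300.0):
    """PMF (kcal/mol) at one grid point, following the form of Eq. 3.10.

    delta_e: array of shape (M, C) of interaction energies in kcal/mol,
    one row per protein snapshot and one column per ligand orientation.
    Assumes at least one energy is finite.
    """
    delta_e = np.asarray(delta_e, dtype=float)
    kt = KB * temperature
    x = -delta_e / kt
    # Factor out the largest exponent so that steric clashes (huge positive
    # energies) and favorable contacts can be averaged without overflow.
    x_max = np.max(x)
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -kt * log_mean

pmf_vacuum = implicit_ligand_pmf(np.zeros((10, 5)))   # no interaction at all
pmf_half = implicit_ligand_pmf([[0.0, 1.0e6]])        # half the states clash
```

With a single orientation per snapshot (C = 1) the same function evaluates the point-particle form of Eq. 3.8.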

In order to gain a better understanding of the applicability of the implicit ligand sampling

method, we can estimate the maximum error on our free energy measurements. The PMF calculated

by means of Eq. 3.10, like most other free energy calculations, suffers from the fact that it can be

significantly influenced by rare events. The error caused by the undersampling of rare events may

be estimated by calculating the change in PMF that such an event would cause. To do this, we

assume conservative values for the frequency and ligand interaction energies of such events. For


Figure 3.1: Schematic diagram showing the probability of occurrence of all the protein states, during a simulation of the protein reference system (solid line), or those desired in order to get a proper PMF for the protein-ligand system (dashed line). The introduction of a ligand inside the protein at a given position perturbs the energies of all the protein states from the reference ensemble, and consequently alters their probability of occurrence.

the frequency, we assume that if a maximally favorable event was not sampled in M states (from

the simulation), then such events will on average occur less than once every M + 1 states. We also

assume that for this state, the protein-ligand interaction energy will be an optimal value ∆Emin,

which is independent of the ligand’s internal degrees of freedom. In practice, we choose ∆Emin to be

location independent, and we compute it by measuring the average interaction between the ligand

and its environment during a simulation using an explicit copy of the ligand and its environment.

Neglecting the effect of allosteric and/or conformational changes whose timescales are greater than

those sampled, the maximum lower error due to undersampling (undersampling will always cause

the PMF to be overestimated), can thus be estimated as:

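\[
\Delta W_-(\mathbf{r}) = -k_B T \ln\left[\frac{\sum_{m=1}^{M} \sum_{k=1}^{C} e^{-\beta \Delta E(\mathbf{q}_m,\mathbf{r},\Omega_k)} + C\, e^{-\beta \Delta E_{\min}}}{(M+1)\, C}\right] - W(\mathbf{r}). \tag{3.11}
\]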

For large values of the number of independent samples M , this becomes:

\[
\Delta W_-(\mathbf{r}) = -k_B T \ln\left[1 + \frac{e^{\beta[W(\mathbf{r}) - \Delta E_{\min}]}}{M}\right]. \tag{3.12}
\]
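The step from Eq. 3.11 to Eq. 3.12 can be made explicit. By Eq. 3.10, the double sum over snapshots and orientations equals $MC\,e^{-\beta W(\mathbf{r})}$; substituting this into Eq. 3.11 and absorbing the trailing $-W(\mathbf{r})$ into the logarithm gives:

```latex
\begin{align*}
\Delta W_-(\mathbf{r})
 &= -k_B T \ln\left[\frac{M C\, e^{-\beta W(\mathbf{r})} + C\, e^{-\beta \Delta E_{\min}}}{(M+1)\, C}\right] - W(\mathbf{r}) \\
 &= -k_B T \ln\left[\frac{M + e^{\beta[W(\mathbf{r}) - \Delta E_{\min}]}}{M+1}\right] \\
 &\approx -k_B T \ln\left[1 + \frac{e^{\beta[W(\mathbf{r}) - \Delta E_{\min}]}}{M}\right]
 \qquad (M \gg 1).
\end{align*}
```

The number of orientations $C$ cancels, so the estimate depends only on the number of independent snapshots $M$ and on how far the measured PMF lies above $\Delta E_{\min}$.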


The error estimate provided by Eq. 3.12 can be used to test the suitability of the implicit ligand

analysis to various ligands. If a ligand interacts strongly with the protein (e.g., the Cl−-protein interaction

has been measured to be as strong as ∆Emin = −150 kcal/mol in ClC chloride channels [34]),

then the error on Eq. 3.10 will be gigantic, and the method will fail. Similarly, if the ligand is not

very small (e.g., ATP, glycerol, etc.) then the measured PMF will be very large for all simulated

reference protein states as compared to ∆Emin, and the error on the PMF will again be huge.

For small gas ligands, we have estimated the values of ∆Emin by measuring the average energy

during short equilibrium simulations of explicit ligands in a water box. A uniform water box gives

excellent statistics and, from our observations and expectations, the very mobile water molecules

provide very favorable ligand interaction energies, which in turn will result in an error on the PMF

which can be used as a conservative estimate of that inside the protein. We measured the gas-water

average interaction energies by placing one copy of the explicit gas ligand in a 30 Å × 30 Å × 30 Å

water box. The gas-water system was then simulated under NPT conditions (300 K; 1 atm) for

500 ps, during which the gas-water interaction energies were measured every 1 ps. The interaction

energies were found to be converged for the last 450 ps of simulation, over which the interaction

energy was averaged. This procedure returned values of ∆Emin = -3.2, -3.7, -4.1 and -5.6 kcal/mol

for O2, CO, NO and Xe, respectively (with a standard deviation of 0.7–0.8 kcal/mol for all four

ligands, and a negligible error). For the case of O2, this would imply that the lower error on the

PMF due to undersampling using 5,000 independent snapshots would be less than 0.1 kcal/mol for

a measured PMF of -1, 0.5 for a PMF of 2 and 3.1 for a PMF of 5 kcal/mol, etc. On top of this,

we add an additional uncertainty due to the variation in sampling, estimated from the variation

in the PMF across different points in space for a 5 ns water box simulation, which we evaluated to

be ±0.2 kcal/mol for O2, ±0.3 kcal/mol for CO and NO, and ±0.8 kcal/mol for Xe (trends in the

energy profile over large regions of space can be identified with a much better accuracy than this

because the actual error at each point in space acts independently).
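The undersampling errors quoted above for O2 can be checked directly from Eq. 3.12. The sketch below is an illustrative calculation only: the function name is hypothetical, and a temperature of 300 K is assumed to match the simulation conditions. It returns the magnitude of the lower error bound.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def undersampling_error(pmf, delta_e_min, n_samples, temperature=300.0):
    """Magnitude of the lower error bound of Eq. 3.12, in kcal/mol."""
    kt = KB * temperature
    return kt * math.log1p(math.exp((pmf - delta_e_min) / kt) / n_samples)

# O2 case from the text: dE_min = -3.2 kcal/mol and 5,000 snapshots.
errors_o2 = {w: undersampling_error(w, -3.2, 5000) for w in (-1.0, 2.0, 5.0)}
```

Under these assumptions the three values come out well below 0.1, close to 0.5, and close to 3.1 kcal/mol, in line with the figures given in the text. The error grows quickly once the measured PMF exceeds ∆Emin by several kBT, which is why the method is restricted to weakly interacting ligands.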

3.2.2 Implicit ligand sampling: computational implementation

In practice, we compute the PMF using Eq. 3.10 for each possible ligand location on a regularly-

spaced grid (with a spacing of 1 Å), and for many different ligand orientations. The ligand interaction

energy ∆E(q, r) is computed using a Lennard-Jones potential, truncated at 12 Å, using

the van der Waals parameters taken from the CHARMM22 force-field along with realistic bond

lengths (O2: εO = −0.12 kcal/mol, Rmin,O/2 = 1.7 Å, lbond = 1.12 Å; CO: εC = −0.11 kcal/mol,

Rmin,C/2 = 2.1 Å, lbond = 1.1 Å; NO: εN = −0.20 kcal/mol, Rmin,N/2 = 1.85 Å, lbond = 1.15 Å).

The inclusion of charges in the ligand was found to slow down the computation to intractable levels.

An implicit ligand sampling analysis was performed using both explicitly dipolar and uncharged

ligands for selected test cases and it was found that the effect of the electrostatic dipoles of NO,

CO and O2 is negligible. Quantum mechanics calculations have determined partial charges to be

less than 0.025e for all ligands considered in this study, and the solvation energy calculated using

the implicit ligand sampling with ligand partial charges of 0.025e varied by less than 0.05 kBT from the

values in Table 3.2 for all cases. This also held true for the energies measured for a small number of

frames of the Mb dynamics using both dipolar and uncharged models of O2; the error introduced

by the dipole was typically lower than 0.05 kBT . Because of this, the maps published herein were

computed using completely neutral ligands. The ligands’ quadrupole moments were not accounted

for in this study.

The parameter set for Xe (εXe = −0.494 kcal/mol, Rmin,Xe/2 = 2.24 Å) was picked from many

choices in the literature, and provided good agreement with Xe solvation and Xe-Mb binding

energies. The actual values of the PMF measured for Xe are sensitive to the particular choice

of Xe parameters, due to Xe’s large size; however, other Xe parameters lead to identical binding

site locations and exhibit the same general behavior, but the actual energies measured can differ

in magnitude (Xe parameters that use small radii tend to exhibit much smaller barriers between

binding sites).
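For concreteness, a single truncated Lennard-Jones pair term with parameters given as (ε, Rmin/2) can be sketched as below. The text does not spell out the functional form, so this assumes the usual CHARMM convention: a well depth equal to the geometric mean of the two |ε| values, a minimum-energy distance equal to the sum of the two Rmin/2 values, and a plain truncation at the cutoff. The partner-atom parameters are invented for the example.

```python
import math

def lj_energy(r, eps_i, half_rmin_i, eps_j, half_rmin_j, cutoff=12.0):
    """Pair energy (kcal/mol) at separation r (Å), zero beyond the cutoff."""
    if r >= cutoff:
        return 0.0
    eps_ij = math.sqrt(abs(eps_i) * abs(eps_j))  # geometric mean of well depths
    rmin_ij = half_rmin_i + half_rmin_j          # sum of the two Rmin/2 values
    ratio6 = (rmin_ij / r) ** 6
    return eps_ij * (ratio6 * ratio6 - 2.0 * ratio6)

# Ligand oxygen parameters quoted in the text; the partner atom is hypothetical.
O_EPS, O_HALF_RMIN = -0.12, 1.7
PARTNER_EPS, PARTNER_HALF_RMIN = -0.08, 2.0  # illustrative values only
e_at_minimum = lj_energy(3.7, O_EPS, O_HALF_RMIN, PARTNER_EPS, PARTNER_HALF_RMIN)
e_clash = lj_energy(2.0, O_EPS, O_HALF_RMIN, PARTNER_EPS, PARTNER_HALF_RMIN)
e_far = lj_energy(12.5, O_EPS, O_HALF_RMIN, PARTNER_EPS, PARTNER_HALF_RMIN)
```

The interaction energy ∆E entering Eq. 3.10 would then be the sum of such terms over all ligand atoms and all protein or solvent atoms within the cutoff.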

Within each grid cube, we calculated the energies for 2³ (8) equally spaced positions for diatomic

ligands (e.g., O2) and 3³ (27) positions for monoatomic ligands (e.g., Xe), providing much better

statistics (i.e., a much narrower distribution of energies for the same averaged value) for each grid

cube. For the case of O2, 50 randomly-chosen orientations of the ligand were evaluated at each

location. Furthermore, this was repeated for 5,000 trajectory snapshots (sampled at each ps), as

we found that this amount of sampling provided a satisfactory accuracy. In order to speed up

the calculation, the interaction between atoms located further than 5.5 Å apart was calculated only

once per grid cube, per trajectory snapshot, while the interaction energy below 5.5 Å was calculated

for all 2³ or 3³ points inside each grid cube; this approximation was shown to amount to less than

a 0.05 kBT maximum error, while reducing the total computation time for each O2 PMF map to

practical levels. In the end, for each grid point, 50 × 2³ = 400 energy calculations were performed

per trajectory snapshot for the diatomic ligands and 27 for monoatomic Xe. The value at each grid

point then represents the PMF of having an O2 molecule located within a 1 Å³ cube centered at

that point. The implicit ligand sampling algorithm is included and distributed as part of the open

source VMD 1.8.4 software package [78] (in VMD’s volmap command).
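The placement scheme described above (sub-positions within a grid cube combined with random ligand orientations) can be sketched as follows. This is not the VMD implementation. The function names are hypothetical, the sub-positions are assumed to be cell-centred within the cube, and for simplicity the same set of random orientations is reused at every sub-position.

```python
import numpy as np

def subgrid_offsets(n_per_axis, spacing=1.0):
    """Equally spaced offsets (Å) inside one grid cube, n_per_axis**3 in total."""
    # Cell-centred points: for n = 2 and a 1 Å cube these sit at -0.25 and +0.25.
    ticks = (np.arange(n_per_axis) + 0.5) / n_per_axis - 0.5
    grid = np.meshgrid(ticks, ticks, ticks, indexing="ij")
    return spacing * np.stack(grid, axis=-1).reshape(-1, 3)

def random_orientations(n, rng):
    """n unit vectors drawn uniformly on the sphere (normalized Gaussians)."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def diatomic_atom_positions(center, bond_length, n_per_axis, n_orient, rng):
    """Atom coordinates for every (sub-position, orientation) pair.

    Returns an array of shape (n_per_axis**3 * n_orient, 2, 3).
    """
    centers = np.asarray(center, dtype=float) + subgrid_offsets(n_per_axis)
    half = 0.5 * bond_length * random_orientations(n_orient, rng)
    atom_a = centers[:, None, :] + half[None, :, :]
    atom_b = centers[:, None, :] - half[None, :, :]
    return np.stack([atom_a, atom_b], axis=2).reshape(-1, 2, 3)

rng = np.random.default_rng(0)
o2_placements = diatomic_atom_positions([0.0, 0.0, 0.0], 1.12, 2, 50, rng)
xe_placements = subgrid_offsets(3)
```

The O2 case yields the 400 ligand placements per grid point and snapshot mentioned in the text, and the monoatomic case yields 27.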

3.2.3 PMF for ligands with internal degrees of freedom (optional)

When calculating the implicit ligand PMF for the case of diatomic (or more complex) ligands, we

must also take into account the internal degrees of freedom of the ligand, such as its orientation,

bond length, etc. In the following derivation, we will treat these generalized degrees of freedom

separately from those of the rest of the protein-ligand system. In our notation, r will refer to the

ligand’s center of mass, p′ will refer to the ligand’s momentum degree of freedom, and Ω will denote

all of the ligand’s remaining generalized-coordinate degrees of freedom (i.e., those in addition to

its center-of-mass degrees of freedom).

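When including the ligand’s internal degrees of freedom, the expression for the ligand’s probability

density (Eq. 3.2) becomes:

\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \int dp' \int d\Omega \int d^{3}r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}',\Omega)+PV]} \, \delta^{3}(\mathbf{r}'-\mathbf{r})}{\int dV \int d^{3N}p \int d^{3N}q \int dp' \int d\Omega \int d^{3}r' \; e^{-\beta[H(\mathbf{p},\mathbf{q},\mathbf{p}',\mathbf{r}',\Omega)+PV]}}. \tag{3.13}
\]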

When adding the ligand, the Hamiltonian for the protein reference system (Ho) will again be

shifted by an amount equal to the protein-ligand interaction energy ∆E(q, r,Ω) and kinetic energy

K(p′), but also by the ligand’s internal potential energy U(Ω):

H(p,q,p′, r′,Ω) = Ho(p,q) + ∆E(q, r′,Ω) + U(Ω) + K(p′). (3.14)

Inserting the perturbed Hamiltonian (Eq. 3.14) into the expression for the ligand probability

density (Eq. 3.13), we get:

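\[
\rho(\mathbf{r}) = \frac{\int dV \int d^{3N}p \int d^{3N}q \int d\Omega \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, e^{-\beta[\Delta E(\mathbf{q},\mathbf{r},\Omega)+U(\Omega)]} \int dp' \, e^{-\beta K(\mathbf{p}')}}{\int dV \int d^{3N}p \int d^{3N}q \int d\Omega \int d^{3}r' \; e^{-\beta[H_o(\mathbf{p},\mathbf{q})+PV]} \, e^{-\beta[\Delta E(\mathbf{q},\mathbf{r}',\Omega)+U(\Omega)]} \int dp' \, e^{-\beta K(\mathbf{p}')}}. \tag{3.15}
\]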

Using the definition for the isobaric isothermal ensemble average (Eq. 3.5), the ligand probability

distribution becomes:

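\[
\rho(\mathbf{r}) = \frac{\left\langle \int d\Omega \, e^{-\beta[\Delta E(\mathbf{r},\Omega)+U(\Omega)]} \right\rangle_{NPT}}{\left\langle \int d^{3}r' \int d\Omega \, e^{-\beta[\Delta E(\mathbf{r}',\Omega)+U(\Omega)]} \right\rangle_{NPT}}. \tag{3.16}
\]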

We now insert our expression for the ligand probability density (Eq. 3.16) into the definition of

the PMF (Eq. 3.1) and, just as we did for Eq. 3.8, we also impose that our PMF be zero when the

ligand is in vacuum (defined when ∆E(q, r,Ω) = 0). We then obtain:

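\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle \int d\Omega \, e^{-\beta[\Delta E(\mathbf{r},\Omega)+U(\Omega)]} \right\rangle_{NPT}}{\left\langle \int d\Omega \, e^{-\beta U(\Omega)} \right\rangle_{NPT}}\right]. \tag{3.17}
\]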

For the case of diatomic ligands, we have chosen to keep the bond lengths fixed, such that the

only internal degrees of freedom Ω remaining are those that specify the orientation of the ligand.

In this case, the ligand’s internal energy U(Ω) is a constant, such that all the terms that contain

it in Eq. 3.17 cancel out. The expression for the PMF (Eq. 3.17) then takes on the simplified form

used in our analysis:

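\[
W(\mathbf{r}) = -k_B T \ln\left[\frac{\left\langle \int d\Omega \, e^{-\beta \Delta E(\mathbf{r},\Omega)} \right\rangle_{NPT}}{\int d\Omega}\right]. \tag{3.18}
\]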

3.2.4 MD protocol and parameters

The dynamic trajectories of the proteins were computed by all-atom molecular dynamics (MD)

simulations, using the CHARMM27 force-field [99], the NAMD molecular dynamics program [127]

and the NAMD-G job submission and automation software [69]. Each Mb structure was embedded

into a water box and the resulting 20,000-30,000 atom systems were simulated using periodic

boundary conditions. Particle Mesh Ewald with a grid resolution of better than 1 Å was used for
long-range electrostatics, and all other non-bonded interactions were calculated using a cut-off of
12 Å. All simulations were carried out at a constant temperature of 300 K and a constant pressure of
1 atm. Temperature and pressure were controlled using Langevin dynamics with a damping constant
of 5 ps−1 and a Nosé-Hoover Langevin piston with a period of 100 fs and a decay time of 50 fs. The
integration timesteps were 1 fs, 2 fs and 4 fs for bonded, non-bonded and long-range electrostatic
interactions, respectively. Every system was initially equilibrated for 1 ns, after which the MD run
was extended for 5 ns, with static snapshots taken every 1 ps for analysis. Displacements of the
whole structure during the simulations were discounted by using a best fit alignment on the Cα
atoms. The implicit ligand sampling analysis was then performed on these trajectories.

3.3 Results

In the following section, we investigate the properties of the gas migration pathways inside Mb,

based on the free energy profiles calculated from our implicit ligand sampling method (see Methods).

We show that the computed 3-D maps of the PMFs for various ligands in Mb, which we will refer

to as implicit ligand PMF maps, match known experimental facts wherever the comparison can be

performed. In addition, our method makes predictions that are difficult to measure experimentally,

such as the existence and precise locations of additional gas diffusion pathways inside Mb that are

situated away from the heme.

3.3.1 Xe binding sites

X-ray crystallization of Mb in the presence of high-pressure Xe gas has been used to locate ligand

docking pockets that potentially accommodate small ligands such as O2, CO or NO [96, 156]. For

the most part, the locations of Xe binding sites match small static cavities that consist of empty
space in the Mb crystal structure. However, the correspondence between empty space and Xe binding
sites is not precise, since an empty space search finds many cavities which are not Xe binding
sites, provides no criterion for deciding a priori which cavities lodge Xe, and in most cases does
not pinpoint a specific location for the trapped Xe. The existence of atomic structures of Mb with
and without bound Xe provides an ideal test of our PMF calculation method. Fig. 3.2 shows the

location of Xe binding sites in the sperm whale Mb D122N mutant (PDB accession code: 1J52),
juxtaposed with the locations of minimum free energy computed from implicit ligand sampling on
a 5 ns equilibrium simulation of the D122N mutant without Xe (PDB accession code: 2MBW). In all
cases, the experimentally measured locations of the Xe binding sites have been successfully
pinpointed to well within the 1 Å resolution of the PMF maps, except for the case of Xe3 (within 2 Å).
The Xe3 site corresponds to a location occupied during our simulation by two water molecules present in
the crystal structure, one of which is completely displaced by Xe in the crystal structure
under Xe pressure (the binding site was nevertheless predicted, based on the fluctuations of the
water molecule positions). The free energies of Xe at the binding sites estimated by the implicit
ligand sampling method (using the Xe force-field parameters described in Methods) and those derived
from the experimentally measured Xe occupancies are shown in Table 3.1. The experimental values differ
from the computed ones by 0.5 to 1.3 kcal/mol, most probably due to our choice of Xe parameters;
the relative differences in PMF for the various binding sites are nevertheless well reproduced.

Figure 3.2: Predicted and actual Xe binding sites for the sperm whale Mb D122N mutant (shown in ribbon representation with the heme drawn as licorice). The predictions, shown as red iso-surfaces representing the areas where the Xe PMF is lower than -4.9 kcal/mol (points on this surface have an error of ±0.8 kcal/mol), are based on a 4 ns equilibrium simulation of a Xe-less Mb structure (PDB accession 2MBW). The four experimental Xe locations, represented by labeled circles, are taken from a structure of the same protein under 7 atm Xe pressure (1J52) [96].

binding site    theoretical Xe PMF    experimental Xe PMF
Xe1             -6.4                  -5.1
Xe2             -5.2                  -4.5
Xe3             -5.1                  -4.6
Xe4             -5.5                  -4.4

Table 3.1: Predicted and experimentally measured free energies for the four Xe binding sites (as labeled in Fig. 3.2) in the sperm whale Mb D122N mutant, in units of kcal/mol. The theoretical PMF corresponds to the minimum PMF measured in the vicinity of the binding site and the experimental PMF is calculated from the crystal Xe occupancy at the given experimental Xe pressure, using the approximate formula

\[
\mathrm{PMF}_{Xe} = -k_B T \ln \left( \frac{(\text{Xe occupancy}) / 1\,\text{Å}^3}{P_{Xe} / k_B T} \right),
\]

where P_Xe is the experimental Xe pressure (7 atm), and the Xe occupancy is provided for each Xe binding site in the 1J52 PDB structure.
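The approximate formula in the caption of Table 3.1 can be checked numerically with a short Python sketch. The temperature of 300 K is an assumption made here for illustration (the caption does not state the temperature used), and the function name is hypothetical; the occupancies themselves must be read from the 1J52 PDB file and are not reproduced here.

```python
import math

KB_J_PER_K = 1.380649e-23         # Boltzmann constant in J/K
KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)
PA_PER_ATM = 101325.0             # pascals per atmosphere
A3_PER_M3 = 1.0e30                # cubic angstroms per cubic meter


def xe_pmf_from_occupancy(occupancy, pressure_atm=7.0, temperature=300.0):
    """Convert a crystallographic Xe occupancy into an approximate PMF (kcal/mol).

    The site density (occupancy per 1 A^3) is compared with the ideal-gas
    number density P/(kB T) of Xe at the experimental pressure.
    The 300 K default is an assumption of this sketch.
    """
    gas_density = pressure_atm * PA_PER_ATM / (KB_J_PER_K * temperature) / A3_PER_M3
    site_density = occupancy / 1.0  # occupancy per cubic angstrom
    return -KB_KCAL_PER_MOL_K * temperature * math.log(site_density / gas_density)
```

Under these assumptions, a fully occupied site at 7 atm corresponds to a PMF of about -5.2 kcal/mol, which is the same range as the experimental column of Table 3.1; lower occupancies give less negative values.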

Experimentally determined Xe binding sites are often used to infer the location of gas diffusion

pathways. As we will argue later, the validity of this strategy is limited because the behavior of Xe

in proteins is quite different from that of smaller gas molecules such as O2, NO and CO, but the

results of such an approach are still meaningful. Nevertheless, the prediction of Xe binding sites

provides a successful test case for our implicit ligand PMF calculations.

3.3.2 CO migration pathways

Implicit ligand PMF maps for CO inside sperm whale Mb were computed and are shown in Fig. 3.3.

The PMF maps clearly show CO-accessible cavities inside Mb, as well as their connectivity and

the height of the energy barriers between them. The four Xe binding sites and the distal pocket,

all arranged in a loop around the heme, can be clearly identified in the PMF map. Additional

cavities near the heme that have been identified as participating in the migration of CO around

the heme by simulation [17, 18] are also distinctly present in the PMF map. These results are also
in good visual agreement with a picosecond-resolution X-ray crystallography movie of the CO
migration [144, 164]. Furthermore, one can observe an energy minimum at the exact location (Xe1
cavity) of a crystallized CO in the L29W Mb mutant [142, 143].

In addition to the distal cavity and Xe binding sites, the PMF map for CO migration reveals
additional cavities and pathways that lead outside of Mb (see Fig. 3.3), suggesting that the distal

Figure 3.3: Implicit ligand PMF for CO inside sperm whale Mb, based on a 5 ns equilibrium simulation of the 1DUK PDB structure, shown from four views looking towards the heme (a-d). The three energy iso-surfaces represent PMF values of -1.5 kcal/mol (red), 1 kcal/mol (blue cavities), and 5 kcal/mol (green). The empty white space corresponds to regions of measured PMF above 5 kcal/mol; the zero energy value corresponds to the ligand in vacuum. Practically speaking, the red surfaces show gas docking sites, the inner blue surfaces show the areas inside the protein that are more favorable to CO than the external aqueous solution, and the green surfaces highlight the regions of lowest energy barriers between the various cavities. The low energy barrier exits according to the displayed PMF map are indicated by red lines and circles, and dashed indicators mean that the exit is in the back. The errors on points lying on the three PMF isosurfaces are ±0.3, +0.3/-0.4 and +0.3/-3.6 kcal/mol for red, blue, and green, respectively. The Mb’s static surface is represented in white-inside-blue-outside color and the heme is displayed with its bound proximal histidine.

pocket may not be the only entrance/exit for gas ligands. We find three obvious exit pathways for

CO (defined as low barrier CO pathways that reach the solvent but do not necessarily continue

into it) in the implicit ligand PMF map of sperm whale Mb: the short distal pathway (gated by

His64), and two separate sets of exits from Mb at the far end away from the heme. In addition, we

observe three additional minor exits with higher energy barriers, one of which is a direct connection

from the Xe2 binding site to the exterior. Unfortunately, there is little direct supporting experimental

evidence, since the pathways far away from the heme cannot be seen using time-resolved X-ray

experiments monitoring the migration of gas ligands in Mb after their photolysis from the heme.

This is because the gas ligand’s average density becomes very diffuse by the time it reaches these

pathways after photolysis, and also because these extra pathways do not appear to contain strongly

attractive docking regions where a significant gas ligand density could be experimentally observed

(represented by the lack of red surfaces in the bottom of Fig. 3.3).

Geminate rebinding rates of the gas ligand inside Mb are usually interpreted using a four state

model in which the gas ligand can, in turn, be in the external solution, inside the Mb distal

pocket, inside a system of internal cavities, or bound to the heme’s iron center (e.g., see [29]).

Despite experimental evidence pointing to possible ligand escape to the external solution by two

separate pathways – directly from the distal pocket and through the secondary cavity network [29]

– ligand escape has often been interpreted as occurring solely through the distal pathway (gated

by His64) [143, 145]. This has resulted in the popular view that Mb has a network of cavities

surrounding the heme, separated from the exterior by a single pathway [57].

This view of Mb having only one exit located at the distal pocket, however, besides being at

odds with our PMF map which reveals multiple exit points between the external solution and the

interior cavities of Mb for gas ligands, is also at odds with other studies. A simulation of CO

escape in Mb has identified a number of alternate exit pathways [47], though an increased CO

kinetic energy caused by the methodology used in that study may have influenced this observation.

A simulation performed by Bossa et al. [17] also suggests that some of the large cavities inside

Mb can be temporarily directly accessible from the external solution. Huang and Boxer [75] have

experimentally tested the geminate recombination parameters of Mb against a huge library of

about 1,500 single amino acid Mb mutants, revealing that many mutations far away from the

heme and Xe-binding sites resulted in altered ligand migration behavior, suggesting that there may

be multiple access routes for the ligand between the Mb exterior and the Mb internal cavities.

3.3.3 Correlation with point mutations affecting gas ligand migration

Performing random mutagenesis on sperm whale Mb, Huang and Boxer [75] found a number of

residues whose substitution by another amino acid led to a substantial change in the geminate

recombination rates of Mb and O2 or CO, after testing roughly half of all possible mutations. These

“important” residues are shown in Fig. 3.4, along with the proposed pathways for O2 calculated from

our implicit ligand PMF analysis. We have classified the residues that affect gas ligand transport

into four groups, depending on their placement with respect to our calculated maps. Most residues

identified experimentally were also attributed important roles according to our theoretical analysis.

The first group (yellow in Fig. 3.4) comprises the amino acids that form the commonly known
distal pathway (Leu29, Phe33, Phe43, Phe46, His64, and Val68). The distal pathway is well known
from numerous studies of Mb, and our PMF maps also suggest that this pathway is the most
favorable and the shortest one for gas ligands to reach (or to escape from) the heme. The residues
forming the distal pocket are generally found to be highly conserved in Mb, in addition to strongly
influencing the recombination kinetics. Indeed, these residues are responsible for coordinating the
ligand before and while it binds to the heme, and they are responsible for the binding affinities of
various ligands to the heme [97, 118, 122, 149]. The second group (red in Fig. 3.4) consists of residues that
line putative exits from Mb’s interior, as defined by our PMF maps (Arg45, Thr67, and Leu137).
Mutation of any of these residues will affect the ability of gas ligands to enter or exit the interior
cavity network of Mb. The third group (blue in Fig. 3.4) is composed of amino acids with a small
profile that line a constriction between two cavities and also of bulky amino acids that directly
block the passage between two nearby ligand-accessible regions (Trp14, His24, Gln26, Ile30, Leu61,
Leu69, Ile99, Ile107, Ser108, Phe138, and Tyr146). We expect that mutating such residues would,
in general, cause a measurable change in internal migration rates since the cavity network topology
would be affected. The fourth group of residues (green in Fig. 3.4) does not demonstrate a substantial
correlation between their location and the PMF map (Lys16, Ala19, Lys34, His36, Asp44, Lys56,
Ala71, Gln91, Ala144, and Lys145). All are found on the periphery of Mb, pointing towards the

Figure 3.4: Amino acids whose substitutions significantly affect O2 or CO migration properties during geminate rebinding in Mb, as determined by Huang and Boxer [75]. The heme is drawn with the attached proximal His93. Residues forming the commonly recognized distal pathway are shown in yellow. The amino acids that are found at the exits from the Mb interior, according to the PMF maps, are shown in red. Small amino acids that line a constriction between cavities, and large amino acids which directly block passages between neighboring ligand-accessible areas, are colored in blue. Residues that were shown to affect ligand migration properties, but do not have any visible influence on the gas pathway according to the PMF maps are colored in green (some of these residues do cap alpha-helices and may play a structural role). The location of the gas migration pathways is drawn schematically, with light gray and dark gray, respectively, representing likely and highly likely regions for the ligand. Thick dashed lines indicate the exits that go out of the plane of the figure towards the viewer; thin dashed lines correspond to the exits behind the plane. Red arrows have been added to indicate the exits from Mb, dashed arrows represent exits behind the plane. All residues except those lining the distal pocket (yellow) are labeled.

external solvent, and while some of these residues appear to be structurally important (such as

charged surface residues), it is not clear from our results why and how the remaining residues would

affect geminate rebinding rates. It is possible that these residues have an indirect influence; for

example, their presence may be critical for Mb to fold properly.

As pointed out by Huang and Boxer [75], many of the important residues are found far from
both the heme and the distal pathway, which suggests that CO or O2 may use other pathways

in addition to the distal one to enter and exit Mb. Our implicit ligand PMF maps exhibit additional

exit pathways which are fully compatible with Huang and Boxer’s assessment.

3.3.4 O2, NO and CO share similar pathways to and from the binding pocket

We performed the implicit ligand analysis for O2, NO, CO, and Xe. To check that our ligand

parameters could reproduce real-world properties, and thus provide valid conclusions, we first used

the implicit ligand method to measure the ligands’ solvation energies. We accomplished this by

performing the implicit ligand analysis, using O2, NO, CO and Xe, on a 5 ns simulated trajectory

of a box of water. The PMF at each gridpoint in the entire water box was then properly averaged

(i.e., the ligand PMF was converted to and from its associated ligand occupancy probability, which

was the quantity used for the averaging) in order to compute a single free energy of solvation for

each ligand in water. Our calculated solvation energies were compared to experimental ones, and

the results are listed in Table 3.2. While the calculated energies are all slightly larger than the

experimental ones (by 5–30%), the relative differences between the ligands follow the correct trends

and are all respectably close to experiment.
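The averaging step described above, in which each grid-point PMF is converted to an occupancy probability, averaged, and converted back to a free energy, can be sketched in a few lines of Python. The function name and the flat list of grid values are assumptions of this example and do not describe the actual analysis code.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def average_pmf(grid_pmfs, temperature=300.0):
    """Average grid-point PMFs (kcal/mol) through their occupancy probabilities.

    Each PMF value W_i is converted to a relative occupancy exp(-W_i / kT),
    the occupancies are averaged over all grid points, and the mean occupancy
    is converted back into a single free energy.
    """
    kt = KB_KCAL_PER_MOL_K * temperature
    occupancies = [math.exp(-w / kt) for w in grid_pmfs]
    mean_occupancy = sum(occupancies) / len(occupancies)
    return -kt * math.log(mean_occupancy)
```

Because the occupancy is an exponential function of the PMF, this average is always less than or equal to the arithmetic mean of the grid values: low-energy grid points dominate the result, which is why the PMF values cannot simply be averaged directly.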

We then computed implicit ligand PMF maps for O2, NO, CO and Xe inside sperm whale Mb

using the same equilibrium simulation of Mb for each analysis. Any observed variation between the

different maps is thus caused solely by the intrinsic properties of the different ligands (which differ

here only by their van der Waals parameters), and not by statistical variations since the protein

trajectory is identical for each ligand. Generally speaking, the PMF maps for all the ligands have

very similar cavity and pathway locations, but different absolute energy values.

Fig. 3.5a-c shows the PMF values at those points on our maps that lie on paths that were

computed to minimize the height of the energy barriers for O2, NO, CO and Xe between the heme

ligand    ∆Gexp    ∆Gtheo
Xe        1.04     1.25 ± 0.04
NO        1.53     1.60 ± 0.01
O2        1.78     1.97 ± 0.02
CO        1.94     2.54 ± 0.02

Table 3.2: Comparison of the free energies of solvation in units of kcal/mol for different gas molecules measured from experiment and from the implicit ligand PMF analysis. The experimental values of the solvation energy at 20 °C are taken from those compiled in Scharlin et al. [140]. Theoretical values are obtained by properly averaging the ligand PMF calculated for a 5 ns simulation (5,000 frames) of a 40×40×40 Å3 water box at 300 K and 1 atm. Quoted errors, which are small because of the huge amount of sampling, represent the statistical variance on the calculated PMF, and do not account for the choice of the force-field parameters for the water and ligands.

binding site to the three most likely exits identified by our maps. The actual paths taken through Mb
are displayed in Fig. 3.5d. It must be noted that the PMF values that we quote are the PMFs of
having a gas molecule present in a cubic box of 1 Å side length, centered at the grid point where the
PMF is measured. The detailed PMF along a path, which is what we show, is defined differently
than the PMF of “being in a specific cavity” or of “being in the solvent”, since in the latter cases,
the probability of being at every grid point within the specified cavity or in the solvent must be
summed, and the result depends on the total size of the given cavity or of the accessible solvent.
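To make the distinction concrete, the following Python sketch computes the PMF of "being anywhere in a region" by summing the occupancy probabilities of its grid points. It assumes a 1 Å grid, so that each grid point represents the same 1 Å cube used for the point PMF; the function name is hypothetical.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def region_pmf(point_pmfs, temperature=300.0):
    """PMF (kcal/mol) of finding the ligand anywhere in a set of grid points.

    The occupancy probabilities exp(-W_i / kT) of all grid points in the
    region are summed. The sum is shifted by the lowest PMF value for
    numerical stability (a standard log-sum-exp rearrangement).
    """
    kt = KB_KCAL_PER_MOL_K * temperature
    w_min = min(point_pmfs)
    shifted_sum = sum(math.exp(-(w - w_min) / kt) for w in point_pmfs)
    return w_min - kt * math.log(shifted_sum)
```

For a region of N grid points that all share the same point PMF W, this gives W - kBT ln N, so a larger cavity has a lower region PMF than a smaller one even when their point PMFs are identical.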

For the case of O2, the energy barrier to enter Mb is very low – only a few kBT above the

computed solvation energy of O2. Not surprisingly, of all the ligands we have investigated, O2 has

the smallest energy difference between its highest barriers and most attractive cavities. We evaluate

the Gibbs free energy difference between the distal pocket’s most attractive region and the lowest

barrier to be crossed for O2 to exit through the distal pathway to be about 6 kcal/mol. This result

matches theoretical and experimental measurements of the same barrier energy of 6.4 kcal/mol [90]

and 7.5 kcal/mol [29], respectively. This implies that O2 is the ligand that can enter, exit and move
around Mb with the least hindrance of all gases studied, as Mb’s role in storing and
transporting O2 would suggest.
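Barriers of this kind are read off paths chosen to minimize the highest PMF value encountered between two end-points. The text does not specify the algorithm used to find such paths; one standard way to solve this "minimax path" problem on a grid is a variant of Dijkstra's algorithm in which the cost of a path is its highest point rather than its length. The sketch below is an illustration of that technique only, with the PMF map assumed to be stored as a dictionary from integer grid indices to PMF values.

```python
import heapq

# Moves to the six face-sharing neighbors of a cubic grid point.
NEIGHBOR_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def minimax_path(pmf, start, goal):
    """Find the path from start to goal whose highest PMF value is smallest.

    pmf maps (i, j, k) grid indices to PMF values; points missing from the
    dictionary are treated as inaccessible. Returns (peak, path), where peak
    is the highest PMF along the path, or None if no path exists.
    """
    best_peak = {start: pmf[start]}
    parent = {start: None}
    heap = [(pmf[start], start)]
    while heap:
        peak, node = heapq.heappop(heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return peak, path[::-1]
        if peak > best_peak[node]:
            continue  # stale heap entry
        for di, dj, dk in NEIGHBOR_STEPS:
            nxt = (node[0] + di, node[1] + dj, node[2] + dk)
            if nxt not in pmf:
                continue
            new_peak = max(peak, pmf[nxt])
            if new_peak < best_peak.get(nxt, float("inf")):
                best_peak[nxt] = new_peak
                parent[nxt] = node
                heapq.heappush(heap, (new_peak, nxt))
    return None
```

The barrier relative to a docking site is then the returned peak minus the PMF at the starting point, which is the kind of difference quoted above for O2 leaving the distal pocket.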

As compared to O2, NO exhibits a stronger attraction to the Mb cavities by roughly 1 kcal/mol

(i.e., all else being equal, NO is about 7 times more likely than O2 to be in a given Mb cavity,

Figure 3.5: PMF profiles experienced by ligands exiting Mb along (a) the distal pathway and (b,c) the two other most favorable exit paths between the heme binding site and the external solution. The path profiles were determined by finding the path, between two pre-defined end-points (one at the heme binding site, and one near an Mb exit), that exhibits the smallest energy barriers. The values of the PMF at each point along these paths are then plotted as a function of the ligands’ distance from the heme binding site. The procedure is repeated for O2, NO, CO and Xe ligands, using the same end-points for a given exit. The solvation energies of each ligand in water, as given in Table 3.2, are represented as horizontal dashed lines, and the location of the distal pocket (DP) and Xe binding site Xe4 are indicated. (d) The actual points along the three paths in relation to the Mb PMF maps are plotted in green (for the distal path a), red (path b) and yellow (path c).

such as the distal pocket); however, the absolute height of the largest energy barriers between these

cavities as well as to the external solution is at roughly the same level as for O2, which translates

into higher relative barriers due to NO’s lower solvation energy. Our results suggest that sperm

whale Mb would keep NO trapped in its internal cavities, which surround the heme, longer than it

keeps O2. These results are relevant because NO is known to harmfully deactivate cytochrome-c

oxidase and recent studies suggest that oxy-Mb plays a role in scavenging stray NO from the cell,

which it then deactivates by reaction with its bound O2 ligand to produce nitrate (NO3−) [54]. It

has been suggested [21] that the cavities in Mb could act as hosting stations for NO, and act to

increase its chance of collision with heme-bound O2 by keeping it inside of Mb longer. This latter

hypothesis is well supported by our results.
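The link between a PMF difference and a relative occupancy used in this comparison is the Boltzmann factor. The short Python sketch below makes the conversion explicit in both directions; the temperature of 300 K is taken from the simulation protocol, and the function names are hypothetical.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def occupancy_ratio(pmf_difference, temperature=300.0):
    """Relative occupancy of two ligands whose PMFs differ by pmf_difference (kcal/mol)."""
    return math.exp(pmf_difference / (KB_KCAL_PER_MOL_K * temperature))


def pmf_difference_for_ratio(ratio, temperature=300.0):
    """PMF difference (kcal/mol) that corresponds to a given occupancy ratio."""
    return KB_KCAL_PER_MOL_K * temperature * math.log(ratio)
```

At 300 K, a difference of exactly 1 kcal/mol corresponds to a ratio of about 5.4, while a ratio of 7 corresponds to about 1.2 kcal/mol; both are compatible with the "roughly 1 kcal/mol" quoted for NO relative to O2.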

In our modeling, of all the diatomic gas ligands, CO interacts the least favorably with Mb. CO

is less attracted to Mb’s cavities than O2 by about 0.5–1 kcal/mol. CO also experiences significantly

higher energy barriers (by roughly 3–5 kcal/mol as compared to O2) between internal cavities as

well as to the external solution. CO is toxic for Mb as well as for other proteins which are at

the receiving end of Mb’s O2 transport queue, such as respiratory cytochromes and cytochrome

oxidase. It appears that Mb is protected from CO by high energy barriers, which would reduce

the rate of CO intake (versus O2 intake), when Mb finds itself at the high concentration end of

the intracellular O2 and CO gradients. Our PMF profiles indicate that whereas the exit through

the distal pathway appears to be the most favorable one for O2 and NO, the variation in absolute

energy barriers between the different exits is less pronounced for CO (this conclusion comes with

the caveat that the error on large values of the PMF can be important, thus affecting the barriers

that we measured for CO). In any case, the increased availability of multiple exits from Mb for

CO as compared to O2 may have a functional role. Notably, the existence of multiple exits lends

support for the hypothesis by Radding and Phillips [130] that Mb protects itself from CO poisoning

through a kinetic proof-reading mechanism by preferentially allowing proportionally more CO than

O2 to exit Mb from the heme through the cavity network, thereby ensuring that only 4–7% (with

a relaxation time of about 180 ns) of photolyzed CO rebinds to the heme, as opposed to 27–42%

(with a relaxation time of about 55 ns) for O2.

We can compare the PMF profiles of Fig. 3.5 to the various experimental rates and estimates

of equilibrium constants and energy barriers for ligand migration in sperm whale Mb. Despite

the variations in methodology and results between studies, our results are generally consistent

with other measurements. Olson [119] estimates the escape barrier height for CO migration
between the distal pocket and the solvent to be about 4 kcal/mol. Our analysis estimates a barrier
of 7.5 kcal/mol (with an error of +0.4/-3.6 kcal/mol), which meets the experimental value at the
lower end of our error range. We expect our high barriers to always be overestimated and believe this to
be the case here. Rohlfs et al. [133] have indirectly estimated rates for the solvent to distal pocket
migration (hereafter referred to as kX→B) and the solvent to distal pocket equilibrium constant (KX→B)
for O2, CO and NO. The experimental estimates for the equilibrium constants are 0.72 ± 0.25,
0.22 ± 0.12, and 0.07 M−1 for NO, O2, and CO respectively (the CO value being a very rough
estimate with no associated error). The ordering of these occupation probabilities and the reduction
by a factor of three as one goes from NO to O2 to CO match the sequential changes in the PMF
of roughly 1 kBT in going from NO to O2 to CO, as seen in the distal pocket (see Fig. 3.5b,c).
For the kX→B ligand entrance rates, we expect CO to enter Mb at much slower rates than O2 and
NO. Rohlfs et al. estimate all three rates to be nearly identical, with the CO rate, however, having an
error of over ±300%. We note here that the experimental results are derived quantities and
thus the errors are large, making it hard to conclude that the agreement is definitive. In theory,

it would be possible, through computation, to estimate theoretical effective transport rates for the

ligand migration, based on our PMF maps (as opposed to qualitatively inferring trends from the

energy profiles).

Banushkina and Meuwly [11] measure a barrier of 7.8 kcal/mol from Xe4 to the distal pocket

for the CO migration in wild-type sperm whale myoglobin (and 4.3 kcal/mol for the reverse migration)
using umbrella sampling. We estimate these same barriers to be about 4.5 and 3.5 kcal/mol
respectively. Bossa et al. [18] measure a symmetrical PMF barrier of about 2.6 kcal/mol from Xe4
to the distal pocket for CO, inferred from a long simulation (in essence, umbrella sampling with
a flat umbrella potential) of which about 3 ns is spent by CO at the barrier. All three methodologies
are different and have different strengths, and for this specific case, we lean towards the

values provided by Bossa and ourselves as providing the more accurate theoretical results. The

implicit ligand sampling analysis is based on a larger amount of independent samples obtained at

every point in space (e.g., 5,000 ps × 400 conformers per point in space, a tenth of which can be

considered independent), as compared to the other methods which use a relatively low number of

independent samples per coordinate point at the barrier (e.g., 50–100 ps × 1 conformer per reaction

coordinate increment for the umbrella sampling), especially given that the sampling is spread over

many values of the reaction coordinate. On the other hand, in implicit ligand sampling, there is

little guarantee that large energy barriers will be sampled accurately due to the lacking influence

of the ligand, and this results in overestimated energy barriers. However, when a properly conducted
umbrella sampling analysis is compared to an implicit ligand sampling analysis and the
latter yields a lower final free energy, then the implicit ligand sampling is almost certainly more correct
given the much larger number of independent conformations sampled per point in space versus an
umbrella sampling approach. When the implicit ligand approach yields a higher free energy (with
a large error), then it is possible that it did not sample the right protein conformations, and the
umbrella sampling may be more representative, as could be the case for the Bossa et al. results.
One must be aware, however, that the two methods do not measure the same quantity. The implicit

ligand sampling measures the PMF at every point, whereas umbrella sampling measures the PMF

of an area of space delimited by the area explored by the ligand during the simulation, projected

onto a pre-defined reaction coordinate.

While Xe has no relevant biological function, it is frequently used in X-ray crystallography

as a probe to identify the locations of cavities which may be involved in gas ligand migration.

Furthermore, it has been observed in mammalian Mbs that the amino acids forming the Xe

binding sites are much more conserved than other amino acids [57]. For this reason, PMF profiles

for Xe are relevant because they provide an interpretation for Mb structures obtained under high

Xe pressure conditions. Since Xe interacts strongly with Mb and is also very large, its behavior

differs from that of small diatomic gases. In our PMF profiles, this translates simply into lower

binding energies for Xe in the Mb cavity sites and higher barriers between these cavities as well

as to the external solution, as compared to small diatomic gases. Very important, however, is the

observation that the location of Xe binding sites correlates very well with the regions of the protein

that are most attractive to O2, NO, and CO. In this respect, the Xe binding sites observed in X-ray

crystals do, in fact, truly indicate docking regions for diatomic gas molecules. Gas ligands do not,

however, solely diffuse in proteins by means of cavities accessible to Xe, and the presence of such

cavities does not automatically imply that diatomic gases must transit through them, nor does

their absence indicate that a favorable pathway for gas ligands does not exist. Xe cavities merely

indicate the regions in which there is a high probability of finding gas molecules, and more often

than not, these cavities will reside along the pathways taken by gas ligands to reach the heme.

Xe’s large size and strong interaction with the protein imply that, of all the ligands that we

have examined, the Xe PMF is the least accurate. However, the excellent match between predicted

and observed Xe binding sites for Mb (see Fig. 3.2 and Table 3.1) gives legitimacy to our Xe PMF
curves. It must be noted, though, that while the fact that we observe large energy barriers for Xe in
Fig. 3.5a-c is to be believed, the actual maximum height of these barriers is inevitably overestimated

by a significant amount in our calculations (for reasons detailed in the Methods section).

We have seen that the PMF profiles of various ligands inside Mb are in qualitative agreement

with Mb’s function. It remains to be seen whether this agreement is coincidental, or whether Mb’s

structure and dynamics are finely tuned by evolution to provide ideal energy profiles for different

ligands. A full study on the general properties of O2, NO, CO, and Xe migration in many different

proteins needs to be performed before this question can be accurately resolved.

3.3.5 Gas ligand pathways across species

The atomic structure of Mb has been solved for different animal species, and in order to compare

these, we have computed implicit ligand PMF maps for sperm whale (PDB accession code 1DUK),

pig (1MWD), horse (1AZI), Asian elephant (1EMY), yellowfin tuna (1MYT), and sea hare (1MBA) based

on 4.6–5.0 ns equilibrium simulations of the above systems. The implicit O2 PMF maps for sperm

whale, pig, tuna, and sea hare Mbs are compared in Fig. 3.6.

The similarities between our calculated implicit ligand PMF maps for the various Mbs reflect
the evolutionary distance between species. Fig. 3.6a highlights the strong similarities between the

location of the O2 migration pathways inside pig and sperm whale Mbs. The implicit ligand PMF

maps for horse and the Asian elephant (not shown) demonstrate the same degree of resemblance to

sperm whale Mb as is exhibited by pig Mb. As the evolutionary distance between species increases,

the migration pathways look more and more different, as we show for the cases of yellowfin tuna

(fish) Mb (see Fig. 6b) and sea hare (mollusk) Mb (see Fig. 6c), the latter being the least similar

to whale Mb in terms of migration pathways.

Figure 3.6: Comparison of the implicit ligand PMF maps in Mbs of different species. The implicit ligand PMF maps of O2 for (a) pig, (b) yellowfin tuna, and (c) sea hare Mbs (red) are compared with that for the sperm whale Mb (blue). The iso-surfaces are drawn using a PMF value of 1.8 kcal/mol (points on these contours have an error of +0.2/-0.4 kcal/mol). The sperm whale Mb’s heme with the connected proximal histidine is shown along with the protein’s external surface (black). This figure was created by Anton Arkhipov.

Despite the obvious differences, the O2 PMF maps for the Mb of the various species share some

common features. First, all three Mbs shown in Fig. 6 appear to be quite “open” to O2, in that

they all display many regions in their interior that are favorable to O2. This contrasts with what

is seen in the example of CpI hydrogenase, which only allows O2 in a very limited region of its

interior [32]. Second, all three PMF maps feature a pronounced distal cavity (to the right of the

heme in Fig. 6) which is connected to the Mb exterior by a short pathway (out of the page towards

the reader in the figure). In all three cases, the Xe binding sites of sperm whale Mb correspond

to favorable cavities (the residues lining the Xe binding sites and the distal cavity are, in fact,

more conserved than other residues across mammalian species [57]). Finally, all three Mbs exhibit

potential exits from the binding pocket other than through the distal pathway which suggests that

gas ligands can enter and leave Mb’s interior in many ways for all Mbs.

3.4 Discussion

We have described and applied a method to compute the PMF (which is related to the probabil-

ity of occupation) for the passive migration of small gas ligands inside Mb using a perturbative

framework. Our results are important for two reasons. First, they provide a complete and direct

determination of all the gas pathways in Mb. This complete picture of gas pathways can be used to

determine which residues are involved in gas transport without resorting to per-residue mutations.

They also provide a clear interpretation of experimental geminate recombination results, which

otherwise involve guesswork and/or numerous years of careful follow-up experiments in order to

be understood correctly. The fact that our observations are direct and detailed means that they

have strong predictive power over the effect of residue mutations as well as over the locations of

gas pathways and Xe binding sites in any other protein of known structure, irrespective of whether

that protein is suitable to be studied by traditional experimental methods such as the monitor-

ing of gas migration events after flash-photolysis. Secondly, they demonstrate unequivocally that

short-timescale random thermal motion of the protein matrix and its environment alone defines

reproducible and well-defined gas transport pathways inside proteins. In our model, the protein’s

thermal fluctuations are calculated explicitly without resorting to any assumption besides those

inherent in the CHARMM molecular dynamics force-field, which was parametrized to empirically

reproduce short timescale thermal fluctuations, and thus is particularly valid for the present appli-

cation.

The implicit ligand sampling method produces results that have very low errors when the PMF

values are low (high-probability regions), and large errors when the PMF is very large (inaccessible

regions), making it very suitable for the detection of gas migration pathways inside proteins, and to

a lesser but still significant extent, for the measurement of all free energy barrier heights along these

pathways. The approach works because gas ligands, being small and apolar, interact very weakly

with the protein, and thus, do not promote significant conformational changes in the protein.

Because of this, there is a strong overlap between the distributions of protein states in the

ligand-free protein ensemble and the protein-with-ligand ensemble, and the former can thus be used to calculate properties

of the latter. Although there is always an amount of uncertainty arising from molecular dynamics

simulation, due to short timescale sampling and empirical force-field models, we believe that our

specific analysis presents a convincing case despite these caveats.
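
The perturbative estimate discussed above can be illustrated with a minimal Python sketch. It assumes that the ligand-protein interaction energy has already been evaluated for a ligand inserted at one grid position in every snapshot of a ligand-free trajectory, for several ligand orientations. The function name, the array layout, and the toy energies are illustrative assumptions and do not reproduce the actual VMD implementation.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def implicit_ligand_pmf(delta_e, temperature=300.0):
    """PMF (kcal/mol) at one grid point from ligand-protein interaction energies.

    delta_e: array of shape (n_frames, n_orientations), in kcal/mol, holding the
    interaction energy of a ligand inserted into each ligand-free snapshot.
    """
    kt = KB * temperature
    x = -np.asarray(delta_e, dtype=float) / kt
    # log of the mean Boltzmann factor, shifted by the maximum for stability
    x_max = x.max()
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -kt * log_mean

# Toy data (hypothetical): a favorable site and a sterically blocked site
favorable = np.full((100, 10), -1.0)
blocked = np.full((100, 10), 50.0)
blocked[0, 0] = 0.0  # a single rare opening dominates the average

print(implicit_ligand_pmf(favorable))
print(implicit_ligand_pmf(blocked))
```

In the blocked example, the average is dominated by a single snapshot, which is consistent with the statement that the estimate becomes unreliable where the PMF is large.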

On the biological side, our results have important ramifications regarding the general mechanism

by which gas ligands are transported inside the protein matrix. Numerous hypotheses have been

brought forth over the years to describe gas transport inside proteins. The first studies assumed

that gas ligands diffused through small permanent channels [156]. Other studies have suggested

that gas ligands enter proteins directly, as if they were simply a more viscous medium [26]. The

currently emerging view for many proteins is that, rather than diffusing along permanent channels,

gas ligands can migrate through bulky regions of the protein, guided by the proteins’ internal

thermal motion [17, 23, 27, 32, 47]. Our results suggest that this is the case, and furthermore

that the pathways taken by the gas ligands are not randomly distributed in the protein, but that

they are, in fact, located in well defined regions that can be identified by examining the protein’s

thermal fluctuations. The simple fact that we detected pathways that match known data implies

that a small ligand can diffuse in and out of Mb solely due to the protein’s thermal fluctuations at the

nanosecond time scale, even though the timescale of the total ligand migration can be much longer.

Cavities inside the protein matrix, such as Mb’s xenon binding sites, appear to play a prominent

role in accommodating gas ligands inside Mb. Interestingly, such cavities are sometimes barely

present in other proteins that still exhibit thermally-defined gas ligand pathways that stretch over

long distances, such as in CpI hydrogenase [32]. Cavities would appear to create favorable docking

sites for the ligand, but are not necessary to account for the ligand’s mobility as it migrates inside

the protein matrix. Our analysis suggests that cavities could perhaps also play a role in the gas

ligand selectivity of the protein.

Finally, we wish to mention other systems, besides Mb, where the study of the migration of

small gas ligands inside the protein matrix is important. Oxygen sensitivity is a highly relevant

issue for hydrogenases, enzymes that produce or break down hydrogen gas. Their sulfur-metal active

sites can usually also bind O2. Recent developments aim at harnessing the hydrogen-producing

power of hydrogenases for biotechnological purposes, but for this to be practical, the sensitivity of

hydrogenases to O2 must be repressed. Buhrke et al. [25] have found that the [NiFe]-hydrogenase of

Ralstonia eutropha H16, which is usually resistant to O2, can be made sensitive to O2 by a mutation

of residues located along a putative channel leading to the active site. This study suggests that

the protein matrix of this hydrogenase may play an important role in regulating the access of its

active site to O2 (along with the O2 sensitivity being regulated by its affinity to the active site

and its environment). Another example involves O2 migration inside cytochrome c oxidase from

R. sphaeroides. It was shown [139] that a single point-mutation inside the protein is enough to

block O2 access. There are many examples of proteins which use small gas ligands as a substrate

or ligand and in many cases, the gas ligand must reach a buried region of the protein. The above

examples demonstrate the relevance of studying gas ligand migration inside proteins and underscore

the importance of being able to identify gas migration pathways that are not readily visible in the

protein’s static structure.

Chapter 4

Effects of protein architecture and sequence on gas migration pathways.

While networks of O2 pathways have already been characterized for a small number of proteins,

the general properties and locations of these pathways have not been compared across different

proteins. In this study, maps for the O2 pathways inside twelve different monomeric globins have

been computed. It is found, despite the conserved tertiary structure fold of the studied globins,

that the shape and topology of the O2 pathway networks exhibit a surprisingly large variability

between different globins, except when two globins are very closely related. The locations of the

O2 pathways are, however, found to be correlated with a protein’s local residue composition, and

the same correlation is observed for two independent sets of protein families: monomeric globins

and copper-containing amine oxidases. These results have implications for protein-engineering

applications involving modifications of gas pathways in proteins. (This chapter is based on work

published in Cohen and Schulten [35].)

4.1 Introduction

For many classes of proteins, enzymatic reaction with, or binding to, gas molecules is an essential

component of their function. Such proteins often bind gas molecules by means of buried active sites

consisting of metal ions or metal-containing compounds. Gas molecules such as O2 must make their

way across the protein’s interior to reach these active sites. In the majority of cases, permanent gas

channels are neither detectable nor present in the protein’s static structure; instead,

the migrating gas molecules take advantage of transient cavities that occur inside the protein due

to thermal fluctuations [27]. By monitoring these transient cavities as they

occur over time, recent methodological advances such as volumetric gas accessibility maps [32] and

implicit ligand sampling [31] have made it possible to comprehensively map and describe networks

of gas migration pathways inside proteins.

For a large number of proteins that interact with O2, finding the location of O2 migration

pathways has important implications. For example, locating the O2 pathways in oxygenases and

O2-consuming oxidases provides important clues regarding their enzymatic activity and operating

mechanism. Also, the elucidation of the O2 pathways in hydrogenases is helping current efforts

aiming to block O2 access to the hydrogenase active sites, which would, in turn, make these

proteins useful for commercial hydrogen gas production [15, 65, 103]. To this date, complete

maps of O2 pathway networks have only been computed for a small number of proteins, including

CpI hydrogenase [32], sperm-whale myoglobin (Mb) [31], and AQP1 aquaporin [167]. As more

proteins are visualized in terms of their O2 migration pathways, one will gain a better grasp of how

gases are transported inside proteins, discern patterns of how such pathways are conserved across

protein families, and develop rules of thumb for quickly identifying these pathways. In this chapter,

we address the question of O2 pathway conservation within a given protein fold by computing and

comparing maps of the O2 pathway networks across a range of proteins from the globin superfamily.

Globins are a large and ancient family of proteins for which all members, with few exceptions,

share an exceptionally well-conserved tertiary structure: the globin fold. At the heart of this

fold lies a prosthetic group – the heme – which is universally used by globins to reversibly bind

to, and temporarily hold, O2 and other gas ligands. The present investigation focuses on three

globin subgroups: monomeric hemoglobins (Hbs), which transport O2 throughout entire organisms,

Mbs, which store and transport O2 within muscle cells, and leghemoglobins (Lbs), which store,

transport, and/or scavenge O2 to maintain a population of symbiotic bacteroids in the root nodules

of symbiotic plants [7].

In addition to binding O2, many globins have other physiological functions. For example, while

the role of Mb has been long-considered to be well-established [22, 56, 57, 169], recent studies

are showing that Mb is also involved in secondary roles [60, 169] such as the scavenging and

inactivation of nitric oxide (NO) [54] and a weak peroxidase activity [166]. Many invertebrate Hbs

also possess a number of interesting characteristics, such as the ability to react with sulfide [168],

and the ability to tune their affinities to O2 depending on their environment. By forming multimeric

assemblies [122, 137, 168], many Hb monomers can bind gases such as O2 and CO2 cooperatively

by making their affinity for O2 depend on the O2-occupancy of the neighboring monomers, either

through quaternary conformational changes and/or through multimeric association/dissociation.

It is clear that despite the well-studied nature of the globin family, there remains a large number

of globin properties relating to their structure that are still poorly understood, and in many cases,

not even known.

In this chapter, we focus on the O2-transporting role of the globin protein matrix. A globin’s

main function is performed by its heme which binds gas ligands for extended periods of time.

One could consequently regard the conserved protein fold surrounding the heme as merely a shell.

Nevertheless, this “shell” provides important functionality. First, the protein shell protects the

heme from oxidizing into an inactive ferric state, which would happen if the heme were to float

freely in solution. Second, the protein component strongly modulates the environment of the

heme-bound ligand and thus influences its binding affinity, its binding rates, and the globin’s relative

selectivity for various ligands. Finally, the protein matrix provides cavities and pathways for gas

ligands to travel from the exterior solution to the heme and vice versa. Since all globins have an

identical heme, the protein shell is what determines a globin’s specific role and properties. When

one considers the large variety of globin behaviors and properties, the importance of the protein

component becomes obvious.

A large body of work, both experimental and theoretical, has focused on finding gas ligand

pathways inside globins, particularly sperm whale Mb and certain Hbs. The main tools for such

studies are x-ray crystallography in the presence of xenon [156], x-ray crystallography of inter-

mediate states [24], time-resolved x-ray crystallography [20, 143, 144, 164, 165], spectroscopy of

the geminate recombination process [10, 41, 67, 115, 120, 133, 145, 146], and molecular dynamics

simulation [18, 28, 47, 77, 117]. For the experimental work in particular, the effects of many point

mutations on O2 transport rates inside globins have been investigated in detail. In almost all cases,

however, only overall rates of O2 association/dissociation are accessible, and, only occasionally,

pathways are mapped in very localized and restricted regions of the protein. Here, we inspect

the entire set of pathways inside a broad set of monomeric penta-coordinated globins for which

structural data is available.

4.2 Methods

Simulations were run for a selected set of monomeric globins of known structure, taken from the

Protein Data Bank (PDB). In every case, a deoxy form of the globin was created from the PDB

coordinates, and any water or gas ligand present in the distal pocket (DP) was removed. For the cases in which

only the ferric state of the globin is available (whether unbound or bound to small compounds),

the coordinates of the ferric heme were used as starting points, but the hemes were modeled using

parameters for the ferrous state.

For every globin, the simulation system was built from the PDB coordinates by adding hydrogen

atoms, binding the globin’s proximal histidine to the heme (here, the heme is penta-coordinated in

all cases), and by picking an appropriate titration state for every histidine based on its immediate

environment. The Dowser water-placement program [171] was then used to internally solvate the

globin atomic structures, though rarely resulting in the placement of additional water molecules.

The heme–protein complex was then solvated using a water box whose sides exceed those of the

protein by at least 20 Å in all dimensions. 50 mM of NaCl was then added, adjusting the relative

concentrations of Na+ and Cl− to make the whole system chargeless.
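
The arithmetic behind the salt addition can be sketched as follows. This is a minimal illustration, not the procedure actually used: the box dimensions and the protein net charge are hypothetical values, the function name is invented for the example, and the volume excluded by the protein is ignored.

```python
AVOGADRO = 6.02214076e23  # 1/mol
A3_TO_LITER = 1e-27       # 1 cubic angstrom = 1e-27 L

def ion_counts(box_dims_angstrom, conc_molar, protein_charge):
    """Return (n_na, n_cl) giving roughly conc_molar salt and zero net charge."""
    lx, ly, lz = box_dims_angstrom
    volume_l = lx * ly * lz * A3_TO_LITER
    n_pairs = round(conc_molar * volume_l * AVOGADRO)
    n_na, n_cl = n_pairs, n_pairs
    # Offset the protein's net charge with extra counterions
    if protein_charge < 0:
        n_na += -protein_charge
    else:
        n_cl += protein_charge
    return n_na, n_cl

# Hypothetical 80 x 80 x 80 angstrom box and a protein net charge of -2 e
n_na, n_cl = ion_counts((80.0, 80.0, 80.0), 0.050, protein_charge=-2)
print(n_na, n_cl)
```

For a box of this size, 50 mM corresponds to only about fifteen ion pairs, so the counterions needed for neutrality noticeably shift the Na+/Cl− ratio.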

The equilibration protocol used two pre-equilibration stages: an initial 30 ps simulation stage

had the protein and heme fixed, and allowed the solvent to relax at constant temperature (300 K)

and pressure (1 atm); a 50 ps stage then allowed both protein side chains and solvent to equilibrate

while constraining the protein backbone. The entire protein-solvent systems were then equilibrated

for another 950 ps. Finally, an additional 10 ns of simulation at the same NPT conditions was

performed for analysis. The 10 ns simulations were processed using the implicit ligand sampling

method [31] included in the VMD visualization program [78], resulting in a 3D potential of mean

force (PMF) map of the complete network of O2 migration pathways for each globin.

All simulations were performed using the molecular dynamics program NAMD [127] in combi-

nation with the NAMD-G simulation automation engine [69]. Simulation parameters were taken

from the CHARMM22 force-field [99]. Particle Mesh Ewald, with a resolution of at least 1 Å, was

used everywhere for long-range electrostatics. Langevin dynamics and a Langevin piston were used

to maintain constant temperature and pressure, respectively. Finally, integration timesteps of 1, 2,

and 4 fs, respectively, were used for bonded, non-bonded, and long-range electrostatic interactions.

globin                               species                     PDB code

Myoglobins
sperm whale Mb                       Physeter catodon            1A6M, 1A6N [163]
sperm whale Mb (YQR mutant)          Physeter catodon            1MYZ [20]
horse heart Mb                       Equus caballus              1WLA [102]
sea hare Mb                          Aplysia limacina            1MBA [16]

Invertebrate hemoglobins
pig roundworm Hb domain I            Ascaris suum                1ASH [170]
trematode Hb                         Paramphistomum epiclitum    1H97 [124]
marine bloodworm Hb component III    Glycera dibranchiata        1JF3 [121]
midge HbIII                          Chironomus thummi thummi    1ECO [150]
clam HbI                             Lucina pectinata            1FLP [132]

Leghemoglobins
yellow lupin Lb II                   Lupinus luteus              1GDJ [73]
soybean Lb A                         Glycine max                 1BIN [72]

Table 4.1: List of penta-coordinated monomeric globins investigated in this study.

4.3 Results

4.3.1 O2 pathways in monomeric globins

While the O2 pathways and cavities in Mbs (particularly sperm whale Mb) have been studied

extensively, those in most other globins remain unknown. We have investigated the networks of

O2 pathways in a broad set of penta-coordinated monomeric globins, listed in Table 4.1, for which

atomic coordinates are available. For each protein, we calculated the O2 PMF map according to

the protocol described in the Methods section. This provided the locations and energy barriers of

the complete network of O2 pathways inside every simulated globin.

Myoglobins. We have computed the O2 PMFs for sperm whale, horse heart, and sea hare Mb.

In the case of sperm whale Mb, we also looked at two additional variants: deoxy-Mb, in which the

distal pocket (DP) contains a water molecule, and the sperm whale (L29Y, H64Q, T67R) “YQR-

Mb” mutant, designed to mimic the slower association/dissociation rates observed in the Ascaris

nematode Hb [5].

The PMF maps for sperm whale oxy-Mb, deoxy-Mb, and YQR-Mb were computed in part

to test the reproducibility of the implicit ligand sampling approach, and in part to observe the

magnitude of the changes caused by the presence of water in the DP and by point mutations. The

O2 PMF maps for sperm whale oxy-Mb and deoxy-Mb matched particularly well: every favorable

O2 holding region (red, in Fig. 4.1a,b) and the O2 pathway interconnections (blue, in Fig. 4.1a,b)

exhibit an excellent correspondence between the two maps, both in shape and in size, as they

should. The YQR-Mb mutant also exhibits strong similarities with the other sperm whale Mbs,

except for the shape of the DP, as expected, which is where the three “YQR” point mutations are

located. The variation in shape of the O2 pathways near the Xe1 binding site in YQR-Mb is due

to the presence of a crystal water molecule at that location which is not present in the other Mbs.

Also, a reduction in the size of the pathways far away from the heme (bottom of Mb in Fig. 4.1c)

appears to be due to statistical variations in the presence of water molecules inside all Mbs near

those locations over the course of the simulations. All in all, excluding the effect of trapped water

molecules inside the proteins, the maps for all three sperm whale Mbs were the most similar to each

other of all globin maps, and the details of these maps were remarkably well-reproduced between

each other (as well as with the independently computed CO PMF map for sperm whale Mb reported

in Cohen et al. [31]). These results lend further credibility to the reproducibility of the implicit

ligand sampling approach for mapping gas migration pathways.

When comparing Mb O2 PMF maps across species, we again see a good agreement between

sperm whale Mb and horse Mb as we did between the various sperm whale Mbs, reflecting the fact

that these globins are all, to a practical extent, almost the same protein. The comparison becomes

interesting, however, when one looks at the O2 PMF for sea hare Mb (Fig. 4.1e). Despite the

very strong similarities in both function and structure between sperm whale and sea hare Mb, the

location of O2 pathways is, surprisingly, very different for these two Mbs.

The mapped O2 pathways provide new insights into the behavior of Mbs. A closer examination

of the YQR-Mb O2 PMF maps reveals that its DP is much more unfavorable to O2 than the DP of

sperm whale oxy-Mb, namely, by approximately 3 kcal/mol. Paradoxically, the shape and energy

features of the DP do not resemble those of the Ascaris roundworm Hb, which served as a template

for YQR-Mb. According to the O2 maps, YQR-Mb’s low association/dissociation constants are

due to a DP which is unfavorable to O2, resulting in a lower probability of O2 occupation in the

DP and lower chance of binding to the heme, rather than having higher energy barriers to reach

the DP, given that the latter is not observed here.
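
To give a sense of scale for a 3 kcal/mol difference in PMF, the following Python sketch converts a free energy difference into a relative occupancy through the Boltzmann factor. The 300 K temperature matches the simulation conditions; the function name is an illustrative assumption, and the result is an order-of-magnitude estimate rather than a computed rate.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)

def relative_occupancy(delta_pmf_kcal, temperature=300.0):
    """Occupancy of a site relative to a reference site whose PMF is delta_pmf lower."""
    return math.exp(-delta_pmf_kcal / (R_KCAL * temperature))

ratio = relative_occupancy(3.0)
print(f"relative O2 occupancy: {ratio:.4f} (about 1/{1 / ratio:.0f})")
```

Under these assumptions, a site that is less favorable by 3 kcal/mol is occupied roughly 150 times less often, all else being equal.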

Figure 4.1: O2 PMF maps for various monomeric globins. Shown are the 0 kcal/mol (red) and 1.6 kcal/mol (blue) O2 free energy contours, along with the four sperm whale Mb xenon binding sites as green spheres. The globins are: sperm whale (a) oxyMb, (b) deoxyMb and (c) YQR mutant Mb, (d) horse and (e) sea hare Mbs, (f) soy and (i) lupin Lbs, (g) roundworm, (h) trematode, (j) bloodworm, (k) clam and (l) midge Hbs. The proteins’ α-helices are displayed as black lines.

In particular, we find it interesting that the minima of the 3D energy maps occur at the sperm

whale Mb’s DP, both in the presence (deoxy-Mb) and absence (oxy-Mb) of a water molecule inside

it, implying that water does not prevent O2 from reaching the DP. The deoxy-Mb DP is, however,

measured to be less favorable to O2 by 3 kcal/mol. Surprisingly, we did not observe any opening of

the distal channel (defined as the pathways going through the swinging His64 “gate”) during any

of the 10 ns simulations of sperm whale Mb (which we extended to 25 ns for the case of oxy-Mb) or

of horse Mb, resulting in the absence of this pathway in the maps presented in this chapter. The

distal pathway was observed in a previous PMF map of Mb [31]; however, the initial structure of Mb

used in that study contained crystal deformations that may have contributed to this discrepancy

in the distal pathway behavior. Other computational studies have, however, reported observing

the spontaneous swinging of the distal histidine “gate” in Mb at 10–100 ns timescales [17].

Invertebrate monomeric hemoglobins. When we extend the comparison to include the O2

pathways for various monomeric invertebrate Hbs, we note a surprising observation. Fig. 4.1

illustrates the O2 pathways for the 12 simulated globins. Aside from a prominent DP, the various

monomeric globins exhibit O2 pathway and cavity locations which are completely different from one

Hb to another. These variations are significant and reproducible and cannot at all be attributed

to errors in the evaluation of the PMF. Since the globin fold is well conserved amongst the studied

globins, and their tertiary structure is near-identical (see Fig. 4.2), our results suggest that the

locations of O2 pathways (which in general are the same as those for CO and NO [31]) are not

determined by the protein’s secondary and tertiary structures, which are conserved.

A second notable observation arising from the O2 PMF maps for the invertebrate monomeric

Hbs is the large number of exits and pathways present in each globin. Our results suggest that

multiple exits and an overall porousness to O2 might be the norm in globins. This is surprising

and contrasts with the common assumption of a single O2 entryway in many kinetic models of

O2 migration in Mb [143, 145], as well as with the O2 PMF maps of the other proteins for which

O2 pathways have been mapped: CpI hydrogenase, which was found to be largely impermeable to

O2, except along two well-defined and very localized pathways [32], and AQP1 aquaporin [167] for

which O2 pathways are only found at the interface of the protein’s constituent monomers.

Figure 4.2: The structures of 10 monomeric globins are aligned and superimposed, demonstrating the very strong conservation of their secondary structure globin fold. The structures are sperm whale (blue), horse (green) and sea hare (cyan) Mbs, soy (black) and lupin (white) Lbs, roundworm (yellow), trematode (red) and bloodworm (orange) Hbs, clam (pink), and midge (purple) Hbs. The Xe binding sites of sperm whale Mb are shown as green spheres and the proteins’ α-helices are displayed as black lines.

Leghemoglobins. We have studied two Lbs, from lupin and soy, both exhibiting very similar O2

PMF maps (see Fig. 4.3b). The O2 PMF maps for both Lbs show them to be mostly inaccessible

to O2, except for a very short and direct exit between the DP and the external solution. The main

exit seen in both lupin and soy Lbs (exit, in lupin Lb: Ala37, Leu43, His106, Val109, and in soy

Lb: Pro38, Leu43, Gln101, Val104), is the same as the one reported in lupin Lb by Czerminski and

Elber [40] using locally-enhanced sampling simulation. Despite the presence of a distal histidine in

Lb at the same location as the gating histidine in Mbs, the distal pathway is not present at all in

the Lb O2 PMF maps (the distal pathway was in fact observed here for every Mb, even though it

was in some cases seen to be blocked by a closed histidine “gate”).

When compared with the other monomeric globins from Fig. 4.1, the fact that both Lbs have

a single, and conserved, dominant exit next to the heme is an unusual feature. The only possible

other exit, according to the O2 PMF maps, is a much less probable secondary exit that is still

located right next to the primary exit. This is in stark contrast to all the other globins in this

study, which are all very porous to O2 and all possess multiple exits for O2.

Figure 4.3: Comparisons of the 1.6 kcal/mol O2 PMF surfaces for similar monomeric globins. The globins are: (a) sperm whale Mb (blue) vs. horse heart Mb (red), and (b) lupin (blue) and soy Lb (red).

Both soybean and lupin Lbs must transport O2 to symbiotic Rhizobium bacteroids inside root nodules in the plant,

while simultaneously ensuring that as little as possible of this O2 reaches the Rhizobia’s nitrogenase

enzymes, which are essential to the plant host, and which are intolerant to O2. By having a unique

exit, the possibility is raised that this exit could be blocked during transport and/or that Lb could

deliver O2 to a precise target while ensuring minimal O2 leakage. Our results do not prove such a

conclusion; nevertheless, they do show that Lbs have a peculiar O2 pathway arrangement which is

compatible with the possibility of their role in sequestering O2 in a way that is not realized by any

other globin examined in this study.

4.3.2 Specific residues promoting O2 pathways

From the comparison of the various globin O2 pathway maps performed in this study, it is evident

that O2 pathways are not determined by a protein’s tertiary structure; instead, O2 pathways are

found to correlate with a protein’s residue composition. While there is no guarantee that specific

residue types have any individual effect on the location of O2 pathways in the protein, we find

that certain residues, on average, are more likely than others to be found near O2 pathways. By

collecting statistical information regarding the predisposition of residue types to form or not form

O2 pathways, one can guide future efforts to manipulate O2 pathways inside proteins [25, 63, 139].

Fig. 4.4 shows the proportion of residues that lie next to an O2 pathway, sorted by residue

type. The analysis was done for two different sets of globular proteins: a set of nine monomeric

globins, and a set of three different copper-containing amine oxidases from Hansenula polymor-

pha [83], Pichia pastoris (PDB: 1N9E) [43] and Arthrobacter globiformis (PDB: 1W6G). “Core”

residues (Fig. 4.4a–c) are distinguished from “surface” residues (Fig. 4.4d–f), based on whether the

residues’ side chains (or entire residue for the cases of Gly and Ala) are in contact with the external

solution. The proportion of total protein residues which also happen to line the O2 pathways was

computed, for each type of residue, by counting the number of residues for which any atom of

their side chains (including hydrogen atoms) in the crystal structure was located within 2.5 Å of a

gridpoint of the O2 PMF map for which the PMF is lower than a given threshold (taken as -2, 0

and 2 kcal/mol in this study). As can be seen, the propensity of given residue types to be near an

O2 pocket is loosely correlated with its hydrophobicity. Large flexible hydrophobic residues and

those possessing aromatic rings are the most often seen near O2 pathways, indicating that the large

size and mobility of these residues most likely promote the formation of cavities, rather than fill

them up.
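
The counting procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the coordinates and residue assignments are toy values, the function names are hypothetical, the distance search is brute force, and the selection of side-chain atoms and the core/surface split are assumed to have been done beforehand.

```python
import numpy as np

def residues_near_pathway(atom_xyz, atom_resid, grid_xyz, grid_pmf,
                          pmf_cutoff=0.0, dist_cutoff=2.5):
    """Return the sorted residue ids having any atom within dist_cutoff (angstrom)
    of a grid point whose PMF is below pmf_cutoff (kcal/mol)."""
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_resid = np.asarray(atom_resid)
    favorable = np.asarray(grid_xyz, dtype=float)[np.asarray(grid_pmf) < pmf_cutoff]
    if favorable.size == 0:
        return []
    # Pairwise atom-to-gridpoint distances (brute force; fine for a sketch)
    d = np.linalg.norm(atom_xyz[:, None, :] - favorable[None, :, :], axis=2)
    close = (d < dist_cutoff).any(axis=1)
    return sorted(set(atom_resid[close].tolist()))

def propensity_by_type(resid_to_type, lining_resids):
    """Fraction of residues of each type that line a pathway."""
    totals, hits = {}, {}
    for resid, rtype in resid_to_type.items():
        totals[rtype] = totals.get(rtype, 0) + 1
        hits[rtype] = hits.get(rtype, 0) + (resid in lining_resids)
    return {t: hits[t] / totals[t] for t in totals}

# Toy data (hypothetical coordinates, not taken from any structure)
atoms = [[0, 0, 0], [1, 0, 0], [10, 0, 0], [20, 0, 0]]
resids = [1, 1, 2, 3]
grid = [[0.5, 1.0, 0.0], [10.0, 1.0, 0.0], [20.0, 1.0, 0.0]]
pmf = [-1.0, -0.5, 3.0]

lining = residues_near_pathway(atoms, resids, grid, pmf)
propensity = propensity_by_type({1: "LEU", 2: "PHE", 3: "ASP"}, lining)
print(lining)
print(propensity)
```

Repeating the count with thresholds of -2, 0, and 2 kcal/mol, separately for core and surface residues, would yield the kind of per-residue-type proportions discussed here.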

The copper-containing amine oxidases studied here exhibit fewer O2 cavities and pathways, on

average, than the more loosely packed and smaller monomeric globins. There is nevertheless

a very high level of correlation between the relative propensities of different residue types to

be near an O2 pathway (see Fig. 4.4) for the two protein families. For example, very favorable

holding regions (PMF < -2 kcal/mol) for O2 inside the protein are predominantly (and in similar

proportions) lined with Trp, Ile, Leu, Phe and Met residues in both protein families (Fig. 4.4c).

Especially surprising is the correlation between O2 favorable areas outside the protein and

surface residue types. Fig. 4.4e shows that surface residues have nearly identical effects on O2

binding sites in the exterior of the protein for both globins and copper-containing amine oxidases,

suggesting that the affinity to O2 for both a protein’s interior and surface can be tuned by means

of point mutations. Fig. 4.4d shows the surface residues which are near O2 regions having an

affinity of better than 2 kcal/mol (close to the solvation energy of O2 in water). If any residue

type is not at 100% in this graph, this means that it is statistically repelling O2. For both globins

and copper-containing amine oxidases, Asp is an outlier in Fig. 4.4d, meaning that it repels O2 in

Figure 4.4: Percentage of residues of a given type whose side chains (or entire residue for the case of Gly and Ala) are located within 2.5 Å of a region of the implicit ligand O2 PMF map where the PMF is less than (a,d) 2 kcal/mol, (b,e) 0 kcal/mol, or (c,f) -2 kcal/mol. The data was collected separately over a set of nine monomeric globins (excluding whale deoxy-Mb, YQR-Mb, and soy Lb, which are redundant), shown in black, and a set of three copper amine oxidases, shown in gray. Residues are treated separately based on whether they (a–c) comprise the hydrophobic core of the protein or (d–f) are in contact with the external solvent. The area of each data point is proportional to the number of total residues of a given type and location (core/surface) found across the set of proteins used for that analysis.

solution. Interestingly, it was also found in studies of gas permeation in aquaporin [76, 167], that

there exists a distinct and unexplained barrier to O2 permeation at a specific location inside an

otherwise uniform water channel. At that location, this water channel is surrounded by four Asp

residues. The natural tendency, observed here, of Asp to repel O2 on the surface of proteins would

explain this result.

4.3.3 Significance of the O2 pathway networks

There has been, over the course of many decades, a large body of work dedicated to understanding

and characterizing the O2 migration kinetics inside a small number of representative globins. Our

results clearly show that the shapes of the networks of O2 pathways inside globins vary greatly from

one globin to the next, and that whatever conclusion can be experimentally drawn for one specific

globin is likely to be only applicable to that specific globin. One might even wonder if the actual

location of pathways and cavities inside a globin bears much relevance to its function. As long as

the global properties of the O2 pathways match the desired function of the protein, there may well

be no incentive for the protein to conserve or even tune O2 pathway locations, especially given that

most globins appear to possess a large number of these pathways. To the present authors, it is not

clear that O2 pathways are critically important. A discussion of the role of O2 pathways should

therefore start with what they are not.

O2 pathway networks do not affect O2 binding affinities. A differentiating property of individual

globins is their affinity to gas ligands such as O2. Maintaining fine-tuned affinities to O2 is crucial

to organism function. For example, most Hbs undergo conformational changes to vary their O2

affinity through cooperative binding: they require high affinities near an O2 source such as the

lungs along with a decreased affinity in low O2 environments. Similarly, vertebrate Mb, as well as

secondary Hbs in invertebrates, require a relatively high O2 affinity in order to uptake O2 from

the primary Hb carrier. Globin O2 affinity, however, cannot be a property of the O2 pathway

network. Instead, it can only depend on the free energy of O2 binding at the heme, which is almost

exclusively influenced by interaction with the residues located in the DP, as evidenced by the high

sensitivity of globin-gas affinities to mutations of DP residues [97, 118, 149]. The pathways taken

by O2 to reach the DP are themselves not expected to bear an influence on globin affinity to O2.

The properties of pathways in the protein matrix could, in theory, affect the O2 on/off rates

between the heme and the exterior. Given a fixed O2 affinity (which is related to the ratio of

the on and off rates), altering the energy barriers along, and shapes of, the O2 pathways could

slow or hasten O2’s migration speeds. While most globins bind to O2 for short times (with a 1–

100 ms half-life), the Hbs of the Paramphistomum trematode and Ascaris roundworm both display

exceptionally long O2 binding half-lives (21 s and 175 s, respectively) [168]. Despite the fact that

both the roundworm and trematode Hbs exhibit the two highest free energy barriers for O2 exit of all

the globins studied here, the barriers that we measure are too low to explain by themselves the 4–5

orders of magnitude difference in O2 binding times for these proteins compared to that of the average

globin. Assuming an Arrhenius process, such a difference in binding times would require an energy

barrier of at least 9 kBT, which would have been readily measured by the implicit ligand sampling

calculation. Both roundworm and trematode rely mainly on exceptionally strong O2-binding at

the heme to accomplish their long binding times, and not on the O2 pathway network. In practice,

the effect of the O2 pathway shapes and locations thus appears to be of minor importance relative

to the effect of the bound ligand’s environment at the heme, even for extreme cases.
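
The Arrhenius estimate quoted above can be checked with a short calculation. The sketch below assumes that the binding time scales as exp(ΔG/kBT) with an unchanged prefactor, so that a ratio of binding times translates directly into a barrier difference; the temperature of 300 K used for the conversion to kcal/mol and the function names are assumptions of the example.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872  # Boltzmann constant in kcal/(mol K)

def extra_barrier_kT(time_ratio):
    """Additional barrier height, in units of kBT, needed to lengthen a
    binding time by the factor `time_ratio` (Arrhenius, same prefactor)."""
    return math.log(time_ratio)

def kT_to_kcal_per_mol(barrier_kT, temperature=300.0):
    """Convert a barrier from kBT units to kcal/mol at `temperature` (K)."""
    return barrier_kT * KB_KCAL_PER_MOL_K * temperature

for orders in (4, 5):
    barrier = extra_barrier_kT(10.0 ** orders)
    print(f"10^{orders} slower: {barrier:.1f} kBT "
          f"= {kT_to_kcal_per_mol(barrier):.1f} kcal/mol")
```

Four orders of magnitude correspond to ln(10^4), or roughly 9.2 kBT, which is the origin of the lower bound of 9 kBT given in the text.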

The different pathways in globins may however have important roles not yet appreciated or

understood. For one, these pathways may have possible enzymatic functions such as kinetic proof-

reading as a means to increase the selectivity of Mb binding to various gases as suggested by

Radding and Phillips [130], or as a means to promote the NO to NO3− reaction catalyzed by some

oxyHbs and oxyMbs [23]. Most obviously, such pathways would provide many ways for O2 to enter

and escape globins, and many entrances in the globin surface would also increase the capture (and

release) rates of gas molecules by globins. Ever since the hypothesis by Perutz and Matthews [123]

that O2 enters Mb and some Hbs through a conserved swinging histidine gate, this pathway has

often been considered as the dominant pathway for O2 to enter these globins. This hypothesis has

remained popular because the His gate is and has been the only visible O2 pathway in static crystal

structures of Mbs/Hbs. However, when thermal fluctuations and protein dynamics are accounted

for, numerous other possible O2 pathways are revealed, and it is likely that the His gate is just one

pathway amongst many [31, 47, 75]. In all likelihood, the swinging of the His gate (as opposed to

the direct interaction of the gating His with the bound ligand in the DP) is not a critical factor in

the regulation of O2 entry or exit from the protein. We furthermore postulate that the conserved

His gate appears to gate a water channel, which could allow, e.g., for the escape of the NO3− product

from Mb. At this point, the roles of such pathways in Mb and other globins, including the

histidine gate, are still partially speculative, but in light of new developments in the localization of

the O2 pathways in globins, it is now appropriate to carefully reconsider the assumptions made in

the past.

4.4 Conclusion

While there has been an accelerating rate of progress in recent years in first identifying major gas-

holding packing defects [23, 156], and later complete maps of O2 pathways inside many proteins [6,

17, 31, 32, 77, 109], a more fundamental understanding of how these pathways occur in general has

been lacking. In the present study, we have characterized the network of O2 pathways inside a large

range of proteins from the globin superfamily. Although our results are reproducible

and similar for very closely related proteins, we find a complete lack of conservation of the location,

topology, or sizes of the O2 pathway networks from one individual monomeric globin to the next,

despite the similar folds of the proteins. On one hand, this suggests that the specific details

and locations of O2 pathways in proteins do not matter much for the protein’s function, as long

as the pathways are present and provide adequate transport to gas molecules across the protein

matrix. On the other hand, this implies that while these pathways are rather independent of the

protein’s tertiary structural features, they may actually be dependent on the specific composition

of residues inside the protein. This hypothesis was tested and it was found that the propensity

of certain residues to be adjacent or not to O2 pathways is well-reproduced across two protein

families: monomeric globins and copper-containing amine oxidases. Such results can be used

to plan gas migration pathway-altering mutations inside proteins, by substituting residues which

have a predisposition to create O2 favorable regions with those that do not, and vice versa. The

correlation between residue types and O2 access is clear. Whether these correlations can be used

directly to plan or predict the effect of point mutations on O2 accessibility inside proteins, and the

effect of blocking O2 pathways on gas migration rates, remains to be tested.

Chapter 5

Conclusion and outlook

The development of methods for describing and understanding gas migration pathways has many

immediate applications beyond hydrogenase. Many important families of proteins such as oxy-

genases, oxidases, and globins, for example, must interact with O2 and/or other gas ligands to

perform their function. The ability to map O2 pathways in these proteins is of great interest in

understanding how they function.

For example, implicit gas ligand sampling has been applied to the problem of gas conduction

across the tetrameric AQP1 aquaporin water channel. A number of experimental studies [37, 112,

128] have reported that the expression or addition of AQP1 in reconstituted lipid membranes and

in the CO2-impermeable Xenopus oocytes (a huge unicellular egg cell) resulted in a measurable

increase in the membranes’ permeabilities to CO2. Figure 5.1a displays the match between the

implicit ligand sampling map and the positions of O2 collected from a 26 ns simulation of explicit

O2 diffusing across aquaporin, performed by Wang et al. [167]. Given that the explicit O2 simulation

started with initial conditions in which 100 O2 molecules were initially placed in the solution, regions

inside the protein which are favorable to O2 but which are protected from the external solution

by large energy barriers are not sampled sufficiently by the explicit O2 simulation, but are easily

discovered using implicit ligand sampling. Both Figs. 5.1a and 5.1b show the location of the three

most favorable gas pathways across the aquaporin, and Fig. 5.1c shows the computed effective free

energy barriers for O2 passage along each of these three pathways, as compared to a pure POPE

lipid bilayer. It should be noted, however, that POPE lipid bilayers are relatively permeable to

O2/CO2, whereas the experimental studies cited above used less-permeable membranes. Despite

being longer, the explicit O2 simulations did not collect enough statistics to provide the potential

of mean force along all three of these pathways, unlike the implicit ligand sampling analysis. And

even though the results from the implicit ligand sampling analysis generally mirrored the results

found from the explicit O2 simulations [76, 167], the very favorable “side” pathways were not even

detected using any of the explicit O2 simulations because they are open only very ephemerally,

even though they have the overall lowest energy barriers to O2 permeation.
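
The sampling limitation of explicit ligand simulations can be made concrete with a small example. The sketch below uses the standard Boltzmann inversion of an occupancy histogram, which is an assumption of this illustration and not the implicit ligand sampling estimator itself; the bin counts, the reference count, and the value of kBT (about 0.596 kcal/mol near 300 K) are hypothetical.

```python
import numpy as np

KT_KCAL_PER_MOL = 0.596  # kBT near 300 K (assumed temperature)

def pmf_from_counts(counts, reference_count):
    """Boltzmann-invert occupancy counts into a PMF (kcal/mol) relative to a
    reference bin. Bins that were never visited receive NaN, because no
    finite free energy can be assigned to them from the trajectory alone."""
    counts = np.asarray(counts, dtype=float)
    pmf = np.full(counts.shape, np.nan)
    visited = counts > 0
    pmf[visited] = -KT_KCAL_PER_MOL * np.log(counts[visited] / reference_count)
    return pmf

# Hypothetical histogram of explicit O2 positions along a pore axis. The
# zero-count bin stands for a region that the ligands never reached.
counts = [1000, 120, 3, 0, 5, 900]
print(pmf_from_counts(counts, reference_count=1000))
```

A bin with only a handful of visits yields a PMF value with a large statistical error, and a bin with no visits yields none at all, regardless of how favorable that region actually is.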

The two O2 studies have both pointed to the aquaporin “central pore” as a very likely channel

for the passage of CO2 and O2. Where the implicit ligand method proved to be the most useful,

however, was in identifying the exact nature of the barrier to O2 permeation in this pore. A previous computational study by

Hub and de Groot [76] had found this same barrier and had mistakenly attributed it to a narrow

hydrophobic constriction at the entrance of the central pore. Implicit ligand sampling revealed

that this was not the case: the barrier was located in the middle of a wide water channel above

this constriction, in a region where four aspartic acid residues (Asp 50) meet (see Fig. 5.2a). The

specific reason why the wide water channel blocks O2 passage at the precise location of Asp 50

is not well understood, since the water molecules are just as mobile there as anywhere else. It was

found, however, that the averaged density of water at that specific point was slightly higher than

that of the bulk water, as shown in Fig. 5.2b. It is probably not a coincidence that aspartic acid,

responsible for the O2 barrier here, was also identified as the residue type most likely to repel O2

in water, according to Fig. 4.4d (Chapter 4). In fact various mutations of this aspartic acid residue

across the four monomers resulted in a dramatic decrease in the central pore barrier, as measured

using implicit ligand sampling (Yi Wang, private communications).

Further work has also been performed and published using implicit ligand sampling on other

systems as well. One study aimed to find how O2 makes its way to the catalytic site of a copper

amine oxidase from Hansenula polymorpha, using an approach combining implicit ligand sampling

with the x-ray determination of the crystal structure in the presence of xenon [83]. Another set of

studies, both experimental and theoretical, are investigating the effect of hydrogenase mutations

on its O2 accessibility [63, 64, 87]. And more studies are currently in the works.

As with all discoveries, it is exciting to see how new methodologies and new ways of looking

at a problem lead to yet new discoveries. The study of gas migration inside proteins has certainly

always been relevant to the function of many classes of proteins, but until now, has been very

limited in its scope due to a lack of theoretical and experimental tools and of basic knowledge to

address the problem effectively. The present thesis provides many of these tools and much of this

Figure 5.1: (a) Top view of the AQP1 tetramer, displaying the locations of the three potential O2/CO2 channels, which are (A) the central pore, (B) the water pores, and (C) the side pores. The 0 kcal/mol O2 PMF surface is displayed in yellow for all three cases. (b) The implicit ligand 0.6 kcal/mol O2 PMF isosurfaces (mesh), superimposed on top of the explicit O2 from a 26 ns simulation, for comparison. Both datasets were symmetrized over the four identical AQP1 subunits. (c) The O2 PMFs for O2 migration along all three potential channels, and through a POPE bilayer, projected along the z-axis. All PMFs assume that AQP1 is maximally packed, with 1 AQP1 per 50 nm² of bilayer. The upper bound errors on the PMF are +0.25 kcal/mol, and the lower bound errors are -0.25, -0.6, and -1.7 kcal/mol respectively for PMF values below 4, 6, and 8 kcal/mol. This figure was generated from material contained in [167], and (c) was created by Yi Wang.

Figure 5.2: (a) The 0 kcal/mol (yellow) and 3 kcal/mol (mesh) isosurfaces are displayed atop the cumulative positions of explicit O2 molecules (red lines) from a 26 ns simulation. An arrow points to the location of a large energy barrier surrounded by four Asp 50 residues. This barrier corresponds to a region of denser water than average. (b) The regions of space which are occupied by water more than 75% of the time are highlighted in blue. Such a region is found precisely at the location of the barrier. The profile of the protein around the central pore is sketched using a simplified outline. This figure was taken from [167], and (b) was created by Yi Wang.

knowledge by outlining a likely mechanism for gas permeation inside proteins along with a means to

detect and alter the gas migration pathways. Long-standing questions regarding gas migration in

proteins, especially those concerning how O2 and CO enter proteins, can now finally be addressed

with confidence. But because the field is so wide and so new, relatively little is still known about

the general properties of gas migration pathways in proteins, and a lot is yet to be learned. It is

hoped that the work presented in this thesis will be of invaluable help to all those who are currently

pursuing related research.

Appendix A

Mechanism of anionic conduction across ClC chloride channels

Up until now, we have investigated methods for measuring free energy profiles for the special case

of weakly-interacting ligands migrating inside proteins. In more typical scenarios, the ligands of

interest are either bulky or charged, and interact strongly with the protein. When this is the case,

the ligands cannot be treated implicitly as has been done in earlier chapters, and other free-energy

sampling techniques must be used. The ClC chloride transporter is a good example of such a

system.

ClC chloride transporters are voltage-gated transmembrane proteins which have been associ-

ated with a wide range of regulatory roles in vertebrates. To accomplish their function, they allow

small inorganic anions to efficiently pass through, while excluding passage to all other particles.

Understanding the conduction mechanism of ClC has been the subject of many experimental inves-

tigations, but until now, the detailed dynamic mechanism was not known despite the availability

of crystallographic structures. We investigate Cl− conduction by means of an all-atom molecular

dynamics simulation of the ClC transporters in a membrane environment. Based on our simulation

results, we propose a “king of the hill” mechanism for permeation, in which a lone ion bound to

the center of the ClC pore is pushed out by a second ion which enters the pore and takes its place.

While the energy required to extract the single central ion from the pore is enormous, by resorting

to this two-ion process, the largest free energy barrier for conduction is reduced to 4 kcal/mol.

At the narrowest part of the pore, residues Tyr 445 and Ser 107 stabilize the central ion. There,

the bound ion blocks the pore, disrupting the formation of a continuous water file that could leak

protons and possibly preventing the passage of uncharged solutes. (This chapter is based on work

published in Cohen and Schulten [34], Tajkhorshid et al. [153].)

A.1 Introduction

ClC chloride transporters were discovered by C. Miller in 1982 while investigating the Torpedo ray

electroplax membrane [105]. Since then, various members of the ClC family have been isolated in a

wide variety of organisms, ranging from animals and plants to yeast and almost all bacteria except

for a few species with small genomes. All ClCs have in common a selectivity for small inorganic

anions (e.g., Cl−, NO3−, Br−, I−, SCN−, and some larger hydrophobic anions), though they tend to

discriminate rather poorly between these different anions. Despite their poor inter-anion selectivity,

ClCs are called “chloride-channels” and “chloride-transporters” because Cl− is the only inorganic

anion with a significant presence at physiological conditions.

Many roles for ClCs have been identified in higher organisms: they play vital cellular functions,

such as the regulation of blood pressure, of cell volume, of organelle pH, and of membrane excitabil-

ity [48, 80, 101, 161]. In prokaryotes, however, the ClCs’ various roles are only now emerging from

obscurity. First, Iyer et al. found that ClC is essential for E. coli to survive extreme acid shock [79].

Then, in an unexpected discovery, Accardi et al. concluded that the E. coli ClC was not

a passive channel, as had been assumed for all ClCs, but instead behaved like an active Cl−–H+

antiporter [1, 2]. This finding is particularly interesting because most eukaryotic ClC homologues

are known to be passive channels, and the E. coli ClC has conduction rates (<0.2 pS) that

are intermediate between typical rates observed in channels and those observed in transporters,

prompting the suggestion that the evolutionary distance between channel and transporter proteins

is less than previously thought.
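
To put the quoted conductance into more familiar terms, it can be converted into an ion throughput. The sketch below assumes an ohmic pore carrying monovalent ions (I = gV) and a driving voltage of 100 mV; the voltage and the function name are assumptions chosen for illustration, and since the text gives 0.2 pS as an upper bound, the result is an upper bound as well.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs

def ions_per_second(conductance_pS, voltage_mV):
    """Ion flux implied by a conductance (pS) at a given voltage (mV),
    assuming an ohmic pore and one elementary charge per ion."""
    current_A = (conductance_pS * 1e-12) * (voltage_mV * 1e-3)
    return current_A / ELEMENTARY_CHARGE

print(f"{ions_per_second(0.2, 100.0):.2e} ions/s")
```

Under these assumptions, 0.2 pS corresponds to about 1.2 × 10^5 ions per second, a figure that can be compared against the turnover rates reported for the channels and transporters of interest.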

While most electrophysiological measurements have been performed on eukaryotic ClC homo-

logues [49, 50], such as the Torpedo ray ClC-0 and the human homologues ClC-1 and ClC-2, there

are believed to be many similarities between the core structures of prokaryotic and eukaryotic

ClCs [44, 106]. All ClCs are believed to share a “double-barrelled” architecture, in which each

protein consists of two identical monomers, each monomer consisting of two heterogeneous but

structurally-similar segments arranged in an anti-parallel fashion, and each monomer containing

its own independent water-filled pore [44, 105, 108] (Fig. A.1a). ClCs share a very high sequence

similarity for the selectivity filter: the central region of the pore that has been found to coordinate

the permeating Cl− ions according to x-ray crystallographic structures [44, 50]. On the other hand,

bacterial and eukaryotic ClCs differ considerably in size. Bacterial ClCs are much smaller (typi-

cally, 395-492 residues), whereas eukaryotic homologues are longer (687-988 residues), with most of

the extra residues lying in cytoplasmic dangling ends or in the periplasmic regions responsible for

regulation and gating functions. And while most, if not all, eukaryotic ClCs are voltage-gated in

some way or another [49, 129], it is still unclear as to whether the recently characterized bacterial

E. coli ClC possesses a voltage-gating mechanism at all.

Figure A.1: (a) View of the ClC dimer showing the broken helix architecture and the position of the Cl− ions in the crystal structure. Each monomer and pair of ions is displayed in a different color. (b) Vertical cross-section of the solvent-accessible surface of the ClC protein embedded in a lipid bilayer. The simulated model comprises 97,000 atoms. In the narrowest part of the protein, where the Cl− ions permeate, the residues that define the selectivity filter are shown.

Although individual macroscopic properties of ion channels have been studied extensively [74],

until recently, very little was known with certainty about their inner workings. The discovery of the

KcsA potassium channel structure [42] provided the first high-resolution structure of a channel that

was specifically selective for small ions. This discovery sparked a round of fruitful computational

experiments [9, 12, 13, 110, 147] that revisited long-standing assumptions about ion permeation

through ion channels and transporters.

A second generation of ion channel and transporter structures later emerged,

detailing the atomic structures of a bacterial ClC chloride transporter (at the time believed to be

a channel) in a closed [44] and in a constitutively open form [45], and of the calcium-gated MthK [81]

and voltage-gated KvAP [82] potassium channels. These new structures came as much needed

data points in the ion channel structure landscape, and provided the opportunities to build new

frameworks of simulations and studies on which to base revised microscopic theories of ion chan-

nels [86, 116, 136]. The discovery of the ClC structures made possible the computational study of

the ClC conduction mechanism [19, 38, 39, 107]. The present chapter details a computational study

of a bacterial ClC transporter that reveals the main energy landscape regulating the conduction of

Cl−.

A.2 Methods

A.2.1 Simulation system

Our simulations were based on a published x-ray structure of ClC from Salmonella serovar typhimurium

(stClC) at 3.0 Å resolution (PDB accession code: 1KPL) [44]. The protein was then

placed in a POPE membrane and solvated with water. 24 Cl− and 2 Na+ ions were placed at ran-

dom positions in the water in order to neutralize the protein charge with a total ion concentration

of 100 mM. The resulting 97,000 atom system was then equilibrated for 5 ns in the NPT ensemble,

with a 2 fs integration time step, periodic boundary conditions and PME electrostatics, using the

NAMD2 molecular dynamics software [84] and the CHARMM22 force-fields [99, 141] for the en-

ergy parameters for lipids, protein and ions, and the TIP3P model for water molecules. During the

equilibration, the membrane relaxed to a state of hydrophobic match in which it became thinner

near the protein than away from it. The Cl− ions redistributed themselves near the very charged

cytoplasmic side of the transporter. Top and side views of the final system after equilibration are

shown in Fig. A.2.
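
The ion bookkeeping of this setup can be checked with a short calculation. The sketch below derives the net charge that the added ions must balance and the solvent volume implied by the stated concentration. It assumes that the 100 mM figure counts all 26 ions together, which is one possible reading of "total ion concentration"; the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # particles per mole
A3_PER_LITER = 1e27        # cubic angstroms in one liter

def net_ion_charge(n_cl, n_na):
    """Total charge (in units of e) carried by the added Cl- and Na+ ions."""
    return -n_cl + n_na

def volume_for_concentration(n_ions, conc_molar):
    """Solvent volume (cubic angstroms) in which `n_ions` particles
    correspond to a concentration of `conc_molar` (mol/L)."""
    return n_ions / (conc_molar * AVOGADRO) * A3_PER_LITER

ion_charge = net_ion_charge(n_cl=24, n_na=2)
print("charge carried by ions:", ion_charge, "e")
print("charge to be neutralized:", -ion_charge, "e")
print(f"implied solvent volume: {volume_for_concentration(26, 0.100):.2e} cubic angstroms")
```

With 24 Cl− and 2 Na+ the ions carry a net charge of -22 e, so a neutral system implies a charge of +22 e on the remaining components, and the implied solvent volume is about 4.3 × 10^5 cubic angstroms under the stated reading of the concentration.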

Figure A.2: Top and side view of the ClC simulation system showing the POPE membrane and ions. The reconstructed N-terminus is highlighted in black and lipid tails have been simplified.

A.2.2 Opening the pore

Each pore of the wild type ClC crystal structures is obstructed by a highly conserved glutamic

acid residue (Glu 148), which is bent so that its negatively charged carboxyl head is bound to a

region of the pore next to the periplasmic exit. In order to investigate the effect of this residue,

we ran interactive molecular dynamics (IMD) simulations [70, 151] in which Cl− ions were pulled

past Glu 148 to ascertain that the pore was indeed blocked and that the passage of Cl− across the

channel was not possible without displacing the glutamate side chain. While we can be certain that

the opening of the fast gate in ClC involves a change to the conformation of Glu 148, we cannot

exclude the possibility of other conformational changes. Recent indirect evidence suggests that the

open ClC-0 channel exhibits additional changes at its cytoplasmic mouth, unaccounted for in the

present study [3, 158]. Nevertheless, we set out to modify the conformation of Glu 148 on the basis

that this alteration was necessary and sufficient for unblocking ClC. Accordingly, we created an

open conformation of the ClC pore by using IMD to pull Glu 148’s carboxyl group out of the pore

and into the channel’s periplasmic vestibule.

Recent electro-physiological measurements performed on the Torpedo ray ClC-0 show that both

the substitution of the pore-blocking glutamate (Glu 166 in ClC-0) with small non-charged residues

(E166G, E166A, E166V or E166Q) and the protonation of this glutamate, strongly reduced the

voltage dependence of the fast gate, allowing the pore to remain open for a wider range of condi-

tions [45, 58]. Similar mutation studies reached identical conclusions for ClC-4 (E224A) and ClC-5

(E211A) [58]. In addition, crystal structures of E. coli ClC (electro-physiological measurements

are not currently possible on the native protein) mutants (E148A, E148Q) showed them to be

virtually identical to the wild type structures except that the mutant pores were unobstructed by

the Glu 148 side chain. Further supporting the validity of our assumptions about the open pore

conformation and of the “molecular surgery” we performed, is the finding that in the E148Q mu-

tant, the neutrally charged glutamine residue, which has the same atomic geometry as glutamate,

has its side chain sticking out of the pore [45], suggesting that Glu 148 might do the same under

favorable conditions.

A superposition of the structures of the closed ClC transporter (PDB: 1KPL), of our manu-

ally opened transporter after equilibration, and of the E. coli constitutively open E148Q mutant

(PDB: 1OTU) [45] is shown in Fig. A.3. Since the beginning of our study predates the publication

of the constitutively open pore crystal structures, these later structures were not used as a starting

point; however, our modified “open” structure matches these new crystal structures closely.

It must be said that there is currently no certainty about the exact nature of the ClC fast gate.

The mechanism suggested by Dutzler et al. [45] is not without problems: for ClC-0, we note the

conflicting studies mentioned above [3, 158]; we also note that this mechanism does not account

for the measured fast gate charge of 0.92–2.2 e for ClC [45, 53, 98]; finally we note that it fails

to explain the reported effect of external Cl− ions on the fast gate [30, 129]. Nevertheless, the

Figure A.3: Superposition of the structures for the 1KPL closed transporter (dark gray), of the 1OTU constitutively open channel E148Q mutant (orange) and of our manually opened transporter after equilibration (atom-based coloring).

mentioned studies, complemented by Lin and Chen [95], are all either indicative of, or compatible

with, putative conformational changes at or outside the cytoplasmic mouth of the transporter. In

the present study, we seek to shed some light of our own on these matters. We demonstrate that

anionic conduction is possible and probable across the pore of bacterial ClC as observed in current

crystal structures, once the pore-blocking glutamate has been displaced. Furthermore, we find that

there exists ample space for the permeating Cl− ions to move in the pore. In fact, it is probable

that an open conformation of ClC that would exhibit a notably wider internal pore could lead to a

dramatic loss in selectivity and is as such unlikely. It is a credible possibility, and compatible with

the experimental data so far amassed, that the internal pore structure of ClC is not significantly

affected by the opening and closing of the fast gate.

A.2.3 Molecular alterations

The crystal structure (PDB: 1KPL) is missing an entire N-terminal segment for one of the two

monomers (chain A). We have partially modeled this segment (residues 12 to 32) by duplicating

it from the other monomer (chain B) and splicing it into chain A. This reconstituted segment,

highlighted in black in Fig. A.2, dangles underneath the monomer opposite to the one to which it

belongs and binds with that monomer’s C-terminus, possibly contributing to the dimer’s stability.

At the end of the 5 ns equilibration, the two ClC pores, which are devoid of crystallized water

molecules in the published crystal structure, did not acquire any water molecules from the bulk

solution. Neither did any of the crystal structure Cl− ions budge from their binding site at the

center of each ClC pore. To remedy this situation, we placed water molecules in single file across

each pore. We also placed an additional Cl− at Glu 148’s binding site in each pore. This new

configuration was then equilibrated for 0.5 ns and used as a starting point for all further simulations.

A.2.4 Mapping the potential of mean force

In order to reconstruct the potential of mean force (PMF) of permeation through ClC, we have

employed umbrella sampling [71, 135]. In our study, the PMF describes the overall free energy

profile experienced by two simultaneous Cl− ions traversing either of the ClC’s two pores in the

absence of an external electric field, as a function of their respective positions along the pore (z-

axis), averaged over all other degrees of freedom. In each of the sampling simulations, two Cl−

ions per ClC pore were separately tethered by means of virtual 1-D springs acting along the z-axis

with spring constants of 15 kcal/mol/Å². This was done for both pores simultaneously, which are

located so far apart (40 Å) that the correlation between the configurations of the ions in each pore

is negligible. The tethering points for the umbrella potentials were never distributed more than 1 Å

apart for any ion, requiring 92 simulations of 370 ps each to sample the range of motion of the ions

during their conduction through the pore. Starting configurations for the different simulations were

created by translating the permeating ions to the z-coordinate minima of the tethering (umbrella)

potentials. The energy of the ions and water molecules in the pore was then minimized (keeping

all the other atoms fixed) so that the ions could reposition themselves laterally in the pore. The

ions were then repositioned to the correct initial z-coordinate and the pore water molecules were

equilibrated for 10 ps (keeping everything else fixed), after which the system was equilibrated for

an extra 70 ps. This procedure ensured that the sudden artificial displacement of the ions in the

pore, at the beginning of each simulation, did not unnecessarily perturb the protein. The spatial

distributions of the ions along the z-axis for each simulation were then combined using the weighted

histogram analysis method [91] applied in 2-D in order to obtain the full two-ion distribution of

Cl−, which was then inverted to obtain the PMF.
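
The combination step can be illustrated with a deliberately simplified sketch. The Python code below applies the weighted histogram analysis method along a single coordinate, whereas the calculation described above was carried out in 2-D for two ions. The function name `wham_1d`, the uniform binning, the form 0.5 k (z − z0)² assumed for the tethering potential, and the default kT = 0.596 kcal/mol (roughly 300 K) are assumptions made for illustration and are not taken from the simulations themselves.

```python
import math


def wham_1d(windows, centers, k_spring, z_min, z_max, n_bins,
            kT=0.596, max_iter=5000, tol=1e-7):
    """Combine umbrella-sampling windows into a 1-D PMF (kcal/mol).

    windows[i] holds the z samples of the simulation biased by
    0.5 * k_spring * (z - centers[i])**2 (assumed form of the tether).
    Returns (bin_centers, pmf), with the PMF minimum shifted to zero.
    """
    width = (z_max - z_min) / n_bins
    mids = [z_min + (b + 0.5) * width for b in range(n_bins)]

    # Pooled histogram and number of in-range samples per window.
    counts = [0] * n_bins
    n_used = []
    for zs in windows:
        used = 0
        for z in zs:
            b = int((z - z_min) // width)
            if 0 <= b < n_bins:
                counts[b] += 1
                used += 1
        if used == 0:
            raise ValueError("a window has no samples inside [z_min, z_max)")
        n_used.append(used)

    # Boltzmann factor of each window's bias at each bin center.
    boltz = [[math.exp(-0.5 * k_spring * (m - c) ** 2 / kT) for m in mids]
             for c in centers]

    f = [0.0] * len(windows)  # free-energy offset of each window
    for _ in range(max_iter):
        weights = [n * math.exp(fi / kT) for n, fi in zip(n_used, f)]
        # Unbiased (unnormalized) probability estimate for every bin.
        prob = []
        for b in range(n_bins):
            denom = sum(w * bz[b] for w, bz in zip(weights, boltz))
            prob.append(counts[b] / denom if denom > 0.0 else 0.0)
        # Self-consistent update of the window offsets.
        new_f = [-kT * math.log(sum(p * bz[b] for b, p in enumerate(prob)))
                 for bz in boltz]
        new_f = [x - new_f[0] for x in new_f]
        converged = max(abs(x - y) for x, y in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break

    # Invert the probability distribution to obtain the PMF.
    pmf = [-kT * math.log(p) if p > 0.0 else math.inf for p in prob]
    lowest = min(pmf)
    return mids, [w - lowest for w in pmf]
```

A 2-D version follows the same two self-consistent equations, with bins indexed by the positions of both ions and with each window biased by the sum of two tethering potentials.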

As is often the case with atomistic MD simulations, the studied events occur at natural time

scales which cannot be directly probed using available computer power. Taking the example of

ClC-0, a typical pore current of 0.5 pA [30] corresponds to one elementary charge exiting the pore

every 200–300 ns. This includes the time for an ion to diffuse to the pore’s mouth as well as the

time it takes for an ion (not necessarily the same one) to subsequently exit on the other side. Since

we are only interested in the “permeation” phase, the conservative 300 ns time scale is a high upper

bound estimate of the time needed to observe one conduction event. Nevertheless, even assuming

the shortest event times, an equilibrium simulation would only provide a tiny number of complete

conduction events, if any at all. Umbrella sampling allows us to sample high energy states with a

much higher probability than is possible with equilibrium trajectories. With sufficient sampling,

this methodology permits us to accurately determine the energy barriers as well as the physical

pathway taken by the ions as they permeate through the pore. The accelerated sampling does

come with restrictions: non-equilibrium dynamic trajectories cannot be directly observed. And

while we provide a detailed statistical description of permeation across the ClC pore, the limitation

of our method is that we cannot directly resolve the concerted motions of individual residues, water

molecules and permeant ions as they occur during a conduction event.
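
As a rough check on the time scale discussed above, the mean time between elementary charges follows directly from the current, t = e/I. The short Python sketch below performs this conversion; the function name is hypothetical and only the 0.5 pA value comes from the text.

```python
# Elementary charge in coulombs (exact SI value).
E_CHARGE = 1.602176634e-19


def time_per_charge(current_amperes):
    """Mean time in seconds between elementary charges carried by a current."""
    if current_amperes <= 0:
        raise ValueError("current must be positive")
    return E_CHARGE / current_amperes


if __name__ == "__main__":
    t = time_per_charge(0.5e-12)  # 0.5 pA, the ClC-0 current quoted in the text
    print(f"{t * 1e9:.0f} ns per elementary charge")
```

For 0.5 pA this gives about 320 ns per elementary charge, far longer than the hundreds of picoseconds covered by each sampling simulation.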

A.2.5 Computing a slow process in little time

Our goal is to characterize the way in which Cl− passes through the ClC transporter under ex-

tremely favorable conditions: open gates, and no proton-coupling to slow the dynamics. For this,

the energy profile of Cl− ions going through the selectivity filter will be measured. When one

wishes to understand a long timescale process using computer simulation, MD trajectories of the

atomic motions, by themselves, are of limited interest. In the present case, we know that a Cl− ion

will conduct across E. coli ClC, in a favorable electrochemical gradient, within roughly one mil-

lisecond [1]. This timescale is out of the reach of contemporary atomic-level simulations running

on the best currently available hardware. But even if we could simulate the complete translocation

of a Cl− ion across one of ClC’s pores, the computed trajectory would represent only a single event

(or a handful at most). As we know, nature allows many paths between two end states, all with

different probabilities. Computing just one of these paths leaves us with no information about its

statistical relevance and prevents us from reaching any meaningful conclusion about the studied

mechanism.

Fortunately, one does not require a full description of the atomic trajectories in order to under-

stand the translocation process. What is really needed is the energy profile – the energy mountains

and valleys – experienced by anions as they conduct through ClC. A complete description of the

free energy profile would provide us with all the statistical information about the ion translocation

process, since the free energy for a given ion state (i.e. the set of positions along the transporter

occupied by ions at a given instant) can be interpreted as a measure of the probability of occurrence

of that ion state. Now, in order to compute the energy profile for a conduction event, we will in

fact need to carry out calculations of simulation trajectories. There are, however, two superfluous

aspects of real-time simulation that can be taken advantage of in order to allow the calculation

of the relevant energy profile using minimal computational requirements: (1) not all degrees of

freedom are important to the problem at hand and (2) not all parts of a trajectory are sampled

equally by equilibrium simulations. In fact, most slow processes spend the overwhelming majority

of the time in a small favorable region of phase space. If the system’s time evolution can be biased

such that states with high energy are sampled as often as the more favorable low energy states, we

could calculate the energy profile of the system projected along a chosen reaction coordinate much

faster than it would take to perform a full equilibrium simulation.

The potential of mean force (PMF) is the desired quantity that describes the overall free energy

profile experienced by the system as it evolves along one or more reaction coordinates, averaged

over all other degrees of freedom. In our case, this means that the PMF will describe an energy as

a function of the position of ions in the transporter, averaged over all possible conformations that

the ClC transporter itself can take in order to accommodate the permeant ions.
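
In formula form, and using the standard relations of umbrella sampling (the symbols are introduced here for illustration and do not appear elsewhere in the text), the PMF W along a coordinate ξ is obtained from the equilibrium probability density ρ(ξ), and a simulation i carried out with a biasing potential w_i yields the same quantity from its biased density ρ_i^b once the bias is subtracted. The harmonic form of w_i with the factor 1/2 is an assumption about the convention used for the spring constant k.

```latex
\begin{align}
  W(\xi) &= -k_B T \ln \rho(\xi) + C, \\
  W(\xi) &= -k_B T \ln \rho_i^{\mathrm{b}}(\xi) - w_i(\xi) + F_i,
  \qquad w_i(\xi) = \tfrac{1}{2}\, k \,(\xi - \xi_i)^2 .
\end{align}
```

Here C is an arbitrary constant and the offsets F_i are fixed by requiring that overlapping simulations agree, which is what the weighted histogram analysis method does. For the two-ion PMF, ξ stands for the pair of positions (z1, z2) and the bias is the sum of the two tethering potentials.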

A.3 Results

A.3.1 Energetics of Selectivity and Conduction

In this section, we describe the energetics involved in ion conduction. First, we introduce a detailed

calculation of the free energy profile that governs the coordinated conduction of two simultaneous

Cl− ions in the pore in the absence of an external field. This reveals the respective motions of the

two ions as they permeate through the pore, and explains the anionic binding sites in ClC. We

then analyze the interaction energies between the individual ions and the major constituents of the

ClC transporter, providing evidence that confirms some common assumptions and dispels others.

Potential of mean force

Our map of the potential is shown for each pore (chains A and B) in Fig. A.4. The maps describe

the energetics involved in the transition between an initial state I and a final state II. In state I, a

first Cl− is bound to the transporter’s central binding site (determined from the crystal structure)

and a second Cl− is positioned in the transporter’s cytoplasmic entrance; in state II, the first

Cl− is positioned in the periplasmic exit and the second Cl− is bound in the central binding site.

This process effectively describes a conduction event since states I and II share identical pore

configurations except that in state I, the extra Cl− is at the bottom (cytoplasmic end) of the pore, and in state II, it is at

the top (periplasmic end). In order for a new translocation to occur, one simply has to wait for a third ion to

diffuse from the bulk solution into either of the pore’s entrances, after which the transition I→II or

II→I can repeat. In the figure, distances along the z-axis (which approximately follows the pore)

are measured with respect to the center of the lipid membrane, with positive values toward the

periplasm. For the naming of the ClC binding sites, we follow the nomenclature of Dutzler et al.

[45]: Scen is the central binding site coordinated primarily by Ser 107 and Tyr 445, Sint is at the

cytoplasmic end of the pore and Sext is the outer binding site on the other side, for which Cl−

competes with Glu 148. The location of the Cl− binding sites as well as the z-axis coordinates are

indicated for reference in Fig. A.5.

In order to test the consistency and validity of our results, we have measured the PMF for

conduction in both monomers of the ClC dimer. Apart from a reconstructed N-terminus (for

Figure A.4: Potential of mean force for both pores (chains A and B) of ClC as a function of the positions along the z-axis of the top and bottom Cl− ions. Contours represent slices of 1 kcal/mol at a spatial resolution of 0.15 Å. Red indicates a low energy while blue denotes high energy. The gray background denotes areas not sampled and the black contour represents energies above a threshold. The positions of the three Cl− binding sites from the x-ray structure of Dutzler et al. [45] are identified by straight lines. The minimum energy path, corresponding to the most likely conduction pathway, is shown for each pore as a thick black line.

Figure A.5: The ClC pore, showing the positions of the permeating Cl− ions (small balls), the backbone amide hydrogens involved in permeation (large spheres) and the related non-helical backbone (tube), the two pore-lining polar residues Tyr 445 and Ser 107 and the two charged residues Glu 148 and Arg 147 possibly involved in the fast gate. The location of the three crystal binding sites as well as the z axis coordinates are displayed for reference (with z = 0 indicating the middle of the lipid bilayer).

chain A of the 1KPL PDB structure), the two monomers have the same spatial structure. We

therefore expect that the variations between the PMFs calculated for the two different pores are

caused, for the most part, by the different distribution of microstates being sampled, rather than

by macroscopic conformational differences between the two pores. There was one exception, in

which a different behavior was observed between the two monomers: in chain A, when the two ions

were kept close together (≈ 4 Å apart) at one specific location, the top ion was temporarily pushed

sideways in the pore to maintain its distance from the bottom ion, disrupting the PMF for chain

A at that location (ztop ≈ −1.5 Å and zbottom ≈ −6 Å); this was not observed anywhere

else. In describing the PMF of ClC, we will only concern ourselves with features common to both

pores.

Looking at the two PMFs, we notice similar characteristics. First of all, the PMFs share a

similar two-ion pathway (shown as a black line) for ion translocation across the pore. From the

PMF maps, we can hypothesize a probable sequence of events describing ion permeation across

the transporter that proceeds in a semi-stepwise manner, as shown in Fig. A.6. Initially, a Cl−

(Cl1) is bound to Scen and a second Cl− (Cl2) enters the pore from the cytoplasm until it reaches

the general area of Sint. At this point, Cl2 stays put and repels Cl1 while the latter attempts to

overcome a barrier located between Scen and Sext. Once this barrier has been crossed, Cl2 can inch

closer to the central binding site to a location about 2.5 Å below it (which we call S−). Cl1 and

Cl2 then move simultaneously and gradually toward their intermediate destinations of 1.5 Å above

Sext (which we call S+) and Scen, respectively. With Cl2 now tightly bound to the crystal binding

site Scen, Cl1 is free to exit the pore into the periplasm.

For both pores, the second Cl− enters or exits while the first Cl− is tightly bound to Scen.

We refer to this as a “king of the hill” mechanism, in which an ion is ultimately always

left to dominate the central region of the pore. Also, in both cases, the PMF exhibits minima in

regions that correspond to two of the crystal structure binding sites being simultaneously occupied

(located at the intersections of the straight lines in Fig. A.4): Sint and Scen can be occupied

simultaneously and so can Sext and Sint. On the other hand, the simultaneous occupation of

Scen and Sext is energetically unfavorable compared to the other possibilities (the appearance of a

minimum of the potential of mean force at that location for chain A corresponds to the top ion

Figure A.6: Sequence of steps that occur during a conduction event in which an ion is moved from the cytoplasmic side to the periplasmic side of the transporter. The schematic locations of the crystal structure binding sites (defined in the text) are shown as lines.

being pushed to the side, as described earlier). Simultaneous occupation of these two sites had been

speculated by Dutzler et al. [45], despite the fact that the sites are only 4 Å apart, on the basis of

the observation that in some crystal structures, the authors observe simultaneous occupation of a

Cl− at Scen and of a charged oxygen from Glu 148’s carboxylic head group at Sext. Instead, we

observe two nearby stable intermediate states which involve alternate binding locations for either

ion: one where Sext and S− are occupied, and one where Scen and S+ are occupied. It comes as no

surprise that the location of S+ coincides perfectly with the location of Glu 148’s other carboxylic

oxygen, according to the crystal structures in which this residue is present. Occupancy of S+ and

S− has not been observed to date by x-ray diffraction.

Following the permeating Cl− ions along the most probable path, i.e., the path of minimum free

energy, we measure similar energy profiles for both pores. Fig. A.7 shows the PMF profile along

this path. Between the two pore entrances, the PMF profile is relatively flat (with 1–2 kcal/mol

fluctuations), permitting fast permeation. The main barrier in the profile occurs when a Cl−

moves between Sint and Scen, with a height of 3–4 kcal/mol (or 4–4.5 ± 2 kcal/mol between lowest

and highest energy in the pore). This barrier seems to be caused mainly by a lack of exposed

backbone amides in that specific area of the pore, for Cl− to interact with. The overall energy

barriers are consistent with those determined by a simulation of the KcsA potassium channel [13],

in which a maximum energy barrier of 2–3 kcal/mol was measured for K+ permeation (with three

simultaneous ions in the pore).
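
To put these barrier heights in perspective, the probability of finding the system at the top of a barrier of height ΔG, relative to the bottom, is given by the Boltzmann factor exp(−ΔG/kBT). The following is a minimal Python sketch of that estimate; the temperature of 300 K is an assumption (the simulation temperature is not stated in this section) and the function name is hypothetical.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_factor(barrier_kcal, temperature_k=300.0):
    """Relative population at the top of a free energy barrier (kcal/mol)."""
    return math.exp(-barrier_kcal / (K_B_KCAL * temperature_k))


if __name__ == "__main__":
    for barrier in (2.0, 3.0, 4.0):
        print(f"{barrier:.0f} kcal/mol: {boltzmann_factor(barrier):.4f}")
```

At 300 K, kBT is about 0.6 kcal/mol, so a barrier of 3–4 kcal/mol corresponds to a factor between roughly 0.001 and 0.007: such barrier-top states are rare in an unbiased trajectory but remain thermally accessible.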

Figure A.7: PMF profiles along the minimum energy pathway for both pores (chains A and B) with the locations of the two permeating Cl− ions indicated for each local minimum. The curve follows the PMF along the black lines of figure A.4 and reports the minimum value of the PMF measured between this path and the two parallel paths representing the cases where the two ions are displaced toward and away from each other by 0.15 Å each. This protocol generates a fairly accurate description of the minimum energy pathway except for a small region (between 10 and 12 Å along the path) of chain A where an optimal solution could not be reliably found. The free energy is measured with respect to a base configuration in which one Cl− is bound to Scen and the other Cl− has been exchanged with a water molecule in bulk solution.

Permeation pathway

By stitching together all of our local sampling simulations along the most probable path, we obtain

a picture of the physical pathway taken by Cl− as it crosses the transporter. The trajectory of the

top-most ion as it was moved up and down the transporter joins with that of the bottom-most ion,

resulting in the continuous trajectory shown in Fig. A.5. While only a full trajectory capturing

entire Cl− permeation events can provide the true sequence of events during conduction, an analysis

of the interaction energies between the transporter and the ions, using the stitched trajectories,

can reveal information about the role of the pore residues during permeation.

Fig. A.8 shows the electrostatic and van der Waals interaction energies between the Cl− ions

and their environment as a function of their position along the pore. The energies were averaged

over all four permeating ions and no appreciable variation was observed between the curves for the

different ions and pores. The weakening of electrostatic interactions by the polarization of water

molecules in the pore was not taken into account. The interaction energy of the ions with their

surroundings can be decomposed into separate contributions from the various components of the

protein, giving insight into the roles of these components in tuning channel energetics.
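
The kind of decomposition described above can be sketched as a sum of pairwise Coulomb and Lennard-Jones terms between one ion and a chosen group of atoms. The Python code below is a minimal illustration under stated assumptions: the dictionary layout, the function name, the CHARMM-style combination rules and the plain (unswitched) cutoff are hypothetical choices, and only the 16 Å cutoff is taken from the caption of Fig. A.8.

```python
import math

COULOMB_KCAL = 332.0636  # Coulomb constant in kcal*Angstrom/(mol*e^2)


def interaction_energy(ion, atoms, cutoff=16.0):
    """Return (electrostatic, van der Waals) energy in kcal/mol between
    one ion and a group of atoms, skipping pairs beyond `cutoff` (Angstrom).

    Each particle is a dict with keys: pos (x, y, z in Angstrom),
    q (charge in e), eps (well depth in kcal/mol, positive) and
    rmin_half (half of the minimum-energy distance, in Angstrom).
    """
    elec = 0.0
    vdw = 0.0
    for atom in atoms:
        r = math.dist(ion["pos"], atom["pos"])
        if r > cutoff or r == 0.0:
            continue
        # Bare Coulomb term; no dielectric screening is applied.
        elec += COULOMB_KCAL * ion["q"] * atom["q"] / r
        # Lennard-Jones term with geometric-mean well depth.
        eps = math.sqrt(ion["eps"] * atom["eps"])
        rmin = ion["rmin_half"] + atom["rmin_half"]
        ratio6 = (rmin / r) ** 6
        vdw += eps * (ratio6 * ratio6 - 2.0 * ratio6)
    return elec, vdw
```

Calling the function once per group (for example pore-lining residues, bulk protein, water) and averaging over trajectory frames at each ion position would give curves of the type shown in Fig. A.8.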

In Fig. A.8a, the energy contribution of the pore-lining residues is compared with that from

the rest of the protein excluding the pore (referred to as the bulk protein). While the pore residues

dominate the interaction with Cl−, the bulk protein still contributes a significant fraction of the

attractive interaction with Cl−. The bulk protein’s purpose appears to be to provide a barrier-less

and energetically favorable background for negatively-charged particles present in the transporter’s

pore.

Fig. A.8b shows the interaction energies between Cl− and the pore’s backbone and non-polar

residues (including glycine) as well as with the pore’s four polar and charged residues, as a function

of position along the pore. We note that the backbone and non-polar residues account for virtually

the entirety of the total integrated interaction energy with the permeating Cl−. The backbone and

non-polar residues of the pore provide a flat basin of attraction for anions, while the polar and

charged residues modulate the Cl− energy’s position dependence. Indeed, aside from the central

binding pocket in which Cl− is coordinated by polar residues and the periplasmic exit in which

charged residues form a putative gate, the transporter pore is lined in its entirety with non-polar,

Figure A.8: Interaction energy of a permeating Cl− with the various constituents of (a) the ClC transporter and (b) the pore region, calculated using a cutoff distance of 16 Å for non-bonded interactions. The standard deviation for the energy is ± 5 kcal/mol for pore-lining residues, ± 15 kcal/mol for water and ± 2 kcal/mol for bulk protein.

non-charged residues.

It is intriguing to note that the pore’s two conserved polar residues are present at the same

location and define the pore’s strongest Cl− binding site (Scen). The role of these two residues,

Ser 107 and Tyr 445, is open to speculation, but it is clear that they are not by themselves respon-

sible for the ClC transporter’s anion over cation selectivity since their interaction energy with Cl−

is not significant compared to the energy due to the strong electrical polarization of the protein.

We believe that the most compelling reason for the existence of these residues is to keep an anion

permanently in the pore in order to prevent such events as the formation of a proton-carrying

continuous water file stretching across the transporter or the passage of hydrophobic anions [138].

Indeed, Ser 107 and Tyr 445 provide an abrupt and significant narrowing of the pore simultaneously

with a very strong binding site for anions.

A.3.2 ClC Architecture

Non-Helical backbone and protein polarization

It has been echoed throughout the literature on ClC that the broken helix architecture stabilizes

Cl− through α helix electrostatic dipole interactions. While the α helix dipoles certainly contribute

to a favorable environment for Cl−, it is not certain that their role is fundamental in determining

the preferred Cl− binding sites [8]. The interaction energy between Cl− and helices F and N

(which both have their positive ends pointing toward Sext and have been credited for creating a

favorable binding location), excluding the interaction with the pore-lining helix-capping residues, is

shown in Fig. A.8a. This energy does not constitute a particularly prominent feature of the energy

profile controlling Cl− conduction and does not explain the transporter’s intricate broken-helix

architecture.

We believe that the broken-helix architecture stems from nature’s desire to expose the protein

backbone’s amide groups to the permeant ions. While the conventional picture of a membrane

protein is that of a bundle of parallel α helices, other structurally known transporter proteins also

exhibit a “broken helix” conformation. Notable examples are the potassium channels, the only

other ion channels of known structure, and the aquaporin family of water channels [59, 154]. In

these other channels, it is the protein’s backbone carbonyl groups which are exposed to the pore,

as opposed to backbone amide groups for the case of ClC. It must be noted that the amide - Cl−

interaction is by itself electrostatically unfavorable, but that the interaction between Cl− and the

total dipole moment of an amino acid’s backbone, when its amide group points toward Cl−, is

favorable over a region of a few Å. In all cases, the proteins seem to favor the presence of a

non-helical secondary structure in the pore region, making the backbone available for interaction with

solutes. The lack of a stable secondary structure would presumably require that the non-helical

segments be held in place at their ends by α helices solidly anchored inside the protein. A cursory

look at the location of the non-helical segments in ClC reveals that, with few exceptions, these

are all either at the surface of the protein and act as connecting loops, or are concentrated near

the pore. Further examination of the conserved genetic sequence shows that all three pore-lining

5-peptide segments end with a helix-breaking proline and are rich in small flexible hydrophobic

residues [44].

Given the small size of the pore region compared to the great size and complexity of the scaffold

that supports its non-helical structure (if one can consider the ClC protein as a scaffold), there

must be a strong incentive for channel proteins to expose their naked backbone. We believe that

there is. The discoveries of the first ion channel structures elicited surprise because of the absence

of significant charge in their pore regions, leading to the suggestion that strong charges would

be problematic because, although the channels would be very selective, the strong electrostatic

interactions could prevent the solute from unbinding from the pore. With this in mind, if we

consider that a channel’s idealized function in the absence of any bias is to provide a flat free energy

potential for solutes, mimicking the bulk solution, then the ideal channel (from a permeation point

of view) would have a continuous line of charge along which ions could glide. The important idea

here is that of a flat potential energy surface: the ion should not get particularly attached to any

location in the pore. In this respect, the backbone dipole moments provide a ladder of closely-

spaced, but weak, identical interactions. Such a configuration confers a much higher mobility to

Cl− than would a few isolated strong charges.

The role of the protein’s electrical polarization is not to be neglected, however. As we have

seen, most of the pore-ion attraction is caused by either backbone atoms or by non-polar residues.

The same is true of the interactions between the permeating Cl− ions and the channel as a whole

(except at the cytoplasmic end of the pore, where the interactions between Cl− and the positively

charged (+28 e) cytoplasmic side are important). At first, the role of the backbone atoms and

non-polar residues might seem strange. However, if the protein relied on polar residues to create

favorable basins of attraction for anions, there would always be the danger that if the side chains

of these residues were mobile enough, they would reorient themselves to be attractive to cations.

Furthermore, polar residues interact strongly with external electrical fields, and their collective

polarization might disrupt the field experienced by the permeating ions. Conversely, an external

field would disrupt their interactions with the permeating ions, affecting the selective behavior of

the channel. On the other hand, backbone atoms and non-polar residues do not react strongly

to external electrical fields or the presence of ions. In that sense, they provide a reliable “frozen”

interaction with the permeating ions, since their polarization is dependent more on the channel

architecture and on local interactions with neighboring residues, and less on external electrical

fields. These “weak” dipolar interactions become quite substantial when one accounts for the sheer

number of non-polar residues involved.

Interrupted water file geometry

The water geometry that we observed in the ClC pore did not conform to the single water file picture

expected of narrow channels. After equilibration, we found that the ClC pore instead encouraged

an interrupted double-file geometry of water molecules around the Cl− ions. In Fig. A.9, we have

superimposed the locations of the water oxygens in and around the ClC pore for all local sampling

simulations along the permeation pathway. One sees clear evidence that, on average, the water

molecules are localized into two distinct files. One of these files is always present and also carries

the permeating Cl−. The second file, only observed in the immediate proximity of Cl−, reflects

the observed fact that in most of the pore, permeating Cl− ions remained partially hydrated (one

water molecule above and below and up to three on the outer side).

Figure A.9: Overlay of the positions of the water molecules (blue) and Cl− (orange) for all the local sampling simulations along the permeation pathway showing the water double-file.

This partial hydration shell did not follow the permeating anions all the way, however. A

continuous water file would create many problems such as loss of selectivity against larger anions

(if the pore were wide throughout) and the conduction of protons if a bridge of connected water

molecules were to span the pore. Instead, the double-file is broken at the constriction at Scen, to

which a Cl− is quasi-permanently bound. An obvious advantage of the broken double-file geometry

is that the pore can offer an environment for permeating anions that is similar to that of the bulk

solution outside the pore, thereby reducing the free energy cost of removing the anion’s water shell

as it enters the pore from either end. The only exception is at the central binding site Scen where

the ion-pore attraction is very strong but there the ion can more easily part with its solvation shell.

Since the double water file is so advantageous, one may wonder why it is not observed in the

potassium channel. Theoretical studies on generalized channels suggest that while electrostatic

effects contribute to selectivity between ions of different charge, the discrimination between ions of

the same charge and valence can be controlled by the pore’s radius and size fluctuations [92], energetic

contributions that arise in addition to the dehydration energies at the pore entrance. These con-

siderations are important for cation channels where strong selectivity is crucial, but in ClC there

is little need to discriminate between different inorganic anions. Therefore one may imagine that,

unlike for the potassium channel, the ClC pore geometry attempts to optimize permeation rather

than inter-anion selectivity.

Multi-ion conduction

The pore’s central binding site binds to Cl− very tightly through the side chain hydroxyl groups of

Tyr 445 and Ser 107 as well as through the backbone dipole moments of Phe 357 and Ile 356, such

that a pore configuration devoid of the presence of anions is not possible in ClC. To dislodge the

central Cl−, its binding energy needs to be considerably reduced. The presence of additional anions

in the pore is thus required so that the transporter–Cl− attraction and the Cl−-Cl− repulsion can

balance each other. Our results confirm what had been previously suggested for ClC [129] and has

already been established for the K+ channel [13]: conduction across the transporter requires more

than one ion in the pore. In addition, our PMF describes a permeation process for two simultaneous

Cl− ions in the ClC pore, establishing that this number may be sufficient for Cl− conduction

in ClC.

With both ion channels/transporters of known structure (ClC and KcsA) exhibiting multi-ion

permeation, we can ask whether multi-ion permeation is an ion channel necessity or merely a

coincidence. On one hand, if the aim is to ensure that permeation can occur along a path with

a flat energy profile, then increasing the dimensionality of the PMF (i.e., adding essential degrees

of freedom) makes it easier to get around barriers. On the other hand, it is conceivable that one

could engineer an ion channel with single ion occupancy by relegating the degrees of freedom of the

extra ions to internal components of the channel. Multiple occupancy would in that case not be

a necessity. There is, however, a problem with single occupancy pores: if the single ion is allowed

to exit the channel, then the possibility arises that a continuous file of water across the channel

connects both sides of the cell membrane, possibly allowing for proton conduction to the detriment

of the cell.

A.4 Conclusion

We have mapped the energetics involved in the conduction of a pair of Cl− ions across the ClC

transporter. The result suggests that ion dynamics in the pore follows a “king of the hill” mecha-

nism, in which two Cl− ions compete over an energetically favorable central location in the pore.

This strategy would ensure that an ion is always left inside the pore to block it. During a conduc-

tion event, the ion configurations in the pore appear to evolve through a succession of four stable

states. The positions of the ions in these states coincide with the locations of three binding sites

observed by x-ray crystallography. In addition, we observe stable intermediate states in which the

ions are located at two novel locations, S− and S+.

Inspection of the interaction energies between the ClC transporter and the permeating Cl− ions reveals the importance of the protein’s overall polarization in making the pore attractive to anions. Indeed, backbone and non-polar residues account for a large majority of the attraction between Cl− and the transporter. Our calculations do not a priori support the common assumption that polar residues and specific α helix dipole interactions play a pivotal role in ensuring the pore’s anion-over-cation selectivity. Instead, we suggest that the main role for the ClC’s broken helix structure is to provide the pore region with a non-helical backbone structure, allowing anions to interact favorably with the exposed backbone’s electric dipole moment. This type of interaction has the advantage that it is rather evenly distributed along the pore and prevents Cl− from getting stuck in deep energy wells. We also suggest a novel role for the polar residues Ser 107 and Tyr 445 that is consistent with the nature of their measured interaction with Cl−: they ensure that the pore remains blocked by an anion at all times, preventing the formation of a continuous water file. These residues likely also contribute to the size selectivity of the pore and prevent the passage of hydrophobic particles; further experimental and theoretical work investigating selectivity [13, 50, 148] in ClCs will likely resolve this question.

Compared to the other ion channel family of known structure, the K+ channels, the ClC narrow pore region is longer and wider. Whereas KcsA has a very symmetrical pore and tight interactions with the permeating K+ ions, the ClC pore is irregular and accommodates partially solvated anions through most of its length. The consequence of these two diverging architectures is that ClC is much less selective between ion species of the same charge than KcsA is, consistent with a lack of evolutionary pressure in that direction. On the other hand, both channels share a peculiar feature: they both exhibit a broken helix architecture, resulting in their pores being lined almost exclusively with a non-helical backbone structure. This pore architecture has been previously shown to be a crucial ingredient for the efficient conduction and selectivity of ions in K+ channels, and here we observe the same for ClC. Based on these discoveries, one should expect to observe the exposed backbone architecture in the pores of the many ion channel and transporter structures yet to come.

References

[1] Accardi, A., L. Kolmakova-Partensky, C. Williams, and C. Miller. 2004. Ionic currents mediated by a prokaryotic homologue of CLC Cl− channels. Journal of General Physiology 123:109–119.

[2] Accardi, A. and C. Miller. 2004. Secondary active transport mediated by a prokaryotic homologue of ClC Cl− channels. Nature 427:803–807.

[3] Accardi, A. and M. Pusch. 2003. Conformational changes in the pore of CLC-0. Journal of General Physiology 122:277–293.

[4] Adams, M. W. M. 1990. The structure and function of iron-hydrogenase. Biochimica et Biophysica Acta 1020:115–145.

[5] Allocatelli, C. T., F. Cutruzzola, A. Brancaccio, B. Vallone, and M. Brunori. 1994. Engineering Ascaris hemoglobin oxygen affinity in sperm whale myoglobin: role of tyrosine B10. FEBS Letters 352:63–66.

[6] Amara, P., P. Andreoletti, H. M. Jouve, and M. J. Field. 2001. Ligand diffusion in the catalase from Proteus mirabilis: A molecular dynamics study. Protein Science 10:1927–1935.

[7] Appleby, C. A. 1984. Leghemoglobin and rhizobium respiration. Ann. Rev. Plant Physiol. 35:443–478.

[8] Aqvist, J., H. Luecke, F. A. Quiocho, and A. Warshel. 1991. Dipoles localized at helix termini of protein stabilize charges. Proceedings of the National Academy of Sciences, USA 88:2026–2030.

[9] Aqvist, J. and V. Luzhkov. 2000. Ion permeation mechanism of the potassium channel. Nature 404:881–884.

[10] Austin, R. H., K. W. Beeson, L. Eisenstein, H. Frauenfelder, and I. C. Gunsalus. 1975. Dynamics of ligand binding to myoglobin. Biochemistry 14:5355–5373.

[11] Banushkina, P. and M. Meuwly. 2005. Free-energy barriers in MbCO rebinding. Journal of Physical Chemistry B 109:16911–16917.

[12] Berneche, S. and B. Roux. 2000. Molecular dynamics of the KcsA K+ channel in a bilayer membrane. Biophysical Journal 78:2900–2917.

[13] Berneche, S. and B. Roux. 2001. Energetics of ion conduction through the K+ channel. Nature 414:73–77.

[14] Beveridge, D. L. and F. M. DiCapua. 1989. Free energy via molecular simulation: Applications to chemical and biological systems. Annual Review of Biophysics and Biophysical Chemistry 18:431–492.

[15] Boichenko, V. A., E. Greenbaum, and M. Seibert. 2004. Hydrogen production by photosynthetic microorganisms. In Photoconversion of Solar Energy: Molecular to Global Photosynthesis. M. D. Archer and J. Barber, editors. Imperial College Press, London. 397–452.

[16] Bolognesi, M., S. Onesti, G. Gatti, A. Coda, P. Ascenzi, and M. Brunori. 1989. Aplysia limacina myoglobin. crystallographic analysis at 1.6 Å resolution. Journal of Molecular Biology 205:529–544.

[17] Bossa, C., A. Amadei, I. Daidone, M. Anselmi, B. Vallone, M. Brunori, and A. D. Nola. 2005. Molecular dynamics simulation of sperm whale myoglobin: Effects of mutations and trapped CO on the structure and dynamics of cavities. Biophysical Journal 89:465–474.

[18] Bossa, C., M. Anselmi, D. Roccatano, A. Amadei, B. Vallone, M. Brunori, and A. D. Nola. 2004. Extended molecular dynamics simulation of the carbon monoxide migration in sperm whale myoglobin. Biophysical Journal 86:3855–3862.

[19] Bostick, D. L. and M. L. Berkowitz. 2004. Exterior site occupancy infers chloride-induced proton gating in a prokaryotic homolog of the ClC chloride channel. Biophysical Journal 87:1686–1696.

[20] Bourgeois, D., B. Vallone, F. Schotte, A. Arcovito, A. E. Miele, G. Sciara, M. Wulff, P. Anfinrud, and M. Brunori. 2003. Complex landscape of protein structural dynamics unveiled by nanosecond Laue crystallography. Proceedings of the National Academy of Sciences, USA 100:8704–8709.

[21] Brunori, M. 2001. Nitric oxide moves myoglobin centre stage. Trends in Biochemical Sciences 26:209–210.

[22] Brunori, M., D. Bourgeois, and B. Vallone. 2004. The structural dynamics of myoglobin. Journal of Structural Biology 147:223–234.

[23] Brunori, M. and Q. H. Gibson. 2001. Cavities and packing defects in the structural dynamics of myoglobin. EMBO Reports 2:676–679.

[24] Brunori, M., B. Vallone, F. Cutruzzola, C. Travaglini-Allocatelli, J. Berendzen, K. Chu, R. M. Sweet, and I. Schlichting. 2000. The role of cavities in protein dynamics: Crystal structure of a photolytic intermediate of a mutant myoglobin. Proceedings of the National Academy of Sciences, USA 97:2058–2063.

[25] Buhrke, T., O. Lenz, N. Krauss, and B. Friedrich. 2005. Oxygen tolerance of the H2-sensing [NiFe] hydrogenase from Ralstonia eutropha H16 is based on limited access of oxygen to the active site. Journal of Biological Chemistry 280:23791–23796.

[26] Calhoun, D. B., J. M. Vanderkooi, G. V. Woodrow 3rd, and S. W. Englander. 1983. Penetration of dioxygen into proteins studied by quenching of phosphorescence and fluorescence. Biochemistry 22:1526–1532.

[27] Carlson, M. L., R. M. Regan, and Q. H. Gibson. 1996. Distal cavity fluctuations in myoglobin: Protein motion and ligand diffusion. Biochemistry 35:1125–1136.

[28] Case, D. A. and M. Karplus. 1979. Ligands binding to heme proteins. Journal of Molecular Biology 132:353–368.

[29] Chatfield, M. D., K. N. Walda, and D. Magde. 1990. Activation parameters for ligand escape from myoglobin proteins at room temperature. Journal of the American Chemical Society 112:4680–4687.

[30] Chen, T.-Y. and C. Miller. 1996. Nonequilibrium gating and voltage-dependence of the ClC-0 Cl− channel. Journal of General Physiology 108:237–250.

[31] Cohen, J., A. Arkhipov, R. Braun, and K. Schulten. 2006. Imaging the migration pathways for O2, CO, NO, and Xe inside myoglobin. Biophysical Journal 91:1844–1857.

[32] Cohen, J., K. Kim, P. King, M. Seibert, and K. Schulten. 2005a. Finding gas diffusion pathways in proteins: Application to O2 and H2 transport in CpI [FeFe]-hydrogenase and the role of packing defects. Structure 13:1321–1329.

[33] Cohen, J., K. Kim, M. Posewitz, M. L. Ghirardi, K. Schulten, M. Seibert, and P. King. 2005b. Molecular dynamics and experimental investigation of H2 and O2 diffusion in [Fe]-hydrogenase. Biochemical Society Transactions 33:80–82.

[34] Cohen, J. and K. Schulten. 2004. Mechanism of anionic conduction across ClC. Biophysical Journal 86:836–845.

[35] Cohen, J. and K. Schulten. 2007. O2 migration pathways in monomeric globins are determined by residue composition, not tertiary structure. Submitted.

[36] Connolly, M. L. 1983. Solvent-accessible surfaces of proteins and nucleic acids. Science 221:709–713.

[37] Cooper, G. and W. Boron. 1998. Effect of PCMBS on CO2 permeability of Xenopus Oocytes expressing aquaporin 1 or its C189S mutant 275:C1481–C1486.

[38] Corry, B. and S. Chung. 2005. Influence of protein flexibility on the electrostatic energy landscape in gramicidin A. European Biophysics Journal 34:208–216.

[39] Corry, B., M. O’Mara, and S.-H. Chung. 2004. Conduction mechanisms of chloride ions in ClC-type channels. Biophysical Journal 86:846–860.

[40] Czerminski, R. and R. Elber. 1991. Computational studies of ligand diffusion in globins: I. leghemoglobin. PROTEINS: Structure, Function, and Genetics 10:70–80.

[41] Dantsker, D., C. Roche, U. Samuni, G. Blouin, J. S. Olson, and J. M. Friedman. 2005. The position 68(E11) side chain in myoglobin regulates ligand capture, bond formation with heme iron, and internal movement into the xenon cavities. Journal of Biological Chemistry 280:38740–38755.

[42] Doyle, D. A., J. M. Cabral, R. A. Pfuetzer, A. Kuo, J. M. Gulbis, S. L. Cohen, B. T. Chait, and R. MacKinnon. 1998. The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 280:69–77.

[43] Duff, A., A. E. Cohen, P. J. Ellis, J. A. Kuchar, D. B. Langley, E. M. Shepard, D. M. Dooley, H. C. Freeman, and J. M. Guss. 2003. The crystal structure of Pichia pastoris lysyl oxidase. Biochemistry 42:15148–15157.

[44] Dutzler, R., E. B. Campbell, M. Cadene, B. T. Chait, and R. MacKinnon. 2002. X-ray structure of a ClC chloride channel at 3.0 Å reveals the molecular basis of anion selectivity. Nature 415:287–294.

[45] Dutzler, R., E. B. Campbell, and R. MacKinnon. 2003. Gating the selectivity filter in ClC chloride channels. Science 300:108–112.

[46] Eargle, J. and Z. Luthey-Schulten. 2006. Visualizing the dual space of biological molecules 30:219–226.

[47] Elber, R. and M. Karplus. 1990. Enhanced sampling in molecular dynamics: Use of the time-dependent Hartree approximation for a simulation of carbon monoxide diffusion through myoglobin. Journal of the American Chemical Society 112:9161–9175.

[48] Estevez, R. and T. J. Jentsch. 2002. ClC chloride channels: correlating structure with function. Current Opinion in Structural Biology 12:531–539.

[49] Fahlke, C. 2001. Ion permeation and selectivity in ClC-type chloride channels. American Journal of Physiology – Renal Physiology 280:F748–F757.

[50] Fahlke, C., H. Yu, C. L. Beck, T. R. Rhodes, and A. L. George, Jr. 1997. Pore-forming segments in voltage-gated chloride channels. Nature 390:529–532.

[51] Fan, H.-J. and M. B. Hall. 2001. A capable bridging ligand for Fe-only hydrogenase: Density functional calculations of a low-energy route for heterolytic cleavage and formation of dihydrogen. Journal of the American Chemical Society 123:3828–3829.

[52] Feher, V. A., E. P. Baldwin, and F. W. Dahlquist. 1996. Access of ligand to cavities within the core of a protein is rapid. Nature Structural Biology 3:516–521.

[53] Ferroni, S., C. Marchini, M. Nobile, and C. Rapisarda. 1997. Characterization of an inwardly rectifying chloride conductance expressed by cultured rat cortical astrocytes. Glia 21:217–227.

[54] Flogel, U., M. W. Merx, A. Godecke, U. K. M. Decking, and J. Schrader. 2001. Myoglobin: A scavenger of bioactive NO. Proceedings of the National Academy of Sciences, USA 98:735–740.

[55] Flynn, T., M. L. Ghirardi, and M. Seibert. 2002. Accumulation of O2-tolerant phenotypes in H2-producing strains of Chlamydomonas reinhardtii by sequential applications of chemical mutagenesis and selection. International Journal of Hydrogen Energy 27:1421–1430.

[56] Frauenfelder, H., B. H. McMahon, R. H. Austin, K. Chu, and J. T. Groves. 2001. The role of structure, energy landscape, dynamics, and allostery in the enzymatic function of myoglobin. Proceedings of the National Academy of Sciences, USA 98:2370–2374.

[57] Frauenfelder, H., B. H. McMahon, and P. W. Fenimore. 2003. Myoglobin: The hydrogen atom of biology and a paradigm of complexity. Proceedings of the National Academy of Sciences, USA 100:8615–8617.

[58] Friedrich, T., T. Breiderhoff, and T. J. Jentsch. 1999. Mutational analysis demonstrates that ClC-4 and ClC-5 directly mediate plasma membrane currents. Journal of Biological Chemistry 274:896–902.

[59] Fu, D., A. Libson, L. J. W. Miercke, C. Weitzman, P. Nollert, J. Krucinski, and R. M. Stroud. 2000. Structure of a glycerol conducting channel and the basis for its selectivity. Science 290:481–486.

[60] Garry, D. J., S. B. Kanatous, and P. P. A. Mammen. 2003. Emerging roles for myoglobin in the heart. Trends in Cardiovascular Medicine 13:111–116.

[61] Garry, D. J., A. Meeson, Z. Yan, and R. S. Williams. 2000. Life without myoglobin. Cellular and Molecular Life Sciences 57:896–898.

[62] Gerber, R., V. Buch, and M. Ratner. 1982. Time-dependent self-consistent field approximation for intramolecular energy transfer. I. formulation and application to dissociation of van der Waals molecules. Journal of Chemical Physics 94:3022–3030.

[63] Ghirardi, M. L., J. Cohen, P. King, K. Schulten, K. Kim, and M. Seibert. 2006. [FeFe]-hydrogenases and photobiological hydrogen production. In Solar Hydrogen and Nanotechnology. L. Vayssieres, editor, volume 6340 of Proceedings of the Society of Photo-Optical Instrumentation Engineers, 253–258.

[64] Ghirardi, M. L., P. W. King, M. C. Posewitz, P. C. Maness, A. Fedorov, K. Kim, J. Cohen, K. Schulten, and M. Seibert. 2005. Approaches to developing biological H2-photoproducing organisms and processes. Biochemical Society Transactions 33:70–72.

[65] Ghirardi, M. L., L. Zhang, J. W. Lee, T. Flynn, M. Seibert, E. Greenbaum, and A. Melis. 2000. Microalgae: A green source of renewable H2. Trends in Biotechnology 18:506–511.

[66] Gibson, Q. H. and S. Ainsworth. 1957. Photosensitivity of haem compounds. Nature 180:1416–1417.

[67] Gibson, Q. H., R. Regan, R. Elber, J. S. Olson, and T. E. Carver. 1992. Distal pocket residues affect picosecond ligand recombination in myoglobin. Journal of Biological Chemistry 267:22022–22034.

[68] Giuffre, A., E. Forte, M. Brunori, and P. Sarti. 2005. Nitric oxide, cytochrome c oxidase and myoglobin: Competition and reaction pathways. FEBS Letters 579:2528–2532.

[69] Gower, M., J. Cohen, J. Phillips, R. Kufrin, and K. Schulten. 2006. Managing biomolecular simulations in a grid environment with NAMD-G. In Proceedings of the 2006 TeraGrid Conference. In press.

[70] Grayson, P., E. Tajkhorshid, and K. Schulten. 2003. Mechanisms of selectivity in channels and enzymes studied with interactive molecular dynamics. Biophysical Journal 85:36–48.

[71] Gullingsrud, J., R. Braun, and K. Schulten. 1999. Reconstructing potentials of mean force through time series analysis of steered molecular dynamics simulations. Journal of Computational Physics 151:190–211.

[72] Hargrove, M., J. Barry, E. Brucker, M. Berry, G. Phillips, Jr., J. Olson, R. Arredondo-Peter, J. Dean, R. Klucas, and G. Sarath. 1997. Characterization of recombinant soybean leghemoglobin a and apolar distal histidine mutants. Journal of Molecular Biology 266:1032–1042.

[73] Harutyunyan, H. E., T. N. Safonova, I. P. Kuranova, A. N. Popov, A. V. Teplyakov, G. V. Obmolova, A. A. Rusakov, B. K. Vainshtein, G. G. Dodson, J. C. Wilson, and M. F. Perutz. 1995. The structure of deoxy- and oxy-leghaemoglobin from lupin. Journal of Molecular Biology 251:104–115.

[74] Hille, B. 1992. Ionic channels of excitable membranes. Sinauer Associates, Sunderland, MA, second edition.

[75] Huang, X. and S. G. Boxer. 1994. Discovery of new ligand binding pathways in myoglobin by random mutagenesis. Nature Structural Biology 1:226–229.

[76] Hub, J. S. and B. L. de Groot. 2006. Does CO2 permeate through Aquaporin-1? Biophysical Journal 91:842–848.

[77] Hummer, G., F. Schotte, and P. A. Anfinrud. 2004. Unveiling functional protein motions with picosecond x-ray crystallography and molecular dynamics simulations. Proceedings of the National Academy of Sciences, USA 101:15330–15334.

[78] Humphrey, W., A. Dalke, and K. Schulten. 1996. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics 14:33–38.

[79] Iyer, R., T. M. Iverson, A. Accardi, and C. Miller. 2002. A biological role for prokaryotic ClC chloride channels. Nature 419:715–718.

[80] Jentsch, T. J., T. Friedrich, A. Schriever, and H. Yamada. 1999. The ClC chloride channel family. Pflugers Archiv – European Journal of Physiology 437:783–795.

[81] Jiang, Y., A. Lee, J. Chen, M. Cadene, B. T. Chait, and R. MacKinnon. 2002. Crystal structure and mechanism of a calcium-gated potassium channel. Nature 417:515–522.

[82] Jiang, Y., A. Lee, J. Chen, M. Cadene, B. T. Chait, and R. MacKinnon. 2003. X-ray structure of a voltage-dependent K+ channel. Nature 423:33–41.

[83] Johnson, B. J., J. Cohen, R. W. Welford, A. R. Pearson, K. Schulten, J. P. Klinman, and C. M. Wilmot. 2007. Lessons on substrate specificity: The crystal structure of Hansenula polymorpha copper-containing amine oxidase in complex with xenon. Nature Chemical Biology. In preparation.

[84] Kale, L., R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, and K. Schulten. 1999. NAMD2: Greater scalability for parallel molecular dynamics. Journal of Computational Physics 151:283–312.

[85] Kendrew, J. C., R. E. Dickerson, B. E. Strandberg, R. G. Hart, D. R. Davies, D. C. Phillips, and V. C. Shore. 1960. Structure of myoglobin: A three-dimensional Fourier synthesis at 2 Angstrom resolution. Nature 185:422–427.

[86] Khalili-Araghi, F., E. Tajkhorshid, and K. Schulten. 2006. Dynamics of K+ ion conduction through Kv1.2. Biophysical Journal 91:L72–L74.

[87] King, P. W., D. Svedruzic, J. Cohen, K. Schulten, M. Seibert, and M. L. Ghirardi. 2006. Structural and functional investigations of biological catalysts for optimization of solar-driven, H2 production systems. In Solar Hydrogen and Nanotechnology. L. Vayssieres, editor, volume 6340 of Proceedings of the Society of Photo-Optical Instrumentation Engineers, 259–267.

[88] Kocher, J.-P., M. Prevost, S. J. Wodak, and B. Lee. 1996. Properties of the protein matrix revealed by the free energy of cavity formation. Structure 4:1517–1529.

[89] Kollman, P. 1993. Free energy calculations: Applications to chemical and biochemical phenomena. Chemical Reviews 93:2395–2417.

[90] Kottalam, J. and D. A. Case. 1988. Dynamics of ligand escape from the heme pocket of myoglobin. Journal of the American Chemical Society 110:7690–7697.

[91] Kumar, S., D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg. 1992. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. Journal of Computational Chemistry 13:1011–1021.

[92] Laio, A. and V. Torre. 1999. Physical Origin of Selectivity in Ionic Channels of Biological Membranes. Biophysical Journal 76:129–148.

[93] Lakowicz, J. and G. Weber. 1973. Quenching of fluorescence by oxygen. probe for structural fluctuations in macromolecules. Biochemistry 12:4161–4170.

[94] Lemon, B. J. and J. W. Peters. 1999. Binding of exogenously added carbon monoxide at the active site of the iron-only hydrogenase (CpI) from Clostridium pasteurianum. Biochemistry 38:12969–12973.

[95] Lin, C.-W. and T.-Y. Chen. 2003. Probing the pore of ClC-0 by substituted cysteine accessibility method using methane thiosulfonate reagents. Journal of General Physiology 122:147–159.

[96] Liong, E. C. 1999. Structural and functional analysis of proximal pocket mutants of sperm whale myoglobin. Ph.D. thesis, Rice University, Houston, TX.

[97] Liong, E. C., Y. Dou, E. E. Scott, J. S. Olson, and G. N. Phillips. 2001. Waterproofing the heme pocket. Journal of Biological Chemistry 276:9093–9100.

[98] Ludewig, U., T. J. Jentsch, and M. Pusch. 1997. Inward rectification in ClC-0 chloride channels caused by mutations in several protein regions. Journal of General Physiology 110:165–171.

[99] MacKerell, Jr., A., D. Bashford, M. Bellott, R. L. Dunbrack, Jr., J. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, I. W. E. Reiher, B. Roux, M. Schlenkrich, J. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B 102:3586–3616.

[100] MacKerell, Jr., A. D., D. Bashford, M. Bellott, J. R. L. Dunbrack, J. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, B. Roux, M. Schlenkrich, J. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. 1992. Self-consistent parameterization of biomolecules for molecular modeling and condensed phase simulations. FASEB Journal 6:A143–A143.

[101] Maduke, M., C. Miller, and J. A. Mindell. 2000. A decade of ClC chloride channels: structure, mechanism, and many unsettled questions. Annual Review of Biophysics and Biomolecular Structure 29:411–438.

[102] Maurus, R., C. Overall, R. Bogumil, Y. Luo, A. Mauk, M. Smith, and G. Brayer. 1997. A myoglobin variant with a polar substitution in a conserved hydrophobic cluster in the heme binding pocket. Biochimica et Biophysica Acta 1341:1–13.

[103] Mertens, R. and A. Liese. 2004. Biotechnological applications of hydrogenases. Current Opinion in Biotechnology 15:343–348.

[104] Merx, M. W., A. Godecke, U. Flogel, and J. Schrader. 2005. Oxygen supply and nitric oxide scavenging by myoglobin contribute to exercise endurance and cardiac function. FASEB Journal 19:1015–1017.

[105] Miller, C. 1982. Open-state substructure of single chloride channels from Torpedo electroplax. Philosophical Transactions of the Royal Society of London B. (Biological Sciences) 299:401–411.

[106] Miller, C. 2003. Reading eukaryotic function through prokaryotic spectacles. Journal of General Physiology 122:129–131.

[107] Miloshevsky, G. V. and P. C. Jordan. 2004. Anion pathway and potential energy profiles along curvilinear bacterial ClC Cl− pores: Electrostatic effects of charged residues. Biophysical Journal 86:825–835.

[108] Mindell, J. A., M. Maduke, C. Miller, and N. Grigorieff. 2001. Projection structure of a ClC-type chloride channel at 6.5 Å resolution. Nature 409:219–223.

[109] Montet, Y., P. Amara, A. Volbeda, X. Vernede, E. C. Hatchikian, M. J. Field, M. Frey, and J. C. Fontecilla-Camps. 1997. Gas access to the active site of Ni-Fe hydrogenase probed by x-ray crystallography and molecular dynamics. Nature Structural Biology 4:523–526.

[110] Morais-Cabral, J. H., Y. Zhou, and R. MacKinnon. 2001. Energetic optimization of ion conduction rate by the K+ selectivity filter. Nature 414:37–41.

[111] Nadler, W. and D. L. Stein. 1996. Reaction-diffusion description of biological transport processes in general dimension. Journal of Chemical Physics 104:1918–1936.

[112] Nakhoul, N., B. Davis, M. Romero, and W. Boron. 1998. Effect of expressing the water channel aquaporin-1 on the CO2 permeability of Xenopus oocytes 274:C543–548.

[113] Nicolet, Y., C. Cavazza, and J. C. Fontecilla-Camps. 2002. Fe-only hydrogenases: structure, function and evolution. Journal of Inorganic Biochemistry 91:1–8.

[114] Nicolet, Y., C. Piras, P. Legrand, C. E. Hatchikian, and J. C. Fontecilla-Camps. 1999. Desulfovibrio desulfuricans iron hydrogenase: the structure shows unusual coordination to an active site Fe binuclear center. Structure 7:13–23.

[115] Nienhaus, K., P. Deng, J. M. Kriegl, and G. U. Nienhaus. 2003. Structural dynamics of myoglobin: Effect of internal cavities on ligand migration and binding. Biochemistry 42:9647–9658.

[116] Noskov, S., S. Berneche, and B. Roux. 2004. Control of ion selectivity in potassium channels by electrostatic and dynamic properties of carbonyl ligands. Nature 431:830–834.

[117] Nutt, D. R. and M. Meuwly. 2004. CO migration in native and mutant myoglobin: Atomistic simulations for the understanding of protein function. Proceedings of the National Academy of Sciences, USA 101:5998–6002.

[118] Olson, J. S. and G. N. Phillips, Jr. 1997. Myoglobin discriminates between O2, NO, and CO by electrostatic interactions with the bound ligand. Journal of Biological Inorganic Chemistry 2:544–552.

[119] Olson, W. K. 1996. Simulating DNA at low resolution. Current Opinion in Structural Biology 6:242–256.

[120] Ostermann, A., R. Waschipky, F. G. Parak, and G. U. Nienhaus. 2000. Ligand binding and conformational motions in myoglobin. Nature 404:205–208.

[121] Park, H. J., C. Yang, N. Treff, J. D. Satterlee, and C. Kang. 2002. Crystal structures of unligated and CN-ligated Glycera dibranchiata monomer ferric hemoglobin components III and IV. PROTEINS: Structure, Function, and Genetics 49:49–60.

[122] Perutz, M. F. 1979. Regulation of oxygen affinity of hemoglobin: Influence of structure of the globin on the heme iron. Annual Review of Biochemistry 48:327–386.

[123] Perutz, M. F. and F. S. Mathews. 1966. An X-ray study of azide methaemoglobin. Journal of Molecular Biology 21:199–202.

[124] Pesce, A., S. Dewilde, L. Kiger, M. Milani, P. Ascenzi, M. C. Marden, M. L. V. Hauwaert, J. Vanfleteren, L. Moens, and M. Bolognesi. 2001. Very high resolution structure of a trematode hemoglobin displaying a TyrB10-TyrE7 heme distal residue pair and high oxygen affinity. Journal of Molecular Biology 309:1153–1164.

[125] Peters, J. W. 1999. Structure and mechanism of iron-only hydrogenases. Current Opinion in Structural Biology 9:670–676.

[126] Peters, J. W., W. N. Lanzilotta, B. J. Lemon, and L. C. Seefeldt. 1998. X-ray crystal structure of the Fe-only hydrogenase (CpI) from Clostridium pasteurianum to 1.8 angstrom resolution. Science 282:1853–1858.

[127] Phillips, J. C., R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten. 2005. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26:1781–1802.

[128] Prasad, G. V. T., L. A. Coury, F. Finn, and M. L. Zeidel. 1998. Reconstituted aquaporin 1 water channels transport CO2 across membranes. Journal of Biological Chemistry 273:33123–33126.

[129] Pusch, M., U. Ludewig, A. Rehfeldt, and T. J. Jentsch. 1995. Gating of the voltage-dependent chloride channel ClC-0 by the permeant anion. Nature 373:527–531.

[130] Radding, W. and G. N. Phillips, Jr. 2004. Kinetic proofreading by the cavity system of myoglobin: protection from poisoning. BioEssays 26:422–433.

[131] Richards, F. M. 1977. Areas, volumes, packing, and protein structure. Annual Review of Biophysics and Bioengineering 6:151–176.

[132] Rizzi, M., J. B. Wittenberg, A. Coda, M. Fasano, P. Ascenzi, and M. Bolognesi. 1994. Structure of the sulfide-reactive hemoglobin from the clam Lucina pectinata. crystallographic analysis at 1.5 Å resolution. Journal of Molecular Biology 244:86–99.

[133] Rohlfs, R. J., J. S. Olson, and Q. H. Gibson. 1988. A comparison of the geminate recombination kinetics of several monomeric heme proteins. Journal of Biological Chemistry 263:1803–1813.

[134] Roitberg, A. and R. Elber. 1991. Modeling side chains in peptides and proteins: Application of the locally enhanced sampling technique and the simulated annealing methods to find minimum energy conformations. Journal of Chemical Physics 95:9277–9287.

[135] Roux, B. 1995. The calculation of the potential of mean force using computer simulations. Computer Physics Communications 91:275–282.

[136] Roux, B. 1999. Statistical Mechanical Equilibrium Theory of Selective Ion Channels. Biophys. J. 77:139–153.

[137] Royer, Jr., W. E., H. Zhu, T. A. Gorr, J. F. Flores, and J. E. Knapp. 2005. Allosteric hemoglobin assembly: diversity and similarity. Journal of Biological Chemistry 39:27477–27480.

[138] Rychkov, G. Y., M. Pusch, M. L. Roberts, T. J. Jentsch, and A. H. Bretag. 1998. Permeation and block of the skeletal muscle chloride channel, ClC-1, by foreign anions. Journal of General Physiology 111:653–665.

[139] Salomonsson, L., A. Lee, R. B. Gennis, and P. Brzezinski. 2004. A single-amino-acid lid renders a gas-tight compartment within a membrane-bound transporter. Proceedings of the National Academy of Sciences, USA 101:11617–11621.

[140] Scharlin, P., R. Battino, E. Silla, I. Tunon, and J. L. Pascual-Ahuir. 1998. Solubility of gases in water: Correlation between solubility and the number of water molecules in the first solvation shell. Pure and Applied Chemistry 70:1895–1904.

[141] Schlenkrich, M., J. Brickmann, A. D. MacKerell Jr., and M. Karplus. 1996. Empirical potential energy function for phospholipids: Criteria for parameter optimization and applications. In Biological Membranes: A Molecular Perspective from Computation and Experiment. K. M. Merz and B. Roux, editors. Birkhauser, Boston. 31–81.

[142] Schlichting, I. and K. Chu. 2000. Trapping intermediates in the crystal: ligand binding to myoglobin. Current Opinion in Structural Biology 10:744–752.

[143] Schmidt, M., K. Nienhaus, R. Pahl, A. Krasselt, S. Anderson, F. Parak, G. U. Nienhaus, and V. Srajer. 2005. Ligand migration pathway and protein dynamics in myoglobin: A time-resolved crystallographic study on L29W MbCO. Proceedings of the National Academy of Sciences, USA 102:11704–11709.

[144] Schotte, F., M. Lim, T. A. Jackson, A. V. Smirnov, J. Soman, J. S. Olson, G. N. Phillips, Jr., M. Wulff, and P. A. Anfinrud. 2003. Watching a protein as it functions with 150-ps time-resolved X-ray crystallography. Science 300:1944–1947.

[145] Scott, E. E. and Q. H. Gibson. 1997. Ligand migration in sperm whale myoglobin. Biochemistry 36:11909–11917.

[146] Scott, E. E., Q. H. Gibson, and J. S. Olson. 2001. Mapping the pathways for O2 entry intoand exit from myoglobin. Journal of Biological Chemistry 276:5177–5188.

[147] Shrivastava, I. H. and M. S. P. Sansom. 2000. Simulations of ion permeation through apotassium channel: Molecular dynamics of KcsA in a phospholipid bilayer. Biophysical Journal78:557–570.

[148] Shrivastava, I. H., D. P. Tieleman, P. C. Biggin, and M. S. P. Sansom. 2002. K+ versus Na+

in a K channel selectivity filter: a simulation study. Biophysical Journal 83:633–645.

[149] Springer, B. A., S. G. Sligar, J. S. Olson, and G. N. Phillips, Jr. 1994. Mechanisms of ligandrecognition in myoglobin. Chemical Reviews 94:699–714.

[150] Steigemann, W. and E. Weber. 1979. Structure of erythrocruorin in different ligand statesrefined at 1.4 A resolution. Journal of Molecular Biology 127:309–338.

[151] Stone, J., J. Gullingsrud, P. Grayson, and K. Schulten. 2001. A system for interactivemolecular dynamics simulation. In 2001 ACM Symposium on Interactive 3D Graphics. J. F.Hughes and C. H. Sequin, editors, 191–194, New York. ACM SIGGRAPH.

[152] Straub, J. E. and M. Karplus. 1991. Energy equipartitioning in the classical time-dependent Hartree approximation. Journal of Chemical Physics 94:6737–6739.

[153] Tajkhorshid, E., J. Cohen, A. Aksimentiev, M. Sotomayor, and K. Schulten. 2005. Towards understanding membrane channels. In Bacterial ion channels and their eukaryotic homologues. B. Martinac and A. Kubalski, editors. ASM Press, Washington, DC. 153–190.

[154] Tajkhorshid, E., P. Nollert, M. Ø. Jensen, L. J. W. Miercke, J. O’Connell, R. M. Stroud, and K. Schulten. 2002. Control of the selectivity of the aquaporin water channel family by global orientational tuning. Science 296:525–530.

[155] Teixeira, V. H., A. M. Baptista, and C. M. Soares. 2006. Pathways of H2 toward the active site of [NiFe]-hydrogenase. Biophysical Journal 91:2035–2045.

[156] Tilton, R. F., I. D. Kuntz, and G. A. Petsko. 1984. Cavities in proteins: Structure of a metmyoglobin-xenon complex solved to 1.9 Å. Biochemistry 23:2849–2857.

[157] Torres, R. A., T. Lovell, L. Noodleman, and D. A. Case. 2003. Density functional and reduction potential calculations of Fe4S4 clusters. Journal of the American Chemical Society 125:1923–1936.

[158] Traverso, S., L. Elia, and M. Pusch. 2003. Gating competence of constitutively open ClC-0 mutants revealed by the interaction with a small organic inhibitor. Journal of General Physiology 122:295–306.

[159] Ulitsky, A. and R. Elber. 1993. The thermal equilibrium aspects of the time dependent Hartree and the locally enhanced sampling approximations: Formal properties, a correction, and computational examples for rare gas clusters. Journal of Physical Chemistry 98:3380–3388.

[160] Ulitsky, A. and R. Elber. 1994. Application of the locally enhanced sampling (LES) and a mean field with a binary collision correction (cLES) to the simulation of Ar diffusion and NO recombination in myoglobin. Journal of Physical Chemistry 98:1034–1043.

[161] Valverde, M. A. 1999. ClC channels: leaving the dark ages on the verge of a new millennium. Current Opinion in Cell Biology 11:509–516.

[162] Vignais, P. M., B. Billoud, and J. Meyer. 2001. Classification and phylogeny of hydrogenases. FEMS Microbiol. Rev. 25:455–501.

[163] Vojtechovsky, J., K. Chu, J. Berendzen, R. Sweet, and I. Schlichting. 1999. Crystal structures of myoglobin-ligand complexes at near-atomic resolution. Biophysical Journal 77:2153–2164.

[164] Srajer, V., Z. Ren, T. Y. Teng, M. Schmidt, T. Ursby, D. Bourgeois, C. Pradervand, W. Schildkamp, M. Wulff, and K. Moffat. 2001. Protein conformational relaxation and ligand migration in myoglobin: a nanosecond to millisecond molecular movie from time-resolved Laue X-ray diffraction. Biochemistry 40:13802–13815.

[165] Srajer, V., T. Y. Teng, T. Ursby, C. Pradervand, Z. Ren, S. Adachi, W. Schildkamp, D. Bourgeois, M. Wulff, and K. Moffat. 1996. Photolysis of the carbon monoxide complex of myoglobin: nanosecond time-resolved crystallography. Science 274:1726–1729.

[166] Wan, L., M. B. Twitchett, L. D. Eltis, A. G. Mauk, and M. Smith. 1998. In vitro evolution of horse heart myoglobin to increase peroxidase activity. Proceedings of the National Academy of Sciences, USA 95:12825–12831.

[167] Wang, Y., J. Cohen, W. Boron, K. Schulten, and E. Tajkhorshid. 2006. Exploring gas permeability of cellular membranes and membrane channels with molecular dynamics. Journal of Structural Biology. In press.

[168] Weber, R. E. and S. N. Vinogradov. 2001. Non-vertebrate hemoglobins: functions and molecular adaptations. Physiological Reviews 81:569–627.

[169] Wittenberg, J. B. and B. A. Wittenberg. 2003. Myoglobin function reassessed. Journal of Experimental Biology 206:2011–2020.

[170] Yang, J., A. P. Kloek, D. E. Goldberg, and F. S. Mathews. 1995. The structure of Ascaris hemoglobin domain I at 2.2 Å resolution: molecular features of oxygen avidity. Proceedings of the National Academy of Sciences, USA 92:4224–4228.

[171] Zhang, L. and J. Hermans. 1996. Hydrophilicity of cavities in proteins. PROTEINS: Structure, Function, and Genetics 24:433–438.

Author’s Biography

Jordi Cohen was born in Montreal, Canada, on August 11, 1977. He completed a B.Sc. in Physics from McGill University and an M.Sc. in Physics from Simon Fraser University. As a graduate student in the Physics Department at the University of Illinois at Urbana-Champaign, he studied theoretical biophysics under the direction of Klaus Schulten.

Publications

1. Johnson, B. J., J. Cohen, R. W. Welford, A. R. Pearson, K. Schulten, J. P. Klinman, and C. M. Wilmot. 2007. Lessons on substrate specificity: The crystal structure of Hansenula polymorpha copper-containing amine oxidase in complex with xenon. In preparation.

2. Cohen, J. and K. J. Schulten. 2007. O2 migration pathways in monomeric globins are determined by residue composition, not tertiary structure. Submitted.

3. Gower, M., J. Cohen, J. Phillips, R. Kufrin, and K. Schulten. 2006. Managing biomolecular simulations in a grid environment with NAMD-G. In Proceedings of the 2006 TeraGrid Conference. In press.

4. Wang, Y., J. Cohen, W. Boron, K. Schulten, and E. Tajkhorshid. 2006. Exploring gas permeability of cellular membranes and membrane channels with molecular dynamics. Journal of Structural Biology. In press.

5. Cohen, J., A. Arkhipov, R. Braun, and K. Schulten. 2006. Imaging the migration pathways for O2, CO, NO, and Xe inside myoglobin. Biophysical Journal 91:1844–1857.

6. Ghirardi, M. L., J. Cohen, P. King, K. Schulten, K. Kim, and M. Seibert. 2006. [FeFe]-hydrogenases and photobiological hydrogen production. In Solar Hydrogen and Nanotechnology. L. Vayssieres, editor, volume 6340 of Proceedings of the Society of Photo-Optical Instrumentation Engineers, 253–258.

7. King, P. W., D. Svedruzic, J. Cohen, K. Schulten, M. Seibert, and M. L. Ghirardi. 2006. Structural and functional investigations of biological catalysts for optimization of solar-driven, H2 production systems. In Solar Hydrogen and Nanotechnology. L. Vayssieres, editor, volume 6340 of Proceedings of the Society of Photo-Optical Instrumentation Engineers, 259–267.

8. Cohen, J., K. Kim, P. King, M. Seibert, and K. Schulten. 2005a. Finding gas diffusion pathways in proteins: Application to O2 and H2 transport in CpI [FeFe]-hydrogenase and the role of packing defects. Structure 13:1321–1329.

9. Cohen, J., K. Kim, M. Posewitz, M. L. Ghirardi, K. Schulten, M. Seibert, and P. King. 2005b. Molecular dynamics and experimental investigation of H2 and O2 diffusion in [Fe]-hydrogenase. Biochemical Society Transactions 33:80–82.

10. Ghirardi, M. L., P. W. King, M. C. Posewitz, P. C. Maness, A. Fedorov, K. Kim, J. Cohen, K. Schulten, and M. Seibert. 2005. Approaches to developing biological H2-photoproducing organisms and processes. Biochemical Society Transactions 33:70–72.

11. Tajkhorshid, E., J. Cohen, A. Aksimentiev, M. Sotomayor, and K. Schulten. 2005. Towards understanding membrane channels. In Bacterial ion channels and their eukaryotic homologues. B. Martinac and A. Kubalski, editors. ASM Press, Washington, DC. 153–190.

12. Cohen, J. and K. Schulten. 2004. Mechanism of anionic conduction across ClC. Biophysical Journal 86:836–845.
