Download - Generating Triangulated Macromolecular Surfaces by ...zhanglab.ccmb.med.umich.edu/EDTSurf/EDTSurf.pdfGenerating Triangulated Macromolecular Surfaces by Euclidean Distance Transform

Generating Triangulated Macromolecular Surfaces byEuclidean Distance TransformDong Xu1,2, Yang Zhang1,2*

1 Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America, 2 Center for Bioinformatics and

Department of Molecular Bioscience, University of Kansas, Lawrence, Kansas, United States of America

Abstract

Macromolecular surfaces are fundamental representations of their three-dimensional geometric shape. Accurate calculationof protein surfaces is of critical importance in the protein structural and functional studies including ligand-protein dockingand virtual screening. In contrast to analytical or parametric representation of macromolecular surfaces, triangulated meshsurfaces have been proved to be easy to describe, visualize and manipulate by computer programs. Here, we develop a newalgorithm of EDTSurf for generating three major macromolecular surfaces of van der Waals surface, solvent-accessiblesurface and molecular surface, using the technique of fast Euclidean Distance Transform (EDT). The triangulated surfaces areconstructed directly from volumetric solids by a Vertex-Connected Marching Cube algorithm that forms triangles from gridpoints. Compared to the analytical result, the relative error of the surface calculations by EDTSurf is ,2–4% depending onthe grid resolution, which is 1.5–4 times lower than the methods in the literature; and yet, the algorithm is faster and costsless computer memory than the comparative methods. The improvements in both accuracy and speed of themacromolecular surface determination should make EDTSurf a useful tool for the detailed study of protein docking andstructure predictions. Both source code and the executable program of EDTSurf are freely available at http://zhang.bioinformatics.ku.edu/EDTSurf.

Citation: Xu D, Zhang Y (2009) Generating Triangulated Macromolecular Surfaces by Euclidean Distance Transform. PLoS ONE 4(12): e8140. doi:10.1371/journal.pone.0008140

Editor: Markus J. Buehler, Massachusetts Institute of Technology, United States of America

Received August 19, 2009; Accepted November 9, 2009; Published December 2, 2009

Copyright: � 2009 Xu, Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permitsunrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The project is supported by the Alfred P. Sloan Foundation, NSF Career Award 0746198 and the National Institute of General Medical Sciences GrantGM083107 and GM084222. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

There are mainly three types of macromolecular surfaces—van

der Waals surface (VWS), solvent-accessible surface (SAS) and molecular

surface (MS)—in molecular biology studies [1]. Because the shape

and surface decide how the macromolecules interact with others,

accurate determination of the macromolecular surfaces is essential

for elucidating their biological roles in physiological processes.

Consequently, calculations of the macromolecular surfaces from

given 3D structures have found extensive uses in modern

molecular biology studies, including protein folding and structure

prediction [2], protein-ligand docking [3,4], DNA-protein inter-

actions [5], and new drug screening [6].

A variety of methods have been proposed to compute the three

macromolecular surfaces. These methods can be generally

categorized into two classes: analytical computation and explicit

representation. For analytical computing, Connolly first presented

an algorithm for calculating the smooth solvent-excluded surface

of a molecule [7] (Which he called ‘‘alternative solvent-accessible

surface’’), where the spheres, tori and arcs were defined using

analytical expressions according to the atomic coordinates, van der

Waals radii and the probe radius [8]. The author also developed

the Connolly’s Molecular Surface Package (MSP) which was a

suite of programs for computing and manipulating molecular

surfaces and volumes [9]. MSMS (Michel Sanner’s MolecularSurface) was later developed to compute both solvent accessibleand molecular surface relying on the reduced surface [10]. There

are also a number of other methods which were developed to

analytically calculate the value of the exact surface area and

volume [11–17]. Among them, Liang et al. presented a method for

computing molecular area and volume based on the alpha shape

theory [14] which was earlier proposed by Edelsbrunner and

Muche [11]. An alpha-shape of a set of weighted points is a subset

of the regular Delaunay triangulation of these weighted points.

The reduced surface [10] is equivalent to an alpha-shape with an

alpha value equal to zero when the radii of atoms are further

inflated by the solvent radius.

Although analytical methods have the advantage of getting

accurate values of surface area and volume, they are not

convenient to be employed in other applications when explicit

surfaces of local atoms are required for further processing. For

example, local surfaces of proteins and ligands are often used for

shape comparison in the docking problem. The explicit surface

generation method is a grid-based approximation which uses

space-filling model where each atom is modeled as a volumetric

item [18,19]. Molecules are placed onto the grids, whose width

could be altered to achieve different resolution. LSMS (Level Setmethod for Molecular Surface generation) used a level-set methodand achieved a very fast speed [20]. Zhang et al. constructed a

smooth volumetric electron density map from atomic data by

using weighted Gaussian isotropic kernel function and a two-level

clustering technique [21]. The authors selected a smooth implicit

solvation surface approximation to the Lee-Richards molecular

surface.

PLoS ONE | www.plosone.org 1 December 2009 | Volume 4 | Issue 12 | e8140

After the space-filling procedure, an important step is surface

representation and construction. In general, macromolecular

surface could be represented by parametric equations or triangular

patches. Parametric representations of protein molecular surfaces

are a compact way to describe a surface, and are useful for the

evaluation of surface properties such as the normal vector,

principal curvatures, and principal curvature directions [22].

Simplified triangular representations of molecular surfaces are

useful for easy manipulation, efficient rendering and for the display

of large-scale surface features. It is composed of a set of vertices

and a group of triangular patches connecting these vertices.

Connolly created the triangles by subdividing the curved faces of

an analytical molecular surface [23]. Molecular areas and volumes

may be calculated from it and packing defects in proteins may be

identified. MSMS computed the triangulated molecular surfaces

by sewing pre-triangulated template spheres and concave faces

together.

A commonly used method to construct triangulated isosurface

from 3D grid is the Marching Cube algorithm [24], which was also

used in LSMS. Marching Cubes (MC) creates triangle models of

constant density surfaces from 3D image data. The LSMS

algorithm only considers the inside/outside attributes of each

vertex and uses Marching Cubes to connect the middle point of

each edge. Xiang et al. proposed an improved version of the

Marching Cube method for molecular surface triangulation [25].

This new algorithm involves fewer and simpler basic building

blocks and avoids the artificial gaps of the original one. Obviously,

quantities like surface area and volume by grid-based algorithms

may not be as accurate as that calculated by the analytical

methods. However, these algorithms can generate triangular

surfaces efficiently without singularities.

In this paper, we develop a new method of EDTSurf for the

calculation of the three major macromolecular surfaces. We

demonstrate that all the macromolecular surfaces can be universally

connected with the theory of Euclidean Distance Transform (EDT).

Triangulated surfaces are then constructed by a variation of the

Marching Cube algorithm, which forms triangles efficiently by

connecting grid points directly rather than intersections of edges.

Materials and Methods

Macromolecular SurfacesThe definitions of the three surfaces are illustrated in Figure 1 in

a 2D plane. A molecule is represented as a set of overlapping

spheres, each having a van der Waals radius. The van der Waals

surface (VWS) is the topological boundary of these spheres (see

Figure 1A). The outer surface of a macromolecule binds to ligands

and other macromolecules. The van der Waals surface for small

molecules may describe the overall shape very well. However,

since most of the van der Waals surface is buried in the interior for

large molecules, it is necessary to define the other two kinds of

outer surfaces as follows.

The solvent-accessible surface (SAS) (see the red part of Figure 1B) is

defined as the area traced out by the center of a probe sphere as it

is rolled over the van der Waals surface. The probe sphere is a

solvent water molecule which is represented by the black circle in

Figure 1B.

The molecular surface (MS) is a continuous sheet consisting of two

parts: the contact surface and the reentrant surface [26]. The contact

surface (see the green part of Figure 1C) is part of the van der

Waals surface that is accessible to a probe sphere. The reentrant

surface (see the pink part of Figure 1C) is the inward-facing surface

of the probe when it touches two or more atoms. The molecular

surface is also called the solvent-excluded surface (SES), which is the

boundary of the union of all possible probes which do not overlap

with the molecule [10]. Molecular surface is also called the

Connolly surface. It was revealed that the solvent-accessible

surface was displaced outward from the molecular surface by a

distance equal to the probe radius [8].

Euclidean Distance TransformDistance Transform (DT) is the transformation that converts a

digital binary image to another gray scale image in which the value

of each pixel in the object is the minimum distance from the

background to that pixel by a predefined distance function. Three

distance functions between two points x1,y1,z1ð Þ and x2,y2,z2ð Þare often used in practice, which are City-block distance,

Chessboard distance and Euclidean distance, i.e.

dcity{block ~jx1{x2jzjy1{y2jzjz1{z2jdchessboard~max jx1{x2j,jy1{y2j,jz1{z2jð Þ

dEuclidean ~

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffix1{x2ð Þ2z y1{y2ð Þ2z z1{z2ð Þ2

q8>><>>:

ð1Þ

However, in our study, we found only the Euclidean distance

has a direct relation to the three macromolecular surfaces (see Eqs.

7–9 below). Therefore, our discussions will be focused on this

distance.

The signed Euclidean Distance Transform (sEDT), which represents

the displacement of a pixel from the nearest background point, is

defined in Ref. [27]. The gray image after EDT, which is called

Euclidean Distance Map (EDM), is useful in skeleton extraction,

shortest path planning and shape description. EDT can be

computed efficiently by methods such as mathematical morphol-

ogy [28], chain propagation [29] and boundary propagation [30].

Here, we extend the definition of Euclidean distance to the outside

of an object.

Suppose the set of boundary points (or surface) of an object O isCO. D x,yð Þ is the Euclidean distance between point x and y.N x,Oð Þ is the nearest boundary point on the surface to point x.To each point x, the signed Euclidean distance of x is defined asfollows:

Figure 1. Illustration of three macromolecular surfaces in a 2D plane. (A) van der Waals surface (blue); (B) solvent-accessible surface (red); (C)molecular surface which includes contact surface (green) and reentrant surface (pink).doi:10.1371/journal.pone.0008140.g001

Macromolecular Surfaces by EDT


Ds xð Þ~min D x,yð Þ, Vy [ COf g if x [ O{min D x,yð Þ, Vy [ COf g if x =[ O

�ð2Þ

Isosurface can be extracted conveniently after the EDT. The

isosurface with isovalue a is defined as:

I O,að Þ~Ds xð Þ~a,x [ Of g if a§0Ds xð Þ~a,x =[ Of g if av0

�ð3Þ

Obviously, if x belongs to the surface, the nearest boundarypoint to x is itself and the signed Euclidean distance of x is zero.Then, we have CO~I O,0ð Þ.

Macromolecular SolidsMacromolecular solids are solid bodies which are enveloped by

the macromolecular surfaces. The van der Waals solid, solvent-

accessible solid and solvent-excluded solid covered by the van der Waals

surface, the solvent-accessible surface and the molecular surface

are represented by OVW , OSA and OMS respectively. Suppose rp isthe probe radius which is often set to 1.4 Å and there are N atomsexcept hydrogen atoms in a molecular structure. Coordinate of the

ith atom is pi~ xi,yi,zið Þ and its van der Waals radius is ri. the vander Waals solid is the union of overlapping spheres and can be

written as the following formula.

OVW ~[Ni~1

sphere pi,rið Þ ð4Þ

where sphere pi,rið Þ means the solid sphere with radius ri andcenter pi. The solvent-accessible surface can also be perceived asthe topological boundary of a set of spheres by increasing the van

der Waals radius of each atom with the probe radius. Hence, the

solvent-accessible solid can be expressed in a similar formula to

that of the van der Waals solid:

OSA~[Ni~1

sphere pi,rizrp� �

ð5Þ

For points on the solvent-accessible surface, we define a subset

ISA to be the intersection set in which each point can be reachedby more than one solid sphere when constructing the solvent-

accessible solid. That is to say, two conditions should be satisfied

when defining ISA: first, ISA should belong to the intersection part

of two or more solid spheres; second, none of the points in ISA areburied inside the solvent-accessible surface.

Suppose the minimum van der Waals radius of all the N atoms

is rmin. Now, we define another solid called the minimalmacromolecular solid Omin. The boundary surface of the minimalmacromolecular solid is called the minimal macromolecular surface

COmin.

Omin~[Ni~1

sphere pi,ri{rminð Þ ð6Þ

If the van der Waals radius of an atom i equals rmin, the solidsphere degenerates to a point located at pi. The boundary of thepoint pi is set to be itself. If all the van der Waals radii are thesame, the minimal macromolecular solid becomes a point set and

the minimal macromolecular surface is the same as the minimal

macromolecular solid. Otherwise, if the van der Waals radius of an

atom i is larger than rmin, its boundary is the surface of the spherewith radius ri{rmin.

The above three equations stand for a kind of space-filling

methods which are the preliminary steps for grid-based macro-

molecular surface generation.

Macromolecular Surfaces from EDTAfter applying EDT to macromolecular solids as described

above, the macromolecular surfaces can be treated as isosurface

extracted from EDMs.

COVW ~I Omin,{rminð Þ ð7Þ

COSA~I OVW ,{rp� �

ð8Þ

COMS~I OSA,rp� �

ð9Þ

Equation (7) is elucidated in Figure 2A in the 2D plane. The

gray-scale value of each pixel is the minimum distance from that

pixel to the nearest boundary surface. The minimum macromo-

lecular surface is colored in yellow and the minimum macromo-

lecular solid is the area covered by the minimal macromolecular

surface. We then apply EDT to the minimal macromolecular solid

and extract the isosurface whose isovalue is {rmin. The equationmeans the extracted isosurface is the van der Waals surface, as

shown by the blue part of Figure 2A.

Figure 2. Illustration of three macromolecular surfaces from EDT in a 2D plane. (A) EDT with the minimal macromolecular surface (yellow)as the boundary. The isosurface with isovalue equaling the negative of the minimal van der Waals radius is the van der Waals surface (blue). (B) EDTwith van der Waals surface (blue) as the boundary. The isosurface with isovalue equaling the negative of the probe radius is the solvent-accessiblesurface (red). (C) EDT with solvent-accessible surface (red) as the boundary. The isosurface with isovalue equaling the probe radius is the molecularsurface which contains the surface (green) and reentrant surface (pink).doi:10.1371/journal.pone.0008140.g002



Suppose the van der Waals surface (the blue part of Figure 2B) is

given. We apply EDT to the van der Waals solid which is wrapped

by the van der Waals surface. The isosurface with isovalue equal to

{rp is the solvent-accessible surface, which is the red part inFigure 2B. This is the meaning of Equation (8).

Similarly, in Equation (9), we apply EDT to the solvent-

accessible solid which is enveloped by the solvent-accessible

surface (the red part of Figure 2C). The isosurface with isovalue

equaling rp is the molecular surface, which is divided into thegreen and pink parts in Figure 2C. The pink part is the reentrant

surface while the green part is the contact surface which overlaps

with the van der Waals surface in Figure 2A. This is the first

method to separate contact surface and reentrant surface.

There is another way to distinguish the contact surface from the

reentrant surface without the pre-calculation of van der Waals

surface, i.e.

x [contact surface if x [ MS and N x,OSAð Þ =[ ISAreentrant surface if x [ MS and N x,OSAð Þ [ ISA

�ð10Þ

We can record the nearest boundary point N x,OSAð Þ for eachpoint x on MS after the EDT. If N x,OSAð Þ belongs to theintersection set ISA, x belongs to the reentrant surface; otherwise xbelongs to the contact surface. In Figure 2C, the four intersection

points between arcs constitute the ISA, which are marked withsmall white blocks. The pink part belongs to the reentrant surface

since the nearest boundary points are these four points.

Algorithm FlowIn Figure 3, we present the flowchart of the algorithm for

computing the three macromolecular surfaces from 3D volumetric

solids, which mainly contains five steps. The atoms of a molecular

structure need to be scaled first and accommodated in a bounding

box whose length is assumed to be L. Only the grid points withinteger coordinates, which are called voxels, are processed within

the bounding box. The resolution is therefore L3. The scaledvolumetric representations of OVW , OSA and OMS are calledVVW , VSAand VMS separately.

Step I. Translate and scale the coordinates of all the atoms in

the molecular structure in order to fit them in the bounding box.

After scaling, the van der Waals radius ri and the probe radius rpbecome sri and srp.

Step II. To construct the van der Waals surface, treat each

atom of type i as a solid volumetric sphere whose radius is sri. Usethe space-filling method to get the scaled volumetric solid VVW .To get the solvent-accessible surface and the molecular surface, set

the radius to be the summation of sri and srp. Use the space-fillingmethod to get the scaled volumetric solid VSA.

Step III. To get the van der Waals surface and the solvent-

accessible surface, go to step IV directly. To get the molecular

surface, do EDT to the volumetric model by using Equation (9).

Get rid of the voxels whose Euclidean distances are less than srp.The remaining solid is VMS .

Step IV. Use Vertex-Connected Marching Cube method to

construct the triangulated surfaces from the volumetric models.Step V. Scale and translate the generated surface back to the

original size and position.

Since VVW and VSA are simply unions of solid spheres, we directlyuse the space-filling method rather than Equations (7) and (8) to get

the van der Waals surface and the solvent-accessible surface. In step

II, we can speed up the space-filling process. The volumetric sphere

of each type of atom can be pre-computed only once. The center of

this sphere is then translated to the transformed coordinate of this

atom. The voxels in the sphere are then filled. We can also record the

atomic information in conjunction with the voxels.

In step III, the propagation stops when the Euclidean distance is

larger than srp. That is to say, we don’t need to do EDT to the wholesolid VSA. This will also accelerate the computation of molecularsurface. After step III, the remaining solid is the scaled solvent-excluded

solid VMS , whose boundary CVMS is called isobounday since it is thediscretized representation of the isosurface after the EDT to VSA.

Triangulated Surface ConstructionAfter we get the three kinds of macromolecular solids,

triangulation is needed to construct the ultimate macromolecular

surfaces. We developed the dual of the traditional Marching Cube

algorithm here, which is called Vertex-Connected Marching Cubes

(VCMC). The difference between them is that the vertices of the

triangles in the traditional Marching Cubes are surface-edge

intersections while the vertices in the VCMC are the existing grid

points. When the resolution of grid is very high, there is no additional

cost for real-time construction and rendering of the triangular

surface by VCMC. Furthermore, the triangulation result generated

by VCMC contains fewer vertices and faces than that by MC.

For a unit cube which has eight vertices, there are totally

28~256 cases because each vertex may be inside or outside of thesolid. We group all the cases into 23 patterns according to the

symmetry of the cube in the three-dimensional space (see Figure 4).

Figure 3. Algorithm flow chart for EDTSurf macromolecularsurface construction. (A) van der Waals surface; (B) molecular surface;(C) solvent-accessible surface.doi:10.1371/journal.pone.0008140.g003



The vertex belonging to the solid is represented by a black sphere

while the vertex outside the solid is represented by a white sphere.

The normal vectors of the triangles are also given which point to

the outside of the solid.

The number of cases for each pattern is also marked in Figure 4.

Patterns (a) to (e) contain less than three vertices and cannot form

any triangle. In patterns (g), (h), (n) and (o), the black spheres are

separated by white spheres, so they also can’t form any triangles.

Pattern (w) means that all the eight vertices are inside the object.

Patterns (l), (f) and (p), (j) have the similar results. All the triangles

are pointing to the outside, which are shown in patterns (i), (k), (m),

(q) and (u). Some black spheres in patterns (r), (s), (t) and (v) which

are not involved will be considered in the neighboring cube.

All the triangles formed in Figure 4 can be further grouped into

three classes based on their shapes. The edge lengths of the three

triangles (two right-angled triangles and one equilateral triangle)

are 1,1,ffiffiffi2p� �

, 1,ffiffiffi2p

,ffiffiffi3p� �

andffiffiffi2p

,ffiffiffi2p

,ffiffiffi2p� �

separately. Hence,

the ultimate mesh surface after VCMC doesn’t contain any

narrow triangles, obtuse triangles and slivers. This provides a

satisfactory property in numerical calculations of physical forces,

such as electrostatic interactions [31].

Results and Discussion

Triangulated Surfaces and Computation of Area andVolume

Molecular structures in the RCSB Protein Data Bank (PDB) are

mainly obtained by the techniques of X-ray crystallography and

nuclear magnetic resonance spectroscopy. Figure 5 shows an

example of the Erythrocruorin protein (PDB ID: 1eca) with

Figure 4. All patterns of triangulation for Vertex-Connected Marching Cubes.doi:10.1371/journal.pone.0008140.g004



surfaces generated by EDTSurf using a length of the bounding box

of 256. Different colors of the surface patches represent different

type of atoms. The reentrant surface is colored in teal blue in

Figure 5C. The contact surface is the same as that in van der

Waals surface in Figure 5A. In our algorithm, the computation of

surface area and volume is straightforward, i.e. the surface area is

the summation of all the triangular patches while volume is the

product of the number of grid points in the macromolecular solid

and the unit volume for each point.

Since the area of surface can be analytically calculated by

MSMS [10], we try to evaluate the accuracy of EDTSurf based on

the result from MSMS. For the numerical volume calculation, we

set the vertex density for MSMS up to 100.0 vertex/Å2. The

purpose of such a high vertex density is to make the numerical

volume calculation of MSMS as close as possible to the exact

value. When the MSMS program fails to generate output results

with such a high vertex density, however, we set the vertex density

to a lower value. As a control, we also run LSMS [20], another

grid-based program, on the same set of protein molecules. For the

convenience of comparison, we set the radii of the atoms, the

probe radius (1.4 Å) and the size of bounding boxes (1283 and2563) for volumetric manipulation at the same values in EDTSurfand LSMS. We also run MSMS with its default vertex density

1.0 vertex/Å2. In Figures 6A and 6B, we present the area and

volume calculation results by EDTSurf, LSMS and MSMS for 31

test proteins that have been used in Ref. [14]. In Figure 6A, we

also indicate the analytical surface area from MSMS. Compared

to LSMS, the area and volume calculated by EDTSurf are better

in 23 and 26 out of 31 cases. A similar tendency is seen at

resolution 2563. The average relative errors of these algorithms are

listed in Table 1.

If the vertex density is very high, the difference between

numerical and analytical surface calculations by MSMS is small,

i.e. 0.45%. If we take the values of the analytical area and the

high-accurate numerical volume by MSMS as the golden-

standard, the relative errors for surface and volume are 3.96%

and 1.18% for EDTSurf at the resolution 1283, both of which arelower than that of LSMS (i.e. 6.10% and 3.57%) at the same grid

resolution. The errors will become smaller when the grid

resolution increases, which of course takes a longer CPU time.

In Table 1, we also calculate the surface and volume at 2563,where the errors are reduced to 1.99% and 0.48%, respectively,

which are still much lower than that by LSMS at the same grid

resolution (7.87% and 0.84%, respectively). These data demon-

strate that the representation by Euclidean Distance Transform

can be more accurate than the level-set-based approach [20] at the

same resolution.

CPU Time and Memory UseExcept for the accuracy of surface, an important requirement of

the surface calculation programs is the increase of speed and

decrease of memory cost. The time spent on computing the

molecular surface in EDTSurf is composed of three parts:

generation of scaled solvent-accessible solid VSA, EDT to the

Figure 5. Three macromolecular surfaces of protein 1eca. (A) van der Waals surface (B) solvent-accessible surface (C) molecular surface.doi:10.1371/journal.pone.0008140.g005

Figure 6. Comparison of accuracies of molecular surfaces at two different resolutions. Left panel is the numerical surface areas andanalytical surface area of 31 proteins; right panel is the corresponding numerical volumes enveloped by the molecular surfaces.doi:10.1371/journal.pone.0008140.g006



isoboundary of VSA, and triangulated surface generation byVertex-Connected Marching Cube algorithm. Atom type of each

voxel is not recorded in this experiment.

For testing the computer cost, we apply our algorithm to 15

large protein molecules taken from the PDB, which have 27,375 to

97,872 atoms. This set of proteins has also been used by Can et al.

to compare their algorithm LSMS with three other programs,

including the MSMS, which is integrated in UCSF Chimera [32].

Their result shows that LSMS algorithm is the fastest one among

them. Here, we compare our algorithm with LSMS where the

same parameters are exploited, i.e. the probe radius (1.4 Å), the

van der Waals radii, size of bounding box (2563). We also reportthe results by running MSMS whose vertex density is relatively low

(1.0 vertex/Å2 here). Both the three algorithms run on a Microsoft

Windows XP machine with Intel Pentium 4 Processor at 1.9GHZ

and 768 MB of RAM.

As shown in Table 2, EDTSurf only costs on average about 12

seconds for calculating the surfaces which is about 1.6 times faster

than LSMS. Also, the memory of EDTSurf is low. In this

experiment, the average RAM request for these molecules is 152

MB, which is about 2.1 times less than that by LSMS. It is also

comparable to the computational geometry method MSMS.

The speed of EDTSurf and LSMS are both dependent on the

size of bounding box while that of MSMS relies on the number of

atoms and vertex density. If the triangulation result contains

singularities in each round, MSMS will change the radii of some

atoms and perform several rounds of computations. This is partly

the reason for the expensive time cost of MSMS for most of

proteins in the Table 2. Moreover, if the vertex density is higher,

the time cost for surface triangulation will be higher in MSMS.

Since the computational complexity of MSMS is O(Nlog(N)) (N isthe total number of atoms in the molecule), MSMS is not efficient

for large supramolecular complexes. Both EDTSurf and LSMS

have the computational complexity O(L3) (L is the length of thebounding box). They will be slower than MSMS when handling

molecules with a small number of atoms.

Cavity DetectionProtein cavities can be empty or water-containing. They can be

within domains, between domains, or between subunits. The

buried water molecules in the internal cavities contribute to

protein stability. This is because the water-filled cavities are

important for modulating residues surrounding the cavities.

Cavities can help us to locate the proton transport pathway in

the membrane protein [33].

After the triangulated surface generation, one part of the

molecular surface is in contact with outside space while the other

part is buried inside the molecular solid. Cavities are those formed

by the inner molecular surface. Since molecular surface is

propagated from solvent-accessible surface by our method, it can

be seen that the number of cavities in the molecular surface

obtained is equal to that in the solvent-accessible surface.

In Table 3, we compare our algorithm on 6 protein structures

with LSMS and MSMS for the cavity detection. The bounding

boxes for EDTSurf and LSMS are set to be the same (2563). Theprobe radius is 1.2 Å. The vertex density for MSMS is

100.0 vertex/Å2. It is shown in Table 3 that EDTSurf detects

fewer cavities than LSMS and MSMS. This is because some small

cavities enveloped by the molecular surface may be filled when

constructing the solvent-accessible solid. Using the space-filling

method, the solvent-accessible solid occupies more spatial space

than the van der Waals solid because each van der Waals radius is

enlarged by the probe radius. These small cavities can’t be formed

after Euclidean Distance Transform and molecular surface

Table 1. Average relative errors of area and volume ofmolecular surface at two different resolutions calculated byEDTSurf, LSMS [20] and MSMS [10].

Method ResolutionAverage relativeerror of area

Average relativeerror of volume

EDTSurf 1283 3.96% 1.18%

2563 1.99% 0.48%

LSMS 1283 6.10% 3.57%

2563 7.87% 0.84%

MSMS 1.0 4.56% 0.72%

100.0 0.45% -------

doi:10.1371/journal.pone.0008140.t001

Table 2. CPU time and memory use for molecular surfacegeneration by EDTSurf, LSMS [20] and MSMS [10].

Protein #AtomsSurface generation time (s)/maximummemory use (MB)

EDTSurf LSMS MSMS

1a8r 27375 4.25/71.33 16.28/288.36 4.10/31.22

1h2i 32802 6.60/78.94 17.20/299.91 12.83/94.99

1fka 34977 13.85/208.72 19.21/328.77 11.94/116.42

1gtp 35060 5.88/65.10 17.17/298.07 30.80/110.26

1gav 43335 13.21/244.46 18.07/309.66 18.21/132.68

1g3i 45528 11.31/121.66 19.10/319.21 46.95/145.66

1pma 45892 17.85/159.73 20.68/333.26 19.80/146.19

1gt7 46180 7.00/103.88 17.10/296.31 14.53/106.38

1fjg 51995 12.86/192.19 19.34/321.88 44.91/183.89

1aon 58884 14.36/140.13 20.71/335.77 63.59/191.70

1j0b 60948 11.84/196.99 17.96/308.77 72.83/167.54

1ffk 64281 16.62/200.09 21.00/356.07 70.01/270.90

1otz 68620 17.63/218.82 21.40/331.28 52.21/165.49

1ir2 87087 10.12/105.05 18.41/309.58 53.93/159.28

1hto 97872 15.32/172.59 20.95/333.08 35.15/250.49

avg. 53389 11.91/151.98 18.97/318.00 36.79/151.54


Table 3. Number of cavities and the cavity volume of themolecular surface by EDTSurf, LSMS [20] and MSMS [10].

Protein #RES No. of cavities/cavity volume (in Å3)

EDTSurf LSMS MSMS

2act 218 14/533.00 16/514.66 18/573.858

2cha 248 7/347.44 19/529.91 20/587.81

2lyz 129 5/220.76 6/190.47 11/274.44

2ptn 230 7/411.29 14/608.94 20/680.45

5mbn 154 4/168.41 8/298.52 13/293.94

8tln 318 14/441.75 29/642.06 42/942.91




generation. Clearly, cavities which can’t be detected by our

method are too small to accommodate any solvent water

molecules. They are filtered out naturally by our method and

not necessary to be considered further. The operations which first

increase the radius then do EDT are equivalent to the dilation and

erosion in mathematical morphology. A dilation followed by an

erosion is called close operation which helps to close up breaks

between van der Waals surface here.

In Table 4, we calculate the volumes of seven cavities in the

disordered domain of trypsinogen (PDB ID: 2ptn). The residues

around each cavity are also tabulated, which are represented by

the abbreviation of the residue names and residue sequence

numbers. Cavity with small volume definitely has fewer residues

than that with large volume.

In the left panel of Figure 7, the outer molecular surface is set to

be transparent so that we can see the inner cavities clearly. We get

the atoms which contribute to the outer surface and the inner

surface in the right panel of Figure 7. Each atom is represented by a

van der Waals sphere colored in terms of its atom type. This helps us

to find the atoms which are related to the stability of cavities clearly.

Isosurface ExtractionQuantitative measures such as the area and the volume of

molecular surface will be more precise if the grid resolution is

higher. In Figures 8B, 8C and 8D, we compute the molecular

surface of a large complex at three different resolutions (323, 643,

and 1283) to see the visual effects. We also compare our generated

surfaces with the molecular surface (see Figure 8A) by the MSMS

method using its default options. At the three resolutions, the

shapes are well conserved. From the figure, we can see that the

surfaces are very similar to that in Ref. [10] even in a low

resolution (compare Figures 8A and 8B). The two domains of the

complex form the bound docking. The two complementary parts

of the two domains which contact each other have very similar

surface shape.

The molecular surface obtained with our approximation

method approaches to the accurate analytical surface when the

resolution is increased. From Figure 8, we can see that calculations

with a resolution equal to 1283 are accurate enough for most of theapplications. It takes very little CPU time and memory space for

computing the molecular surface at this resolution.

Because EDTSurf and LSMS are based on the volumetric

manipulation and the surface is only an approximation to the

actual analytical surface, it is interesting to examine whether and

how the calculations of the gird-based methods approach to the

real value of the surface and volume. Here, we use the three atoms

in Figure 1 as an example to check the result of EDTSurf, LSMS,

and MSMS. Numerical values of area and volume calculated by

the three algorithms at five different resolutions are presented in

Figures 9A and 9B. The analytical surface area and volume are

89.093 and 57.505 separately. The length of bounding box in

EDTSurf and LSMS varies from 16 to 256 while the vertex

density of MSMS changes from 0.25 to 64. For EDTSurf, we also

compare the surfaces generated by MC and VCMC. At the lowest

resolution, overall shapes of the molecular surfaces by all the four

algorithms (MSMS, LSMS, EDTSurf-MC and EDTSurf-VCMC)

are not kept, so they all have great difference to the analytical

value. Surface by MSMS converges to the real surface more

quickly than the other three methods. This is because MSMS gets

Table 4. Residues around the seven cavities of protein 2ptncalculated by EDTSurf.

Cavity Volume (in Å3) Contributing residues

1 186.00 G23, N25, T26, V27, P28, Y29, Q30, V31, L46, L67, G69,E70, D71, R117, V118, W141, L155

2 50.25 Q30, H40, G43, S139, G140,W141, G193, D194, G197

3 24.07 Y29, L137, S139, P198

4 13.19 A160, C136, I138, A183, V199

5 39.43 S45, V53, G196, G197, P198, L209, I212

6 85.17 L99, N100, N101, D102, N179, M180, S214, W215, V227,Y228, T229

7 13.19 I47, W51, V52, L105


Figure 7. Cavity detection of protein 2ptn. Left panel is the outer molecular surface and cavities of the protein; right panel shows the atomsaround the cavities.doi:10.1371/journal.pone.0008140.g007



the sampling vertices directly from spherical surface. Surface area

and volume by MC are always larger than that by VCMC because

MC connects the middle point of each edge while VCMC

connects grid points which are only inside the object. However,

their difference will be smaller when the grid resolution increases.

Both the surface area and volume by EDTSurf and LSMS will

converge to some values which are larger than the analytical value,

although EDTSurf is closer to the analytical results. This is

partially because the surface-area-to-volume ratio for an object

with triangulated surface is larger than a smooth object.

There is another type of macromolecular structures which are

reconstructed from electron microscopy (EM) images. On the left

panel of Figure 10, we use a density map of a complex of the double-

ring chaperonin GroEL and its lid-like cochaperonin GroES (EMDB

ID: 1180). Its isosurface is constructed by the VCMC method after

setting a threshold to the density map. The corresponding PDB file is

2c7c, which has 21 chains. The molecular surface is constructed as

shown on the right panel of Figure 10, which is then segmented

according to the chain information. We can see that the complex is

distributed in three layers and has 7-fold symmetry. The overall

shapes of the two structures are very coincidental.

MC vs. VCMCIn Figure 11, we show the molecular surfaces of the three atoms

generated by MC and VCMC at the resolution 1283. In general,their overall shapes are quite similar. However, the numbers of

vertices and faces in Figure 11B are 5958 and 11912, which are

only half of that (11130 and 22256) in Figure 11A. Hence, VCMC

has the advantage of saving storage space when describing mesh

surfaces with similar shapes.

We also compare the efficiency of MC and VCMC algorithms

on the isosurface extraction for 18 EM density maps. The average

CPU time by VCMC (0.54s) is about 1.4 times faster than the MC

algorithm (0.75s).

As discussed in [31], a correctly triangulated mesh should be in

the 2D manifold and satisfy the Euler Characteristics. Each edge is

shared by exactly two triangles. The number of faces should have a

general relationship with the numbers of vertices, components,

genuses and voids. We checked the mesh generated by VCMC

and found that only when the scaling factor is very small (surface is

very rough), there would be some singularities, such as one vertex

or one edge shared by two components, duplicated faces with

opposite normals. MC also has such problems in many cases.

Hence, we also support an additional component to check the

Euler Characteristics and correct the irregular part. For example,

mesh surfaces in Figures 8B, 8C and 8D are all obey the Euler

Characteristics.

When we add one run of Laplacian smoothing to the generated

surface, each mesh vertex is moved to the centroid of the

surrounding mesh vertices which are topologically connected. This

post-processing step will make the mesh surface closer to the

smooth continuous surface in some degree.

ConclusionsWe have developed a new method, EDTSurf, for calculating

three major macromolecular surfaces based on the method of

Euclidean Distance Transform. Triangulated surfaces are then

constructed by using Vertex-Connected Marching Cube method.

The two parts of the molecular surface which are the contact

surface and the reentrant surface can be efficiently distinguished.

Figure 8. Molecular surface of a complex with the PDB ID 1brs. Chain A is in blue and chain D is in red. (A) MSMS [10] triangulation result,9910 vertices and 19816 faces, vertex densities 1.0 vertex/Å2; (B) 2874 vertices and 5740 faces, resolution 323; (C) 12880 vertices and 25752 faces,resolution 643; (D) 55873 vertices and 111738 faces, resolution 1283.doi:10.1371/journal.pone.0008140.g008



The resolution of the grid system can be controlled flexibly. The

area and the volume of molecular surface are calculated

accurately. Surfaces of the interior cavities and their surrounding

atoms could be detected. Moreover, compared with the methods

Figure 9. Mesurements of molecular surfaces at different resolutions. (A) area; (B) volume.doi:10.1371/journal.pone.0008140.g009

Figure 10. Molecular surface of a chaperonin. Left panel is theisosurface of electron microscopy volume data (EMDB ID: 1180); rightpanel is the molecular surface of PDB data (PDB ID: 2c7c).doi:10.1371/journal.pone.0008140.g010

Figure 11. Molecular surfaces of three atoms at the resolution1283. (A) generated by EDTSurf-MC; (B) generated by EDTSurf-VCMC.doi:10.1371/journal.pone.0008140.g011



in literature, the EDTSurf algorithm is faster in speed and

consumes less memory, especially when the number of atoms in

the molecule is large.

As an application in protein structure prediction, we have

applied EDTSurf to generate the solvent-accessible surface area of

each residue for all proteins in the PDB library. This provides an

essential frame for matching the predicted solvent accessibility

with that of template structures in our fold-recognition algorithm

[34]. As alternative extensions of the proposed Euclidean Distance

Transform technique, distance functions other than the Euclidean

distance can also be considered for the surface generation. This

will result in new surface applications. For example, a solvation

surface using Gaussian isotropic kernel function can approximate

molecular surface [21]. It is hoped that the generated surfaces

have use on different aspects of molecular biology studies.

Although the illustrations have been given for proteins

molecules throughout the paper, the surface of any other

macromolecules such as RNA or DNA can also be calculated

using EDTSurf. The source code and executable package of

EDTSurf are freely available at http://zhang.bioinformatics.ku.

edu/EDTSurf. All the images have been generated by the MVP

(Macromolecular Visualization and Processing) software which isalso freely downloadable at http://zhang.bioinformatics.ku.edu/

MVP.

Author Contributions

Conceived and designed the experiments: DX YZ. Performed the

experiments: DX. Analyzed the data: DX. Wrote the paper: DX YZ.

References

1. Lee B, Richards FM (1971) The interpretation of protein structures: estimationof static accessibility. J Mol Biol 55: 379–400.

2. Zhang Y (2008) Progress and challenges in protein structure prediction. Curr

Opin Struct Biol 18: 342–348.3. Schneidman-Duhovny D, Inbar Y, Polak V, Shatsky M, Halperin I, et al. (2003)

Taking geometry to its edge: fast unbound rigid (and hinge-bent) docking.Proteins 52: 107–112.

4. Kanamori E, Murakami Y, Tsuchiya Y, Standley DM, Nakamura H, et al.(2007) Docking of protein molecular surfaces with evolutionary trace analysis.

Proteins 69: 832–838.

5. Locasale JW, Napoli AA, Chen S, Berman HM, Lawson CL (2009) Signatures ofprotein-DNA recognition in free DNA binding sites. J Mol Biol 386: 1054–1065.

6. Zavodszky MI, Sanschagrin PC, Korde RS, Kuhn LA (2002) Distilling theessential features of a protein surface for improving protein-ligand docking,

scoring, and virtual screening. J Comput Aided Mol Des 16: 883–902.

7. Connolly ML (1983) Solvent-accessible surfaces of proteins and nucleic acids.Science 221: 709–713.

8. Connolly ML (1983) Analytical molecular surface calculation. J Appl Crystallogr16: 548–558.

9. Connolly ML (1993) The molecular surface package. J Mol Graph 11: 139–141.

10. Sanner MF, Olson AJ, Spehner JC (1996) Reduced surface: an efficient way tocompute molecular surfaces. Biopolymers 38: 305–320.

11. Edelsbrunner H, Muche EP (1994) Three-dimensional alpha shapes. ACMTrans Graph 13: 43–72.

12. Fraczkiewicz R, Braun W (1998) Exact and efficient analytical calculation of theaccessible surface area and their gradient for macromolecules. J Comput Chem

19: 319–333.

13. Hayryan S, Hu CK, Skrivanek J, Hayryane E, Pokorny I (2005) A new analyticalmethod for computing solvent-accessible surface area of macromolecules and its

gradients. J Comput Chem 26: 334–343.14. Liang J, Edelsbrunner H, Fu P, Sudhakar PV, Subramaniam S (1998) Analytical

shape computation of macromolecules: I. Molecular area and volume through

alpha shape. Proteins 33: 1–17.15. Perrot G, Cheng B, Gibson KD, Vila J, Palmer KA, et al. (1992) MSEED: a

program for the rapid analytical determination of accessible surface areas andtheir derivatives. J Comput Chem 13: 1–11.

16. Richmond TJ (1984) Solvent accessible surface area and excluded volume inproteins. Analytical equations for overlapping spheres and implications for the

hydrophobic effect. J Mol Biol 178: 63–89.

17. Rychkov G, Petukhov M (2007) Joint neighbors approximation of macromo-lecular solvent accessible surface area. J Comput Chem 28: 1974–1989.

18. Greer J, Bush BL (1978) Macromolecular shape and surface maps by solvent

exclusion. Proc Natl Acad Sci U S A 75: 303–307.

19. Juffer AH, Vogel HJ (1998) A flexible triangulation method to describe the

solvent-accessible surface of biopolymers. J Comput Aided Mol Des 12:

289–299.

20. Can T, Chen CI, Wang YF (2006) Efficient molecular surface generation using

level-set methods. J Mol Graph Model 25: 442–454.

21. Zhang Y, Xu G, Bajaj C (2006) Quality meshing of implicit solvation models of

biomolecular structures. Comput Aided Geom Des 23: 510–530.

22. Duncan BS, Olson AJ (1993) Approximation and characterization of molecular

surfaces. Biopolymers 33: 219–229.

23. Connolly ML (1985) Molecular surface triangulation. J Appl Crystallogr 18:

499–505.

24. Lorensen WE, Cline HE (1987) Marching cubes: a high resolution 3d surface

construction algorithm. Comput Graph 21: 163–169.

25. Xiang Z, Shi Y, Xu YJ (1995) Calculating the electric potential of

macromolecules: A simple method for molecular surface triangulation.

J Comput Chem 16: 512–516.

26. Richards FM (1977) Areas, volumes, packing and protein structure. Annu Rev

Biophys Bioeng 6: 151–176.

27. Ye QZ. The signed Euclidean distance transform and its applications; 1988.

495–499.

28. Huang CT, Mitchell QR (1994) A Euclidean distance transform using grayscale

morphology decomposition. IEEE Trans PAMI 16: 443–448.

29. Vincent L. Exact Euclidean distance function by chain propagations; 1991.

520–525.

30. Xu D, Li H (2006) Euclidean distance transform of digital images in arbitrary

dimensions. LNCS 4261: 72–79.

31. Liang J, Subramaniam S (1997) Computation of molecular electrostatics with

boundary element methods. Biophys J 73: 1830–1841.

32. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al.

(2004) UCSF Chimera–a visualization system for exploratory research and

analysis. J Comput Chem 25: 1605–1612.

33. Liang J, Edelsbrunner H, Fu P, Sudhakar PV, Subramaniam S (1998) Analytical

shape computation of macromolecules: II. Inaccessible cavities in proteins.

Proteins 33: 18–29.

34. Wu S, Zhang Y (2008) MUSTER: Improving protein sequence profile-profile

alignments by using multiple sources of structure information. Proteins 72:

547–556.