Structural database resources for biological macromolecules

Luciano A. Abriata

Corresponding author: Luciano A. Abriata, Laboratory for Biomolecular Modeling, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, 1015 Lausanne, Switzerland. E-mail: luciano.abriata@epfl.ch

Abstract

This Briefing reviews the widely used, currently active, up-to-date databases derived from the worldwide Protein Data Bank (PDB) to facilitate browsing, finding and exploring its entries. These databases contain visualization and analysis tools tailored to specific kinds of molecules and interactions, often also including complex metrics precomputed by experts or external programs, and connections to sequence and functional annotation databases. Importantly, updates of most of these databases involve steps of curation and error checks based on specific expertise about the subject molecules or interactions, and removal of sequence redundancy, both leading to better data sets for mining studies compared with the full list of raw PDB entries. The article presents the databases in groups such as those aimed at facilitating browsing through PDB entries, their molecules and their general information, those built to link protein structure with sequence and dynamics, those specific for transmembrane proteins, nucleic acids, interactions of biomacromolecules with each other and with small molecules or metal ions, and those concerning specific structural features or specific protein families. A few webservers directly connected to active databases, and a few databases that have been discontinued but would be important to have back, are also briefly commented on. Throughout the Briefing, sample cases where these databases have been used to aid structural studies or advance our knowledge about biological macromolecules are referenced. A few specific examples are also given where using these databases is easier and more informative than using raw PDB data.

Key words: protein; nucleic acids; ligand binding; interactions; dynamics; PDB mining

Introduction

The worldwide Protein Data Bank [1] (referred to here simply as ‘PDB’) is a partnership of servers for the collation, maintenance and distribution of macromolecular structure data (Figure 1A), which stands as the primary data resource in structural biology, containing all structures of biological macromolecules determined by NMR, X-ray or neutron diffraction and cryo-electron microscopy. PDB entries include structures of isolated proteins, nucleic acids, their complexes with each other as well as with lipids, cofactors, substrate mimics, regulators, inhibitors, etc., adding up to >117 000 entries by April 2016 (for a recent discussion of extensive statistics, see the review by Berman et al. [2]). Naturally, each PDB entry brings important insights into the structural and functional biochemistry related to the original subject that motivated the study. But on top of that, the databank as a whole is a reservoir of broad, rich information about biomolecular structure, dynamics and conformational variability, interactions, hydration, etc., and to some extent also reflects the state of the art of structure determination methods and programs.

The worldwide PDB is a large and complex database, and each of its entries contains large amounts of data besides the already rich information of its atomic coordinates. Thus, despite the PDB servers having powerful querying interfaces, it turns out that for several goals it is often easier, faster and even more informative to resort to the specialized databases derived from the PDB. These PDB-derived databases (Figure 1B) are made of PDB entries prefiltered by the types of molecule(s) they contain, the level of sequence redundancy and parameters about the quality of the structures; many are cross-annotated with other types of data and classifications, and most output graphical information and even interactive 3D visualizations of the relevant molecules besides the alphanumerical results. Importantly, most PDB-derived databases have the added value that they are built, maintained and updated by experts in a specific field of structural biology; they therefore perform analyses and calculations on the coordinates that would be cumbersome for a nonexpert user to carry out.

Luciano Abriata is a postdoctoral researcher at the Laboratory for Biomolecular Modeling at École Polytechnique Fédérale de Lausanne and the Swiss Institute of Bioinformatics. His research focuses on the use of experiments and computation to achieve integrative models of biomolecular structure and function.

Submitted: 11 March 2016; Received (in revised form): 16 April 2016

© The Author 2016. Published by Oxford University Press. For Permissions, please email: [email protected]

Briefings in Bioinformatics, 2016, 1–11. doi: 10.1093/bib/bbw049. Paper. Advance Access published June 5, 2016.

This Briefing describes the most used databases derived from the PDB that are currently active, curated and updated, giving database-specific remarks and general comments. These databases (Figure 1) simplify the process of finding and analyzing specific sequences, molecules or their interactions, and facilitate browsing and mining PDB entries related by a property, interaction or molecule of interest. Specific examples of the utility of these databases are referenced from the literature, and some specific test cases are presented (Figures 2–5). A few important Web servers related to databases and a few discontinued databases of special importance are also mentioned.

PDB data centers and PDB-derived databases that facilitate browsing through PDB entries at a global level

PDB entries are available chiefly from three data centers: the Research Collaboratory for Structural Bioinformatics PDB [3], PDB Europe [4] and PDB Japan [5], whose contents are centralized by the worldwide PDB, although each has its own set of original ways to search, browse and display data, as well as different associated analysis tools and connections to external databases. Both RCSB PDB and PDB Europe have extensive search and download facilities, simplified programmatic access, online 3D visualization capabilities and internal tools for performing sequence, motif and structure analysis, structural alignments, etc. Both also have pre-built pictures of each PDB entry, those of PDB Europe being more varied in their colors, orientations and rendering of different units and ligands, and of better quality. Importantly, both RCSB and PDBe display the most likely biological assembly as determined by the PISA server, which is helpful to better validate and even find overlooked interactions [6, 7]. For its part, the Japanese PDB site [5] contains two unique resources: eF-site [8], a database of precomputed molecular electrostatic potential surfaces for PDB entries, useful to quickly visualize electrostatic surfaces without having to run lengthy Poisson-Boltzmann solvers; and the associated eF-seek server [9], which attempts to predict functional sites by searching for molecular surfaces of similar shapes and electrostatic signatures. Last, each of the sites mentioned in this paragraph contains its own educational resources about protein structures for experts and nonexperts.

Figure 1. The PDB data centers and the PDB-derived databases currently active as of April 2016, the focus of this Briefing. A clickable version of this figure is available at http://lucianoabriata.altervista.org/papersdata/bib2016.html

Figure 2. An example of how to use PDBsum as a hub for structural information on the protein aquaporin-0 in this database and in related PDB-derived databases (A), with the goal of learning about the protein before setting up a molecular dynamics simulation of the protein embedded in a membrane based on the orientation stored in the OPM database (B). Panel A starts by searching for PDB ID 1SOR in PDBsum. Among much other information, the ‘Top page’ tab for this entry has a precomputed Ramachandran plot (globe labeled 1), references listed in the PDB file (2, clicking shows relevant text and figures from the publication), the species the protein sequence belongs to (3, in this case Ovis aries, sheep), a direct link to its UniProt entry (4), gene ontologies (5, indicating this is a membrane protein with transport activity), several external links (6,7, a few extended in the bottom part of the panel), two precomputed views (8,9) and a link to online 3D visualization (10). The top of this panel shows selected pictures from the ‘Protein’ and ‘Pores’ tabs, showing a planar view of the protein topology and a partially open pore identified in the asymmetric unit. Of the many links (6,7), shown are those to PDB Europe and RCSB PDB, which provide a picture of the asymmetric unit like PDBsum but also precomputed biological assemblies, in this case a homooctamer that consists of two layers of tetramers. PDB Europe and RCSB also quickly display information about the experimental conditions, structure quality (with direct reports from PROCHECK and WHATIF coming from PDBsum) and refinement statistics (which in this case can be slightly improved according to PDB_REDO). On the other hand, panel B starts by looking at the OPM entry for 1SOR, which shows how a homotetrameric unit could fit in a membrane. Using this reoriented structure one can easily set up an MD simulation of the system using Lipidbuilder or CHARMM-GUI. Finally, from annotation and wiki-based databases like Proteopedia one can quickly learn that aquaporin-0 mediates cell-cell contacts by establishing membrane junctions permeable to water. A coarse 3D model of such junctions can be made by putting together the biological assembly and membrane-embedding information (C). For a color version of this figure please visit the online article.

Aiming to facilitate browsing and structure visualization online through a picture-rich interface, PDBsum [10] lists brief descriptions of all PDB entries including precomputed views of most molecular components, easy-to-browse displays of structure resolution and R-factors, protein sequences with annotated PFAM and CATH domains, wire diagrams and Ramachandran plots for protein components, varied information about ligands and metal sites, and summaries of interactions between molecular components. PDBsum further offers precomputed predictions about potential pores and tunnels, quick links to search for PDB entries with similar sequences, and graphics-aided display of sequence variants annotated with predicted changes in interactions and solvent accessibility. PDBsum also contains direct links to the main PDB entries at RCSB PDB and PDB Europe, to the literature where the entries have been cited, to other databases that summarize information about PDB entries, to databases and servers of quality check reports, to databases of annotations about secondary and quaternary structures, motifs, domains, functions, sequence alignments, ontology terms, possible orientations in membranes and to community-annotated resources, among others. Part of the rich connectivity among the databases covered up to this point is schematized through an example in Figure 2A.

Offering a graphical way to browse the PDB, PDB-Explorer provides an online interactive map built from a high-dimensional fingerprint of atom pairs that reflects protein shapes, mapped to two dimensions through principal component analysis. The PDB-Explorer interface loads in less than a minute and then allows searching for structures similar to that of a given query provided as atomic coordinates, with smooth and essentially instantaneous feedback [11].

Last in this section, PDB_SELECT [12] compiles minimal lists of representative X-ray structures at a sequence identity cutoff of 30%, of the highest available quality (measured as a combination of resolution and R-factor). This database is particularly useful for mining studies as it reduces the number of PDB entries to analyze, minimizing redundancy as well as noise and errors on the mined values. Similarly, PDB-REPRDB is a database of representative protein chains [13].
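As a rough illustration of the idea behind such representative lists, the sketch below keeps one entry per sequence cluster by minimizing a score that combines resolution and R-factor. The clusters, the entry identifiers and the weighting used in `quality_score` are assumptions made for this example only; the actual PDB_SELECT clustering and scoring are described in [12] and may differ.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    pdb_id: str        # hypothetical identifiers, not real PDB entries
    resolution: float  # in angstroms (lower is better)
    r_factor: float    # as a fraction, e.g. 0.19 (lower is better)


def quality_score(entry: Entry) -> float:
    """Combine resolution and R-factor into one number; lower is better.

    ASSUMPTION: this weighting is illustrative only and is not claimed
    to be the formula used by PDB_SELECT.
    """
    return entry.resolution + (entry.r_factor * 100.0) / 20.0


def pick_representatives(clusters: Dict[str, List[Entry]]) -> Dict[str, str]:
    """Return the best-scoring entry ID for every non-empty cluster."""
    return {
        name: min(members, key=quality_score).pdb_id
        for name, members in clusters.items()
        if members
    }


# Clusters are assumed to be precomputed at the chosen identity cutoff.
clusters = {
    "cluster_1": [Entry("AAAA", 2.5, 0.22), Entry("BBBB", 1.8, 0.19)],
    "cluster_2": [Entry("CCCC", 3.0, 0.25)],
}
representatives = pick_representatives(clusters)
print(representatives)
```

In a mining study, the resulting list of identifiers would then replace the full list of PDB entries as the input set.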

Detecting errors and rebuilding PDB structures

PDBREPORT [14] is a database that describes structural problems in PDB entries. It summarizes anomalies and errors in structures of the PDB computed by WHAT_CHECK, in the form of text and graphics reporting on differences between positions or angles of multiple copies of a molecule, presence of ligands of unknown topologies, outliers in Ramachandran plots, unexpected and missing atoms, chain breaks, suspicious B-factors and occupancies, unusual geometries (bond lengths, angles, torsions, planarity of aromatic molecules and puckering of proline residues and carbohydrates, unusual backbone conformations), unusual packing including unsatisfied hydrogen bonds, potential problems with solvent molecules and ions, possible histidine/asparagine/glutamine flips and more.

Figure 3. Example of using PDBFINDER II to easily retrieve the most likely secondary structures adopted by a dipeptide. This task could be achieved in a number of ways, but the fastest is possibly to scan PDBFINDER II using the set of Linux scripts and small Python program shown. Notice that this protocol does not require downloading any files other than the single ASCII PDBFINDER II file (ftp://ftp.cmbi.ru.nl/pub/molbio/data/pdbfinder2/PDBFIND2.TXT.gz, under 450 MB by April 2016), and that it does not require any kind of secondary structure calculation because these are already included from the DSSP analysis run on each PDBFINDER II update. Therefore the answer is obtained in seconds. This specific example retrieves secondary structures for all Cys-Pro dipeptides in structures of the PDB, from a real-world question received from experimental collaborators. The scripts can be obtained at http://lucianoabriata.altervista.org/papersdata/bib2016.html.

PDB_REDO [15] is a database of automatically re-refined PDB entries. This is important because many PDB entries are old and suffer from problems that modern software, methods and knowledge about biomolecular structure can fix. This server is integrated directly into the Coot and Yasara programs for protein crystallography, facilitating the comparison of original and optimized structures and electron density maps. As an example of its importance beyond the curation of specific errors in PDB structures, high-throughput analyses based on PDB_REDO led to a large compilation of peptide planes predicted to be flipped and peptide bonds predicted to be swapped between trans and cis conformations in the PDB [16].

All these databases are recommended for preliminary checks of PDB structures before launching calculations that rely heavily on accurate coordinates and structure completeness, for example, when setting up molecular dynamics simulations. In case the user cannot find an entry of interest, the WHYNOT database (http://www.cmbi.ru.nl/WHY_NOT2) attempts to explain why a given PDB entry is not available in the PDB-derived databases from the Vriend group: PDBREPORT, B-factor Data Bank (BDB), PDB_REDO, PDBFINDER and PDB_SELECT [17].

PDB-derived databases that connect protein sequence, structure and dynamics

PDBFINDER [18] is a particularly useful database that provides precomputed secondary structures directly in text format aligned to protein sequences, for each entry of the PDB, all together in a single text file. A new version, PDBFINDER II, also includes information about chain breaks, quality indicators, B-factors and much more, also in a single compact text file. These databases allow fast searches through sequences and residue-specific information, largely simplifying searches and comparisons of this kind of data through simple Linux scripts as exemplified in Figure 3. Related to the problem of mapping sequences, secondary structures and PDB structures, ChSeq is a database focused on chameleon sequences, i.e. sequences that adopt radically different conformations across PDB entries [19].
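To give a flavor of this kind of scan, the following sketch counts the secondary-structure states observed for a given dipeptide across chains. It is not the set of scripts referred to in Figure 3: it is an independent minimal example, and the record layout it parses (`ID`, `Chain`, `Sequence` and `DSSP` labels followed by a colon, with the DSSP string aligned residue by residue to the sequence) is an assumption that should be checked against the actual PDBFINDER II file. The sample records and identifiers are invented.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Invented sample in the ASSUMED layout "Label : value"; not real entries.
SAMPLE = """\
ID           : XXXX
Chain        : A
 Sequence    : MKCPLLCPGA
 DSSP        : CHHHHTTCCC
//
ID           : YYYY
Chain        : A
 Sequence    : ACPE
 DSSP        : EETC
//
"""


def chains(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (entry_id, chain_id, sequence, dssp) for each chain that has both strings."""
    entry_id = chain_id = sequence = ""
    for raw in lines:
        key, sep, value = raw.rstrip("\n").partition(":")
        if not sep:
            continue  # separator or free-text lines without a label
        key = key.strip()
        # Drop only the single space after the colon so positions stay aligned.
        value = value[1:] if value.startswith(" ") else value
        if key == "ID":
            entry_id = value.strip()
        elif key == "Chain":
            chain_id, sequence = value.strip(), ""
        elif key == "Sequence":
            sequence = value.strip()
        elif key == "DSSP" and sequence:
            yield entry_id, chain_id, sequence, value


def dipeptide_states(lines: Iterable[str], dipeptide: str) -> Counter:
    """Count the secondary-structure strings found under each dipeptide occurrence."""
    counts: Counter = Counter()
    for _entry, _chain, seq, dssp in chains(lines):
        start = seq.find(dipeptide)
        while start != -1:
            states = dssp[start:start + len(dipeptide)]
            if len(states) == len(dipeptide):  # skip truncated annotations
                counts[states] += 1
            start = seq.find(dipeptide, start + 1)
    return counts


counts = dipeptide_states(SAMPLE.splitlines(), "CP")
print(counts.most_common())
```

For the real file, the same function could be fed the lines of the downloaded archive opened with `gzip.open(path, "rt")`, so the whole database is streamed once without further preprocessing.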

There are also PDB-derived data sets focused on protein conformational variability, among others PCDB [20], CoDNaS [21] and PDBFlex [22]. These data sets take advantage of the fact that several proteins have been crystallized in different conformations arising from different crystal point groups, pH, precipitants, bound ligands, mutations, etc. The idea behind these databases is that the conformational variability observed in different structures of a given protein somehow reflects the underlying conformational states that the protein can adopt [23], with the caveat that some might not truly exist in the intrinsic dynamic landscape of the free protein in standard conditions but rather be forced by the special conditions in which the structure was solved (although this is somewhat considered in CoDNaS [21]). In practice, it turns out that conformational variability as observed across PDB structures has multiple applications, for example, to explore the role of conformational flexibility in protein function, regulation and even evolution [24–26], to project molecular dynamics trajectories [27] and to better discriminate the effects of mutations on protein structure and even stability [28].

One of the latest servers dedicated to protein variability in the context of flexibility, PDBFlex [22], provides several unique visualization capabilities that clearly highlight dynamics from structures. Especially enticing is the display of an animation that summarizes the collective protein dynamics, which are often the most functionally relevant ones and which are hard to observe, for example, in molecular dynamics simulations because they occur on slow timescales. Moreover, PDBFlex disentangles local from global variability, both of which can be inspected in plots interactively connected to the corresponding PDB structures. Another unique feature is that the user can download the sets of aligned protein structures in each cluster.

B-factors are additional atom-specific outputs from the process of structure refinement from X-ray diffraction data, often interpreted in terms of internal atomic motions to extrapolate information about protein dynamics. But B-factors are affected by variables other than true dynamics, hence caution must be taken in their interpretation. The B-factor data bank (BDB) gathers all PDB entries with consistent B-factors that reflect true dynamics [29].
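For readers who want to inspect B-factors directly, the sketch below averages them per residue from the fixed-column ATOM records of a PDB-format file and converts a B-factor into a root-mean-square displacement through the standard isotropic relation B = 8π²⟨u²⟩. The coordinate lines are invented for the example, and, in line with the caveat above, the resulting displacements should not be read as pure dynamics.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, residue name)

# Invented records laid out in the fixed columns of the PDB format.
SAMPLE = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00 10.00           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 20.00           C",
    "ATOM      3  N   VAL A   2      24.500  26.100   4.000  1.00 30.00           N",
    "HETATM    4  O   HOH A 101      10.000  10.000  10.000  1.00 50.00           O",
]


def mean_b_per_residue(pdb_lines: Iterable[str]) -> Dict[ResidueKey, float]:
    """Average the B-factor (columns 61-66) over the ATOM records of each residue."""
    sums: Dict[ResidueKey, list] = defaultdict(lambda: [0.0, 0])
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # ignore HETATM, REMARK and other record types
        key = (line[21], int(line[22:26]), line[17:20])
        acc = sums[key]
        acc[0] += float(line[60:66])
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def rms_displacement(b_factor: float) -> float:
    """Root-mean-square displacement in angstroms from B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


means = mean_b_per_residue(SAMPLE)
for key, b in sorted(means.items()):
    print(key, round(b, 2), round(rms_displacement(b), 3))
```

Comparing such per-residue profiles between an original entry and its counterpart in a curated resource is one simple way to spot B-factors that behave inconsistently.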

Last in this section, the now discontinued database ProDDO [30] was particularly interesting because it was built by text mining the PDB files for keywords related to protein dynamics, such as ‘disorder’, ‘gap’ (referring to unresolved residues), ‘unfolded’, etc. A rebuild of this lexicon-based database could become important in the context of automated annotations of protein properties.

Protein flexibility and disorder from X-ray and NMR structures

Some small protein segments and peptides of high flexibility are resolved in NMR and X-ray structures when bound to proteins, if this binding restricts motions. In this regard, an interesting database, ComSin [31], compares structures of protein complexes with structures of the isolated proteins focusing on order–disorder transitions, with the outcome that the unbound proteins tend to have, although not always, more disordered residues. A related important database, of less structural content but with more annotations, is MobiDB [32].

Many proteins and peptides that do not crystallize and/or are characterized by extensive flexibility are still represented in the PDB, mostly solved by solution-state NMR. The Biological Magnetic Resonance Data Bank [33] (BMRB, or biomagresbank) is the hub that collects raw NMR observables for biomolecules, not limited to restraints for protein structure calculation. Entries with NMR structural restraints are interconnected to the corresponding PDB entries; indeed, submission of new NMR structures to the PDB is tied to submission of NMR data to the BMRB [34]. Also, the PACSY programs provide a relational database system to browse information from the PDB, BMRB and SCOP databases through SQL queries [35].

In the highly dynamic extreme of the flexibility spectrum, intrinsically disordered proteins and peptides simply lack structures that can be captured by either crystallography or NMR, and are therefore poorly represented in the PDB. Still, their structural heterogeneity can be treated through combinations of computational methods coupled to experimental observations from solution-state techniques like NMR, fluorescence, Electron Paramagnetic Resonance and Small Angle X-ray Scattering. Although not derived from the PDB, the Protein Ensemble Database [36] gathers these structural ensembles in the same format as normal PDB files, together with the experimental restraints and algorithms used to generate the ensembles. As such, this server is a valuable resource for research involving disordered peptides and proteins too flexible to be found in the PDB.

Databases of membrane proteins

Membrane proteins are encoded by around 20–30% of the protein-coding genes in an organism [37, 38], and many are vital because they are at the heart of molecular sensing mechanisms, cell-to-cell communication processes and even the essential process of respiration itself. Membrane proteins are harder to work with in the laboratory, so they are much less represented than soluble proteins in the PDB. On the other hand, they are the main human protein targets of current drugs [39]. Because of this, making the most out of existing structures by mining the PDB, and using this information for simulations, modeling and driving experiments, is of utmost importance. There are indeed several PDB-derived databases that focus on membrane proteins.

PDBTM [40] was the first comprehensive database of transmembrane proteins from the PDB, with currently >2600 entries. It consists basically of a list of PDB files that can be browsed one by one, downloaded entirely or downloaded by groups (alpha or beta proteins, redundant or nonredundant data sets). From the same lab, the TOPDB database cross-references sequences and PDB structures of membrane proteins with information about membrane protein topologies obtained through experiments and bioinformatic predictions [41].

Similar to PDBTM, mpstruc (http://blanco.biomol.uci.edu/mpstruc/) also maintains an up-to-date list of membrane proteins of known 3D structure, but provides finer classifications into monotopic proteins, transmembrane β-barrels and transmembrane alpha helical proteins, each further classified according to their structures, functions and families.

One of the most popular such databases is OPM, the Orientation of Proteins in Membranes database [42]. OPM provides PDB coordinates of integral membrane proteins, some peripheral proteins and membrane-active peptides, pre-oriented relative to the membrane normal for membranes of variable thickness. Orientations are optimized by minimizing protein transfer energies from water to membrane as computed with an implicit solvent model [43]; usefully, a related Web server allows orientation of user-uploaded structures. OPM provides reasonable orientations in most cases, but if necessary they may be refined manually and/or through coarse-grained MD simulations. It is directly connected to the CHARMM-GUI server [44], together greatly simplifying the setup of atomistic and coarse-grained MD simulations for membrane proteins from the PDB (Figure 2B).
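One practical consequence of having pre-oriented coordinates is that membrane-embedded residues can be listed with a simple geometric test. The sketch below does this under two assumptions that should be verified for the file at hand: that the membrane normal lies along the z axis with the bilayer center at z = 0, and that the hydrophobic half-thickness is supplied by the user (the 15 Å used here is an arbitrary illustrative value). The coordinate lines are invented.

```python
from typing import Iterable, List, Tuple

ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, residue name)

# Invented C-alpha records; z (columns 47-54) is ASSUMED to run along the membrane normal.
SAMPLE = [
    "ATOM      1  CA  ALA A  10       0.000   0.000  -5.000  1.00 20.00           C",
    "ATOM      2  CA  LEU A  11       0.000   0.000  12.000  1.00 20.00           C",
    "ATOM      3  CA  LYS A  12       0.000   0.000  25.500  1.00 20.00           C",
    "ATOM      4  N   LYS A  12       0.000   0.000  24.000  1.00 20.00           N",
]


def residues_in_slab(pdb_lines: Iterable[str], half_thickness: float) -> List[ResidueKey]:
    """List residues whose C-alpha atom satisfies |z| <= half_thickness."""
    inside: List[ResidueKey] = []
    for line in pdb_lines:
        # Keep only C-alpha atoms of ATOM records (HETATM calcium is excluded).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        z = float(line[46:54])
        if abs(z) <= half_thickness:
            inside.append((line[21], int(line[22:26]), line[17:20]))
    return inside


embedded = residues_in_slab(SAMPLE, half_thickness=15.0)
print(embedded)
```

A list of this kind is a convenient sanity check before building a lipid bilayer around the protein, since a misplaced slab is immediately visible as charged residues reported inside it.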

The new database MemProtMD [45] contains proteins pre-equilibrated into explicit membranes using coarse-grained self-assembly MD simulations. This approach is in principle expected to be more accurate than the implicit solvent model of the OPM server and should therefore deal better with complex cases like that of peripheral (i.e. nonintegral) membrane proteins. From these coarse-grained models it is straightforward to inspect the lipid environment of a membrane protein and to set up more complicated simulations, even at the atomistic level. The process of building the MemProtMD database illuminated new structural aspects about how proteins stay embedded in membranes, how membranes respond with deformations, amino acid–lipid interactions and lipid-binding protein sites, different distributions of amino acids along the membrane normal, and more; even a protocol for the identification of novel membrane proteins from new structures emerged from that work [45].

Connecting structure and sequence spaces for transmembrane proteins, the TMalphaDB and TMbetaDB Web servers [46] allow searches of amino acid sequences (including wildcards) in PDB files of alpha and beta membrane proteins. After searching, the servers allow interactive display of the matched sequences and their structures in the found PDB entries, as well as extraction of unique sequences and calculation of backbone and sidechain torsional angles for the matched peptide. Interesting effects of some residues on transmembrane secondary structures were unveiled by analyses with these servers [46].
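The kind of wildcard search described here can be mimicked offline on any set of sequences. The sketch below is only an illustration of the concept: the wildcard character `?` and the function names are choices made for this example and do not reproduce the query syntax of the TMalphaDB or TMbetaDB servers, and the sequences are invented.

```python
import re
from typing import Dict, List, Pattern, Tuple


def wildcard_to_regex(pattern: str, wildcard: str = "?") -> Pattern[str]:
    """Translate a motif such as 'G?A' into a regex; '?' matches any one residue.

    The lookahead wrapper lets overlapping occurrences be reported as well.
    """
    body = "".join(
        "[A-Z]" if ch == wildcard else re.escape(ch) for ch in pattern.upper()
    )
    return re.compile(f"(?=({body}))")


def find_motif(sequences: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Return (sequence name, 1-based start position, matched peptide) for every hit."""
    regex = wildcard_to_regex(pattern)
    hits: List[Tuple[str, int, str]] = []
    for name, seq in sequences.items():
        for match in regex.finditer(seq.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


# Hypothetical sequences standing in for chains extracted from PDB files.
sequences = {"chain_1": "LLGPAALGSAAL", "chain_2": "MGAAG"}
hits = find_motif(sequences, "G?A")
print(hits)
```

The positions returned by such a search can then be used to pull the matching residues out of the coordinate files for torsion-angle or secondary-structure analysis.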

Databases specialized on nucleic acids

Nucleic acids, especially RNAs, have also been the focus of specialized databases. The first database of nucleic acid structures, NDB [47], has been around since the early 1990s and is today the main reference in the field. It currently contains >8000 experimental structures of DNA and RNA molecules (including DNA–RNA pairs) extracted and curated from the PDB. The structures are annotated with information specific to nucleic acids, hardly accessible from the original PDB entry. On incorporation of PDB entries to NDB, the structures are analyzed through computation of structural geometries (hydrogen bonding and base-pairing patterns, extraction of regular motifs, etc.), and classified and made searchable at the sequence, secondary structure and structure levels. Users can carry out further specific analyses and online visualizations in 2D and 3D using several tools available at the NDB server, with links to other resources (two examples in Figure 4). Moreover, new tools were introduced in 2014 to analyze RNA sequences, align RNA and DNA molecules and calculate and visualize RNA structural geometries and base-pairing patterns, which are more complex and varied than those for DNA. The NDB server also contains statistics about ideal geometries for bases and sugars as well as for different hydrogen bonding pairing patterns, including a new catalog of base pairing in RNA, an educational section, and software for further offline analyses.
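Once base-pair annotations have been retrieved for a structure, summarizing them is straightforward. The sketch below assumes that each base pair comes with a class label such as the 'cWW' label reported by NDB for standard Watson-Crick pairs; the other labels and the toy list are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter

def summarize_pairs(pair_classes, canonical="cWW"):
    """Return the fraction of noncanonical pairs and a count per noncanonical class."""
    counts = Counter(pair_classes)
    total = sum(counts.values())
    noncanonical = {cls: n for cls, n in counts.items() if cls != canonical}
    fraction = sum(noncanonical.values()) / total if total else 0.0
    return fraction, noncanonical

# Toy annotation list (assumed labels, not taken from a real NDB entry).
toy_pairs = ["cWW"] * 15 + ["tWH", "tHS", "cWS", "tWH", "tSS"]
fraction, noncanonical = summarize_pairs(toy_pairs)
print(f"{fraction:.0%} noncanonical pairs in {len(noncanonical)} classes")
```

Note that this counts base pairs rather than nucleotides, so the number is related to, but not identical with, the per-nucleotide percentages quoted in the Figure 4 legend.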

Other relevant databases specific for RNA combine data from several sources, including the NDB and occasionally directly the PDB sites, to provide more comprehensive annotations, which are in many cases combined with sequence-based predictions [48–52]. Most of these servers allow searches from RNA sequences and secondary structures; some offer the possibility of searching and analyzing RNA coordinates from an uploaded PDB file or of filtering results based on geometries, neither of which is supported by the NDB at the moment. A particularly informative database about RNA structure is the RNA 3D hub [51], which is in fact the source of information for part of the precomputed structural analyses of RNA molecules at NDB.


Downloaded from http://bib.oxfordjournals.org/ at University of Nebraska-Lincoln Libraries on June 11, 2016


Structural databases of intermolecular interactions

Interactions between biomolecules are the cornerstone of cellular structure, signal transduction and biochemistry. Moreover, small molecules can strongly regulate protein function, interaction and localization through binding, as exploited in drug design.

When interactions are strong enough, there is a fair chance of solving structures of the bound biomolecules by Cryo-EM, X-ray diffraction or NMR spectroscopy. When interactions are not too strong, NMR can still provide ensembles of possible structures with variable confidence. For weak binding, or for cases of strong binding where Cryo-EM, X-ray or NMR failed, only computational modeling techniques based on sparse data can be used to achieve a structural model of the complex [53] (pure molecular dynamics simulations with no experimental input are promising, but still not reliable enough [54]). It is therefore important to get the most out of the biomolecular interactions observed in the PDB.

Continuing with nucleic acids from the previous section, NPIDB [55] focuses on interactions between nucleic acids and proteins. It can be downloaded entirely, browsed or searched by PDB entry, PFAM or SCOP classification of the protein partners, or by GO terms, among others. The work leading to the development of NPIDB yielded an unprecedented classification of protein-nucleic acid interactions defined by the contacts established between different structural elements of the intervening proteins and nucleic acids, a classification that the server performs.

Protein–protein interactions were the scope of the original version of the DOMMINO database [56], whose latest version integrates structures of all complexes involving protein, RNA and DNA molecules. Searches can be done by PDB ID, SCOP family of the intervening proteins or interaction type, among others, and the output can be filtered by number of intermolecular contacts. Related, although not a structural database, the iPFAM database [57] of protein–protein and protein–ligand interactions was built through high-throughput analysis of interactions in the PDB. The most interesting aspect of iPFAM is that a user can query a PFAM identifier for all the interactions established by proteins of that family with proteins of all PFAM families as seen at the structural level.
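Filtering interfaces by the number of intermolecular contacts can be pictured with a simple distance criterion. The sketch below is only an illustration under stated assumptions: the 4.0 Å cutoff, the function names and the coordinates are invented for the example, and the actual contact definitions used by DOMMINO or iPFAM may differ.

```python
from math import dist

def count_contacts(atoms_a, atoms_b, cutoff=4.0):
    """Count atom pairs, one atom from each partner, within `cutoff` angstroms."""
    return sum(1 for a in atoms_a for b in atoms_b if dist(a, b) <= cutoff)

def filter_interfaces(interfaces, min_contacts):
    """Keep only the interfaces with at least `min_contacts` intermolecular contacts."""
    kept = {}
    for name, (atoms_a, atoms_b) in interfaces.items():
        n = count_contacts(atoms_a, atoms_b)
        if n >= min_contacts:
            kept[name] = n
    return kept

# Invented coordinates in angstroms; not taken from any PDB entry.
toy_interfaces = {
    "chainA-chainB": ([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)],
                      [(0.0, 3.0, 0.0), (1.5, 3.5, 0.0), (30.0, 0.0, 0.0)]),
    "chainA-chainC": ([(0.0, 0.0, 0.0)],
                      [(0.0, 0.0, 3.9), (0.0, 8.0, 0.0)]),
}
kept = filter_interfaces(toy_interfaces, min_contacts=3)
print(kept)
```

The all-against-all loop is fine for a toy case; for whole PDB entries a spatial index (for example a k-d tree) would be the usual choice.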

Besides interactions between macromolecules, PDB entries also often contain small molecules bound to proteins, such as additives for crystallization, substrates, substrate analogs, regulators, lipids for proteins purified from membranes, and drugs or drug candidates added to proteins before crystallization or soaked into crystals. Given that the analysis of intermolecular interactions is most useful in the context of functional information about how the interaction modulates activity, databases about protein-bound ligands typically cross data from multiple sources with that of the PDB. The sc-PDB [58] is a database of ‘druggable’ binding sites from the PDB. Its construction and updates filter out solvent molecules, detergents, ions and other common additives used for protein crystallization, thus enriching relevant binding sites. It is annotated by crossing information from UniProt and GO, and includes several classification criteria and chemoinformatic descriptors of the ligands and binding sites. The output is rich in geometric and chemical data about the interactions between ligands and proteins, and includes enticing yet simple 2D diagrams of the binding site besides the usual 3D visualization (example in Figure 5, top part).

Still related to targeting protein function with small molecules, the Pocketome [59] is an automatically updated database of small-molecule binding sites that includes in most cases more than one PDB entry per site, allowing the user to inspect the structural variability of each binding site (example following from scPDB in Figure 5, bottom part). From the same authors, PeptiSite [60] is a database of peptide-binding sites that also groups entries into ensembles allowing inspection of structural variability in the docked peptides and binding sites. Last, PDTD [61] is a database of protein targets with known structures, built from the PDB based on actual functional data extracted from the literature and from several databases of therapeutic compounds. It is dominated by enzymes as targets, but includes also receptors, transport proteins and many others; and it homogeneously covers targets related to varied diseases. PDTD is integrated with the TarFisDock [62] Web server, with which a user can easily dock a small molecule to all PDTD entries.

Figure 4. How PDB-derived databases for nucleic acids help highlight the structural complexity of RNA over that of DNA. For an example of protein-bound DNA (A), 100% of the base pairs are in standard Watson-Crick bonding with anti-parallel strand orientation (‘cWW’) according to NDB. This simplicity stems from the canonical B conformation adopted by this piece of DNA. But the exemplified tRNA molecule (B) has around 25% of its nucleotides in one of (in this case) 8 noncanonical pairing motifs (from NDB). This is owing to the complex 3D structure as observed in the online 3D visualization or in the simple circular representation at the RNA 3D hub. For a color version of this figure please visit the online article.

There are also databases specialized on interactions involving specific kinds of proteins. For example, KLIFS [63] is specialized on kinases, currently containing around 250 different proteins and almost 2000 unique ligands taken from over 3000 PDB entries, including also sequence and structure alignments and annotations of the pockets targeted by the ligands, the conformations of key structural elements of kinases like the DFG loop and more. Another example is the Antigen–Antibody Interaction Database [64], which collects molecular interactions between antigens and antibodies at atomic/residue levels classified by interaction type, and whose output includes information about the antibody regions involved in binding and an online visualization tool.

Metal sites in proteins and nucleic acids

Around 30–40% of proteomes consist of metal-binding proteins [65]. These include enzymes whose activities require one or more metal ions, proteins that require metals to achieve their native structures and, as more recently identified, proteins of the metal homeostasis systems, which control intracellular metal levels and deliver metal ions to functional and structural sites. Now discontinued, the Scripps Metalloprotein Database and Browser was the first PDB-derived database of protein metal sites [66]. But other resourceful databases are available, widely used for point applications and also for mining and thus better understanding protein metal sites [67–70], knowledge that in turn helps to better refine the metal sites of newly solved metalloprotein structures [68, 71, 72].

Figure 5. An example based on part of a real-world investigation of the molecular features of β-lactam-binding sites. One can retrieve all known β-lactam-binding sites by searching for ‘lactamase’ at scPDB. For each retrieved PDB, this webservice shows structural and chemical properties of both the binding pocket and bound ligand, plus a detailed list of protein-ligand interactions and a 2D representation of it. One can then search for each of the PDB entries retrieved by scPDB at Pocketome, which will return structural and chemical information about the conserved and variable features of similar sites in other PDB entries. For a color version of this figure please visit the online article.


Metal Interactions in Protein Structures (MIPS [73]) makes it easy to find, download and visualize all PDB entries that contain a given metal ion (also monoatomic anions), filtering them by the types of molecules interacting with the ion and by structure quality and degree of redundancy.
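To give an idea of what locating metal ions in a structure file involves, the sketch below scans fixed-column PDB-format lines for HETATM records whose element field (columns 77–78 of the format) is a metal. The list of metals is an illustrative subset, the helper `pdb_line` exists only to build correctly aligned toy input, and none of this is meant to reproduce how MIPS is implemented.

```python
METALS = {"ZN", "FE", "CU", "MN", "MG", "CA", "NI", "CO"}  # illustrative subset only

def pdb_line(record, serial, name, res_name, chain, res_seq, x, y, z, element):
    """Format one fixed-column ATOM/HETATM line (used here only to build toy input)."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}{'':4}"
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}{'':10}{element:>2}")

def find_metal_ions(lines, metals=METALS):
    """Return (element, residue name, chain, residue number) for each metal HETATM."""
    found = []
    for line in lines:
        if not line.startswith("HETATM"):
            continue
        # The element field is used rather than the atom name, so that an
        # alpha carbon ("CA") is never mistaken for calcium.
        element = line[76:78].strip().upper()
        if element in metals:
            found.append((element, line[17:20].strip(), line[21], int(line[22:26])))
    return found

# Toy records with invented coordinates; not a real PDB entry.
toy_lines = [
    pdb_line("ATOM", 1, "CA", "HIS", "A", 94, 10.0, 8.0, -3.0, "C"),
    pdb_line("HETATM", 1001, "ZN", "ZN", "A", 301, 12.0, 8.5, -3.25, "ZN"),
    pdb_line("HETATM", 1002, "O", "HOH", "A", 401, 13.9, 8.5, -3.25, "O"),
    pdb_line("HETATM", 1003, "CA", "CA", "B", 302, 30.0, 1.0, 2.0, "CA"),
]
metal_sites = find_metal_ions(toy_lines)
print(metal_sites)
```

Filters on structure quality or redundancy, as offered by the database, would require additional metadata (resolution, sequence clusters) that a single coordinate file does not carry.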

MetalPDB [74] is a database specialized on metal sites, developed by one of the world-leading groups in structural metallobiology. Its search is limited to PDB IDs only, but it has more complete visualization capabilities and provides some information unavailable from MIPS, including automatic calculation of coordination numbers, coordination geometries and of protein and nonprotein metal ligands, plus CATH, SCOP and Pfam annotations. A related server from the same group, MetalS(3) [75], allows searching throughout the whole PDB for metal sites structurally similar to the metal site of a given structure (from the PDB or user-uploaded).
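A coordination number can be approximated by counting donor atoms within a distance cutoff of the metal. The following minimal sketch makes that idea concrete; the 2.8 Å cutoff, the atom labels and the coordinates are assumptions chosen for illustration, and MetalPDB's own criteria for defining a metal site may well be more elaborate.

```python
from math import dist

def coordination_sphere(metal_xyz, atoms, cutoff=2.8):
    """Return (distance, label) for atoms within `cutoff` angstroms of the metal, nearest first."""
    near = []
    for label, xyz in atoms:
        d = dist(metal_xyz, xyz)
        if d <= cutoff:
            near.append((round(d, 2), label))
    return sorted(near)

# Invented site: a metal at the origin with candidate donor atoms around it.
metal = (0.0, 0.0, 0.0)
candidate_atoms = [
    ("HIS-1:NE2", (2.1, 0.0, 0.0)),
    ("HIS-2:NE2", (0.0, 2.1, 0.0)),
    ("HIS-3:ND1", (0.0, 0.0, 2.0)),
    ("HOH:O", (-1.2, -1.2, -1.2)),
    ("THR-4:OG1", (3.9, 0.0, 0.0)),   # too far to count as a ligand
]
sphere = coordination_sphere(metal, candidate_atoms)
coordination_number = len(sphere)
print(coordination_number, [label for _, label in sphere])
```

Assigning a coordination geometry would additionally require comparing the ligand-metal-ligand angles with those of ideal polyhedra, which is beyond this sketch.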

Metal Ions in Nucleic AcidS (MINAS) [76] is designed to search for metal ions bound to nucleic acids. The search interface is complex, but this allows detailed queries specifying which bases to allow in the inner and outer coordination spheres, restraints on their distances to the metal ion, restraints on the relative positions of different bases bound to the metal ion and more. The output includes a list of atoms in the inner and outer coordination spheres, their distances to the metal ion, the bases they belong to and online visualizations.

Last, CheckMyMetal [77] is not strictly a database but rather a Web server whose fast response lets it work like one; it is a powerful tool to validate metal sites and detect inconsistencies in them, either in PDB entries or in user-uploaded structures.

Databases of specific structural features of proteins

Posttranslational modifications are key players in the regulation of protein function, intracellular localization and turnover. Despite being minor compared with protein sizes, they provoke radical alterations in binding specificities and often entire loop refolding. While most databases and Web servers about posttranslational modifications are dedicated to their compilation and prediction, PTM-SD is probably the only one that focuses globally on the structures of protein amino acid modifications as retrieved from the PDB [78]. It crosses information from the PDB, UniProt, PTMCuration [79] (which curates modifications on-the-fly on Swiss-Prot updates) and dbPTM [80] (a complete repository of posttranslational modifications but lacking structural information), facilitating sequence-structure-function analyses.

Knots in the backbone traces of proteins are relatively poorly understood, but important especially regarding the field of protein folding [81]. The pKnot database/server allows browsing through PDB entries that contain knots in their backbone traces, searching sequences in the database of PDB entries with knots (including homology models when no perfect sequence match is found), and also searching for knots in user-uploaded structures. The main output includes online visualization of the knot in the context of the full protein, and classification of its knot(s) into one of the four types identified so far by the developers.

Tandem repeats in proteins are also relatively poorly understood; in particular, the degeneracy of the basic repeat units makes tandem repeats especially hard to find, define with precision and mine. RepeatsDB is a structural database of tandem repeats in proteins, built through automatic detection followed by manual curation by a group of experts in repeat proteins [82]. By building this database the authors could provide a draft structure-based classification of repeats including coiled coils, β-solenoids and β-propellers as some of the most represented examples, and proposed a large number of potential new repeats.

Protein loops are another important element under intense study, as they are often dynamic and are usually directly related to function. Moreover, they differ from secondary structures in that they do not adopt regular conformations, and therefore, they are not easy to model even though they are accurately predicted from sequences. The ArchDB [83] database provides a simple interface to perform complex PDB searches of loops that connect specific secondary structure elements (i.e. two helices, a helix and a β-strand, etc.) with specific geometries and numbers of spacing residues. The detailed output includes secondary structures, surface accessibility, geometric parameters describing the loop and online 3D visualization, and is helpful for loop engineering. The work that led to ArchDB further derived a novel structural classification of loops based on Ramachandran and sequence patterns. A more dynamical classification of loops based on the predicted timescales of their fluctuations can be obtained through a Web server developed from extensive MD simulations [84].
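The kind of query described above, loops connecting two given secondary structure elements with a limited number of spacing residues, can be sketched on a per-residue secondary structure string. The encoding used here (H for helix, E for strand, any other character for loop), the function name and the toy string are assumptions for illustration; ArchDB's own loop definitions and geometric descriptors are richer than this.

```python
import re

def find_loops(ss, before="H", after="E", max_len=8):
    """Find loops of 1..max_len residues flanked by a `before` and an `after` element.

    Returns (first residue, last residue, length) with 1-based residue numbers.
    """
    pattern = re.compile(rf"(?<={before})([^HE]{{1,{max_len}}})(?={after})")
    return [(m.start() + 1, m.end(), len(m.group(1))) for m in pattern.finditer(ss)]

# Toy secondary structure string: helix, short loop, strand, loop, helix, long loop, strand.
toy_ss = "CC" + "H" * 6 + "C" * 3 + "E" * 5 + "CC" + "H" * 4 + "C" * 10 + "E" * 4
helix_to_strand = find_loops(toy_ss)                 # loops of up to 8 residues
longer_allowed = find_loops(toy_ss, max_len=10)      # also accepts the 10-residue loop
print(helix_to_strand, longer_allowed)
```

In practice the secondary structure string would come from an assignment program run on the coordinates, and each hit would be mapped back to the structure for geometric analysis.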

Other relevant databases of biological macromolecules

The Glycan Fragment Database (GFDB [85]) focuses on protein-bound oligosaccharides, a special kind of posttranslational modification especially ubiquitous in the extracellular domains of membrane proteins where it is often involved in molecular recognition. GFDB allows users to search for specific glycan sequences in a set precompiled from the PDB, to retrieve structural information. This database was the starting point for recent parameterization of carbohydrates in the CHARMM force field [86], allowing the treatment of glycoproteins in molecular dynamics simulations.

In recent years, the large number of PDB entries for some proteins has allowed protein-specific databases to appear, like the ones for kinases or antibody–antigen pairs reviewed above. Another example is UbSRD [87], which compiles structures of ubiquitin, ubiquitin-like folds and their interaction partners, browsable in different ways. Similar in spirit but broader in terms of molecules, SATPdb [88] focuses on peptides of therapeutic interest (antimicrobial, hemolytic, anticancer, etc.) with structures available in the PDB, extended with high-level molecular modeling when structures are unavailable but can be reliably estimated by prediction methods. SATPdb can be browsed by different criteria such as therapeutic function, secondary structures and other properties. Interesting statistics emerged from the building and updates of SATPdb; for example, most peptides in the database have more than one therapeutic activity.

Remarks on other relevant Web services

Finally, a few comments on Web services that do not contain structures of biological macromolecules but are extremely relevant to computational and experimental research in biochemistry and structural biology. One is EMDataBank [89], the main repository for primary electron microscopy data, important as cryo-EM structures rapidly populate the PDB providing unprecedented structural data for large macromolecular assemblies. The EMDB and PDB sites are indeed highly interconnected, just like the BMRB [34] for NMR structures and the Electron Density Database [90] for X-ray structures. Another novel server that is not a database but is related to database building and could revolutionize data mining studies is PatternQuery, a web scripting-based application to extract structure patterns from PDB files [91].

Last, whereas this Briefing has covered structural database resources for biological macromolecules, it is important to highlight the also extremely useful databases containing chemical and functional information, ontologies and structures of millions of small molecules. Some of these databases focus on molecules with biological activities, others focus on metabolites, and others simply attempt to cover the full chemical space or subsets of drug-like compounds. Some of the best known servers for small molecules are PubChem, ChEMBL, ChEBI, ZINC, the Human Metabolome Database and The Chemical Space Project among the most comprehensive academic options [92–97].

Conclusion

It is hoped that this Briefing recapitulates most of the currently up-to-date (as of April 2016) databases derived from the PDB, which researchers should have in mind and profit from to better and more easily analyze their structures and mine PDB data. An important remark is that most of these databases were built by specialists in the relevant fields and molecular types. Also important, these databases provide immediate response and online visualization capabilities (although some depend on external plug-ins, so their developers should consider replacing them with options like JSmol [98]) and are highly interconnected with each other, both aspects making them accessible and interactive.

Key Points

• The worldwide PDB is the main repository for structural data of biomolecules, but its complexity often hinders browsing, finding and mining its entries efficiently, accurately and without bias from, for example, redundancy or structure quality.

• Specialized databases built from the PDB through mining methodologies, curation (sometimes even manual) and connections to other data repositories facilitate browsing and finding specific kinds of molecules and molecular features as well as connecting structures to sequence, dynamics, interactions and function.

• The specialized databases described here are built by experts in the relevant techniques, molecular types and interactions. Thus, they often contain precomputed descriptors, for example, about molecular geometries, that would be cumbersome to calculate for nonexperts.

• Most described databases feature online displays of relevant data and online structure visualization facilities optimized to show the relevant molecules and interactions.

Acknowledgement

I acknowledge EMBO for a Long-Term Postdoctoral Fellowship.

References

1. Berman H, Henrick K, Nakamura H. Announcing the worldwide protein data bank. Nat Struct Biol 2003;10:980.

2. Berman HM, Coimbatore Narayanan B, Di Costanzo L, et al. Trendspotting in the Protein Data Bank. FEBS Lett 2013;587:1036–45.

3. Rose PW, Prlić A, Bi C, et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res 2015;43:D345–56.

4. Velankar S, van Ginkel G, Alhroub Y, et al. PDBe: improved accessibility of macromolecular structure data from PDB and EMDB. Nucleic Acids Res 2016;44:D385–95.

5. Kinjo AR, Suzuki H, Yamashita R, et al. Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format. Nucleic Acids Res 2012;40:D453–60.

6. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol 2007;372:774–97.

7. Abriata LA, Pontel LB, Vila AJ, et al. A dimerization interface mediated by functionally critical residues creates interfacial disulfide bonds and copper sites in CueP. J Inorg Biochem 2014;140:199–201.

8. Kinoshita K, Nakamura H. eF-site and PDBjViewer: database and viewer for protein functional sites. Bioinformatics 2004;20:1329–30.

9. Kinoshita K, Murakami Y, Nakamura H. eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape. Nucleic Acids Res 2007;35:W398–402.

10. de Beer TAP, Berka K, Thornton JM, et al. PDBsum additions. Nucleic Acids Res 2014;42:D292–6.

11. Jin X, Awale M, Zasso M, et al. PDB-Explorer: a web-based interactive map of the protein data bank in shape space. BMC Bioinformatics 2015;16:339.

12. Hobohm U, Sander C. Enlarged representative set of protein structures. Protein Sci 1994;3:522–4.

13. Noguchi T, Akiyama Y. PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003. Nucleic Acids Res 2003;31:492–3.

14. Hooft RW, Vriend G, Sander C, et al. Errors in protein structures. Nature 1996;381:272.

15. Joosten RP, Long F, Murshudov GN, et al. The PDB_REDO server for macromolecular structure model optimization. IUCrJ 2014;1:213–20.

16. Touw WG, Joosten RP, Vriend G. Detection of trans-cis flips and peptide-plane flips in protein structures. Acta Crystallogr D Biol Crystallogr 2015;71:1604–14.

17. Touw WG, Baakman C, Black J, et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res 2015;43:D364–8.

18. Hooft RW, Sander C, Scharf M, et al. The PDBFINDER database: a summary of PDB, DSSP and HSSP information with added value. Comput Appl Biosci 1996;12:525–9.

19. Li W, Kinch LN, Karplus PA, et al. ChSeq: A database of chameleon sequences. Protein Sci 2015;24:1075–86.

20. Juritz EI, Alberti SF, Parisi GD. PCDB: a database of protein conformational diversity. Nucleic Acids Res 2011;39:D475–9.

21. Monzon AM, Juritz E, Fornasari MS, et al. CoDNaS: a database of conformational diversity in the native state of proteins. Bioinformatics 2013;29:2512–4.

22. Hrabe T, Li Z, Sedova M, et al. PDBFlex: exploring flexibility in protein structures. Nucleic Acids Res 2016;44:D423–8.


23. Katebi AR, Sankar K, Jia K, et al. The use of experimental structures to model protein dynamics. Methods Mol Biol 2015;1215:213–36.

24. Juritz E, Palopoli N, Fornasari MS, et al. Protein conformational diversity modulates sequence divergence. Mol Biol Evol 2013;30:79–87.

25. Swapna LS, Mahajan S, de Brevern AG, et al. Comparison of tertiary structures of proteins in protein-protein complexes with unbound forms suggests prevalence of allostery in signalling proteins. BMC Struct Biol 2012;12:6.

26. Seckler JM, Leioatts N, Miao H, et al. The interplay of structure and dynamics: insights from a survey of HIV-1 reverse transcriptase crystal structures. Proteins 2013;81:1792–801.

27. Spiga E, Abriata LA, Piazza F, et al. Dissecting the effects of concentrated carbohydrate solutions on protein diffusion, hydration, and internal dynamics. J Phys Chem B 2014;118:5310–21.

28. Juritz E, Fornasari MS, Martelli PL, et al. On the effect of protein conformation diversity in discriminating among neutral and disease related single amino acid substitutions. BMC Genomics 2012;13 (Suppl 4):S5.

29. Touw WG, Vriend G. BDB: databank of PDB files with consistent B-factors. Protein Eng Des Sel 2014;27:457–62.

30. Sim KL, Uchida T, Miyano S. ProDDO: a database of disordered proteins from the Protein Data Bank (PDB). Bioinformatics 2001;17:379–80.

31. Lobanov MY, Shoemaker BA, Garbuzynskiy SO, et al. ComSin: database of protein structures in bound (complex) and unbound (single) states in relation to their intrinsic disorder. Nucleic Acids Res 2010;38:D283–7.

32. Di Domenico T, Walsh I, Martin AJM, et al. MobiDB: a comprehensive database of intrinsic protein disorder annotations. Bioinformatics 2012;28:2080–1.

33. Ulrich EL, Akutsu H, Doreleijers JF, et al. BioMagResBank. Nucleic Acids Res 2008;36:D402–8.

34. Markley JL, Ulrich EL, Berman HM, et al. BioMagResBank (BMRB) as a partner in the Worldwide Protein Data Bank (wwPDB): new policies affecting biomolecular NMR depositions. J Biomol NMR 2008;40:153–5.

35. Lee W, Yu W, Kim S, et al. PACSY, a relational database management system for protein structure and chemical shift analysis. J Biomol NMR 2012;54:169–79.

36. Varadi M, Kosol S, Lebrun P, et al. pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res 2014;42:D326–35.

37. Fagerberg L, Jonasson K, von Heijne G, et al. Prediction of the human membrane proteome. Proteomics 2010;10:1141–9.

38. Dobson L, Remenyi I, Tusnády GE. The human transmembrane proteome. Biol Direct 2015;10:31.

39. Bakheet TM, Doig AJ. Properties and identification of human protein drug targets. Bioinformatics 2009;25:451–7.

40. Kozma D, Simon I, Tusnády GE. PDBTM: Protein Data Bank of transmembrane proteins after 8 years. Nucleic Acids Res 2013;41:D524–9.

41. Dobson L, Langó T, Remenyi I, et al. Expediting topology data gathering for the TOPDB database. Nucleic Acids Res 2015;43:D283–9.

42. Lomize MA, Pogozheva ID, Joo H, et al. OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res 2012;40:D370–6.

43. Lomize AL, Pogozheva ID, Mosberg HI. Anisotropic solvent model of the lipid bilayer. 2. Energetics of insertion of small molecules, peptides, and proteins in membranes. J Chem Inf Model 2011;51:930–46.

44. Jo S, Lim JB, Klauda JB, et al. CHARMM-GUI Membrane Builder for mixed bilayers and its application to yeast membranes. Biophys J 2009;97:50–8.

45. Stansfeld PJ, Goose JE, Caffrey M, et al. MemProtMD: automated insertion of membrane protein structures into explicit lipid membranes. Structure 2015;23:1350–61.

46. Perea M, Lugtenburg I, Mayol E, et al. TMalphaDB and TMbetaDB: web servers to study the structural role of sequence motifs in α-helix and β-barrel domains of membrane proteins. BMC Bioinformatics 2015;16:266.

47. Coimbatore Narayanan B, Westbrook J, Ghosh S, et al. The nucleic acid database: new features and capabilities. Nucleic Acids Res 2014;42:D114–22.

48. Popenda M, Szachniuk M, Blazewicz M, et al. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures. BMC Bioinformatics 2010;11:231.

49. Appasamy SD, Hamdani HY, Ramlan EI, et al. InterRNA: a database of base interactions in RNA structures. Nucleic Acids Res 2016;44:D266–71.

50. Firdaus-Raih M, Hamdani HY, Nadzirin N, et al. COGNAC: a web server for searching and annotating hydrogen-bonded base interactions in RNA three-dimensional structures. Nucleic Acids Res 2014;42:W382–8.

51. Zirbel CL, Roll J, Sweeney BA, et al. Identifying novel sequence variants of RNA 3D motifs. Nucleic Acids Res 2015;43:7504–20.

52. Chojnowski G, Walen T, Bujnicki JM. RNA Bricks–a database of RNA 3D motifs and their interactions. Nucleic Acids Res 2014;42:D123–31.

53. Tamò GE, Abriata LA, Dal Peraro M. The importance of dynamics in integrative modeling of supramolecular assemblies. Curr Opin Struct Biol 2015;31:28–34.

54. Abriata LA, Dal Peraro M. Assessing the potential of atomistic molecular dynamics simulations to probe reversible protein-protein recognition and binding. Sci Rep 2015;5:10549.

55. Zanegina O, Kirsanov D, Baulin E, et al. An updated version of NPIDB includes new classifications of DNA-protein complexes and their families. Nucleic Acids Res 2016;44:D144–53.

56. Kuang X, Dhroso A, Han JG, et al. DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions. Database Oxf 2016, 1–12 doi: 10.1093/database/bav114.

57. Finn RD, Miller BL, Clements J, et al. iPfam: a database of protein family and domain interactions found in the Protein Data Bank. Nucleic Acids Res 2014;42:D364–73.

58. Desaphy J, Bret G, Rognan D, et al. sc-PDB: a 3D-database of ligandable binding sites–10 years on. Nucleic Acids Res 2015;43:D399–404.

59. Kufareva I, Ilatovskiy AV, Abagyan R. Pocketome: an encyclopedia of small-molecule binding sites in 4D. Nucleic Acids Res 2012;40:D535–40.

60. Acharya C, Kufareva I, Ilatovskiy AV, et al. PeptiSite: a structural database of peptide binding sites in 4D. Biochem Biophys Res Commun 2014;445:717–23.

61. Gao Z, Li H, Zhang H, et al. PDTD: a web-accessible protein database for drug target identification. BMC Bioinformatics 2008;9:104.

62. Li H, Gao Z, Kang L, et al. TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res 2006;34:W219–24.

63. Kooistra AJ, Kanev GK, van Linden OPJ, et al. KLIFS: a structural kinase-ligand interaction database. Nucleic Acids Res 2016;44:D365–71.


64.Kulkarni-Kale U, Raskar-Renuse S, Natekar-Kalantre G, et al.Antigen-Antibody Interaction Database (AgAbDb): a compen-dium of antigen-antibody interactions. Methods Mol Biol2014;1184:149–64.

65.Andreini C, Bertini I, Rosato A. Metalloproteomes: a bioinfor-matic approach. Acc Chem Res 2009;42:1471–9.

66.Castagnetto JM, Hennessy SW, Roberts VA, et al. MDB: themetalloprotein database and browser at the Scripps ResearchInstitute. Nucleic Acids Res 2002;30:379–82.

67.Djinovic-Carugo K, Carugo O. Structural biology of thelanthanides-mining rare earths in the Protein Data Bank. JInorg Biochem 2015;143:69–76.

68.Abriata LA. Analysis of copper-ligand bond lengths in X-raystructures of different types of copper sites in proteins. ActaCrystallogr D Biol Crystallogr 2012;68:1223–31.

69.Yao S, Flight RM, Rouchka EC, et al. A less-biased analysis ofmetalloproteins reveals novel zinc coordination geometries.Proteins 2015;83:1470–87.

70.Abriata LA. Investigation of non-corrin cobalt(II)-containingsites in protein structures of the Protein Data Bank. ActaCrystallogr B Struct Sci Cryst Eng Mater 2013;69:176–83.

71.Harding MM. Small revisions to predicted distances aroundmetal sites in proteins. Acta Crystallogr Biol Crystallogr2006;62:678–82.

72.Echols N, Morshed N, Afonine PV, et al. Automated identifica-tion of elemental ions in macromolecular crystal structures.Acta Crystallogr D Biol Crystallogr 2014;70:1104–14.

73.Hemavathi K, Kalaivani M, Udayakumar A, et al. MIPS: metalinteractions in protein structures. J Appl Crystallogr 2010;43:196–9.

74.Andreini C, Cavallaro G, Lorenzini S, et al. MetalPDB: a data-base of metal sites in biological macromolecular structures.Nucleic Acids Res 2013;41:D312–9.

75. Valasatava Y, Rosato A, Cavallaro G, et al. MetalS(3), a database-mining tool for the identification of structurally similar metal sites. J Biol Inorg Chem 2014;19:937–45.

76. Schnabl J, Suter P, Sigel RKO. MINAS–a database of metal ions in nucleic acids. Nucleic Acids Res 2012;40:D434–8.

77. Zheng H, Chordia MD, Cooper DR, et al. Validation of metal-binding sites in macromolecular structures with the CheckMyMetal web server. Nat Protoc 2014;9:156–70.

78. Craveur P, Rebehmed J, de Brevern AG. PTM-SD: a database of structurally resolved and annotated posttranslational modifications in proteins. Database Oxf 2014, 1–9 doi: 10.1093/database/bau041.

79. Khoury GA, Baliban RC, Floudas CA. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Sci Rep 2011;1:90.

80. Huang K-Y, Su M-G, Kao H-J, et al. dbPTM 2016: 10-year anniversary of a resource for post-translational modification of proteins. Nucleic Acids Res 2016;44:D435–46.

81. Lai Y-L, Chen C-C, Hwang J-K. pKNOT v.2: the protein KNOT web server. Nucleic Acids Res 2012;40:W228–31.

82. Di Domenico T, Potenza E, Walsh I, et al. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res 2014;42:D352–7.

83. Bonet J, Planas-Iglesias J, Garcia-Garcia J, et al. ArchDB 2014: structural classification of loops in proteins. Nucleic Acids Res 2014;42:D315–9.

84. Gu Y, Li D-W, Bruschweiler R. Decoding the mobility and time scales of protein loops. J Chem Theory Comput 2015;11:1308–14.

85. Jo S, Im W. Glycan fragment database: a database of PDB-based glycan 3D structures. Nucleic Acids Res 2013;41:D470–4.

86. Mallajosyula SS, Jo S, Im W, et al. Molecular dynamics simulations of glycoproteins using CHARMM. Methods Mol Biol 2015;1273:407–29.

87. Harrison JS, Jacobs TM, Houlihan K, et al. UbSRD: the ubiquitin structural relational database. J Mol Biol 2016;428:679–87.

88. Singh S, Chaudhary K, Dhanda SK, et al. SATPdb: a database of structurally annotated therapeutic peptides. Nucleic Acids Res 2016;44:D1119–26.

89. Lawson CL, Patwardhan A, Baker ML, et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res 2016;44:D396–403.

90. Kleywegt GJ, Harris MR, Zou JY, et al. The Uppsala Electron-Density Server. Acta Crystallogr D Biol Crystallogr 2004;60:2240–9.

91. Sehnal D, Pravda L, Svobodová Vařeková R, et al. PatternQuery: web application for fast detection of biomacromolecular structural patterns in the entire Protein Data Bank. Nucleic Acids Res 2015;43:W383–8.

92. Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res 2016;44:D1214–9.

93. Davies M, Nowotka M, Papadatos G, et al. ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res 2015;43:W612–20.

94. Reymond J-L. The chemical space project. Acc Chem Res 2015;48:722–30.

95. Kim S, Thiessen PA, Bolton EE, et al. PubChem substance and compound databases. Nucleic Acids Res 2016;44:D1202–13.

96. Wishart DS, Jewison T, Guo AC, et al. HMDB 3.0–the human metabolome database in 2013. Nucleic Acids Res 2013;41:D801–7.

97. Irwin JJ, Shoichet BK. ZINC–a free database of commercially available compounds for virtual screening. J Chem Inf Model 2005;45:177–82.

98. Hanson RM, Prilusky J, Renjian Z, et al. JSmol and the next-generation web-based representation of 3D molecular structure as applied to proteopedia. Isr J Chem 2013;53:207–16.
