Management and Distribution of Chemical Data in the Protein Data Bank
John Westbrook, Dimitris Dimitropoulos, Jasmine Young, Peter Rose, Philip E. Bourne and Helen Berman
RCSB Protein Data Bank
U.S. Government Chemical Databases and Open Chemistry, August 26, 2011
What is the Protein Data Bank?
Single international archive for information about the structure of large biological molecules
PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules
Outcome of a Workshop on Archiving Structural Models of Biological Macromolecules (2006) Structure 14: 1211-1217
What is the content of the PDB?
Public archive (August 2011)
  More than 75,000 entries
  More than 550,000 files
  Requires over 115 GB of storage
  Data dictionaries
  Derived data files
For each entry
  Atomic coordinates
  Sequence information
  Description of structure
  Experimental data
  Release status information
Internal archive
  Depositor correspondence
  Depositor contact information
  Paper records
  Documentation
  Historical records from Day One
Who manages the PDB?
NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK NLM
EMBL-EBI, Wellcome Trust, BBSRC, NIGMS, EU NBDC-JST
Who uses the PDB?
Depositors
Users
[Figure: number of released entries by year]
Chemical data in PDB
Understanding the interactions between proteins and small molecules is key to understanding biological function
Providing accurate chemical descriptions is a major focus of PDB annotation
All polymer and small molecule chemical components are described in the PDB Chemical Component Dictionary
Significant software and data infrastructure has been created to maintain this dictionary and to provide a consistent chemical representation across the PDB archive
Chemical representation in the PDB is under constant scrutiny and is continuously improved
[Flowchart boxes: Deposited coordinates; Perceived covalent structure; Chemical components; Compare with dictionary; New? (Yes / No); Annotate chemical definition; Chemical Component Dictionary; Standardize residue/atom nomenclature; Process deposited entry]
How does new chemistry enter the PDB?
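The dictionary comparison step in this workflow can be sketched roughly as follows. This is a minimal illustration, not the wwPDB annotation pipeline: `ComponentDictionary`, `process_component`, and the descriptor strings are hypothetical names invented for the example, and the dictionary is reduced to a plain lookup table keyed by a canonical descriptor.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentDictionary:
    """Toy stand-in for the Chemical Component Dictionary (hypothetical)."""
    # Maps a canonical descriptor string to a component identifier.
    by_descriptor: dict = field(default_factory=dict)

    def lookup(self, descriptor):
        return self.by_descriptor.get(descriptor)

    def add(self, descriptor, comp_id):
        self.by_descriptor[descriptor] = comp_id


def process_component(dictionary, descriptor, proposed_id):
    """Compare a perceived component with the dictionary.

    Returns (component_id, is_new). A known component reuses the existing
    identifier; an unknown one is registered as a new definition.
    """
    existing = dictionary.lookup(descriptor)
    if existing is not None:
        return existing, False
    dictionary.add(descriptor, proposed_id)
    return proposed_id, True


# Placeholder descriptors, not real InChIKeys.
ccd = ComponentDictionary({"KEY-ATP": "ATP"})
print(process_component(ccd, "KEY-ATP", "LIG"))  # ('ATP', False)
print(process_component(ccd, "KEY-NEW", "XYZ"))  # ('XYZ', True)
```

In practice the "new" branch involves manual annotation of the chemical definition; the sketch only records the new entry.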
PDB entry 3dnb; 1.3 Å resolution
PDB entry 6bna; 2.21 Å resolution
Chemical data in PDB are experimentally derived subject to modeling restraints
Assessing data quality
How are data checked now?
Chemistry
  Polymer (match to sequence DB and internal consistency)
  Ligands, ions, inhibitors (match to dictionary)
Geometry
  Close contacts
  Valence geometry
  Torsion angles
Experimental data
  Model vs. structure factors
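As an illustration of the close-contact check listed above, here is a minimal sketch. It assumes atoms are given as `(name, x, y, z)` tuples in ångströms; the function name `close_contacts`, the 2.2 Å cutoff, and the sample coordinates are assumptions made for the example, not values used by PDB validation software.

```python
import math
from itertools import combinations


def close_contacts(atoms, bonded=frozenset(), cutoff=2.2):
    """Flag non-bonded atom pairs closer than `cutoff` angstroms.

    atoms:  list of (name, x, y, z) tuples
    bonded: set of frozenset({name1, name2}) pairs to skip (covalent bonds)
    cutoff: illustrative distance threshold (an assumption)
    """
    flagged = []
    for (n1, *p1), (n2, *p2) in combinations(atoms, 2):
        if frozenset((n1, n2)) in bonded:
            continue  # covalently bonded pairs are expected to be close
        d = math.dist(p1, p2)
        if d < cutoff:
            flagged.append((n1, n2, round(d, 2)))
    return flagged


atoms = [("O1", 0.0, 0.0, 0.0), ("C1", 1.5, 0.0, 0.0), ("N1", 5.0, 0.0, 0.0)]
print(close_contacts(atoms))                                      # [('O1', 'C1', 1.5)]
print(close_contacts(atoms, bonded={frozenset(("O1", "C1"))}))    # []
```

The all-pairs loop is quadratic in the number of atoms; a real check over a full entry would use a spatial grid or neighbor list.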
Method-specific Validation Task Forces have been convened to collect recommendations and develop consensus on method-specific issues, including validation checks that should be performed and identification of validation software applications.
On-going focus on data quality
X-ray Validation
  2008 Workshop on Next Generation Validation Tools for the wwPDB
  White paper accepted by Structure
  Chair: Randy J. Read (University of Cambridge)
3DEM Validation
  Meeting September 2010
  Chairs: Richard Henderson (Maps, Cambridge University), Andrej Sali (Models, UCSF)
  White paper in progress
NMR Validation
  Meetings held September 2009, January 2011
  Report in progress
  Chairs: Gaetano Montelione (Rutgers), Michael Nilges (Institut Pasteur)
Small-Angle Scattering
  Members: Jill Trewhella (University of Sydney), Dmitri Svergun (EMBL Hamburg), Andrej Sali (UCSF), Mamoru Sato (Yokohama City University), John Tainer (Scripps)
Documenting PDB chemistry in the Chemical Component Dictionary
Library of all polymer and non-polymer chemical components in PDB
  ~13,000 chemical component definitions
  400 additional definitions of amino acid protonation variants
  ~700 new components released this year
  ~1700 component definitions updated this year
  Maintained by members of the wwPDB
wwPDB resources
wwpdb.org
Chemical Component Dictionary and data download options
Chemical definitions in mmCIF, PDBML/XML and SDF/MOL formats
Tabulations of SMILES, InChI and InChI key descriptors for each chemical definition
Bundles of coordinates extracted from PDB entries for each ligand in the archive, stored in mmCIF, PDBML and SDF/MOL formats
Chemical Component Dictionary content
Molecular names and synonyms
Chemical formula, formula weight, and formal charge
Atom and residue nomenclature
Polymer linking type
Model coordinates (an example from a PDB entry)
Computed coordinates (Corina or OpenEye)
Connectivity and bond types
Stereochemistry and aromaticity
Systematic names (ACDLabs & OpenEye)
SMILES, InChI, and InChIKey descriptors
Release status and revision history
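To make this content concrete, the sketch below reads a few top-level items from a component definition in mmCIF format. It is a deliberately minimal reader that handles only single-line `_chem_comp.*` key-value items; the function name `read_chem_comp` and the abbreviated sample record are assumptions for illustration, and real definitions also contain looped categories (atoms, bonds, descriptors) and multi-line values that this sketch ignores.

```python
SAMPLE = '''data_HOH
_chem_comp.id                  HOH
_chem_comp.name                WATER
_chem_comp.type                NON-POLYMER
_chem_comp.formula             "H2 O"
_chem_comp.formula_weight      18.015
_chem_comp.pdbx_formal_charge  0
'''


def read_chem_comp(text):
    """Collect single-line _chem_comp.* items into a dict (illustrative only)."""
    items = {}
    for line in text.splitlines():
        if not line.startswith("_chem_comp."):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # value continues on a later line; not handled here
        key, value = parts[0], parts[1].strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        items[key[len("_chem_comp."):]] = value
    return items


comp = read_chem_comp(SAMPLE)
print(comp["id"], comp["formula"], float(comp["formula_weight"]))
```

For production use, a full mmCIF parser is the better choice, since it also handles loops and semicolon-delimited text blocks.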
Chemical Component Dictionary Interpretation
Definitions include
  Common or representative forms of the molecule
  Generally neutral and complete molecules
  Off-the-shelf reagents used to prepare an experimental sample
  Model coordinates from a single experimental observation
  Computed coordinates from programs: Corina or OpenEye/Omega
Searching the Chemical Component Dictionary
ligand-expo.rcsb.org
Search options
  Molecular name
  Formula
  SMILES
  InChI/InChIKey
  PDB component identifier
  Chemical substructure
Browsing options
  Standard and modified amino acids
  Standard and modified nucleotides
  Selected top-selling pharmaceuticals
  Common aromatic ring systems
Ligand Expo: Browse dictionary content
Ligand Expo: View chemical details
Ligand Expo: Find data in related resources
Find small molecules at the RCSB PDB
http://www.pdb.org
Simple search for all entries containing a particular ligand
RCSB PDB Small molecule Advanced Search
Interactive chemical structure search with graphics
Exact, substructure, superstructure, MW searches
Restricted formula searches
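A molecular-weight (MW) search can be pictured as computing a formula weight for each component and filtering by a range. The sketch below is an illustration only, not the RCSB PDB search implementation: it assumes formulas written as space-separated element/count tokens (for example `C8 H10 N4 O2`), uses a small hand-entered table of standard atomic weights, and the names `formula_weight` and `mw_search` are hypothetical.

```python
import re

# Standard atomic weights for a few common elements (rounded).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "P": 30.974, "S": 32.06}

TOKEN = re.compile(r"^([A-Z][a-z]?)(\d*)$")


def formula_weight(formula):
    """Sum atomic weights for a formula such as 'C8 H10 N4 O2'."""
    total = 0.0
    for token in formula.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized formula token: {token!r}")
        element, count = match.group(1), match.group(2)
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return total


def mw_search(formulas, low, high):
    """Return the IDs whose formula weight lies in [low, high]."""
    return [cid for cid, f in formulas.items()
            if low <= formula_weight(f) <= high]


# Illustrative IDs and formulas (water, caffeine).
components = {"HOH": "H2 O", "CFF": "C8 H10 N4 O2"}
print(round(formula_weight("C8 H10 N4 O2"), 2))  # 194.19
print(mw_search(components, 150, 250))           # ['CFF']
```

Elements missing from the table raise a `KeyError`; a complete table would be needed for real data.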
RCSB PDB report and display of molecular interactions
Access
RCSB Protein Data Bank: www.pdb.org
Ligand Expo: ligand-expo.rcsb.org
wwPDB: www.wwpdb.org
Dictionary Resources: mmcif.pdb.org, pdbml.pdb.org
Acknowledgements
Operated by two members of the RCSB:
Supported by:
NIGMS
The RCSB PDB is a member of the