www.ccdc.cam.ac.uk
CCDC Tools for Mining Structural Databases Or – Building Solid Foundations for a Structure Based Design Campaign
John Liebeschuetz, Peter Carlqvist, Simon Bowden
Cambridge Crystallographic Data Centre
12 Union Rd., Cambridge, UK
Assessment and Comparison of Ligand – Protein Structural Models
• For the Crystallographer
– What is wrong with my model?
– What interesting features or differences with related structures can I highlight in my publication?
• For the Molecular Modeller
– What is wrong with the Crystallographer’s model?
– What interesting features or differences with related structures can I use to inform my structure-based drug design campaign?
– Are there non-homologous structures with similar features that I need to watch out for?
Why can’t I take a structure from the PDB and just use it?
• Validation of ligand structures bound to proteins
15% of 100 recent PDB entries have ligand geometries that are almost certainly in significant error (in-house analysis using Relibase+/Mogul)
Evaluation of PDB ligand datasets with Mogul and Relibase (pie charts):
– Pre 2000 (dataset from the 1990s): correct 34%, wrong 26%, not unusual 40%
– 2006 (most recent dataset): correct 29%, wrong 16%, not unusual 55%
How much ligand strain is accommodated by the protein?
• Accepted view – Many ligands adopt strained conformations when bound to proteins; some (60%) do not bind even in a local minimum conformation. (Perola & Charifson, J. Med. Chem. 2004, 47, 2499-2510)
• Alternative view – Ligands usually (but not always) bind in a local minimum. Many ‘strained’ structures found in the PDB are imperfectly refined. (OpenEye, B. Kelley and G. Warren, EuroCYP)
CCDC Tools that can help you
• Relibase/Relibase+ - Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB)
– Relibase is freely available for academics
– Relibase+ has extra features (some of these will be used in this workshop)
• The Cambridge Structural Database System - Database of > 400,000 small molecule crystallographic structures, and associated query software
– Mogul and IsoStar knowledge-bases of molecular geometry and inter-molecular interactions
– Directly linked access from Relibase+
The Workshop
Part 1: Validation of models and structural analysis
• Analysing a protein structure for errors and interesting features
• Comparing a structure with structures related by homology or by functionality
Part 2: Probing the Protein-Ligand Interface
• Substructure searching in Relibase/Relibase+
• Comparing the interactions of different ligands with the same target
• Validating an unusual interaction using substructure searching in Relibase+
Relibase+
• Relibase+
– Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB)
– Successor to ReLiBase, developed by Manfred Hendlich et al. (Merck, Marburg U.); M. Hendlich, Acta Cryst. D54, 1178-1182, 1998
• Relibase: free on WWW for academics
– http://relibase.ccdc.cam.ac.uk/
– http://relibase.rutgers.edu/
Relibase+
• Keyword searching
• FASTA protein sequence searching
• 2D substructure searching
• 3D protein-ligand interaction searching
• Protein-protein interaction searching
• Similarity searching for ligands
• SMILES substructure matching
• Automatic superposition of related binding sites to compare ligand binding modes, water positions, etc.
• 3D visualisation with AstexViewer and ReliView (Hermes)
Basic Functionality
Relibase+
• Functionality for generation and search of proprietary databases of protein-ligand complexes alongside the PDB
• Links to the Mogul and IsoStar modules of the CSDS for geometry validation
• Additional modules: Crystal packing, WaterBase, CavBase
• Detailed analysis of superimposed binding sites
• Enhanced treatment of hitlists
• Reliscript: Command-line access via a Python-based toolkit
• Coming Soon: SecBase including Turn Classification
Advanced Functionality
CavBase
• Detect unexpected similarities amongst protein cavities (e.g. active sites) that share little or no sequence homology.
• Similarity judged by matching 3D property descriptors (pseudocentres) that encode the shape and chemical characteristics of each cavity
• No sequence information used, can detect similar cavities even if they have no obvious secondary-structure relationship
• Developed by S. Schmitt et al., J. Mol. Biol. (2002)
CavBase
Cambridge Structural Database
• Repository for the world’s small organic and metal-organic crystal structures (up to 500 non-H atoms)
• Experimentally determined 3D structures via X-ray and neutron diffraction methods
• 2007 release contains 423,798 entries
– approximately 32,000 entries added per year
• Derived from around 1200 published sources
– official depository for >80 major journals
– majority of data directly deposited electronically (CIF)
• Increasing number of Private Communications
How much Data is Available?
[Chart: Growth of the CSD, 1970-2006, with predicted growth to 2010; vertical axis 0 to 600,000 entries]
419,768 entries June 2007
>500,000 entries during 2009
CSD Information content
Atomic coordinates, unit-cell, space-group symmetry (fully validated)
Crystal structure data
Bibliographic and Chemical Information
• Bibliographic and chemical text and properties (all searchable)
4-Oxonicotinamide-1-(1’-beta-D-2’,3’,5’-tri-O-acetyl-ribofuranoside)
Source: Rothmannia longiflora
Colour: pale yellow
Habit: acicular
Polymorph: Form IV
C17 H20 N2 O9
G. Bringmann, M. Ochse, K. Wolf, J. Kraus, K. Peters, E-M. Peters, M. Herderich, L. Ake, F. Tayman
Phytochemistry 51 (1999), p271
R-factor: 0.0506
• Chemical diagram and chemical connectivity to enable 2D and 3D searching for substructures, pharmacophores and intermolecular interactions
• Cross-referencing between entries
CSD Information content
Cambridge Structural Database System
Cambridge Structural Database
• PreQuest: Database production
• ConQuest: Database search
• Mercury: Graphical display, packing analysis
• VISTA: Statistical analysis
Knowledge Bases
• Mogul: Library of molecular geometry
• IsoStar: Library of intermolecular interactions
Mogul: A Knowledge Base of Molecular Geometries
Bruno et al., J. Chem. Inf. Comput. Sci., 44, 2133-2144, 2004
Incorporates pre-computed libraries of bond lengths, valence angles and torsion angles, derived entirely from the CSD
Sketch or import molecule, then click on feature of interest to view distribution, mean values and statistics
Very fast search speeds, with hyperlinks to the CSD to view specific structures
Complete geometry: retrieve distributions for all bonds, angles and torsions in the molecule
Mogul: Rapid access to CSD information
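One way to picture how a retrieved distribution can be used to judge a ligand's geometry is sketched below. This is a minimal illustration, not Mogul's own procedure: the reference bond lengths are invented for the example, the function name `z_score` is hypothetical, and the |z| > 2 cut-off is an assumption chosen only to show the idea of labelling a value "unusual" or "not unusual".

```python
from statistics import mean, stdev

def z_score(observed, reference_values):
    """Number of sample standard deviations between observed and the reference mean."""
    mu = mean(reference_values)
    sigma = stdev(reference_values)
    return (observed - mu) / sigma

# Hypothetical reference distribution of one bond type (angstroms).
# In practice these values would come from CSD structures; here they are made up.
reference = [1.33, 1.34, 1.32, 1.33, 1.35, 1.34, 1.33, 1.32]

ligand_bond = 1.45          # hypothetical bond length taken from a ligand model
z = z_score(ligand_bond, reference)

# Assumed cut-off: flag anything more than two standard deviations from the mean.
flag = "unusual" if abs(z) > 2.0 else "not unusual"
print(f"z = {z:.1f}: {flag}")
```

The same comparison could be repeated for every bond, angle and torsion in the ligand to build an overall picture of the model.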
A Knowledge Base of Intermolecular Interactions
• Experimental data from:
– Cambridge Structural Database
– Protein Data Bank (protein-ligand complexes only)
• Theoretical potential energy minima (DMA, IMPT)
• Interaction distributions displayed immediately as scatterplots or contour surfaces
• >20,000 CSD scatterplots, >5,500 PDB scatterplots, 1,500 energy minima
IsoStar
central group: -CONH2
contact group: NH
IsoStar Methodology
Search CSD or PDB for structures containing desired contact
Superimpose hits and display as scatterplots
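The superposition step can be sketched as follows. This is a simplified illustration under stated assumptions, not IsoStar's actual procedure: each hit's contact atom is expressed in a local coordinate frame built from three atoms of the central group, so that all hits share one frame and can be plotted together. The function name `to_local_frame` and all coordinates are hypothetical.

```python
from math import sqrt

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = sqrt(dot(a, a))
    return tuple(x / n for x in a)

def to_local_frame(origin, x_atom, plane_atom, point):
    """Express point in a frame defined by three central-group atoms."""
    ex = unit(sub(x_atom, origin))                    # x axis along origin -> x_atom
    ez = unit(cross(ex, sub(plane_atom, origin)))     # z axis normal to the group plane
    ey = cross(ez, ex)                                # y axis completes the right-handed frame
    v = sub(point, origin)
    return (dot(v, ex), dot(v, ey), dot(v, ez))

# Hit 1: central group atoms (C, O, N) and one contact atom, made-up coordinates.
hit1 = ((0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.7, 1.1, 0.0), (3.0, 0.5, 0.0))
# Hit 2: the same arrangement rotated 90 degrees about z and shifted by (5, 5, 5).
hit2 = ((5.0, 5.0, 5.0), (5.0, 6.2, 5.0), (3.9, 4.3, 5.0), (4.5, 8.0, 5.0))

# Each tuple of local coordinates is one point of the scatterplot.
scatter = [to_local_frame(*hit) for hit in (hit1, hit2)]
print(scatter)
```

Because the two hypothetical hits differ only by a rigid-body motion, their contact atoms land on the same point in the shared frame; real hits would spread out into the scatterplot described above.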
Density Maps
Can also represent distribution as density maps
How to access the workshop
Webpage: http://relibase.ccdc.cam.ac.uk/
Email address:
Password: s1mple
Cavity Detection
[Figure: PROTEIN with N and O atoms lining a cavity]
Based on the LIGSITE program, M. Hendlich et al., J. Mol. Graph. (1997).
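The idea behind grid-based cavity detection of this kind can be sketched with a toy example. The version below is a deliberately simplified 2D illustration and not the LIGSITE program itself: solvent grid points are scanned along four directions (two axes, two diagonals) and counted as buried when protein is met on both sides. The grid, the function name `detect_cavity` and the threshold `min_enclosed` are assumptions made for the example.

```python
def detect_cavity(grid, min_enclosed=3):
    """Return the set of (row, col) solvent points enclosed by protein.

    grid is a list of equal-length strings: '#' is protein, '.' is solvent.
    """
    rows, cols = len(grid), len(grid[0])
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # two axes and two diagonals

    def hits_protein(r, c, dr, dc):
        # Walk from (r, c) in one direction until protein or the grid edge is met.
        r, c = r + dr, c + dc
        while 0 <= r < rows and 0 <= c < cols:
            if grid[r][c] == '#':
                return True
            r, c = r + dr, c + dc
        return False

    cavity = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != '.':
                continue
            # Count scan lines on which this point has protein on both sides.
            enclosed = sum(
                hits_protein(r, c, dr, dc) and hits_protein(r, c, -dr, -dc)
                for dr, dc in directions
            )
            if enclosed >= min_enclosed:
                cavity.add((r, c))
    return cavity

# Toy "protein" with a pocket that has a narrow opening at the top.
grid = [
    ".......",
    ".##.##.",
    ".#...#.",
    ".#####.",
    ".......",
]
pocket = detect_cavity(grid)
print(sorted(pocket))
```

Raising `min_enclosed` keeps only the most deeply buried points, while lowering it also admits shallow grooves on the surface.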
The pseudo-centre concept
Coding Molecular Recognition into Simple Descriptors
Pseudo-centre types: donor, acceptor, aliphatic, pi/aromatic
[Figure: N, O and NH atoms labelled by pseudo-centre type]
3D Property Description
[Figure: cavity and protein, with O and NH groups labelled]
Similarity Search
Clique detection (Bron-Kerbosch)
Similarity Analysis
Scoring based on matching pseudo-centres, and the associated surface patches
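The clique-detection step can be illustrated with a small sketch. It is a toy version under stated assumptions, not the CavBase implementation: pseudo-centres are reduced to a (type, coordinates) pair, two pairings are considered compatible when their intra-cavity distances agree within a tolerance `tol`, and the basic Bron-Kerbosch recursion (without pivoting) finds the largest mutually compatible set. The function names, the tolerance and the coordinates are hypothetical, and surface patches are not considered.

```python
from itertools import combinations
from math import dist

def bron_kerbosch(r, p, x, adj, out):
    """Append every maximal clique of the graph adj to out (basic form, no pivoting)."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def common_pseudocentres(cav_a, cav_b, tol=1.0):
    """Largest set of same-type pseudo-centre pairs whose mutual distances agree."""
    # Nodes of the product graph: one pseudo-centre from each cavity, same type.
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(cav_a)
             for j, (type_b, _) in enumerate(cav_b)
             if type_a == type_b]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a pseudo-centre may be matched only once
        d_a = dist(cav_a[i][1], cav_a[k][1])
        d_b = dist(cav_b[j][1], cav_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    cliques = []
    bron_kerbosch(set(), set(nodes), set(), adj, cliques)
    return max(cliques, key=len) if cliques else set()

# Two made-up cavities; cavity B is cavity A shifted, plus one unrelated donor.
cavity_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aliphatic", (0, 4, 0))]
cavity_b = [("acceptor", (13, 10, 10)), ("donor", (10, 10, 10)),
            ("aliphatic", (10, 14, 10)), ("donor", (20, 20, 20))]

match = common_pseudocentres(cavity_a, cavity_b)
print(sorted(match))
```

In this sketch the size of the returned clique could serve as a crude similarity score; note that a distance-only comparison like this one cannot distinguish an arrangement from its mirror image.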
An Example
1OXO/1F2D
• Overlay of PLP ligands
• Matching pseudo-centres and surface patches shown
Crystal Packing
Important e.g. when docking ligands
Concanavalin A (1cjp) Binding site in Relibase+
1mtw
reference ligand, no packing
reference in green, first-rank solution atom-coloured
1mtw, Packing Included
reference ligand, no packing
including neighbouring chains
GOLD’s first-rank solution