Pdbbind 2007 Intro



A Brief Introduction to the PDBbind Database v.2007

[Figure: nested data sets of PDBbind v.2007, labeled A to E: PDB, 40,876 entries (A); protein-ligand complexes, 11,822 entries (B); general set, 3,124 entries (C); refined set, 1,300 entries (D); core set, 210 entries (E).]

The PDBbind database, as its name suggests, is a collection of experimentally measured binding affinities exclusively for the protein-ligand complexes available in the Protein Data Bank (PDB). It thus provides a link between the energetic and structural information of those complexes and may be of great value to various molecular recognition studies. Since its first public release in May 2004, several hundred users around the world have registered to use the PDBbind database. The current release is version 2007.

To use the PDBbind database effectively, it is important to be familiar with its basic construction:

(A) PDBbind v.2007 is based on the contents of the PDB, which contained a total of 40,876 experimentally determined structures officially released before Jan 1st, 2007. Theoretical models are not considered by PDBbind.

(B) An algorithm has been designed to identify the complexes formed between proteins and small organic molecules. Please refer to the references cited at the end of this brochure for details. A total of 11,822 entries are identified as valid protein-ligand complexes in this release.

(C) The primary reference of each complex is reviewed manually to collect the experimentally determined binding affinity (IC50, Ki, or Kd) of the complex. Binding affinity data for a total of 3,124 complexes are compiled in this way from nearly 7,000 references. They form the “general set” of PDBbind.

(D) A “refined set” is further selected from the general set according to a number of criteria. These criteria address the quality of the binding data as well as of the complex structures. Each complex in the refined set has been double-checked to ensure that the binding affinity matches the structure from the PDB. The refined set is designed to serve as a high-quality standard data set for theoretical studies on protein-ligand binding. The refined set in this release consists of a total of 1,300 entries.


A given protein-ligand complex is accepted into the refined set if it meets all of the following criteria:

1. It is a crystal structure with an overall resolution ≤ 2.5 angstrom.
2. It is a “clean”, binary complex formed between one protein and one ligand through non-covalent binding.
3. It has an experimentally determined Kd or Ki value.
4. The protein and the ligand used in the binding assay exactly match the ones observed in the complex structure.
5. The ligand does not contain any element other than C, N, O, S, P, H, and halogens, and its molecular weight is lower than 1,000.
6. There are no unnatural amino acid residues in the binding site on the protein.
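The six refined-set criteria can be read as a conjunction of simple checks. The following is a minimal sketch of that reading, not the procedure PDBbind itself uses: the `ComplexRecord` class, its field names, and the explicit halogen list (F, Cl, Br, I) are assumptions made for illustration, and the manual review behind criteria 2 and 4 is reduced to pre-computed flags.

```python
from dataclasses import dataclass, replace

# Elements allowed in the ligand by criterion 5. The halogen list is an
# assumption: the brochure says "halogens" without enumerating them.
ALLOWED_ELEMENTS = {"C", "N", "O", "S", "P", "H", "F", "Cl", "Br", "I"}


@dataclass(frozen=True)
class ComplexRecord:
    """Hypothetical per-complex annotations; not a PDBbind file format."""
    is_crystal_structure: bool
    resolution: float               # overall resolution, in angstrom
    n_proteins: int
    n_ligands: int
    covalently_bound: bool
    affinity_type: str              # "Kd", "Ki", or "IC50"
    assay_matches_structure: bool   # outcome of a manual check (criterion 4)
    ligand_elements: frozenset      # element symbols present in the ligand
    ligand_mol_weight: float
    unnatural_residue_in_site: bool


def meets_refined_criteria(rec: ComplexRecord) -> bool:
    """Return True only if all six criteria listed above hold."""
    checks = [
        # 1. crystal structure with resolution <= 2.5 angstrom
        rec.is_crystal_structure and rec.resolution <= 2.5,
        # 2. binary, non-covalent complex of one protein and one ligand
        rec.n_proteins == 1 and rec.n_ligands == 1 and not rec.covalently_bound,
        # 3. Kd or Ki available (an IC50 alone does not qualify)
        rec.affinity_type in {"Kd", "Ki"},
        # 4. assay species match the ones in the structure
        rec.assay_matches_structure,
        # 5. allowed elements only, molecular weight below 1,000
        rec.ligand_elements <= ALLOWED_ELEMENTS and rec.ligand_mol_weight < 1000,
        # 6. no unnatural residues in the binding site
        not rec.unnatural_residue_in_site,
    ]
    return all(checks)


# A made-up record that passes, and a copy that fails criterion 3.
example = ComplexRecord(
    is_crystal_structure=True, resolution=1.9, n_proteins=1, n_ligands=1,
    covalently_bound=False, affinity_type="Ki", assay_matches_structure=True,
    ligand_elements=frozenset({"C", "H", "N", "O", "Cl"}),
    ligand_mol_weight=412.3, unnatural_residue_in_site=False,
)
ic50_only = replace(example, affinity_type="IC50")
```

Here `meets_refined_criteria(example)` returns `True`, while `meets_refined_criteria(ic50_only)` returns `False`, which reflects why an entry can belong to the general set (IC50 accepted) without qualifying for the refined set.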

(E) The refined set is further grouped into clusters by protein sequence similarity. BLAST is used to compute the similarity between any two sequences, and a cutoff of 90% is used in clustering. Each cluster normally consists of complexes formed by a particular type of protein. A total of 70 clusters from the refined set are found to contain at least four members. In each of these clusters, the complex with the highest binding affinity, the one with the lowest binding affinity, and one with a medium binding affinity are selected as representatives. All of these representatives, 210 complexes in total (70 clusters × 3), form the “core set” of PDBbind. Note that the core set is strictly a subset of the refined set. The core set is designed to provide a non-redundant sampling of the refined set, which may be more appropriate and more convenient to use for certain studies.
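The representative selection in step (E) can be sketched as follows. This is an illustration under stated assumptions rather than the PDBbind implementation: cluster labels are taken as given (the BLAST clustering at the 90% cutoff is not shown), affinities are assumed to be on a scale where larger means tighter binding (for example the negative logarithm of Kd or Ki), and "medium" is interpreted as the middle element of the sorted cluster, which the brochure does not specify. The function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 4  # from the text: clusters with at least four members


def select_core_set(records):
    """Pick three representatives per sufficiently large cluster.

    records: iterable of (complex_id, cluster_id, affinity) tuples, where a
    larger affinity value is assumed to mean tighter binding.
    Returns complex ids ordered as highest, medium, lowest for each cluster.
    """
    clusters = defaultdict(list)
    for complex_id, cluster_id, affinity in records:
        clusters[cluster_id].append((affinity, complex_id))

    core = []
    for cluster_id in sorted(clusters):
        members = sorted(clusters[cluster_id])  # ascending affinity
        if len(members) < MIN_CLUSTER_SIZE:
            continue  # small clusters contribute nothing to the core set
        lowest = members[0]
        medium = members[len(members) // 2]  # assumption: middle of the ranking
        highest = members[-1]
        core.extend(cid for _, cid in (highest, medium, lowest))
    return core


# Toy data with made-up identifiers: cluster "A" qualifies, cluster "B" is too small.
toy_records = [
    ("a1", "A", 4.0), ("a2", "A", 5.5), ("a3", "A", 6.1),
    ("a4", "A", 7.3), ("a5", "A", 9.0),
    ("b1", "B", 6.0), ("b2", "B", 8.2),
]
core = select_core_set(toy_records)  # ["a5", "a3", "a1"]
```

Because each qualifying cluster contributes exactly three complexes, 70 qualifying clusters give the 210 core-set entries quoted above.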

The PDBbind database is developed through a collaboration between Prof. Shaomeng Wang’s group at the University of Michigan and Prof. Renxiao Wang’s group at the Shanghai Institute of Organic Chemistry. To cite the PDBbind database, please refer to:
1. Wang, R.; Fang, X.; Lu, Y.; Yang, C.-Y.; Wang, S. J. Med. Chem. 2005, 48, 4111-4119.
2. Wang, R.; Fang, X.; Lu, Y.; Wang, S. J. Med. Chem. 2004, 47, 2977-2980.

For users’ convenience, PDBbind provides processed structural files for each complex in the refined set, that is, each complex that meets all of the acceptance criteria listed above. These files can be readily used by molecular modeling software. The biological unit of each complex is split into a protein saved in the PDB format and a ligand saved in both the Mol2 format and the SD format. Atom types and bond types on the ligand are carefully assigned.

http://www.pdbbind.org/

History of the PDBbind Database

Version   Protein-ligand   Entries in the   Entries in the   Entries in the
          complexes        general set      refined set      core set
2002      5,671            1,446            800              ---
2003      5,897            1,763            900              ---
2004      6,847            2,276            1,091            231
2005      9,775            2,756            1,296            288
2006*     9,775            2,632            1,122            234
2007      11,822           3,124            1,300            210

*: v.2006 is in fact a correction of v.2005.

[Figure: Distribution of the Binding Data in the Refined Set. Histogram; x-axis: binding data (0 to 14); y-axis: population (0 to 300).]