What the Protein Data Bank teaches us about structural biology

Worldwide Protein Data Bank www.wwpdb.org What the Protein Data Bank teaches us about structural biology Helen M. Berman NCMI Workshop December 13, 2008


Page 1: What the Protein Data Bank teaches us about structural biology

Worldwide Protein Data Bank

www.wwpdb.org

What the Protein Data Bank teaches us about structural biology

Helen M. Berman

NCMI Workshop

December 13, 2008

Page 2: What the Protein Data Bank teaches us about structural biology
Page 3: What the Protein Data Bank teaches us about structural biology

1960’s

Protein crystallography begins to take off

Emerging interest in protein folding

Use of computer graphics to represent structure

Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin

Lysozyme

Hemoglobin

Ribonuclease

Myoglobin

Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature 181 662-666; Hemoglobin: Perutz (1962) Proc. R. Soc. A265, 161-187; Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206 757; Ribonuclease: Kartha, Bello, Harker (1967) Nature 213, 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242, 3753-3757.

Page 4: What the Protein Data Bank teaches us about structural biology

1970’s

Grass roots community efforts to archive data

Protein crystallographers discuss how to archive data

June 1971 Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972)

October 1971 PDB is announced in Nature New Biology (7 structures; vol 233, 1971, page 223)

1975 PDB receives first funding from NSF (~32 structures)

Page 5: What the Protein Data Bank teaches us about structural biology

Hemoglobin: M.F. Perutz (1962) Proc. R. Soc. A265: 161-187

Carboxypeptidase A: F.A. Quiocho, W.N. Lipscomb (1971) Adv Protein Chem 25: 1-78

Myoglobin: J.C. Kendrew, G. Bodo, H.M. Dintzis, R.G. Parrish, H. Wyckoff, D.C. Phillips (1958) Nature 181: 662-666

Subtilisin: R.A. Alden, J.J. Birktoft, J. Kraut, J.D. Robertus, C.S. Wright (1971) Biochem Biophys Res Commun 45: 337-344

Alpha-chymotrypsin: J.J. Birktoft, D.M. Blow (1972) J Mol Biol 68: 187-240

Pancreatic trypsin inhibitor: R. Huber, D. Kukla, A. Ruhlmann, O. Epp, H. Formanek (1970) Naturwissenschaften 57: 389-392

Rubredoxin: K.D. Watenpaugh, L.C. Sieker, J.R. Herriott, L.H. Jensen (1973) Acta Crystallogr B29: 943-956

Lactate dehydrogenase: J.L. White, M.L. Hackert, M. Buehner, M.J. Adams, G.C. Ford, P.J. Lentz Jr., I.E. Smiley, S.J. Steindel, M.G. Rossmann (1976) J Mol Biol 102: 759-779

Cytochrome b5: F.S. Mathews, P. Argos, M. Levine (1972) Cold Spring Harb Symp Quant Biol 36: 387-395

Papain: J. Drenth, J.N. Jansonius, R. Koekoek, H.M. Swen, B.G. Wolthers (1968) Nature 218: 929-932

Page 6: What the Protein Data Bank teaches us about structural biology

Enzymes

[Figure: proportion of enzyme classes relative to total enzyme structures, in percent, by decade; classes: Oxidoreductases, Transferases, Hydrolases, Lyases, Isomerases, Ligases]

Enzyme Class      1972-79  1980-89  1990-99  2000-08  Total
Oxidoreductases         5       25      918     2977   3925
Transferases            3       29     1423     5246   6701
Hydrolases             29      123     2797     6846   9795
Lyases                  2        3      451     1337   1793
Isomerases              1        2      280      716    999
Ligases                 0        4      123      652    779
Total                  40      186     5992    17774  23992

In the beginning

Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757

Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757
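The chart on this slide plots each enzyme class as a share of all enzyme structures in a decade. Those shares can be recomputed from the counts in the enzyme table. The following is a minimal sketch, not the method used to build the slide: the names `DECADES`, `COUNTS`, and `class_percentages` are made up for illustration, and only the counts come from the table.

```python
# Counts per decade copied from the enzyme table on this slide.
# The names below are illustrative; they do not come from the presentation.
DECADES = ["1972-79", "1980-89", "1990-99", "2000-08"]
COUNTS = {
    "Oxidoreductases": [5, 25, 918, 2977],
    "Transferases": [3, 29, 1423, 5246],
    "Hydrolases": [29, 123, 2797, 6846],
    "Lyases": [2, 3, 451, 1337],
    "Isomerases": [1, 2, 280, 716],
    "Ligases": [0, 4, 123, 652],
}


def class_percentages(counts):
    """Return {class: [percent of that decade's enzyme structures, ...]}."""
    # Column sums give the total number of enzyme structures per decade.
    totals = [sum(column) for column in zip(*counts.values())]
    return {
        name: [100.0 * n / total for n, total in zip(row, totals)]
        for name, row in counts.items()
    }


if __name__ == "__main__":
    print(f"{'Enzyme class':<16}" + "".join(f"{d:>9}" for d in DECADES))
    for name, row in class_percentages(COUNTS).items():
        print(f"{name:<16}" + "".join(f"{p:9.1f}" for p in row))
```

Read this way, hydrolases make up 29 of the 40 enzyme structures from 1972-79 and 6846 of the 17774 from 2000-08, so their share falls even as the absolute count grows.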

Page 7: What the Protein Data Bank teaches us about structural biology

RNA-containing structures (1317)

[Figure: number of structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008); categories: Protein/RNA complexes, RNA only, DNA/RNA hybrid, Protein/DNA/RNA complexes]

In the beginning

tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem. Biophys. Res. Commun. 68: 89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, & A. Klug (1974) Nature 250: 546-551.

Page 8: What the Protein Data Bank teaches us about structural biology

1980’s

Technology takes off

Structural biology is able to focus on medical problems

Community efforts to promote data sharing

IUCr guidelines requiring data deposition in the PDB are published

Page 9: What the Protein Data Bank teaches us about structural biology

DNA-containing structures (2474)

[Figure: number of structures by decade; categories: Protein/DNA complexes, DNA only, DNA/RNA hybrid, Protein/DNA/RNA complexes]

In the beginning

B-DNA: 1bna, Dickerson & Drew (1981) J. Mol. Biol. 149: 761-786

Z-DNA: 2dcg, Wang, Quigley, Kolpak, Crawford, van Boom, van der Marel, Rich (1979) Nature 282: 680-686

Page 10: What the Protein Data Bank teaches us about structural biology

Protein-nucleic acid complexes (1920)

[Figure: number of structures by decade; categories: Protein/DNA complexes, Protein/RNA complexes, Protein/DNA/RNA complexes]

In the beginning

Phage 434 repressor-operator: 2or1, Aggarwal, Rodgers, Drottar, Ptashne, & Harrison (1988) Science 242: 899-907

Page 11: What the Protein Data Bank teaches us about structural biology

Viruses (280 total)

Helical (25)

Icosahedral (255)

Number of structures by decade: 1980-1989: 20; 1990-1999: 121; >=2000: 139

In the beginning

Hopper, Harrison, Sauer (1984) Structure of tomato bushy stunt virus. V. Coat protein sequence determination and its structural implications. J. Mol. Biol. 177: 701-713

Silva, Rossmann (1985) The refinement of southern bean mosaic virus in reciprocal space. Acta Crystallogr. B41: 147-157

Page 12: What the Protein Data Bank teaches us about structural biology

Cooperative community action

Individual letters to editors of journals

Committees
– IUCr Commission on Biological Macromolecules
– ACA/USNCCr
– Richards committee

Funding agencies

Articles in journals

Marvin Cassman, Fred Richards, Richard Dickerson

Page 13: What the Protein Data Bank teaches us about structural biology

1990’s

Number of structures increases exponentially

Complexity of structures increases

mmCIF dictionary created

New databases begin to emerge

User base expands dramatically

PDB archive moves

mmCIF Working Group Members

Page 14: What the Protein Data Bank teaches us about structural biology

Electron Microscopy structures

In the beginning

Bacteriorhodopsin: Henderson, Baldwin, Ceska, Zemlin, Beckmann, Downing (1990) J. Mol. Biol. 213: 899-929.

Page 15: What the Protein Data Bank teaches us about structural biology

Ribosome structures (214)

[Figure: pie chart of ribosome structures; labels: Prokaryotic, Eukaryotic, 30S, 50S; slices: 55%, 41%, 2%, 1%, 1%]

In the beginning

Ban, Nissen, Hansen, Moore, & Steitz (2000) Science 289: 905-920

Clemons Jr., May, Wimberly, McCutcheon, Capel, & Ramakrishnan (1999) Nature 400: 833-840

Schluenzen, Tocilj, Zarivach, Harms, Gluehmann, Janell, Bashan, Bartels, Agmon, Franceschi, Yonath (2000) Cell 102: 615-623

Yusupova, Yusupov, Cate, & Noller (2001) Cell 106: 233-241

Page 16: What the Protein Data Bank teaches us about structural biology

2000’s

wwPDB is formed

Continued growth in structures

Structural genomics takes off

Page 17: What the Protein Data Bank teaches us about structural biology

www.wwpdb.org

Page 18: What the Protein Data Bank teaches us about structural biology

Depositions to the PDB by decade

[Figure: number of released entries by year]

Page 19: What the Protein Data Bank teaches us about structural biology

July 2008

Page 20: What the Protein Data Bank teaches us about structural biology

What can we learn from the PDB?

Page 21: What the Protein Data Bank teaches us about structural biology

Structure distribution

[Figure: pie chart; slices: Protein only, Protein-DNA complexes, DNA only, Protein-RNA complexes, RNA only, RNA-DNA hybrid, Other]

[Figure: pie chart; slices: Enzyme, Ribosome, Virus, Response to stimuli*, Biological regulation & signal transduction*, Cellular processes*, Immune system process*, Other]

* GO process

Page 22: What the Protein Data Bank teaches us about structural biology

Structure determination methods

April 30, 2008

Number of structures by decade

Method  1972-1979  1980-1989  1990-1999  2000-2008
X-Ray          86        341       8837      33797
NMR             0          2       1790       5492
EM              0          0          2        154

[Figure: number of structures determined by other methods (fiber diffraction, neutron diffraction, solution scattering, powder diffraction, electron diffraction, electron tomography, infrared spectroscopy); bar values range from 3 to 22]

Page 23: What the Protein Data Bank teaches us about structural biology

Resolution distribution of protein structures

Resolution distribution of other structures

[Figure axes: resolution by year]

Resolution distribution of all structures

Page 24: What the Protein Data Bank teaches us about structural biology

Distinct and novel protein sequences

Structures containing distinct protein sequences (<98%)

Structures containing novel protein sequences (<30%)

Subset of PSI structures

Subset of other SG structures

[Figure: percent of distinct/novel structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008); bar labels range from 2% to 63%]
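The two cutoffs on this slide can be read as a simple classification rule. The sketch below assumes that "<98%" and "<30%" are sequence-identity thresholds measured against sequences already in the archive; the slide does not say how identity was computed. The function names, the "redundant" label, and the identity measure (matches divided by alignment length, on pre-aligned sequences) are illustrative assumptions.

```python
# Cutoffs taken from the slide labels; reading them as percent sequence
# identity to previously deposited sequences is an assumption.
DISTINCT_CUTOFF = 98.0
NOVEL_CUTOFF = 30.0


def percent_identity(aligned_a, aligned_b):
    """Percent of identical positions in two pre-aligned sequences.

    Both strings must have the same length; '-' marks a gap and never
    counts as a match. The denominator is the alignment length, which is
    one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def classify(max_identity_to_archive):
    """Label a sequence by its best identity to any earlier sequence."""
    if max_identity_to_archive < NOVEL_CUTOFF:
        return "novel"      # also distinct, since 30 < 98
    if max_identity_to_archive < DISTINCT_CUTOFF:
        return "distinct"
    return "redundant"      # hypothetical label, not used on the slide


if __name__ == "__main__":
    identity = percent_identity("MKV-LAAG", "MKVQLSAG")
    print(f"{identity:.1f}% identity -> {classify(identity)}")
```

Under this reading every novel sequence is also distinct, so the novel structures form a subset of the distinct ones.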

Page 25: What the Protein Data Bank teaches us about structural biology

Redundancy: protein clusters

Cluster #  Total distinct chains in cluster  Protein cluster                          First structure  Deposition Date
1          459                               Bacteriophage T4 lysozyme                2LZM             1977-03-28
2          297                               Hen egg white lysozyme                   2LYZ             1975-02-01
3          196                               Human lysozyme                           1GFE             1984-10-12
4          445                               Mouse immunoglobulin Fc & Fab fragments  1GIG             1993-01-20
5          218                               Human immunoglobulin Fc & Fab fragments  1FC1             1981-05-21
6          330                               HIV-1 protease                           2HVP             1989-04-10
7          302                               Trypsin (serine protease)                5PTP             1977-12-19
8          254                               Thrombin                                 2HGT             1991-06-03
9          229                               Human carbonic anhydrase II              1CA2             1976-05-22
10         185                               Whale myoglobin                          1MBN             1973-04-05
11         182                               Human leukocyte antigen                  1HLA             1987-10-15
12         178                               Human hemoglobin -subunit                3HHB             1975-04-01
13         176                               Human hemoglobin -subunit                3HHB             1975-04-01
14         160                               Ribonuclease A                           2RNS             1973-04-01
15         153                               Human cyclin-dependent kinase 2 (CDK2)   1HCK             1996-06-03

Page 26: What the Protein Data Bank teaches us about structural biology

Lysozyme: Lessons learned

Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757.

T4 bacteriophage (459 structures)
– Amino acid replacement studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50%. B.W. Matthews (1996) FASEB J. 10: 35-41.
– Insight into folding and catalysis

Hen egg white (297 structures)
– Low sequence identity
– Structural similarity of active site to T4. B.W. Matthews, S.J. Remington, M.G. Grutter, W.F. Anderson (1981) J. Mol. Biol. 147: 545-58.
– Insight into evolution and catalysis

Page 27: What the Protein Data Bank teaches us about structural biology

Myoglobin and hemoglobin: Lessons learned

Whale myoglobin (185 structures)
– Different ligands: oxygen, carbon monoxide [1]
– Amino acid substitution studies [2]
– Laue studies [3]
– Insight into function and dynamics

Other species myoglobin
– Low sequence identity, same structure [4]
– Insight into evolution

Human hemoglobin (178 structures)
– Insight into function and disease (sickle cell anemia, thalassemia) [5]

Other species hemoglobin
– Low sequence identity, same structure [4]
– Profound insight into evolution

Lodish et al. [6]

[1] Kuriyan, Wilz, Karplus, Petsko (1986) J. Mol. Biol. 192: 133–154

[2] Quillin, Arduini, Olson, Phillips Jr. (1993) J. Mol. Biol. 234: 140–155; Carver, Brantley Jr., Singleton, Arduini, Quillin, Phillips Jr., Olson (1992) J. Biol. Chem. 267: 14443–14450

[3] Bourgeois, Vallone, Schotte, Arcovito, Miele, Sciara, Wulff, Anfinrud, Brunori (2003) PNAS 100: 8704-8709

[4] Dickerson, Geis (1983) Hemoglobin: structure, function, and pathology

[5] Kidd, Baker, Mathews, Brittain, Baker (2001) Prot. Sci. 10: 1739-1749; Harrington, Adachi, Royer Jr. (1998) J. Biol. Chem. 273: 32690-32696

[6] Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell (2000) Molecular Cell Biology, WH Freeman & Co.

Page 28: What the Protein Data Bank teaches us about structural biology

TIM barrel proteins: Lessons learned

TIM barrel structures (1727)

http://www.cathdb.info

Share the same fold but represent significant sequence and functional diversity

Are enzymes or enzyme-related proteins involved in molecular or energy metabolism

Comparative structure analysis indicates evolutionary relatedness of TIM barrel proteins

Banner, Bloomer, Petsko, Phillips, Wilson (1976) Biochem. Biophys. Res. Commun. 72: 146-155

Nagano, Orengo, Thornton (2002) J. Mol. Biol. 321: 741-65.

Page 29: What the Protein Data Bank teaches us about structural biology

HIV-related structures (609)

Protease: 311

Reverse Transcriptase: 110

Gag protein: 39

Integrase: 27

Other: 122

[Figure: number of structures by decade for each category]

Page 30: What the Protein Data Bank teaches us about structural biology

HIV-1 protease (311)

Navia, Fitzgerald, McKeever, Leu, Heimbach, Herber, Sigal, Darke, Springer (1989) Nature 337: 615-620

Wlodawer, Miller, Jaskolski, Sathyanarayana, Baldwin, Weber, Selk, Clawson, Schneider, Kent (1989) Science 245: 616-621

226 structures with ligands

Amprenavir (GSK)
Fosamprenavir (GSK)
Lopinavir (Abbott)
Atazanavir (BMS)
Nelfinavir (Agouron)
Darunavir (Tibotec)
Tipranavir (BI)
Indinavir (Merck)
Ritonavir (Abbott)
Saquinavir (Roche)

2R5P, 2B7Z, 2AVV, 2AVO, 2AVS, 1SGU, 1SDT, 1SDV, 1SDU, 1K6C, 1C6Y, 2BPX, 1HSG, 1HSH

1T7J, 1HPV

2B60, 1RL8, 1SH9, 1N49, 1HXW

2QAK, 2PYM, 2Q63, 2PYN, 2Q64, 2R5Q, 1OHR

2O4N, 2O4L, 2O4P, 1D4Y, 1D4S

3D1X, 3D1Y, 3CYX, 2NMW, 2NMZ, 2NNP, 2NMY, 2NNK, 1C6Z, 1FB7

2FXE, 2FXD, 2O4K, 2AQU, 2FND

2RKG, 2RKF, 2QHC, 2Z54, 2Q5K, 2O4S, 1RV7, 1MUI

Page 31: What the Protein Data Bank teaches us about structural biology

HIV-1 reverse transcriptase (110)

Wang, Smerdon, Jager, Kohlstaedt, Rice, Friedman, Steitz (1994) Proc. Natl. Acad. Sci. USA 91: 7242-7246

76 structures with ligands

Abacavir (GSK)
Nevirapine (BI)
Stavudine (BMS)
Efavirenz (BMS)
Lamivudine (GSK)
Zidovudine (GSK)
Emtricitabine (Gilead)
Tenofovir (Gilead)
Zalcitabine (Hoffmann-La Roche)
Etravirine (Tibotec)
Delavirdine (Pfizer)

[Figure: number of structures by year]

2HND, 2HNY, 1S1U, 1S1X, 1LW0, 1LWE, 1LWC, 1LWF, 1JLB, 1JLF, 1FKP, 1VRT, 3HVT

1JKH, 1IKW, 1IKV, 1FKO, 1FK9

1T05

1S6P

Page 32: What the Protein Data Bank teaches us about structural biology

Structural coverage of KEGG pathways

50136 structures

16526 structures associated with KEGG pathway (33%)

KEGG Pathway                             Number of Structures
Complement and coagulation cascades      506
Small cell lung cancer                   506
Regulation of actin cytoskeleton         449
Non-small cell lung cancer               407
Pyrimidine metabolism                    402
Nitrogen metabolism                      399
Two-component system - General           360
Ribosome                                 333
Base excision repair                     328
Purine metabolism                        310
Antigen processing and presentation      281
Nicotinate and nicotinamide metabolism   252
Insulin signaling pathway                248
Porphyrin and chlorophyll metabolism     248
ABC transporters - General               246
Prostate cancer                          244

Page 33: What the Protein Data Bank teaches us about structural biology

Human biological pathways

Genes that contain a PDB structure are in red

Complement and coagulation cascades pathway

Small cell lung cancer Non small cell lung cancer

Regulation of actin cytoskeleton

KEGG (http://www.genome.jp/kegg/)

Page 34: What the Protein Data Bank teaches us about structural biology

EM maps and Models in the PDB

Page 35: What the Protein Data Bank teaches us about structural biology

How EM experiments are archived

Page 36: What the Protein Data Bank teaches us about structural biology
Page 37: What the Protein Data Bank teaches us about structural biology

EMDataBank

Nuclear pore complex, 85 Å, EMD-1097

Rotavirus VP6 protein, 3.8 Å, EMD-1461

Created by EBI in 2002 for archiving EM maps

US deposition/annotation site added this year

Maps stored in CCP4/MRC format

Associated metadata stored in XML format

580 entries total
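Because the maps are stored in CCP4/MRC format, their basic geometry can be read directly from the fixed-size file header. The following is a minimal sketch using only the Python standard library. It assumes a little-endian file and reads just a few header words (grid dimensions, data mode, cell edge lengths, and the "MAP " tag). The function name `read_map_header` and the synthetic header in the demo are illustrative and not part of any EMDataBank tool.

```python
import struct

HEADER_SIZE = 1024  # the CCP4/MRC header is 256 four-byte words


def read_map_header(data):
    """Parse a few fields from the start of a CCP4/MRC map file.

    Assumes little-endian byte order; real files record their byte order
    in a machine stamp that this sketch does not inspect.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 1024 bytes of header")
    # Words 1-4: number of columns, rows, sections, and the data mode.
    columns, rows, sections, mode = struct.unpack_from("<4i", data, 0)
    # Words 11-13: unit cell edge lengths in angstroms.
    cell = struct.unpack_from("<3f", data, 40)
    # Word 53 (byte offset 208) carries the ASCII tag "MAP ".
    tag = bytes(data[208:212])
    return {
        "shape": (columns, rows, sections),
        "mode": mode,
        "cell": cell,
        "has_map_tag": tag == b"MAP ",
    }


if __name__ == "__main__":
    # Build a synthetic header so the sketch runs without a real map file.
    header = bytearray(HEADER_SIZE)
    struct.pack_into("<4i", header, 0, 64, 64, 32, 2)  # mode 2: 32-bit floats
    struct.pack_into("<3f", header, 40, 128.0, 128.0, 64.0)
    header[208:212] = b"MAP "
    print(read_map_header(bytes(header)))
```

For a real entry, the first 1024 bytes of the downloaded map file would be passed in place of the synthetic header.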

Page 38: What the Protein Data Bank teaches us about structural biology

EM entries in the PDB

Atomic coordinate models fitted to EM maps

Storage format for models and metadata is CIF

Matrix representations possible

Some large entries “break” PDB format

PBCV-1 (1m4x, 1680 matrices)

80S ribosome (1s1h + 1s1i)

230 entries total

Page 39: What the Protein Data Bank teaches us about structural biology

PDBj

Page 40: What the Protein Data Bank teaches us about structural biology

Goals

Common data model

Data harvesting tools

“One-stop shop” for deposition and retrieval

Tools for visualization, segmentation, and assessment

Page 41: What the Protein Data Bank teaches us about structural biology

Acknowledgements

Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL

NLM

BIRD-JST, MEXT

NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK

Page 42: What the Protein Data Bank teaches us about structural biology
Page 43: What the Protein Data Bank teaches us about structural biology

Acknowledgements

NIH GM079429 (Baylor, Rutgers, EBI) 2007-2012

EU Network of Excellence LSHG-CT-2004-50282 (EBI) 2004-2009