Quality of Protein Crystal Structures in the PDB
Eric N. Brown, Lokesh Gakhar and S. Ramaswamy

Between objectivity and subjectivity

Carl-Ivar Brändén & T. Alwyn Jones

Department of Molecular Biology, Uppsala Biomedical Center, PO Box 590, S-751 24 Uppsala, Sweden.

Protein crystallography is an exacting trade, and the results may contain errors that are difficult to identify. It is the crystallographer's responsibility to make sure that incorrect protein structures do not reach the literature.

Nature 343, 687 - 689 (22 February 1990)

Amplitudes and Phases - Bias.

Animal stories - by Kevin Cowtan

Amplitudes and Phases - Bias.

More animal stories.

Stolen from Bernhard Rupp's website without permission

How much of what we think?

Stolen from --- James Holton, Berkeley, without permission.

STRUCTURE VALIDATION

Validation based on geometry:
• WHATIF
• PROCHECK
• MOLPROBITY
• RAMACHANDRAN PLOT

Validation based on fit to data:
• R-factor / R-free
• Real-space fit, etc.

Problem: data-to-parameter ratio. Add geometric restraints, or chemical knowledge.

COMPOSITE VALIDATION: ASTRAL - SPACI
http://astral.Berkeley.edu/spaci.html
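
As a reminder of what the data-based scores measure: the conventional R-factor compares observed and calculated structure-factor amplitudes, and R-free is the same statistic computed on reflections held out of refinement. The Python sketch below is only an illustration. The function name `r_factor` and all amplitudes are made up, and the scale factor normally applied between observed and calculated amplitudes is omitted.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Hypothetical amplitudes: "work" reflections are used in refinement,
# "free" reflections are set aside and never refined against.
work_obs, work_calc = [100.0, 80.0, 60.0, 40.0], [95.0, 84.0, 57.0, 42.0]
free_obs, free_calc = [90.0, 50.0], [80.0, 57.0]

r_work = r_factor(work_obs, work_calc)
r_free = r_factor(free_obs, free_calc)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

A large gap between the two values is commonly read as a sign that the model has more parameters than the data can support, which is the data-to-parameter problem named above.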

WHY MORE?

DON’T WE HAVE ENOUGH VALIDATION TOOLS?

WHAT IS COMMON TO ALL EXISTING VALIDATION TECHNIQUES?

THEY ASSUME THERE IS AN ABSOLUTE CORRECT ANSWER

WE KNOW THERE IS NO CORRECT ANSWER

THINK DIFFERENTLY

• All crystallographers want to deposit the correct structure.

• There is subjectivity and bias, all of which is random

AVERAGE IS BEST !!

QUALITY & AVERAGE

• How different you are from the average is a measure of quality

HOW DO YOU DESCRIBE THE AVERAGE?

Quality of Model

Independent Variables
• Date submitted to PDB
• Maximum resolution
• X-ray source
• Number of atoms
• Similarity index
• Cross terms

Dependent Variables
• R-factor
• R-free
• Real-space R-value
• Real-space CC
• Outliers
• Ramachandran violations

Predictive Models

Example: How to determine the weight of a 5’7” male . . .

. . . make up an equation . . .

. . . choose a group of males . . .

. . . fit the equation to their weight . . .

. . . evaluate equation.
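
The four steps above can be sketched in a few lines of Python. This is an illustration only: the straight-line equation, the function name `fit_line`, and every height and weight below are made up for the example and do not come from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# "Make up an equation": weight = a * height + b.
# "Choose a group of males": hypothetical heights (inches) and weights (pounds).
heights = [64, 66, 67, 69, 70, 72, 74]
weights = [140, 150, 158, 165, 172, 181, 195]

# "Fit the equation to their weight".
a, b = fit_line(heights, weights)

# "Evaluate equation" at 5'7" = 67 inches.
predicted = a * 67 + b
print(f"weight = {a:.2f} * height + {b:.1f}")
print(f"predicted weight at 67 in: {predicted:.1f} lb")
```

The difference between an individual's observed weight and `predicted` is the kind of "distance from the average" that the rest of the talk uses as a measure of quality.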

Open problems

• What independent variables?
  Quality = f(resolution)
  Quality = f(resolution, date, x-ray source)

• What equation?
  Quality = a x resolution + b x date + c
  Quality = a x res + log b2(date) + c

• How to fit it to observations?
  - Least squares vs. maximum likelihood
  - Outliers
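
The slide leaves the fitting method open. For Gaussian errors, least squares and maximum likelihood give the same fit, so the practical difference shows up when there are outliers. The sketch below contrasts ordinary least squares with the Theil-Sen estimator (median of pairwise slopes), a robust method chosen here purely for illustration and not stated in the talk. The resolution and R-factor values are invented.

```python
from itertools import combinations
from statistics import median

def least_squares(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def theil_sen(xs, ys):
    """Robust fit: slope is the median of all pairwise slopes."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    a = median(slopes)
    b = median(y - a * x for x, y in zip(xs, ys))
    return a, b

# Hypothetical data: R-factor rises by 0.04 per Angstrom of resolution,
# except for the last structure, which is an outlier (0.40 instead of 0.24).
resolution = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
r_factor = [0.14, 0.16, 0.18, 0.20, 0.22, 0.40]

ols_a, ols_b = least_squares(resolution, r_factor)
ts_a, ts_b = theil_sen(resolution, r_factor)
print(f"least squares: slope = {ols_a:.3f}, intercept = {ols_b:.3f}")
print(f"Theil-Sen:     slope = {ts_a:.3f}, intercept = {ts_b:.3f}")
```

In this toy data set a single outlier roughly doubles the least-squares slope, while the median-based fit stays close to the trend of the other five points.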

PICK ALL AVAILABLE METRICS (R-factor, R-free, etc.) and FOR EACH METRIC

Choose model based on log-likelihood (LL)

• Start with Metric = a x resolution + C
• Add or remove terms iteratively to improve LL
• Use BIC to decide whether a new parameter contributes a significant improvement in LL

RESULT: An equation that predicts a given metric…

Data: all structures in the PDB that have all independent and dependent variables (16,609)
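
A minimal sketch of BIC-guided term selection follows, assuming a Gaussian linear model, for which BIC reduces (up to a constant) to n ln(RSS/n) + k ln(n). It only adds terms, whereas the procedure described here also removes them. The helper names (`solve`, `bic`, `forward_select`), the variable names, and the eight synthetic data points are all hypothetical, and the talk's actual implementation is not shown in the slides.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def bic(columns, y):
    """BIC (up to a constant) of a least-squares fit with an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bp * xp for bp, xp in zip(beta, row))) ** 2
              for row, yi in zip(design, y))
    return n * math.log(rss / n) + k * math.log(n)

def forward_select(candidates, y, start):
    """Greedily add the term that lowers BIC most; stop when none does."""
    chosen = list(start)
    best = bic([candidates[name] for name in chosen], y)
    while True:
        trials = {name: bic([candidates[c] for c in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        if not trials:
            break
        name = min(trials, key=trials.get)
        if trials[name] >= best:  # the penalty outweighs the gain in fit
            break
        chosen.append(name)
        best = trials[name]
    return chosen

# Synthetic two-level design: the metric depends on resolution and years,
# not on "junk"; the a*b*c term plays the role of unexplained noise.
points = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
candidates = {
    "resolution": [2.0 + 0.5 * a for a, b, c in points],
    "years": [10.0 + 5.0 * b for a, b, c in points],
    "junk": [float(c) for a, b, c in points],
}
metric = [0.10 + 0.04 * res - 0.002 * yr + 0.005 * a * b * c
          for res, yr, (a, b, c) in zip(candidates["resolution"],
                                        candidates["years"], points)]

selected = forward_select(candidates, metric, start=["resolution"])
print(selected)
```

On this synthetic data the date-like term is kept and the uninformative "junk" term is rejected, because its ln(n) penalty is not repaid by any reduction in the residual sum of squares.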

R-factor = C + r_high + S + N + I + r_high × (S + N) + N × I

R_real-space = C + r_high + S + D + N + r_high × (S + D + I + N) + D × (I + S) + I × N + r_high × (S × D + I × N)

EQUATIONS FOR METRICS!

INFORMATION INHERENT IN THE MODEL

The model can tell us immediately which independent variables affect which metrics (dependent variables), and by how much.

Examples: R-factor vs. time; R-factor vs. source and resolution

UNEXPLORED QUESTIONS IN THE MODEL?

Unexplored independent variables:
• R-sym and redundancy
• Space group and volume of unit cell?
• Refinement protocol
• Solvent modeling and B-factor modeling
• Temperature of data collection
• Complexity, as a function of the number of chains of macromolecules

Nine metrics to ONE: Principal component analysis

• We took the nine metrics and combined them into one metric, accounting for correlations and redundancy. We call this single metric the Quality value (Q-value).

• By construction, the Q-value of the average is zero. Negative numbers mean better than average; positive numbers mean worse than average. The standard deviation is one.
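
A rough sketch of how several correlated metrics can be collapsed into one score with mean zero and standard deviation one. It uses three made-up metrics for six hypothetical structures rather than the nine metrics of the talk, and finds the first principal component by power iteration. The function names and the sign convention (larger first metric gives larger Q) are assumptions for this example, and the slides do not describe the exact procedure used.

```python
import math

def standardize(values):
    """Shift and scale to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def first_component(columns, iterations=200):
    """Leading eigenvector of the correlation matrix, by power iteration."""
    n, k = len(columns[0]), len(columns)
    corr = [[sum(columns[p][i] * columns[q][i] for i in range(n)) / n
             for q in range(k)] for p in range(k)]
    weights = [1.0] * k
    for _ in range(iterations):
        weights = [sum(corr[p][q] * weights[q] for q in range(k))
                   for p in range(k)]
        norm = math.sqrt(sum(w * w for w in weights))
        weights = [w / norm for w in weights]
    return weights

def q_values(metrics):
    """Project standardized metrics onto the first component, then rescale."""
    columns = [standardize(col) for col in metrics]
    weights = first_component(columns)
    if weights[0] < 0:  # orient so that a larger first metric means larger Q
        weights = [-w for w in weights]
    scores = [sum(w * col[i] for w, col in zip(weights, columns))
              for i in range(len(columns[0]))]
    return standardize(scores)

# Hypothetical metrics, all "larger is worse", for six structures.
r_factor = [0.16, 0.18, 0.19, 0.21, 0.23, 0.26]
r_free = [0.19, 0.22, 0.22, 0.25, 0.28, 0.31]
rama_outliers = [0.2, 0.5, 1.0, 0.8, 2.0, 3.5]

q = q_values([r_factor, r_free, rama_outliers])
for index, value in enumerate(q):
    print(f"structure {index}: Q = {value:+.2f}")
```

With all metrics oriented so that larger is worse, the structure that is best on every metric gets the most negative Q and the worst gets the most positive, matching the sign convention above.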

USE OF THE MODEL

• COMPARE STRUCTURES WITH THE AVERAGE - INDIVIDUALLY AND AS A GROUP.

The Q-value is now independent of all the independent variables used to make the model (resolution, number of atoms, date of data collection, novelty of structure, etc.).

It is a better indicator of quality than any one of the dependent variables.
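
Comparing a group with the average can be as simple as summarizing its Q-values. The sketch below reports the mean Q and the percentage of structures with Q below zero (better than average). The group names and Q-values are invented, and `summarize` is a hypothetical helper rather than part of the authors' tooling.

```python
def summarize(q_values):
    """Return (mean Q, percentage of structures with Q < 0)."""
    mean_q = sum(q_values) / len(q_values)
    better = sum(1 for q in q_values if q < 0)
    return mean_q, 100.0 * better / len(q_values)

# Hypothetical Q-values for two groups of deposited structures.
groups = {
    "group A": [-1.2, -0.3, 0.4, -0.8],
    "group B": [0.5, 1.1, -0.2, 0.9],
}

for name, values in groups.items():
    mean_q, percent_better = summarize(values)
    print(f"{name}: mean Q = {mean_q:+.2f}, "
          f"{percent_better:.0f}% better than average")
```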

STRUCTURAL GENOMICS (updated - Jan 2008)

MCSG over Time!

MORE-SG groups!

Quality Vs. Journals

[Figure: bar chart of the percentage of structures better than the global average (axis 0 to 100), by journal. Bar values are not recoverable from the transcript. Journals shown, in the order extracted:]

• Immunity
• Nature immunology
• Cell biochemistry and biophysics
• Molecular and cellular biology
• Science
• Nucleic acids research
• Journal of virology
• Biochemical and biophysical research communications
• The EMBO journal
• Nature
• Journal of immunology (Baltimore - Md)
• Nature structural biology
• Journal of structural biology
• Die Pharmazie
• Chemistry (Weinheim an der Bergstrasse - Germany)
• EMBO reports
• Plant & cell physiology
• Journal of medicinal chemistry
• Bioorganic & medicinal chemistry letters
• Biochemistry. Biokhimiia
• Bioorganic chemistry
• Structure (London - England)
• The Journal of biological chemistry
• Biological chemistry
• Journal of biological inorganic chemistry : JBIC
• Inorganic chemistry
• Biophysical journal
• OTHER
• Journal of the American Chemical Society
• Molecular microbiology
• Journal of molecular biology
• Acta crystallographica. Section D - Biological
• Chembiochem : a European journal of chemical biology
• Archives of biochemistry and biophysics
• FEBS letters
• Journal of bacteriology
• Protein science : a publication of the Protein Society
• Proteins
• Protein engineering
• Biochemical pharmacology
• Journal of inorganic biochemistry
• European journal of biochemistry / FEBS

WHAT CAN WE DO?

• Beam lines.
• Best practices.
• Protocols and methodologies.
• Countries.
• Institutions.
• Funding mechanisms.
• Investigators.

Is this the best we can do?

WE CAN DO BETTER

We can improve the quality of structures through better design of experiments and refinement protocols, if we know which independent variables affect which dependent variables, and how.

BEFORE WE DO THIS - FIX PROBLEMS THAT WE FOUND.

• Too much dependence on external databases!

• Problems with unknown atoms.

• Develop methods for missing data correction.

OTHER DATABASES - NMR

Some thoughts on independent variables.
• Spectrometers
• Samples - size, tags, buffers, etc.
• Completeness of assignments - percentage of backbone assigned, etc.
• Actual data used in structural calculations - NOE distance restraints, hydrogen bond distance restraints (experimental vs. inferred), torsion angle restraints, dipolar coupling restraints, paramagnetic restraints
• Structural statistics
• Date of structure determination
• Relaxation measurements?

OTHER DATABASES - NMR

DEPENDENT VARIABLES.
• RMS deviation of ensemble
• Packing (MolProbity score?)
• Ramachandran violations
• Recall, Precision, F-measure (Huang, Powers and Montelione)
• Agreement with high-resolution X-ray structures
• Other??

AFTER Today's LECTURES

HOW ABOUT THE MODEL DATABASE?

I am sure our modeling experts can think of the dependent and independent variables….

THANK YOU

ACKNOWLEDGEMENT

X-ray work - Eric N Brown and Lokesh Gakhar

The R-statistical package!

NMR work - Liping Yu and Andrew Fowler

Thanks to Brian Fox for inviting me - though I am not a member of any SG initiative.

Questions and Accusations.