Protein Crystallography - SHELX Homeshelx.uni-ac.gwdg.de/~tg/teaching/molbio/2004/day2.pdf ·...

Protein Crystallography Part II. Tim Grüne, Dept. of Structural Chemistry, Prof. G. Sheldrick, University of Göttingen, http://shelx.uni-ac.gwdg.de, tg@shelx.uni-ac.gwdg.de


Protein Crystallography Part II

Tim Grüne
Dept. of Structural Chemistry

Prof. G. Sheldrick
University of Göttingen

http://shelx.uni-ac.gwdg.de

tg@shelx.uni-ac.gwdg.de


Overview

• The Reciprocal Lattice

• The Ewald Sphere

• Data Processing and Scaling

• The Phase Problem

• SAD, MAD, MIR, RIP, et al.

Molecular Biology 1 Protein Crystallography II


Amplitudes and Phases

The electron density can be calculated from the structure factors via the Fourier transformation.

$$\rho(x, y, z) = \frac{1}{V_\mathrm{unit\,cell}} \sum_{h,k,l} F(h,k,l)\, e^{-2\pi i(hx+ky+lz)} = \frac{1}{V_\mathrm{unit\,cell}} \sum_{h,k,l} |F(h,k,l)|\, e^{i\phi(h,k,l)}\, e^{-2\pi i(hx+ky+lz)}$$

This is easily done by a computer. The equation, however, contains two unknown quantities: the amplitude |F(h, k, l)| and the phase φ of each reflection. Both must be known before anything can be computed.
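A minimal one-dimensional sketch of this Fourier synthesis, assuming NumPy is available. The two-atom "structure", its weights, the index range and the function names `structure_factors` and `density` are illustrative assumptions, not part of the lecture. The sketch follows the sign convention of the equation above (F built with e^{+2πihx}, ρ with e^{−2πihx}) and sets V = 1.

```python
import numpy as np

# Hypothetical 1D "crystal": fractional positions and scattering weights.
positions = np.array([0.2, 0.7])
weights = np.array([8.0, 6.0])
h = np.arange(-10, 11)            # reflections h = -10 ... 10


def structure_factors(h, positions, weights):
    """F(h) = sum_j f_j * exp(+2*pi*i*h*x_j)."""
    return (weights * np.exp(2j * np.pi * np.outer(h, positions))).sum(axis=1)


def density(x, h, F, volume=1.0):
    """rho(x) = 1/V * sum_h F(h) * exp(-2*pi*i*h*x)."""
    return (F * np.exp(-2j * np.pi * np.outer(x, h))).sum(axis=1) / volume


F = structure_factors(h, positions, weights)
amplitudes, phases = np.abs(F), np.angle(F)   # the two quantities needed

x = np.arange(100) / 100.0                    # grid over one unit cell
rho = density(x, h, amplitudes * np.exp(1j * phases))

print("highest density at x =", x[np.argmax(rho.real)])
```

With both amplitudes and phases supplied, the strongest density peak falls on the heavier of the two assumed atoms.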

The first half of this talk deals with how to extract the first part, the amplitude, from diffraction experiments.

The second half is concerned with how to retrieve the phases.


The Reciprocal Lattice

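The reciprocal lattice is an important concept in crystallography. It is spanned by three reciprocal lattice vectors, a*, b*, and c*, derived from the real-space vectors a, b, and c.

[Figure: the real-space vectors a and b span a face of area A; the reciprocal vector c* is perpendicular to that face and has length |c*| = A/V_unitcell.]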

For an orthorhombic space group (all angles 90°), the reciprocal vectors are parallel to the real-space vectors but have different lengths. For general space groups the reciprocal vectors are not parallel to those of the real-space unit cell. In every case, the volume of the reciprocal cell is the inverse of the volume of the real-space cell, V* = 1/V.

The point group (symmetry without translations) is the same for the real and the reciprocal lattice. Therefore many symmetry-related questions about crystals also apply to their reciprocal lattice.
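The relations above can be checked numerically. The following sketch assumes NumPy and uses the standard cross-product definition of the reciprocal vectors (a* = b × c / V, and cyclic permutations); the cell dimensions and the helper name `reciprocal_vectors` are made up for illustration.

```python
import numpy as np


def reciprocal_vectors(a, b, c):
    """Return a*, b*, c* and the real-space cell volume V."""
    volume = np.dot(a, np.cross(b, c))
    a_star = np.cross(b, c) / volume
    b_star = np.cross(c, a) / volume
    c_star = np.cross(a, b) / volume
    return a_star, b_star, c_star, volume


# Orthorhombic example cell (lengths in Angstrom, all angles 90 degrees).
a = np.array([10.0, 0.0, 0.0])
b = np.array([0.0, 20.0, 0.0])
c = np.array([0.0, 0.0, 30.0])
a_s, b_s, c_s, V = reciprocal_vectors(a, b, c)
V_star = np.dot(a_s, np.cross(b_s, c_s))
print("V * V* =", V * V_star)              # reciprocal volume is 1/V
print("a x a* =", np.cross(a, a_s))        # zero vector: a* is parallel to a

# Cell with a tilted c axis: a* is no longer parallel to a.
c_tilted = np.array([-5.0, 0.0, 29.0])
a_s2, b_s2, c_s2, V2 = reciprocal_vectors(a, b, c_tilted)
print("a x a* (tilted cell) =", np.cross(a, a_s2))
```

In the tilted cell the product V · V* is still 1, while a and a* are no longer parallel.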


The Ewald Sphere

It is important to collect data that are as complete as possible, i.e., to record nearly all reflections up to the resolution limit (which is often set by the crystal quality). To understand which reflections are collected, one can look at the Ewald sphere. It is constructed in reciprocal space.

[Figure: Ewald sphere of radius r = 1/λ, with the incident beam and the origin marked.]

When a lattice point crosses the Ewald sphere, a reflection occurs in the direction determined by the centre of the sphere and the point of intersection. The angle 2θ is the same as the angle at which the reflection is recorded on the detector, even though the Ewald sphere is constructed in reciprocal space.
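The geometry can be sketched in a few lines. The example below assumes NumPy and brings in Bragg's law (λ = 2d sin θ), which the slide does not state explicitly. The wavelength, the plane spacing d and both function names are illustrative assumptions. It places the point of intersection on a sphere of radius 1/λ and checks that its distance from the reciprocal-space origin is 1/d.

```python
import numpy as np


def two_theta_bragg(d, wavelength):
    """Scattering angle 2*theta (radians) from lambda = 2 d sin(theta)."""
    return 2.0 * np.arcsin(wavelength / (2.0 * d))


def lattice_point_on_sphere(two_theta, wavelength):
    """Reciprocal-space point hit by a beam diffracted by 2*theta.

    The incident beam travels along +x, the sphere has radius 1/lambda,
    its centre sits at (-1/lambda, 0, 0) and the reciprocal-space origin
    is the point (0, 0, 0) where the direct beam leaves the sphere.
    """
    r = 1.0 / wavelength
    centre = np.array([-r, 0.0, 0.0])
    direction = np.array([np.cos(two_theta), np.sin(two_theta), 0.0])
    return centre + r * direction


wavelength = 1.5418   # Angstrom (Cu K-alpha, chosen only as an example)
d = 3.0               # Angstrom, spacing of the reflecting planes
tt = two_theta_bragg(d, wavelength)
point = lattice_point_on_sphere(tt, wavelength)
print("2theta [deg]:", np.degrees(tt))
print("1/|s| [A]:   ", 1.0 / np.linalg.norm(point))   # recovers d
```

The smallest spacing that can reach the sphere at all is d = λ/2, which is why the radius of the sphere sets an absolute resolution limit.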


Limits of Data Collection

During data collection the crystal is rotated about an axis. The reciprocal lattice then rotates about the same axis. All lattice points that pass through the Ewald sphere during the rotation are collected. Apart from the resolution limit (set by the radius of the sphere, but more likely by the quality of the crystal), two parts of reciprocal space cannot be collected:

[Figure: Ewald sphere of radius r = 1/λ with the rotation axis and the camera limit marked; a grey shaded zone indicates reciprocal space that is not recorded.]

The grey shaded zone can be minimised by changing the direction of the rotation axis with respect to the incident beam direction.


Data Collection

[Figure: crystal rotating in the X-ray beam; successive images cover the rotation ranges 0–1°, 1–2°, ..., 179–180°.]

Typical frame widths range from 0.2° to 1°. For a 180° scan, this gives 180–900 images. This is typical for proteins that diffract to moderate resolution. A more thorough data collection rotates the crystal about two axes. One easily ends up with a few thousand images.


Data Processing/ Integration

Data collection results in a list of images, each representing a wedge of the rotation of the crystal in the beam. The images are distorted sections of reciprocal space.

Data integration has to reconstitute the original, undistorted lattice in three dimensions. It provides a (long) list with one line per reflection:

                                     det. co-ord's
  H   K   L   Intensity       error       x       y    z[°]
 -3   0  -3   4.162E+03   1.537E+02  1181.5  1235.6   107.4
 -3  -3   0   2.747E+03  -1.075E+02  1110.9  1205.1    76.0
 -3   0   3   3.946E+03   1.451E+02  1156.2  1233.4    18.3
  1   1  -4   5.933E+03  -2.139E+02  1215.0  1226.7   165.0
  4   1  -1   5.640E+03  -2.064E+02  1209.5  1074.0    57.3
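Such a list is straightforward to read into a program. The sketch below assumes a whitespace-separated layout with exactly the eight columns shown in the example; real integration programs write their own formats, so the `Reflection` type and `parse_reflection` helper are hypothetical.

```python
from typing import NamedTuple


class Reflection(NamedTuple):
    h: int
    k: int
    l: int
    intensity: float
    error: float
    x: float        # detector co-ordinate
    y: float        # detector co-ordinate
    z_deg: float    # rotation angle in degrees


def parse_reflection(line: str) -> Reflection:
    """Split one whitespace-separated line into typed fields."""
    f = line.split()
    return Reflection(int(f[0]), int(f[1]), int(f[2]),
                      float(f[3]), float(f[4]),
                      float(f[5]), float(f[6]), float(f[7]))


# The five example lines from the slide.
listing = """\
-3  0 -3  4.162E+03  1.537E+02  1181.5  1235.6  107.4
-3 -3  0  2.747E+03 -1.075E+02  1110.9  1205.1   76.0
-3  0  3  3.946E+03  1.451E+02  1156.2  1233.4   18.3
 1  1 -4  5.933E+03 -2.139E+02  1215.0  1226.7  165.0
 4  1 -1  5.640E+03 -2.064E+02  1209.5  1074.0   57.3
"""

reflections = [parse_reflection(row) for row in listing.splitlines()]
for r in reflections:
    print((r.h, r.k, r.l), r.intensity, r.error)
```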


Data Integration — Flow Chart


Scaling I

Calculation of the electron density is based on an ideal crystal: infinitely large, with a perfect unit cell, and measured with perfect data collection.

This is quite far from reality.

• Different regions of the detector have different sensitivity

• Beam instability: on some frames the total intensity can be higher than on others; this applies especially to synchrotrons

• The crystal is not perfectly centred in the beam

• Data may even be collected from several crystals


Scaling II

The (experimental) differences in intensities necessitate scaling of the data: all reflections must be put on a common scale. To do so, one takes symmetry-related reflections into account: reflections that are related by one of the symmetry operators of the crystal's space group must have equal intensities.

Even in the simplest space group (P1), which has no symmetry, scaling can be carried out because of Friedel's law: reflections with negated indices, i.e., (h, k, l) and (−h, −k, −l), have the same intensity. That is because they are reflected from the same set of planes, but on opposite sides.
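As a rough illustration of putting equivalent reflections on a common scale, the sketch below fits a single least-squares scale factor between two sets of intensities that should be equal. It assumes NumPy; the intensities, the noise values and the function name `relative_scale` are invented for the example, and real scaling programs refine many more parameters than this one factor.

```python
import numpy as np


def relative_scale(reference, other):
    """Least-squares k minimising sum (reference - k * other)^2."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    return np.dot(reference, other) / np.dot(other, other)


# Hypothetical intensities of the same four reflections (or their
# symmetry equivalents) measured on two frames; the second frame was
# recorded with a weaker beam.
frame_a = np.array([4162.0, 2747.0, 3946.0, 5933.0])
frame_b = frame_a / 2.5 + np.array([3.0, -2.0, 1.0, -4.0])   # small "noise"

k = relative_scale(frame_a, frame_b)
print("scale factor for frame b:", round(float(k), 3))
print("rescaled frame b:", np.round(k * frame_b, 1))
```

After multiplication by k, the second frame agrees with the first to within the assumed noise.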


The Phase Problem

$$\rho(x, y, z) = \frac{1}{V_\mathrm{unit\,cell}} \sum_{h,k,l=-\infty}^{\infty} |F(h,k,l)|\, e^{i\phi(h,k,l)}\, e^{-2\pi i(hx+ky+lz)}$$

This sum is needed to calculate the electron density. The diffraction experiment gives |F(h, k, l)|, but not φ(h, k, l).

The structure factor, from which we could calculate the electron density distribution of the crystal, is a complex quantity. It has an amplitude and a phase. Only the amplitude, but not the phase, can be determined directly from a diffraction experiment.

This loss can be compared with a projection onto a flat wall: the eye may see a three-dimensional object, but which face points forward?

This problem is known as the phase problem of crystallography.


The Importance of the Phase

Unfortunately, the phase of the structure factor contains the main information about the shape of the molecule.

[Figure: pictures and their Fourier transforms (FT), split into amplitudes |F(h, k, l)| and phases φ(h, k, l), together with pictures recovered by inverse FT.]

The phase φ of the duck determines the picture.

pictures from http://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html
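The same experiment can be repeated numerically. The sketch below assumes NumPy and uses two random arrays as stand-ins for the pictures (an assumption; any two images of equal size would do). It combines the amplitudes of one with the phases of the other and measures which of the two the result resembles.

```python
import numpy as np

rng = np.random.default_rng(0)
picture_a = rng.standard_normal((64, 64))   # stand-ins for two real images
picture_b = rng.standard_normal((64, 64))

F_a = np.fft.fft2(picture_a)
F_b = np.fft.fft2(picture_b)

# Hybrid: amplitudes of picture a combined with phases of picture b.
hybrid = np.fft.ifft2(np.abs(F_a) * np.exp(1j * np.angle(F_b))).real


def correlation(u, v):
    """Pearson correlation coefficient between two images."""
    return float(np.corrcoef(u.ravel(), v.ravel())[0, 1])


corr_with_amplitude_source = correlation(hybrid, picture_a)
corr_with_phase_source = correlation(hybrid, picture_b)
print("correlation with amplitude donor:", round(corr_with_amplitude_source, 2))
print("correlation with phase donor:    ", round(corr_with_phase_source, 2))
```

The hybrid correlates strongly with the picture that donated the phases and hardly at all with the one that donated the amplitudes.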


Techniques for retrieving the Phases — Overview

One of the major efforts of macromolecular crystallography lies in determining good phases. The following are the most frequently used techniques:

1. direct methods (small molecules and high resolution only)

2. molecular replacement

3. isomorphous replacement

4. anomalous dispersion

5. exploitation of radiation damage


Direct methods

With small molecules (< 1000 unique atoms) and high resolution (better than 1.2 Å), one can manage to find the structure from random starting phases. The starting phases are optimised using the assumption that the structure consists of resolved atoms. This assumption imposes statistical restraints on the phase probability distribution.

Very small structures can also be solved by interpreting the Patterson function. This is a Fourier transform based on intensities rather than structure factors, i.e., it can be calculated directly from experimental data. The Patterson function has the property that the vector from the origin to a peak is also a vector connecting two atoms in the structure.

With too many atoms, the peaks of the Patterson function come too close together to be interpreted.
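The Patterson property can be seen in a one-dimensional toy example. The sketch assumes NumPy and two point atoms on a 64-point grid (both choices are illustrative). NumPy's FFT uses the opposite sign convention to the equations in these slides, which does not matter here because only |F|² enters.

```python
import numpy as np

n = 64                                   # grid points along a 1D unit cell
density = np.zeros(n)
density[[10, 25]] = 1.0                  # two hypothetical point atoms

F = np.fft.fft(density)                  # structure factors (complex)
intensities = np.abs(F) ** 2             # what the experiment provides

# Patterson function: Fourier synthesis with |F|^2 and all phases zero.
patterson = np.fft.ifft(intensities).real

peaks = np.flatnonzero(patterson > 0.5)
print("Patterson peaks at grid vectors:", peaks.tolist())
```

Besides the origin peak at 0, the peaks sit at 15 and 49 (that is, −15 modulo 64), the vectors between the two atoms. The positions 10 and 25 themselves do not appear.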


Molecular Replacement

By November 2004, the PDB, the Protein Data Bank (http://www.rcsb.org/pdb), held more than 28,000 structures, from both X-ray crystallography and NMR. Fewer and fewer newly deposited structures reveal a new fold. Sequence homology between two proteins normally also implies structural similarity, and therefore chances are good that a new structure is similar to an already determined one.

One searches the unit cell with a known structure, or a fragment of one, for the correct orientation and position. The resulting co-ordinates can then be used to calculate first phases for the experimental data. The search is done in two steps:

Rotational search: The Patterson function can be calculated both from the diffraction data and from the search model. It does not depend on the position within the unit cell, but only on the orientation. Hence, we can calculate the Patterson function for the model in different orientations, compare it with the Patterson function of the data, and pick the orientation with the best agreement.

Translational search: The model is moved through the asymmetric unit, keeping the orientation found in the rotational search. At each point, the calculated structure factor amplitudes |Fc| are scored against the experimental data.

Problems: strong model bias (phases!); the search may sometimes fail even with 100% sequence homology (domain movements).
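The translational search can be illustrated with a deliberately simplified one-dimensional toy, assuming NumPy. Everything specific here is an assumption made for the example: the three-atom model, the inversion centre at the origin (added because without any symmetry the amplitudes would not depend on the position at all), and the correlation coefficient used as score. Real programs use more elaborate target functions.

```python
import numpy as np

n = 64                                   # grid points along a 1D cell
model = np.array([0, 3, 7])              # hypothetical search model (grid offsets)


def amplitudes(shift):
    """|F| for the model at `shift` plus its inversion-related copy."""
    rho = np.zeros(n)
    atoms = (model + shift) % n
    np.add.at(rho, atoms, 1.0)           # molecule
    np.add.at(rho, (-atoms) % n, 1.0)    # symmetry mate through the origin
    return np.abs(np.fft.fft(rho))


true_shift = 11
f_obs = amplitudes(true_shift)           # stands in for the measured |Fo|


def score(shift):
    """Correlation between calculated and 'observed' amplitudes."""
    return float(np.corrcoef(amplitudes(shift), f_obs)[0, 1])


# Shifts t and t + n/2 are equivalent origins here, so half the cell suffices.
shifts = np.arange(n // 2)
scores = np.array([score(t) for t in shifts])
best_shift = int(shifts[np.argmax(scores)])
print("best shift:", best_shift, "score:", round(float(scores.max()), 3))
```

Only the amplitudes are compared during the search; the phases come from the placed model afterwards, which is the origin of the model bias mentioned above.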


Isomorphous Replacement

Isomorphous replacement is based on the idea that introducing a small molecule into a protein or nucleic acid crystal does not alter the structure of the macromolecule, or hardly does so. On the other hand, a few heavy metal atoms can contribute detectably to the structure factors and hence introduce changes in the reflection intensities.

Common heavy metals are Hg (80 e−), Pb (82 e−), Au (79 e−), Pt (78 e−), or U (92 e−). They can be incorporated by co-crystallisation or by soaking after the crystals have grown.

The first protein structures, such as myoglobin and hemoglobin, were solved by isomorphous replacement.

G. Sheldrick


Isomorphous Replacement

In order to use the extra information, one needs at least two data sets: a native one (no heavy metal) and a derivative (with heavy metal).

[Figure: flow diagram. The native amplitudes |F_P| and the derivative amplitudes |F_T| give a difference, from which the heavy-atom co-ordinates and hence |F_H|, φ_H are obtained; the Harker construction then yields |F_T|, φ_T.]

The co-ordinates of the heavy metal(s) can be derived via either direct methods or Patterson methods. From the co-ordinates one can calculate the heavy-atom structure factors (amplitude and phase!). The phases for the derivative follow from the Harker construction.


The Harker Construction

With a single derivative, the Harker construction provides phases for the protein structure up to a twofold ambiguity:

1. Draw a circle with radius |F_T|, centred at the origin

2. Draw the vector for the heavy atom, |F_H|, φ_H, starting at the origin

3. From its endpoint, draw a circle with radius |F_P|

The two circles have two points of intersection, from which one reads the two possible phases φ_T for the derivative or (drawing the vector from the endpoint of the heavy-atom vector) the two possible phases φ_P for the native structure.

[Figure: Harker construction with the circles of radius |F_T| and |F_P|, the heavy-atom vector |F_H|, φ_H and the derivative vector |F_T|, φ_T.]

With only one derivative, one speaks of SIR, single isomorphous replacement; with more than one, one speaks of MIR, multiple isomorphous replacement.

MIR removes the ambiguity of SIR. The more derivatives, the better the phases (and their errors) can be determined.
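The circle intersections of the Harker construction can also be computed with the law of cosines. The sketch below uses only the Python standard library and assumes error-free data with F_T = F_P + F_H; the structure factor values and the function names `harker_phases` and `angle_diff` are invented for the example.

```python
import cmath
import math


def harker_phases(f_p, f_t, f_h):
    """Two candidate native phases (radians) from |F_P|, |F_T| and complex F_H.

    Uses F_T = F_P + F_H, i.e.
    |F_T|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P| |F_H| cos(phi_P - phi_H).
    """
    amp_h, phi_h = abs(f_h), cmath.phase(f_h)
    cos_delta = (f_t**2 - f_p**2 - amp_h**2) / (2.0 * f_p * amp_h)
    delta = math.acos(max(-1.0, min(1.0, cos_delta)))   # clamp rounding errors
    return phi_h + delta, phi_h - delta


def angle_diff(a, b):
    """Smallest absolute difference between two angles (radians)."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))


# Hypothetical reflection: true native structure factor and two derivatives.
F_P = cmath.rect(10.0, math.radians(40.0))
F_H1 = cmath.rect(3.0, math.radians(100.0))
F_H2 = cmath.rect(2.0, math.radians(-20.0))

sir = harker_phases(abs(F_P), abs(F_P + F_H1), F_H1)     # SIR: two candidates
second = harker_phases(abs(F_P), abs(F_P + F_H2), F_H2)

# MIR: keep the SIR candidate that the second derivative also supports.
phi_mir = min(sir, key=lambda p: min(angle_diff(p, q) for q in second))
print("SIR candidates [deg]:", [round(math.degrees(p), 1) for p in sir])
print("MIR phase [deg]:", round(math.degrees(phi_mir), 1))
```

With real data the circles do not intersect exactly because of measurement errors, so phases are treated as probability distributions rather than as the sharp values used here.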


Anomalous Dispersion

For a normal diffraction experiment, Friedel's law is valid. It states that the intensities of the reflections (h, k, l) and (−h, −k, −l) are equal and that the phases of the underlying structure factors have opposite signs, φ(h, k, l) = −φ(−h, −k, −l). For heavy atoms, the wavelength of X-rays lies in a region where this is no longer true under all circumstances. This effect is due to absorption by these atoms at specific wavelengths. The wavelength is different for every type of atom and normally has to be determined before data collection by a fluorescence scan (scattering of X-rays at right angles to the incident beam).

The difference in intensities can be exploited by a Harker construction similar to that for isomorphous replacement, but with |F_T| and |F_P| replaced by |F(h, k, l)| and |F(−h, −k, −l)|. With this SAD (single-wavelength anomalous dispersion) approach, the twofold ambiguity of the phases remains.
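The breakdown of Friedel's law can be reproduced with a small calculation, assuming NumPy. The sketch models absorption in the usual way by giving the heavy atom a complex scattering factor, f = f0 + f′ + i f″. The one-dimensional structure and all numerical values are made up for illustration.

```python
import numpy as np

# Hypothetical 1D structure: two light atoms and one anomalous scatterer.
positions = np.array([0.10, 0.35, 0.62])


def structure_factor(h, factors):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) with (possibly complex) f_j."""
    return complex(np.sum(factors * np.exp(2j * np.pi * h * positions)))


normal = np.array([6.0, 8.0, 34.0], dtype=complex)    # real f only
anomalous = normal.copy()
anomalous[2] += -1.0 + 4.0j    # f' and f'' of the heavy atom (made-up values)

h = 3
for label, f in (("normal", normal), ("anomalous", anomalous)):
    plus, minus = structure_factor(h, f), structure_factor(-h, f)
    print(label, "|F(h)| =", round(abs(plus), 3), "|F(-h)| =", round(abs(minus), 3))

friedel_difference = (abs(structure_factor(h, anomalous))
                      - abs(structure_factor(-h, anomalous)))
```

With purely real scattering factors the two amplitudes agree and the phases have opposite signs. The imaginary part f″ makes the amplitudes differ, and that difference is the signal exploited in SAD.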


SIRAS and MAD phasing

To overcome the twofold phase ambiguity, two methods can be applied:

1. SIRAS: Often a native crystal or data set is available when SAD data are collected. This leads to the combination of SIR and SAD, called SIRAS: SIR from the comparison of native and derivative, SAD from the derivative alone.

2. MAD: Instead of changing crystals, one can change the wavelength, because the strength of the anomalous signal varies with the wavelength. This results in multi-wavelength anomalous dispersion, or MAD.


Some “exotic” experimental techniques

RIP: Radiation Induced Phasing makes use of the fact that radiation forms radicals. They damage the molecule: apart from random destruction, carboxyl groups are removed and disulphides destroyed. For RIP, a normal data set is collected ("native"), then the crystal is exposed to a high dose of X-rays, and then a second data set ("derivative") is collected.

Sulphur-SAD: exploitation of the very weak anomalous signal of native S (or P for nucleic acid structures).

Halide soaking: iodide SIRAS or bromide MAD after a quick soak (10–30 s) in ≈ 1 M KI or NaBr.


Example Phases

[Figure panels: (initial) centroid phases; resolved twofold ambiguity; final (refined) phases.]
