CCP4 School Chicago 2013 Experimental Phasing with shelx …...The construct |tee shelxc.log saves...

34
CCP4 @ APS 2013 CCP4 School Chicago 2013 Experimental Phasing with shelx c/d/e Tim Grüne Georg-August-Universität Institut für Strukturchemie [email protected] Tim Grüne shelx c/d/e 1/34

Transcript of CCP4 School Chicago 2013 Experimental Phasing with shelx …...The construct |tee shelxc.log saves...

CCP4 @ APS 2013

CCP4 School Chicago 2013Experimental Phasing with shelx c/d/e

Tim GrüneGeorg-August-UniversitätInstitut für Strukturchemie

http://shelx.uni-ac.gwdg.de

[email protected]

Tim Grüne shelx c/d/e 1/34

CCP4 @ APS 2013

Phasing in the Context of Structure Determination

Data Integration (Anomalous)Differences

SubstructureSolution

Phasing & DensityModification

Building &Refinement

MolecularReplacement

shelxc shelxd s h e l x e . . .

Tim Grüne shelx c/d/e 2/34

CCP4 @ APS 2013

shelx c/d/e

shelxc shelxd shelxe

• Extracts anomalous signal• Prepares native data• Analyses data sets

• Determines substructure • Density modification• Poly-ALA autobuilding

Tim Grüne shelx c/d/e 3/34

CCP4 @ APS 2013

Alternatives

Alternative programs for substructure solution and density modification:

SHELXD • Sharp www.globalphasing.com

• phenix http://www.phenix-online.org/

• bp3 http://www.bfsc.leidenuniv.nl/software/bp3

• SnB http://www.hwi.buffalo.edu/snb

SHELXE • parrot http://www.ysbl.york.ac.uk/˜cowtan/parrot/parrot.html

• buccanneer http://www.ysbl.york.ac.uk/˜cowtan/buccaneer/buccaneer.html

• phenix http://www.phenix-online.org/

Tim Grüne shelx c/d/e 4/34

CCP4 @ APS 2013

Documentation

Most shelx programs issue short usage instruction when called without an argument.tg@slartibartfast:~$ anode

ANODE - ANOmalous DEnsity analysis - version 2013/1===================================================

anode [-b8.0] [-d1.0] [-m20] [-h80] [-r5.0] [-s4.0] [-n99.0] name

Reads PDB format file name.ent (or if not found, name.pdb) and SHELXC outputfile name_fa.hkl, calculates heavy atom map by subtracting SAD, MAD, SIR, RIPor SIRAS phase shift alpha from calculated native phase and does FFT withcoefficients FA. In the case of SAD this is the anomalous map. This map issaved as name.pha and the strongest unique peaks output to name_fa.res, whichmay be used, together with the .hkl files from SHELXC, to test SHELXE.

-r multiplies the maximum h, k and l to set up a suitable FFT grid-d is the resolution at which to truncate the reflection data-b sets a B-value to damp the (noisy) FA data at high resolution-s is height in sigma for peak output (at least one is output)-n is height in sigma for negative peak output-h is maximum number of positive or negative peaks output-a ouputs the anomalous density at all atoms, otherwise it is averaged-m sets the maximum number of atom types for the averaged density table-i alternative indexing for .hkl file, if trigonal -i1, -i2 or -i3-t number of threads for mp version (if not set, one thread per CPU)

tg@slartibartfast:~$

More information including tutorials available at http://shelx.uni-ac.gwdg.de/SHELX/.

Tim Grüne shelx c/d/e 5/34

CCP4 @ APS 2013

“Installation”

1. Register online at http://shelx.uni-ac.gwdg.de/SHELX/

2. Download

3. Save files to directory with the PATH variable (Linux, Mac OS X: /usr/local/bin)

Tim Grüne shelx c/d/e 6/34

CCP4 @ APS 2013

Shelxc Data Preparation: Keywords

shelxc can be used for six different phasing scenarios:

SAD • SAD

• (NAT)

SIRAS • SIRA

• NAT

MAD• (NAT)• PEAK

• INFL

• LREM

• HREMSIR • SIR

• NAT

RIP • BEFORE

• AFTER

• (NAT)

Each keyword takes the filename of the corresponding integrated dataset. shelxc reads

• hkl - format (native shelx)

• scalepack - format

• XDS -format

Tim Grüne shelx c/d/e 7/34

CCP4 @ APS 2013

Shelxc Example Run

shelxc is run from the command line with the syntax ∗

shelxc mymad < my_shelxc.input | tee shelxc.log

mymad sets the project name and determines the filenames used by shelxd and shelxe. It must not containspaces or a period “.”.

The names my shelxc.input and shelxc.log are completely arbitrary.

Example my shelxc.input for MAD experiment:

NAT jia_nat.hkl

HREM jia_hrem.sca

PEAK jia_peak.sca

INFL jia_infl.sca

CELL 96.00 120.00 166.13 90 90 90

SPAG C2221

FIND 8

SFAC SE

∗The construct |tee shelxc.log saves the output from shelxc in the file shelxc.log for later reference.

Tim Grüne shelx c/d/e 8/34

CCP4 @ APS 2013

Shelxc Output Files

“shelxc mymad” creates three files:

mymad fa.ins Text file with instructions for shelxd

mymad fa.hkl Artificial substructure data set from which shelxd determines substructure coordinates. Eachline contains

h, k, l,∣∣∣|F+(hkl)| − |F−(hkl)|

∣∣∣ , αα is not used by shelxd, but by shelxe to calculate an initial phase estimate for the protein structure as

φT (hkl) = φA(hkl) + α(hkl)

α can be calculated for MAD or SIRAS and only roughly estimated for SIR or SAD. φA is the phase anglecalculated from the substructure coordinates determined by shelxd.

mymad.hkl native data used by shelxe for phasing and density modification

Tim Grüne shelx c/d/e 9/34

CCP4 @ APS 2013

Shelxc Statistics

Both shelxd and shelxe automatically create log-files (ending .lst).

shelxc only writes to the terminal it was started from. Therefore it is best redirected (’>’) to a file for laterreference.

The output contains some useful analyses of the input data.

The GUI hkl2map (T. Schneider, http://webapps.embl-hamburg.de/hkl2map) plots graphs of these statistics.

Tim Grüne shelx c/d/e 10/34

CCP4 @ APS 2013

Shelxc: Resolution Cut-off for Anomalous Signal23009 Reflections read from PEAK file gere_peak.sca12000 Unique reflections, highest resolution 2.750 Angstroms

Resl. Inf - 8.0 - 6.0 - 5.0 - 4.2 - 4.0 - 3.8 - 3.6 - 3.4 - 3.2 - 3.0 - 2.75N(data) 499 673 874 1402 561 669 799 1030 1297 1648 2548<I/sig> 29.2 22.1 16.9 20.5 19.1 16.3 13.7 12.6 9.4 7.0 4.7%Complete 92.6 96.0 98.0 98.5 98.9 99.1 97.7 98.9 98.9 98.9 90.2<d"/sig> 3.18 2.77 2.24 1.74 1.61 1.41 1.18 1.26 1.11 0.97 0.98

• CCP4 standard MAD data set ’GERE’

• <d"/sig> Anomalous signal for peak wavelength

• <d"/sig> > 1.3 useful anomalous signal

Tim Grüne shelx c/d/e 11/34

CCP4 @ APS 2013

Shelxc: Data set fit by Correlation CoefficientCorrelation coefficients (%) between signed anomalous differencesResl. Inf - 8.0 - 6.0 - 5.0 - 4.2 - 4.0 - 3.8 - 3.6 - 3.4 - 3.2 - 3.0 - 2.75LREM/HREM 89.7 81.4 67.2 59.4 43.3 49.7 36.7 28.0 26.2 19.8 16.4LREM/PEAK 96.3 91.8 86.6 79.7 73.2 72.8 58.7 56.0 46.1 40.8 35.4LREM/INFL 93.0 86.7 79.9 72.3 62.5 60.9 41.2 41.3 31.3 35.3 27.6HREM/PEAK 89.5 80.0 68.2 58.5 51.9 50.0 44.5 34.6 24.3 23.2 17.2HREM/INFL 87.3 75.8 64.2 54.5 40.7 40.4 34.9 21.8 16.1 17.7 19.2PEAK/INFL 95.4 91.1 84.1 77.8 67.0 68.4 58.0 52.8 38.1 26.3 32.0

0

20

40

60

80

100

2 3 4 5 6 7 8 9

CC

Resolution [Å]

peak/infllrem/peak

lrem/infllrem/hremhrem/peak

hrem/infl

CC > 30% usually more reliable than <d"/sig> > 1.3

Tim Grüne shelx c/d/e 12/34

CCP4 @ APS 2013

Shelxd — Finding the Substructure

shelxd mymad_fa

reads the “substructure data” mymad fa.hkl and its instructions from mymad fa.ins. The most important entriesin mymad fa.ins:

SFAC SE atom type to look forFIND 12 expected number of substructure atoms, should be within 20 % of the actual

number (try several for e.g. a soak where the number is not known)SHEL 999 3.3 resolution limits of the anomalous signal, not the original data. High re-

solution limit can be critical, but the default of dmax + 0.5 Å works well in normalcases.

NTRY 50000 number of trials. Since shelxd starts from random atom positions, a lowquality data set may require a large number of trials before a solution is found. shelxdscales about linearly with the number of CPUs, so NTRY 50000 may finish within a fewhours on a modern PC.

ESEL 1.5 Minimum |E|-value used in peak picking during dual-space recycling. At lowresolution, varying this number (between 1.0 and 2.0, say) is worth trying.

Tim Grüne shelx c/d/e 13/34

CCP4 @ APS 2013

Shelxd Output

shelxd automatically writes a logfile mymad fa.lst.

While shelxd runs, the best solution is written to mymad fa.res which contains the substructure coordinates infractional coordinates and which is later read by shelxe.

REM Best SHELXD solution: CC 60.74 CC(weak) 49.22 CFOM 109.96TITL mymad_fa.ins MAD in C2CELL 0.98000 109.02 61.75 71.74 90.00 97.08 90.00LATT -7SYMM -X, Y, -ZSFAC SEUNIT 192SE01 1 0.758774 0.508636 0.246391 1.0000 0.2SE02 1 0.792908 0.398262 0.138903 0.8845 0.2

[...]SE10 1 0.925819 0.231575 0.191291 0.5569 0.2SE11 1 0.495239 0.183609 0.416278 0.5352 0.2SE12 1 0.643097 0.029221 0.210653 0.4897 0.2 <---SE13 1 0.811539 0.048553 0.227752 0.1453 0.2 <---SE14 1 0.600281 0.156860 0.149628 0.0764 0.2HKLF 3END

The sixth column contains the occupancy of thecorresponding atom. A sharp drop (here betweenSE12 and SE13) is a promising sign of a correctsolution.The correlation coefficient (CC and CCweak) inthe first line measures the reliability of the solution

For SAD, a CC of more than 30 % is a safe sign of a correct solution, for MAD the limit is about 40 %.

Tim Grüne shelx c/d/e 14/34

CCP4 @ APS 2013

Shelxe: Phasing, Density Modification, Model Building

• shelxe w/o command line arguments: one page usage instruction for various scenarios including MR-SADand complete list of option.

• no .ins-tructions file. All parameters provided as command line options after data file names.

Tim Grüne shelx c/d/e 15/34

CCP4 @ APS 2013

Shelxe

A typical and one of the most simple command line could be

shelxe mymad mymad_fa -s0.65 -h -a

mymad read native data mymad.hkl

mymad fa read angle estimate for α from mymad fa.hkl, substructure coordinates from mymad fa.res (theshelxd output)

-s0.65 Assume a solvent content of 65%. The solvent content is one of the most critical parameters for shelxeand it is worth testing various settings

-h substructure atoms present in native data mymad.hkl

-a run 5 (default) cycles of poly-ALA autotracing.

Tim Grüne shelx c/d/e 16/34

CCP4 @ APS 2013

Shelxe -i: Inverted Substructure

It is impossible to distinguish the substructure from its enantiomer with the anomalous data and there is a 50 %chance that the coordinates in mymad fa.res are inverted w.r.t. the correct substructure.

Therefore shelxe must always be run twice

• with the direct hand• with the inverted hand, i.e. with the same options as the direct hand plus the switch-i. This inverts the hand and takes care of everything necessary (e.g. inversion ofscrew axes, P41 to P43). The output files are amended by i to distinguish the tworuns.

N.B. if the inverted hand turns out to be the correct hand, your space group may change - e.g. in the presenceof screw axes. Keep this in mind when you convert your native data to e.g. mtz-format!

Tim Grüne shelx c/d/e 17/34

CCP4 @ APS 2013

Shelxe: The Correct Hand

Criteria to distinguish correct from wrong hand:

1. Correct hand shows better Contrast, especially at early cycles of density modification.

2. Correct hand has higher map correlation coefficient throughout resolution range:

d inf - 4.66 - 3.70 - 3.23 - 2.93 - 2.72 - 2.56 - 2.43 - 2.33 - 2.24 - 2.15<mapCC> 0.626 0.795 0.775 0.754 0.819 0.804 0.756 0.694 0.620 0.582 direct<mapCC> 0.810 0.877 0.845 0.844 0.874 0.856 0.840 0.830 0.839 0.809 inverse

3. A reasonable poly-ALA trace with at least 10 residues per chain and a CC > 25%

When using the auto-tracing option (-a) in shelxe, the first two figures (contrast/ mapCC) become rather mea-ningless, but in this case the poly-ALA trace tells unambiguously the correct hand.

Tim Grüne shelx c/d/e 18/34

CCP4 @ APS 2013

Solvent Content

• With autobuilding (-a): correct solvent content less importat

• Without autobuilding (e.g. RNA or DNA only): solvent content (-s) one of the most critical parameters

– Average volume per amino acid: 140 Å3

– Average volume per nucleic acid: 380 Å3

– or use matthews coeff in CCP4i

• Take disordered regions into account: increase solvent content

Tim Grüne shelx c/d/e 19/34

CCP4 @ APS 2013

Caveat: Substructure Resolution

• “Normal” macromolecular structure: Determine atom positions even at e.g. 5 Å resolution because of res-traints.

• Substructure unrestrained

⇒ coordinates only know within resolution of anomalous signal, often much worse than 3 Å

Way out:

1. e.g. Sharp improves substructure coordinates before density modification

2. “substructure recycling” with shelxe

Tim Grüne shelx c/d/e 20/34

CCP4 @ APS 2013

Shelxe: Substructure Recycling

mymad_fa.res

input

mymad.hat

renam

e

shelxd

shelxe

finds

impro

ves

• Better substructure = better φA = better φT = better map• Caveat: If the inverted structure turns out to be the correct

hand (i.e. mymad i.hat from the -i-run of shelxe), the secondrun of shelxe must be run without the -i switch:

Tim Grüne shelx c/d/e 21/34

CCP4 @ APS 2013

Shelxe: Structure Solved?

Coot reads mymad.pdb (poly-ALA trace) and mymad.phs (map).

Elastase SAD tutorial

Direct hand Inverse hand

Tim Grüne shelx c/d/e 22/34

CCP4 @ APS 2013

Shelxe: Structure Solved?

In more difficult cases, looking at the maps and model does not clearly distinguish between correct and falsesolution, e.g.

• Small molecule, large number of NCS copies

• Poor resolution, large complex

Indicators from shelxe:

TITLE elastase.pdb Cycle 1 CC = 4.28% 48 residues in 6 chains

TITLE elastase_i.pdb Cycle 3 CC = 41.28% 225 residues in 6 chains

1. CC>25%

2. average chain length > 10

3. jump in CC over many cycles (e.g. with -a50)

Tim Grüne shelx c/d/e 23/34

CCP4 @ APS 2013

Shelxe: MR-SAD [5]

MR-SAD: Combination of

• poor MR solution

• weak anomalous phase information

often neither of the two alone are sufficient for structure solution

Tim Grüne shelx c/d/e 24/34

CCP4 @ APS 2013

Shelxe: MR-SAD howto

• MR-Solution: phaser-test 1.1.pdb

• Peak Dataset from mosflm/scala: ctruncate 1234.mtz

#> cp phaser-test_1.1.pdb mrsad_test.pda

#> mtz2sca ctruncate_1234.mtz -o shelxc_input.sca

#> shelxc mrsad_test < myshelxc_SAD.inp > mrsad_test_shelxc.log

→ Creates mrsad test fa.hkl containing angle α

#> shelxe mrsad_test.pda mrsad_test_fa -s0.67 -a -h -o -z

• -o optimize CC of myproject.pda

• -z optimize (CC of) substructure

Tim Grüne shelx c/d/e 25/34

CCP4 @ APS 2013

Shelxe: MR-SAD

• Phases from model only used to find substructure in anomalous data

• Phases improved from substructure coordinates and data

• Building of poly-ALA trace with map from φT = φA(substructure coordinates) + α(anomalous signal).

poly-ALA trace free of model bias from MR-solution

Tim Grüne shelx c/d/e 26/34

CCP4 @ APS 2013

Shelxe: Fragment Extension

• shelxe designed to work with very poor starting phases

• Works to extend a small fragment even with native data only

• Source of phases not necessarily anomalous data

⇒ Conclusion: Combine Phaser and shelxe

Tim Grüne shelx c/d/e 27/34

CCP4 @ APS 2013

Shelxe: Fragment Extension

• Small search fragments used with phaser fora 1.7 Å data set.• Phaser TFZ-score and shelxe-CC do not cor-

relate.• Correct solutions marked by CC > 25 %

Graph courtesy A. Thorn.

Tim Grüne shelx c/d/e 28/34

CCP4 @ APS 2013

Arcimboldo — ab inito phasing at 2 Å

The Computer program Arcimboldo [4] pushes this to extremes:

Phaser:

store many putative

solutions

Check results

(keep CC>25%)

Shelxe:

extension for each

solution

Search motif:

fragment

short helical

• Arcimboldo requires only native data• Arcimboldo works to about 2 Å resolution.

Tim Grüne shelx c/d/e 29/34

CCP4 @ APS 2013

AnodeA. Thorn and G. M. Sheldrick, “ANODE: anomalous and heavy-atom density calculation”, J. Appl. Cryst. (2011). 44, 1285-1287

Anomalous Difference Map optimised for working with shelx c/d/e phasing.

• “post-mortem” analysis of anomalous scatterers: positions and signal strength

• cross-confirmation for MR:

1. anomalous data not sufficient for data solution

2. MR solution available, but e.g. weak or risk of wrong solution

• Distinction between unknown metal types (anomalous vs. non-anomalous)

Tim Grüne shelx c/d/e 30/34

CCP4 @ APS 2013

Using Anode

Input :

1. my-project.pdb (putative) MR-solution; source for φT

2. my-project fa.hkl anomalous data prepared by shelxc; source for α

#> anode my-project

Output :

1. my-project.pha Anomalous difference map calculated from |FA| and φA = φT − α

2. my-project fa.res Peaks from anomalous map

3. my-project.lsa Logfile listing atoms in model PDB closest to anomalous peaks.

Tim Grüne shelx c/d/e 31/34

CCP4 @ APS 2013

Anode Examples

Semi-refined MR solution with

R/Rfree = 37%/46%

locates Lanthanide Cluster from MAD experiment.• yellow cross: shelxd MAD• black cross: anode

Sequence confirmation: anomalous peak from S

near Cystein.

Tim Grüne shelx c/d/e 32/34

CCP4 @ APS 2013

Anode Applications

• Unbiased confirmation of substructure solution

• Unbiased confirmation of MR solution

• Sequence register errors

Tim Grüne shelx c/d/e 33/34

CCP4 @ APS 2013

References

1. G. M. Sheldrick, A short history of SHELX, Acta Crystallogr. (2008), A642. G. M. Sheldrick, Experimental phasing with SHELXC/D/E: combining chain tracing with density modi-

fication, Acta Crystallogr. (2010), D663. P. D. Adams et al., PHENIX: a comprehensive Python-based system for macromolecular structure

solution, Acta Crystallogr. (2010) D66, 213–2214. Rodríguez, D. D. et al., Crystallographic ab initio protein structure solution below atomic resolution,

Nature Methods (2009), volume 6(9); http://chango.ibmb.csic.es/ARCIMBOLDO5. S. Pnajikar et al., On the combination of molecular replacement and single-wavelength anomalous

diffraction phasing for automated structure determination,

Tim Grüne shelx c/d/e 34/34