CCP4 School Chicago 2013 Experimental Phasing with shelx …...The construct |tee shelxc.log saves...
Transcript of CCP4 School Chicago 2013 Experimental Phasing with shelx …...The construct |tee shelxc.log saves...
CCP4 @ APS 2013
CCP4 School Chicago 2013Experimental Phasing with shelx c/d/e
Tim GrüneGeorg-August-UniversitätInstitut für Strukturchemie
http://shelx.uni-ac.gwdg.de
Tim Grüne shelx c/d/e 1/34
CCP4 @ APS 2013
Phasing in the Context of Structure Determination
Data Integration (Anomalous)Differences
SubstructureSolution
Phasing & DensityModification
Building &Refinement
MolecularReplacement
shelxc shelxd s h e l x e . . .
Tim Grüne shelx c/d/e 2/34
CCP4 @ APS 2013
shelx c/d/e
shelxc shelxd shelxe
• Extracts anomalous signal• Prepares native data• Analyses data sets
• Determines substructure • Density modification• Poly-ALA autobuilding
Tim Grüne shelx c/d/e 3/34
CCP4 @ APS 2013
Alternatives
Alternative programs for substructure solution and density modification:
SHELXD • Sharp www.globalphasing.com
• phenix http://www.phenix-online.org/
• bp3 http://www.bfsc.leidenuniv.nl/software/bp3
• SnB http://www.hwi.buffalo.edu/snb
SHELXE • parrot http://www.ysbl.york.ac.uk/˜cowtan/parrot/parrot.html
• buccanneer http://www.ysbl.york.ac.uk/˜cowtan/buccaneer/buccaneer.html
• phenix http://www.phenix-online.org/
Tim Grüne shelx c/d/e 4/34
CCP4 @ APS 2013
Documentation
Most shelx programs issue short usage instruction when called without an argument.tg@slartibartfast:~$ anode
ANODE - ANOmalous DEnsity analysis - version 2013/1===================================================
anode [-b8.0] [-d1.0] [-m20] [-h80] [-r5.0] [-s4.0] [-n99.0] name
Reads PDB format file name.ent (or if not found, name.pdb) and SHELXC outputfile name_fa.hkl, calculates heavy atom map by subtracting SAD, MAD, SIR, RIPor SIRAS phase shift alpha from calculated native phase and does FFT withcoefficients FA. In the case of SAD this is the anomalous map. This map issaved as name.pha and the strongest unique peaks output to name_fa.res, whichmay be used, together with the .hkl files from SHELXC, to test SHELXE.
-r multiplies the maximum h, k and l to set up a suitable FFT grid-d is the resolution at which to truncate the reflection data-b sets a B-value to damp the (noisy) FA data at high resolution-s is height in sigma for peak output (at least one is output)-n is height in sigma for negative peak output-h is maximum number of positive or negative peaks output-a ouputs the anomalous density at all atoms, otherwise it is averaged-m sets the maximum number of atom types for the averaged density table-i alternative indexing for .hkl file, if trigonal -i1, -i2 or -i3-t number of threads for mp version (if not set, one thread per CPU)
tg@slartibartfast:~$
More information including tutorials available at http://shelx.uni-ac.gwdg.de/SHELX/.
Tim Grüne shelx c/d/e 5/34
CCP4 @ APS 2013
“Installation”
1. Register online at http://shelx.uni-ac.gwdg.de/SHELX/
2. Download
3. Save files to directory with the PATH variable (Linux, Mac OS X: /usr/local/bin)
Tim Grüne shelx c/d/e 6/34
CCP4 @ APS 2013
Shelxc Data Preparation: Keywords
shelxc can be used for six different phasing scenarios:
SAD • SAD
• (NAT)
SIRAS • SIRA
• NAT
MAD• (NAT)• PEAK
• INFL
• LREM
• HREMSIR • SIR
• NAT
RIP • BEFORE
• AFTER
• (NAT)
Each keyword takes the filename of the corresponding integrated dataset. shelxc reads
• hkl - format (native shelx)
• scalepack - format
• XDS -format
Tim Grüne shelx c/d/e 7/34
CCP4 @ APS 2013
Shelxc Example Run
shelxc is run from the command line with the syntax ∗
shelxc mymad < my_shelxc.input | tee shelxc.log
mymad sets the project name and determines the filenames used by shelxd and shelxe. It must not containspaces or a period “.”.
The names my shelxc.input and shelxc.log are completely arbitrary.
Example my shelxc.input for MAD experiment:
NAT jia_nat.hkl
HREM jia_hrem.sca
PEAK jia_peak.sca
INFL jia_infl.sca
CELL 96.00 120.00 166.13 90 90 90
SPAG C2221
FIND 8
SFAC SE
∗The construct |tee shelxc.log saves the output from shelxc in the file shelxc.log for later reference.
Tim Grüne shelx c/d/e 8/34
CCP4 @ APS 2013
Shelxc Output Files
“shelxc mymad” creates three files:
mymad fa.ins Text file with instructions for shelxd
mymad fa.hkl Artificial substructure data set from which shelxd determines substructure coordinates. Eachline contains
h, k, l,∣∣∣|F+(hkl)| − |F−(hkl)|
∣∣∣ , αα is not used by shelxd, but by shelxe to calculate an initial phase estimate for the protein structure as
φT (hkl) = φA(hkl) + α(hkl)
α can be calculated for MAD or SIRAS and only roughly estimated for SIR or SAD. φA is the phase anglecalculated from the substructure coordinates determined by shelxd.
mymad.hkl native data used by shelxe for phasing and density modification
Tim Grüne shelx c/d/e 9/34
CCP4 @ APS 2013
Shelxc Statistics
Both shelxd and shelxe automatically create log-files (ending .lst).
shelxc only writes to the terminal it was started from. Therefore it is best redirected (’>’) to a file for laterreference.
The output contains some useful analyses of the input data.
The GUI hkl2map (T. Schneider, http://webapps.embl-hamburg.de/hkl2map) plots graphs of these statistics.
Tim Grüne shelx c/d/e 10/34
CCP4 @ APS 2013
Shelxc: Resolution Cut-off for Anomalous Signal23009 Reflections read from PEAK file gere_peak.sca12000 Unique reflections, highest resolution 2.750 Angstroms
Resl. Inf - 8.0 - 6.0 - 5.0 - 4.2 - 4.0 - 3.8 - 3.6 - 3.4 - 3.2 - 3.0 - 2.75N(data) 499 673 874 1402 561 669 799 1030 1297 1648 2548<I/sig> 29.2 22.1 16.9 20.5 19.1 16.3 13.7 12.6 9.4 7.0 4.7%Complete 92.6 96.0 98.0 98.5 98.9 99.1 97.7 98.9 98.9 98.9 90.2<d"/sig> 3.18 2.77 2.24 1.74 1.61 1.41 1.18 1.26 1.11 0.97 0.98
• CCP4 standard MAD data set ’GERE’
• <d"/sig> Anomalous signal for peak wavelength
• <d"/sig> > 1.3 useful anomalous signal
Tim Grüne shelx c/d/e 11/34
CCP4 @ APS 2013
Shelxc: Data set fit by Correlation CoefficientCorrelation coefficients (%) between signed anomalous differencesResl. Inf - 8.0 - 6.0 - 5.0 - 4.2 - 4.0 - 3.8 - 3.6 - 3.4 - 3.2 - 3.0 - 2.75LREM/HREM 89.7 81.4 67.2 59.4 43.3 49.7 36.7 28.0 26.2 19.8 16.4LREM/PEAK 96.3 91.8 86.6 79.7 73.2 72.8 58.7 56.0 46.1 40.8 35.4LREM/INFL 93.0 86.7 79.9 72.3 62.5 60.9 41.2 41.3 31.3 35.3 27.6HREM/PEAK 89.5 80.0 68.2 58.5 51.9 50.0 44.5 34.6 24.3 23.2 17.2HREM/INFL 87.3 75.8 64.2 54.5 40.7 40.4 34.9 21.8 16.1 17.7 19.2PEAK/INFL 95.4 91.1 84.1 77.8 67.0 68.4 58.0 52.8 38.1 26.3 32.0
0
20
40
60
80
100
2 3 4 5 6 7 8 9
CC
Resolution [Å]
peak/infllrem/peak
lrem/infllrem/hremhrem/peak
hrem/infl
CC > 30% usually more reliable than <d"/sig> > 1.3
Tim Grüne shelx c/d/e 12/34
CCP4 @ APS 2013
Shelxd — Finding the Substructure
shelxd mymad_fa
reads the “substructure data” mymad fa.hkl and its instructions from mymad fa.ins. The most important entriesin mymad fa.ins:
SFAC SE atom type to look forFIND 12 expected number of substructure atoms, should be within 20 % of the actual
number (try several for e.g. a soak where the number is not known)SHEL 999 3.3 resolution limits of the anomalous signal, not the original data. High re-
solution limit can be critical, but the default of dmax + 0.5 Å works well in normalcases.
NTRY 50000 number of trials. Since shelxd starts from random atom positions, a lowquality data set may require a large number of trials before a solution is found. shelxdscales about linearly with the number of CPUs, so NTRY 50000 may finish within a fewhours on a modern PC.
ESEL 1.5 Minimum |E|-value used in peak picking during dual-space recycling. At lowresolution, varying this number (between 1.0 and 2.0, say) is worth trying.
Tim Grüne shelx c/d/e 13/34
CCP4 @ APS 2013
Shelxd Output
shelxd automatically writes a logfile mymad fa.lst.
While shelxd runs, the best solution is written to mymad fa.res which contains the substructure coordinates infractional coordinates and which is later read by shelxe.
REM Best SHELXD solution: CC 60.74 CC(weak) 49.22 CFOM 109.96TITL mymad_fa.ins MAD in C2CELL 0.98000 109.02 61.75 71.74 90.00 97.08 90.00LATT -7SYMM -X, Y, -ZSFAC SEUNIT 192SE01 1 0.758774 0.508636 0.246391 1.0000 0.2SE02 1 0.792908 0.398262 0.138903 0.8845 0.2
[...]SE10 1 0.925819 0.231575 0.191291 0.5569 0.2SE11 1 0.495239 0.183609 0.416278 0.5352 0.2SE12 1 0.643097 0.029221 0.210653 0.4897 0.2 <---SE13 1 0.811539 0.048553 0.227752 0.1453 0.2 <---SE14 1 0.600281 0.156860 0.149628 0.0764 0.2HKLF 3END
The sixth column contains the occupancy of thecorresponding atom. A sharp drop (here betweenSE12 and SE13) is a promising sign of a correctsolution.The correlation coefficient (CC and CCweak) inthe first line measures the reliability of the solution
For SAD, a CC of more than 30 % is a safe sign of a correct solution, for MAD the limit is about 40 %.
Tim Grüne shelx c/d/e 14/34
CCP4 @ APS 2013
Shelxe: Phasing, Density Modification, Model Building
• shelxe w/o command line arguments: one page usage instruction for various scenarios including MR-SADand complete list of option.
• no .ins-tructions file. All parameters provided as command line options after data file names.
Tim Grüne shelx c/d/e 15/34
CCP4 @ APS 2013
Shelxe
A typical and one of the most simple command line could be
shelxe mymad mymad_fa -s0.65 -h -a
mymad read native data mymad.hkl
mymad fa read angle estimate for α from mymad fa.hkl, substructure coordinates from mymad fa.res (theshelxd output)
-s0.65 Assume a solvent content of 65%. The solvent content is one of the most critical parameters for shelxeand it is worth testing various settings
-h substructure atoms present in native data mymad.hkl
-a run 5 (default) cycles of poly-ALA autotracing.
Tim Grüne shelx c/d/e 16/34
CCP4 @ APS 2013
Shelxe -i: Inverted Substructure
It is impossible to distinguish the substructure from its enantiomer with the anomalous data and there is a 50 %chance that the coordinates in mymad fa.res are inverted w.r.t. the correct substructure.
Therefore shelxe must always be run twice
• with the direct hand• with the inverted hand, i.e. with the same options as the direct hand plus the switch-i. This inverts the hand and takes care of everything necessary (e.g. inversion ofscrew axes, P41 to P43). The output files are amended by i to distinguish the tworuns.
N.B. if the inverted hand turns out to be the correct hand, your space group may change - e.g. in the presenceof screw axes. Keep this in mind when you convert your native data to e.g. mtz-format!
Tim Grüne shelx c/d/e 17/34
CCP4 @ APS 2013
Shelxe: The Correct Hand
Criteria to distinguish correct from wrong hand:
1. Correct hand shows better Contrast, especially at early cycles of density modification.
2. Correct hand has higher map correlation coefficient throughout resolution range:
d inf - 4.66 - 3.70 - 3.23 - 2.93 - 2.72 - 2.56 - 2.43 - 2.33 - 2.24 - 2.15<mapCC> 0.626 0.795 0.775 0.754 0.819 0.804 0.756 0.694 0.620 0.582 direct<mapCC> 0.810 0.877 0.845 0.844 0.874 0.856 0.840 0.830 0.839 0.809 inverse
3. A reasonable poly-ALA trace with at least 10 residues per chain and a CC > 25%
When using the auto-tracing option (-a) in shelxe, the first two figures (contrast/ mapCC) become rather mea-ningless, but in this case the poly-ALA trace tells unambiguously the correct hand.
Tim Grüne shelx c/d/e 18/34
CCP4 @ APS 2013
Solvent Content
• With autobuilding (-a): correct solvent content less importat
• Without autobuilding (e.g. RNA or DNA only): solvent content (-s) one of the most critical parameters
– Average volume per amino acid: 140 Å3
– Average volume per nucleic acid: 380 Å3
– or use matthews coeff in CCP4i
• Take disordered regions into account: increase solvent content
Tim Grüne shelx c/d/e 19/34
CCP4 @ APS 2013
Caveat: Substructure Resolution
• “Normal” macromolecular structure: Determine atom positions even at e.g. 5 Å resolution because of res-traints.
• Substructure unrestrained
⇒ coordinates only know within resolution of anomalous signal, often much worse than 3 Å
Way out:
1. e.g. Sharp improves substructure coordinates before density modification
2. “substructure recycling” with shelxe
Tim Grüne shelx c/d/e 20/34
CCP4 @ APS 2013
Shelxe: Substructure Recycling
mymad_fa.res
input
mymad.hat
renam
e
shelxd
shelxe
finds
impro
ves
• Better substructure = better φA = better φT = better map• Caveat: If the inverted structure turns out to be the correct
hand (i.e. mymad i.hat from the -i-run of shelxe), the secondrun of shelxe must be run without the -i switch:
Tim Grüne shelx c/d/e 21/34
CCP4 @ APS 2013
Shelxe: Structure Solved?
Coot reads mymad.pdb (poly-ALA trace) and mymad.phs (map).
Elastase SAD tutorial
Direct hand Inverse hand
Tim Grüne shelx c/d/e 22/34
CCP4 @ APS 2013
Shelxe: Structure Solved?
In more difficult cases, looking at the maps and model does not clearly distinguish between correct and falsesolution, e.g.
• Small molecule, large number of NCS copies
• Poor resolution, large complex
Indicators from shelxe:
TITLE elastase.pdb Cycle 1 CC = 4.28% 48 residues in 6 chains
TITLE elastase_i.pdb Cycle 3 CC = 41.28% 225 residues in 6 chains
1. CC>25%
2. average chain length > 10
3. jump in CC over many cycles (e.g. with -a50)
Tim Grüne shelx c/d/e 23/34
CCP4 @ APS 2013
Shelxe: MR-SAD [5]
MR-SAD: Combination of
• poor MR solution
• weak anomalous phase information
often neither of the two alone are sufficient for structure solution
Tim Grüne shelx c/d/e 24/34
CCP4 @ APS 2013
Shelxe: MR-SAD howto
• MR-Solution: phaser-test 1.1.pdb
• Peak Dataset from mosflm/scala: ctruncate 1234.mtz
#> cp phaser-test_1.1.pdb mrsad_test.pda
#> mtz2sca ctruncate_1234.mtz -o shelxc_input.sca
#> shelxc mrsad_test < myshelxc_SAD.inp > mrsad_test_shelxc.log
→ Creates mrsad test fa.hkl containing angle α
#> shelxe mrsad_test.pda mrsad_test_fa -s0.67 -a -h -o -z
• -o optimize CC of myproject.pda
• -z optimize (CC of) substructure
Tim Grüne shelx c/d/e 25/34
CCP4 @ APS 2013
Shelxe: MR-SAD
• Phases from model only used to find substructure in anomalous data
• Phases improved from substructure coordinates and data
• Building of poly-ALA trace with map from φT = φA(substructure coordinates) + α(anomalous signal).
poly-ALA trace free of model bias from MR-solution
Tim Grüne shelx c/d/e 26/34
CCP4 @ APS 2013
Shelxe: Fragment Extension
• shelxe designed to work with very poor starting phases
• Works to extend a small fragment even with native data only
• Source of phases not necessarily anomalous data
⇒ Conclusion: Combine Phaser and shelxe
Tim Grüne shelx c/d/e 27/34
CCP4 @ APS 2013
Shelxe: Fragment Extension
• Small search fragments used with phaser fora 1.7 Å data set.• Phaser TFZ-score and shelxe-CC do not cor-
relate.• Correct solutions marked by CC > 25 %
Graph courtesy A. Thorn.
Tim Grüne shelx c/d/e 28/34
CCP4 @ APS 2013
Arcimboldo — ab inito phasing at 2 Å
The Computer program Arcimboldo [4] pushes this to extremes:
Phaser:
store many putative
solutions
Check results
(keep CC>25%)
Shelxe:
extension for each
solution
Search motif:
fragment
short helical
• Arcimboldo requires only native data• Arcimboldo works to about 2 Å resolution.
Tim Grüne shelx c/d/e 29/34
CCP4 @ APS 2013
AnodeA. Thorn and G. M. Sheldrick, “ANODE: anomalous and heavy-atom density calculation”, J. Appl. Cryst. (2011). 44, 1285-1287
Anomalous Difference Map optimised for working with shelx c/d/e phasing.
• “post-mortem” analysis of anomalous scatterers: positions and signal strength
• cross-confirmation for MR:
1. anomalous data not sufficient for data solution
2. MR solution available, but e.g. weak or risk of wrong solution
• Distinction between unknown metal types (anomalous vs. non-anomalous)
Tim Grüne shelx c/d/e 30/34
CCP4 @ APS 2013
Using Anode
Input :
1. my-project.pdb (putative) MR-solution; source for φT
2. my-project fa.hkl anomalous data prepared by shelxc; source for α
#> anode my-project
Output :
1. my-project.pha Anomalous difference map calculated from |FA| and φA = φT − α
2. my-project fa.res Peaks from anomalous map
3. my-project.lsa Logfile listing atoms in model PDB closest to anomalous peaks.
Tim Grüne shelx c/d/e 31/34
CCP4 @ APS 2013
Anode Examples
Semi-refined MR solution with
R/Rfree = 37%/46%
locates Lanthanide Cluster from MAD experiment.• yellow cross: shelxd MAD• black cross: anode
Sequence confirmation: anomalous peak from S
near Cystein.
Tim Grüne shelx c/d/e 32/34
CCP4 @ APS 2013
Anode Applications
• Unbiased confirmation of substructure solution
• Unbiased confirmation of MR solution
• Sequence register errors
Tim Grüne shelx c/d/e 33/34
CCP4 @ APS 2013
References
1. G. M. Sheldrick, A short history of SHELX, Acta Crystallogr. (2008), A642. G. M. Sheldrick, Experimental phasing with SHELXC/D/E: combining chain tracing with density modi-
fication, Acta Crystallogr. (2010), D663. P. D. Adams et al., PHENIX: a comprehensive Python-based system for macromolecular structure
solution, Acta Crystallogr. (2010) D66, 213–2214. Rodríguez, D. D. et al., Crystallographic ab initio protein structure solution below atomic resolution,
Nature Methods (2009), volume 6(9); http://chango.ibmb.csic.es/ARCIMBOLDO5. S. Pnajikar et al., On the combination of molecular replacement and single-wavelength anomalous
diffraction phasing for automated structure determination,
Tim Grüne shelx c/d/e 34/34