GGNB Macromolecular Crystallography I Practical

shelx.uni-ac.gwdg.de/~tg/teaching/molbio/2011/pdfs/MolBio2011_ver1.pdf

Monday
Introduction
PART 1: CRYSTALLIZATION AND MEASUREMENT
- Lysozyme crystallization
- Fishing and data collection
- Linux introduction

Tuesday
- Explanation of synchrotrons
PART 2: PREDICTION, INTEGRATION AND PHASING
- Integration with HKL2000, data conversion with CCP4i
- Secondary structure and its prediction
- Structure prediction by homology modelling
- How to use the PDB

Wednesday
- Molecular Replacement: Model preparation and PHASER
- 3D viewing and COOT introduction
- Interpreting PHASER density in COOT
PART 3: MODEL BUILDING AND REFINEMENT
- First steps in model building with COOT
- The Ramachandran plot as quality indicator
- Refinement with REFMAC

Thursday
- Iterative model building, and how to include other molecules in the model
- The meaning of resolution
- B factors: Modelling the atomic displacement
- ARP/wARP: Usage and quality

Friday
PART 4: VALIDATION AND REPORTING THE STRUCTURE
- Using MOLPROBITY to evaluate your refined structure
- Using PYMOL to generate pictures and for 3D viewing
- Secondary structure assignment
- CIF file downloading and conversion
- Electron density server
- Evaluation

The timetable will be adjusted to your needs and special interests. Each day ends with a group revision. There will be a coffee break each day. Additional tasks for fast and more advanced students are available. (Just ask!) Free time should be spent answering the questions for the cake.


Parts of this script are based on:
- Petsko & Ringe, Protein Structure and Function
- PHASER documentation
- COOT documentation

Copyright (c) 2010 Andrea Thorn. The crystal images are from the Hampton Research home page. Amino acid pictures are taken with permission from Petsko & Ringe. For the integration section: (c) 2010 Tim Grüne


100 Points for the cake

The students need at least 100 points to get a cake on Friday afternoon. Unless stated otherwise, only one guess per question is allowed. Discuss your answers in the group before guessing! All methods of finding solutions are allowed, except asking former participants.

Any time:

Grow a lysozyme crystal bigger than 0.5 mm across (2 points).

What is the typical order of magnitude of the distance between atoms (1 point)?

What does "mean I/sigma" mean (1 point)?

What is the advantage of powder diffraction (3 points)?

Put correctly into the table: long-range electrostatic interaction, Van der Waals interaction, covalent bond, disulfide bond, salt bridge, hydrogen bond. (3 points)

Interaction          Typical distance    Free energy or dissociation enthalpy
_______________      1.5 Å               300-700 kJ/mol
_______________      2.2 Å               167 kJ/mol
_______________      2.8 Å               12.5-30 kJ/mol
_______________      3.0 Å               2-21 kJ/mol
_______________      Variable            Depends on surroundings
_______________      3.5 Å               4-17 kJ/mol

Only Monday:

Put a lysozyme crystal on the diffractometer that diffracts better than 1.5 Å (only Monday, several times possible, 2 points).

Mount a crystal with a Hampton loop (2 points, several times possible).

Acta Cryst (2007). F63, 608-612

What are the rings from? What causes the rings? (2 points, 2 guesses)

Which grid points are in reflection condition (1 point)? Where on the detector (black line) do they show (2 points)?

Explain which one you would choose and why (1 point). Is there a single crystal (1 point)?


No earlier than Tuesday:

Compare two secondary structure predictions from two services. Explain differences and which prediction is more accurate (2 points).

Print out examples from the PDB for: double Rossmann fold, beta barrel, TIM barrel, beta sandwich, globin fold and four-helix bundle. (6 points)

Of how many amino acids does the protein 1ACC consist? (1 point)

What is the unit cell and space group of 2HCY? (1 point)

From what organism does the protein structure of 1ACC originate? (1 point)

Make a report of all crystallization conditions used for botulinum neurotoxin. (1 point)

What detector and source were used for the data measurement of 2HCY? What was the insertion device? (2 points)

How are Patterson maps used to do rotation and translation searches in MR (4 points)?

Show Theta, Kappa, Phi circles on the diffractometer (3 points).

How are these pictures related to the structure 1JC1 (1 point)?


No earlier than Wednesday:

Try running PHASER with the bare structure of human lysozyme. Are the results better or worse than with the “chainsawed” model? (2 points)

Print the Ramachandran plot of a random protein structure and explain whether it contains more alpha helices or beta sheets (1 point)!

Refine lysozyme to an R(free) value below 27 %. (10 points)

Show in COOT a space group symmetry element of a random protein! (2 points).

Measure in the structure 1JC1 the distance between C220 Cα to C108 Cα in Ångström. What would that be in nm? (1 point)

Measure in 1ACC the distance between A712 O and A712 OE2. How reliable do you believe this distance is? (Hint: Look at your B-factors!) (1 point)

What are the red crosses in COOT? (1 point)

Where is residue C1 in 1JC1? What happened? (1 point)

Look at the packing of 1JC1 of your protein in COOT. The proteins are not packed tightly to each other. What is in between the protein molecules? (2 points)

Why does 1JC1 contain the same molecule three times? (2 points)

Where did we flag reflections for the calculation of R(free)? (2 points, two guesses)

What do you need to start refinement? What do you get out of refinement? Why is it called “iterative”? (1 point)

Look at the N-terminal end of the A-chain of 1MN8. Find what’s strange and how things should be (1 point).

Look at 406D. Why are the water positions unreasonable? (several answers possible, 2 points in total)

Look at 4-Piperidinopiperidine in 1K4Y. Name at least one fishy thing about the ligand. (1 point)

Look at 3-Phenyl-propylamine in 1TNK. What is wrong? (1 point)

Look at coenzyme A in 1Q2C. Find at least three mistakes. (2 points)

In 1986, Chothia and Lesk postulated:

σr = 0.4 · exp(1.87 (1 - s))

σr: r.m.s. coordinate deviation (in Å)
s: sequence identity (as a fraction between 0 and 1)

Search models for MR should have an r.m.s. coordinate deviation no larger than 1.5 Å. What should then be their minimum sequence identity (2 points)? Draw this criterion in the graphic below (2 points)! Why should every atom not contributing to a low σr be omitted? What is the motivation for trimming, as applied by CHAINSAW (1 point)?
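The Chothia and Lesk relation can be explored numerically. The following is a minimal sketch, assuming that s is given as a fraction and σr in Å; the function names `rms_deviation` and `min_identity` are made up for this illustration and do not belong to any crystallographic package.

```python
import math

def rms_deviation(seq_identity):
    """Expected r.m.s. coordinate deviation (Å) for a sequence identity
    given as a fraction between 0 and 1 (Chothia & Lesk, 1986)."""
    return 0.4 * math.exp(1.87 * (1.0 - seq_identity))

def min_identity(max_rmsd):
    """Invert the relation: smallest sequence identity (fraction) whose
    expected r.m.s. deviation does not exceed max_rmsd (Å)."""
    return 1.0 - math.log(max_rmsd / 0.4) / 1.87

# Tabulate the curve for a few sequence identities.
for s in (1.0, 0.75, 0.5, 0.25):
    print(f"s = {s:.2f}   sigma_r = {rms_deviation(s):.2f} Å")
```

Calling `min_identity` with a chosen r.m.s. limit gives the sequence identity at which the curve crosses that limit, which is the criterion to be drawn in the graphic.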


PART 1: CRYSTALLIZATION AND MEASUREMENT

The diffractometer

- Anode: Here the X-rays are generated.
- Collimator: Through this tube the X-rays are emitted.
- Beam stop: Protects the detector from the primary beam.
- Crystal: Sits on a pin, which is held magnetically in the goniometer head.
- Goniometer: Moves the crystal and the detector to different angular positions relative to the collimator.
- Cryo: Keeps the crystal at 100 K, most often with nitrogen.
- Camera and centering monitor: For centering the crystal in the X-ray beam.
- Detector: Measures the reflections.

Home source with Kappa geometry and CCD

Synchrotron beamline with MAR image plate. Where are the X-rays generated?


Lysozyme

Date:
Protein concentration = 75 mg/ml
Amount of solution made =
Amount of solution in well =
Amount of protein in drop =
Amount of solution in drop =

Stocks: 50 % ethylene glycol, Na acetate pH 4.6, NaCl 5 M

400 μl solution per well. Each entry reads: final concentration / volume to pipette in μl.

Ethylene glycol      15 % / 120     18 % / 144     20 % / 160     22 % / 176     25 % / 200     27 % / 216
Na acetate pH 4.6    0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.06 M / 24
NaCl 5 M             2 M / 160      2 M / 160      2 M / 160      2 M / 160      2 M / 160      2 M / 160
ddH2O                80             56             40             24             0              0

Ethylene glycol      18 % / 144     20 % / 160     22 % / 176     25 % / 200     27 % / 216     30 % / 240
Na acetate pH 4.6    0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40
NaCl 5 M             1.5 M / 120    1.5 M / 120    1.5 M / 120    1.5 M / 120    1.5 M / 120    1.5 M / 120
ddH2O                96             80             64             40             24             0

Ethylene glycol      20 % / 160     22 % / 176     25 % / 200     27 % / 216     30 % / 240     32 % / 256
Na acetate pH 4.6    0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40
NaCl 5 M             1 M / 80       1 M / 80       1 M / 80       1 M / 80       1 M / 80       1 M / 80
ddH2O                120            104            80             64             40             24

Ethylene glycol      22 % / 176     25 % / 200     27 % / 216     30 % / 240     32 % / 256     34 % / 272
Na acetate pH 4.6    0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40     0.1 M / 40
NaCl 5 M             0.5 M / 40     0.5 M / 40     0.5 M / 40     0.5 M / 40     0.5 M / 40     0.5 M / 40
ddH2O                144            120            104            80             64             48
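The volumes in the lysozyme pipetting scheme follow from the dilution rule c(stock) · V(stock) = c(final) · V(total), with water making up the remainder. The sketch below recomputes one well under stated assumptions: 400 μl per well, 50 % ethylene glycol stock and 5 M NaCl stock as given on the sheet, and a 1 M sodium acetate stock, which is inferred here from the 40 μl that give 0.1 M and is not stated on the sheet. The function names are hypothetical.

```python
def stock_volume(final_conc, stock_conc, total_volume):
    """Volume of stock needed so that it ends up at final_conc
    after dilution to total_volume (same units as total_volume)."""
    return final_conc * total_volume / stock_conc

def well_recipe(eg_percent, acetate_molar, nacl_molar, total_ul=400.0,
                eg_stock=50.0, acetate_stock=1.0, nacl_stock=5.0):
    """Pipetting volumes in μl for one well; acetate_stock = 1 M is an assumption."""
    eg = stock_volume(eg_percent, eg_stock, total_ul)
    acetate = stock_volume(acetate_molar, acetate_stock, total_ul)
    nacl = stock_volume(nacl_molar, nacl_stock, total_ul)
    water = total_ul - (eg + acetate + nacl)
    if water < -1e-9:
        raise ValueError("stocks alone already exceed the total volume")
    return {
        "ethylene_glycol": round(eg, 1),
        "na_acetate": round(acetate, 1),
        "nacl": round(nacl, 1),
        "water": round(max(water, 0.0), 1),
    }

# First well of the scheme: 15 % ethylene glycol, 0.1 M acetate, 2 M NaCl.
print(well_recipe(15, 0.1, 2.0))
```

A negative water volume means the requested concentrations cannot be reached with these stocks in 400 μl, which is why the sheet lowers the acetate concentration in the well with 27 % ethylene glycol and 2 M NaCl.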


LINUX

For the method course, there are two different kinds of computers: external ones for internet usage and internal ones for everything else. We have different types of Linux installed on our machines and also slightly different versions of programs like coot. We will be working with LINUX. LINUX can be used like Windows in many respects, but it also has some unique features. Two will be discussed here, namely the shell and the mouse copy/paste function. Please play around a little with the possibilities listed here to get familiar with them.

Mouse copy/paste function

- Highlight a text fragment to copy.
- Move the mouse cursor to any input line.
- Click the middle mouse button.

The highlighted text is pasted into the input line. The input place can be anything, from a browser address bar to a text editor or a shell command line.

The shell

The shell is a way to give commands to the computer through the keyboard only, and once mastered, it can be very fast and exact. To start a new shell window, double-click the shell icon on the shortcut bar at the lower left of your LINUX desktop. A window comes up in which you can type commands and execute them by hitting “Enter”.

Linux command list

ls
    What is in the folder I am in right now? Give a list.
    Example:
    user@computer:~/praktikum> ls [Enter]
    my.pdb my.hkl another.pdb

ls -lisa
    What is in the folder I am in right now? Give a list with details.

pwd
    Where am I? Please give me the path.
    Example:
    user@computer:~/praktikum> pwd [Enter]
    /net/home/user/praktikum

more [file]
    Show me this file in the shell.

kate [file]
    Open this file in the editor. If no file name is given, a new file is generated.

cd [path]
    Changes your folder to the given path.

mv [file1] [file2]
    Move file1 to file2.

cp [file1] [file2]
    Copy file1 to file2.

mkdir [path]
    Make a new folder with this name.

rm [file]
    Delete file.

rmdir [path]
    Delete empty folder.

firefox
    Opens a firefox browser window. Alternatively, you can use konqueror, which also serves as a file browser.


Other important functionalities

[TAB]
    The TAB key completes your input automatically. If there are several possibilities, the first TAB hit yields no reaction; the second shows all available options.
    Example:
    user@computer:~/praktikum> ls [Enter]
    my.pdb my.hkl another.pdb
    user@computer:~/praktikum> more a [TAB]
    automatically becomes:
    user@computer:~/praktikum> more another.pdb
    but:
    user@computer:~/praktikum> more m [TAB]
    yields nothing.
    user@computer:~/praktikum> more m [TAB] [TAB]
    gives:
    my.pdb my.hkl
    user@computer:~/praktikum> more m

[Ctrl] + [C]
    Stops the command you are executing, for example when viewing a file with “more”.

*
    Stands for an arbitrary string in file names.
    Example:
    user@computer:~/praktikum> ls *.pdb [Enter]
    my.pdb another.pdb

~
    Means your home folder in a path.
    Example:
    user@computer:/media> ls ~/praktikum
    my.pdb my.hkl another.pdb

/media/
    In this folder, you can find a USB stick you’ve attached to your computer. The name of the folder depends on your USB stick.
    Example:
    user@computer:~/praktikum> cd /media/ [TAB]
    USBstick/
    user@computer:~/praktikum> cd /media/USBstick/ [Enter]

[file]
    If the file is in a folder you are not in, you can also type [path/file].
    Example:
    user@computer:~/praktikum> cp meine.pdb /media/USBstick/deine.pdb

..
    .. refers to the parent folder, one level up from the folder you are currently in, and can be used in any [file] part of a command.
    Example:
    user@computer:~/praktikum/1mbn/> cp meine.pdb ../deine.pdb [Enter]

&
    If & is appended to any command, that command is executed in the background, so the shell stays available for typing another command. This is especially useful in connection with kate.
    Example:
    user@computer:~/praktikum> kate meine.pdb &
    user@computer:~/praktikum> firefox &
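The behaviour of `*`, TAB completion and `..` can also be reproduced outside the shell. The following Python sketch only mimics these rules with the file names used in the examples; the helper `complete` is a hypothetical illustration, not how the shell implements completion, and the path containing `1mbn` is assembled from the examples for demonstration.

```python
import fnmatch
from pathlib import PurePosixPath

files = ["my.pdb", "my.hkl", "another.pdb"]

# "*" stands for an arbitrary string: "*.pdb" matches every name ending in ".pdb".
pdb_files = fnmatch.filter(files, "*.pdb")

def complete(prefix, names):
    """Mimic TAB: return the single match, or the list of all candidates
    when the prefix is ambiguous (what the second TAB hit would show)."""
    matches = [name for name in names if name.startswith(prefix)]
    return matches[0] if len(matches) == 1 else matches

# ".." is the parent folder, one level up from the current one.
current = PurePosixPath("/net/home/user/praktikum/1mbn")
parent = current.parent

print(pdb_files)
print(complete("a", files))
print(complete("m", files))
print(parent)
```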


PART 2: PREDICTION, INTEGRATION AND PHASING

Four levels of protein structure

Most proteins are polymers of 20 different amino acids joined by peptide bonds. The sequence of the different amino acids in a protein, which is directly determined by the nucleotide sequence in the encoding gene, is its primary structure. This in turn determines how the protein folds into higher-level structures. The secondary structure of a polypeptide chain can take the form of either alpha helices or beta strands, formed through regular hydrogen-bonding interactions between N-H and C=O groups in the main chain. Secondary structure elements, often connected by non-regular loops, fold into a tertiary structure. Many proteins consist of more than one folded polypeptide chain; this constitutes the quaternary structure of the protein.

Amino acids

A list is given as a photocopy on the next page. Amino acids have different side chains and therefore interact differently with each other and with water. These differences contribute to protein fold and function:

- Hydrophobic amino acids tend to form weak Van der Waals interactions with each other, and to avoid contact with water or hydrophilic residues. Therefore, they pack against each other. Phenylalanine can also participate in polar interactions.
- Hydrophilic amino acids can form hydrogen bonds to each other, to the backbone and to polar solvent molecules such as water. Depending on their protonation state they carry different charges. They can also form electrostatic interactions and stronger salt bridges.
- Amphiphilic amino acids have both polar and nonpolar character.
- A cysteine residue can form a disulfide bridge with another cysteine residue in the same or a neighbouring chain.

Alpha helices

Alpha helices are the most common secondary structure element. Their hydrogen bonding pattern causes a dipole over the whole helix: the amino terminus of an alpha helix carries a partial positive charge, the C-terminus a partial negative charge. The helix has 3.6 amino acid residues per turn, corresponding to a 100° rotation per residue. Sometimes, one face of the helix is charged, while the residues on the other face are nonpolar. Such “amphipathic helices” promote the packing of helices onto each other. About 20 residues constitute a helix 30 Å long, which is the typical thickness of a lipid bilayer membrane.

Beta sheets

Unlike alpha helices, beta sheets involve hydrogen bonds between residues distant from each other in the linear sequence. The strands can run in the same direction (parallel) or in opposite directions (anti-parallel). The side chains stick out above and below the sheet. The sheet may also curve around on itself, forming a beta barrel.
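The helix numbers quoted above are linked by simple arithmetic. As a small sketch, assuming only the two figures from the text (3.6 residues per turn, and 20 residues spanning 30 Å, hence a rise of 1.5 Å per residue), the rotation per residue, the pitch and the length of a helix follow directly. The function and constant names are chosen for this illustration.

```python
RESIDUES_PER_TURN = 3.6          # from the text
RISE_PER_RESIDUE = 30.0 / 20     # Å per residue, from "20 residues = 30 Å"

def rotation_per_residue():
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN

def pitch():
    """Advance along the helix axis per full turn, in Å."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE

def helix_length(n_residues):
    """Length of an ideal alpha helix of n_residues, in Å."""
    return n_residues * RISE_PER_RESIDUE

def residues_to_span(distance):
    """Residues needed to span a given distance in Å, e.g. a membrane."""
    return distance / RISE_PER_RESIDUE

print(f"{rotation_per_residue():.0f} degrees per residue")
print(f"{pitch():.1f} Å per turn")
print(f"{helix_length(20):.0f} Å for 20 residues")
```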


Secondary structure prediction

Certain amino acid types occur more frequently in beta sheets, alpha helices or loop parts of polypeptide chains. Residues with long side chains, such as leucine, methionine, glutamine and glutamic acid, are often found in helices, because the side chains can project out from the crowded cylinder region. In contrast, branched or bulky side chains (valine, isoleucine and phenylalanine) occur more often in beta sheets. There are various empirical rules for the prediction of secondary structure from the sequence.

So, let’s do one! First, we need a sequence. You will find the sequence as a file lyso.txt in your folder. But for most servers, we will need FASTA format. To convert, open the file with a text editor. Wikipedia states:

“A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (both are optional). There should be no space between the ">" and the first letter of the identifier. It is recommended that all lines of text be shorter than 80 characters. The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence. A simple example of one sequence in FASTA format:

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY”

After you have given a name and description to your sequence, save this as lyso.fas. Also, for ARP/wARP, you are going to need a PIR format file. A sequence in PIR format consists of:

- A first line starting with ">", followed by a two-letter code describing the sequence type (P1, F1, DL, DC, RL, RC, or XX), a semicolon, and the sequence identification code.
- A second line describing the sequence.
- From the third line on, the sequence itself. The end of the sequence is marked by a "*" character.

The PIR format is also often referred to as the NBRF format. It looks like this:

>P1;CRAB_ANAPL

ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).

MDITIHNPLI RRPLFSWLAP SRIFDQIFGE HLQESELLPA SPSLSPFLMR

SPIFRMPSWL ETGLSEMRLE KDKFSVNLDV KHFSPEELKV KVLGDMVEIH

GKHEERQDEH GFIAREFNRK YRIPADVDPL TITSSLSLDG VLTVSAPRKQ

SDVPERSIPI TREEKPAIAG AQRK*
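Both formats are simple enough to write with a short script instead of a text editor. The sketch below is one possible way to do it, assuming the raw sequence is available as a string of one-letter codes; the function names `to_fasta` and `to_pir` and the line widths are choices made for this example, and the demonstration reuses the first 20 residues of the PIR example above.

```python
def _chunks(sequence, width):
    """Split a sequence into lines of at most `width` characters."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]

def _clean(sequence):
    """Remove blanks and line breaks, use upper-case one-letter codes."""
    return "".join(sequence.split()).upper()

def to_fasta(identifier, description, sequence, width=70):
    """'>' plus identifier and description, then sequence lines under 80 characters."""
    header = f">{identifier} {description}".rstrip()
    return "\n".join([header] + _chunks(_clean(sequence), width)) + "\n"

def to_pir(identifier, description, sequence, width=60, seq_type="P1"):
    """'>P1;identifier', a description line, then the sequence terminated by '*'."""
    lines = [f">{seq_type};{identifier}", description]
    lines += _chunks(_clean(sequence) + "*", width)
    return "\n".join(lines) + "\n"

# Demonstration with the start of the PIR example sequence.
demo = "MDITIHNPLI RRPLFSWLAP"
print(to_fasta("CRAB_ANAPL", "ALPHA CRYSTALLIN B CHAIN", demo))
print(to_pir("CRAB_ANAPL", "ALPHA CRYSTALLIN B CHAIN", demo))
```

To produce lyso.fas and lyso.pir, the returned strings would be written to files with those names after reading the sequence from lyso.txt.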

Please also save your sequence as lyso.pir for later. Now you can choose a web server to do the prediction:

www.biogem.org/tool/chou-fasman
www.meilerlab.org/web/view.php
npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
bioinf.cs.ucl.ac.uk/psipred


Homology modelling

Every protein has its own tertiary structure, often combined from more common motifs. A popular way to predict the tertiary structure of a protein is to find, in the sequence in question, the motifs that define a known domain. Try (this might take a while, and you need an e-mail address to send the results to):

www.cbs.dtu.dk/services/CPHmodels/
www.fundp.ac.be/urbm/bioinfo/esypred/
geno3d-pbil.ibcp.fr/cgi-bin/geno3d_automat.pl?page=/GENO3D/geno3d_home.html

There are many other things you can do with a sequence alone; most possibilities can be found at expasy.org/tools .

The PDB

All polypeptide and nucleic acid structures that have been solved by experimental methods, such as NMR, X-ray, electron diffraction or microscopy, and have been published in a journal, can be found in the Protein Data Bank: pdb.org. Access is free and it makes a great tool for protein research. There are many extra tasks related to the PDB. Try to make a report on crystallization conditions for a search result. Also, try to combine different searches in the “advanced search” option. Download a PDB file and look at it in a text editor. What information do you see; what columns are contained in the ATOM lines? What important information is not contained in the PDB file?
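When looking at ATOM lines in a text editor, it helps to know that the PDB format is column-based: every field sits at fixed character positions rather than being separated by blanks. The sketch below writes and reads one ATOM record using the standard column layout; the atom values are made up for illustration and are not taken from any deposited structure, and `format_atom` and `parse_atom` are hypothetical helper names.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z, occ, b, element):
    """Write one ATOM record; `name` must already be padded to 4 characters."""
    return (
        "ATOM  "                         # columns 1-6: record name
        + f"{serial:5d} "                # 7-11 atom serial number, 12 blank
        + f"{name:<4s}"                  # 13-16 atom name
        + " "                            # 17 alternate location indicator
        + f"{res_name:>3s} "             # 18-20 residue name, 21 blank
        + chain                          # 22 chain identifier
        + f"{res_seq:4d}"                # 23-26 residue sequence number
        + "    "                         # 27 insertion code, 28-30 blank
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"    # 31-54 orthogonal coordinates in Å
        + f"{occ:6.2f}{b:6.2f}"          # 55-60 occupancy, 61-66 B factor
        + " " * 10                       # 67-76 unused here
        + f"{element:>2s}"               # 77-78 element symbol
    )

def parse_atom(line):
    """Read the fields back by slicing fixed columns (0-based, end-exclusive)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

# Illustrative values only.
line = format_atom(1, " N  ", "LYS", "A", 1, 3.287, 10.092, 10.329, 1.0, 12.53, "N")
atom = parse_atom(line)
print(line)
print(atom)
```

Splitting ATOM lines on whitespace is tempting but unreliable, because neighbouring fields can touch when the numbers become large.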


GGNB Course A57

Practical for Macromolecular X-ray Crystallography I

October 2010

Tim Grüne

Tutors: Tobias Beck, Navdeep Sidhu, Andrea Thorn

Contents

1 Data Integration with HKL2000
1.1 Getting Started
1.2 Logging in and setting up our files
1.3 Data Integration with HKL2000
1.3.1 Indexing
1.3.2 Box- and Spot-Size for Integration
1.3.3 Integration
1.3.4 Scaling

1 Data Integration with HKL2000

1.1 Getting Started

The major part of this practical is computer based. The computing environment is completely Linux-based. This section contains a few basics to get familiar with it.

1.2 Logging in and setting up our files

There are 15 accounts set up for this practical with

usernames: ggnb01 . . . ggnb15
passwords: ggnb01 . . . ggnb15

Each week, each group must select one username and stick to it throughout the duration of the practical. The computers have their names printed on them. The ones suited for this course are

• euryale

• klio

• medusa

• stheno

• urania

All computers have the same setup and your files are accessible from all these computers. Therefore it is not important that you stick to the same computer all the time. The computer network in the Sheldrick group is separated from the internet. In order to access the internet, different computers must be used (the tutors will tell you which ones they are). The username is the same on those computers, but the password is

P43212

for all groups. Passwords are case sensitive, so make sure you type it with a capital ’P’. After logging in, your desktop looks a little desolate. You are going to need a terminal from which you can type commands. In order to get your first terminal, type

Alt-F2

which opens a small window that allows you to type in commands. Type

konsole

to get a terminal.

From now on, the terminal will appear in your control menu (the blue button at the bottom of the screen with the “K”).

In order to keep track of the process, you should keep the different parts of this course in separate directories. For the first part, create a directory integration using the Linux command mkdir by typing in the terminal window

#> mkdir integration
#> cd integration

The first command creates the directory, the second one changes into that directory.

1.3 Data Integration with HKL2000

Now you can start the integration program HKL2000 by typing this command in the same terminal. The first window that appears asks for the detector type. Our data were collected on a Mar 345 image plate.

Selecting the correct detector.

Once you have selected the Mar345 detector and clicked OK, the main window of HKL2000 appears:

Make sure that the New Output Data Dir lists the directory

/net/home/ggnbXX/integration

where ’XX’ corresponds to the number in your username.

Correct output directory?

Next you have to tell HKL2000 where the data frames are (New Raw Data Dir), because they are not in the directory you just created. In the subwindow Directory Tree you have to double-click on the net-folder, so that you can see ganymede and home. Double-click your way through to

net->ganymede->ggnb-I->frames

Thereafter click on the >> below New Raw Data Dir (not the one below New Output Data Dir!). When the New Raw Data Dir points to the correct directory /net/ganymede/ggnb-I/frames, you can click on the Load Data Sets-button and you can see the frames that HKL2000 found in that directory.

Loading data sets.

After OK, the HKL2000 window looks like the next picture. Note that the bottom fields are now filled in as far as HKL2000 could extract the information from the file header.

Data set information.

When you click on the Display button, HKL2000 shows the first frame. (If you do not see anything, move the main window aside. Sometimes HKL2000 opens the new window behind the main window.)

Frame number 1.


1.3.1 Indexing

We must next ask HKL2000 to index our data, i.e.

1. find an (approximate) unit cell consistent with the pattern in the frame

2. assign the Miller indices to the reflections

First select the Index-tab at the top of the main window. HKL2000 must find the reflections on the image. This is what the Peak Search button in the main window is for.

Click Index. HKL2000 opens a new window which offers a choice of Bravais lattices. The green ones fit well with the diffraction pattern, the red ones fit only poorly. Furthermore, the Bravais lattices are ordered according to the degree of symmetry. Pick the top green one, primitive tetragonal. The colour of the peaks in the frame window changes.

Once an initial cell has been found, the unit cell parameters and the experimental settings (detector distance, . . . ) can be refined, i.e., their values improved based on the collected data. Hit the Refine button in the main window 3-4 times and check that the numbers in that window no longer change much. Now click Fit All in order to include all parameters, also select the button Mosaicity, and click on the Refine button again several times until the numbers stabilise. The colour on the frame display has changed again. The colour indicates whether a reflection is a full reflection, i.e. whether the complete reflection has been recorded on that image, or whether part of it is on one of the adjacent frames.

Question: Why are the spots on the circles not enclosed? Is HKL2000 making a mistake?

1.3.2 Box- and Spot-Size for Integration

You can give HKL2000 an idea about how big the spots are and how much area the program should use inorder to determine the background around each spot. Both are important settings for a proper data integra-tion.Click the Zoom window-button in the Frame Display. This opens a third window. Middle-click in theFrame Display in an area with spots and adjust the brightness (in the Frame Display) so that you canclearly see the spots.


The Int. box button in the Zoom window shows the current settings for the background area (square) and the spot size (circle). There are actually two circles; the area between the two is the “transition” between spot and background. You can see that the boxes overlap with some of the neighbouring boxes, and the circles seem too small to encompass the spots, at least the larger ones.

Click Zoom in twice for a good view.

First click on Box size in the main window. A setting of 20-25 seems reasonable for this data set. You must click Refine in the main window to see the effect of the change.

Similarly, increase the Spot size so that the spots fit into the circles. It does not matter if the circles are too big, but it does if they are too small. A Spot size of about 0.75 seems a good choice.

Now click the Refine button in the main window two to three more times to adjust to the new settings.

Before advancing to the integration step, you can tell the program where the shadow of the beam stop is so that it excludes this area during integration. With weak data this can be important to improve the data quality, but it is also good practice for high resolution data. Please ask the tutors to show how to carry out this step.

Then click the Integrate button to start the integration, and lean back for a few minutes.

1.3.3 Integration

Integration starts automatically. The Integration tab of the main window shows the progress of integration. The program now further improves the experimental parameters. The bottom right window shows the variation of some of the parameters. They usually fluctuate a little.

With most decent data sets, there is little to adjust during or after integration with HKL2000, and once the data integration has completed, you can select the Scale tab in the main window in order to scale the data and determine the space group.


1.3.4 Scaling

HKL2000 can help to determine the space group as far as possible. Click on Space Group Diagram in the main window and wait until a new window opens.

HKL2000 suggests the space group P422, at the top of the list. HKL2000 does not take screw axes into account. Your tutors will tell you the correct space group you should select in the main window. A more detailed discussion about space group determination is part of the advanced course.

After the sets are scaled (Scale Sets button), HKL2000 prints some statistics about the data set in the main window, including some graphs. You can discuss with the tutors the meaning of:

• R-factor

• Completeness of the data set


• Plots of χ² and R-factor vs. resolution and vs. frame number

• Plot of I/σI vs. resolution for determination of the data set resolution.

In order to get a feeling for the quality of a data set, it is necessary to gather experience with several data sets, rather than just being given numbers here. A better understanding of these numbers is also part of the advanced part of this course.
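As a rough illustration of two of the statistics listed above, the following Python sketch computes a merging R-factor and the completeness for a toy list of intensities. The formula R = Σ|I − ⟨I⟩| / ΣI over repeated measurements is the commonly used merging R-factor and is an assumption here, since this handout does not state which R-factor HKL2000 reports. All function names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r_merge(observations):
    """Merging R-factor for (hkl, intensity) pairs.

    Assumes symmetry-equivalent reflections have already been mapped to
    the same (h, k, l) index. Reflections measured only once are skipped,
    which is one of several conventions in use.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # a single measurement says nothing about agreement
        average = mean(intensities)
        numerator += sum(abs(i - average) for i in intensities)
        denominator += sum(intensities)

    if denominator == 0:
        raise ValueError("no repeated measurements to compare")
    return numerator / denominator


def completeness(n_measured_unique, n_possible_unique):
    """Fraction of the theoretically possible unique reflections measured."""
    return n_measured_unique / n_possible_unique


# Toy data, not taken from a real experiment.
toy_observations = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((2, 1, 0), 50.0), ((2, 1, 0), 40.0),
    ((3, 0, 0), 10.0),
]
print(f"R-merge: {r_merge(toy_observations):.3f}")
print(f"Completeness: {completeness(950, 1000):.1%}")
```

Lower R values mean that repeated measurements of the same reflection agree better. What counts as acceptable depends on resolution and redundancy, so discuss the numbers with your tutors.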

At the bottom left of the main window you can see the result of your integration: the Output File output.sca. When you work on your own data set, you should change this name to something more meaningful!


Molecular Replacement: Introduction

Let’s suppose that we do not yet know the structure of hen egg-white lysozyme. We have collected and integrated a data set from our crystals. We also know the sequence. Now we need to solve the phase problem for the data. Every measured reflection corresponds to a wave, the so-called structure factor, but we measured only its amplitude and not its phase. Luckily, the phases of reflections depend on each other: if we know some, we can determine all.

An electron density map corresponds to a set of structure factors. They contain the same information and can be converted into one another by Fourier transformation/synthesis. So, if we already knew the structure, we would also know its electron density and could easily calculate structure factors (including phases).

In molecular replacement (MR), we take a structure as similar as possible to what we expect. This structure is positioned in the unit cell as similarly as possible to the unknown structure. Then we calculate phases from it and combine them with the measured amplitudes (derived from the intensities) of our reflections to yield structure factors. This is our first guess for the phases, the so-called “initial solution”.

Molecular Replacement: Preparing the model

What makes a good model? Generally, a protein with high sequence identity and a presumably similar fold makes a good model. For our data set, we could take the human lysozyme structure 1GF6. Please download both the PDB file and the FASTA format sequence from the PDB.

Let’s do an alignment. Go to the online ClustalW service: ebi.ac.uk/Tools/clustalw2 Enter first the hen egg-white lysozyme sequence and then the human lysozyme sequence in the field labeled “Enter or paste a set of sequences in any supported format:” and run. After a short moment, you get an alignment of the sequences, which you can even color. Please hit “View alignment file” and save this file on your USB stick for later use. Make sure its file extension is “.aln”. Note (at least roughly) the % sequence identity.

Now move to the computer where you did the data integration and transfer the 1GF6.pdb and ALN files to your directory. Start CCP4i, using your project. On the left, under “Molecular Replacement”, you can find “Create search model”. Hit it and open chainsaw, a program that automatically deletes atoms which differ between the search model and your sequence:

Give the 1GF6.pdb file and your alignment file, and choose both “last common atom” and the “ALN/ClustalW” format. Run this. All side chains are pruned beyond the last common atom.
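The percent sequence identity that you noted from the alignment can also be estimated by hand. The Python sketch below counts identical residues over all alignment columns in which neither sequence has a gap. This choice of denominator is an assumption, as alignment programs differ in how they normalise identity. The function name and the two short aligned fragments are hypothetical and serve only as toy input.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns in which either sequence has a gap are ignored (assumption:
    other tools may divide by the alignment length or the shorter sequence).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Keep only columns where both sequences contain a residue.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0

    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, for illustration only).
fragment_target = "KVFGRCELAA"
fragment_model = "KVFERCEL-A"
print(f"{percent_identity(fragment_target, fragment_model):.1f} % identity")
```

For a real alignment you would paste the two aligned lines (including gap characters) from the .aln file.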


Molecular Replacement: Using Phaser

Now we have everything we need:

• An MTZ file with our experimental data, space group and unit cell.

• The sequence of the target protein.

• A model which was prepared to be as similar as possible to the target protein’s structure.

Let’s start:

Before you hit „Run“, don’t forget to give the correct ensemble and sequence file!


After about five minutes, PHASER should be finished. Did it solve the phase problem? Even before we look at the solution graphically, there are some indicators: Double click on the Job in the overview menu to open the LOG file:

Most MR programs first rotate the model in the unit cell and score the resulting orientations. Then a translation search is carried out with the best rotation solutions and scored as well. Finally, clashes between molecules and their symmetry equivalents are calculated, and clashing solutions are rejected if necessary. Such a search is called a 3+3 search, as first the 3 rotational degrees of freedom are evaluated and then the 3 translational degrees of freedom. If the asymmetric unit contains more than one molecule, the procedure is repeated with the next molecule. But we have only one, and because of that, the output is very short; at the bottom of the LOG file, we read:

RFZ is the rotation function Z score. A high score indicates a good positioning.

TFZ is the translation function Z score. A score of less than 5 indicates no solution; at 8 or higher, the phase problem is definitely solved.

LLG is the log likelihood gain. If several molecules are placed, it should increase with each one placed.
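To make the idea of a Z score more concrete, the following Python sketch expresses one search score as the number of standard deviations above the mean of all scores, and applies the TFZ rule of thumb given above (below 5: no solution, 8 or higher: solved). This is a simplified illustration under the assumption of a plain mean and standard deviation; how PHASER samples and normalises its scores may differ in detail. Function names and the toy scores are hypothetical.

```python
from statistics import mean, pstdev


def z_score(value, all_scores):
    """Number of standard deviations by which `value` exceeds the mean."""
    sigma = pstdev(all_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("all scores are identical; Z score is undefined")
    return (value - mean(all_scores)) / sigma


def interpret_tfz(tfz):
    """Rule of thumb from the text above."""
    if tfz < 5:
        return "no solution"
    if tfz >= 8:
        return "solved"
    return "uncertain"


# Toy search scores (hypothetical); the best one stands out from the rest.
toy_scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(max(toy_scores), toy_scores))
print(interpret_tfz(11.3))
```

Values between 5 and 8 are not covered by the rule of thumb; the sketch simply labels them "uncertain".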


COOT

Coot is started with the command coot. It is a program to display electron density maps and molecules and to manipulate molecular structures, especially of proteins. Far from all of the tools and options Coot offers are explained here. Please do not hesitate to experiment as much as you like. This may even reveal some faster ways to solve your tasks! Coot is mainly used with the mouse:

[Left-mouse Drag] Rotate view

[Shift] [Left-Mouse Click] Label Atom

[Right-Mouse Drag] Zoom in and out

[Middle-mouse Click] Center on atom

[Scroll-wheel] Map contour level

[Ctrl] [Left mouse Drag] Move center

THIS IS WHAT IT SHOULD LOOK LIKE:

DISPLAY MANAGER

The Display manager can be opened by choosing “Draw” from the menu-bar and then selecting “Display manager”. Here you can select which maps and molecules are shown and how they are drawn. The “Display” toggle buttons control whether a molecule (or map) is drawn, and the “Active” button controls whether the molecule is clickable. The “Scroll” radio buttons set which map has its contour level changed by the mouse scroll wheel. Next to each molecule is a menu where you can switch between several representations.


THE MOST IMPORTANT MENU POINTS

Here you can make your own notes! (Red options are explained on the following pages.)


COORDINATES

Display Coordinates Select “File” from the menu-bar, and then the “Open Coordinates” item. Coot displays a Coordinates File Selection window. If you click the “Filter” button, only suitable files are shown. Choose your file and click “Ok”.

Centering Select “Draw” from the menu-bar, and “Go to atom”. The “Go To Atom” window opens. Expand the chain (“+”), choose a residue by clicking on it and click “Apply” to center on the Cα of this amino acid. By pressing [Space] or [Shift] [Space] you can move forwards or backwards in the sequence. Note that information about the centered atom, such as occupancy and B factor, is given in the status bar.

Measuring Measurement options can be found under “Measures → Distances & Angles”. After the type of measurement is chosen (distance, angle, torsion), the atoms involved are selected by clicking on them. Selected atoms are labeled. As soon as a sufficient number of atoms has been selected, the result is shown.

Display of close contacts Select “Measures” from the menu-bar, then “Environment Distances...”. A window opens in which you check the “Show Residue Environment?” option. If you check “Label Atom?” as well, Cα atoms will additionally be labeled. To apply, click “Ok”. The default maximum distance corresponds to the maximum hydrogen bond length.

Display symmetry equivalents This is found under “Draw → Cell & Symmetry”. In the window that appears, select Show Symmetry Atoms? → Yes. You can change the radius around the center within which symmetry-equivalent atoms are shown. You can also choose to show the unit cell. After selecting all options, hit “Ok”. You can zoom out and have a look at how the molecules pack together.

MAPS

Display maps Select “File” from the menu-bar, and then the “Auto Open MTZ” item. Coot displays a Dataset File Selection window. Select your file. You can only open REFMAC output .mtz files in this way. For other formats, such as .phs, select “File” and then “Open MTZ, cif or phs...” and choose your dataset file. Further input may be needed to convert the file into a displayed map.

Select a Map for model building Before model building and validation tools are used, a map has to be selected. If no map has been selected, a window to choose a map will automatically pop up instead of the option chosen. To select a map for model building, select “Calculate” from the menu bar and “Model/Fit/Refine...”. The rainbow-colored Model/Fit/Refine window opens. Choose “Select Map...”, select your map (usually the map with “FWT PHWT”) and click “Ok”.

Find difference map peaks “Validate → Difference Map Peaks” helps to find interesting features in a newly created map and model.

Find unmodelled blobs To find density peaks which have not yet been covered by the model, choose “Validate” from the menu-bar, and “Unmodelled Blobs”. Hit “Find Blobs” in the pop-up and wait a short while. You will get a new window with a list of blobs; clicking on a blob centers the view on it.
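The distance, angle and torsion measurements described under “Measuring” above are simple vector calculations on atomic coordinates. The following Python sketch shows one way to compute them; it is only an illustration and not Coot’s own code. The torsion angle follows the usual sign convention (positive for a clockwise rotation when looking along the central bond). Function names and the toy coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p, q):
    """Distance between two atoms (same unit as the coordinates, e.g. Å)."""
    return _norm(_sub(p, q))


def angle(p, q, r):
    """Angle p-q-r in degrees, with q as the central atom."""
    v1, v2 = _sub(p, q), _sub(r, q)
    cosine = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def torsion(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical, in Å).
print(distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(torsion((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

The backbone angles phi and psi are torsion angles of exactly this kind, computed over four consecutive backbone atoms.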


ADD AND REMOVE MOLECULES AND RESIDUES

Adding small molecules You can add small molecules at the center by choosing “Calculate → Model/Fit/Refine… → Place Atom At Pointer”. A new window appears in which the desired molecule or ion can be selected. Below, you can choose the file to which the new molecule is written, which in most cases should be the main protein model. If the small molecule doesn’t lie perfectly in the density, “Calculate → Model/Fit/Refine… → Rigid body fit” often helps!

Adding residues To add a residue, center on an already existing sequence neighbor of the amino acid that you want to add. Choose “Calculate → Model/Fit/Refine… → Add terminal residue…” and click on the second-to-last residue. A new Gly residue shows up along with a confirmation window. Before accepting the new residue, you can click and drag it to a better position. The new residue can be mutated after accepting.

Adding a whole loop Make sure that you can see the unmodelled density well. Choose “Calculate → Fit Loop”. The “Fit Loop” dialog appears. Choose the file you want to edit (which has to be an opened model), the chain and the residues, and give the sequence in the box. Click “Fit Loop”.

Deleting To delete an entity, choose the option “Calculate → Model/Fit/Refine… → Delete…”. The “Delete Item” window pops up, in which you choose by checkbox what kind of item should be deleted. Then click on whatever you want to delete. If you want to delete more than one entity, check “Keep Delete Active” and close with “Cancel” after deleting.

MODEL IMPROVEMENT

Mutate a residue To change the type of a residue, select “Calculate → Model/Fit/Refine… → Mutate & Auto Fit…” or “Calculate → Model/Fit/Refine… → Simple Mutate…”. Click on the residue you want to mutate; a selection window opens from which the new amino acid type can be selected, and the mutation is carried out. There is also an option to mutate a whole range of residues at once under “Calculate → Mutate residue range”.

Rotamers Certain side chain conformations are common in proteins, and the option “Calculate → Model/Fit/Refine… → Rotamers…” lets you choose from these so-called “rotamers”. To do so, just select a rotamer shown in the pop-up window and examine whether it fits the density nicely. To keep it, click “Ok”.

Real Space Refinement Real Space Refine is a very powerful tool for model improvement. With this option, a residue or a stretch of residues (called a “zone”) is moved into the density of the map. Click “Real Space Refine Zone” in the “Model/Fit/Refine” window. Then choose the first and the last residue you want to refine; if they are the same, double-click. Coot displays the refined coordinates in white in the graphics window, along with a new “Accept Refinement” window. You can accept, or drag the new white coordinates to another position, from where they are again fitted into the density. To drag only single atoms, hold [Ctrl] while clicking and dragging them.

Regularize Zone Regularize Zone optimizes the geometry of the chosen residue or zone. In all other respects, it works like “Real Space Refine”. Often, the two are used alternately.


FINDING STRUCTURAL PROBLEMS

Density Fit Analysis Choose “Validate → Density Fit Analysis”. Coot displays a bar graph; the bigger and redder the bar, the worse the fit of that residue to the density. If you click on a bar, the corresponding residue is centered automatically.

Ramachandran plot Choose “Validate → Ramachandran plot” to display the Ramachandran plot window. It is “dynamic”: when you edit the molecule, the Ramachandran plot is updated to reflect those changes. The underlying phi/psi probability density also changes according to the selected residue type. There are 3 different residue types: GLY, PRO, and not-GLY-or-PRO. When you move the mouse over the representation of a residue (a little square or triangle), the residue label pops up. Clicking on it centers the view on this residue. In the Ramachandran plot window, the current residue is highlighted by a green square.

Finding waters To include water oxygens in your model, use the option “Find waters…” from “Calculate → Model/Fit/Refine…”. In the window that appears, you can select the map and a cutoff value. You can also select whether the waters should be generated as a new PDB file or be added to an already opened model. As a general rule of thumb: if you use a “2Fo-Fc”-style map (blue), use the default value. If you want to use a difference map, you must change the sigma level (typically to 3 sigma). Otherwise you run the risk of fitting waters to difference map noise peaks.

Finding bad rotamers Choose “Validate → Rotamer Analysis”. Coot displays a bar graph; the bigger and redder the bar, the less common the rotamer is according to statistics of which rotamers occur often in proteins. If you click on a bar, the corresponding residue is centered automatically. In this menu, you can also see whether side chain atoms are missing!


Check/Delete waters To check waters, you can make a list of (or even delete) waters which do not meet the given criteria. This is done in a window under “Validate → Check/Delete Waters”. You can give cutoff values for:

- B factor

- map sigma level

- closest contact maximum/minimum

Water checking starts when “Ok” is clicked, and a list is generated where applicable.
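The logic behind such a water check can be sketched in a few lines of Python: each water is compared against the cutoffs, and those violating at least one criterion are listed together with the reason. The cutoff values used as defaults below are illustrative assumptions, not values taken from this handout, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Water:
    name: str
    b_factor: float         # in Å^2
    map_sigma: float        # map value at the water position, in sigma
    closest_contact: float  # distance to the nearest atom, in Å


def flag_waters(waters, max_b=80.0, min_sigma=1.0,
                min_contact=2.3, max_contact=3.5):
    """Return (name, reasons) for every water failing at least one cutoff.

    All default cutoffs are assumptions chosen for illustration.
    """
    flagged = []
    for water in waters:
        reasons = []
        if water.b_factor > max_b:
            reasons.append("high B factor")
        if water.map_sigma < min_sigma:
            reasons.append("weak density")
        if water.closest_contact < min_contact:
            reasons.append("contact too close")
        if water.closest_contact > max_contact:
            reasons.append("contact too far")
        if reasons:
            flagged.append((water.name, reasons))
    return flagged


# Toy waters (hypothetical values).
toy_waters = [
    Water("HOH 1", b_factor=25.0, map_sigma=2.1, closest_contact=2.8),
    Water("HOH 2", b_factor=95.0, map_sigma=0.6, closest_contact=2.9),
    Water("HOH 3", b_factor=30.0, map_sigma=1.8, closest_contact=4.2),
]
for name, reasons in flag_waters(toy_waters):
    print(name, "->", ", ".join(reasons))
```

A flagged water is not necessarily wrong; inspect it in the density before deleting it.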

MISCELLANEOUS

Background color The background color can be changed at “Edit → Background color”. This is especially useful for rendering screenshots.

Rendering Screenshots

Several methods for rendering can be found under “Draw → Screenshot”.


Refinement with REFMAC

Your starting situation should be: You have generated an .mtz file with your experimental data and R free flags. You have improved your model as far as possible with the initial map. The model should contain chains that are as long as possible. If you have reached this state, run a refinement with REFMAC. For the input, you need the .mtz file with your measurement data and the .pdb file with the initial solution you got from tracing.

Typically, COOT is used first to improve the model manually in real space. REFMAC then tries, broadly speaking, to minimize the difference between the measured reflection intensities and the ones expected from the model. (In fact, it works on the square root of the intensity, the amplitude of the reflection.) It does so mainly by changing the model: slightly moving the atom positions and adjusting the B factors. As output, you will get the slightly changed model and a newly calculated electron density map with different phases.

MINIMAL REFMAC INPUT

Input amplitudes Give next to “MTZ in” the path and filename. Alternatively, you can click “Browse” and choose the file by clicking. PLEASE REMEMBER: THIS SHOULD NEVER BE AN OUTPUT MTZ OF REFMAC’S BUT ONE WITH THE EXPERIMENTAL DATA ONLY!!!

Model input Give next to “PDB in” the path and filename. Alternatively, you can click “Browse” and choose the file by clicking.


OTHER USEFUL REFMAC INPUT

There are plenty of interesting features for refinement in REFMAC, so over time, you should get used to most of them (not only to the ones listed here).

Output map This is where the map for usage in coot is saved. Give next to “MTZ out” the path and filename. Alternatively, you can click “Browse” and choose the file by clicking.

Output model This is where the output model is saved. Give next to “PDB out” the path and filename. Alternatively, you can click “Browse” and choose the file by clicking.

LIB IN Here you can give an external restraint file for ligands or monomers which are not in REFMAC’s so-called monomer library.

Refinement cycle number

Click on “Refinement parameters” to open a sub-window. The first line is “Do [10] cycles of maximum likelihood refinement” click on [10] and change the number. 20 is a good value to reach a better convergence, while 10 gives you quite rough results.

Resolution cutoff Click on “Refinement parameters” to open a sub-window. In the third line there is a check box (“Resolution range from minimum [ ] to [ ]”). Check the box. The default values are read from your data, but you may want to set a maximum resolution cutoff to improve the R values; to do so, type it into the second field!

NCS Restraints Click “Setup Non-Crystallographic Symmetry (NCS) Restraints”. A sub-window opens with the text “No NCS restraints are currently defined”. Click “Add NCS restraint”. You can now give details for the restraints between similar protein molecules in the same asymmetric unit. These restraints will make the two protein molecules more similar to each other.

TLS Refinement In addition to B factors, you can also refine the displacement of a whole chain or parts of it anisotropically (meaning direction-dependent). For this purpose, the translation, libration and screw movement of the given part of the molecule are calculated. This is done as follows: The third line of the REFMAC window is “Do [restrained refinement] using [no prior phase information] input.” Click on [restrained refinement] and choose [TLS & restrained refinement] from the dropdown menu. The sub-window “TLS Parameters” shows up, but you only click on “Create TLSIN” in the main part. A window opens where you can define groups. Usually, a good start is to define every protein chain in your structure as one rigid body. Close the window and run the TLS-REFMAC job!

Your output should be an .mtz map and a new, refined model as .pdb. Load them into COOT and try to optimize your structure further. You can repeat this several times, until all side chains have been mutated to the correct residue and no new terminal residues can be found. Also check with density fit and the other validation tools in COOT. Then you should include the waters and, finally, you could try TLS refinement! Please note down your R/Rfree values per refinement cycle (ideally in a graph). They should decrease steadily!
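To see what the R and Rfree values measure, the following Python sketch computes the conventional crystallographic R-factor, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, separately for the working reflections and for the reflections flagged as “free”. It assumes that observed and calculated amplitudes are already on the same scale; REFMAC determines this scaling itself, so the numbers from the sketch are for illustration only. Function names and the toy amplitudes are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor for paired amplitude lists on a common scale."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("no observed amplitudes")
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free flag and return (R_work, R_free)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Toy amplitudes (hypothetical); the last reflection belongs to the free set.
toy_f_obs = [100.0, 50.0, 80.0, 20.0]
toy_f_calc = [90.0, 55.0, 80.0, 25.0]
toy_free = [False, False, False, True]
r_work, r_free = r_work_and_free(toy_f_obs, toy_f_calc, toy_free)
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the free reflections are never used in refinement, Rfree indicates whether an improvement in R reflects a better model rather than over-fitting.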


The meaning of resolution

Resolution is the most frequently used quality indicator for X-ray data. It determines how detailed your electron density will be. Try this: Load the map lyso_30.mtz and your model in COOT. What features are missing compared to the map against which you have refined so far?

B factors: Modelling the atomic displacement

We average our atom positions over time and space, as a crystal consisting of many individual protein molecules is measured for at least some minutes. Therefore, an atom’s position may be spread out, and in consequence also its electron density. To include this displacement in our model, we use the so-called B factors, which model the displacement of a single atom as a Gaussian function. Displacement can be modeled isotropically, meaning equal in all directions, or anisotropically, with different displacement widths in three directions. REFMAC also offers the possibility to model isotropic displacement in combination with anisotropic movement of the whole protein in the crystal lattice; this is called TLS refinement (TLS: tensors for translational, librational and screw motions are applied).
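The link between B factors and resolution can be made quantitative. For an isotropic B factor, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean square displacement of the atom, and the atom’s contribution to a reflection at resolution d is damped by exp(−B/(4d²)), which is the Debye-Waller factor exp(−B sin²θ/λ²) rewritten with Bragg’s law. The Python sketch below evaluates both relations; the function names and example values are hypothetical.

```python
import math


def mean_square_displacement(b_factor):
    """Mean square displacement <u^2> in Å^2 for an isotropic B factor in Å^2."""
    return b_factor / (8.0 * math.pi ** 2)


def debye_waller_factor(b_factor, d_spacing):
    """Damping of an atom's scattering amplitude at resolution d (in Å)."""
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


# Example values (hypothetical): a well-ordered and a mobile atom.
for b in (10.0, 40.0):
    rms = math.sqrt(mean_square_displacement(b))
    print(f"B = {b:4.1f} Å^2: rms displacement {rms:.2f} Å, "
          f"damping at 3.0 Å {debye_waller_factor(b, 3.0):.2f}, "
          f"at 1.5 Å {debye_waller_factor(b, 1.5):.2f}")
```

The damping grows rapidly towards high resolution, which is why atoms with large B factors contribute little to the high-resolution reflections.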


PART 4: VALIDATION AND REPORTING THE STRUCTURE

Display the Ramachandran plot of your structure. How many outliers are there, and why does the backbone have this uncommon conformation at that position? Now you should upload your REFMAC output .pdb file to the MolProbity server: http://molprobity.biochem.duke.edu

Add hydrogens:

What is flipping? Explain to the others!


Look at the file. Can you see some hydrogens? Can you see hydrogen bonds? Then do “Analyze all-atom contacts and geometry“ on the MolProbity server.

Here is some explanation of what the result page means:

Clashscore, all atoms Clashscore is the number of steric overlaps bigger than 0.4 Å per 1000 atoms. The “percentile” refers to how many structures of comparable resolution in the PDB would be below that level.

Poor rotamers Percent of rotamers which are very rarely found in proteins.

Ramachandran outliers Percent of residues with improbable backbone geometry.

Ramachandran favored Percent of residues with probable backbone geometry.

Cβ deviations Percent of residues where Cβ is not in the expected position.

MolProbity score A mixed criterion consisting of Ramachandran plot, clashes and rotamers. The “percentile” refers to how many structures of comparable resolution in the PDB would be below that level.

Residues with bad bonds Percent of residues with strange bonds.

Residues with bad angles Percent of residues with strange angles.
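Most of the numbers on the result page are simple ratios. As an illustration of the clashscore definition given above, the Python sketch below counts the steric overlaps larger than 0.4 Å and normalises the count to 1000 atoms; the percentage criteria are computed in the same way from residue counts. This is not MolProbity’s own code, and the function names and toy numbers are hypothetical.

```python
def clashscore(overlaps, n_atoms, threshold=0.4):
    """Number of overlaps larger than `threshold` (Å) per 1000 atoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    n_clashes = sum(1 for overlap in overlaps if overlap > threshold)
    return 1000.0 * n_clashes / n_atoms


def percent(count, total):
    """Percentage of residues meeting a criterion, e.g. Ramachandran outliers."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Toy numbers (hypothetical): overlaps in Å for a model with 2000 atoms.
toy_overlaps = [0.10, 0.45, 0.52, 0.30, 0.81]
print(f"Clashscore: {clashscore(toy_overlaps, 2000):.1f}")
print(f"Ramachandran outliers: {percent(2, 129):.1f} %")
```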

Now change to “Multi criterion chart”. Here you can see which residues are causing bad scores. You may want to revise them, refine again and submit the result to the MolProbity server once more!

Search the PDB for a structure which seems fishy to you and check it with the MolProbity server. Tell the other students about interesting observations. Also: Often, ligands or molecules from the crystallization experiments have a wrong structure after refinement. How does that happen and why? Can you find an example?