Appendix II: Methods of Structure Determination

TABLE OF CONTENTS

A.1 X-RAY CRYSTALLOGRAPHY OF BIOLOGICAL MACROMOLECULES

Crystallization of proteins

Data collection

Data reduction and processing

Phase determination

Model building and refinement

Future prospects

A.2 ELECTRON MICROSCOPY

Cryo-electron microscopy and single-particle analysis

Electron crystallography

STEM and AFM microscopy

Electron tomography

Hybrid approaches and computational refinement

A.3 NMR SPECTROSCOPY

Biomolecular solution NMR – structure

Biomolecular solution NMR – dynamics

Biomolecular solid-state NMR – structure

Biomolecular solid-state NMR – dynamics

A.4 COMPUTATIONAL APPROACHES

Protein fold classification

Comparative protein structure modeling

Protein function prediction

Evolution and assembly

Protein ligand docking

Molecular dynamics simulations

A.1 X-RAY CRYSTALLOGRAPHY OF BIOLOGICAL MACROMOLECULES

Contributed by Bauke Dijkstra, University of Groningen, the Netherlands

General Reading

Rupp B (2010) Biomolecular Crystallography. Principles, Practice, and Application to Structural Biology. Garland Science.

Drenth J (2007) Principles of Protein X-ray Crystallography, 3rd ed. Springer Verlag.

Blow D (2002) Outline of Crystallography for Biologists. Oxford University Press.

https://en.wikipedia.org/wiki/X-ray_crystallography

Since the beginning of the twentieth century X-ray crystallography has emerged as a tremendously powerful method to determine three-dimensional structures of small and large molecules at atomic resolution. The method depends on the diffraction of X-rays by crystals. X-rays are scattered by the electrons in the crystal and the interference pattern of the scattered X-rays (or ‘diffraction pattern’) can be used to establish the three-dimensional structure of the molecules in the crystal. The method requires crystals, which, unfortunately, are not always easy to obtain, and for which considerable amounts of protein may be required. Usually, proteins are produced as recombinant material from bacterial, human, or other mammalian cells. This can be rather labor-intensive, but as shown by recent successes it is an effective approach, even for difficult membrane proteins. Purity is important: it is generally understood that the purer the protein, the better the chance of getting good crystals. This requirement concerns not only the absence of other, contaminating proteins, but also conformational purity, that is, the protein should be homogeneous and monodisperse; mixtures of monomers and higher oligomers or nonspecific aggregates should be avoided. A dynamic light scattering analysis usually serves as an effective analytical tool to evaluate the dispersity of the protein solution.

Crystallization of proteins

Obtaining diffraction-quality crystals of biological macromolecules has long been a major stumbling block, but thanks to the development of effective crystallization screens, the automation provided by crystallization robots, and recombinant DNA techniques that allow the production of proteins in amounts that could not be obtained from any natural source, structural biology has advanced dramatically. The goal of crystallization experiments is to obtain single crystals of sufficient size that diffract to high resolution. Since it is not possible to predict the conditions under which a protein will form suitable crystals, the approach is to start with a broad screen of conditions that have been successful for other proteins. The most important variable is the precipitant, which can be an organic solvent, a concentrated salt solution (for example (NH₄)₂SO₄), or a polyethylene-glycol variety. Precipitant concentration, pH, and additives such as metal ions and detergents are additional parameters to be varied. Many commercial crystallization kits are available, which, together with robotic dispensing and crystallization systems, have made screening for suitable crystallization conditions much less cumbersome than it used to be. After initial crystallization hits have been obtained, the conditions need to be optimized. Optimization involves probing small changes in precipitant concentration, pH, and type and concentration of additives. For crystallization of membrane proteins, the lipidic cubic phase and other related approaches have been developed, which have had dramatic successes for the crystallization of G-protein-coupled receptors. Some general reviews on protein crystallization and crystallographic structure determination recommended to the interested reader are listed below. The basic methodology and practice of X-ray crystallographic structure determination are covered in the textbooks mentioned at the beginning of this section.

Garman EF (2014) Developments in X-ray crystallographic structure determination of biological macromolecules. Science 343:102–108 (doi: 10.1126/science.1247829).

Giegé RA (2013) A historical perspective on protein crystallization from 1840 to the present day. FEBS J 280:6456–6497 (doi: 10.1111/febs.12580).

Loll PJ (2014) Membrane proteins, detergents and crystals: what is the state of the art? Acta Crystallogr F70:1576–1583 (doi: 10.1107/S2053230X14025035).

McPherson A & Gavira JA (2014) Introduction to protein crystallization. Acta Crystallogr F70:2–20 (doi: 10.1107/S2053230X13033141).

McPherson A & Cudney B (2014) Optimization of crystallization conditions for biological macromolecules. Acta Crystallogr F70:1445–1467 (doi: 10.1107/S2053230X14019670).

Russo Krauss I, Merlino A, Vergara A & Sica F (2013) An overview of biological macromolecule crystallization. Int J Mol Sci 14:11643–11691 (doi: 10.3390/ijms140611643).

Data collection

After suitable crystals have been obtained, the X-ray diffraction pattern needs to be measured. Although laboratory X-ray sources can be used for routine projects with well-diffracting crystals, the more challenging projects are invariably brought to synchrotrons, which because of the much higher intensity of their X-ray beams can cope with much smaller and weakly diffracting crystals. Indeed, the construction of highly brilliant synchrotrons over the last two decades has enabled routine data collection on very small crystals of less than 0.1 mm in size. The development of 1 µm X-ray beam lines at synchrotrons has now even allowed crystals to be scanned for the best spot for data collection. Technical developments continue, and several synchrotrons now aim to boost their brilliance by reducing the horizontal emittance of the electron beam in the synchrotron accelerator by a factor of 30–50. Reducing emittance is an effective way of concentrating many more photons into the emitted X-ray beams. With proper adaptation of the beam line optics, an estimated increase in brilliance of up to a factor of one million may be obtained.

Routine data collection at synchrotrons has been automated to a high degree. The crystal is rotated about an axis in small angular increments while being exposed to the X-rays, and during each step a diffraction pattern is recorded. Immediately after the first pattern has been collected it is visually inspected for the presence of well-resolved, non-overlapping single spots extending to high resolution, to decide whether further data collection is worthwhile. Resolution is an important criterion, as is mosaicity. Mosaicity results from the fact that a crystal contains a large number of individual growth domains that are slightly misaligned with respect to each other. Mosaicity causes reflections to spread out over multiple diffraction images, and may cause overlap of reflections, which is detrimental to obtaining reliable reflection intensities. Thus, the best crystals are those that diffract to high resolution and have low mosaicity. Depending on the internal symmetry, crystals may need to be rotated over a larger or smaller total range to obtain a complete diffraction data set.

Eriksson M, van der Veen F & Quitmann C (2014) Diffraction-limited storage rings – a window to the science of tomorrow. J Synchrotron Rad 21:837–842 (doi: 10.1107/S1600577514019286).

Data reduction and processing

Once the diffraction images have been collected, the raw pixel intensities in the individual images are integrated, and the intensities from successive images are combined to give a full set of reflection intensities. After applying various corrections, the reflection intensities are converted into structure factor amplitudes, which are proportional to the square root of the measured intensity data. The resulting list of indices with properly scaled structure factor amplitudes and their estimated standard errors is used during the stages of structure determination and refinement. Several programs are available to accomplish the data reduction and scaling tasks, such as XDS, HKL-3000, and iMosflm.
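
The conversion from intensities to amplitudes described above can be sketched in a few lines. This is only a minimal illustration, assuming already scaled intensities and simple first-order error propagation; the function name and the example reflections are hypothetical, and it is not how XDS, HKL-3000, or iMosflm implement the step (production software, for instance, treats weak and negative intensities statistically rather than discarding them).

```python
import math

def intensity_to_amplitude(intensity, sigma_i):
    """Convert a scaled intensity and its standard error to (|F|, sigma_F).

    Assumption: |F| = sqrt(I), with sigma_F from first-order error
    propagation, dF/dI = 1 / (2 * sqrt(I)).
    """
    if intensity <= 0.0:
        # Zero or negative intensities occur for weak reflections after
        # background subtraction; this sketch only flags them.
        return 0.0, None
    amplitude = math.sqrt(intensity)
    sigma_f = sigma_i / (2.0 * amplitude)
    return amplitude, sigma_f

# Hypothetical reflections: (Miller indices, intensity, sigma of intensity)
reflections = [((1, 0, 0), 400.0, 20.0), ((0, 2, 1), 90.25, 9.5)]
for hkl, i_obs, sig in reflections:
    amplitude, sigma_f = intensity_to_amplitude(i_obs, sig)
    print(hkl, amplitude, sigma_f)
```

Note that the relative error of the amplitude is half the relative error of the intensity, which follows directly from the square-root relationship.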

It is good practice to keep the set of integrated but unmerged reflections for future reference in case of doubt about the crystal symmetry or space group. Redundant data can always be merged again, but missing data cannot be reconstituted. For further information see the textbook by Bernhard Rupp, cited at the beginning of this section.

Kabsch W (2010) XDS. Acta Crystallogr D66:125–132 (doi: 10.1107/s0907444909047337).

Minor W, Cymborowski M, Otwinowski Z & Chruszcz M (2006) HKL-3000: the integration of data reduction and structure solution – from diffraction images to an initial model in minutes. Acta Crystallogr D62:859–866 (doi: 10.1107/S0907444906019949).

Powell HR, Johnson O & Leslie AG (2013) Autoindexing diffraction images with iMosflm. Acta Crystallogr D69:1195–1203 (doi: 10.1107/S0907444912048524).

Phase determination

In the next step of structure determination an electron density map is calculated. This is done by calculating an inverse Fourier transform from the set of structure factor amplitudes and the accompanying phases. However, the diffraction data give us only the amplitudes, and the phases need to be determined in other ways. The first method to obtain phases was the isomorphous replacement method. A compound with a heavy atom (for example Au, Hg, or Pt) is diffused into the protein crystals, and when it binds to the protein at a few well-defined positions, it modifies the diffraction pattern. From the differences with the diffraction pattern of the native protein crystal, the positions of the heavy atoms can be derived (by so-called ‘Patterson’ methods). These heavy atom positions can, in turn, be used to calculate two possibilities for the phase of a given reflection. With two different heavy atom compounds binding at different positions a unique (but not error-free) solution for the phase can be obtained. The isomorphous replacement method was later extended with inclusion of the anomalous signal of the heavy atom scatterers. With the advent of recombinant expression of proteins in bacteria it became possible to incorporate heavy atoms directly into the protein, with the incorporation of selenomethionine residues as the prime example. Selenium provides a very useful anomalous signal that can be used for phasing. Progress in phasing software and improved data collection strategies at synchrotrons have made experimental phase determination successful with very weak anomalous signals. Even sulfur atoms can now provide a usable anomalous signal. These methods are complemented by density modification techniques (solvent flattening) to resolve phase ambiguities. The isomorphous replacement method has been implemented in software suites like Phenix and CCP4.
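
The twofold phase ambiguity of a single heavy atom derivative, and its resolution by a second derivative, can be illustrated with a short sketch. It assumes error-free amplitudes and the vector relation F_PH = F_P + F_H, from which the law of cosines gives two candidate protein phases per derivative. All numerical values and function names below are hypothetical; real phasing programs work with probability distributions over the phase rather than exact intersections.

```python
import cmath
import math

def sir_phase_candidates(f_p, f_ph, f_h_amp, alpha_h):
    """Two protein phases (radians) consistent with one derivative.

    From |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|cos(alpha_P - alpha_H).
    """
    cos_delta = (f_ph**2 - f_p**2 - f_h_amp**2) / (2.0 * f_p * f_h_amp)
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding
    delta = math.acos(cos_delta)
    return alpha_h + delta, alpha_h - delta

def angular_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs(cmath.phase(cmath.rect(1.0, a - b)))

def resolve_phase(candidates_1, candidates_2):
    """Pick the candidate from derivative 1 that best matches derivative 2."""
    best = min((angular_diff(a, b), a)
               for a in candidates_1 for b in candidates_2)
    return best[1]

# Hypothetical reflection: native structure factor and two heavy atom terms.
true_fp = cmath.rect(100.0, math.radians(40.0))
fh1 = cmath.rect(20.0, math.radians(110.0))
fh2 = cmath.rect(15.0, math.radians(250.0))

cands1 = sir_phase_candidates(abs(true_fp), abs(true_fp + fh1),
                              abs(fh1), cmath.phase(fh1))
cands2 = sir_phase_candidates(abs(true_fp), abs(true_fp + fh2),
                              abs(fh2), cmath.phase(fh2))
phase = resolve_phase(cands1, cands2)
print("candidates 1:", [round(math.degrees(c) % 360.0, 1) for c in cands1])
print("candidates 2:", [round(math.degrees(c) % 360.0, 1) for c in cands2])
print("resolved phase:", round(math.degrees(phase) % 360.0, 1))
```

With measurement errors the two pairs of candidates no longer coincide exactly, which is why the resulting phase is unique but not error-free.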

A completely different method to obtain phase information is via molecular replacement. In molecular replacement Patterson methods are used to find the orientation and position in the crystal unit cell of a related protein whose structure is already known. The method has now been refined to the point that it can succeed even with very distant models or even secondary structure elements. Phaser and Molrep are two widely used programs for molecular replacement, both of which are available in the CCP4 suite of programs for macromolecular structure determination by X-ray crystallography.

Adams PD, Afonine PV, Bunkoczi G et al (2011) The Phenix software for automated determination of macromolecular structures. Methods 55:94–106 (doi: 10.1016/j.ymeth.2011.07.005).

McCoy AJ, Grosse-Kunstleve RW, Adams PD et al (2007) Phaser crystallographic software. J Appl Crystallogr 40:658–674 (doi: 10.1107/s0021889807021206).

McCoy AJ, Nicholls RA & Schneider TR (2013) SCEDS: protein fragments for molecular replacement in Phaser. Acta Crystallogr D69:2216–2225 (doi: 10.1107/s0907444913021811).

Vagin A & Teplyakov A (2010) Molecular replacement with MOLREP. Acta Crystallogr D66:22–25 (doi: 10.1107/s0907444909042589).

Winn MD, Ballard CC, Cowtan KD et al (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr D67:235–242 (doi: 10.1107/s0907444910045749).

Model building and refinement

Once an electron density map has been calculated, robust software is available for automatically building atomic models into the electron density. If this is not possible, programs such as Coot allow one to do manual model building by fitting amino acid residues into the electron density. This is a rewarding part of protein structure determination: the investigator who builds the structure is the first one to see a new structure emerging! However, model building is generally far from perfect, and the resulting model may contain errors in geometry (unrealistic backbone torsion angles, bond distances, and bond angles; too close contacts), and may suffer from bad agreement with the observed structure factor amplitudes. Therefore, the model must be refined, that is, the geometry must be improved and the model must be better matched to the observed structure factor amplitudes. This will have the effect of improving the phases, which in turn results in clearer maps, allowing improvement of the model.

Refinement is done through adjustment of the atomic coordinates (and in later stages also the individual B-factors) to fit the diffraction data better. To monitor the progress of refinement, the ‘R-factor’ is used, which is a measure of the discrepancy between the structure factor amplitudes calculated from the model and the observed ones. Thus, the lower the R-factor, the better. However, the R-factor can be made artificially low by including many more variables in the model to be refined. To counter this, a small fraction of the structure factors (~5%) is set aside and never used in the refinement, but only to calculate an independent R-factor, the free R-factor, for cross-validation purposes. The free R-factor gives an independent measure of the refinement progress. The drop of the R-factors during the refinement is more important than the actual value, but generally final values lie between 15% and 25%, with the free R-factor being slightly higher than the R-factor. In the final refinement stages, TLS refinement (for translation, libration, and screw-rotation of a group of atoms) can give a good approximation of anisotropy present in the protein molecule or a domain, adding only a few parameters to the model.
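
As a rough illustration of the bookkeeping involved, the sketch below computes the conventional R-factor, R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|, separately for a working set and for a randomly selected free set of about 5% of the reflections. The amplitudes are invented for the example, the function names are hypothetical, and no scaling between observed and calculated amplitudes is applied, which real refinement programs do.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)

def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free

# Hypothetical amplitudes, invented for illustration only.
f_obs = [120.0, 85.0, 40.0, 210.0, 66.0, 150.0, 33.0, 98.0, 175.0, 52.0] * 4
f_calc = [f * (1.0 + 0.05 * ((i % 5) - 2)) for i, f in enumerate(f_obs)]

work, free = split_work_free(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f} ({len(free)} free reflections)")
```

The essential point is that the free set must be chosen once, before refinement starts, and then kept out of every refinement cycle. In this toy example no refinement takes place, so the two values differ only by chance.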

Refinement is alternated with visual inspection of the new model and the new electron density map, allowing the correction of errors and the inclusion of not yet modeled parts of the structure and solvent molecules, after which a new round of refinement is started. Refinement and model building thus form an iterative process that continues until no more improvement can be obtained.

Popular refinement programs are Refmac5 and phenix.refine. They offer maximum-likelihood targets, and can also use experimental phase information and neutron diffraction data as additional observations. Restraints are used to prevent the model from attaining unrealistic geometries.

Adams PD, Afonine PV, Bunkoczi G et al (2011) The Phenix software for automated determination of macromolecular structures. Methods 55:94–106 (doi: 10.1016/j.ymeth.2011.07.005).

Afonine PV, Mustyakimov M, Grosse-Kunstleve RW et al (2010) Joint X-ray and neutron refinement with phenix.refine. Acta Crystallogr D66:1153–1163 (doi: 10.1107/S0907444910026582).

Murshudov GN, Skubák P, Lebedev AA et al (2011) REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr D67:355–367 (doi: 10.1107/s0907444911001314).

Future prospects

A promising new direction is the development of X-ray free electron lasers (XFELs), which are the brightest X-ray sources currently available, with peak brilliance a billion times higher than current synchrotron radiation. XFELs produce extremely short pulses of X-rays (of typically 40–100 fs) containing 10¹² X-ray photons/pulse that can be focused to a submicrometer focal spot. They should allow acquisition of X-ray diffraction patterns from <10 µm-sized crystals, which are too small for data collection on present-day synchrotrons. However, since the intensity of the XFEL is so high, the sample is destroyed after one shot, requiring the use of large sample volumes and thousands of microcrystals to obtain a full diffraction data set. The development of such ‘serial crystallography’ at XFELs has stimulated efforts to apply similar approaches at synchrotrons.

At synchrotrons, time-resolved studies are limited by the electron bunch duration of about 100 ps. Since XFEL pulses have a much shorter duration, this opens up possibilities to study events that are in the picosecond realm. The report by Barends et al. cited below illustrates the power of XFELs to probe reactions, using the dissociation of carbon monoxide (CO) from the heme Fe²⁺ ion in myoglobin as an example. The authors show that, within 500 fs after photolysis of the Fe-CO bond, collective motions occur that result from the coupling of vibrational modes of the heme group upon CO dissociation to global motions of the protein.

Nevertheless, many technical difficulties have yet to be overcome before XFEL data collection will become mainstream in the field of structural biology.

Barends TR, Foucar L, Ardevol A et al (2015) Direct observation of ultrafast collective motions in CO myoglobin upon ligand dissociation. Science 350:445–450 (doi: 10.1126/science.aac5492).

Feld GK & Frank M (2014) Enabling membrane protein structure and dynamics with X-ray free electron lasers. Curr Opin Struct Biol 27:69–78 (doi: 10.1016/j.sbi.2014.05.002).

Schlichting I & Miao J (2012) Emerging opportunities in structural biology with X-ray free-electron lasers. Curr Opin Struct Biol 22:613–626 (doi: 10.1016/j.sbi.2012.07.015).

A.2 ELECTRON MICROSCOPY

General Reading

Glaeser RM, Downing K, DeRosier D et al (2007) Electron Crystallography of Biological Macromolecules. Oxford University Press.

Spence JCH (2003) High Resolution Electron Microscopy, 3rd ed. Oxford Science Publications.

Cryo-electron microscopy and single-particle analysis

Thanks to recent experimental developments, cryo-electron microscopy (cryo-EM) is emerging as a powerful and widely applicable approach, and in many cases the method of choice, for determining the structures of macromolecular assemblies. The widely used term, single-particle analysis (SPA), is in fact a misnomer, as it refers to analysis of sets containing many images of a given kind of particle. The particle is single only in the sense that it is free-standing, that is, non-crystalline. In the 1980s, the key discovery was made that macromolecules can be preserved in a near-native state by rapid freezing: this vitrifies a thin film of buffer containing the particles, which remain hydrated. To image them with minimal radiation damage, a ‘low dose’ technique is employed. As transmission electron micrographs are two-dimensional projections of three-dimensional objects, the viewing geometry of each particle must be determined. Combining these data leads to a three-dimensional reconstruction of the object of interest. An alternative approach, usually limited to somewhat lower resolution, is to perform individual three-dimensional reconstructions by tomography (see below) and average these maps. The achievable resolution depends to a considerable extent on the particle’s properties. However, as the result of progressive improvements in the reconstruction software, the availability of microscopes with field emission gun illumination, the development of automated procedures that allow the collection of very large data sets (up to 10⁶ particle images or even more), and the recent introduction of efficient ‘direct detection’ digital cameras, the resolutions achieved are, in the best cases, comparable to what can be attained in X-ray crystallography.
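
Why very large sets of low-dose images are needed can be made concrete with a toy calculation. The sketch below assumes the images are already perfectly aligned copies of the same view, corrupted by independent Gaussian noise, and shows that the error of the average falls roughly as one over the square root of the number of images. It is a one-dimensional illustration with hypothetical function names, not a reconstruction algorithm; orientation determination, the hard part of SPA, is left out entirely.

```python
import math
import random

def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to every pixel."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def average_images(images):
    """Pixel-wise mean of equally sized, already aligned images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def rms_error(estimate, truth):
    """Root-mean-square deviation between an estimate and the true signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

def averaging_error(n_images, sigma=1.0, seed=0):
    """RMS error of the average of n_images noisy copies of a test signal."""
    rng = random.Random(seed)
    truth = [math.sin(2.0 * math.pi * i / 64) for i in range(64)]
    images = [noisy_copy(truth, sigma, rng) for _ in range(n_images)]
    return rms_error(average_images(images), truth)

for n in (1, 4, 100, 400):
    print(n, round(averaging_error(n), 3))
```

A hundredfold increase in the number of images therefore buys only about a tenfold reduction in noise, which is one reason automated data collection matters so much.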

Cong Y & Ludtke SJ (2010) Single particle analysis at high resolution. Methods Enzymol 482:211–235 (doi: 10.1016/S0076-6879(10)82009-9).

Many of the advances in SPA were first achieved in studies of icosahedrally symmetric particles, mainly virus capsids, in part because their symmetry allows a high degree of internal averaging. This review focuses on additional complications that arise in SPA of particles with lower symmetries.

Egelman EH (2006) The iterative helical real space reconstruction method: surmounting the problems posed by real polymers. J Struct Biol 157:83–94.

Helical filaments constitute an important class of specimens studied by three-dimensional reconstruction of EM images since its earliest days. Originally, these analyses were performed by Fourier-Bessel techniques applied to transforms of whole filaments, whereby the structural information is concentrated on ‘layer-lines’ in the Fourier plane. More recently it has been found advantageous to dissect filament images into short segments and analyze them by single-particle methods, as described in this review.

Briggs JA (2013) Structural biology in situ – the potential of subtomogram averaging. Curr Opin Struct Biol 23:261–267 (doi: 10.1016/j.sbi.2013.02.003).

Reviews principles and practice of subtomogram averaging as a form of SPA.

Liao HY & Frank J (2010) Definition and estimation of resolution in single-particle reconstructions. Structure 18:768–775 (doi: 10.1016/j.str.2010.05.008).

It has been difficult to achieve consensus on the most appropriate resolution criterion to use in assessing SPA structures. The issue is discussed in this paper.

Heymann JB, Cheng N, Newcomb WW et al (2003) Dynamics of herpes simplex virus capsid maturation visualized by time-lapse cryo-electron microscopy. Nat Struct Biol 10:334–341.

Classification techniques can be used to distinguish images in a mixed population in which the particles may vary according to conformation or composition, and a three-dimensional density map calculated for each class. In this study, this approach was used to visualize maturation of the herpes simplex virus (HSV) capsid as a dynamic process involving major conformational changes.

Cheng Y, Grigorieff N, Penczek PA & Walz T (2015) A primer to single-particle cryo-electron microscopy. Cell 161:438–449 (doi: 10.1016/j.cell.2015.03.050).

Reviews single-particle analysis by cryo-EM in the post-Direct Detector era.

Electron crystallography

Electron crystallographic specimens are two-dimensional arrays, typically only one molecule thick. Electron diffraction is used to collect Fourier amplitudes (structure factors) while phases are determined by calculating Fourier transforms of images. The data are built up in three dimensions by repeating these operations after tilting the specimen incrementally through multiple angles. The resulting three-dimensional Fourier transform is then inverted to yield a three-dimensional density map in real space. The resolution is necessarily anisotropic, being considerably lower in the dimension perpendicular to the plane of the array, because only a limited range of tilt angles can be covered and data quality tends to deteriorate at higher tilts. This approach is particularly useful for membrane proteins that can be visualized in a quasi-native environment with a continuous lipid phase. In X-ray crystallographic studies, membrane proteins have to be solubilized by detergents, potentially perturbing their native structure.

Stahlberg H, Biyani N & Engel A (2015) 3D reconstruction of two-dimensional crystals. Arch Biochem Biophys 581:68–77 (doi: 10.1016/j.abb.2015.06.006).

Recent review surveying the state of the art in electron crystallography.

Jap BK, Zulauf M, Scheybani T et al (1992) 2D crystallization: from art to science. Ultramicroscopy 46:45–84.

Gives an overview of the principles of electron crystallography and goes on to discuss theoretical aspects of two-dimensional crystallization and methodologies used to obtain two-dimensional crystals.

Schmidt-Krey I (2007) Electron crystallography of membrane proteins: two-dimensional crystallization and screening by electron microscopy. Methods 41:417–426.

An updated account of two-dimensional crystallization procedures with an emphasis on membrane proteins.

Kühlbrandt W (2013) Introduction to electron crystallography. Methods Mol Biol 955:1–16 (doi: 10.1007/978-1-62703-176-9_1).

Historical account of the development of electron crystallography, describing key innovations that have advanced the technique to its current capability.

STEM and AFM microscopy

In scanning transmission electron microscopy (STEM), the electron beam is focused to a fine probe only a few Ångstrom units across, and this probe is raster-scanned across the specimen. At each point sampled, the detector(s) records the electrons scattered. If an annular detector is located downstream from the specimen so that non-scattered electrons are not counted, it gives a high contrast ‘dark-field’ image that is relatively straightforward to interpret. With thin specimens, the signal recorded is directly proportional to the projected density at that point. As such, these images can be used to measure the masses of individual macromolecular particles, the mass-per-unit-length of filaments, and the mass per area of sheetlike specimens. The specimens should be well washed and not stained with heavy metals, and care should be taken to minimize the electron dose so as to avoid radiation-induced mass loss. These data can be used to determine the stoichiometry of subunits in macromolecular assemblies.
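
The step from a measured mass to a subunit stoichiometry is simple arithmetic, sketched below. All masses are hypothetical values chosen for illustration, and the function names are not taken from any STEM analysis package; in practice the measured mass carries an experimental uncertainty that determines how confidently the nearest integer can be assigned.

```python
def subunits_per_particle(particle_mass_kda, subunit_mass_kda):
    """Copy number implied by a particle mass, before rounding to an integer."""
    return particle_mass_kda / subunit_mass_kda

def subunits_per_nm(mass_per_length_kda_per_nm, subunit_mass_kda):
    """Subunits per nanometre of filament implied by a mass-per-length value."""
    return mass_per_length_kda_per_nm / subunit_mass_kda

# Hypothetical complex: measured mass 612 kDa, subunit mass 50 kDa.
copies = subunits_per_particle(612.0, 50.0)
print(f"estimated copies: {copies:.2f} -> nearest integer {round(copies)}")

# Hypothetical filament: 20 kDa/nm built from a 10 kDa subunit.
print(f"subunits per nm: {subunits_per_nm(20.0, 10.0):.2f}")
```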

Thomas D, Schultz P, Steven AC & Wall JS (1994) Mass analysis of biological macromolecular complexes by STEM. Biol Cell 80:181–192.

Early account of the principles and practice of STEM-based mass measurements.

Goldsbury C, Baxa U, Simon MN et al (2011) Amyloid structure and assembly: insights from scanning transmission electron microscopy. J Struct Biol 173:1–13 (doi: 10.1016/j.jsb.2010.09.018).

Reviews STEM imaging and mass-per-length measurements for various kinds of amyloid fibrils.

Müller SA, Müller DJ & Engel A (2011) Assessing the structure and function of single biomolecules with scanning transmission electron and atomic force microscopes. Micron 42:186–195 (doi: 10.1016/j.micron.2010.10.002).

Compares STEM with another modality that also collects raster-scanned images but operates on a very different principle. The atomic force microscope (AFM) scans a very sharp tip over a sample, measuring at each point its height above a flat substrate. Thus, a topographic image is recorded. High resolution (to within a few Ångstrom units) is achieved in the vertical dimension, but only lower resolution in-plane because of the finite size of the tip. The technique can be used for direct observation of function-related structural changes induced by changes in environmental parameters (temperature, pH, etc.) and repeated measurements may be made of the same specimen.

Electron tomography

Electron tomography is the only imaging technique to give three-dimensional density maps of pleomorphic specimens, from macromolecular complexes to cells. Projection images are recorded for a tilt series and used to calculate the tomogram. In cryo-electron tomography, the individual projections must be recorded at exceptionally low doses. Nevertheless, computational procedures have been developed to align these very noisy images, allowing a three-dimensional map, the tomogram, to be calculated without loss of information. As in electron crystallography, resolution is anisotropic on account of the incomplete angular range that can be covered (this is called the ‘missing wedge effect’ with reference to the Fourier transform). Resolution in a single cryo-tomogram is limited by noise to 4–5 nm in-plane but can be extended by combining tomograms that image copies of the complex of interest in differing orientations, and by sub-tomogram averaging (see below).
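
The size of the missing wedge follows directly from the tilt range. For single-axis tilting over ±θ, each projection contributes a central plane in Fourier space, so the sampled region is a double wedge spanning 2θ out of 180°. The sketch below computes the unsampled fraction under that idealized geometry; the function name is hypothetical, and effects such as increasing specimen thickness at high tilt are ignored.

```python
def missing_wedge_fraction(max_tilt_deg):
    """Fraction of Fourier space left unsampled for a single-axis tilt series
    covering -max_tilt_deg to +max_tilt_deg (idealized geometry)."""
    if not 0.0 < max_tilt_deg <= 90.0:
        raise ValueError("max_tilt_deg must be in the range (0, 90]")
    return (90.0 - max_tilt_deg) / 90.0

for tilt in (45.0, 60.0, 70.0, 90.0):
    print(f"+/-{tilt:.0f} deg: {missing_wedge_fraction(tilt):.1%} missing")
```

A tilt range of ±60°, for example, leaves one third of Fourier space unmeasured, which is why combining copies of a complex imaged in different orientations helps to fill in the missing information.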

Frank J (ed) (2006) Electron Tomography Methods for Three-dimensional Visualization of Structures in the Cell, 2nd ed. Springer Verlag.

Multi-author volume giving accounts of many aspects of electron tomography.

Lucic V, Förster F & Baumeister W (2005) Structural studies by electron tomography: from cells to molecules. Annu Rev Biochem 74:833–865.

Reviews principles and practice of cryo-electron tomography which combines the power of three-dimensional imaging with the best structural preservation physically possible.

Lucic V, Rigort A & Baumeister W (2013) Cryo-electron tomography: the challenge of doing structural biology in situ. J Cell Biol 202:407–419.

Describes and illustrates the workflow of cryo-electron tomography for structural studies in situ, that is, in unperturbed cellular environments.

Villa E, Schaffer M, Plitzko JM & Baumeister W (2013) Opening windows into the cell: focused-ion-beam milling for cryo-electron tomography. Curr Opin Struct Biol 23:771–777 (doi: 10.1016/j.sbi.2013.08.006).

In tomography, specimens are limited to a maximum thickness of ~0.5 μm. Accordingly, for most cellular applications, the specimens must be thinned. Two ways of doing this are (1) cryo-sectioning, technically a very demanding approach; and (2) using a focused ion beam to carve out a suitably thin slab of the frozen cellular specimen, as described in this article.

Hybrid approaches and computational refinement

Many situations arise in which a high-resolution crystal structure is available for a component or components of an assembly of interest, as is a lower resolution EM-derived structure of the complete assembly. In these circumstances, the crystal structure(s) may be fitted into the full assembly with much greater precision than the nominal resolution of the latter map would suggest. Detailed density maps obtained in this way are called pseudo-atomic models. There are numerous variations on this theme. One, called ‘flexible fitting,’ involves adapting the high-resolution structures used to describe the (altered) conformations assumed in the assembled state.

Trabuco LG, Schreiner E, Gumbart J et al (2011) Applications of the molecular dynamics flexible fitting method. J Struct Biol 173:420–427 (doi: 10.1016/j.jsb.2010.09.024).

This widely used approach uses molecular dynamics simulations to optimize the fit of high-resolution structures from X-ray crystallography with cryo-EM density maps, usually at significantly lower resolution. In this way, the information to be extracted from the EM data is leveraged.

Wriggers W (2010) Using Situs for the integration of multi-resolution structures. Biophys Rev 2:21–27.

This review describes a software package for the integration of biophysical data across the spatial resolution scales. Among other options, models can be produced with various flexible and rigid body docking strategies.

Yang Z, Lasker K, Schneidman-Duhovny D et al (2012) UCSF Chimera, MODELLER, and IMP: an integrated modeling system. J Struct Biol 179:269–278 (doi:10.1016/j.jsb.2011.09.006).

Describes a suite of programs for molecular modeling complemented with an interactive visualization system.

A.3 NMR SPECTROSCOPY

Contributed by Angela M Gronenborn, University of Pittsburgh, USA and Tatyana Polenova, University of Delaware, USA

General Reading

Palmer AG III, Fairbrother WJ, Cavanagh J et al (2007) Protein NMR Spectroscopy: Principles and Practice, 2nd ed. Academic Press.

Rule GS & Hitchens TK (2006) Fundamentals of Protein NMR Spectroscopy (Focus on Structural Biology). Springer.

McDermott AE & Polenova T (eds) (2010) Solid-state NMR Studies of Biopolymers. Wiley.

Biomolecular solution NMR—structure

Over the last 40 years, NMR has matured into a robust approach that is routinely employed to study biological systems. Indeed, solution NMR can be employed to determine structures, dynamics, thermodynamics, and kinetics of molecules at atomic resolution. NMR parameters, such as nuclear Overhauser effects, scalar couplings, residual dipolar couplings, T1 and T2 relaxation times, pseudocontact shifts, and paramagnetic relaxation enhancements, all report on molecular features of the system under study. To measure these parameters, a plethora of experiments have been devised that manipulate nuclear spins in a precise and predefined fashion. However, before translating spectroscopic features into molecular characteristics, the nuclear magnetic resonances of a molecule have to be assigned to individual nuclei, an arduous task for large systems. This task is aided by several technological achievements: (i) higher and higher magnetic fields (1.2 GHz spectrometers will become available within the next few years); (ii) cryogenic probes (cooling the transmitter/receiver coils reduces noise); (iii) long-term stability of spectrometer electronics; and (iv) alternate data collection (non-uniform sampling) and processing methods (taking advantage of ever-increasing computer power). In addition to advances on the instrumentation/spectroscopy side, hand-in-hand developments on the sample preparation side have occurred. Uniform and specific labeling schemes have been devised to overcome resonance overlap problems, limitations in magnetization transfer efficiency, and line-broadening resulting from the slow tumbling of large molecules. These include (i) uniform and fractional labeling (with 13C, 15N, 2H), (ii) residue- and (iii) site-specific amino acid labeling (using stop codons and orthogonal tRNA/aminoacyl-tRNA synthetase pairs), (iv) group-specific labeling (methyl- or methene), (v) segmental and subunit-specific labeling, and (vi) covalent attachment of spectroscopic probes (such as paramagnetic tags). All these tailored labeling approaches can be exploited via specific spectroscopic experiments, such as isotope edited/filtered NOESY, TROSY (deuteration), methyl TROSY, and 13C-direct detection experiments (specific 13C labeling schemes and deuteration).
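Spectrometers are conventionally named by their 1H resonance frequency rather than by field strength. The following sketch converts between the two using the proton gyromagnetic ratio (about 42.577 MHz per tesla); the function names are illustrative, and the rounded constant is an assumption adequate only for back-of-the-envelope estimates.

```python
# 1H gyromagnetic ratio divided by 2*pi, in MHz per tesla (rounded value).
GAMMA_1H_MHZ_PER_T = 42.577


def field_for_proton_frequency(freq_mhz: float) -> float:
    """Static field (tesla) at which 1H resonates at freq_mhz."""
    return freq_mhz / GAMMA_1H_MHZ_PER_T


def proton_frequency_for_field(field_t: float) -> float:
    """1H resonance frequency (MHz) in a static field of field_t tesla."""
    return field_t * GAMMA_1H_MHZ_PER_T


if __name__ == "__main__":
    # A "1.2 GHz" spectrometer corresponds to a field of roughly 28 T.
    print(f"1200 MHz -> {field_for_proton_frequency(1200.0):.1f} T")
```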

Li F, Lee JH, Grishaev A et al (2015) High accuracy of Karplus equations for relating three-bond J couplings to protein backbone torsion angles. ChemPhysChem 16:572–578.

Shen Y, Delaglio F, Cornilescu G & Bax A (2009) TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR 44:213–223.

NMR experiments for measuring coupling constants and chemical shifts accurately and with high precision have been progressively improved over the past 30 years. These articles are recent examples of how these parameters can provide precise molecular details. Backbone chemical shifts are used by the program TALOS to empirically predict phi and psi torsion angles using a database approach.
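A Karplus relation expresses a three-bond J coupling as a function of the intervening torsion angle. The sketch below evaluates the usual form, J = A cos²θ + B cos θ + C with θ = φ − 60°, for the HN–Hα coupling. The default coefficients are commonly quoted literature values used here as an assumption; they are not the parameterization reported in the articles above, and the function name is illustrative.

```python
import math


def karplus_hn_ha(phi_deg: float, a: float = 6.51, b: float = -1.76,
                  c: float = 1.60) -> float:
    """Predicted 3J(HN,HA) coupling in Hz for backbone torsion angle phi.

    The coefficients a, b, c (Hz) are assumed defaults, not values taken
    from the cited papers; pass a different set to test another
    parameterization.
    """
    theta = math.radians(phi_deg - 60.0)  # phase offset for the HN-HA pair
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


if __name__ == "__main__":
    # Typical phi values: about -60 deg (helix) and -120 deg (strand).
    for label, phi in (("helix", -60.0), ("strand", -120.0)):
        print(f"{label:6s} phi={phi:7.1f} deg  J={karplus_hn_ha(phi):.2f} Hz")
```

With these assumed coefficients, helical residues give small couplings and extended residues give large ones, which is the qualitative behavior that makes the coupling a useful torsion-angle restraint.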

Tugarinov V, Kanelis V & Kay LE (2006) Isotope labeling strategies for the study of high-molecular-weight proteins by solution NMR spectroscopy. Nat Protoc 1:749–754.

This review covers much of the methodology that has been developed for the production of highly deuterated, uniformly 15N- and 13C-labeled proteins, in particular the incorporation of methyl groups into isoleucine, leucine, and valine residues. Various types of methyl-labeling schemes are described, and applications to large protein systems are highlighted.

Göbl C, Madl T, Simon B & Sattler M (2014) NMR approaches for structural analysis of multidomain proteins and complexes in solution. Prog Nucl Magn Reson Spectrosc 80:26–63.

This review presents solution NMR approaches that are used in current protein structure determination and applications of these to challenging large protein complexes.

Compared to other structural methods, solution NMR is ideally suited to investigate weak and transient molecular complexes, permitting the elucidation of interactions with dissociation constants in the millimolar range.

Clore GM & Iwahara J (2009) Theory, practice, and applications of paramagnetic relaxation enhancement for the characterization of transient low-population states of biological macromolecules and their complexes. Chem Rev 109:4108–4139.

This review describes how intermolecular paramagnetic relaxation enhancement (PRE) data provide valuable information on transient intermediates for exchanging systems, even if the equilibrium population of the intermediate is very low.

Luna RE, Akabayov SR, Ziarek JJ & Wagner G (2014) Examining weak protein–protein interactions in start codon recognition via NMR spectroscopy. FEBS J 281:1965–1973.

The inherent sensitivity of NMR parameters to the chemical environment makes NMR an excellent tool to tackle weak protein–protein interactions. This mini-review describes the use of chemical shift perturbations, PREs, and cross-saturation transfer experiments to elucidate protein–ligand interfaces.

Raman S, Lange OF, Rossi P et al (2010) NMR structure determination for larger proteins using backbone-only data. Science 327:1014–1018.

This paper describes how protein structures can be determined using backbone chemical shifts, amide residual dipolar couplings, and amide proton distances by the Rosetta program. This is part of continuing efforts to combine Rosetta-based modeling with sparse NMR-derived constraints.

Biomolecular solution NMR—dynamics

Biological function is intricately linked not only to structure but also to dynamics, since biological macromolecules are inherently flexible at physiological temperature. Thus, distinct conformations are sampled over wide spatial and temporal scales, sub-nanometers to micrometers and femtoseconds to hours, and this conformational flexibility generally impacts basic aspects of biologically important activities. NMR spectroscopy is uniquely suited to interrogate protein dynamics, due to its atomic resolution, nonperturbing nature, and solid theoretical foundations. Atomic fluctuations can be probed by NMR in many distinct time windows, using tailored spectroscopic approaches. On the slowest end of the timescale, in real-time NMR experiments, exchange processes can be directly studied by quantifying the time dependence of resonance intensities. Slow kinetic processes such as protein folding, peptide bond isomerization, domain movements, or hydrogen-deuterium exchange fall into this 10–5000 ms time window. Traditional lineshape analysis, based on elegant theoretical work by McConnell over half a century ago, is used in the 10–100 ms time window, and Carr–Purcell–Meiboom–Gill relaxation dispersion experiments are conducted in the 0.3–10 ms time window. Rotating frame relaxation dispersion experiments probe chemical exchange in the intermediate-fast regime where signals are only slightly broadened due to exchange, generally within the 20–100 μs time window. On the fast end of the scale, spin relaxation parameters, such as R1, R2, and heteronuclear nuclear Overhauser effects (NOEs), provide information on dynamics at the pico- to nanosecond timescale. Unlike the other methods listed above, relaxation data are typically described by S², the square of the generalized order parameter, and are classified as motions within a single energy well. They can be interpreted phenomenologically, without specifying any detail regarding the underlying physical processes, by reconstructing the site-specific spectral density function (spectral density mapping). Alternatively, either a model-free analysis, which assumes independence between internal motions and molecular tumbling, can be performed, or interpretation based on a specific model of internal motions that includes details about the structure and interactions at the atomic level can be carried out.
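To make the model-free analysis concrete, the sketch below evaluates the standard model-free spectral density for isotropic tumbling, in which the squared order parameter weights the contribution of overall tumbling against that of fast internal motion. The function and parameter names, and the example correlation times, are assumptions chosen for illustration.

```python
import math


def model_free_spectral_density(omega: float, s2: float,
                                tau_m: float, tau_e: float) -> float:
    """Model-free spectral density J(omega) for isotropic tumbling.

    omega : angular frequency (rad/s)
    s2    : squared generalized order parameter (0 = free, 1 = rigid)
    tau_m : overall rotational correlation time (s), must be > 0
    tau_e : effective internal correlation time (s)
    """
    # Combined correlation time: 1/tau = 1/tau_m + 1/tau_e.
    tau = tau_m * tau_e / (tau_m + tau_e)
    overall = s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


if __name__ == "__main__":
    # Assumed example: 10 ns tumbling, 50 ps internal motion, 15N Larmor
    # frequency of about 60.8 MHz (a 600 MHz spectrometer).
    omega_n = 2.0 * math.pi * 60.8e6
    for s2 in (0.5, 0.85, 1.0):
        j = model_free_spectral_density(omega_n, s2, 10e-9, 50e-12)
        print(f"S2={s2:.2f}  J(omega_N)={j:.3e} s")
```

The independence assumption named in the text appears here as the simple product form: internal motion only rescales the tumbling term by S² and adds a second, faster Lorentzian.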

Palmer AG (2004) NMR characterization of the dynamics of biomacromolecules. Chem Rev 104:3623–3640.

Palmer AG (2014) Chemical exchange in biomacromolecules: past, present, future. J Magn Reson 241:3–17.

Over the past two decades, numerous NMR approaches have been developed for characterizing protein dynamics. These reviews give a thorough account of techniques and applications to assess protein folding, molecular recognition, catalysis, and allostery.

Al-Hashimi HM (2013) NMR studies of nucleic acid dynamics. J Magn Reson 237:191–204.

Xue Y, Kellogg D, Kimsey IJ et al (2015) Characterizing RNA excited states using NMR relaxation dispersion. Methods Enzymol 558:39–73.

These perspective and review articles describe the development and application of NMR techniques for characterizing the dynamic properties of nucleic acids from the picosecond to second timescales at atomic resolution, including residual dipolar couplings, spin relaxation, and relaxation dispersion, combined with sample engineering and computational approaches. Applications to RNA-targeted drug discovery and RNA bioengineering are also covered.

Gu Y, Li D & Brüschweiler R (2014) NMR order parameter determination from long MD trajectories for objective comparison with experiment. J Chem Theory Comput 10:2599–2607.

Efforts to merge the timescales of fast motions experimentally accessible by NMR spectroscopy with those obtained by long molecular dynamics (MD) simulations are pursued by several groups. Quantitative comparisons between experiment and simulations permit rigorous validation of simulations. Model-free order parameters derived from NMR relaxation experiments are compared with those computed from MD trajectories by the isotropic reorientational eigenmode dynamics analysis method. Much stands to be learned about the general nature of motions from comparisons between simulations and NMR observables.

Biomolecular solid-state NMR—structure

Biomolecular solid-state NMR emerged as a structural biology technique in the past decade. In the context of biological systems, the term ‘solid-state’ does not imply dry materials but refers to samples in which molecules are immobilized and overall isotropic molecular tumbling is absent. Such conditions are realized, for example, in hydrated crystalline or microcrystalline biomolecules, in hydrated macromolecular assemblies, in fibers formed by misfolded proteins, in membrane proteins embedded in a lipid bilayer, and in proteins attached to inorganic matrices. These kinds of samples are not amenable to solution NMR methods and, therefore, solid-state NMR is the method of choice. The experimental requirements (hardware and pulse sequences) in solid-state NMR are technically more demanding than in solution, since (i) magic angle spinning at frequencies of 5–110 kHz or alignment of all molecules with respect to the static magnetic field is necessary, and (ii) complex radiofrequency pulse sequences, synchronized with the mechanical rotation, are required. Therefore, biological solid-state NMR is only now fully on par with solution NMR for analyzing biological systems. Conceptually, determination of biomolecular structures by solid-state NMR methods borrows directly from solution NMR, requiring resonance assignments, followed by the determination of numerous pairwise distance restraints. Chemical shift, dipolar, and quadrupolar tensors provide additional important structural parameters. Application of high and ultra-high magnetic fields (17.6–23.5 T) is critical when studying large biomolecules by solid-state NMR and this, in combination with fast and ultrafast magic angle spinning at frequencies up to 110 kHz, permits studies of biomolecules at the atomic level. In many instances, sample deuteration and extensive, sparse, or amino acid-specific labeling, following protocols similar to those used in solution NMR, are employed to alleviate spectral congestion.
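The ‘magic angle’ is the rotor orientation at which the second Legendre polynomial of cos θ vanishes, so that anisotropic interactions scaling with this factor average out under fast spinning. The short sketch below computes that angle and checks the condition numerically; the names are illustrative.

```python
import math


def second_legendre(theta_deg: float) -> float:
    """P2(cos theta) = (3 cos^2 theta - 1) / 2, the angular factor that
    scales dipolar and chemical shift anisotropy terms."""
    c = math.cos(math.radians(theta_deg))
    return 0.5 * (3.0 * c * c - 1.0)


# The magic angle solves 3 cos^2(theta) = 1, i.e. theta = arccos(1/sqrt(3)).
MAGIC_ANGLE_DEG = math.degrees(math.acos(1.0 / math.sqrt(3.0)))

if __name__ == "__main__":
    print(f"magic angle = {MAGIC_ANGLE_DEG:.4f} deg")
    print(f"P2 at the magic angle = {second_legendre(MAGIC_ANGLE_DEG):.2e}")
```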

Castellani F, van Rossum B, Diehl A et al (2002) Structure of a protein determined by solid-state magic-angle-spinning NMR spectroscopy. Nature 420:98–102 (doi: 10.1038/nature01070). Epub 2002/11/08.

Ritter C, Maddelein ML, Siemer AB et al (2005) Correlation of structural elements and infectivity of the HET-s prion. Nature 435:844–848 (doi: 10.1038/nature03793). Epub 2005/06/10.

Yi M, Cross TA & Zhou HX (2009) Conformational heterogeneity of the M2 proton channel and a structural model for channel activation. Proc Natl Acad Sci USA 106:13311–13316.

These three seminal papers report on de novo protein structure determination of, respectively, a microcrystalline protein, a filamentous prion protein assembly, and a membrane protein channel by solid-state NMR.

Demers JP, Habenstein B, Loquet A et al (2014) High-resolution structure of the Shigella type-III secretion needle by solid-state NMR and cryo-electron microscopy. Nat Commun 5:4976.

A recent application of the hybrid solid-state NMR/ROSETTA approach for determining the T3SS needle structure.

Franks WT, Wylie BJ, Schmidt HL et al (2008) Dipole tensor-based atomic-resolution structure determination of a nanocrystalline protein by solid-state NMR. Proc Natl Acad Sci USA 105:4621–4626.

This paper demonstrates the use of anisotropic tensorial (dipolar and chemical shift anisotropy—CSA) solid-state NMR restraints for protein structure determination.

Biomolecular solid-state NMR—dynamics

The determination of internal dynamics by solid-state NMR relies on the measurement of relaxation rates, anisotropic lineshapes (quadrupolar, dipolar, or chemical shift tensors), relaxation dispersion curves, and exchange rates. The timescales of dynamic processes accessible by solid-state NMR span from picoseconds to minutes. The absence of overall tumbling in the solid-state samples offers two advantages: (i) the micro- to milliseconds regime, which is challenging to study by solution NMR, is readily accessible and (ii) the interpretation of relaxation rates and dynamic lineshapes is more straightforward than in solution. The main challenges for extracting heteronuclear (13C, 15N) relaxation parameters in solids arise from spin-diffusion effects on T1 and coherent contributions to T2 and T1ρ that are caused by incomplete averaging of dipole–dipole interactions. The first problem can be alleviated by sample deuteration and fast magic angle spinning, while for T1ρ extraction, fast magic angle spinning (>45 kHz) and strong spin-lock fields are required. Extracting accurate T2 parameters, however, still remains difficult. The quadrupolar deuterium nucleus is another excellent probe for studying dynamics, given the exquisite sensitivity of quadrupolar lineshapes and relaxation parameters to nano- to microsecond motions. Indeed, methods for measuring heteronuclear anisotropic dipolar and CSA lineshapes are currently being developed, and the recent results suggest that R-symmetry-based sequences are advantageous since they are compatible with broad ranges of magic angle spinning frequencies and radiofrequency (RF) powers.

Jelinski LW, Sullivan CE, Batchelder LS & Torchia DA (1980) Deuterium nuclear magnetic resonance of specifically labeled native collagen. Investigation of protein molecular dynamics using the quadrupolar echo technique. Biophys J 32:515–529.

A seminal paper that demonstrates the use of deuterium solid-state NMR for probing protein dynamics.

Williams JC & McDermott AE (1995) Dynamics of the flexible loop of triosephosphate isomerase: the loop motion is not ligand gated. Biochemistry 34:8309–8319. Epub 1995/07/04.

This paper establishes a direct link between loop dynamics and catalysis in triosephosphate isomerase, using deuterium solid-state NMR.

Agarwal V, Xue Y, Reif B & Skrynnikov NR (2008) Protein side-chain dynamics as observed by solution- and solid-state NMR spectroscopy: a similarity revealed. J Am Chem Soc 130:16611–16621 (doi: 10.1021/ja804275p). Epub 2008/12/04.

Chevelkov V, Xue Y, Linser R et al (2010) Comparison of solid-state dipolar couplings and solution relaxation data provides insight into protein backbone dynamics. J Am Chem Soc 132:5015–5017 (doi: 10.1021/ja100645k). Epub 2010/03/20.

Yang J, Tasayco ML & Polenova T (2009) Dynamics of reassembled thioredoxin studied by magic angle spinning NMR: snapshots from different time scales. J Am Chem Soc 131:13690–13702.

These papers address the question of protein dynamics in solution and the solid state.

For ubiquitin, Agarwal et al. showed that the motional modes are similar and that the dynamic behavior is not perturbed significantly in the microcrystalline state. Likewise, Yang et al. assessed the dynamics of reassembled thioredoxin, via relaxation and dipolar order parameters from nano- to microseconds and concluded that the motional behavior of thioredoxin in a polyethylene glycol (PEG)-precipitate (nanocrystalline) is similar to that in solution.

Lewandowski JR, Halse ME, Blackledge M & Emsley L (2015) Protein dynamics. Direct observation of hierarchical protein dynamics. Science 348:578–581.

This paper presents an innovative approach for studying hierarchical protein dynamics, using temperature-dependent relaxation measurements. Freezing proteins down to temperatures of –168°C stops all motions. Slowly increasing the temperature permitted the consecutive observation of fast solvent motions, slow protein side-chain motions, and fast protein backbone motions. A tour de force in methodology and conceptual understanding of solid-state protein dynamics.

A.4 COMPUTATIONAL APPROACHES

Contributed by Andrej Sali, University of California at San Francisco, USA

Protein fold classification

Proteins consist of one or more ‘domains’ that span from fifty to a few hundred amino acid residues. Most domains assume a defined structure that can be classified into one of the limited number of fold types. A fold is defined by the sequential order and spatial arrangement of the secondary structure segments, such as helices and strands. It has been estimated that there are several thousand fold types, of which approximately 1500 have already been observed in known atomic structures. Because the structure of a protein domain is determined by its sequence, a domain fold is often assigned by comparing its sequence to representative fold types. This comparison can be achieved by sequence alignment or by threading methods (three-dimensional template matching). The sequence alignment methods can align either a pair of sequences or multiple sequences (for example profile alignment and hidden Markov models). The threading methods improve the sensitivity of fold assignment by assessing a coarse three-dimensional model of the sequence based on its alignment to a fold, for each fold type, instead of assessing the sequence alignment directly; that is, the domain sequence is threaded through a library of three-dimensional folds. A large number of programs and web servers for fold assignment are available, currently allowing us to assign a fold to approximately one-half of sequence domains. A fold assignment has many applications, including comparative protein structure modeling and functional annotation.
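Fold assignment by pairwise sequence alignment can be sketched as follows: score the query against each representative sequence with a global dynamic-programming alignment and report the best-scoring fold. This is a minimal illustration, assuming a flat match/mismatch/gap scoring scheme and a hypothetical two-entry library; practical tools rely on substitution matrices, profiles, or hidden Markov models instead.

```python
def global_alignment_score(seq_a: str, seq_b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gaps).

    The scoring values are assumed defaults for illustration only.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align residues
                              score[i - 1][j] + gap,       # gap in seq_b
                              score[i][j - 1] + gap)       # gap in seq_a
    return score[-1][-1]


def assign_fold(query: str, fold_library: dict) -> str:
    """Name of the library entry whose sequence aligns best to the query."""
    return max(fold_library,
               key=lambda name: global_alignment_score(query, fold_library[name]))


if __name__ == "__main__":
    # Hypothetical representative sequences, not real fold representatives.
    library = {"fold_A": "MKVLAT", "fold_B": "GGSPPW"}
    print(assign_fold("MKVLST", library))
```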

Hendlich M, Lackner P, Weitckus S et al (1990) Identification of native protein folds amongst a large number of incorrect models: the calculation of low energy conformations from potentials of mean force. J Mol Biol 216:167–180.

This paper paved the way for threading methods by reintroducing a potential of mean force for assessing a fit between a sequence and a structure.

Bowie JU, Luthy R & Eisenberg D (1991) A method to identify protein sequences that fold into a known three-dimensional structure. Science 253:164.

The concept of matching a protein sequence to a protein structure (that is, threading or three-dimensional template matching) is introduced, by assessing the fit between a residue and its structural environment implied by an alignment of the sequence to a structure.

Orengo CA, Jones DT & Thornton JM (1994) Protein superfamilies and domain superfolds. Nature 372:631–634.

The first systematic effort to define domain folds, classify known folds, estimate the number of all folds, and discuss the implications for protein structure modeling and function assignment is described.

Murzin AG, Brenner SE, Hubbard T & Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540.

The reference SCOP classification of domain folds is described.

Orengo CA, Michie AD, Jones S et al (1997) CATH–a hierarchic classification of protein domain structures. Structure 5:1093–1108.

The reference CATH classification of domain folds is described.

Yang J, Yan R, Roy A et al (2015) The I-TASSER Suite: protein structure and function prediction. Nat Methods 12:7–8.

A particularly successful web server for fold assignment is described.

Comparative protein structure modeling

Comparative (homology) protein structure modeling predicts the atomic structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). Comparative modeling is possible because small changes in the protein sequence usually result in small changes in its three-dimensional structure. The prediction process consists of fold assignment, target–template alignment, model building, and model evaluation. One often-used model-building method relies on modeling by satisfaction of spatial restraints obtained from the target–template alignment, statistical potentials, and a molecular mechanics force field. Comparative models can be refined by applying side-chain modeling and loop modeling methods. The overall accuracy of comparative models spans a wide range, from low-resolution models with only a correct fold to more accurate models comparable to medium-resolution crystallographic structures. The accuracy can be predicted by model assessment methods that rely on a number of different criteria, including the degree of target–template sequence similarity. It is currently possible to model with useful accuracy significant parts of approximately one-half of all known protein sequences. As with fold assignment, a number of programs and web servers are available.
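One of the simplest model assessment criteria mentioned above, target–template sequence similarity, can be computed directly from the alignment. The sketch below reports percent identity over aligned (non-gap) positions; this particular definition is an assumption, since several conventions for the denominator are in use, and the example alignment is hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str,
                     gap: str = "-") -> float:
    """Percent of identical residues among positions aligned in both rows.

    Both strings must be rows of the same alignment (equal length, with
    gaps written as `gap`). Positions with a gap in either row are skipped.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("alignment rows must have equal length")
    aligned = identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


if __name__ == "__main__":
    # Hypothetical target-template alignment rows.
    print(percent_identity("MKV-LAT", "MRVQLA-"))
```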

Browne WJ, North AC, Phillips DC et al (1969) A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen’s egg-white lysozyme. J Mol Biol 42:65–86.

The first description of comparative modeling.

Chothia C & Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5:823–826.

This seminal paper quantified the strong correlation between the sequence and structure conservation in related proteins, thus providing the basis for comparative protein structure modeling.

Sali A & Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234:779–815.

A description of comparative modeling by satisfaction of spatial restraints extracted from the target–template alignment, statistical potentials, and a molecular mechanics force field.

Marti-Renom MA, Stuart AC, Fiser A et al (2000) Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29:291–325.

A comprehensive review of all aspects of comparative modeling.

Schwede T, Sali A, Honig B et al (2009) Outcome of a workshop on applications of protein models in biomedical research. Structure 17:151–159.

A contemporary review of comparative modeling, emphasizing various applications of comparative models.

Haas J, Roth S, Arnold K et al (2013) The Protein Model Portal – a comprehensive resource for protein structure and model information. Database (Oxford) 2013:bat031.

A review of a portal to comparative models for millions of sequences, produced by many programs and web servers. The portal also provides several model assessment tools and educational materials. The portal is a result of a community-wide coordination led by the Protein Data Bank.

Protein function prediction

All functions of a protein depend on interactions with its ligands, which in turn depend on the protein structure and dynamics. The biochemical function is defined by the identity of the ligands, resulting in the network, cellular, and phenotypic functions when considering the larger context. Various functional classifications have been proposed, such as the Enzyme Classification for enzymes and Gene Ontology functional annotation schema for proteins in general. Because the function of proteins tends to be conserved in evolution, it is often predicted by detecting homology between the functionally uncharacterized and characterized proteins, using either sequence or structure comparison. Biochemical function can also be predicted by virtual screening that computationally docks each potential small molecule ligand in a virtual library to a given binding site of known or modeled structure. Function can also be inferred by functionally linking uncharacterized and characterized proteins using methods that do not rely on protein sequence or structure similarity. For example, a recent such approach predicts higher level functions by sensitively detecting similarity between sets of known ligands of uncharacterized and characterized proteins. Particularly popular are methods for predicting functional consequences of point mutations, relying on sequence, structure, and/or other information about the protein of interest, although they are often limited by the lack of contextual information (such as knowledge about the protein’s binding partners). A large number of programs and web servers for many flavors of functional prediction are available.
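The idea of comparing proteins through their ligand sets can be illustrated with the Tanimoto coefficient, a widely used measure of similarity between binary chemical fingerprints. In the sketch below, each ligand is represented as a set of fingerprint feature indices (hypothetical data), and two ligand sets are compared by their mean pairwise Tanimoto coefficient. This averaging is a deliberate simplification for illustration, not the statistical model of any published method.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared features / features in either set."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def ligand_set_similarity(ligands_a: list, ligands_b: list) -> float:
    """Mean pairwise Tanimoto coefficient between two ligand sets
    (a crude, assumed scoring rule used only for illustration)."""
    pairs = [tanimoto(a, b) for a in ligands_a for b in ligands_b]
    return sum(pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    # Hypothetical fingerprints, given as sets of "on" feature indices.
    known = [{1, 2, 3, 8}, {2, 3, 5}]
    uncharacterized = [{1, 2, 3}, {9, 10}]
    print(f"{ligand_set_similarity(known, uncharacterized):.3f}")
```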

Watson JD, Laskowski RA & Thornton JM (2005) Predicting protein function from sequence and structural data. Curr Opin Struct Biol 15:275–284.

Methods and tools for prediction of function based on the sequence and structure similarities are reviewed.

Hermann JC, Marti-Arbona R, Fedorov AA et al (2007) Structure-based activity prediction for an enzyme of unknown function. Nature 448:775–781.

A first experimentally validated prediction of enzymatic function based on virtual screening is described.

Marcotte EM, Pellegrini M, Thompson MJ et al (1999) A combined algorithm for genome-wide prediction of protein function. Nature 402:83–86.

The concept of the ‘functional link’ between two proteins is introduced, by describing a method for relating two functionally similar proteins without relying on sequence or structure similarity.

Keiser MJ, Roth BL, Armbruster BN et al (2007) Relating protein pharmacology by ligand chemistry. Nat Biotechnol 25:197–206.

A technique that quantitatively relates proteins based on the chemical similarity of their ligands is described.

Adzhubei IA, Schmidt S, Peshkin L et al (2010) A method and server for predicting damaging missense mutations. Nat Methods 7:248–249.

The popular PolyPhen web server for predicting functional consequences of point mutations is described.

Kelley LA, Mezulis S, Yates CM et al (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10:845–858.

A popular web server for prediction and analysis of protein structure, function, and mutations is described.

Radivojac P, Clark WT, Oron TR et al (2013) A large-scale evaluation of computational protein function prediction. Nat Methods 10:221–227.

The results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment are reported. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms.

Evolution and assembly

Recent analyses have provided a deeper understanding of the intimate relationship between protein structure, dynamics, function, and evolution at the tertiary and quaternary structure levels. It has been shown that local structural constraints control both local residue flexibility and evolutionary rate, conformational fluctuations resemble evolutionary changes in structure, protein dynamics are evolutionarily conserved, protein complex assembly pathways reflect evolutionary changes in quaternary structure, and that the deviations between structural and evolutionary dynamics can provide functional insight. In addition, it has been demonstrated that bacterial gene organization into operons reflects a fundamental mechanism for spatiotemporal regulation vital to effective co-translational protein complex assembly. A number of studies revealed that identities of residues across protein interfaces are often covariant in evolution, allowing us to predict contacts across interfaces in protein complexes. Practical applications of these developments include prediction of quaternary structure, (dis)assembly pathways, and design of novel nanomaterials.
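Residue covariation across an interface can be quantified in its simplest form as the mutual information between two alignment columns, one from each partner protein, taken over the same set of species. The sketch below is only a stand-in for the more elaborate statistical models used in practice; the columns shown are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (bits) between two alignment columns.

    col_a[i] and col_b[i] must be the residues observed in the same
    organism i for the two positions being compared.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_ab = count / n
        # Compare the joint frequency with that expected under independence.
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical columns: the first pair covaries, the second does not.
    print(mutual_information("AAVV", "DDEE"))
    print(mutual_information("AAVV", "DEDE"))
```

Column pairs with high scores are candidate contacts; in real alignments, phylogenetic bias and finite sampling inflate raw mutual information, which is one reason the simple score above is not sufficient on its own.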

Levy ED, Boeri Erba E, Robinson CV & Teichmann SA (2008) Assembly reflects evolution of protein complexes. Nature 453:1262–1265 (doi: 10.1038/nature06942). Epub 2008 Jun 18.

The mechanisms that drive formation of homomers at the level of evolution and assembly in the cell are examined, by inspecting over 5000 unique atomic structures. It is shown experimentally that the (dis)assembly pathway mimics the evolutionary pathway. The resulting model of self-assembly allows reliable prediction of evolution and assembly of a complex solely from its crystal structure.

Marsh JA, Hernández H, Hall Z et al (2013) Protein complexes are under evolutionary selection to assemble via ordered pathways. Cell 153:461–470 (doi: 10.1016/j.cell.2013.02.044).

Evidence of evolutionary selection for ordered protein complex assembly is presented. The assembly pathways of several heteromeric complexes are characterized by experiment. It is also shown that they can be simply predicted from their three-dimensional structures. By mapping gene fusion events identified from fully sequenced genomes onto protein complex assembly pathways, evolutionary selection for conservation of assembly order is demonstrated.

Marsh JA & Teichmann SA (2013) Parallel dynamics and evolution: protein conformational fluctuations and assembly reflect evolutionary changes in sequence and structure. Bioessays 36:209–218.

The relationship between the structural and evolutionary dynamics at the quaternary structure level is reviewed.

King NP, Bale JB, Sheffler W et al (2014) Accurate design of co-assembling multi-component protein nanomaterials. Nature 510:103–108.

Highly ordered nanoscale architectures of biological systems inspired a computational method for designing protein nanomaterials in which multiple copies of two distinct subunits co-assemble into a specific architecture. The method is used to design five 24-subunit cage-like protein nanomaterials in two distinct symmetric architectures, subsequently validated by experiment.

Shieh YW, Minguez P, Bork P et al (2015) Operon structure and cotranslational subunit association direct protein assembly in bacteria. Science pii: aac8171. Epub ahead of print.

It is shown for Escherichia coli luciferase subunits LuxA and LuxB that complex formation is directly coupled to the translation process, and involves spatially confined, actively chaperoned co-translational subunit interactions. Bacterial gene organization into operons therefore reflects a fundamental mechanism for spatiotemporal regulation vital to effective co-translational protein complex assembly.

Ovchinnikov S, Kamisetty H & Baker D (2014) Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. Elife 3:e02030 (doi:10.7554/eLife.02030).

Evolutionary covariation of identities of residues that make contact across protein interfaces is used to predict contacts across interfaces and assemble models of biological complexes.

Wilson C, Agafonov RV, Hoemberger M et al (2015) Using ancient protein kinases to unravel a modern cancer drug’s mechanism. Science 347:882–886 (doi:10.1126/science.aaa1823).

Evolution modifies a protein’s function by altering its energy landscape. The evolutionary pathway between two modern human oncogenes, Src and Abl, was recreated by reconstructing their common ancestors. The evolutionary reconstruction combined with X-ray structures of the common ancestor and pre-steady-state kinetics reveals a detailed atomistic mechanism for selectivity of the successful cancer drug Gleevec.

de Juan D, Pazos F & Valencia A (2013) Emerging methods in protein co-evolution. Nat Rev Genet 14:249–261 (doi:10.1038/nrg3414).

The main co‐evolution‐based computational approaches, their theoretical basis, potential applications, and foreseeable developments are comprehensively reviewed.

Protein ligand docking

Protein ligand docking aims to compute the position of a given ligand so that it interacts favorably with a given binding site on a protein of known structure. This search can potentially consider the flexibility of the protein binding site as well as the ligand, and is guided by a scoring function, ranging from simple measures of geometric complementarity to sophisticated energy functions. The result of this calculation is the geometry of the protein–ligand complex and a score quantifying the interaction. In virtual screening applications, docking is applied to rank a large number of small molecules in a virtual library in an attempt to discover a true ligand. Such screens are now widely used to discover new ligands for targets of known structure, including drug leads. One approach, fragment-based drug discovery, starts by computationally mapping favorable interactions between the binding site and tens to hundreds of small fragments, followed by connecting a few of the neighboring fragments into a drug lead for further optimization. Recently, investigators have also turned to predicting new substrates for enzymes of unknown function, by docking not only substrates, but also intermediates and products of enzymatic reactions. A comparison of the hit rates, the true positives, and the false positives from the docking screens to those from empirical, high-throughput screens reveals the strengths, weaknesses, and complementarities of both approaches. A typical virtual screening campaign is generally followed by experimentally testing tens of top-scoring ligands, of which at least a few are often validated.
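
The interplay of scoring and ranking in a virtual screen can be sketched as follows. The scoring function is a deliberately crude measure of geometric complementarity (a reward for each protein-ligand atom pair in contact and a penalty for each steric clash); the distance cutoffs, weights, coordinates, and ligand names are all illustrative assumptions and are not taken from any particular docking program.

```python
import math


def pose_score(protein_atoms, ligand_atoms, contact=4.0, clash=2.5):
    """Score one pose; lower is better.

    Each atom pair closer than `clash` (in angstroms) adds +10 as a steric
    penalty; each pair between `clash` and `contact` adds -1 as a favorable
    contact. Both cutoffs and both weights are assumed values.
    """
    score = 0.0
    for p in protein_atoms:
        for l in ligand_atoms:
            d = math.dist(p, l)
            if d < clash:
                score += 10.0
            elif d <= contact:
                score -= 1.0
    return score


def rank_library(protein_atoms, library):
    """Rank ligands by the score of their best pose (best ligand first).

    `library` maps a ligand name to a list of candidate poses; each pose is
    a list of (x, y, z) atom coordinates.
    """
    best = {
        name: min(pose_score(protein_atoms, pose) for pose in poses)
        for name, poses in library.items()
    }
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical binding site and a three-compound virtual library.
site = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
library = {
    "ligand_A": [[(1.5, 1.5, 2.0)], [(0.5, 0.0, 0.0)]],  # second pose clashes
    "ligand_B": [[(10.0, 10.0, 10.0)]],                  # never reaches the site
    "ligand_C": [[(1.5, 1.5, 2.0), (1.5, 1.5, 3.0)]],    # two atoms in contact
}

for name, score in rank_library(site, library):
    print(name, score)
```

A contact-counting score of this kind tends to favor larger ligands, which is one reason practical scoring functions add further energy terms and the top-ranked compounds are still tested experimentally.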

Kuntz ID, Blaney JM, Oatley SJ et al (1982) A geometric approach to macromolecule-ligand interactions. J Mol Biol 161:269–288.

The first algorithm for docking a small molecule into a binding site on a known protein structure is described. The algorithm ranks alternative conformations and configurations based on steric overlap.

Nicholls A (2008) What do we know and when do we know it? J Comput Aided Mol Des 22:239–255.

Two essential aspects of virtual screening are candidly considered: experimental design and performance metrics.

Jain AN & Nicholls A (2008) Recommendations for evaluation of computational methods. J Comput Aided Mol Des 22:133–139.

Recommendations are presented for requirements on statistical reporting, requirements for data sharing, and best practices for benchmark preparation and usage in the field of virtual screening.

Halperin I, Ma B, Wolfson H & Nussinov R (2002) Principles of docking: an overview of search algorithms and a guide to scoring functions. Proteins 47:409–443 (doi:10.1002/prot.10115).

Various methods for small molecule docking against protein structures are comprehensively reviewed.

Kolb P, Ferreira RS, Irwin JJ & Shoichet BK (2009) Docking and chemoinformatic screens for new ligands and targets. Curr Opin Biotechnol 20:429–436.

Applications of protein ligand docking are reviewed, including prediction of functions of enzymes and mapping of polypharmacology.

Carr RAE, Congreve M, Murray CW & Rees DC (2005) Fragment-based lead discovery: leads by design. Drug Discov Today 10:987–992 (doi:10.1016/S1359-6446(05)03511-7).

An application of molecular docking for discovering drug leads is described.

Huang N, Shoichet BK & Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49:6789–6801 (doi:10.1021/jm0608356).

An often-used DUD benchmark for testing docking methods is described.

Irwin JJ, Shoichet BK, Mysinger MM et al (2009) Automated docking screens: a feasibility study. J Med Chem 52:5712–5720.

The popular DOCKBLASTER web server for automated virtual screening is described.

Irwin JJ, Sterling T, Mysinger MM et al (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52:1757–1768.

A key publicly available set of annotated virtual ligand libraries is described, including a set of more than 10 million drug-like small molecules and a set of commercially purchasable compounds.

Molecular dynamics simulations

Richard Feynman remarked in 1963 “... if we were to name the most powerful assumption of all, which leads one on and on in an attempt to understand life, it is that all things are made of atoms and that everything that living things do can be understood in terms of the jigglings and wigglings of atoms”. Such trajectories of atoms and molecules are in fact provided by molecular dynamics simulations. The simulation typically involves numerically solving Newton’s equations of motion for the atoms with an empirical potential energy function (defined by a molecular mechanics force field). The simulated system can include proteins, nucleic acids, solvent molecules, solutes, and other types of molecules. A particular strength of the simulations is that all of their aspects are under the scientist’s control, which has given rise to the term ‘computational microscope.’ Various approximations of atomistic molecular dynamics simulations have been introduced to extend the length scales and timescales that can be simulated; for example, coarse-graining of molecular representations and time steps; approximate and accelerated numerical integration schemes; restrained sampling along specified degrees of freedom only; and addition of experimentally derived restraint terms to the potential energy function. Accurate atomistic simulations of a small protein can now be extended to the millisecond timescale on special-purpose computing hardware. Alternatively, systems consisting of billions of atoms, in which numerous cell functions reside, can be simulated on shorter timescales. Only a combination of molecular dynamics simulations with the interpretation of related experiments will provide a comprehensive and unified description of motions in proteins and their assemblies.
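
The numerical solution of Newton’s equations of motion can be illustrated with a minimal sketch. It uses the velocity Verlet scheme, one commonly used integrator, and replaces the molecular mechanics force field with a single one-dimensional harmonic spring so that the result can be checked against the analytic solution. The function names, units, and parameter values are illustrative assumptions.

```python
import math


def harmonic_force(x, k=1.0):
    """Force from the toy potential U(x) = 0.5 * k * x**2 (force-field stand-in)."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate m * a = F(x) for one particle in one dimension."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Start at x = 1 with zero velocity and simulate 10 time units.
trajectory = velocity_verlet(x=1.0, v=0.0, force=harmonic_force)
energies = [total_energy(x, v) for x, v in trajectory]
energy_spread = max(energies) - min(energies)

# For mass = k = 1 the exact solution is x(t) = cos(t).
final_x = trajectory[-1][0]
exact_x = math.cos(10.0)
print(f"energy spread = {energy_spread:.2e}, x error = {abs(final_x - exact_x):.2e}")
```

In an actual simulation the same update is applied to every atom in three dimensions, with forces computed from the full potential energy function; the time step must remain small relative to the fastest motions in the system, which is what makes long timescales expensive to reach.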

McCammon JA, Gelin BR & Karplus M (1977) Dynamics of folded proteins. Nature 267:585–590.

The first molecular dynamics simulation of a protein is described.

Karplus M & Lavery R (2014) Significance of molecular dynamics simulations for life sciences. Israel Journal of Chemistry 54:1042–1051.

This essay focuses on the utility of molecular dynamics simulations to biologists.

Perilla JR, Goh BC, Cassidy CK et al (2015) Molecular dynamics simulations of large macromolecular complexes. Curr Opin Struct Biol 31:64–74.

This review describes molecular dynamics simulations of very large systems.

Shaw DE, Maragakis P, Lindorff-Larsen K et al (2010) Atomic-level characterization of the structural dynamics of proteins. Science 330:341–346.

Two fundamental processes in protein dynamics—protein folding and conformational change within the folded state—are examined by means of extremely long (1 ms) all-atom MD simulations conducted on a special-purpose machine.

Phillips JC, Braun R, Wang W et al (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26:1781–1802.

This review describes how to use the popular NAMD program for molecular dynamics simulations. NAMD can be combined with the molecular visualization program VMD for easier setup of the simulations and analysis of the results.

Berneche S & Roux B (2001) Energetics of ion conduction through the K+ channel. Nature 414:73–77.

Molecular dynamics free energy simulations were performed to elucidate the mechanism of ion conduction at the atomic level, starting with the X-ray structure of the KcsA K+ channel.