Interpretation of 3D EM maps, fitting of atomic structures

Maya Topf
Image processing for cryo-electron microscopy, 1-11 September 2015, Lecture 14

Page 1: Aims of this lecture - embo2015.cryst.bbk.ac.ukembo2015.cryst.bbk.ac.uk/embo2015/course/Lectures/... · 3D-EM constrained modelling of macromolecular assemblies Real-space No refinement

Aims of this lecture

• To understand when 3D EM density fitting is needed.

• To describe the different types of density fitting methods (rigid, flexible, assembly).

• To be aware of different software tools used for visualization and density fitting.

• To be aware of the errors involved in density fitting and understand how to critically assess the resulting models.

Structural biology at different levels of resolution

EMDB Statistics

3D-EM constrained modelling of macromolecular assemblies

[Flowchart. Inputs: 3D-EM map; component sequence. Steps and decision points: segmentation; component structure known?; what resolution? (< 20 Å, ~4.5-10 Å, < 4.5 Å); fitting all known folds; SSE assignment; de novo chain tracing; fold assignment from sequence; template detected?; homology modelling; ‘template-free’ modelling; rigid fitting; fit different from map?; multiple conformations; ENM / NMA; real-space refinement. Output: component structure.]

- Academic programs:
• Chimera (UCSF)
• Vision (Scripps)
• VMD (U Illinois Urbana-Champaign)
• VolRover (UT Austin)
• Gorgon (NCMI & Washington Uni)
• Veda (IBS, Grenoble)
• Coot (Univ of York)
• O (Uppsala Univ)

- Commercial programs:
• PyMOL (Schrödinger)
• Amira (TGS, San Diego, CA)
• Iris Explorer (NAG, Downers Grove, IL)

Segmentation

- Identify boundaries between 3D regions that represent structural components in the context of structural, biochemical and bioinformatic knowledge.

- The identified boundaries can be useful in detecting the positions of known component structures in the map.

- The size of the segmented components depends on the resolution.

[Figure: segmented components at 20 Å, 10 Å and 4.5 Å resolution. Labels: shape; domains, RNA double-helix; protein secondary structure elements; backbone.]

Segmentation tools

- Manual: mask; box around marker/atoms; hand erasing.

Segmentation tools

- Knowledge-based segmentation:

• Antibody labeling; gold clusters; subunit/domain deletion -> difference mapping (Chimera).

• Recognition of structural components - density fitting.

- Automated: based on density alone (with or without the use of symmetry information)

SeggeR: Pintilie et al, J Struct Biol 2011

Segmentation methods

- Automated segmentation based on density alone:
• Density thresholding: protein and RNA (Spahn et al. 2000).
• Watershed methods (Volkmann 2002), SeggeR (Pintilie et al. 2009, 2011).
• Level set (Baker et al. 2006).
• Eigenvalue methods (Frangakis & Hegerl 2002).
• Fast marching method (Frangakis & Hegerl 2002, Bajaj 2003).
• Inference methods to find conserved regions (Saha et al. 2010, Xu et al. 2011).

[Figure: mature bacteriophage P22 at 9.5 Å resolution (Baker et al. J Struct Biol 2006). Panels: assembly; component.]

Fold recognition from density

< ~15-20 Å: Fit domains from a non-redundant protein domain database (e.g. CATH).
• Calculate a Z-score.

SPI-EM: Velazquez-Muriel et al. JMB 2005
Example: fitting of a domain from 1.20.1060.10 (mainly alpha) into 1.10.530.10 (mainly alpha).
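The Z-score step of library-based fold recognition can be sketched as follows. This is a minimal illustration, assuming each domain in the library has already been fitted and given a single goodness-of-fit score; the score values are invented, and the two extra CATH identifiers are only there to pad out a toy library. It is not the scoring scheme of SPI-EM or any other named program.

```python
from statistics import mean, pstdev

def z_scores(fit_scores):
    """Convert raw fit scores into Z-scores relative to the whole library.

    fit_scores: dict mapping a domain identifier to its best fit score
    (higher = better). Returns a dict with the same keys.
    """
    values = list(fit_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation over the library
    if sigma == 0:
        raise ValueError("all scores are identical; Z-score is undefined")
    return {domain: (score - mu) / sigma for domain, score in fit_scores.items()}

# Hypothetical fit scores for a four-domain toy library.
library = {
    "1.20.1060.10": 0.41,
    "1.10.530.10": 0.78,
    "3.40.50.300": 0.39,
    "2.60.40.10": 0.42,
}
z = z_scores(library)
best_domain = max(z, key=z.get)
```

A fold is usually only accepted when its Z-score stands clearly above the rest of the library; the threshold is a choice that depends on the method and the map.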

Fold recognition from density

FREDS: Khayat et al. JSB 2010
Detection of bacteriophage Lambda, 12 Å

BALBES–MOLREP pipeline: Brown et al. Acta Crystallogr D 2015
7 Å

~4.5-10 Å: Secondary structure element detection (SSEHunter)
Baker et al. Structure 2007
Programs: Helixhunter, SSEhunter, Ematch, Pathwalker, Coot

~3.5-4.5 Å: De novo Cα tracing
Jiang et al. Nature 2008, 4.5 Å

Fold recognition from sequence

[Figure labels: Anabaena 7120; Anacystis nidulans; Chondrus crispus; Desulfovibrio vulgaris; example sequence GFCHIKAYTRLIMVG.]

- Evolution (rules): threading; homology modelling; evolutionary couplings.
- Folding (physics): ab initio (de novo) prediction; template-free modelling.

Zhang, Curr Opin Struct Biol 2008; Marks et al. Nat Biotechnol. 2012

Density fitting

[Figure: resolution scale (25 Å, 10 Å, 4.5 Å). Features: conformational changes / shape; domain boundaries; α-helix; β-sheets; side chains. Methods: rigid-body fitting / assembly fitting; flexible fitting; de novo. Villa & Lasker, Curr Opin Struct Biol, 2014.]

Manual fitting

Fitting an atomic structure within the envelope (an isocontour) of the density using visualisation programs.

Pros:
- No current computers can beat the human brain in certain pattern recognition tasks.
- Immediate feedback and intelligent choices by the user.
- Often good for the initial placement of the component in the map.

Cons:
- High level of subjectivity may lead to error, especially if the map does not have sufficient distinctive features for an unambiguous placement of the component.
- Depends on contour level.
- Conformational rearrangements cannot be modelled (misfits and steric clashes).

Automated fitting

All automated fitting methods require:

1. a way of representing both the structure and the density map (representation).

2. a way of measuring the goodness-of-fit (scoring).

3. a method of finding the best fit (an optimisation algorithm).

[Figure: automated fitting. Labels: density map; component atomic structure; component representation and placement; optimisation based on goodness-of-fit. Villa & Lasker, Curr Opin Struct Biol, 2014.]

Representation and scoring

Cross Correlation Coefficient (CCC)

[Figure: blur the atomic structure (m) to give ρcalc; compare with the experimental map ρobs using the CCC. Example: rigid fitting of an X-ray structure.]
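As a rough illustration of the scoring step, the sketch below computes a cross-correlation coefficient between a simulated density (ρcalc, from the blurred atomic structure) and an experimental density (ρobs), both flattened to lists of voxel values on the same grid. It assumes the mean-subtracted (Pearson) form; some programs use the form without mean subtraction, so treat this as one common variant rather than the definition used on the slide.

```python
import math

def ccc(rho_calc, rho_obs):
    """Mean-subtracted cross-correlation coefficient of two voxel lists."""
    if len(rho_calc) != len(rho_obs):
        raise ValueError("maps must be sampled on the same grid")
    n = len(rho_calc)
    mean_c = sum(rho_calc) / n
    mean_o = sum(rho_obs) / n
    # Covariance-like numerator and the product of the two spreads.
    num = sum((c - mean_c) * (o - mean_o) for c, o in zip(rho_calc, rho_obs))
    den = math.sqrt(
        sum((c - mean_c) ** 2 for c in rho_calc)
        * sum((o - mean_o) ** 2 for o in rho_obs)
    )
    if den == 0:
        raise ValueError("a constant map has no defined correlation")
    return num / den

# Toy voxel values: the second map is a scaled copy of the first.
perfect = ccc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverted = ccc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the score is invariant to linear scaling of either map, it is insensitive to the arbitrary grey-level scale of an EM reconstruction.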

Density-based scoring functions

• Cross-correlation coefficient (CCC)
• Filters: Laplacian-filtered CCC (LAP)
• Local correlation (SCCC)
• Mutual information-based score (MI):

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$

where p(x) and p(y) are the marginal probabilities and p(x, y) is the joint probability of the density values in the two maps.

Wriggers & Chacon, Structure 2001; Vasishtan & Topf, J Struct Biol 2011; Farabella et al. J Appl Cryst. 2015
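The mutual information score can be estimated from binned density values. The sketch below is a minimal version under stated assumptions: both maps are flattened voxel lists on the same grid, values are placed in equal-width bins (the bin count of 4 is arbitrary), and probabilities are simple frequency counts. The binning used by any particular program may differ.

```python
import math
from collections import Counter

def bin_values(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant map: everything in one bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=4):
    """I(X;Y) = sum over bins of p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    bx, by = bin_values(x, n_bins), bin_values(y, n_bins)
    n = len(bx)
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi

voxels = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
mi_self = mutual_information(voxels, voxels)          # equals the entropy, log(4)
mi_flat = mutual_information(voxels, [1.0] * 8)       # constant map carries no information
```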

Local scoring

• Segment-based cross-correlation coefficient (SCCC), computed between the target density Y and the probe density X.

Roseman, Acta Crystallogr D 2000; Pandurangan et al., J Struct Biol 2014
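One way to picture a segment-based score is as a correlation restricted to the voxels that belong to a chosen segment. The sketch below assumes a boolean mask marking those voxels and uses a mean-subtracted correlation; how the mask is derived, and the exact normalisation, are assumptions here and may differ from the published SCCC.

```python
import math

def sccc(probe, target, segment_mask):
    """Correlation of probe density X and target density Y over masked voxels only."""
    xs = [p for p, keep in zip(probe, segment_mask) if keep]
    ys = [t for t, keep in zip(target, segment_mask) if keep]
    if len(xs) < 2:
        raise ValueError("segment must contain at least two voxels")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    if den == 0:
        raise ValueError("constant density inside the segment")
    return num / den

# The last two voxels disagree badly but lie outside the segment.
probe = [1.0, 2.0, 3.0, 9.0, 9.0]
target = [2.0, 4.0, 6.0, 0.0, 5.0]
mask = [True, True, True, False, False]
local_score = sccc(probe, target, mask)
```

Scoring segment by segment makes it possible to see which parts of a model fit well even when the global score is mediocre.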

Surface-based scoring functions

• Normal vector score (NV)
• Chamfer distance (CD)

Ceulemans & Russell, J Mol Biol 2004; Vasishtan & Topf, J Struct Biol 2011

TEMPy
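To give a feel for a surface-based score, here is a small sketch of a chamfer-style distance between two sets of surface points. It assumes the symmetric form (the mean nearest-neighbour distance taken in both directions and averaged) and a brute-force neighbour search; the exact definition and implementation in TEMPy or other software may differ.

```python
import math

def nearest_distance(point, points):
    """Distance from one point to its closest neighbour in another set."""
    return min(math.dist(point, q) for q in points)

def chamfer_distance(surface_a, surface_b):
    """Symmetric mean nearest-neighbour distance between two point sets."""
    a_to_b = sum(nearest_distance(p, surface_b) for p in surface_a) / len(surface_a)
    b_to_a = sum(nearest_distance(q, surface_a) for q in surface_b) / len(surface_b)
    return 0.5 * (a_to_b + b_to_a)

# Hypothetical surface points (in Å) of a map contour and a fitted model.
map_surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
identical = chamfer_distance(map_surface, map_surface)
shifted = chamfer_distance([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])
```

Unlike the CCC, a lower value means a better fit, which matters when several scores are combined.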

Optimisation: rigid fitting

Rotate and translate the component to search through all possible configurations in the density map so as to maximise the fit between the component and the map.

6D search

Exhaustive search

Pros: Get the global solution with respect to a given scoring function.
Cons: The search in real space is too large for most scores (very expensive).

- Local fitting: search exhaustively a given sub-region in the map.
- Acceleration: FFT (translational moves); spherical harmonics (rotational moves).
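The idea of an exhaustive search can be shown on a deliberately tiny problem. The sketch below slides a 1D probe along a 1D "map" and scores every offset, so the best placement on the grid is guaranteed to be found. It covers translation only, in one dimension, with a plain overlap score; a real 6D search also samples rotations and typically relies on FFT acceleration.

```python
def overlap(probe, window):
    """Simple overlap score: sum of products of matching voxels."""
    return sum(p * w for p, w in zip(probe, window))

def exhaustive_translation_search(target, probe, score=overlap):
    """Try every offset of the probe in the target and return the best one."""
    best_offset, best_score = None, float("-inf")
    for offset in range(len(target) - len(probe) + 1):
        window = target[offset:offset + len(probe)]
        s = score(probe, window)
        if s > best_score:
            best_offset, best_score = offset, s
    return best_offset, best_score

# Toy 1D density with a single peak, and a probe with the same shape.
density = [0, 0, 1, 3, 1, 0, 0, 0]
probe = [1, 3, 1]
placement = exhaustive_translation_search(density, probe)
```

The cost grows with the number of grid positions times the number of orientations, which is why the real-space version becomes expensive for most scores.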

Stochastic and gradient methods (6D rotational & translational search)

Pros: Fast; easy to implement different scoring functions.
Cons: The model can be “trapped” in local minima.

Gradient methods

Optimisation follows the steepest gradient.

Monte Carlo methods

Optimisation follows the gradient method, with random ‘jumps’ to avoid local minima.

[Figures: score plotted against parameters for each method.]
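The difference between the two strategies can be seen on a toy score landscape. The sketch below assumes a 1D list of scores indexed by a single parameter: `hill_climb` follows the locally best neighbour until no neighbour improves, and `climb_with_random_jumps` repeats that from random starting points and keeps the best result. This is random-restart hill climbing, used here as a stand-in for the "random jumps" idea; it is not a Metropolis Monte Carlo scheme, and the landscape is invented.

```python
import random

def hill_climb(scores, start):
    """Move to the better neighbour until no neighbour improves the score."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(scores)]
        if not neighbours:
            return pos
        best = max(neighbours, key=lambda p: scores[p])
        if scores[best] <= scores[pos]:
            return pos  # local optimum reached
        pos = best

def climb_with_random_jumps(scores, start, n_jumps=50, seed=0):
    """Hill climbing plus random jumps; keeps the best optimum seen."""
    rng = random.Random(seed)
    best = hill_climb(scores, start)
    for _ in range(n_jumps):
        candidate = hill_climb(scores, rng.randrange(len(scores)))
        if scores[candidate] > scores[best]:
            best = candidate
    return best

# Landscape with a local optimum at index 2 and the global optimum at index 7.
landscape = [0, 1, 2, 1, 0, 1, 3, 5, 3, 1]
greedy_result = hill_climb(landscape, start=0)
jump_result = climb_with_random_jumps(landscape, start=0)
```

Starting from index 0, the purely greedy search stops at the local optimum; the jumping variant can never do worse and will usually escape it.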

Problems with density fitting

i. Limitations of resolution

[Figure: density at 2 Å, 10 Å and 20 Å; correct fit versus fit flipped by 180°.]

Problems:
- At low resolution: many local optima with similar numerical values.
- Local resolution, noise, scaling, filtering, masking.
- Blurring of the atomic structure.

Solutions:
- Improve scoring of goodness-of-fit.
- Coarse-graining (change representation).
- Fit/model assessment.

ii. Multi-component fitting

[Figure labels: protein structure prediction; NMR spectroscopy; X-ray crystallography; 20 Å; sequential fitting; correct fit.]

Problem: Components may migrate toward the centre of the map.

Solution:
- Simultaneous fitting (assembly/multiple fitting).
- Multiple scores (additional constraints).

Assembly fitting

• Divide the density points into groups of approximately equal size, each made up of the points closest to one centroid (vector quantization or K-means clustering). Each group is represented by its centroid point.
• Match centroids to components.
• Score components simultaneously using CCF and additional scores/restraints (e.g. geometric complementarity, clash scores).

Birmanns & Wriggers. J Struct Biol 2007; Lasker et al, JMB 2009; Zhang et al. Bioinformatics 2010
Programs: Chimera/MultiFit, γ-TEMPy
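The first step of assembly fitting, reducing the map to a handful of centroid points, can be sketched with plain K-means. This version assumes the density has already been thresholded into a list of 3D voxel positions, uses the first k points as a naive initialisation, and ignores density weighting; published vector-quantization methods are more elaborate.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain K-means on 3D points; returns (centroids, groups)."""
    centroids = [points[i] for i in range(k)]  # naive initialisation
    groups = []
    for _ in range(n_iter):
        # Assignment step: each point joins its closest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[idx].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups

# Hypothetical above-threshold voxel positions forming two blobs.
voxels = [(0, 0, 0), (10, 0, 0), (1, 0, 0), (11, 0, 0), (0, 1, 0), (10, 1, 0)]
centroids, groups = kmeans(voxels, k=2)
```

Each centroid then becomes a candidate anchor position for one component, which shrinks the matching problem from a continuous search to a small combinatorial one.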

Integrative modelling

• Score components simultaneously using CCC and spatial restraints from additional experiments.

Programs: IMP, Rosetta, HADDOCK

26S proteasome

Forster et al. Biochem Biophys Res Commun, 2009.

Integrative modelling

The anaphase-promoting complex (APC/C)

Schreiber et al. Nature, 2011.

iii. Conformational variability

Problem: Conformations observed by 3D EM often deviate from the conformations of the atomic models we fit.
- Dynamics.
- Crystal packing effects.
- Errors in structure prediction.

Solution: change the conformation of the atomic model during the fitting process (flexible fitting).

Fitting multiple conformations

- Identify one of the most accurate models from a decoy set based on the quality of fit.

Topf et al, J Struct Biol 2005; Baker et al. PLoS Comput Biol 2006

Programs: MODELLER, Rosetta

Fitting multiple conformations

Pleurotolysin (PlyA-B), pore-forming protein:
- Structure library of ~4600 fits.
- Score by SCCC.

Lukoyanova et al. PLoS Biol. 2015

NMA-based refinement

- Elastic Network Model (ENM)
- Normal Mode Analysis (NMA): a collection of harmonic oscillators; those with low-frequency, large-amplitude motions often correlate with experimentally observed conformational changes.
- Cons: a ligand can stretch the protein in ways that involve higher-frequency modes that are not taken into account; the selection of modes is subjective; all density must be accounted for.

Tama et al, 2004
Programs: NMFF, iMODFIT, NORMA

- Geometry-based conformational sampling (DireX)

Real-space refinement

- The fit of the probe structure is optimised simultaneously with its stereochemical properties by minimising a scoring function, such as:

$$E = w_1 E_{CC}(P) + w_2 E_{SC}(P) + w_3 E_{NB}(P)$$

where, for atomic positions P, E_CC is the cross-correlation (fit) term, E_SC the stereochemical term and E_NB the non-bonded term.

- Optimisation is performed on “rigid bodies” by energy minimisation and molecular dynamics.

Chen & Chapman, JSB 2003; Topf et al., Structure 2008; Trabuco et al., Structure 2008

Pros: Flexible (finer fragmentation); different optimisation methods can be applied; easy to add more restraints.

Cons: Only local search; slow; danger of over-fitting; subjective rigid bodies/constraints.

Programs: MDFF, Flex-EM, Coot
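The weighted-sum scoring function can be made concrete with toy terms. Everything specific in this sketch is an assumption for illustration: the fit term is taken as 1 − CCC, the stereochemical term is a harmonic penalty on consecutive Cα-Cα distances around 3.8 Å, and the non-bonded term is a soft-sphere clash penalty below 3.0 Å. Real programs use full force fields and their own weights.

```python
import math

def stereochemical_term(coords, ideal_bond=3.8):
    """Squared deviation of consecutive-atom distances from an ideal value."""
    return sum((math.dist(a, b) - ideal_bond) ** 2 for a, b in zip(coords, coords[1:]))

def non_bonded_term(coords, min_dist=3.0):
    """Soft-sphere penalty for non-adjacent atoms closer than min_dist."""
    penalty = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            overlap = min_dist - math.dist(coords[i], coords[j])
            if overlap > 0:
                penalty += overlap ** 2
    return penalty

def total_score(coords, ccc, w1=1.0, w2=1.0, w3=1.0):
    """E = w1*E_CC + w2*E_SC + w3*E_NB, with E_CC taken as 1 - CCC."""
    e_cc = 1.0 - ccc
    return w1 * e_cc + w2 * stereochemical_term(coords) + w3 * non_bonded_term(coords)

# Ideal three-atom chain: only the fit term contributes.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
energy = total_score(chain, ccc=0.8)
# Folding the third atom back onto the first creates a clash.
clash_penalty = non_bonded_term([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

The weights control the trade-off the slide warns about: too much weight on the fit term invites over-fitting, too little leaves the model unmoved.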

Refinement at intermediate resolution

1VCB, 10 Å resolution (native versus best predicted fit):
- Before refinement: Cα RMSD from native 7.5 Å
- After refinement: Cα RMSD from native 2.1 Å
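The Cα RMSD values quoted above measure how far a model is from the native structure. A minimal sketch of the calculation follows, assuming the two coordinate lists contain the same Cα atoms in the same order and are already in a common frame (no superposition is performed here). The coordinates are invented.

```python
import math

def ca_rmsd(model, native):
    """Root-mean-square deviation between matched Cα coordinates (in Å)."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty, equally long coordinate lists")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))

# Hypothetical two-residue example: the model is shifted 3 Å along z.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0)]
rmsd = ca_rmsd(model, native)
```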

Overfitting

[Figure: flexible fitting of an actin subunit at 15 Å resolution. Panels: before refinement; after refinement (clustered); after refinement (non-clustered).]

Coarse graining

Pandurangan & Topf, J Struct Biol 2012
http://ribfind.ismb.lon.ac.uk/

[Figure: initial, un-clustered and clustered models, annotated with Cα RMSD.]

[Figure: 1dpe, 5 Å. Panels: initial; final un-clustered; final clustered; final two-stage refinement, annotated with RMSD from target and CCC.]

Hierarchical refinement

- Bottom ring could be fitted using rigid fitting alone (PDB: 1oel).
- Top ring needed refinement using hierarchical flexible fitting.

[Figure: TRs1 conformation.]

Clare et al., Cell 2012

Capturing the machine motions

[Figure labels: GroEL-ATP7; Apo.]

Clare et al., Cell 2012

Fit / Model Validation

EM Validation Task Force: “We recommend coordinated development of model assessment criteria and corresponding software, with special emphasis on criteria reflecting the suitability of models for specific end-user applications.”

Henderson et al. Structure 2012.

Why?

3128 maps in EMDB. ~687 fits in PDB.

Approaches to model validation

– Geometry: deviation from ideal bonds and angles, planes, Ramachandran plots, atom clashes

– Cross-validation to detect overfitting (DiMaio et al. 2013, Falkner & Schröder 2013)

– Consensus methods (Ahmed & Tama 2013, Pandurangan et al. 2014)

– Multiple scoring of ensemble models (Farabella et al 2015)

– Partial scoring (Farabella et al 2015)

TEMPy: http://tempy.ismb.lon.ac.uk/

Single fit assessment: local scoring

Partial (local) scoring: segment-based cross-correlation coefficient (SCCC).

[Figure: NN cryo-EM density for the Kinesin-3 motor domain, 6.3 Å resolution (EMD-2765, PDB-4uxo). Heat map showing the quality of the local fit for specific elements of the motor domain in different nucleotide states.]

Atherton et al. eLife 2014;3:e03680
Multiple scores and fits

Ensemble of fits: Global assessment

ADP-EM (Garzón et al. Bioinformatics 2007)

[Figure: 20 Å resolution, simulated map (PDB: 1tyq). RMSD clustering of the top 90 fits resulted in all top-10 scoring fits in one cluster.]

[Figure: hierarchical Cα-RMSD clustering of the top 20 fits. GroEL [PDB: 1oel]; GroEL [EMD: 1080] (11.5 Å). Scores: cross correlation; mutual information; normal vectors. Colour scale: low to high.]

Re-ranking based on consensus scoring
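Consensus re-ranking can be illustrated with a simple rank-sum scheme. The sketch below assumes every score has been oriented so that higher means better, ranks the fits separately under each score, and orders them by the sum of their ranks. The fit names and values are invented, and the published consensus methods may combine scores differently.

```python
def consensus_rank(score_table):
    """Order fits by the sum of their ranks across several scores.

    score_table: {score_name: {fit_name: value}}, higher value = better fit.
    """
    fits = list(next(iter(score_table.values())))
    rank_sum = {fit: 0 for fit in fits}
    for values in score_table.values():
        ordered = sorted(fits, key=lambda f: values[f], reverse=True)
        for rank, fit in enumerate(ordered, start=1):
            rank_sum[fit] += rank
    # Lowest rank sum first; ties keep their original order.
    return sorted(fits, key=lambda f: rank_sum[f])

# Hypothetical scores for three candidate fits.
scores = {
    "CCC": {"fit_a": 0.90, "fit_b": 0.88, "fit_c": 0.70},
    "MI": {"fit_a": 0.30, "fit_b": 0.45, "fit_c": 0.20},
    "NV": {"fit_a": 0.60, "fit_b": 0.75, "fit_c": 0.50},
}
ranking = consensus_rank(scores)
```

Here the fit that is top by CCC alone drops to second place once the other two scores are taken into account, which is the kind of re-ordering consensus scoring is meant to expose.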

Ensemble of fits: Local assessment

Ensemble of fits

Pleurotolysin (PlyA-B), pore-forming protein: top 20 fits (EMD-2795, PDB-4v3m).

Lukoyanova et al. PLoS Biol. 2015

2015 EMDB Model Challenge

http://challenges.emdatabank.org/?q=2015_model_challenge

• Establish a benchmark set of 3DEM maps in the 3.0-4.5 Å resolution range, where significant growth in the number of maps is anticipated over the next few years and where a number of technical challenges to map interpretation and fitting exist.

• Encourage developers of modelling software packages and biological end users to analyze these maps and present modelling results following best practice.

• Evolve criteria for evaluation and validation of 3DEM map-derived models.

• Compare and contrast the various modelling and analysis approaches (in a positive spirit!)

Thank you!