Symmetry in Protein Design - Hampton Research

Post on 19-Mar-2020



Symmetry Ideas in Protein Assembly

[Figure: examples of natural, designed, and 'accidental' protein assemblies; scale bars 1000 Å and 100 Å]

RAMC 2013

Giant Biological Protein Assemblies: Bacterial Microcompartments

Virus-sized protein capsids inside many bacteria, encapsulating a series of enzymes and functioning as simple metabolic organelles.

Structural studies illuminate key assembly and molecular transport mechanisms.

Kerfeld, et al. (2005) Science 309, 936-8; Tanaka, et al. (2008) Science 319, 1083-6; Tanaka, et al. (2010) Science 327, 81-4; Yeates, Thompson, & Bobik (2011) Curr. Opin. Struct. Biol. 2, 223.

‘Accidental’ assemblies: The space group preference problem

One of the most fascinating (but mostly overlooked) puzzles in structural biology: proteins show a striking preference to self-assemble in certain particular symmetries.

Of the 65 possible 3D space groups available to chiral molecules such as proteins, only a handful are commonly obtained, and one is dominant.

[Figure: frequencies of the 65 space groups among protein crystals; Top 1: ~33%, Bottom 55: 20%]

The differences in probability span more than 2 orders of magnitude, yet there are no obvious energetic explanations.

Yeates and Kent (2012). Annu. Rev. Biophys.

• Is there a statistical rather than energetic explanation? Are some space groups simply easier to achieve?
• How many different ‘ways’ can a protein molecule form crystals in a given space group?
• Given the continuous range of possible molecular orientations and positions, the number of distinct crystalline arrangements is evidently infinite.

• There are different kinds of infinities.
• Suppose each possible configuration (i.e. orientations and positions) of a set of molecules could be described as a point within some (high-dimensional) space.
• What would the ‘solution space’ look like for each space group within this high-dimensional space?

A 6N-dimensional space is needed to describe all possible orientations and positions of N molecules (3 rotations and 3 translations per molecule). Some points in the space must correspond to space group P1, some will represent space group P2, and so on. Most points will not represent crystalline arrangements.

A hypothetical 1-D solution space: an infinite number of solutions falling on a 1-D curve, differing from each other by a change (forwards or back) in a single direction.

A hypothetical 2-D solution space: an infinite number of solutions falling on a 2-D subspace, differing from each other by a change (forwards or back) in a combination of 2 directions.
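One way to make the "different kinds of infinities" point concrete is a small Monte Carlo experiment. This is an illustration added here, not taken from the slides, and it rests on two assumptions: configurations are sampled uniformly in a 3-D cube, and a sample "hits" a solution family if it lies within a tolerance eps of it. A plane (2-D family) and a line (1-D family) stand in for two solution spaces.

```python
import math
import random

# Illustrative assumption: uniform sampling in the cube [-0.5, 0.5]^3, and a
# sample counts as a "solution" if it lies within eps of the solution set.
def fraction_near(n_samples: int, eps: float, seed: int = 0) -> tuple[float, float]:
    """Fraction of samples within eps of a plane (2-D) and of a line (1-D)."""
    rng = random.Random(seed)
    near_plane = near_line = 0
    for _ in range(n_samples):
        x, y, z = (rng.uniform(-0.5, 0.5) for _ in range(3))
        if abs(z) < eps:               # within eps of the plane z = 0
            near_plane += 1
        if x * x + y * y < eps * eps:  # within eps of the line x = y = 0
            near_line += 1
    return near_plane / n_samples, near_line / n_samples

eps = 0.05
plane, line = fraction_near(200_000, eps)
print(f"near 2-D plane: {plane:.4f} (analytic {2 * eps:.4f})")
print(f"near 1-D line:  {line:.4f} (analytic {math.pi * eps ** 2:.4f})")
```

Both families contain infinitely many points, but the volume near a D-dimensional family in an n-dimensional space scales roughly as eps^(n - D), so the higher-dimensional family is hit far more often.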

Protein Crystal Space Group Preferences: Degrees of Freedom Theory for Characterizing the Dimensionality of Different Space Groups

Wukovitz and Yeates (1995). Why protein crystals favour some space-groups over others. Nat. Struct. Biol. 2, 1062-7.

Examining a case (p2mm) where the answer is (sort of) obvious:

The orientation of the molecule is free to change, but the rest of the crystal is fully defined thereafter. Therefore, D=1 for p2mm.


Degrees of freedom for p2mm (counting free variables and constraints)

Rot & Trans (S): rotation yes, trans x yes, trans y yes (S = 3)
Unit Cell (L): a axis yes, b axis yes, gamma -- (fixed at 90° by the rectangular lattice) (L = 2)
Contacts (req.): C = 4

D = S + L – C = 3 + 2 – 4 = 1
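The bookkeeping in the p2mm table can be sketched as code. This is a minimal sketch that assumes each row is recorded as True ("yes", a free variable) or False ("--" or "no", fixed by symmetry). The dictionary layout and the helper name `degrees_of_freedom` are hypothetical and serve only as an illustration.

```python
# Hypothetical helper: each table row is True (free variable) or False (fixed).
def degrees_of_freedom(rot_trans: dict[str, bool], cell: dict[str, bool], contacts: int) -> int:
    """Return D = S + L - C from the free-variable flags and the contact count."""
    s = sum(rot_trans.values())  # S: free rotations/translations (True counts as 1)
    l = sum(cell.values())       # L: free unit-cell parameters
    return s + l - contacts

# p2mm: the mirrors define the origin, and gamma is fixed at 90 degrees
p2mm_rot_trans = {"rotation": True, "trans x": True, "trans y": True}
p2mm_cell = {"a axis": True, "b axis": True, "gamma": False}

d = degrees_of_freedom(p2mm_rot_trans, p2mm_cell, contacts=4)
print(f"p2mm: D = {d}")  # p2mm: D = 1
```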


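Degrees of freedom for p1

Rot & Trans (S): rotation yes, trans x no, trans y no (S = 1)
Unit Cell (L): a axis yes, b axis yes, gamma yes (L = 3)
Contacts (req.): C = 2

In p1 the origin is arbitrary, so translating the molecule does not produce a new arrangement; only the rotation counts.

D = S + L – C = 1 + 3 – 2 = 2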

Degrees of freedom for p2

Rot & Trans (S): rotation yes, trans x yes, trans y yes (S = 3)
Unit Cell (L): a axis yes, b axis yes, gamma yes (L = 3)
Contacts (req.): C = 3

Here the 2-fold axes define the origin, so both translations count.

D = S + L – C = 3 + 3 – 3 = 3
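The three plane-group examples can be compared side by side. In this sketch the (S, L, C) values are copied from the slides. The ranking step only illustrates the statistical argument that a larger D should mean a more frequent symmetry. It is not a fitted model, and the class name `PlaneGroupCount` is hypothetical.

```python
from typing import NamedTuple

class PlaneGroupCount(NamedTuple):
    """Hypothetical record of the free-variable counts for one plane group."""
    name: str
    S: int  # free rotations/translations
    L: int  # free unit-cell parameters
    C: int  # minimum number of contacts required

    @property
    def D(self) -> int:
        """Degrees of freedom, D = S + L - C."""
        return self.S + self.L - self.C

groups = [
    PlaneGroupCount("p2mm", S=3, L=2, C=4),
    PlaneGroupCount("p1", S=1, L=3, C=2),
    PlaneGroupCount("p2", S=3, L=3, C=3),
]

# Order from most to fewest degrees of freedom
ranked = sorted(groups, key=lambda g: g.D, reverse=True)
for g in ranked:
    print(f"{g.name}: D = {g.S} + {g.L} - {g.C} = {g.D}")
```

Run as written, the groups come out in the order p2 (D = 3), p1 (D = 2), p2mm (D = 1). A larger S or L helps, and every required contact costs one degree of freedom.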

The Minimum Contact Number, C: a mathematical description of the constraints implied by molecular connectivity

C is a property of the mathematical group, not the molecule. (Wukovitz & Yeates, Nat. Struct. Biol. 2, 1062 (1995))


The minimum contact number, C, dictates (in part) the number of degrees of freedom, D, available for constructing any given symmetrical arrangement:

D = S + L – C

D can be calculated for each of the 65 possible space group symmetries. Symmetries with more degrees of freedom are expected to occur more frequently, for statistical (rather than energetic) reasons.

Wukovitz & Yeates, Nat. Struct. Biol. 2, 1062 (1995)

The Minimum Contact Number, C: Implications for the space group preference problem

Agreement between the dimensionality for forming different space groups and their observed frequencies:

• Only one space group, P2₁2₁2₁, which dominates in macromolecular crystals, has D=7!!
• A dimensionality analysis explains most of the observed pattern.
• The 65 possible space group symmetries fall into 4 categories of increasing likelihood: D = 4, 5, 6, 7 (a factor of ~8 for each increment in D).

[Figure: observed space group frequencies grouped by D = 7, 6, 5, 4]

So what? Various ideas emerging from the minimum contact number work:

1. A surprise in the achiral space groups!! Revolution or curiosity?
2. The ability to form intermolecular contacts consistent with crystallographic symmetry limits crystallization. This leads to new approaches for crystallizing proteins.
3. An astonishing range of architectures and symmetries can be generated using only two contact types (e.g. symmetric points of protein-protein interaction). This leads to a general approach for designing protein assemblies.

1. A Surprising Discovery

• 65 ‘biological’ space groups
  D=7 (1 space group, P2₁2₁2₁)
  D=6 (13 space groups)
  D=5 (42 space groups)
  D=4 (9 space groups)


• 165 ‘non-biological’ space groups
  D=8!!! (1 space group, P1(bar))
  D=7 (2 space groups: P2₁/c, C2/c)
  D ≤ 6 (162 space groups)

(International Tables for Crystallography)
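A rough calculation shows how the per-D counts for the 65 biological space groups and the "factor of ~8 per increment in D" combine. This sketch rests on two simplifying assumptions made only for illustration. First, every space group with the same D is equally likely. Second, the factor of 8 applies exactly at each step.

```python
FACTOR_PER_D = 8  # approximate likelihood ratio per unit increase in D (from the slides)

# Number of the 65 'biological' (chiral) space groups at each D (from the slides)
groups_per_d = {7: 1, 6: 13, 5: 42, 4: 9}
assert sum(groups_per_d.values()) == 65

def relative_weight(d: int, d_min: int = 4) -> int:
    """Unnormalized likelihood of a single space group with dimensionality d."""
    return FACTOR_PER_D ** (d - d_min)

total = sum(n * relative_weight(d) for d, n in groups_per_d.items())
share_p212121 = relative_weight(7) / total  # P2(1)2(1)2(1) is the only D=7 group

print(f"total weight = {total}")                                  # total weight = 1689
print(f"predicted P212121 share ~ {share_p212121:.0%}")           # ~30%
print(f"one D=7 group vs one D=4 group: {relative_weight(7)}x")   # 512x
```

Under these assumptions, P2₁2₁2₁ alone would take about 30% of crystals, roughly in line with the ~33% quoted for the most common space group. The 512-fold spread between D=7 and D=4 groups also matches the observation that the probabilities span more than 2 orders of magnitude.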


• ‘Non-biological’ space groups require racemic protein mixtures (i.e. the natural protein plus its mirror image, synthesized from D-amino acids).

• This opens up a powerful approach for protein crystallization.

Why P1(bar) is the super-winner

• Inversion centers give a well-defined origin, so S=6 (3 rotations and 3 translations).
• The unit cell is triclinic, so there are 6 variable choices: L=6.
• 4 unique contacts are required for connectivity: one to connect the L- and D-enantiomers, and three for translational connectivity along three directions, so C=4.

D = S + L – C = 6 + 6 – 4 = 8
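The same count can be written out term by term for P1(bar). This minimal sketch follows the slide's split of C into one enantiomer contact plus one translational contact per lattice direction. The variable names are illustrative only. The comparison value D = 7 for P2₁2₁2₁ is taken from the earlier slide.

```python
rotations, translations = 3, 3      # origin fixed by the inversion centers
S = rotations + translations        # S = 6

triclinic_cell = ["a", "b", "c", "alpha", "beta", "gamma"]
L = len(triclinic_cell)             # L = 6: every cell parameter is free

enantiomer_contacts = 1             # joins an L-protein to a D-protein
translational_contacts = 3          # one along each lattice direction
C = enantiomer_contacts + translational_contacts  # C = 4

D = S + L - C
print(f"P1(bar): D = {S} + {L} - {C} = {D}")  # P1(bar): D = 6 + 6 - 4 = 8

D_P212121 = 7  # best chiral space group, value from the slides
print(f"extra degrees of freedom over P212121: {D - D_P212121}")
```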

Mirror image proteins provide a potentially powerful solution to the protein crystallization problem

Predictions from theory:

• Proteins will crystallize much more easily if they can be prepared as a racemic mixture; this requires chemical synthesis of the mirror image protein (i.e. from D-amino acids).
• P1(bar) will dominate for racemic crystallization of proteins; this highly specific prediction provides a powerful test of the theoretical ideas.

Yeates and Kent (2012). Annu. Rev. Biophys.

Racemic ‘macromolecule’ crystal data available by the late ’90s:

‘macromolecule’              space group
rubredoxin                   P1(bar)
leu-enkephalin               P1(bar)
d(CGCGCG)                    P1(bar)
trichogin A                  P1(bar)
α-1 (designed peptide)       P1(bar)
monellin (sweet protein)     P1(bar)*

Racemic protein crystals based on new synthetic methods: native chemical ligation (Stephen Kent and collaborators)

Dawson PE, Kent SB. Annu. Rev. Biochem. 2000;69:923-60.

• Banigan JR, Mandal K, Sawaya MR, Thammavongsa V, Hendrickx AP, et al. 2010. Protein Sci 19: 1840-49.
• Pentelute BL, Mandal K, Gates ZP, Sawaya MR, Yeates TO, Kent SB. Chem Commun (Camb.) 46: 8174-6.
• Sawaya, et al. Single Wavelength Phasing Strategy for Quasi-Racemic Protein Crystal Diffraction Data. Acta Cryst D.
• Yeates and Kent, (2011). Ann. Rev. Biophys.

Current statistics for racemic protein crystals

• Amazingly good agreement with predictions.
• A few cases where the racemate crystallized easily and the single enantiomer did not, but the data are still somewhat anecdotal.
• No obvious trend in resolution improvement.
• Methods have not been tested systematically in any situation where the single enantiomer gave only limited resolution.

N.B. Essentially no tendency has been observed so far for racemates to resolve into separate crystals in chiral space groups.

Yeates and Kent, (2011). Ann. Rev. Biophys.

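Phasing Considerations for Racemic Crystallography

[Figure: centrosymmetric space groups; acentric vs. centric phases]

N.B. The space groups predicted to be preferred by racemic protein crystals (P1(bar), P2₁/c, and C2/c) are all centrosymmetric, so every reflection is centric: with the origin on an inversion center, each phase is restricted to 0° or 180°.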
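The centric-phase restriction can be checked numerically. This minimal sketch assumes point atoms with equal, real scattering factors and places the origin on an inversion center, so every atom at fractional position x has a partner at -x. The paired terms then sum to 2f cos(2π h·x), which is real. The coordinates are arbitrary made-up values, not a real structure, and the helper names are hypothetical.

```python
import cmath
import math

# Arbitrary, made-up fractional coordinates (not a real structure)
half_cell = [(0.12, 0.31, 0.45), (0.27, 0.08, 0.66), (0.41, 0.53, 0.19)]
# Centrosymmetric arrangement: add the inversion mate (-x, -y, -z) of each atom
centric_atoms = half_cell + [(-x, -y, -z) for x, y, z in half_cell]

def structure_factor(hkl, positions, f=1.0):
    """F(hkl) = sum_j f * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)) for point atoms."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for x, y, z in positions)

def centric_phase(F, tol=1e-9):
    """Return 0 or 180 degrees for a (numerically) real structure factor."""
    if abs(F.imag) > tol:
        raise ValueError("structure factor is not real; reflection is acentric")
    return 0 if F.real >= 0 else 180

for hkl in [(1, 0, 0), (1, 2, 3), (2, -1, 4)]:
    F_acentric = structure_factor(hkl, half_cell)   # no inversion symmetry
    F_centric = structure_factor(hkl, centric_atoms)
    print(hkl,
          f"acentric phase = {math.degrees(cmath.phase(F_acentric)):.1f} deg,",
          f"centric phase = {centric_phase(F_centric)} deg")
```

Without the inversion mates, the phases take arbitrary values. With them, each phase collapses to 0° or 180°, a binary choice that makes centric data easier to phase.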

Implications of Racemic Protein Crystallography

• Crystallization of racemic proteins might be 3 or 4 times easier than crystallizing ordinary (chiral) proteins.
• A potentially powerful solution to the major bottleneck in macromolecular crystallography, especially if size and cost barriers to synthesis can be lowered.
• Potentially interesting phasing strategies: e.g., single-wavelength, in-house, iodo-tyrosine quasi-racemic.

2. Making Proteins (and Nucleic Acids) More Amenable to Crystallization: ‘Synthetic Symmetrization’

Symmetry can be built into an otherwise asymmetric molecule (e.g. with engineered cysteines). Two key features:

• Symmetry improves crystallization chances somewhat (~50%), based on database analysis.
• A given macromolecule can be dimerized in multiple different ways, giving rise to multiple entirely distinct chances for crystallization.
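The payoff from "multiple entirely distinct chances" can be estimated with simple probability. This back-of-the-envelope sketch assumes each distinct dimer arrangement is an independent crystallization attempt with the same success probability p. The value of p used below is hypothetical and is not measured data.

```python
def chance_of_at_least_one(p: float, k: int) -> float:
    """Probability that at least one of k independent forms crystallizes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return 1.0 - (1.0 - p) ** k

p_single = 0.2  # hypothetical per-form success probability
for k in (1, 2, 4, 6):
    print(f"{k} distinct forms: {chance_of_at_least_one(p_single, k):.2f}")
```

With these made-up numbers, the overall chance rises quickly as more independent forms are added. In practice, forms of the same protein are unlikely to be fully independent, so this is an upper-bound style estimate.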

Successful results using disulfide-based synthetic symmetrization in a model protein system (T4 lysozyme): 6 new crystal forms of lysozyme.

Banatao, et al. (2006). PNAS 103, 16230-5.

Successful results using disulfide-based synthetic symmetrization to crystallize a new protein: Cel A endoglucanase from T. maritima.

Forse, et al. (2011) Prot. Sci. 20, 168-78

Successful results using metal-based synthetic symmetrization in model protein systems (T4 lysozyme and MBP).

Laganowski, Zhao, Soriaga, et al. (2011) Prot. Sci. 20, 1876-90.

[Figure: attaching a target protein (N, C termini) to split GFP (GFP 1-9 plus strands S10 and S11), by terminal fusion or by insertion into a permissive loop]

A Facile Strategy for Applying Synthetic Symmetrization: Factoring out the protein engineering component

Part 1: method of attachment to a carrier protein, split GFP. Terminal fusion, or loop insertion, which gives two chain crossings!

Hau, et al. (in press)

Part 2. Synthetic symmetrization of the carrier protein, GFP

[Figure: GFP showing the 10-11 hairpin and candidate cysteine sites K26, D102, D117, Q157, D173 and D190; K26C and D190C monomer and dimer]

Purification summary:
- Ni²⁺ IMAC in non-reducing conditions
- disulfide formation with CuSO4 at pH 9.0
- ion-exchange to separate species

Surface-exposed charged residues were selected as sites for single cysteine insertion (into a cysteine-free background):
- on the face opposite the 10-11 hairpin
- at the ends of the β-strands or in loops, to allow the disulfide to form
- mutations made in a ‘Cys-less’ GFP backbone

Crystallizability of Synthetically Dimerized GFPs

Mutant  Cloned  Dimers  Crystals  Xtal conditions#  Unique structures  Space groups                          Resolution
K26C    ✔       ✔       ✔         ~5                2                  P3₂21, P2₁2₁2₁                        1.9-3.2 Å
D102C   ✔       ✔       ✔         30+               4                  P1, P2₁2₁2₁                           3.1-3.6 Å
D117C   ✔       ✔       ✔         10+               5                  P6₃, P6₄22, P3₁21, P4₁22, I4₁22       1.7-2.9 Å
D173C   ✗       ✗       ✗         ✗                 ✗                  ✗                                     ✗
Q157C   ✔       ✔       ✔         2                 ✗                  ✗                                     ✗
D190C   ✔       ✔       ✔         ~10               2                  P2₁2₁2₁, P6₁                          2.7-3.1 Å
Total                             ~60+              13

All dimers were screened in 5 commercial screens each (PACT, JCSG+, SaltRX, Wizard, CS 1+2); each mutant tends to crystallize in different conditions.
# Non-duplicate conditions with crystals, from Mosquito trays.

D102C readily forms plate crystals in conditions containing PEGs; its solved structures are in P1, with unique arrangements of dimers in the asymmetric unit.

Diverse Arrangements of Dimerized GFPs

[Figure: dimer arrangements for K26C (1.9 Å, P2₁2₁2₁; 3.2 Å, P3₂21), D102C (3.1-3.6 Å, P1; 3.1 Å, P2₁2₁2₁), D117C (1.7-2.9 Å, all crystals) and D190C (2.65-3.1 Å, both crystals)]

Six distinct arrangements of the GFP dimer have been demonstrated (refined) so far, and several more are in process. These will constitute a suite of independent partners for crystallization.

Combining just two contact types can give rise to complex architectures.

[Figure: an architecture built from a 4-fold contact and a 2-fold contact]

3. Extension of the Symmetric Contact Idea to a Strategy for Designing Self-Assembling Protein Materials

• Natural oligomeric (e.g. dimeric and trimeric) proteins can serve as the building blocks.
• Fusing two such proteins together (e.g. by genetic engineering) provides the two interactions needed for a rich variety of designs.

[Figure: natural protein dimers and trimers, building blocks for designed self-assembly]

A General Method for Designing Self-Assembling Protein Materials

Padilla, Colovos, & Yeates, PNAS, 98, 2217 (2001)

• Fusion of two simple oligomers (e.g. dimer + trimer)
• Outcome dictated by the geometry of the symmetry axes
• Use of a continuous α-helix to dictate geometry

Design Rules (dimers and trimers)

Symmetry          Construction         Geometry of symmetry elements
Cages and shells
  T               Dimer-Trimer         54.7°, intersecting
  O               Dimer-Trimer         35.3°, intersecting
  I               Dimer-Trimer         20.9°, intersecting
Double-layer rings
  Dn              Dimer-Dimer          180°/n, intersecting
Two-dimensional layers
  p6              Dimer-Trimer         0°, non-intersecting
  p321            Dimer-Trimer         90°, non-intersecting
  p3              Trimer-Trimer        0°, non-intersecting
Three-dimensional crystals
  I2₁3            Dimer-Trimer         54.7°, non-intersecting
  P4₁32 or P4₃32  Dimer-Trimer         35.3°, non-intersecting
  P23             Trimer-Trimer        70.5°, non-intersecting
Helical filaments
  Helical         Dimer-Dimer          any angle, non-intersecting
Tubes of indefinite length
  Tubular         Dimer-Dimer-Dimer    N, N, N, each intersecting the cylinder axis perpendicularly
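The axis angles in the design-rules table follow from ideal symmetry-axis directions. This minimal sketch recomputes them using standard axis choices for illustration: cube edges, face diagonals and body diagonals for T and O, and an icosahedron with vertices (0, ±1, ±φ) and their cyclic permutations for I.

```python
import math

def angle_deg(u, v):
    """Angle between two axis directions, folded into [0, 90] degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(abs(dot) / norm))

phi = (1 + math.sqrt(5)) / 2  # golden ratio

# Tetrahedral: 2-fold along a cube edge direction, 3-fold along a body diagonal
t_angle = angle_deg((1, 0, 0), (1, 1, 1))
# Octahedral: 2-fold along a face diagonal, nearest 3-fold along a body diagonal
o_angle = angle_deg((1, 1, 0), (1, 1, 1))
# Icosahedral: 3-fold through the center of face (0,1,phi), (0,-1,phi), (phi,0,1);
# 2-fold through the midpoint of its edge (0,1,phi)-(0,-1,phi), i.e. along z
face_center = (phi / 3, 0, (2 * phi + 1) / 3)
i_angle = angle_deg((0, 0, 1), face_center)
# Two 3-folds along different body diagonals (the P23 trimer-trimer entry)
three_three = angle_deg((1, 1, 1), (1, 1, -1))

for label, a in [("T", t_angle), ("O", o_angle), ("I", i_angle), ("3-fold/3-fold", three_three)]:
    print(f"{label}: {a:.1f} deg")
```

The printed values reproduce the 54.7°, 35.3°, 20.9° and 70.5° entries in the table. The same angles reappear for I2₁3 and P4₁32 because those crystals use the same axis pairs, but with axes that do not intersect.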

[Figure: example cages, 2-D layers, 3-D crystals, and filaments and rods; in a tetrahedral (T) cage the 2-fold and 3-fold axes intersect at 54.7°]

A first, partially successful, experiment

[Figure: designed fusion with its 2-fold and 3-fold symmetry axes]

Discrete assemblies formed, but they were too polymorphic to characterize in detail (e.g. by crystallization).

Padilla, Colovos, & Yeates, PNAS, 98, 2217 (2001)

A model of the intended assembly:
• 12 subunits
• Tetrahedral symmetry
• 160 Å diameter

Trimer: bromoperoxidase. Dimer: influenza matrix protein M1. 9-residue linker: KALEAQKQK. Geometric design requirement: symmetry axes intersecting at 54.7°.

Designed fusion: based on a database search for a dimer-trimer pair (ending in helices) that could be fused to give the required target geometry.

Closer inspection (11 years later) suggests that two amino acid changes could promote the desired geometry:
• Lys118 was mutated to alanine to avoid a clash with the linker.
• Gln24 was mutated to valine to attract the leucine on the linker.
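The "12 subunits" figure follows from counting the rotations in the tetrahedral group. This minimal sketch assumes the simplest case of one subunit per symmetry operation. The subunit count is then the identity plus n - 1 rotations about each n-fold axis. The helper name is hypothetical.

```python
def rotation_group_order(axes: dict[int, int]) -> int:
    """axes maps n (fold) -> number of distinct n-fold axes in the group."""
    return 1 + sum(count * (n - 1) for n, count in axes.items())

POINT_GROUP_AXES = {
    "T": {3: 4, 2: 3},          # tetrahedral: 4 three-fold, 3 two-fold axes
    "O": {4: 3, 3: 4, 2: 6},    # octahedral
    "I": {5: 6, 3: 10, 2: 15},  # icosahedral
}

for name, axes in POINT_GROUP_AXES.items():
    print(f"{name}: {rotation_group_order(axes)} subunits")
```

For T, the count is 12: a tetrahedral dimer-trimer fusion cage is built from 4 trimers, which is the same as 6 dimers. For comparison, O gives 24 and I gives 60.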

A first atomic structure of a designed protein cage
• 12 subunits
• Pseudo-tetrahedral symmetry
• Partially flattened (crystal packing and weak helical linker)
• 3 Å resolution
• Three independent cages in two crystal forms
• 70 Å diameter (hypothetical) inner sphere

Lai, Y.-T., Cascio, D. and Yeates, T.O. (2012). Science 336, 1129.

Crystal structure: matches design to within 1 Å!

24-subunit protein cage (500 kDa): a natural trimer with a dimeric interface designed computationally according to the geometric requirements for intersection of symmetry axes.

[Figure: designed model]

King, N.P. et al. (2012). Science 336, 1171-4.

Summary

• Symmetry ideas are useful for understanding what limits the formation of protein assemblies.
• With regard to space groups, one appreciates the origins of key patterns, and also predicts that racemic crystallography could provide an important long-term strategy.
• Symmetry, when engineered in variational forms, could be a powerful strategy for increasing favorable crystallization space.
• Explicitly engineering various symmetric architectures becomes possible, with exciting biomedical applications.

Yeates Lab: Mike Thompson, Julien Jorda, Nicole Wheatley, Yen-Ting Lai, Dan McNamara, Sunny Chun, Danny Gidaniyan, David Leibly, Allan Pang, Yuxi Liu, Inna Pashkov, Neil King (former), Duilio Cascio, Michael Sawaya

Collaborators: Stephen Kent (U. Chicago), Thomas Bobik (Iowa State), David Baker (Univ. Wash), Tom Terwilliger (Los Alamos), Geoff Waldo (Los Alamos)

Funding: NIH, NSF, DOE