Protein expression in E. coli: Lessons from structural biology

Problem 1: Structural integrity

Problem 3: Size

Problem 2: Space and time dependent interaction network

Problem 4: Unstructured pieces

Specials: Expression vectors, NMR use, Tags

Problem 5: Codon usage

How complex it can be: natural synthesis of actin

Polymerization is controlled

Binds a number of proteins: nebulin, tropomyosin, troponins, myosin, thymosin, profilin, gelsolin, actinin ...

mRNA transport to specific location

Problem: folding pathway / controlled interactions

Specific eukaryotic chaperone pathway

Acetylated N-terminus

Where a lot of proteins interact: the muscle

• Strong interactions

• Strong forces

• Many interactions

• Highly regulated interactions

• Flexibility

3 megadalton protein structure to solve: a Titanic enterprise

Problem: size / unstructured pieces

Typical medium sized protein

• Independent domains
• Head to tail interaction
• Posttranslational modification
• Conformational change
• Linear peptides
• Protein/protein interaction
• Protein/lipid interaction

insoluble

Domain phasing

soluble

insoluble

N C

Domain boundaries are well defined

Defining a domain with multiple sequence alignment

Domain boundaries of the KH domain of FMR were predicted.

The prokaryotic member NusA stops here

Expressing the KH domain defined by multiple sequence alignment

After prediction of the domain boundaries, the produced protein was very unstable and poorly expressed.

More variants

Redefining a domain after expression

Domain boundaries of the KH domain after expression and solving the structure

Missing helix of KH domain

Domain phasing: too much input

The multiple sequence alignment includes a prokaryotic sequence with a different topology.

Domain phasing: KH domain +

Qua2 region: an additional 10 aa secondary structure element is important for specific binding to ss branchpoint RNA.

Domain phasing: the missing domain

Complex domain interface

No independence

Domain interfaces vary and change in other structural contexts

Attempts to express the PH domain (443-551) in E. coli were not successful: the protein product is insoluble. An extension at the N terminus with a small part of the DbH domain (422-551) is soluble.

A unique fold does not mean much for the expression result!

protein function is unique

protein context of isolated domain is unique

PH domains with unique fold

SOS PH domain, BTK PH domain

Get rid of flexible bits

FERM structure: N- and C-termini come together

• Flexible central bit removed
• N- and C-terminal pieces independently expressed
• Reassembled complex

X-ray samples

N-WASP EVH1 Domain Sequence

• Domain construct insoluble
• Constructs of domain fused to a minimal binding peptide via (Gly-Ser-Gly-Ser-Gly) linkers
• One version yielded highly soluble protein

Physical fusion of a ligand

(Gly-Ser-Gly-Ser-Gly)

• Multiple sequence alignment predicts potential RNA binding; classifies RRM fold

• First design gives strange data

• “RRM” site blocked by nonaligned sequence

• Protein interaction module instead of RRM?

Conceptual mistake

ends FW

Same fold - different meaning

insoluble

Domain independent?

soluble

Domain A is integrated in structure

Domain A is partially independent

Rule of thumb:

N-end rule

start and end hydrophilic
secondary structure
next neighbour domain

full length limited proteolysis

Are there simple rules ?

A

Message : Wrong borders - no folding - no expression system

Tag or no tag?

Problems of dimeric tag (GST)

N

N

Domain or C-terminal pieces missing due to degradation, translational stops etc.

Domain phasing: multiple constructs

soluble

N C

Staggered oligos: the winning team

Typical week

X-tals

• multiple PCRs and vectors
• scale up

• 15N sample

• solubility screen

???

339-416, 345-404, 345-416, 339-404

NMR screening of domain boundaries

N and C-termini are now in good shape
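
The staggered-oligo idea can be sketched in a few lines: every forward (N-terminal) primer position is paired with every reverse (C-terminal) primer position, which yields the four boundary variants listed for the NMR screen. This is only an illustration; the function name `staggered_constructs` and the `min_length` filter are assumptions, not part of the original slides.

```python
from itertools import product

def staggered_constructs(starts, ends, min_length=1):
    """Pair every N-terminal start with every C-terminal end (residue numbers)."""
    return [
        (start, end)
        for start, end in product(sorted(starts), sorted(ends))
        if end - start + 1 >= min_length  # drop pairings that are too short
    ]

# Boundary positions taken from the slide: starts 339/345, ends 404/416.
constructs = staggered_constructs(starts=[339, 345], ends=[404, 416])
for start, end in constructs:
    print(f"{start}-{end}: {end - start + 1} residues")
```

Two forward and two reverse primers already give four constructs, so the number of variants grows quickly with each additional oligo.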

SMN tudor domain

X-ray construct

• Creation of mutant library
random mutations / DNA shuffling
deletion series
like nuclease treated full length DNA

• Reporter protein
fusion to C-terminus of target protein
if reporter folds it will give signal
N-terminus should be folded

Is your protein folded ?

CAT, GFP, complementation, marker genes / proteomics

Combinatorial libraries together with reporter

PCR via multiple phased primers

Enzymatic or physical breaks

Nuclease truncation

Error prone PCR

Mutate and DNA shuffling

Reporter fusions

Coexpression: inclusion bodies

Myc doesn’t form homodimers like Max and is expressed in inclusion bodies in E. coli.

Very hydrophobic interface

Coexpression: no inclusion bodies

Myc forms a stable heterodimer with Max and is expressed soluble in the complex.

Very hydrophobic interface

Transcriptional coactivator fails to interact with transcription factor in the tube

Coexpression: protein association only in vivo

E. coli or cotranslation

DCoH

+

HNF1 Complex

Heterodimers of 2 different complexes

One partner is tagged, the other not.

+ + +

Max/Myc HNF1/DCoH

Coexpression: results

Coexpression: two-plasmid

KanR ori1 +

AmpR ori2

gene I gene II

Coexpression: dicistronic

XbaI SpeI SD gene I

SD gene II XbaI

T7

gene I gene II

+ RNase deficient strain BL21 Star

Advantage in cotranslational folding: assembly of 7 nucleoporins into a 0.5 MDa complex [Lutzmann et al.]

Coexpression varies

Dicistronic variations

Staggered distances of 2nd translation initiation site

+/ - + + +

Where to go for coexpression?

• Lac repressor: pREP4

• T7 Lysozyme : pLysS

• rare tRNAs : CodonPlus strain

• Protein modification :

• ASF/SF2 phosphorylation by SRPK1

• Farnesyl group by transferase

• Heterodimers max/myc HNF1/DCoH

• Chaperones groEL/ES increases solubility of csk

• TEV protease in MBP Tev fusions increases solubility of passenger protein

Message: Wrong partners - no folding - no expression system

Modifications?

• Arg, Lys methylation
SR domains, histones
• Ser, Thr, Tyr phosphorylation
• Lipids like myristoyl groups
• Glycosylations

Operon of Campylobacter in BL21

Modi … OPERON

n

Rare codon effects: transcription factor expressed in E. coli creates another subband

MW - + - + - + IPTG

ADD.PEPTIDE

FULL LENGTH

AGGAGG CGACGG ptRNA

Codon usage in E. coli

• Arg (R)
• Ile (I)
• Leu (L)
• Pro (P)
• Gly (G)

Rare codon effects: frame shift causes longer product

ADD.PEPTIDE

FULL LENGTH

CGG CAG… … TAACG GCA… … AAX … C GGC … … AXX …

Rare codon effects: misincorporation of lysine

The FMR KH domain shows a strange second species that differs by 28 Da in mass

ADD. PEAK

FULL LENGTH

Reason: rare arginine codons (AGA or AGG) are loaded with lysine tRNAs; MW difference of 28 Da
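
As a quick check of the 28 Da figure, the mass shift of a single substitution can be computed from average residue masses. The sketch below assumes standard average residue masses (amino acid minus water); the dictionary and function names are illustrative only.

```python
# Average residue masses in daltons (amino acid minus one water).
# Standard textbook values; treat them as an assumption of this sketch.
RESIDUE_MASS = {"R": 156.1875, "K": 128.1741, "Q": 128.1307}

def substitution_shift(original, replacement):
    """Mass change in Da when `original` is replaced by `replacement`."""
    return RESIDUE_MASS[replacement] - RESIDUE_MASS[original]

arg_to_lys = substitution_shift("R", "K")  # AGA/AGG read as lysine
arg_to_gln = substitution_shift("R", "Q")  # CGG read as glutamine
print(f"Arg -> Lys: {arg_to_lys:+.2f} Da")
print(f"Arg -> Gln: {arg_to_gln:+.2f} Da")
```

Both substitutions make the protein lighter by roughly 28 Da and differ from each other by only a few hundredths of a dalton, so a low-resolution mass spectrum alone will not tell them apart.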

Rare codon effects: proteolysis

• 2 central consecutive rare codons cause a very low expression level of Tev protease

• Causes processive degradation of the nascent polypeptide at the slowed translation point

……AGGAGG……. 49 50

……AGGAGG

Ribosome falls off

Proteolysis coupled to translational pausing

Codon usage problems

Signs:

• Mass difference: AGA loads AAA [K] / CGG loads CAG [Q]
• Consecutive rare codon spots
• Protein ladder after purification with N-tag or
• No expression
• Signs of toxicity

Solutions:

• Codon plus / Rosetta strains
• Patch of rare codons: partial gene synthesis
• Scattered rare codons: gene synthesis
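
Before choosing between a helper strain, partial synthesis, or full gene synthesis, it helps to see where the rare codons sit. The sketch below scans a coding sequence in frame and reports rare codons and directly consecutive pairs. The codon set is an assumption (a commonly cited list for the residues named on the slides: Arg, Ile, Leu, Pro, Gly), and the function name is hypothetical.

```python
# Assumption: commonly cited rare E. coli codons for Arg, Ile, Leu, Pro, Gly.
# Adjust the set to the strain and codon usage table you actually use.
RARE_CODONS = {"AGA", "AGG", "CGA", "CGG", "ATA", "CTA", "CCC", "GGA"}

def rare_codon_report(cds):
    """Return (rare codon positions, consecutive rare pairs), 1-based codon numbers."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    rare = [n for n, codon in enumerate(codons, start=1) if codon in RARE_CODONS]
    pairs = [(a, b) for a, b in zip(rare, rare[1:]) if b == a + 1]
    return rare, pairs

# Toy sequence: ATG AAA AGG AGG CTG CCC TAA
rare, pairs = rare_codon_report("ATGAAAAGGAGGCTGCCCTAA")
print("rare codons at:", rare)
print("consecutive rare pairs:", pairs)
```

A few clustered hits point towards patching that stretch, whereas hits scattered over the whole gene point towards a helper strain or full synthesis.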

Leaky promoter: toxicity of membrane associated domain

• No expression in upscaling of BL21(DE3)
• Cells die on plate

Solution:

BL21(DE3) pLysS or pLysE: the product switches off T7 RNA polymerase

1% glucose in medium controls via catabolite repression

Use of more stable promoters: arabinose

Reason:

• Media with minor amounts of lactose

• T7 RNA polymerase is IPTG controlled and will be leaky

• Target gene is transcribed and translated already during the upgrowth

Expression story 1: extracellular Ig domain with disulfide bridge

Screening of tags:

• GST
• His
• trx
• dsbA

Screening of protease cleavage sites:

• Thrombin
• Tev protease

Result: dsbA Tev with low yield but folded

Leaderless version of dsbA

Strains with mutations in redox system [Origami…]

Example 2: purification of protein/peptide ligand complex

Zrepeat actinin

TevH6

Zrepeat actinin

Co-lysis:

Cells with H6GST tagged Zrepeat peptide mixed with cells expressing unfused actinin domain

Double Tag to get full length

Natural unstructured titin peptide [PEVK-element]

Recent publication on purification of a recombinant PEVK fragment

Double Tag vector

Why it’s important:

• Design might be wrong and peptide is unfolded

• Design is OK and peptide is unstructured

Ligand interaction

Unstructured = Native?

Linker influences protease cleavage

GB1 carrier with domain x is not cleaved by thrombin

+ thrombin

+ thrombin

Addition of a 5 amino acid linker = cleaved by thrombin

His-tag

protease cleavage site: Tev/Prec/Ek/Fxa/thrombin

2nd affinity tag: GST / MBP

NcoI

The M-series vectors

Carrier protein + affinity tag: dsb / trx / nusA ...

C-His

Non homogeneous
different linkers
different control genes
not easy to subclone

The Vectors

gene x

The new vectors

• Independent modules
• Compatible overlaps
• Multiple shufflings

• Vector backbone

• Carrier protein

• His affinity Tag

• Compatible genetic fusion site

• Linkers / specific protease cleavage

• Control gene

TevH6

Cloned cassettes:

• Carriers

• Linkers

• Protease sites

• HisTag

• Vector backbone

Promoter Passenger Carrier

Vector backbone

origin resistance

The Vectors: typical structure

Affinity_high stability_high production_cleavable

The Vectors

Ribosome

please

Translation cassette

Transcription cassette

rbs sense Lin sense

Lin rev

Screen: Tryptophan hydroxylase

Screen: Actinin head

Carrier proteins + CH domains

Protease

Parallel preparation:

Expression of a Ni-column sensitive protein: central spliceosomal protein p14

Screening of carriers and vectors:

• pGEX Prec.
• His pET
• trx pET
• Z-tag pET

Screening of protease sites:

• PreScission
• Thrombin
• Tev protease

Result:

• Protein is highly expressed and soluble in all pET vectors, but precipitates after Tev cleavage

• pGEX expression after PreScission cut: soluble, no precipitation, very low yield

• Construction of mixed cassettes in pET
• GST on GT column with PreScission cut plus in pET backbone gives NMR sample

Literature: Spadaccini et al. 2006, RNA

Short linear peptides

SF1 1-25

U2AF65 85-112 H6Trx PreS

H6 GST TEV

GST TEV H6

SF3b 317-357

H6 GST TEV

H6Trx PreS

Complex structure solved, stable peptide

Peptide degraded from C-term (MS)

Cleavable, no degradation (MS)

Cleavable, but degraded from C-term (MS)

Cleavable, no degradation

H6-GST-TEV Trx-H6-PreS

MBP creates artefacts: soluble inclusion bodies

MBP

• Misfolded peptide forms aggregate
• MBP forms soluble shield around

Literature: Nomine et al. 2001, Protein Exp. Purification

+ Tev

• No cleavage or
• Target protein precipitates

Direct fusion

MBP

Creation of defined N-terminal residue from fusion protein

Tev cleaves N-Cys and most of the other amino acids at position P-1

Literature: Kapust et al. 2002, Biochem. Biophys. Res. Commun.

List of vectors on the web

1

pETM13N_HIS_NUSA_GSTEV_GFP

7559bp

KanR

ori

T7

YFP

lacI

His6

Tev

nusA-Carrier

XhoI (158), NotI, BamHI, Acc65I (204)

NcoI (932)

XbaI

File with Map/Features/Sequence

Features = Data file
• MW, pI
• Gels
• Purification data

Carrier His_GSTev/ N_His

His_PreScission/ N_His

His_Enterokinase/ N_His

His_Thrombin/ N_His

N_His __ dir C-His A/B

Trx */* */* */ */ * *

GST */* */* */* */* * *

MBP */* */* * * * *

DsbA */ */ *

NusA */* */* * *

DsbC * *

Ztag1

Ztag2 */* */* * *

GB1 */* */* * *

DsbAin * * *

DsbCin * * *

EFtag * *

Mistic

ZZ tag * *

Multiple cloning site

T7/lacO promoter -->                       XbaI
TACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGT
ATGCTGAGTGATATCCCCTTAACACTCGCCTATTGTTAAGGGGAGATCTTTATTAAAACA

           rbs                His-tag
TTAACTTTAAGAAGGAGATATACCATGAAACATCACCATCACCATCACCCCATGAAAATC
AATTGAAATTCTTCCTCTATATGGTACTTTGTAGTGGTAGTGGTAGTGGGGTACTTTTAG
                        MetLysHisHisHisHisHisHisProMetLysIle

GAAGAAGGTAAACTG....1068 bp....CAGACTAATTCGGGATCTGGCAGTGGTTCT
CTTCTTCCATTTGAC..MBP-carrier..GTCTGATTAAGCCCTAGACCGTCACCAAGA
GluGluGlyLysLeu.... 356 aa....GlnThrAsnSerGlySerGlySerGlySer

Tev-site               NcoI
GAGAATCTTTATTTTCAG GGCGCCATGGGCAAAGTGAGC ..705 bp.. TACAAGTAA
CTCTTAGAAATAAAAGTC CCGCGGTACCCGTTTCACTCG ..  GFP .. ATGTTCATT
GluAsnLeuTyrPheGln|GlyAlaMetGlyLysValSer ..235 aa.. TyrLys***

Acc65I BamHI EcoRI SacI SalI HindIII NotI XhoI
GGTACCGGATCCGAATTCGAGCTCCGTCGACAAGCTTGCGGCCGCACTCGAGCACCACCA
CCATGGCCTAGGCTTAAGCTCGAGGCAGCTGTTCGAACGCCGGCGTGAGCTCGTGGTGGT
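
The annotated fragments of the multiple cloning site can be checked by translating them with the standard genetic code. The sketch below is a minimal, self-contained translator (the helper name `translate` is illustrative); the DNA fragments are copied from the map above. It confirms the His-tag leader, the Ser-Gly linker, and the Tev recognition sequence ENLYFQ followed by Gly. It also shows that the first MBP codons, GAA GAA, encode Glu-Glu.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string in frame 1; whitespace is ignored."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

his_leader = translate("ATGAAACATCACCATCACCATCACCCCATGAAAATC")
mbp_start = translate("GAAGAAGGTAAACTG")
linker = translate("CAGACTAATTCGGGATCTGGCAGTGGTTCT")
tev_region = translate("GAGAATCTTTATTTTCAG GGCGCCATGGGCAAAGTGAGC")

# Tev protease cuts after the Gln of ENLYFQ.
cut = tev_region.index("ENLYFQ") + len("ENLYFQ")
print(his_leader, mbp_start, linker)
print(tev_region[:cut] + "|" + tev_region[cut:])
```

After cleavage the passenger starts with the residues to the right of the bar, so any construct cloned via NcoI carries those extra N-terminal residues.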

Mysterious Mistic

• Dual topology integral membrane protein from B. subtilis
– 110-amino acid (13 kD) monomer
– highly hydrophilic
– associates tightly with the membrane in E. coli
• Membrane protein expression vector
• Flexible fusion site

General:

• Carrier protein directly fused
• Increases solubility
GB1/ZZ/Trx/MBP
• Increases stability
• Optimizes crystallization conditions
• Dictates crystal contacts
Myosin
• Helps solving structure
Myosin

Fusion of difficult target proteins

protein xprotein x

Where

SET: Trx fusion to produce short peptides

Folded/Structured/Design?

Ligand interaction

NMR can help

In-cell NMR

• In-cell NMR of FlgM shows structure

• In vitro unstructured

• Plus BSA 400 mg/ml structured

Molecular crowding helps folding

Measurements under in vivo conditions

• Phosphorylation

• Ligand binding (drugs)

• Conformational changes

Why E. coli? 7 or 70 years

Protein N-terminus is structurally sensitive; rare codons; phosphorylated in vivo

GST dimer? C-His

• Optimization of expression
• Rare codons in centre mutated
• Different tags for different purposes

• Double and triple domains C-His
• Quantitative phosphorylation with baculovirus-expressed PKC theta
• SAXS experiments
• NMR/X-ray

Lessons from protein structures: expression problems in E. coli

Multiple constructs/coexpression of modifiers

Cut in pieces and reassemble

Coexpression strategies or/and reassemble

Unstructured doesn’t mean unfolded or non native

Problem 1: Structural integrity

Problem 3: Size

Problem 2: Space and time dependent interaction network

Problem 4: Unstructured pieces

Thanks to

Tom Ceska
Dietrich Suck
Ralf Ficner
Annalisa Pastore
Siegfried Labeit
Uwe Sauer
Michael Sattler
Maria Macias
Ari Geerlof
Hans van der Zandt
David Drechsel
Gilles Trave
Sebastian Charbonnier
Arnt Raae
Giovanna Musco

Target specific affinity columns

• Library of FNIII domain scaffold on surface loop
• Phage display or yeast two hybrid with target
• Matrix with monobody

Single protein production systems

• Endoribonuclease ACA specific
• Change gene to non ACA codon usage by gene synthesis
• Induce nuclease in production phase
• No production of E. coli proteins
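
For the single protein production approach the target gene must contain no ACA triplets, so that its mRNA survives the ACA-specific endoribonuclease. A first step before ordering a synthetic gene is to list every ACA in the transcript. The sketch below does only that; the function name `aca_sites` is hypothetical and no recoding is attempted.

```python
def aca_sites(cds):
    """Return (1-based nucleotide position, frame offset) for every ACA triplet.

    Offset 0 means the ACA is itself a codon (Thr); offsets 1 and 2 mean the
    triplet spans two neighbouring codons.
    """
    seq = cds.upper().replace("U", "T")
    return [(i + 1, i % 3) for i in range(len(seq) - 2) if seq[i:i + 3] == "ACA"]

# Toy sequence: ATG ACA AAC ATT TAA
sites = aca_sites("ATGACAAACATTTAA")
for position, offset in sites:
    print(f"ACA at nt {position}, frame offset {offset}")
```

In-frame hits can be replaced by another Thr codon, while out-of-frame hits need a synonymous change in one of the two codons they overlap. The nuclease recognises the sequence regardless of reading frame, which is why all three frames are scanned.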

Future of E. coli:

Labeling techniques
• Specific and unnatural amino acids
• NMR
• D2O

Posttranslational modifications
Intein ligation
Library and screen methods
In-cell NMR
SPP: single protein production systems
Monobodies for target specific affinity columns

Physical fusion of a modified ligand

Domain or protein interaction but affinity too low; useful with modified peptides or regulatory peptides

Domain boundaries:

C

cDNA

1 104 122 229 351243

E E E

pH1 pH2dep

C-1/K4/S16/S40/Q43/S57/G63/T74/K75/E87/N97

A110/S113/S117

123/128/S132/136/N144/S167/173/175/182/Q191/S199/T204

N260/N273/299/304/S309/A330/341/A346

S231/G243

Cysteine mutations

Orientation of domains via relative distance measurements with spin labels, then calculation of the full length structure

• NMR• EPR