5. Ab initio modeling. And today… Introduction to ab initio modeling: the basic principles Rosetta...

5. Ab initio modeling


And today…

• Introduction to ab initio modeling: the basic principles

• Rosetta ab initio modeling protocol

• Grid-based large-scale modeling & FOLDIT

• I-Tasser

• CASP


Types of structure prediction

• Comparative modeling
  – Structural template detected from sequence similarity

• Fold recognition
  – Structural template detected from fitness to fold (threading)

• Ab initio modeling (free modeling)
  – No obvious structural template: model the whole folding process (e.g. Rosetta, I-Tasser)

[Figure: arrow labeled "Similarity to known structure"]


Basic Ab Initio Rosetta protocol

1. Select fragments consistent with local sequence preferences

2. Assemble fragments into models with native-like global properties

3. Identify the best model from the population of decoys

Figures adapted from Charlie Strauss; Protein structure prediction using ROSETTA, Rohl et al. (2004) Methods in Enzymology 383:66


1. Select fragments: local sampling

Fragment libraries

• 25-200 fragments for each trimer (3-mer) and nonamer (9-mer) window

• Recent improvement was obtained by using fragments of additional sizes:
  – For a helix: length 5-19 & 3-12
  – For a β sheet: length 4-10 & 3-7

• Selected from PDB structures with < 2.5 Å resolution & < 50% sequence identity

• Ranked by sequence similarity and by similarity of predicted and known secondary structure

• Discard improbable conformations
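The ranking step can be illustrated with a minimal Python sketch. It assumes a simple additive score (sequence similarity plus the fraction of positions where predicted and known secondary structure agree, scaled by a hypothetical weight `ss_weight`); the `Fragment` fields are illustrative and are not the interface of Rosetta's fragment picker.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    source: str             # illustrative label of the source structure
    seq_score: float        # sequence similarity to the query window
    ss: str                 # known secondary structure, e.g. "HHHLL"
    plausible: bool = True  # False for improbable conformations


def ss_agreement(predicted: str, known: str) -> float:
    """Fraction of positions where predicted and known secondary structure agree."""
    matches = sum(p == k for p, k in zip(predicted, known))
    return matches / len(predicted)


def rank_fragments(candidates, predicted_ss, n_keep=25, ss_weight=1.0):
    """Discard improbable fragments, rank the rest, return the top n_keep."""
    kept = [f for f in candidates if f.plausible]
    kept.sort(
        key=lambda f: f.seq_score + ss_weight * ss_agreement(predicted_ss, f.ss),
        reverse=True,
    )
    return kept[:n_keep]


if __name__ == "__main__":
    frags = [
        Fragment("A", 0.5, "HHH"),
        Fragment("B", 0.9, "EEE"),
        Fragment("C", 2.0, "HHH", plausible=False),  # best raw score, but discarded
    ]
    print([f.source for f in rank_fragments(frags, "HHH", n_keep=2)])  # ['A', 'B']
```

Fragment "A" wins because its secondary structure matches the prediction, even though "B" has the higher sequence score.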


2. Create compact decoys using fragment assembly

Advantages of the approach

• The fragment library approximates Gibbs sampling

• Fragments allow an accurate, but implicit, representation of the potential energy surface for local interactions.

• Computer power can be invested in optimizing global features (e.g. compactness)

[Figure: local vs. global interactions]


Low-resolution step

Structure representation:

• Equilibrium bond lengths and angles (Engh & Huber 1991)

• Centroid: average location of the side-chain center of mass, given the amino acid type and backbone torsions (Centroid | aa, φ, ψ)

• No explicit modeling of side chains

• Fast


Low-resolution parameters

• S_env – burial preference (number of neighbors)

• S_pair – preferred amino acid pairs (e.g. Cys-Cys, Glu-Arg, etc.)

• S_ss + S_HS – sheet and helix-sheet geometries

• S_cb – compactness of structure

• S_vdw – no clashes

• S_rgyr – globular structure

[Figure: small vs. large radius of gyration (Rgyr)]
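As a minimal illustration of the quantity behind S_rgyr, the sketch below computes the radius of gyration of a set of coordinates (for example residue centroids; which points are used is an assumption here). Compact, globular models have a small Rgyr, extended chains a large one.

```python
import math


def radius_of_gyration(coords):
    """Rgyr = sqrt(mean squared distance of the points from their centroid)."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
              for x, y, z in coords) / n
    return math.sqrt(msd)


if __name__ == "__main__":
    extended = [(3.8 * i, 0.0, 0.0) for i in range(10)]  # straight chain, 3.8 Å spacing
    print(round(radius_of_gyration(extended), 2))  # 10.91
```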


2. Create compact decoys using fragment assembly

MC search with simulated annealing – start from an extended conformation

1. 28K-36K random 9-mer fragment insertions (from the top 25 fragments)
   – XK steps: only vdW score (until all φ, ψ have changed)
   – 2K steps: add strand-pairing score (0.3 weight)
   – 20K steps: compactness: increase pairing score + add Cβ and Rgyr terms; ± local strand-pairing weight
   – 6K/4K steps: full strand pairing; full centroid function

2. 3-mer fragment insertions
   – 8K steps: Gunn-type (select among the least perturbing fragments)
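The annealing loop can be sketched as a toy Metropolis Monte Carlo over (φ, ψ) pairs. This is not Rosetta's implementation: the geometric cooling schedule, the temperatures and the stand-in scoring function `helix_score` (which favors φ = -60°, ψ = -45°) are hypothetical, and the staged score terms listed above are collapsed into a single `score` callback.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def fragment_assembly(n_res, frag_lib, frag_len, score, n_steps,
                      t_start=2.0, t_end=0.5, seed=0):
    """Toy simulated annealing by fragment insertion on a list of (phi, psi) pairs.

    frag_lib[i] holds candidate fragments (each a list of frag_len (phi, psi)
    pairs) for the window starting at residue i.
    """
    rng = random.Random(seed)
    conf = [(-150.0, 150.0)] * n_res  # extended starting conformation
    energy = score(conf)
    best, best_energy = list(conf), energy
    for step in range(n_steps):
        # geometric cooling from t_start to t_end (assumed schedule)
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        start = rng.randrange(n_res - frag_len + 1)
        frag = rng.choice(frag_lib[start])
        trial = conf[:start] + list(frag) + conf[start + frag_len:]
        trial_energy = score(trial)
        if metropolis_accept(trial_energy - energy, t, rng):
            conf, energy = trial, trial_energy
            if energy < best_energy:
                best, best_energy = list(conf), energy
    return best, best_energy


def helix_score(conf):
    """Hypothetical stand-in score: squared deviation from an ideal helix."""
    return sum(((phi + 60.0) ** 2 + (psi + 45.0) ** 2) / 1000.0 for phi, psi in conf)


if __name__ == "__main__":
    n_res, frag_len = 12, 3
    helix = [(-60.0, -45.0)] * frag_len
    strand = [(-120.0, 130.0)] * frag_len
    lib = [[helix, strand] for _ in range(n_res - frag_len + 1)]
    model, e = fragment_assembly(n_res, lib, frag_len, helix_score, n_steps=500)
    print(round(e, 3))
```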


Further local refinement strategies

Local moves: how to perturb the backbone with minimal effect on remote regions

1. Random torsion angle perturbation (helix: 0°, strand: < 2°, rest: < 3°)
   – Small move: random φ_i, ψ_i pair
   – Shear move: +Δψ_(i-1), -Δφ_i (compensatory movements that move the peptide plane)

2. Selection of globally non-perturbing fragments
   – Chuck move: fragments that minimize the atom msd
   – Gunn move: fragments that minimize Δψ, Δφ
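The two torsion-perturbation moves can be written as small functions on a list of (φ, ψ) pairs. A minimal sketch, assuming angles in degrees without wrapping to [-180°, 180°); the maximum step sizes in `MAX_DELTA` are read from the slide (helix / strand / rest), while the one-letter keys are an illustrative convention. The shear move keeps ψ_(i-1) + φ_i constant, so the perturbation stays largely local.

```python
import random

# Maximum perturbation (degrees) per secondary-structure type, as read from the slide.
MAX_DELTA = {"H": 0.0, "E": 2.0, "L": 3.0}


def small_move(torsions, i, max_delta, rng):
    """Perturb phi_i and psi_i of residue i independently by up to +/- max_delta."""
    new = list(torsions)
    phi, psi = new[i]
    new[i] = (phi + rng.uniform(-max_delta, max_delta),
              psi + rng.uniform(-max_delta, max_delta))
    return new


def shear_move(torsions, i, max_delta, rng):
    """Change psi_(i-1) by +delta and phi_i by -delta (compensating pair)."""
    if i < 1:
        raise ValueError("shear move needs a preceding residue")
    new = list(torsions)
    delta = rng.uniform(-max_delta, max_delta)
    phi_prev, psi_prev = new[i - 1]
    phi_i, psi_i = new[i]
    new[i - 1] = (phi_prev, psi_prev + delta)
    new[i] = (phi_i - delta, psi_i)
    return new


if __name__ == "__main__":
    rng = random.Random(1)
    chain = [(-60.0, -45.0), (-120.0, 130.0), (-70.0, 140.0)]
    print(shear_move(chain, 1, MAX_DELTA["L"], rng))
```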


Further local refinement strategies

Local moves: how to perturb the backbone with minimal effect on remote regions

3. Adjacent φ/ψ variation to offset the global effect of a fragment insertion
   – Wobble move: fast analytical gradient calculation
   – Crankshaft move: combination of several wobble moves

Smaller moves are accepted with higher frequency.

[Figure: wobble and crankshaft moves; before insertion, after insertion (no changes), final conformation]


Global sampling

[Movie by Jens Meiler: fragment exchange (initial global changes), then local moves (further refinement)]


3. Identify best structure

• Generate a decoy population (10^3-10^5)

• Filter to correct sampling biases

• Cluster analysis identifies the broadest minimum

• Full-atom refinement will identify the lowest-energy minimum
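Cluster analysis for the broadest minimum can be sketched as greedy clustering on a precomputed pairwise RMSD matrix (the superposition and RMSD calculation are assumed to be done elsewhere). The sketch follows the general idea of repeatedly picking the decoy with the most neighbors within a cutoff as a cluster center; the cutoff value and the tie-breaking rule are assumptions, not the exact procedure of Rosetta's clustering tool.

```python
def greedy_cluster(rmsd, cutoff):
    """Greedy clustering of decoys from a symmetric pairwise RMSD matrix.

    Repeatedly picks the unassigned decoy with the most unassigned neighbors
    within `cutoff` as a center and removes its members. Returns a list of
    (center_index, member_indices); cluster sizes are non-increasing, so the
    first cluster corresponds to the broadest minimum.
    """
    unassigned = set(range(len(rmsd)))
    clusters = []

    def neighbors(i):
        return [j for j in sorted(unassigned) if rmsd[i][j] <= cutoff]

    while unassigned:
        # ties are broken by the lowest decoy index
        center = max(sorted(unassigned), key=lambda i: len(neighbors(i)))
        members = neighbors(center)  # includes the center itself (rmsd 0)
        clusters.append((center, members))
        unassigned -= set(members)
    return clusters


if __name__ == "__main__":
    rmsd = [[0, 1, 1, 8, 8],
            [1, 0, 1, 8, 8],
            [1, 1, 0, 8, 8],
            [8, 8, 8, 0, 1],
            [8, 8, 8, 1, 0]]
    print(greedy_cluster(rmsd, 2.0))  # [(0, [0, 1, 2]), (3, [3, 4])]
```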


High-resolution step: parameters

• VdW
  – 12-6 Lennard-Jones
  – linear repulsion
  – cutoff within 5.0-5.5 Å

• Solvation (Lazaridis-Karplus)

• Hydrogen bonds

• Weak pair potential
  – Electrostatic interactions
  – π-π, π-cation

• Backbone torsions (rama score)

[Figure: pair distance r_ij, polar-polar interactions and N-H···O hydrogen bonds]
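A sketch of a 12-6 Lennard-Jones term with linear repulsion, assuming the form ε[(σ/r)^12 - 2(σ/r)^6] (minimum of -ε at r = σ), a hypothetical switch point at 0.6σ below which the potential is continued linearly, and a hard cutoff at 5.5 Å. Rosetta's actual attractive/repulsive split and smoothing differ; the point is only that linearizing keeps clash energies finite, which helps minimization of slightly overlapping low-resolution models.

```python
def lj_linear_rep(r, sigma, epsilon, switch_frac=0.6, cutoff=5.5):
    """12-6 Lennard-Jones eps*[(s/r)^12 - 2(s/r)^6], continued linearly below
    r_switch = switch_frac * sigma and set to zero at or beyond `cutoff`."""
    if r >= cutoff:
        return 0.0

    def lj(x):
        q6 = (sigma / x) ** 6
        return epsilon * (q6 * q6 - 2.0 * q6)

    def dlj(x):
        # dE/dx = (12 eps / x) * [(s/x)^6 - (s/x)^12]
        q6 = (sigma / x) ** 6
        return 12.0 * epsilon / x * (q6 - q6 * q6)

    r_switch = switch_frac * sigma
    if r < r_switch:
        # tangent line at r_switch: finite, steadily increasing repulsion
        return lj(r_switch) + dlj(r_switch) * (r - r_switch)
    return lj(r)


if __name__ == "__main__":
    for r in (0.5, 2.4, 4.0, 5.0, 6.0):
        print(r, round(lj_linear_rep(r, sigma=4.0, epsilon=0.2), 3))
```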


High-resolution refinement of models

MCM (Monte Carlo + minimization) protocol:

• 120 steps of small & shear moves
  – Random perturbation of 5/10 torsion angles (2-3°)
  – Side-chain optimization: rotamer trials (full repacking every 10 steps)
  – Minimization

• Steps 1-60: gradually ramp up the vdW repulsive term

• Steps 60-120: add side-chain minimization

[Figure: small backbone moves and MCM, alternating backbone optimization with side-chain optimization (+ minimization) while the vdW repulsive weight is ramped]
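The MCM schedule above can be expressed as a per-step settings function. The linear ramp shape and the starting repulsive weight `rep_start = 0.02` are assumptions (the slide only says the repulsive term is ramped up gradually); the 120 steps, the repacking interval of 10 and the switch at step 60 come from the slide.

```python
def mcm_schedule(step, n_steps=120, ramp_end=60, rep_start=0.02, repack_every=10):
    """Settings for one MCM step (1-based).

    Returns (vdw_repulsive_weight, full_repack, minimize_side_chains).
    """
    if not 1 <= step <= n_steps:
        raise ValueError("step out of range")
    if step <= ramp_end:
        # linear ramp from rep_start at step 1 to 1.0 at step ramp_end (assumed shape)
        frac = (step - 1) / (ramp_end - 1)
        rep_weight = rep_start + (1.0 - rep_start) * frac
    else:
        rep_weight = 1.0
    full_repack = step % repack_every == 0  # otherwise rotamer trials only
    minimize_side_chains = step > ramp_end
    return rep_weight, full_repack, minimize_side_chains


if __name__ == "__main__":
    print(mcm_schedule(1))   # (0.02, False, False)
    print(mcm_schedule(70))  # (1.0, True, True)
```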


First atom-resolution model

Target 0281 (CASP6)

• Topology sampled by an ab initio trajectory of a homolog sequence (rmsd = 2.2 Å)

• Full-atom refinement reduces the rmsd to 1.5 Å

• Side-chain packing accurately recovered


Atom-resolution Ab Initio (I)

• Challenge: sample near-native conformations (< ~2.5 Å)

• Approach: model a set of homologs → a diverse population samples the basin of attraction

[Figure: example of an exposed leucine; models starting from the extended conformation vs. models starting from the native conformation]

Toward high-resolution de novo structure prediction, Bradley et al. (2005) Science 309:1868


Low-resolution homolog folding improves prediction

• Collect 50 homologs (PSI-BLAST, 2 rounds; 60% non-redundant)

• For each homolog:
  – create 2000 low-resolution models
  – cluster, retain large clusters (n > 5), and select 500 models

• Thread the query sequence back onto the ~20-30K models

• Proceed to full-atom refinement, also evaluating the homolog sequences (2 rounds of the MCM protocol)


Atom-resolution Ab Initio (II)

Step 1: low resolution
  – model homologs

Step 2: atom resolution
  – 10^3-10^4 models
  – energy-based model selection

Results: 11/16 proteins of length < 88 residues are modeled within < 5 Å

[Figure: example models of Hox-B1 and ubiquitin]


Sampling of β-sheet topologies

• Fold-tree representation of protein allows tailored optimization


How can we improve? (1) More computer time

• BOINC – donate idle time of many home computers for Rosetta runs

Tera = 10^12; a strong desktop: ~ gigaflop (10^9)


More computer time – is sampling an issue?

Perform very long runs on the grid (> 10^6 decoys)

3 categories:
(a) Near-native lowest-energy model (< 3.5 Å) ✔
(b) Problem with sampling (E near-native structures << E decoys)
(c) Problem with energy function (E near-native structures > E decoys)

Sampling bottlenecks in de novo protein structure prediction, Kim et al. (2009) JMB 393:249

Why don't we sample these conformations (b)?
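The three categories amount to a simple decision rule. A minimal sketch, assuming the rmsd of the lowest-energy sampled model, the energy of refined near-native structures and the energy of the best decoy are already known; the slide's "<<" is approximated by a plain comparison (a margin could be added).

```python
def classify_run(lowest_energy_model_rmsd, e_near_native, e_best_decoy,
                 rmsd_cutoff=3.5):
    """Assign a long run to category (a), (b) or (c).

    lowest_energy_model_rmsd: rmsd (Å) of the lowest-energy sampled model to native
    e_near_native: energy of (refined) near-native structures
    e_best_decoy: lowest energy among the sampled decoys
    """
    if lowest_energy_model_rmsd < rmsd_cutoff:
        return "a"  # near-native model is also the lowest in energy: success
    if e_near_native < e_best_decoy:
        return "b"  # native basin scores better but was not reached: sampling problem
    return "c"      # decoys score better than near-native: energy-function problem


if __name__ == "__main__":
    print(classify_run(8.0, -320.0, -250.0))  # b
```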


“linchpin features” are rarely sampled

• Describe models as feature vectors

• Identify native features not sampled in low-energy models

Sampling bottlenecks in de novo protein structure prediction, Kim et al. (2009) JMB 393:249

[Figure: torsion bins vs. residue position, marking native bins that are frequently sampled and native bins that are never sampled]

Position 23 never samples the native helix conformation → simulations never succeed.

Torsion bins: A: right-handed helix; B: right-handed strand; G: left-handed helix; E: left-handed strand; O: ω = cis

Enforcing a native-like value for this feature → some simulations now succeed.
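Torsion-bin features of this kind can be computed with an ABEGO assignment. The bin boundaries below are assumed to follow the commonly used Rosetta ABEGO convention; treat them as an assumption rather than the exact definition used by Kim et al. The second function flags positions whose native bin no model ever samples, i.e. candidate linchpin positions.

```python
def abego(phi, psi, omega=180.0):
    """Assign a torsion bin (degrees, in [-180, 180)); boundaries are assumed."""
    if abs(omega) < 90.0:
        return "O"  # cis peptide bond
    if phi < 0.0:
        return "A" if -75.0 <= psi < 50.0 else "B"  # right-handed helix / strand
    return "G" if -100.0 <= psi < 100.0 else "E"    # left-handed helix / strand


def never_sampled_positions(native_bins, model_bins_list):
    """Positions whose native bin does not occur in any model."""
    return [i for i, b in enumerate(native_bins)
            if all(model[i] != b for model in model_bins_list)]


if __name__ == "__main__":
    native = "".join(abego(p, s) for p, s in [(-60, -45), (-65, -40), (-120, 130)])
    print(native)                                         # AAB
    print(never_sampled_positions(native, ["AAA", "BAA"]))  # [2]
```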


Examples for “linchpin features”

• Near active site

• Regions that form late in folding

• Irregular β-strand pairing (mostly in edge strands)


How can we improve? More brains

• FOLDIT – folding game

• Donate idle time of many brains to improve structure prediction

• Now also available as an Android application!

"Win the Nobel prize by just playing a game": http://vimeo.com/focusforwardfilms/semifinalists/51888393

http://www.youtube.com/user/UWfoldit (look also for "black belt" Foldit lessons)


Foldit

Predicting protein structures with a multiplayer online game Cooper et al (2010) Nature 466:756

Players: human spatial reasoning

• Also explore strategy space: new search algorithms

• Excel at solving problems where substantial backbone rearrangement is needed to bury a hydrophobic residue

Challenge: formulate the problem as a game

• Easy for non-scientists to understand

• Competition/collaboration

[Figure: starting structure, Foldit model and native structure]


Example 1: help in structure determination

LLG: log-likelihood gain of a model; useful models must have a better LLG than the best random models (shaded region)

[Figure: starting model and solved structure]

Nature Structural & Molecular Biology 2011


Example 2: Foldit Puzzle #986875

Predicting protein structures with a multiplayer online game Cooper et al (2010) Nature 466:756

• Foldit finds better structures,

• … using trajectories that pass through high-energy structures along the way

[Figure: starting structure, Foldit model and native structure]


Algorithm discovery by Foldit players

Algorithm discovery by protein folding game players, Khatib et al. (2011) PNAS 108:18949

• Added the ability to create, edit, share and rate “recipes” (each player can create their own “cookbook”)

• Evaluated which strategies evolve and how they spread among players

[Figure: main strategies, used at different stages during modeling; top players vs. all players]


Algorithm discovery by Foldit players

Algorithm discovery by protein folding game players, Khatib et al. (2011) PNAS 108:18949

“Blue Fuse” and “Quake” are the most popular recipes

Many new recipes evolve from “Blue Fuse”


Foldit players discover algorithms that are similar to those used in Rosetta

Algorithm discovery by protein folding game players, Khatib et al. (2011) PNAS 108:18949

Foldit “Blue Fuse”:

• very similar to the new Rosetta protocol “Fast Relax” (repeated decrease/increase of the repulsive term)

• comparable efficiency for short runs
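The "repeated decrease/increase of the repulsive term" idea can be sketched as a loop over ramp cycles. Here `pack_and_minimize` and `score` are hypothetical callbacks standing in for repacking/minimization and the energy function, and the ramp weights are illustrative values rather than Rosetta's configured schedule.

```python
def fast_relax_like(pose, pack_and_minimize, score, n_repeats=5,
                    ramp=(0.02, 0.25, 0.55, 1.0)):
    """Cycle the repulsive weight from low to full n_repeats times; keep the best pose.

    pack_and_minimize(pose, rep_weight) -> new pose  (hypothetical callback)
    score(pose) -> energy at full repulsive weight  (hypothetical callback)
    """
    best_pose, best_score = pose, score(pose)
    for _ in range(n_repeats):
        for rep_weight in ramp:  # drop the repulsive weight, then ramp it back up
            pose = pack_and_minimize(pose, rep_weight)
        current = score(pose)
        if current < best_score:
            best_pose, best_score = pose, current
    return best_pose, best_score


if __name__ == "__main__":
    # Toy "pose": a single number whose magnitude plays the role of the energy.
    best, s = fast_relax_like(10.0, lambda p, w: p * (1.0 - 0.5 * w), abs)
    print(best, s)
```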


ab initio modeling – summary:

Rosetta:

• Highly accurate

• Computationally expensive (~150 CPU hours/protein)

• Rosetta server: http://robetta.bakerlab.org/

Good alternative: I-Tasser

• Protocol developed by Zhang and Skolnick

• Based on threading parts of the sequence onto parts of known structures

• Very efficient and accurate (~5 CPU hours/protein)

• I-Tasser server: http://zhanglab.ccmb.med.umich.edu/I-tasser

Roy, Kucukural, Zhang (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 5:725


I-Tasser: Iterative Threading Assembly Refinement (Zhang & Skolnick)

Separate training of the protocol for easy/medium/hard targets


i-Tasser (Zhang & Skolnick)

Threading:

1. Create profile:
   – PSI-BLAST -> sequence profile
   – PSIPRED -> secondary structure profile

2. LOMETS: metaserver for threading (FUGUE, HHSEARCH, MUSTER, PROSPECT, PPA, SP3 & SPARKS)

3. Excise aligned structure elements from the top-scoring templates (20/30/50, depending on the difficulty of the target) for the next step

Wu, Skolnick, Zhang (2007) Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology 5:17

Roy, Kucukural, Zhang (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 5:725


i-Tasser (Zhang & Skolnick)


Tasser: schematic representation of the polypeptide chain in the on- and off-lattice Cα model

Zhang Y., Skolnick J. PNAS 2004;101:7594-7599

©2004 by National Academy of Sciences

Structure assembly - efficient modeling:

• 2 points/residue (Cα + SG, the side-group center)

• on-lattice ab initio for unaligned regions

• off-lattice for aligned regions


i-Tasser (Zhang & Skolnick)

Monte Carlo search by replica exchange

• Exchange between simulations at different temperatures: better sampling

Scoring function: separately trained for easy, medium and hard targets
  – Secondary structure (PSIPRED & SAM)
  – Statistical terms: backbone hydrogen bonds; hydrophobicity and Cα/side-chain correlations
  – Spatial restraints from threading templates
  – Sequence-based contact predictions (SVM) (and accessible surface area prediction; NN)
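The exchange step between replicas uses a Metropolis criterion on the energy and temperature differences. A minimal sketch with k_B = 1 and swaps attempted only between neighboring temperatures (both assumptions; I-TASSER's actual temperature ladder and swap schedule are not given here). Energies stand in for the replicas' configurations.

```python
import math
import random


def swap_probability(e_i, e_j, t_i, t_j):
    """Acceptance probability for exchanging the configurations of replicas i and j.

    p = min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)]), with k_B = 1.
    """
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temps, rng):
    """One sweep of neighbor swaps; energies[k] belongs to the replica at temps[k]."""
    energies = list(energies)
    for k in range(len(temps) - 1):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


if __name__ == "__main__":
    # Higher-energy configuration at the lowest temperature: the swap is always accepted.
    print(attempt_swaps([-10.0, -20.0], [1.0, 2.0], random.Random(0)))  # [-20.0, -10.0]
```

The swap lets low-energy configurations migrate to low temperatures while high-temperature replicas keep crossing barriers, which is the "better sampling" noted above.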


i-Tasser (Zhang)

Example of improvement over the template

Constraints from threading and from contact prediction are located at different sites and complement each other


i-Tasser (Zhang)

Clustering

additional iteration of MC simulation starting from cluster centers

Final model created by optimizing hydrogen bonds


Contact-assisted structure prediction

Kim et al. (2014) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 82:208

• Ab initio modeling is restricted to small (100 aa), single-domain proteins

• + information about contacts -> dramatic increase of scope (… 500 aa)

• Information from:
  – Contact prediction (bioinformatics)
  – Experiments: e.g. NMR chemical shifts, mutagenesis, etc.

Contacts may assist in:

1. Determination of topology:
   – Filter fragments
   – Find fragment pairs

2. Refinement of topology:
   – Refine the structure by imposing constraints

Assessment of Rosetta ab initio modeling in CASP10: on average, one reliable non-local contact per ~12 residues is needed for reliable modeling
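The "one contact per ~12 residues" rule of thumb can be checked with a few lines of code. The minimum sequence separation for a contact to count as non-local (`min_separation = 6`) is an assumption for illustration; the 12-residue ratio comes from Kim et al.

```python
import math


def enough_contacts(contacts, length, min_separation=6, residues_per_contact=12):
    """Check for at least ~one non-local contact per `residues_per_contact` residues.

    contacts: iterable of (i, j) residue-index pairs; duplicates are ignored.
    A contact counts as non-local if |i - j| >= min_separation (assumed threshold).
    Returns (enough, n_non_local, n_needed).
    """
    non_local = {tuple(sorted(c)) for c in contacts
                 if abs(c[0] - c[1]) >= min_separation}
    needed = math.ceil(length / residues_per_contact)
    return len(non_local) >= needed, len(non_local), needed


if __name__ == "__main__":
    print(enough_contacts([(0, 30), (5, 60), (1, 3)], 36))  # (False, 2, 3)
```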


Contact-assisted structure prediction

Kim et al. (2014) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 82:208

Flowchart of the protocol:

• Topology determination: from partial threading (SPARKS, Rosetta)

• Topology refinement: RosettaCM recombination protocol (next week)


Contact-assisted structure prediction

Kim et al. (2014) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 82:208

Improved models for large structures using contacts:

[Figure: native structure, ab initio model and contact-assisted ab initio model]


CASP

• Double-blind structure prediction experiment allows assessment of different approaches

• Every 2 years; summer 2014: CASP11

• Steady improvement of methodology

Categories:
• Template-based modeling (TBM)
• Free modeling (FM)
• Refinement of initial models

http://www.predictioncenter.org/casp10/meeting/talks.html
Proteins special issue vol. 82, S2

Identification of "winner strategies":
• Rosetta in CASP4-6
• iTasser in CASP7 & CASP8
• Servers; improved combination of multiple templates in CASP9
• CASP10: refinement with MD
• CASP11: contact prediction methods & contact-assisted modeling

• New: prediction of contacts, unstructured regions, ligand-binding sites


CASP

• Around 130 targets in the last rounds

• Until CASP9: target difficulty decreases

• CASP10: 131 domains (20 free modeling); targets now more difficult than in previous CASPs

Proteins special issue vol. 79, S10

Kryshtafovych et al. (2011) Proteins 79:S196–207


Measure of performance

Compare the predicted structure to the solved structure:

• superimpose short fragments (length n = 3, 5, 7 residues; iteratively)

• find the maximal superimposed part N, where
  – N Cα atom pairs are within x Å
  – 4 thresholds: x = 1.0, 2.0, 4.0, 8.0

• GDT_TS = ¼ (N_1 + N_2 + N_4 + N_8), with each N_x given as a percentage of the residues in the target
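The GDT_TS formula can be sketched as follows, assuming the candidate superpositions (from the iterative fragment superposition described above) have already been computed and only the per-residue Cα distances are passed in. For each threshold, N_x is taken as the best count over all superpositions and converted to a percentage of the target length.

```python
def gdt_ts(distance_sets, n_target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS from per-residue Cα distances under several candidate superpositions.

    distance_sets[k][i]: distance (Å) between predicted and native Cα of residue i
    after superposition k. For each cutoff, N_x is the largest count over all
    superpositions, expressed as a percentage of n_target residues.
    """
    percentages = []
    for x in cutoffs:
        best = max(sum(d <= x for d in dists) for dists in distance_sets)
        percentages.append(100.0 * best / n_target)
    return sum(percentages) / len(cutoffs)


if __name__ == "__main__":
    print(gdt_ts([[0.5, 1.5, 3.0, 6.0, 10.0]], 5))  # 50.0
```

Taking the maximum separately for each cutoff reflects that different superpositions may be optimal for different thresholds.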


CASP4 ab initio summary (Rosetta)

• 18 newly solved structures predicted prior to publication of the structure

• None recognized by sequence similarity

• None with close structural homologs

Independently assessed scoring: 2 = "Well Above Average", 1 = "okay", 0 = "lousy"


Improvement over the years

Improvement in each round

• CASP7: in the difficult region

• CASP8: accuracy in template-based modeling (few difficult cases)

• CASP9: intermediate difficulty targets

• CASP10: refinement using MD (Michael Feig)


Free Modeling with Rosetta in CASP8


Free Modeling with Rosetta in CASP9

T0581
• Server model: predicts a kinked helix
• Only model with 4 β strands (most predictions: all-helical protein)

[Figure: model vs. best template]

Kinch et al. (2011) Proteins 79:S59–73


Free Modeling with Rosetta in CASP11

T0806

[Figure: model vs. best template]

http://www.predictioncenter.org/casp11/doc/presentations/CASP11_FM_NG.pdf


Rosetta in CASP8: modification of fragment size improves prediction

• Longer fragments for α-helical proteins (5-19; 3-12)

• Shorter fragments for β sheets (4-10; 3-7)


FM with ITasser


Improved automatic servers

• Increased contribution of automatic servers
  – predictions of mostly similar quality

• Now also improve on difficult targets

[Figure: human + server vs. server predictions]


CASP10: Foldit platform joins the game for coopetition

• Start from Foldit models; proceed with different approaches

• Joint forces produce best model


Summary

• Steady improvement of structure prediction over the years

• Impressive quality of current ab initio modeling
  – efficient combination of appropriate sampling strategies and a tailored energy function

• Models now often better than the template

• Automatic servers now outperform also in free modeling (FM)