Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

22
Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I Prof. Corey O’Hern Department of Mechanical Engineering Department of Physics Yale University

description

Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I. Prof. Corey O’Hern Department of Mechanical Engineering Department of Physics Yale University. Statistical Mechanics. Protein Folding. Biochemistry. What are proteins?. Linear polymer. Folded state. - PowerPoint PPT Presentation

Transcript of Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Page 1: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Bioinformatics: Practical Application of Simulation and Data

Mining

Protein Folding I

Prof. Corey O’HernDepartment of Mechanical Engineering

Department of PhysicsYale University

Page 2: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

StatisticalMechanicsBiochemistry

Pro

tein

Fold

ing

Page 3: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

What are proteins?

•Proteins are important; e.g. for catalyzing and regulating biochemical reactions, transporting molecules, …•Linear polymer chain composed of tens (peptides) to thousands (proteins) of monomers•Monomers are 20 naturally occurring amino acids•Different proteins have different amino acid sequences•Structureless, extended unfolded state•Compact, ‘unique’ native folded state (with secondary and tertiary structure) required for biological function•Sequence determines protein structure (or lack thereof)•Proteins unfold or denature with increasing temperature or chemical denaturants

Linear polymer Folded state

Page 4: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Amino Acids I

N-terminal C-terminalC

variable side chain

R

peptidebonds

•Side chains differentiate amino acid repeat units•Peptide bonds link residues into polypeptides

Page 5: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Amino Acids II

Page 6: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

The Protein Folding Problem:

What is ‘unique’ folded 3D structure of a protein based on its amino acid sequence? Sequence Structure

Lys-Asn-Val-Arg-Ser-Lys-Val-Gly-Ser-Thr-Glu-Asn-Ile-Lys- His-Gln-Pro- Gly-Gly-Gly-…

Page 7: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

•Folding: hydrophobicity, hydrogen bonding, van der Waals interactions, …•Unfolding: increase in conformational entropy, electric charge…

H (hydrophobic)

P (polar)

inside

outside

Hydrophobicity index

Driving Forces

Page 8: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Higher-order Structure

Page 9: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Secondary Structure: Loops, -helices, -strands/sheets

-sheet

-strand

•Right-handed; three turns•Vertical hydrogen bonds between NH2 (teal/white) backbone group and C=O (grey/red) backbone groupfour residues earlier in sequence•Side chains (R) on outside; point upwards toward NH2

•Each amino acid corresponds to 100, 1.5Å, 3.6 amino acids per turn•(,)=(-60,-45)•-helix propensities: Met, Ala, Leu, Glu

-helix

•5-10 residues; peptide backbones fully extended•NH (blue/white) of one strand hydrogen-bonded to C=O (black/red) of another strand•C ,side chains (yellow) on adjacent strands aligned; side chains along single strand alternate up and down•(,)=(-135,-135)•-strand propensities: Val, Thr, Tyr, Trp, Phe, Ile

Page 10: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Backbonde Dihedral Angles

cosθ =π1gπ 2

Page 11: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Ramachandran Plot: Determining Steric Clashes

CCNC

CN CC

N CCN

Backbonedihedral angles

=0,180

Non-Gly Gly

theory

PDB

vdW radii< vdW radii

backboneflexibility

4 atoms define dihedral angle:

Page 12: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

How can structures from PDB exist outside Ramachadran bounds?

•Studies were performed on alanine dipeptide•Fixed bond angle (θ=110) [105,115]

J. Mol. Biol. (1963) 7, 95-99

Page 13: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Side-Chain Dihedral Angles

Side chain: C-CH2-CH2-CH2-CH2-NH3

4: Lys, Arg

5: Arg

Use NCCCCCN to define 1, 2, 3, 4

Page 14: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Molecular biologist Biological Physicist

Your model is oversimplified and has nothing to do with biology!

Your model is too complicated and has no predictive power!

Page 15: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

•For all possible conformations, compute free energyfrom atomic interactions within protein and protein-solvent interactions; find conformation with lowest free energy…e.g using all-atom molecular dynamics simulations

Possible Strategies for Understanding Protein Folding

Not possible?, limited time resolution

•Use coarse-grained models with effective interactionsbetween residues and residues and solvent

General, but qualitative

Page 16: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Why do proteins fold (correctly & rapidly)??

Nc : μ2Nμ = # allowed dihedral angles

How does a protein find the global optimum w/oglobal search? Proteins fold much faster.

For a protein with N amino acids, number of backbone conformations/minima

Levinthal’s paradox:

Nc~ 3200 ~1095 fold ~ Nc sample ~1083 s

universe ~ 1017 svs fold ~ 10-6-10-3 s

Page 17: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

U, F =U-TSEnergy Landscape

rr1,

rr2 ,K ,

rrN{ }

all atomic coordinates;

dihedral angles

M1

M3M2

r∇U = 0

∇2U > 0

∇2U = 0

∇2U < 0

⎨⎪

⎩⎪

S12 S23

S-1

minimum

saddle point

maximum

Page 18: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Roughness of Energy Landscape

smooth, funneled rough(Wolynes et. al. 1997)

Page 19: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Folding Pathways

similarity tonative state

dead end

Page 20: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Folding Phase Diagram

smooth rough

Page 21: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

•What differentiates the native state from other low-lying energy minima?

•How many low-lying energy minima are there? Can we calculatelandscape roughness from sequence?

•What determines whether protein will fold to the native state or become trapped in another minimum?

•What are the pathways in the energy landscape that a given protein follows to its native state?

Open Questions

NP Hard Problem!

Page 22: Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I

Digression---Number of Energy Minima for Sticky Spheres

Nm Ns Np

4 1 1

5

6

7

1

2

5

6

50

486

8

9

10

13

52

-

5500

49029

-

spherepackings

polymerpackings

Ns~exp(aNm); Np~exp(bNm) with b>a