Protein Structure Prediction


Page 1: Protein Structure Prediction

Protein Structure Prediction

● Why?

● Types of protein structure prediction

– Secondary structure prediction

– Homology modelling

– Fold recognition

– Ab initio

● Secondary structure prediction

– Why

– History

– Performance

– Usefulness

Page 2: Protein Structure Prediction

Why do we need structure prediction?

● 3D structure gives clues to function:

– active sites, binding sites, conformational changes...

– structure and function are conserved more than sequence

– 3D structure determination is difficult, slow and expensive

– Intellectual challenge, Nobel prizes etc...

– Engineering new proteins

Page 3: Protein Structure Prediction

The Use of Structure

Page 4: Protein Structure Prediction

The Use of Structure

Page 5: Protein Structure Prediction

The Use of Structure

Page 6: Protein Structure Prediction

It's not that simple...

● The amino acid sequence contains all the information for the 3D structure (experiments of Anfinsen, 1960s; Nobel Prize 1972)

● But, there are thousands of atoms, rotatable bonds, solvent and other molecules to deal with...

● Levinthal's paradox
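
Levinthal's paradox can be made concrete with a back-of-the-envelope calculation. The sketch below uses illustrative round numbers that are assumptions, not figures from the slides: a 100-residue chain, 3 conformations per residue, and 10^13 conformations sampled per second.

```python
# Back-of-the-envelope Levinthal estimate.
# All default values are illustrative assumptions, not measured quantities.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_years(n_residues=100, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every backbone conformation once by exhaustive search."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second / SECONDS_PER_YEAR

years = levinthal_years()
print(f"{years:.1e} years")
```

Under these assumptions an exhaustive search would take far longer than the age of the universe, yet real proteins fold in milliseconds to seconds, so folding cannot be a random search.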

Page 7: Protein Structure Prediction

Structure prediction

Summary of the four main approaches to structure prediction. Note that there are overlaps between nearly all categories.

| Method | Knowledge | Approach | Difficulty | Usefulness |
|---|---|---|---|---|
| Comparative modelling (homology modelling) | Proteins of known structure | Identify related structure with sequence methods, copy 3D coords and modify where necessary | Relatively easy | Very, if sequence identity is high enough; drug design |
| Fold recognition | Proteins of known structure | Same as above, but use more sophisticated methods to find related structure | Medium | Limited due to poor models |
| Secondary structure prediction | Sequence-structure statistics | Forget 3D arrangement and predict where the helices/strands are | Medium | Can improve alignments, fold recognition, ab initio |
| Ab initio tertiary structure prediction | Energy functions, statistics | Simulate folding, or generate lots of structures and try to pick the correct one | Very hard | Not really |

Page 8: Protein Structure Prediction

CASP: Critical Assessment of Techniques for Protein Structure Prediction

● Why do we have CASP?

– People cheat!

– people work hard to make prediction programs work for their favourite proteins, but...

– benchmarking may be polluted by "information leakage"

● Difficult to compare methods fairly

– software and data issues

– different measures, standards

● What we want is fully blind trials of prediction methods by a third party, i.e. CASP

Page 9: Protein Structure Prediction

CASP

Page 10: Protein Structure Prediction

Secondary structure predictions

● Ignore 3D, it's too hard!

– Usually concentrate on helix, strand and "coil".

● Pattern recognition, but which patterns?

– some amino acids have preferences for helix or strand, due to geometry and hydrogen bonding

– spatial (along sequence) patterns, alternating hydrophobics (helical wheel)

– conservation (down alignment) in different members of protein family; insertions and deletions

● Three main generations/stages in SSP method development since the 1970s.
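
The "alternating hydrophobics (helical wheel)" pattern can be quantified with a hydrophobic moment: project each residue's hydrophobicity onto a wheel and measure how one-sided the result is. The sketch below is only an illustration. The choice of the Kyte-Doolittle scale, the turn angles (100° per residue for an α-helix, 180° for a strictly alternating strand) and both test sequences are assumptions, not taken from the slides.

```python
import math

# Kyte-Doolittle hydropathy scale (one common choice; the slides do not name a scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def hydrophobic_moment(segment, degrees_per_residue):
    """Mean hydrophobic moment of a segment for a given turn angle per residue."""
    step = math.radians(degrees_per_residue)
    x = sum(KD[aa] * math.cos(i * step) for i, aa in enumerate(segment))
    y = sum(KD[aa] * math.sin(i * step) for i, aa in enumerate(segment))
    return math.hypot(x, y) / len(segment)

helix_like = "LKKLLKKLLKKLLKKL"    # hypothetical amphipathic helix (period close to 3.6)
strand_like = "VKVKVKVKVKVKVKVK"   # hypothetical strand (strict alternation)

for name, seq in [("helix-like", helix_like), ("strand-like", strand_like)]:
    print(name,
          "moment at 100 deg:", round(hydrophobic_moment(seq, 100), 2),
          "moment at 180 deg:", round(hydrophobic_moment(seq, 180), 2))
```

A segment whose moment is larger at 100° than at 180° shows the periodicity expected of an amphipathic helix; the reverse suggests an alternating strand pattern.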

Page 11: Protein Structure Prediction

What is ``known secondary structure''?

● Of critical importance in training/assessment of SSP methods

● Can be defined:

– visually by a structural biologist

– by geometric and chemical criteria (φ, ψ angles, distances between atoms, hydrogen bonds...) by programs like DSSP and STRIDE
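
Programs such as DSSP assign more than three states, so training and assessment usually start by reducing them to helix, strand and coil. The sketch below shows one common reduction. It is an assumption about how the reference states might be prepared, since other conventions exist and the slides do not say which one was used; the input string is made up.

```python
# One common 8-to-3 state reduction of DSSP codes (other conventions exist).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H",   # alpha, 3-10 and pi helices -> helix
             "E": "E", "B": "E"}             # extended strand, isolated bridge -> strand

def reduce_dssp(dssp_string):
    """Map a DSSP per-residue string to helix (H), strand (E) or coil (C)."""
    # Turns (T), bends (S) and unassigned residues all fall through to coil.
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp_string)

print(reduce_dssp("-HHHHGGGTT-EEEEBS-"))  # hypothetical DSSP string
```

The choice of reduction changes the reference labels and therefore the measured accuracy, so it should be stated whenever methods are compared.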

Page 12: Protein Structure Prediction

Secondary structure - Helix

Page 13: Protein Structure Prediction

Secondary Structure - Sheet

Page 14: Protein Structure Prediction

Secondary structure - turns

Page 15: Protein Structure Prediction

Physics of secondary structures

● Two main opposing forces

– sidechain conformational entropy

– mainchain hydrogen bonding

● This predicts:

– Helix propensity Ala>Leu>Ile>Val

● Other factors

– Polarity (low helical propensity of Ser, Thr, Asp and Asn)

Page 16: Protein Structure Prediction

Secondary Structure Predictions

Some highlights in performance

– 1974 Chou and Fasman 50%

– 1978 Garnier 62%

– 1993 PhD 72%

– 2000 PsiPred 76%
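
Assuming the percentages quoted above are three-state per-residue accuracies (the Q3 measure named on a later slide), the score is simply the fraction of residues whose predicted state matches the reference. A minimal sketch, with made-up example strings:

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H/E/C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have the same length")
    if not observed:
        raise ValueError("empty sequences")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

observed = "CCHHHHHHCCEEEECC"    # hypothetical reference, e.g. reduced DSSP states
predicted = "CCHHHHCCCCEEECCC"   # hypothetical prediction
print(q3(predicted, observed))   # 13 of 16 residues agree: 81.25
```

Q3 counts residues, not segments, so a prediction can score well while still getting segment lengths or ends wrong.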

Page 17: Protein Structure Prediction

Secondary structure prediction 1st generation methods

● Chou and Fasman

1) Assign all residues the appropriate set of parameters.

2) Scan through the peptide and identify helical regions.

3) Repeat this procedure to locate all of the helical regions in the sequence.

4) Scan through the peptide and identify sheet regions.

5) Solve conflicts between helical and sheet assignments.

6) Identify turns.

● Claims of around 70-80%; actual accuracy about 50-60%

| Class | Helix | Strand |
|---|---|---|
| Strong former | E A L | M V I |
| Former | H M Q W V F | C Y F Q L T W |
| Weak former | K I | A |
| Indifferent | D T S R C | R G D |
| Breaker | N Y | K S H N P |
| Strong breaker | P G | E |
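
Step 2 of the procedure (scanning for helical regions) can be sketched using the helix classes from the table above. This is a deliberately simplified illustration, not the published algorithm. The window of 6, the threshold of 4 formers, counting weak formers as one half, and allowing at most one breaker are assumptions modelled loosely on the Chou-Fasman nucleation rule, and no extension or conflict resolution is done.

```python
# Helix classes as listed in the table above (Chou-Fasman style).
HELIX_FORMERS = set("EAL") | set("HMQWVF")    # strong formers + formers
HELIX_WEAK = set("KI")                        # weak formers, counted as one half (assumption)
HELIX_BREAKERS = set("NY") | set("PG")        # breakers + strong breakers

def helix_nuclei(seq, window=6, min_formers=4, max_breakers=1):
    """Mark residues covered by any window that passes a simplified nucleation rule."""
    marked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        formers = sum(1.0 if aa in HELIX_FORMERS else 0.5 if aa in HELIX_WEAK else 0.0
                      for aa in segment)
        breakers = sum(aa in HELIX_BREAKERS for aa in segment)
        if formers >= min_formers and breakers <= max_breakers:
            for i in range(start, start + window):
                marked[i] = True
    return "".join("H" if m else "-" for m in marked)

seq = "GSPNGAELLKEMAQGSPG"   # hypothetical sequence
print(seq)
print(helix_nuclei(seq))
```

Even on this toy input the marked region runs into the flanking Gly and Ser residues, which hints at why such rules alone give poorly defined segment ends.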

Page 18: Protein Structure Prediction

GOR III Garnier, Osguthorpe, Robson, 1990

● Secondary structure depends on amino acid propensities

– As in Chou-Fasman

● Also influenced by neighboring residues

– Helix capping

– Turns etc.

● How to include distant information?

● Performance approximately 67%

Page 19: Protein Structure Prediction

GOR III Garnier, Osguthorpe, Robson, 1990

The helix propensity tables thus have 20 × 17 entries (20 amino acids × 17 window positions).

Assign the state with the highest propensity
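
The window-sum-and-pick-the-maximum step can be sketched as follows. Each state gets a table with 20 amino acids × 17 window positions, and the central residue's score for a state is the sum of the table entries for every residue in the window. The numbers in the toy tables below are made up for illustration and are NOT real GOR parameters; the function and variable names are also hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HALF_WINDOW = 8                       # 8 residues on each side -> 17 window positions

def empty_table():
    """One 20 x 17 table: table[aa][offset + HALF_WINDOW] is the score for one state."""
    return {aa: [0.0] * (2 * HALF_WINDOW + 1) for aa in AMINO_ACIDS}

def gor_predict(seq, tables):
    """For each residue, sum the window contributions per state and assign the largest."""
    states = sorted(tables)
    prediction = []
    for i in range(len(seq)):
        scores = {}
        for state in states:
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                j = i + offset
                if 0 <= j < len(seq):          # window positions off the chain are skipped
                    total += tables[state][seq[j]][offset + HALF_WINDOW]
            scores[state] = total
        prediction.append(max(states, key=scores.get))
    return "".join(prediction)

# Toy, made-up scores: Ala favours helix and Val favours strand, at the central
# position and, more weakly, at the immediate neighbours. Coil stays at zero.
tables = {"H": empty_table(), "E": empty_table(), "C": empty_table()}
for offset, weight in [(-1, 0.5), (0, 1.0), (1, 0.5)]:
    tables["H"]["A"][offset + HALF_WINDOW] = weight
    tables["E"]["V"][offset + HALF_WINDOW] = weight

print(gor_predict("GGAAAAGGVVVVGG", tables))
```

In a real method the table entries would be estimated from proteins of known structure; here they only show how neighbours pull the flanking Gly residues into the adjacent helix or strand.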

Page 20: Protein Structure Prediction

Status of predictions in 1990

● Predicted secondary structure segments are too short

● About 65% accuracy

● Worse for beta-strands

Page 21: Protein Structure Prediction

Secondary structure prediction 2nd generation methods

● sequence-to-structure relationship modelled using more complex statistics, e.g. artificial neural networks (NNs) or hidden Markov models (HMMs)

● evolutionary information included (profiles)

● prediction accuracy >70% (PhD, Rost 1993)
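
The idea of feeding evolutionary information into a network can be sketched in two steps: turn a multiple alignment into per-column amino-acid frequencies (a profile), then flatten a window of profile columns around each residue into one input vector. This is a minimal sketch under assumptions. The toy alignment, the half-window of 6 and the function names are hypothetical, and real methods add further inputs and sequence weighting.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(alignment):
    """Per-column amino-acid frequencies from equal-length aligned sequences ('-' = gap)."""
    profile = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa in AMINO_ACIDS]
        n = len(residues) or 1                 # avoid division by zero on all-gap columns
        profile.append([residues.count(aa) / n for aa in AMINO_ACIDS])
    return profile

def window_features(profile, center, half_window=6):
    """Flatten the profile rows around `center`; positions off either end are zero-padded."""
    zeros = [0.0] * len(AMINO_ACIDS)
    features = []
    for pos in range(center - half_window, center + half_window + 1):
        features.extend(profile[pos] if 0 <= pos < len(profile) else zeros)
    return features

alignment = ["MKVLA", "MRVLA", "MKILG", "-KVLA"]   # hypothetical toy alignment
profile = column_profile(alignment)
x = window_features(profile, center=2)
print(len(x))   # 13 window positions x 20 amino acids = 260 inputs
```

A classifier such as a neural network would then map each 260-value vector to helix, strand or coil for the central residue.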

Page 22: Protein Structure Prediction

PhD (Rost & Sander, 1994)

Page 23: Protein Structure Prediction

PhD-Input

Page 24: Protein Structure Prediction

PhD-architecture

Page 25: Protein Structure Prediction

PhD-predictions

● Secondary structure "prediction" by homology

● If a sequence of unknown secondary structure has a homologue of known structure, it is more accurate to make an alignment and copy the known secondary structure over to the unknown sequence than to do "ab initio" secondary structure prediction.
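
Copying secondary structure across an alignment can be sketched as below. The function name, the toy alignment and the use of "?" for query residues aligned to a template gap are illustrative assumptions.

```python
def transfer_secondary_structure(aligned_query, aligned_template, template_ss):
    """Copy template secondary structure onto the query through a pairwise alignment.

    aligned_query / aligned_template: equal-length strings with '-' for gaps.
    template_ss: one state per ungapped template residue.
    Query residues aligned to a template gap get '?' (no information to copy).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    query_ss = []
    t = 0                                    # index into the ungapped template
    for q_char, t_char in zip(aligned_query, aligned_template):
        state = None
        if t_char != "-":
            state = template_ss[t]
            t += 1
        if q_char != "-":
            query_ss.append(state if state is not None else "?")
    return "".join(query_ss)

# Hypothetical alignment; template_ss would normally come from a solved structure.
print(transfer_secondary_structure("MKV-LAEEG", "MRVTLA-DG", "CHHHHHCC"))
```

The positions marked "?" are insertions relative to the template, which is where a sequence-based prediction would still be needed.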

Page 26: Protein Structure Prediction

PhD summary

● First method with >70% Q3

● Correct length distributions

● Much better beta strand predictions

● Good correlation between score and accuracy

● Better predictions for larger multiple sequence alignments

Page 27: Protein Structure Prediction

3rd generation methods

● enhanced evolutionary sequence information (PSI-BLAST profiles) and larger sequence databases take Q3 to >75%

● PHD and PSIPRED are the best known methods

Page 28: Protein Structure Prediction

PSIPRED

● Similar to PhD

● PSI-BLAST to detect more remote homologs

● only two layers

● SVM or NN give similar performance

Page 29: Protein Structure Prediction

Current Status of Secondary Structure predictions

● Best Methods

– PsiPred

– Sam-T02

– Prof

● About 75%-76% accuracy

● Improvement mainly due to:

– Larger Databases

– PSI-BLAST

Page 30: Protein Structure Prediction

Other secondary structure prediction methods

● turn prediction

● transmembrane helix prediction

● coiled coil

● Disorder predictions

● contact prediction, disulphides

Page 31: Protein Structure Prediction

What use is it?

● No 3D means no clues to detailed function, so...

● Accurate secondary structure predictions help sequence analysis: finding homologues, aligning homologues, identifying domain boundaries.

● Can help true 3D prediction

Page 32: Protein Structure Prediction

Future improvements to SSP

● Long range information

– Baker

● Folding pathway and/or 3D-information