Thomas Huber [email protected] Computational Biology and Bioinformatics Environment ComBinE Department of Mathematics The University of Queensland Protein Scoring Functions: Essential Tools or Fancy Fad?
Date posted: 21-Dec-2015

Thomas Huber [email protected]

Computational Biology and Bioinformatics Environment

ComBinE

Department of Mathematics The University of Queensland

Protein Scoring Functions: Essential Tools or Fancy Fad?

Why do we (still) care about Protein Structures/Prediction?

• Academic curiosity?
  – Understanding how nature works

• Urgency of prediction
  – 10^4 structures are determined
    • insignificant compared to all proteins
  – sequencing = fast & cheap
  – structure determination = hard & expensive

[Figure legend: transistors in Intel processors; TrEMBL sequences (computer annotated); SwissProt sequences (annotated); structures in PDB]

What would we like to be able to predict?

• What is a protein’s structure?
  – Does a sequence adopt a known fold?
    • Fold recognition
  – Does a sequence adopt a new fold?
    • New fold prediction (dream of structural genomics)

• How stable is a protein?
  – Thermodynamic stability

• What is a protein’s function?
  – Functional annotation

Three basic choices in molecular modelling

• Representation
  – Which degrees of freedom are treated explicitly

• Scoring
  – Which scoring function (force field)

• Searching
  – Which method to search or sample conformational space

Two Lineages of Protein Structure Prediction

• The physicist’s approach
  – Thermodynamics: structures with low energy are more likely

• The biologist’s approach
  – Similar sequences → similar structures
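The physicist's statement that low-energy structures are more likely is commonly expressed through Boltzmann weights, p_i ∝ exp(−E_i / kT). The sketch below illustrates that textbook relation only; the function name and the three energies are hypothetical and are not taken from the slides.

```python
import math


def boltzmann_probabilities(energies, kT=1.0):
    """Return normalised Boltzmann probabilities for a list of energies.

    Energies and kT must share the same units.
    """
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    partition_sum = sum(weights)
    return [w / partition_sum for w in weights]


# Hypothetical energies of three candidate structures, in units of kT
probs = boltzmann_probabilities([0.0, 1.0, 3.0])
```

The lowest-energy candidate receives the largest probability, and the ratio between two candidates depends only on their energy difference.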

Fragment Scoring

• Proteins are decomposed into overlapping fragments of 7 residues

• Each fragment is described by
  – Amino acid specific local structure
  – Non-specific environment

• Fragments are clustered and a statistical model for each cluster is built

• Total score = Σ fragment scores (sum over all fragments)
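A minimal sketch of the decomposition and summation steps, assuming the 7-residue windows are shifted by one residue at a time (the slide only says "overlapping"). The per-fragment scorer here is a hypothetical stand-in; the clustered statistical model described above is not reproduced.

```python
FRAGMENT_LENGTH = 7  # fragment size stated on the slide


def overlapping_fragments(sequence, length=FRAGMENT_LENGTH):
    """Return every window of `length` residues, shifted by one residue.

    The shift of one is an assumption. Sequences shorter than `length`
    yield an empty list.
    """
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def total_score(sequence, fragment_score):
    """Sum a per-fragment score over all overlapping fragments."""
    return sum(fragment_score(f) for f in overlapping_fragments(sequence))


def toy_fragment_score(fragment):
    """Hypothetical placeholder: count hydrophobic residues in the fragment.

    This is NOT the cluster-based statistical model from the talk.
    """
    return sum(1 for residue in fragment if residue in "AVILMFWC")
```

Any callable that maps a fragment to a number can be passed as `fragment_score`, so a per-cluster statistical model could replace the toy scorer without changing the summation.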

Finding Remote Homologues with sausage

• 572 sequence-structure pairs
  – Structures are similar (FSSP)
  – > 70% structurally aligned
  – < 20% sequence identity
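The selection criteria for the test set can be sketched as a simple filter. The identity definition used here (identical residues divided by columns where both sequences have a residue) and both function names are assumptions; the slides do not say how identity or structural coverage were computed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def is_remote_homologue_pair(structural_coverage, sequence_identity):
    """Apply the slide's thresholds: > 70% structurally aligned, < 20% identity.

    Both arguments are percentages.
    """
    return structural_coverage > 70.0 and sequence_identity < 20.0
```

A pair passes only when the structures superimpose well while the sequences are too dissimilar for plain sequence comparison to detect the relationship.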

[Figure: alignment quality (arb. units) plotted against sequence similarity weight (arb. units)]

A Real Case Example: RNA-dependent RNA polymerases

• Dengue virus

• Bacteriophage φ6

Testing/Breaking the Scoring

• Designed β-sheet (Serrano)
  – 12 residues
  – Forms stable β-sheet at room temperature

Another Uniquely Folded Mini-Protein

• Villin head-piece (36 residues)
  – High thermodynamic stability (Tm > 70 °C)
  – Folds autonomously

A Uniquely Folded Mini-Protein

• Zinc finger analogue (Mayo)
  – 28 residues
  – thermodynamically stable (Tm 25 °C)

Trimer Stability

• Nitrogen regulation proteins
  – 2 proteins: PII (GlnB) and GlnK
  – 112 residues
  – sequence: 67% identities, 82% positives
  – structure: 0.7 Å RMSD
  – trimeric
  – Dr S. Vasudevan: hetero-trimers

Hetero-trimer Stability

• What is the most/least stable trimer?
• Why use a low-resolution force field?
  – Structures differ (0.7 Å RMSD)
  – Side chains are hard to optimise
• Calculation:
  – GlnB₃ > GlnB₂-GlnK > GlnB-GlnK₂ > GlnK₃
• Experiment:
  – GlnB₃ > GlnB₂-GlnK > GlnB-GlnK₂ > GlnK₃
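The four trimers being ranked are simply all compositions of three subunits drawn from the two monomer types. A short sketch that enumerates them, ignoring the order of subunits within a trimer; the function names are hypothetical and no stability scores are computed here.

```python
from collections import Counter
from itertools import combinations_with_replacement


def trimer_compositions(monomers=("GlnB", "GlnK"), size=3):
    """Return every unordered composition of `size` subunits as a Counter."""
    return [Counter(c) for c in combinations_with_replacement(monomers, size)]


def label(composition):
    """Format a composition as text, e.g. 'GlnB2-GlnK' (plain-digit counts)."""
    parts = [name if count == 1 else f"{name}{count}"
             for name, count in composition.items()]
    return "-".join(parts)


labels = [label(c) for c in trimer_compositions()]
```

Scoring each composition with a force field and sorting the labels by score would give a calculated ranking to set against the experimental one.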

[Figure labels: GlnK, GlnB]

People

• sausage
  – Andrew Torda (RSC)
  – Oliver Martin (RSC)

• GlnB/GlnK, RdR polymerases
  – Subhash Vasudevan (JCU)

Sausage and Cassandra are freely available: http://rsc.anu.edu.au/~torda

[email protected]

Summary

• Increasing urgency for in-silico proteomics

• Good force fields = essential for success
  – Different tasks (may) require different scoring schemes