MULTICOM – A Combination Pipeline for Protein Structure Prediction Jianlin Cheng Computer Science Department & Informatics Institute University of Missouri, Columbia, MO, USA
Date posted: 21-Dec-2015

Page 1:

MULTICOM – A Combination Pipeline for Protein Structure Prediction

Jianlin Cheng

Computer Science Department & Informatics Institute
University of Missouri, Columbia, MO, USA

Page 2:

MULTICOM Structure Prediction Pipeline

[Pipeline diagram labels: Query Sequence, Server Predictor, Human Predictor, Output]

Page 3:

MULTICOM Structure Prediction Pipeline

[Pipeline diagram: Query Sequence → Output]

Query-template alignments:

• PSI-BLAST
• HHSearch
• COMPASS
• FOLDpro + SPEM

Find a set of good templates / fragments; generate alternative query-template alignments.

Page 4:

MULTICOM Structure Prediction Pipeline

[Pipeline diagram: Query Sequence → Output]

1. Combine the top-ranked query-template alignment (QTA) with other significant QTAs
2. Take fragments from less significant QTAs (template-free)

Don't try to find the best template; instead, combine multiple good templates / fragments.

Combination
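The two combination rules above can be sketched as a partition of ranked alignments. This is only an illustration, not MULTICOM's code: the `Alignment` class, the use of an E-value as the significance score, and the `significant_e` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    template_id: str  # hypothetical template identifier
    e_value: float    # assumed significance score (lower is more significant)


def split_alignments(alignments, significant_e=1e-3):
    """Return (top, other significant, less significant) alignments.

    `significant_e` is an illustrative cutoff, not a value from the slides.
    """
    ranked = sorted(alignments, key=lambda a: a.e_value)
    if not ranked:
        return None, [], []
    top, rest = ranked[0], ranked[1:]
    # Rule 1: significant alignments are combined with the top-ranked one.
    significant = [a for a in rest if a.e_value <= significant_e]
    # Rule 2: less significant alignments only contribute fragments.
    weak = [a for a in rest if a.e_value > significant_e]
    return top, significant, weak


hits = [Alignment("t3", 0.5), Alignment("t1", 1e-20), Alignment("t2", 1e-5)]
top, significant, weak = split_alignments(hits)
```

Here `top` is the alignment to build on, `significant` holds the alignments merged with it, and `weak` holds the ones that would only supply fragments.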

Page 5:

MULTICOM Structure Prediction Pipeline

[Pipeline diagram: Query Sequence → Output]

1. Modeller
2. Rosetta for template-free small domains

Domain-level combination of template-based and template-free approaches

Integrative Model Generation

Page 6:

MULTICOM Structure Prediction Pipeline

[Pipeline diagram: Query Sequence → Output]

Model Ranking by ModelEvaluator

Page 7:

ModelEvaluator

[Diagram: 3D Model and Ab initio Sequence-Based Structural Feature Prediction → Comparison → Input Features → Predicted GDT-TS score]

Features compared:

• Secondary Structure (e.g., EEEECCEEEHHHHHHHHHHHHEEEECCEEEHHHH)
• Relative Solvent Accessibility (e.g., eeee-----eeeee----------eeeee------eeeee---eeeeeeee)
• Contact Map
• Beta-Sheet Pairing

Good models are ranked at the top. Very effective for template-free models.
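The comparison step can be illustrated with per-residue label strings like those shown on this slide. The sketch below computes only simple agreement fractions between sequence-based predictions and labels derived from a 3D model. The function names are hypothetical, and how ModelEvaluator turns such features into a predicted GDT-TS score is not described on the slide, so that part is left out.

```python
def agreement(predicted: str, observed: str) -> float:
    """Fraction of positions where two equal-length label strings match."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def comparison_features(pred_ss, model_ss, pred_rsa, model_rsa):
    """Hypothetical feature vector: secondary structure (H/E/C) and
    solvent accessibility (e = exposed, - = buried) agreement."""
    return {
        "ss_agreement": agreement(pred_ss, model_ss),
        "rsa_agreement": agreement(pred_rsa, model_rsa),
    }


features = comparison_features(
    pred_ss="EEEECC", model_ss="EEEHCC",
    pred_rsa="ee--", model_rsa="e---",
)
```

Contact map and beta-sheet pairing comparisons would add further entries to the same feature dictionary.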

Page 8:

MULTICOM Structure Prediction Pipeline

[Pipeline diagram: Query Sequence → Output]

1. Start from a top-ranked model
2. Combine it with other models having global similarity (80%, 4 Å)
3. Combine it with the longest similar model fragments

Global-Local Model Combination

Modeller Iterative Modeling

Average Model

Don't try to find the best model. Instead, combine multiple good models / fragments (2-3% improvement).
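Step 2 can be sketched as a filter over candidate models. The sketch reads "(80%, 4 Å)" as "at least 80% of residues lie within 4 Å of the top-ranked model", which is an assumption, since the slide does not define the criterion. It also assumes the models are already superposed on the top-ranked model and are given as equal-length lists of Cα coordinates; the function names are hypothetical.

```python
import math


def fraction_within(model_a, model_b, cutoff=4.0):
    """Fraction of residue pairs closer than `cutoff` (in Å).

    Assumes both models are already superposed and have the same length.
    """
    if len(model_a) != len(model_b):
        raise ValueError("models must have the same number of residues")
    if not model_a:
        return 0.0
    close = sum(math.dist(p, q) <= cutoff for p, q in zip(model_a, model_b))
    return close / len(model_a)


def select_similar(top_model, candidates, min_fraction=0.8, cutoff=4.0):
    """Names of candidate models globally similar to the top-ranked model."""
    return [
        name
        for name, coords in candidates.items()
        if fraction_within(top_model, coords, cutoff) >= min_fraction
    ]


top_model = [(float(i), 0.0, 0.0) for i in range(5)]
near = [(x, y + 1.0, z) for x, y, z in top_model]            # all within 4 Å
far = [(x, y + (10.0 if i < 2 else 0.0), z)                  # 2 of 5 deviate
       for i, (x, y, z) in enumerate(top_model)]
selected = select_similar(top_model, {"near": near, "far": far})
```

The selected models would then be passed, together with the top-ranked model, to the modeling step named on the slide.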

Page 9:

Good Template-Free Example: T0416_2

[Images: Structure; MULTICOM model (GDT = 0.66, RMSD = 2.5); Superposition (red: model), courtesy of Prof. Joel Sussman]

Combination of 20 models:

• Zhang-Server
• Robetta
• TASSER
• MULTICOM
• YASARA
• forecast

Success: rank very good models at top.
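The GDT and RMSD values quoted on these example slides can be related to coordinates with a short sketch. GDT-TS averages the fraction of residues within 1, 2, 4, and 8 Å of the experimental structure. The version below is a simplification: it scores one fixed, pre-computed superposition, whereas the standard measure searches over superpositions for each cutoff, so this gives a lower bound. Function names are hypothetical.

```python
import math


def rmsd(model, native):
    """Root-mean-square deviation over paired, already superposed atoms."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, native))
    return math.sqrt(squared / len(model))


def gdt_ts_fixed(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS on a 0-1 scale for a single fixed superposition."""
    dists = [math.dist(p, q) for p, q in zip(model, native)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return sum(fractions) / len(cutoffs)


native = [(0.0, 0.0, 0.0)] * 4
model = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
score = gdt_ts_fixed(model, native)   # fractions 0.25, 0.5, 0.75, 0.75
deviation = rmsd(model, native)
```

In this toy case one badly placed residue dominates the RMSD but costs only a quarter of the GDT-TS score, which is why both numbers are usually reported together.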

Page 10:

Good Template-Free Example: T0513_2

[Images: Structure; MULTICOM model (GDT = 0.73, RMSD = 2.1); Superposition (blue: model)]

Combine Robetta models: better than each one of them.

Success: rank very good models at top and combination improves modeling.

Page 11:

Not Good Template-Free Example: T0405_1

[Images: Structure (helix bundle); MULTICOM model (GDT = 0.41); Superposition by Prof. Sussman (gray: structure, yellow: best model, green: MULTICOM model)]

Failure: ModelEvaluator fails to identify correct helix orientations.

Page 12:

Concluding Remarks

• The CASP community can sometimes generate good template-free models (e.g., Rosetta-based tools)

• ModelEvaluator can rank good template-free models at the top

• Iterative global-local combination of models can improve template-free modeling

• Blending of template-free and template-based modeling

Page 13:

Blending of Template-Free and Template-Based Modeling

Protein Modeling Spectrum: 100% TBM | 50% TBM + 50% FM | 100% FM

Page 14:

Acknowledgements

• CASP8 organizers and assessors
• CASP8 participants
• MU colleagues: Dong Xu, Toni Kazic
• My group: Zheng Wang, Allison Tegge, Xin Deng