Single-model quality assessment using protein structural and …caora//materials/MCBIOS... ·...


Page 1: Single-model quality assessment using protein structural and …caora//materials/MCBIOS... · 2016-06-24 · CASP (Critical Assessment of Techniques for Protein Structure Prediction).

Speaker: Renzhi Cao

Major: Computer Science

Fifth year Ph.D

Single-model quality assessment using protein structural and contact information with machine learning techniques

Page 2:

Outline

Part I: Introduction

    Protein quality assessment

    CASP competition

Part II: QAcon method

Part III: Result

Part IV: Conclusion

Page 3:

Part I: Introduction

>T0759 HR9083A, Human, 109 residues

MGHHHHHHSHMVVIHPDPGRELSPEEAHRAGLIDWNMFVKLRSQECDWEEISVKGPNGES

SVIHDRKSGKKFSIEEALQSGRLTPAHYDRYVNKDMSIQELAVLVSGQK

Page 4:

[Figure: native structure shown with predicted model 1 and predicted model 2]

Page 5:

[Figure: predicted model 2 and predicted model 3]

Page 6:

GDT-TS: 0.07    GDT-TS: 0.33    GDT-TS: 0.21

Predicted model 1    Predicted model 2    Predicted model 3

Page 7:

Evaluating metrics

1. Loss

$$\text{Loss} = GDT_{\text{best model}} - GDT_{\text{predicted top 1}}$$

2. Correlation

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
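The two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the talk: the function names `top1_loss` and `pearson` and the predicted scores are hypothetical, and only the three GDT-TS values are borrowed from the earlier slide.

```python
import math


def top1_loss(true_gdt, predicted_scores):
    """GDT-TS of the best model minus GDT-TS of the model the predictor ranks first."""
    best = max(true_gdt)
    top1_index = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return best - true_gdt[top1_index]


def pearson(x, y):
    """Pearson correlation between predicted scores x and true scores y."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denom == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return (n * sxy - sx * sy) / denom


true_gdt = [0.07, 0.33, 0.21]    # GDT-TS values of three models for one target
predicted = [0.10, 0.25, 0.30]   # hypothetical predicted quality scores

loss = top1_loss(true_gdt, predicted)   # the predictor picks the third model
corr = pearson(predicted, true_gdt)
```

A loss of 0 means the predictor ranked the truly best model first; the per-target values are then averaged over all targets.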

Page 8:

CASP (Critical Assessment of Techniques for Protein Structure Prediction)

Sel20 (Stage1)

Top150 (Stage2)

Page 9:

Outline

Part I: Introduction

    Protein quality assessment

    CASP competition

Part II: QAcon method

Part III: Result

Part IV: Conclusion

Page 10:

12 features

1. RWp
2. MoE
3. Dop
4. Con
5. SS
6. SP
7. EC
8. SU
9. EM
10. ES
11. SA
12. RF

Page 11:

1. RWp: RWplus score

Page 12:

2. MoE: ModelEvaluator score

Page 13:

3. Dop: Dope score

Page 14:

4. Con: Contact satisfaction score

Page 15:

Contact threshold is set to 8 Å

[Figure: two residues, Res1 and Res2]
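A rough sketch of how a contact satisfaction score could be computed from a model is given here. It assumes the score is the fraction of predicted contacts whose residue pair lies closer than the threshold in the model, that the threshold of 8 is in ångströms, and that one representative atom per residue is used; the function name and the toy coordinates are hypothetical.

```python
import math

CONTACT_THRESHOLD = 8.0  # threshold from the slide; ångströms assumed


def contact_satisfaction(predicted_contacts, coords, threshold=CONTACT_THRESHOLD):
    """Fraction of predicted residue pairs that are within `threshold` in the model.

    predicted_contacts: iterable of (i, j) residue index pairs (0-based).
    coords: list of (x, y, z) tuples, one representative atom per residue.
    """
    pairs = list(predicted_contacts)
    if not pairs:
        return 0.0
    satisfied = sum(
        1 for i, j in pairs if math.dist(coords[i], coords[j]) < threshold
    )
    return satisfied / len(pairs)


# Toy model: four residues on a straight line, 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0, 1), (0, 2), (0, 3)]   # hypothetical predicted contacts
score = contact_satisfaction(predicted, coords)
```

In this toy case two of the three predicted pairs fall under the threshold, so the score is 2/3.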

Page 16:

5. SS score: Secondary structure similarity score

Page 17:

6. SP score: Secondary structure penalty score

$$S_{penalty} = \frac{NumHelix + NumSheet}{N}$$
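One way to read the SP formula is sketched below. It assumes that NumHelix and NumSheet count the positions where the predicted and the model-derived secondary structure agree on helix or on sheet, that N is the sequence length, and that both inputs are three-state strings using `H`, `E`, and `C`; the function name and the example strings are hypothetical.

```python
def sp_score(predicted_ss, model_ss):
    """(matching helix positions + matching sheet positions) / sequence length."""
    if len(predicted_ss) != len(model_ss):
        raise ValueError("secondary structure strings must have the same length")
    n = len(predicted_ss)
    if n == 0:
        return 0.0
    num_helix = sum(1 for p, m in zip(predicted_ss, model_ss) if p == m == "H")
    num_sheet = sum(1 for p, m in zip(predicted_ss, model_ss) if p == m == "E")
    return (num_helix + num_sheet) / n


predicted_ss = "HHHHCCEEEE"   # hypothetical prediction from the sequence
model_ss = "HHHCCCEEHE"       # hypothetical assignment parsed from the model
score = sp_score(predicted_ss, model_ss)
```

Here three helix positions and three sheet positions agree out of ten residues, giving 0.6.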

Page 18:

7. EC: Euclidean Compact score

$$S_{Eucli} = \frac{\sum_{i,j} Eucli(i, j)}{\sum_{i,j} 3.8 \cdot |i - j|}$$
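The sketch below shows one plausible reading of the EC formula: the sum of pairwise Euclidean distances between residues, divided by the sum of 3.8 · |i − j| over the same pairs (the distance each pair would have in a fully extended chain). The exact normalization used by the method may differ; the supplementary slide words it as division by (N*N-1)*3.8. The function name and the toy coordinates are hypothetical.

```python
import math

CA_SPACING = 3.8  # constant from the slide; distance between consecutive residues


def euclidean_compact_score(coords):
    """Sum of pairwise distances over the sum of extended-chain distances."""
    total_dist = 0.0
    total_extended = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total_dist += math.dist(coords[i], coords[j])
            total_extended += CA_SPACING * (j - i)
    return total_dist / total_extended if total_extended else 0.0


# A fully extended chain scores 1; a folded-back chain scores lower.
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
ec_extended = euclidean_compact_score(extended)
ec_folded = euclidean_compact_score(folded)
```

Under this reading, lower values indicate a more compact model.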

Page 19:

8. SU: Surface score

$$S_{surf} = \frac{\sum_i \text{exposed nonpolar}_i}{\sum_i \text{exposed area}_i}$$

Page 20:

9. EM: Exposed mass score

$$S_{mass} = \frac{\sum_i \text{Area of Nonpolar}_i \cdot Mass_i}{\sum_i \text{Area of Exposed}_i \cdot Mass_i}$$

Page 21:

10. ES: Exposed surface score

$$S_{ES} = \frac{\sum_i \text{Exposed area}_i}{\sum_i Area_i}$$
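The three area-based scores (SU, EM, ES) are simple ratios over per-residue quantities and can be sketched together. This is an illustration under assumptions: the `Residue` record and its fields are hypothetical, "area of nonpolar" is read as the exposed area of nonpolar residues, and the SU denominator follows the slide formula (total exposed area).

```python
from dataclasses import dataclass


@dataclass
class Residue:
    exposed_area: float   # solvent-exposed area of the residue in the model
    total_area: float     # total surface area of the residue
    mass: float           # residue mass
    nonpolar: bool        # True for nonpolar residue types


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def surface_score(residues):
    """SU: exposed nonpolar area over total exposed area."""
    return _ratio(sum(r.exposed_area for r in residues if r.nonpolar),
                  sum(r.exposed_area for r in residues))


def exposed_mass_score(residues):
    """EM: mass-weighted exposed nonpolar area over mass-weighted exposed area."""
    return _ratio(sum(r.exposed_area * r.mass for r in residues if r.nonpolar),
                  sum(r.exposed_area * r.mass for r in residues))


def exposed_surface_score(residues):
    """ES: total exposed area over total residue area."""
    return _ratio(sum(r.exposed_area for r in residues),
                  sum(r.total_area for r in residues))


# Hypothetical two-residue example.
residues = [
    Residue(exposed_area=50.0, total_area=100.0, mass=100.0, nonpolar=True),
    Residue(exposed_area=150.0, total_area=200.0, mass=200.0, nonpolar=False),
]
su = surface_score(residues)
em = exposed_mass_score(residues)
es = exposed_surface_score(residues)
```

For the two residues above, SU is 50/200, EM is 5000/35000, and ES is 200/300.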

Page 22:

11. SA: Solvent accessibility similarity score

Page 23:

12. RF: RF_CB_SRS_OD score

Page 24:

[Figure: neural network with the 12 features (RF, RWp, MoE, Con, SP, SS, ...) as input, a hidden layer, and an output]

Sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Forward propagation:

$$\sum_i w_{i,j} x_i$$

Error:

$$E = \frac{1}{2}(y - t)^2$$

Backward propagation:

$$w_{t+1} = w_t - \eta \frac{\partial E}{\partial w}$$
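A toy version of this training loop is sketched below to make the forward and backward passes concrete. It is not the QAcon network: the hidden-layer size, learning rate, weight initialization, absence of bias terms, and the feature and target values are all assumptions chosen for illustration. The update uses the standard gradient-descent form, subtracting η · ∂E/∂w from each weight.

```python
import math
import random


def sigmoid(x):
    """Logistic activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """One hidden layer and one sigmoid output, without bias terms."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        # Each unit applies the sigmoid to the weighted sum of its inputs.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w_hidden]
        y = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)))
        return hidden, y

    def train_step(self, x, t, eta=0.5):
        hidden, y = self.forward(x)
        error = 0.5 * (y - t) ** 2
        # Chain rule: dE/dy = (y - t), and the sigmoid derivative is y * (1 - y).
        delta_out = (y - t) * y * (1.0 - y)
        # Hidden deltas are computed with the output weights before the update.
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w_out, hidden)]
        # Gradient descent: w <- w - eta * dE/dw.
        self.w_out = [w - eta * delta_out * h
                      for w, h in zip(self.w_out, hidden)]
        self.w_hidden = [[w - eta * d * xi for w, xi in zip(row, x)]
                         for row, d in zip(self.w_hidden, delta_hidden)]
        return error


features = [0.5] * 12   # hypothetical values for the 12 input features
target = 0.33           # hypothetical true quality score of the model
net = TinyNetwork(n_inputs=12, n_hidden=5)
errors = [net.train_step(features, target) for _ in range(200)]
```

The squared error returned by `train_step` shrinks over the 200 steps as the output moves toward the target.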

Page 25:

Outline

Part I: Introduction

    Protein quality assessment

    CASP competition

Part II: QAcon method

Part III: Result

Part IV: Conclusion

Page 26:

Part III: Result

Server name          Ave. corr.   Ave. loss
ProQ2                0.643        0.090
QAcon                0.639        0.100
VoroMQA              0.561        0.108
Wang_SVM             0.655        0.109
Wang_deep_1          0.613        0.128
RWplus               0.536        0.135
raghavagps-qaspro    0.35         0.156

Table 1. The per-target average correlation and average loss for QAcon and other methods on sel20 of CASP11.

Page 27:

Part III: Result

Server name          Ave. corr.   Ave. loss
ProQ2                0.372        0.058
QAcon                0.395        0.067
VoroMQA              0.401        0.069
RWplus               0.295        0.084
Wang_SVM             0.362        0.085
raghavagps-qaspro    0.222        0.085
Wang_deep_1          0.302        0.089

Table 2. The per-target average correlation and average loss for QAcon and other methods on top150 of CASP11.

Page 29:

Part III: Result

Target name    Contact satisfaction
T0778          0.6142
T0825          0.6049
T0807          0.5387
T0815          0.5189
T0817          0.5181
T0811          0.5176
T0854          0.4953
T0762          0.4607
T0819          0.4531
T0768          0.4529
T0776          0.4492
T0798          0.4343
T0805          0.4252
T0801          0.3936
T0847          0.3864

Table 3. Contact satisfaction score of all CASP11 native structures (top 15)

Contact satisfaction    Ave. Corr    Ave. Loss
Con (Top 25)            0.682        0.156
Con (Bottom 25)         -0.016       0.233

Table 4. The average correlation and loss for CASP11 sel20 targets

Contact satisfaction    Ave. Corr    Ave. Loss
Con (Top 25)            0.221        0.146
Con (Bottom 25)         0.080        0.134

Table 5. The average correlation and loss for CASP11 top150 targets

Page 30:

Outline

Part I: Introduction

    Protein quality assessment

    CASP competition

Part II: QAcon method

Part III: Result

Part IV: Conclusion

Page 31:

Part IV: Conclusion

QAcon

Contact as a potential feature for QA

Page 32:

Q & A

Email: [email protected]

Badri Adhikari

Debswapna Bhattacharya

Miao Sun

Jianlin Cheng

All other lab members

Jie Hou

Page 33:

Supplementary

1. The RF_CB_SRS_OD score (Rykunov and Fiser, 2007): energy score for evaluating the protein structure based on statistical distance-dependent pairwise potentials.

2. RWplus score (Zhang and Zhang, 2010): energy score evaluating protein models based on distance-dependent atomic potential.

3. ModelEvaluator score (Wang, et al., 2009): score evaluating protein models based on structural features and support vector machines.

Page 34:

Supplementary

4. Dope score (Shen and Sali, 2006): energy score evaluating protein models based on the reference state of non-interacting atoms in a homogeneous sphere.

5. Con score: The contact score is calculated by the satisfaction of the contacts predicted from the sequence and the ones parsed from the model. PSI-COV is used for contact prediction, and NNcon is used when PSI-COV fails to make predictions.

6. SS score: This score is calculated by the difference between the secondary structure predicted by Spine X (Faraggi, et al., 2012) from the protein sequence and that of a model parsed by DSSP (Kabsch and Sander, 1983).

Page 35:

Supplementary

7. SP score: This score is calculated by the percentage of helix and sheet matching between the predicted secondary structure and the one parsed from the model.

8. EC score: The Euclidean compact score is calculated by the summation of pairwise Euclidean distances between amino acids divided by (N*N-1)*3.8, where N is the total number of amino acids in the sequence.

9. SU score: This surface score is calculated by the total area of exposed nonpolar residues divided by the total area of all residues.

Page 36:

Supplementary

10. EM score: The exposed mass score is calculated as the mass-weighted area of nonpolar residues divided by the mass-weighted area of exposed residues.

11. ES score: The exposed surface score is calculated as the total exposed residue area divided by the total residue area.

12. SA score: The solvent accessibility score is calculated by the percentage of difference between the predicted solvent accessibility and the one parsed from the model.