Multi-factor model for prediction of the caspase degradome Lawrence Wee
Multi-factor model for prediction of the caspase degradome
Lawrence Wee
What are Caspases?
1. Fuentes-Prior et al. Biochem J. 2004 Dec 1;384(Pt 2):201-32.
2. Thornberry et al. J Biol Chem. 1997 Jul 18;272(29):17907-11.
The Biochemistry of Caspases [1]
Caspases are cysteine proteases.
In vitro optimal tetrapeptide specificities [2]
             P4    P3   P2   P1
Group I
  Caspase-1  W     E    H    D
  Caspase-4  W/L   E    H    D
  Caspase-5  W/L   E    H    D
Group II
  CED-3      D     E    T    D
  Caspase-3  D     E    V    D
  Caspase-7  D     E    V    D
  Caspase-2  D     E    H    D
Group III
  Caspase-6  V     E    H    D
  Caspase-8  L     E    T    D
  Caspase-9  L     E    H    D
Recognize a tetrapeptide sequence on substrates (P4-P3-P2-P1).
P4 - P3 - P2 - P1 --- P1' - P2'
D  - E  - V  - D  --- T   - Y
Cleave after the canonical Asp (D) residue at the P1 position.
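The recognition rule above can be illustrated with a minimal sketch that scans a sequence for a consensus tetrapeptide and reports the P1 position of each match. The function name, the 'X' wildcard convention and the toy sequence are assumptions made for illustration; this is plain motif matching, not any of the published predictors.

```python
import re


def find_cleavage_candidates(sequence, motif="DEVD"):
    """Return the 1-based P1 position of every occurrence of a tetrapeptide motif.

    Letters are matched literally; 'X' stands for any residue.
    """
    pattern = "".join("." if aa == "X" else re.escape(aa) for aa in motif)
    # A lookahead reports overlapping occurrences as well.
    regex = re.compile("(?=" + pattern + ")")
    # m.start() is the 0-based index of P4, so the 1-based P1 position is start + 4.
    return [m.start() + 4 for m in regex.finditer(sequence)]


# Hypothetical toy sequence, not a real protein.
toy = "MKDEVDTYAAGDEVDG"
print(find_cleavage_candidates(toy))          # [6, 15]
print(find_cleavage_candidates(toy, "DEXD"))  # [6, 15]
```

Cleavage would occur between the reported P1 residue and the residue that follows it (P1').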
What are Caspases?
1. Hengartner MO. The biochemistry of apoptosis. Nature. 2000 Oct 12;407(6805):770-6.
As the final effectors of apoptosis, caspases cleave many protein substrates.
Caspases in Apoptosis [1]
Extrinsic
Intrinsic
The Caspase Degradome
The State of the Caspase Degradome [1]
1. Categories are assigned according to Fischer et al. (2003).
Functional Distribution of Caspase Substrates
Ser/Thr protein kinases in signal transduction   12%
Cytoskeletal and structural proteins             10%
DNA-binding and transcription factors             9%
RNA synthesis and splicing                        7%
DNA synthesis, cleavage and repair                7%
Cell adhesion                                     6%
Cell cycle proteins                               6%
Calcium, cAMP, cGMP and lipid metabolism          5%
Nuclear structural and abundant proteins          4%
Membrane receptors                                4%
Neurodegeneration                                 4%
G protein signaling                               3%
Apoptosis regulation                              3%
ER and Golgi-resident proteins                    1%
Protein phosphatases                              1%
Protein modification                              1%
Tyr protein kinases                               1%
Other substrates                                  4%
Protein degradation                               3%
Viral proteins                                    3%
Adapter proteins                                  1%
Cytokines                                         1%
Protein translation                               4%
What is the Degradome?
The Degradome
The Caspase Degradome: The repertoire of proteins cleaved by caspases.
Genome / Transcriptome / Proteome / Degradome (Substrates, Proteases)
Genomics / Transcriptomics / Proteomics / Degradomics
1. Lopez-Otin C and Overall CM. Protease degradomics: a new challenge for proteomics. Nat Rev Mol Cell Biol. 2002 Jul;3(7):509-19.
Degradome: The protease-substrate repertoire in a cell, tissue or organism [1]
Question
What proteins are cleaved by caspases?
Strategy
How about predicting the caspase degradome?
Algorithms and servers
Existing algorithms and servers
Program         Algorithm                           Accuracy      Authors                         Dataset
PeptideCutter   Consensus motifs                    Not reported  Gasteiger et al. (2005)         Outdated dataset
PEPS            Consensus motifs                    Not reported  Lohmuller et al. (2003)         Outdated dataset
CasPredictor    Position-specific scoring matrices  81% [1]       Garay-Malpartida et al. (2005)  137 sequences (Fischer, 2003)
GraBCas         Position-specific scoring matrices  87% [2]       Backes et al. (2005)            No dataset provided
BBBF NN         Neural networks                     96% [1]       Yang (2005)                     Small dataset (12 sequences)
SVM prediction  SVM                                 82-97%        Wee et al. (2006, 2007)         219 sequences
1. Accuracy as reported in papers using the authors’ datasets.
2. GraBCas accuracy when tested on our dataset.
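As a rough illustration of the position-specific scoring matrix approach listed in the table, the sketch below builds a log-odds matrix from aligned tetrapeptides and scores a candidate. It is a generic textbook PSSM under assumptions (uniform background frequencies, a pseudocount of 1, and the consensus tetrapeptides from the specificity slide used as toy training data); it is not the CasPredictor or GraBCas implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_peptides, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length peptides (uniform background assumed)."""
    length = len(aligned_peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(peptide[pos] for peptide in aligned_peptides)
        # One column per position: log2 of smoothed frequency over background.
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_peptide(pssm, peptide):
    """Sum the per-position log-odds scores for a peptide of matching length."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Toy training set: consensus tetrapeptides from the specificity slide.
training = ["DEVD", "DEHD", "DETD", "LEHD", "VEHD", "WEHD", "LETD"]
pssm = build_pssm(training)
print(score_peptide(pssm, "DEVD") > score_peptide(pssm, "AAAA"))
```

A real predictor would train on verified cleavage sites and calibrate a score threshold on held-out data.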
Server for SVM-based prediction
CASVM Web Server [1,2]
1. Wee et al. CASVM: web server for SVM-based prediction of caspase substrates cleavage sites. Bioinformatics. 2007 Dec 1;23(23):3241-3.
2. Wee et al. SVM-based prediction of caspase substrate cleavage sites. BMC Bioinformatics. 2006 Dec 18;7 Suppl 5:S14.
The CASVM web server predicts caspase cleavage sites using our SVM algorithm: www.casbase.org/casvm
Problem
Predicting caspase cleavage sites is not good enough
Predicting the Caspase Degradome
Limitations of Caspase Cleavage Site Prediction
1. Analysis on our caspase substrate dataset.
Not all bona fide cleavage site motifs are cleaved in vivo [1]:
• 80% of true substrates contain at least one other occurrence of an identical caspase cleavage site sequence that is not reported as a true cleavage site in the literature.
- Tpr (DDED-2117)
- p28BAP31 (AAVD-163)
- golgin 160 (SEVD-311)
- Topo I (PEDD-123)
- heterogeneous nuclear ribonucleoprotein C1/C2 (GEDD-305)
Problem
Predicting caspase cleavage sites is not good enough
Solution
How about incorporating other structural factors?
Secondary structures? Solvent exposure?
Analysis of caspase cleavage sites
Caspase cleavage sites are analyzed for:
Dataset of caspase cleavage sites & non-cleavage sites
SABLE [1]
Propensity for secondary structures
Propensity for solvent exposure
1. http://sable.cchmc.org/
Analysis of caspase cleavage sites
Cleavage sites tend to be located in unstructured regions
Figure 1 Figure 2
Cleavage sites prefer unstructured regions
Analysis of caspase cleavage sites
Cleavage sites tend to be located in solvent-exposed regions
Figure 3 Figure 4
Cleavage sites prefer solvent-exposed regions
Analysis of caspase cleavage sites
Cleavage sites tend to be located in unstructured and solvent-exposed regions
Cleavage sites prefer highly unstructured regions with high solvent exposure
Figure 5
Analysis of caspase cleavage sites
Cleavage sites tend to be located in unstructured and solvent-exposed regions
Non-cleavage sites prefer regions with secondary structures and less solvent exposure
Figure 6
Multi-factor model
Current algorithms: Cleavage site prediction
Better algorithm? Cleavage site prediction + Secondary structures + Solvent exposure
Multi-factor model prediction
Schematic diagram of the multi-factor algorithm
Step 1 - Caspase cleavage site prediction (using an existing algorithm)
......MIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF.....
Step 2 - Selection of structurally favorable candidates
VETELKLICCDILDVLDKHLIPAA
ELKLICCDILDVLDKHLIPAANTG
LAKAAFDDAIAELDTLSEESYKDS
Cp, Sp and P-score are calculated for all subsequences
Cleavage sites in subsequences with P-score above cut-off are selected
ELKLICCDILDVLDKHLIPAANTG
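The two steps above can be sketched as follows. The slides do not define Cp, Sp or the P-score, so this sketch assumes Cp is the mean predicted coil probability over the window, Sp is the mean predicted solvent exposure, and the P-score is their product. The 24-residue window (the P4-P1 tetrapeptide plus 10 residues on each side) is inferred from the length of the example subsequences. All per-residue values are toy numbers, and the function name is hypothetical.

```python
from statistics import mean

FLANK = 10  # assumed: 10 residues on each side of the P4-P1 tetrapeptide


def select_favorable_sites(sequence, candidate_p1, coil, exposure, cutoff):
    """Step 2: keep candidate sites whose window P-score reaches the cut-off.

    candidate_p1: 1-based P1 positions from step 1 (any cleavage site predictor).
    coil, exposure: per-residue values in [0, 1], e.g. from a structure predictor.
    """
    selected = []
    for p1 in candidate_p1:
        start = p1 - 4 - FLANK  # 0-based index of the first window residue
        end = p1 + FLANK        # exclusive end index
        if start < 0 or end > len(sequence):
            continue  # simplification: skip sites too close to a terminus
        cp = mean(coil[start:end])      # assumed definition of Cp
        sp = mean(exposure[start:end])  # assumed definition of Sp
        p_score = cp * sp               # assumed definition of the P-score
        if p_score >= cutoff:
            selected.append((p1, sequence[start:end], p_score))
    return selected


# Excerpt from the slide; positions are relative to this excerpt only.
excerpt = "MIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF"
# Toy per-residue values, invented for illustration.
coil = [0.9] * len(excerpt)
exposure = [0.2] * 24 + [0.8] * 16
for p1, window, score in select_favorable_sites(excerpt, [22, 25], coil, exposure, cutoff=0.4):
    print(p1, window, round(score, 2))
```

With these toy values only the window around position 25 passes the cut-off. In practice the per-residue values would come from a structure predictor such as SABLE.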
Multi-factor model prediction
Validating the multi-factor model
Dataset of caspase cleavage sites & non-cleavage sites
Analysis
Test Multi-factor model prediction
Multi-factor model prediction
Validating the multi-factor model
Figure 7
Using CASVM
Multi-factor model prediction
Validating the multi-factor model
Figure 8
Using GraBCas
Multi-factor model prediction
Validating the multi-factor model
Figure 9: Positive Predictive Values of Models. PPV (%, 0 to 100) plotted against P-score (0 to 0.75) for CASVM and GraBCas.
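For reference, positive predictive value is PPV = TP / (TP + FP): the fraction of predicted sites that are true cleavage sites. A minimal sketch of how PPV could be computed at different P-score cut-offs is shown below; the function name and the scored sites are hypothetical toy data, not results from the talk.

```python
def ppv_at_cutoff(scored_sites, cutoff):
    """Return PPV (%) among sites with P-score >= cutoff, or None if none pass.

    scored_sites: list of (p_score, is_true_site) pairs.
    """
    passed = [is_true for score, is_true in scored_sites if score >= cutoff]
    if not passed:
        return None
    # True sites that pass are TP; all sites that pass are TP + FP.
    return 100.0 * sum(passed) / len(passed)


# Toy data, invented for illustration.
toy_sites = [(0.1, False), (0.2, True), (0.3, False), (0.5, True), (0.6, True)]
print(ppv_at_cutoff(toy_sites, 0.0))  # 60.0
print(ppv_at_cutoff(toy_sites, 0.5))  # 100.0
print(ppv_at_cutoff(toy_sites, 0.7))  # None
```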
RTK cleavage prediction
Prediction of potential caspase cleavage among RTKs
Receptor Tyrosine Kinases (RTKs)
Belong to the tyrosine kinase superfamily
Plasma membrane bound
Involved in cell survival, proliferation, differentiation
Image taken from http://www.pvrireview.org/viewimage.asp?img=PVRIReview_2009_1_2_124_50732_u1.jpg
Questions
Which RTKs are cleaved by caspases?
What are the consequences of cleavage?
RTK cleavage prediction
Prediction of RTK cleavage using the multi-factor model
52 RTKs from UniProt
Step 1 - Caspase cleavage sites predicted with CASVM
Step 2 - Cleavage sites scored for Cp, Sp and P-score
Selection of structurally favorable cleavage sites
RTK cleavage prediction
Prediction of potential caspase cleavage among RTKs
Global mapping of predicted caspase cleavage sites on receptor tyrosine kinases
RTK family        RTK     UniProt ID  Predicted caspase cleavage sites [1]
EGF receptor      EGFR    P00533      321 458 587 770 916 1006 1009 1012 1083 1127 1152 1171
                  ERBB2   P04626      277 326 382 639 1016 1019 1087 1125
                  ERBB3   P21860      162 165 242 581 1010 1020 1327
                  ERBB4   Q15303      218 245 300 335 510 564 585 595 878 922 1012 1015 1018 1068 1241
Insulin receptor  INSR    P06213      75 483 526 546 549 672 704 716 949 985 1145 1210 1259 1330 1344
                  INSRR   P14616      154 585 676 816 916 1101 1166 1207 1280
                  IGF1R   P08069      156 300 342 519 539 542 675 1121 1186 1235 1294 1306
                  ROS1    P08922      100 358 483 513 684 711 842 1202 1391 1853 2058 2062 2135 2247
PDGF receptor     PDGFRA  P16234      215 244 287 422 568 733 763 846 902 919 1015 1024 1033 1074
                  PDGFRB  P09619      78 200 285 575 691 737 1091
                  CSF1R   P07333      51 63 269 741 746
                  KIT     P10721      439 479 768
                  FLT3    P36888      200 455 600 959
FGF receptor      FGFR1   P11362      69 90 110 130 131 132 133 142 218 527 768 782
                  FGFR2   P21802      75 126 135 136 138 506 521 530 785 794 795
                  FGFR3   P22607      77 136 139 143 147 497 516 521 776 792
                  FGFR4   P22455      119 129 187 240 507 516 575 770 779
VEGF receptor     VEGFR1  P17948      372 495 630 958 987 1135 1165 1168 1262
                  VEGFR3  P35916      19 45 77 304 371 556 725 728 1130 1216 1274
RTK cleavage prediction
Prediction of potential caspase cleavage among RTKs
Results and conclusion
• Cleavage sites are found throughout the length of the receptor
• 92% of all RTKs contain intracellular cleavage sites
• 98% contain extracellular cleavage sites
• 21% contain juxtamembrane domain cleavage sites (in the cytoplasmic portion)
• 80% contain cleavage sites within the tyrosine kinase domain (in the cytoplasmic portion)
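Percentages like those above could be tallied by checking, for each receptor, whether any predicted site falls inside a given region. The sketch below assumes each receptor carries its predicted site positions and a map of region boundaries. The function name, the receptors and the boundaries are hypothetical and are not taken from UniProt or from the talk.

```python
def percent_with_site_in(receptors, region):
    """Percentage of receptors with at least one predicted site inside `region`.

    Each receptor is a dict with 'sites' (1-based positions) and 'regions'
    (name -> inclusive (start, end) residue range).
    """
    hits = 0
    for receptor in receptors:
        start, end = receptor["regions"][region]
        if any(start <= site <= end for site in receptor["sites"]):
            hits += 1
    return 100.0 * hits / len(receptors)


# Hypothetical receptors and domain boundaries, invented for illustration.
toy_receptors = [
    {"sites": [120, 700, 950], "regions": {"extracellular": (1, 600), "kinase": (680, 960)}},
    {"sites": [300], "regions": {"extracellular": (1, 500), "kinase": (600, 900)}},
]
print(percent_with_site_in(toy_receptors, "extracellular"))  # 100.0
print(percent_with_site_in(toy_receptors, "kinase"))         # 50.0
```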
Conclusion
• The multi-factor model may be applicable to other protease-substrate prediction problems.
• A two-step approach may be better than a single step.
• Other factors (exosite prediction, protein-protein interactions) can be incorporated as separate steps, but correlations among the factors must be low.
The End