Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids
Y. Wang, O. Zaiane, R. Goebel
Introduction
- Protein: linear sequence of amino acids
- Protein subcellular localization
  - Plant: nuclear, cytoplasmic, mitochondria, extracellular, …
- Intracellular vs. extracellular
  - Sequence information alone
  - Class imbalance
  - Transparency
Related Work
- N-terminal sorting signals
- Amino acid composition
- Lexical analysis
- Integrative approach
- Subsequence methods
Predicting Extracellular Proteins
- Feature Extraction
- Support Vector Machine
- Boosting
- Frequent Pattern Method
Feature Extraction
- Frequent subsequences: subsequences that occur in more than a certain percentage of extracellular proteins
  - Strong discriminative power
  - Perform similar functions via related biochemical mechanisms
  - Capture local similarity
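The definition above can be illustrated with a minimal sketch. It assumes a "subsequence" is a contiguous run of residues (which is what a suffix tree indexes) and uses brute-force enumeration rather than the generalized suffix tree named on the next slide. The function name, the `max_len` cap, and the toy sequences are hypothetical.

```python
from collections import Counter


def frequent_subsequences(sequences, min_len=3, max_len=5, min_sup=0.05):
    """Return {substring: support} for substrings found in more than
    min_sup of the given sequences (assumed: contiguous substrings)."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for length in range(min_len, max_len + 1):
            for start in range(len(seq) - length + 1):
                seen.add(seq[start:start + length])
        counts.update(seen)  # each substring counted once per protein
    total = len(sequences)
    return {sub: n / total for sub, n in counts.items() if n / total > min_sup}


# Toy "extracellular" sequences, invented for illustration only.
toy = ["MKTLLAVS", "MKTLLGGA", "AAKTLLPV", "GGGGGGGG"]
freq = frequent_subsequences(toy, min_len=3, max_len=4, min_sup=0.5)
```

Brute-force enumeration is quadratic in sequence length, so it only suits small examples; a suffix-tree index avoids re-scanning every substring.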
Generalized Suffix Tree
Support Vector Machine
- Input data represented as feature vectors
- Find a linear separator that separates the data and maximizes the margin
- Kernel function: nonlinear separator
SVM for extracellular protein prediction
- Data transformation (sequence → vector)
  - Frequent subsequences as features
  - Transform protein sequences into binary vectors
- Kernel functions
  - Linear kernel
  - Polynomial kernel
  - RBF kernel
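A minimal sketch of the sequence-to-vector step and the three kernels listed above. The function names, the kernel hyperparameters (`degree`, `coef0`, `gamma`), and the example features are assumptions for illustration; the slides do not give the values actually used.

```python
import math


def to_binary_vector(sequence, features):
    """1 if the frequent subsequence occurs in the protein, else 0."""
    return [1 if f in sequence else 0 for f in features]


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + coef0) ** degree; degree and coef0 are assumed defaults
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2); gamma is an assumed default
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


features = ["KTL", "TLL", "GGA"]  # hypothetical frequent subsequences
x = to_binary_vector("MKTLLAVS", features)
y = to_binary_vector("MKTLLGGA", features)
```

On binary vectors the linear kernel simply counts the frequent subsequences two proteins share.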
Boosting
- Iterative algorithms that improve a weak classifier
- A different weighted distribution of examples in each iteration
- Increase the weights of incorrectly classified examples, and decrease the weights of correctly classified ones
AdaBoost
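The reweighting idea described for boosting can be sketched with the standard discrete AdaBoost update. This is an illustration under assumptions, not the configuration used in the experiments: the weak classifier here is a single-feature stump over binary features, and the names `train_adaboost` and `predict` are hypothetical.

```python
import math


def train_adaboost(X, y, rounds=5):
    """X: binary feature vectors; y: labels in {+1, -1}."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature_index, polarity)
    for _ in range(rounds):
        best = None
        # Weak learner (assumed): the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for polarity in (1, -1):
                preds = [polarity if row[j] == 1 else -polarity for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # Raise weights of misclassified examples, lower the others.
        weights = [w * math.exp(-alpha * t * p)
                   for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    score = sum(a * (pol if row[j] == 1 else -pol) for a, j, pol in ensemble)
    return 1 if score >= 0 else -1


X = [[1, 0], [1, 1], [0, 1], [0, 0]]  # toy binary features
y = [1, 1, -1, -1]
model = train_adaboost(X, y, rounds=3)
```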
Frequent Pattern Method
- Frequent pattern: *X1*X2*…*Xn* → extracellular
  - X1, X2, …, Xn are frequent subsequences
  - "*" stands for zero to MaxGap amino acids when matching a protein sequence
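One way to test whether a protein matches such a pattern is to translate it into a regular expression. The sketch below reads the slide literally, so every "*" (including the leading and trailing ones) is bounded by MaxGap; that reading and the function names are assumptions.

```python
import re


def compile_pattern(subsequences, max_gap):
    """Build a regex for *X1*X2*...*Xn* where each * is 0..max_gap residues."""
    gap = ".{0,%d}" % max_gap
    body = gap.join(re.escape(s) for s in subsequences)
    return re.compile(gap + body + gap)


def matches(sequence, subsequences, max_gap):
    # fullmatch: the whole protein must be covered by gaps and subsequences
    return compile_pattern(subsequences, max_gap).fullmatch(sequence) is not None


# Toy example: "KA" and "LV" are separated by three residues ("AAL").
hit = matches("MKAAALLVSS", ["KA", "LV"], max_gap=3)
miss = matches("MKAAALLVSS", ["KA", "LV"], max_gap=2)
```

With a large MaxGap relative to protein length, the bound rarely binds and the pattern behaves like an ordered "contains X1, then X2, …" test.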
FOIL algorithm
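For orientation, FOIL grows a rule greedily by adding the condition with the highest gain. The sketch below uses the textbook FOIL gain; whether the slides use exactly this form is an assumption, as are the function and parameter names.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """Textbook FOIL gain for specializing a rule.

    p0, n0: positives/negatives covered before adding a condition (p0 > 0).
    p1, n1: positives/negatives covered after adding it.
    """
    if p1 == 0:
        return 0.0  # nothing positive left to cover
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


# A condition that keeps 8 of 10 positives and drops all 10 negatives.
gain = foil_gain(p0=10, n0=10, p1=8, n1=0)
```

A threshold such as a minimum gain would then decide whether the best candidate condition is worth adding.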
Z-number
- Defined in terms of the accuracy of rule R and the support of rule R (formula not preserved in this transcript)
Experiments
- Dataset (PASub project at UofA)
  - Plant: 3293 proteins, 171 extracellular
- Five-fold cross-validation
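With only 171 positives among 3293 proteins, how the folds are drawn matters. The sketch below shows a stratified five-fold split that keeps the class ratio roughly equal across folds. Stratification is an assumption (the slides state only five-fold cross-validation), and the round-robin assignment stands in for a shuffled split.

```python
def stratified_folds(labels, k=5):
    """Assign example indices to k folds, class by class, round-robin."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, lab in enumerate(labels) if lab == cls]
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


# Label layout matching the dataset sizes on the slide: 171 extracellular (1),
# 3122 other (0). The ordering is arbitrary.
labels = [1] * 171 + [0] * (3293 - 171)
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```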
Evaluation Metrics
- Overall accuracy is not good enough
- F-measure
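A short worked example shows why overall accuracy misleads on this dataset. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the slides do not say which weighting was used. The baseline considered is a hypothetical classifier that labels every protein as non-extracellular.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and balanced F-measure (F1) for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


total, positives = 3293, 171  # dataset sizes from the Experiments slide

# Baseline: predict "not extracellular" for every protein.
baseline_accuracy = (total - positives) / total
baseline_scores = precision_recall_f(tp=0, fp=0, fn=positives)
```

The baseline is right about 94.8% of the time yet finds no extracellular proteins at all, so its F-measure is 0.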
Result (SVM with subsequence)
Result (Boosting with subsequence)
Result (Frequent Pattern)
- MinLen=3, Min_gain=0.1
- MinSup=5%, MinConf=80%, MaxGap=300
Result (SVM with composition)
Result (Boosting with composition)
Cross Comparison
SVM with combined features
Boosting with combined features
Effects of MinLen on SVM
Effects of MinLen on boosting
Conclusion
- Presented three methods for identifying extracellular proteins based on frequent subsequences of amino acids
- SVM achieves the best result
- FSP method provides easily interpretable rules
Future Work
- Use more information about proteins (e.g., structure, function, …)
- Integrate amino acid composition into the FSP method
- Incorporate more biological knowledge