LSM3241: Bioinformatics and Biocomputing
Lecture 3: Machine learning method for protein function prediction
Prof. Chen Yu Zong
Tel: 6516-6877
Email: [email protected]
http://bidd.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore

Protein Function and Functional Family
Proteins of similar functional characteristics can be grouped into a family.

Functional Classification of Proteins by SVM
• A protein is classified as either belonging (+) or not belonging (-) to a functional family.
• By screening against all families, the function of this protein can be identified (example: SVMProt).
[Figure: the protein is screened by the Family-1, Family-2, and Family-3 SVMs, which return -, -, and + respectively, so the protein belongs to Family-3.]
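
The screening idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three linear classifiers, their weights, and the example feature vector are made up here and are not SVMProt's actual models. Each family has its own binary classifier, and the protein is assigned to every family whose classifier returns "+".

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical linear classifiers, one per family: (weights, bias).
# The numbers are invented for illustration; a real system would learn
# them from positive and negative examples of each family.
FAMILY_CLASSIFIERS: Dict[str, Tuple[Sequence[float], float]] = {
    "Family-1": ((1.0, -1.0, 0.0), -0.5),
    "Family-2": ((-1.0, 0.0, 1.0), -1.5),
    "Family-3": ((0.0, 1.0, 1.0), -1.5),
}


def decision_value(weights: Sequence[float], bias: float,
                   features: Sequence[float]) -> float:
    """Linear decision function w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def screen_protein(features: Sequence[float]) -> Dict[str, str]:
    """Label the protein '+' or '-' for every family classifier."""
    return {
        name: "+" if decision_value(w, b, features) > 0 else "-"
        for name, (w, b) in FAMILY_CLASSIFIERS.items()
    }


def predicted_families(features: Sequence[float]) -> List[str]:
    """Families whose classifier accepts the protein."""
    return [name for name, label in screen_protein(features).items()
            if label == "+"]


# A made-up feature vector for the query protein.
protein = (0.0, 1.0, 1.0)
labels = screen_protein(protein)
families = predicted_families(protein)
print(labels)
print(families)
```

Because every family is tested independently, a protein may be accepted by more than one family or by none.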

Functional Classification of Proteins by SVM
What is SVM?
• Support vector machines (SVM) are a machine learning method that learns by examples (statistical learning) and classifies objects into one of two classes.
Advantages of SVM:
• Accommodates diversity among class members (no discrimination against dissimilar members).
• Uses sequence-derived physico-chemical features as the basis for classification.
• Suitable for functional classification of novel proteins (distantly-related proteins, homologous proteins of different functions).

Machine Learning Method
Inductive learning: example-based learning
[Figure: descriptors of positive examples and negative examples]

Machine Learning Method
Feature vectors:
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)
[Figure: the descriptor of each positive and negative example is converted to a feature vector]

SVM Method
Feature vectors in input space:
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)
[Figure: feature vectors A, B, E, and F plotted in the input space with axes X, Y, and Z]

SVM Method
Project to a higher-dimensional space
[Figure: protein family members and nonmembers, shown with the border in the input space and the new border after projection]
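
A toy example may help show why projecting to a higher-dimensional space is useful. The data and the mapping x → (x, x²) below are assumptions chosen for illustration, not taken from the lecture. On a line, the members sit between the nonmembers, so no single threshold separates them. After projection, a straight border on the second coordinate does.

```python
from typing import List, Tuple

# Made-up one-dimensional data: members lie between the nonmembers.
members: List[float] = [-1.0, 0.0, 1.0]
nonmembers: List[float] = [-3.0, -2.0, 2.0, 3.0]


def separable_by_threshold(pos: List[float], neg: List[float]) -> bool:
    """True if one threshold puts all of pos on one side and all of neg on the other."""
    return max(pos) < min(neg) or max(neg) < min(pos)


def project(x: float) -> Tuple[float, float]:
    """Map a point on the line to the plane: x -> (x, x^2)."""
    return (x, x * x)


# In the input space no threshold works.
in_input_space = separable_by_threshold(members, nonmembers)

# In the projected space, compare along the new coordinate x^2:
# a horizontal line (for example x^2 = 2.5) is a linear border there.
members_sq = [project(x)[1] for x in members]
nonmembers_sq = [project(x)[1] for x in nonmembers]
in_projected_space = separable_by_threshold(members_sq, nonmembers_sq)

print(in_input_space, in_projected_space)
```

The border that is a straight line in the projected space corresponds to two cut points in the original space, which is why the border looks nonlinear when drawn in the input space.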

SVM Method
[Figure: protein family members and nonmembers separated by the new border, with the support vectors marked]
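
The role of the support vectors can be illustrated with a small sketch. It assumes a linear border w · x + b = 0 that has already been found and made-up two-dimensional points. It does not train an SVM. The support vectors are the points of each class that lie closest to the border, and the margin is the gap between them.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Assumed border x1 + x2 - 3 = 0 and invented example points.
w: Sequence[float] = (1.0, 1.0)
b: float = -3.0
members: List[Point] = [(2.0, 2.0), (3.0, 3.0), (4.0, 2.0)]
nonmembers: List[Point] = [(1.0, 1.0), (0.0, 0.0), (0.0, 1.0)]


def signed_distance(x: Point) -> float:
    """Signed distance from x to the border; positive on the member side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def closest_to_border(points: List[Point]) -> Point:
    """The point of a class nearest to the border."""
    return min(points, key=lambda p: abs(signed_distance(p)))


support_member = closest_to_border(members)
support_nonmember = closest_to_border(nonmembers)

# Margin: distance between the two classes measured across the border.
margin = signed_distance(support_member) - signed_distance(support_nonmember)

print(support_member, support_nonmember, margin)
```

Only the points nearest the border determine where it lies. Moving the other points, as long as they stay farther away, would not change it.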

SVM Method
Border line is nonlinear

SVM Method
Non-linear transformation: use of kernel function
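
As a hedged illustration of what a kernel function does, the sketch below uses the degree-2 polynomial kernel k(x, y) = (x · y)². This kernel is chosen here as an example; the slide does not say which kernel is used. The kernel returns the same value as an inner product in a higher-dimensional space reached by the explicit map φ(x) = (x1², √2·x1·x2, x2²), without ever computing φ.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))


def phi(x: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit non-linear map from 2 dimensions to 3 dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x: Tuple[float, float], y: Tuple[float, float]) -> float:
    """Degree-2 polynomial kernel, evaluated in the input space."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, -1.0)

explicit = dot(phi(x), phi(y))   # inner product after the transformation
implicit = poly_kernel(x, y)     # same quantity via the kernel

print(explicit, implicit)
```

The classifier only needs inner products between feature vectors, so replacing them with kernel values gives a nonlinear border at the cost of a linear method.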

SVM for Classification of Proteins
How to represent a protein?
• Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties:
  – amino acid composition
  – hydrophobicity
  – normalized van der Waals volume
  – polarity
  – polarizability
  – charge
  – surface tension
  – secondary structure
  – solvent accessibility
• Three descriptors, composition (C), transition (T), and distribution (D), are used to describe the global composition of each of these properties.
Nucleic Acids Res., 31: 3692-3697

SVM for Classification of Proteins
How to represent a protein?
From protein sequence to feature vector:
(C_amino acid composition, T_amino acid composition, D_amino acid composition, C_hydrophobicity, T_hydrophobicity, D_hydrophobicity, …)
Nucleic Acids Res., 31: 3692-3697
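
The following is a minimal sketch of how the C, T, and D descriptors might be computed for a single property, hydrophobicity. Several things are assumptions and not taken from the slides: the three-class grouping of residues (polar, neutral, hydrophobic), the rounding rule used for the distribution positions, and the example sequence. Each residue is first replaced by its class label, then C gives the fraction of each class, T the frequency of switches between each pair of classes, and D the relative positions of the first, 25%, 50%, 75%, and last residue of each class.

```python
from typing import Dict, List

# Assumed grouping of the 20 amino acids by hydrophobicity.
HYDROPHOBICITY_CLASSES: Dict[str, str] = {
    "1": "RKEDQN",    # polar
    "2": "GASTPHY",   # neutral
    "3": "CLVIMFW",   # hydrophobic
}


def encode(sequence: str, classes: Dict[str, str]) -> str:
    """Replace every residue by the label of its property class."""
    lookup = {aa: label for label, group in classes.items() for aa in group}
    return "".join(lookup[aa] for aa in sequence)


def composition(encoded: str) -> List[float]:
    """C: fraction of residues in each class."""
    return [encoded.count(c) / len(encoded) for c in "123"]


def transition(encoded: str) -> List[float]:
    """T: frequency of adjacent pairs that switch between two classes."""
    pairs = list(zip(encoded, encoded[1:]))
    return [
        sum(1 for p, q in pairs if {p, q} == {a, b}) / len(pairs)
        for a, b in (("1", "2"), ("1", "3"), ("2", "3"))
    ]


def distribution(encoded: str) -> List[float]:
    """D: relative position of the first, 25%, 50%, 75%, and last residue of each class."""
    result: List[float] = []
    for c in "123":
        positions = [i + 1 for i, ch in enumerate(encoded) if ch == c]
        if not positions:
            result.extend([0.0] * 5)
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, int(frac * len(positions)))  # assumed rounding rule
            result.append(positions[k - 1] / len(encoded))
    return result


sequence = "MKRLAGE"  # made-up example sequence
encoded = encode(sequence, HYDROPHOBICITY_CLASSES)
feature_vector = composition(encoded) + transition(encoded) + distribution(encoded)

print(encoded)
print(len(feature_vector))
```

Under these assumptions one property contributes 3 + 3 + 15 = 21 numbers. Repeating the same steps for the other properties and concatenating the results would give the full feature vector for the sequence.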

Protein function prediction software SVMProt
Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions.
Which functional families does your protein belong to?
[Figure: your protein sequence is sent to a computer loaded with SVMProt, with two input options: through the internet (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi) or on a local machine. The sequence is sent to the support vector machines classifier for every protein functional family, which returns the identified functional families and protein functional indications.]
Nucl. Acids Res. 31, 3692-3697 (2003)

Protein function prediction software SVMProt
Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions.
Protein families covered:
46 enzyme families, 3 receptor families, 4 transporter and channel families, 6 DNA- and RNA-binding families, 8 structural families, 2 regulator/factor families.
SVMProt web-version at: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi
Nucl. Acids Res. 31, 3692-3697 (2003)

Protein function prediction software SVMProt
[Figure: SVMProt web page, annotated "Check covered protein families here", "Input sequence here", and "Check format here"]
Nucl. Acids Res. 31, 3692-3697 (2003)

Protein function prediction software SVMProt
[Figure: SVMProt results page, annotated "Probability of correct prediction" and "Prediction score"]
Nucl. Acids Res. 31, 3692-3697 (2003)

Summary of Today’s Lecture
• Machine learning method for protein function prediction.
• Use of SVMProt for probing protein function.