By Eng. Monther Alhamdoosh Supervisor: Prof. Rita Casadio Co-supervisor: Dr. Piero Fariselli...

By Eng. Monther Alhamdoosh Supervisor : Prof. Rita Casadio Co-supervisor: Dr. Piero Fariselli Disulfide Connectivity Prediction Using Machine Learning Approaches LAUREA MAGISTRALE IN BIOINFORMATICS INTERNATIONAL BOLOGNA MASTER IN BIOINFORMATICS ALMA MATER STUDIORUM ▪ UNIVERSITÀ DI BOLOGNA Session II 2009/2010


Page 1

By Eng. Monther Alhamdoosh

Supervisor: Prof. Rita Casadio
Co-supervisor: Dr. Piero Fariselli

Disulfide Connectivity Prediction Using Machine Learning Approaches

LAUREA MAGISTRALE IN BIOINFORMATICS
INTERNATIONAL BOLOGNA MASTER IN BIOINFORMATICS

ALMA MATER STUDIORUM ▪ UNIVERSITÀ DI BOLOGNA

Session II 2009/2010

Page 2

In Literature

September 10th, 2010 | M.Sc. Thesis in Bioinformatics | Eng. Monther Alhamdoosh

Accuracy indices

• Qp: the percentage of connectivity patterns that are correctly predicted.
• Qc: the percentage of disulfide bridges that are correctly predicted.
• δ(x, y) = 1 when the predicted pattern y matches the correct pattern x, and 0 otherwise.
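The two accuracy indices can be illustrated with a minimal sketch. It assumes that Qp is the pattern-level index and Qc the bridge-level index (the names used in the result tables), and that a pattern is represented as a set of cysteine pairs. The function names and the toy proteins are hypothetical and not taken from the thesis.

```python
def delta(correct, predicted):
    """Return 1 if the predicted pattern equals the correct pattern, else 0."""
    return 1 if set(correct) == set(predicted) else 0


def bonds(*pairs):
    """Build a connectivity pattern: an unordered set of cysteine pairs."""
    return {frozenset(pair) for pair in pairs}


def accuracy_indices(proteins):
    """Compute (Qp, Qc) in percent.

    proteins is a list of (correct_pattern, predicted_pattern) tuples.
    Qp counts whole patterns that match; Qc counts individual bridges.
    """
    correct_patterns = sum(delta(c, p) for c, p in proteins)
    total_bonds = sum(len(c) for c, _ in proteins)
    correct_bonds = sum(len(set(c) & set(p)) for c, p in proteins)
    qp = 100.0 * correct_patterns / len(proteins)
    qc = 100.0 * correct_bonds / total_bonds
    return qp, qc


# Toy data (made up): one protein with B = 2 predicted perfectly,
# one protein with B = 3 where only one bridge is right.
toy = [
    (bonds((1, 2), (3, 4)), bonds((1, 2), (3, 4))),
    (bonds((1, 4), (2, 5), (3, 6)), bonds((1, 4), (2, 6), (3, 5))),
]
qp, qc = accuracy_indices(toy)
print(qp, qc)  # 50.0 60.0
```

For B = 2 any wrong pattern shares no bridge with the correct one, which is why the two indices coincide in the B = 2 columns of the tables.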

• Introduction
    The Amino Acid Cysteine
    Importance of SS Bonds
    Machine Learning
• Statement of the Problem
    Aim of Research
    In Literature
• Our Proposed Solutions
• Results
• Comparisons with previous methods
• Conclusions

Page 3

Our Proposed Solutions


Basic System Design

[Figure: system diagram with four numbered steps (1 to 4); labels include "Machine Learning" and "Pattern Scoring Schemes"]

Page 4

Our Proposed Solutions


Step 3: Estimate the disulfide propensity

Neural Networks-based Models
• Single-Layer Feed-forward Network (SLFN).
• Extreme Learning Machines (ELMs).
    - Pseudo-inverse matrix to get output weights.
    - Additive (Sigmoid) Hidden Neurons.
    - RBF (Gaussian) Hidden Neurons.
• Back-propagation (BP).
    - Gradient Descent to get all weights.
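The difference between the two training schemes can be made concrete with a small ELM sketch for an SLFN with additive (sigmoid) hidden neurons: the hidden weights are drawn at random and left untouched, and only the output weights are solved for in one linear step. This is an illustration under stated assumptions, not the thesis implementation. To stay dependency-free it solves the regularized normal equations, which coincide with the pseudo-inverse solution when the hidden-layer matrix H has full column rank and the ridge term goes to zero; a linear algebra library would normally supply the pseudo-inverse directly. The names `train_elm`, `predict_elm`, and the `ridge` parameter are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(x, weights, biases):
    """Outputs of the additive (sigmoid) hidden neurons for one input vector."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_elm(inputs, targets, n_hidden, seed=0, ridge=1e-6):
    """Fit an ELM with one output; returns (weights, biases, beta)."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    # Hidden weights and biases are random and never trained.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    h = [hidden_layer(x, weights, biases) for x in inputs]
    # Output weights: beta = (H^T H + ridge * I)^-1 H^T t.
    # The small ridge term is an assumption added for numerical stability.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [sum(row[i] * t for row, t in zip(h, targets))
           for i in range(n_hidden)]
    beta = solve(hth, htt)
    return weights, biases, beta


def predict_elm(model, x):
    """Predicted score for one input vector."""
    weights, biases, beta = model
    return sum(bk * hk for bk, hk in zip(beta, hidden_layer(x, weights, biases)))
```

A call such as `train_elm(inputs, targets, n_hidden=8)` needs no iterative weight updates, which is consistent with the much shorter training times reported for ELM than for BP in the results.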

Support Vector Machines (SVM)
• Support Vector Regression (SVR).
• Radial Basis Function (RBF) Kernels.
• Grid Search is used to find the best values for g and c.
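The grid search over g and c can be sketched as an exhaustive loop over candidate pairs. The slides do not define the two symbols or give the grid; this sketch assumes g is the RBF kernel width parameter and c the regularization cost, and uses exponentially spaced candidates, all of which are assumptions. The scoring function `toy_error` is a made-up stand-in for a cross-validated SVR error.

```python
import itertools
import math


def grid_search(score, c_values, g_values):
    """Return (best_score, best_c, best_g) for the pair minimizing score(c, g)."""
    best = None
    for c, g in itertools.product(c_values, g_values):
        s = score(c, g)
        if best is None or s < best[0]:
            best = (s, c, g)
    return best


# Exponentially spaced candidates (assumed; not stated in the slides).
c_values = [2.0 ** k for k in range(-5, 16, 2)]
g_values = [2.0 ** k for k in range(-15, 4, 2)]


def toy_error(c, g):
    """Stand-in for a cross-validated error; smallest at c = 8, g = 0.125."""
    return (math.log2(c) - 3) ** 2 + (math.log2(g) + 3) ** 2


best_error, best_c, best_g = grid_search(toy_error, c_values, g_values)
print(best_c, best_g)  # 8.0 0.125
```

In a real run, `toy_error` would be replaced by a function that trains an SVR with the given pair on the training folds and returns the validation error.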


Page 5

SLFN


ELM (Additive vs. RBF hidden neurons): Training Time curves

[Figure: two panels, "Additive Hidden Neurons" and "RBF Hidden Neurons"; x-axis: Number of Neurons]

Page 6

ELM outperforms BP


The accuracy values of ELM and BP

Performance Enhancement


Comparison of different ELM and BP models.

Model       B = 2     B = 3     B = 4     B = 5     Overall    Best # of   Time (s)
            Qc  Qp    Qc  Qp    Qc  Qp    Qc  Qp    Qc  Qp     neurons
ELM (Sig)   65  65    42  28    42  24    27   4    46  41     150         28.52
ELM (RBF)   66  66    45  32    45  26    31   5    48  43     90          18.55
BP (Sig)    62  62    38  26    40  23    29   5    44  38     95          559.29

Our method's performance with L1 RBF kernels initialized using k-means clustering. The best-performing number of hidden neurons is 270 and the corresponding training time is 425.11 seconds.

Connectivity size   2    3    4    5    Overall
Qc                  67   48   44   37   51
Qp                  67   36   27    6   45

Page 7

SVR vs. NN


Comparison of SVR and NN-based methods. Both were tested on PDB0909 with Set A of descriptors.


Method      B = 2     B = 3     B = 4     B = 5     Overall
            Qc  Qp    Qc  Qp    Qc  Qp    Qc  Qp    Qc  Qp
SVR (BSP)   69  69    59  46    44  23    45  23    56  50
NN (ELM)    67  67    48  36    44  27    37   6    51  45