Example of regression by RBF-ANN: Prediction of charge on peptides after electrospray ionization in...

Post on 18-Jan-2016


Example of regression by RBF-ANN

Prediction of charge on peptides after electrospray ionization in mass spectrometry

What are the best attributes to predict charge?

Review of molecular biology

DNA sequence determines protein sequence

Amino acids with different side chains have different names

name           3-letter  1-letter
glycine        gly       G
alanine        ala       A
valine         val       V
leucine        leu       L
isoleucine     ile       I
methionine     met       M
proline        pro       P
phenylalanine  phe       F
tryptophan     trp       W
serine         ser       S
cysteine       cys       C
threonine      thr       T
glutamine      gln       Q
asparagine     asn       N
histidine      his       H
tyrosine       tyr       Y
glutamic acid  glu       E
aspartic acid  asp       D
lysine         lys       K
arginine       arg       R

What are amino acids?

[Figure: structure of an amino acid, labeled C-terminus, N-terminus, and side chain]

Chemical properties of amino acids

code  mass       pI     pK1   pK2    charge  hydrophobic?  polar?
A     89.09404   6.01   2.35  9.87   0       T             F
R     174.20274  10.76  1.82  8.99   +       F             F
N     132.1190   5.41   2.14  8.72   0       F             T
D     133.10384  2.85   1.99  9.9    -       F             F
C     121.15404  5.05   1.92  10.7   0       F             T
E     147.1307   3.15   2.1   9.47   -       F             F
Q     146.14594  5.65   2.17  9.13   0       F             T
G     75.06714   6.06   2.35  9.78   0       T             F
H     155.15634  7.6    1.8   9.33   +       F             T
I     131.17464  6.05   2.32  9.76   0       T             F
L     131.17464  6.01   2.33  9.74   0       T             F
K     146.18934  9.6    2.16  9.06   +       F             F
M     149.20784  5.74   2.13  9.28   0       T             F
F     165.1918   5.49   2.2   9.31   0       T             F
P     115.13194  6.3    1.95  10.64  0       T             F
S     105.09344  5.68   2.19  9.21   0       F             T
T     119.12034  5.6    2.09  9.1    0       F             T
W     204.22844  5.89   2.46  9.41   0       T             T
Y     181.19124  5.64   2.2   9.21   0       F             T
V     117.14784  6.0    2.39  9.74   0       T             F

More properties of amino acids

Amino Acids Polymerize to Form Proteins (polypeptides)

[Figure: formation of a peptide bond along the backbone -N-C-C-N-C-C-N-]

Proteases: enzymes that cut proteins at the peptide bond

[Figure: peptide backbone -N-C-C-N-C-C-N-]

Most proteases have cleavage specificity.

Trypsin cleaves mainly on the C-terminal side of arginine (R) and lysine (K)

Digestion of a protein with trypsin produces peptides of various lengths
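The cleavage rule can be sketched in a few lines of Python. This is a simplified model, assuming trypsin cuts after every K or R and ignoring missed cleavages and sequence-context exceptions. The function name `tryptic_digest` and the input sequence are hypothetical; the sequence is just two peptides from the data table joined with a short tail.

```python
def tryptic_digest(protein: str) -> list[str]:
    """Split a sequence after every K or R (simplified trypsin rule)."""
    peptides = []
    current = []
    for residue in protein:
        current.append(residue)
        if residue in "KR":
            # Cleavage happens on the C-terminal side of K or R.
            peptides.append("".join(current))
            current = []
    if current:
        # Whatever remains is the C-terminal peptide of the protein.
        peptides.append("".join(current))
    return peptides


# Hypothetical input, built only for illustration.
protein = "AAAAADLANRAAAAAQASASAAAKGG"
print(tryptic_digest(protein))
# → ['AAAAADLANR', 'AAAAAQASASAAAK', 'GG']
```

Under this rule every peptide except the last one ends in K or R, which matches the example sequences in the dataset.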

Analysis of digestion mixture yields information about proteins in sample

Peptides are retained for differing times on the LC column

Electrospray ionization

Mass spectrometer

Digested protein mixture

Peptides may have multiple charges. Charges in dataset are averages from several runs

Liquid chromatography coupled to mass spectrometry

First 4 of ~23,000 data pairs:

Sequence                          Charge
AAAAAAPDDVAAQLVVADLDLVGGHVEDAFAR  2.8
AAAAADLANR                        2
AAAAAQASASAAAK                    1.714286
AAAAAVAQGGPIEDAER                 2
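A non-integer target such as 1.714286 is what averaging integer charge states over several runs would produce. The sketch below is only an illustration: the seven observations are hypothetical and were chosen so that their mean is 12/7. The transcript does not say how many runs were averaged.

```python
from statistics import mean

# Hypothetical charge states observed for one peptide in seven runs.
observed_charges = [2, 2, 2, 2, 2, 1, 1]

average_charge = mean(observed_charges)
print(round(average_charge, 6))
# → 1.714286
```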

Can peptide sequence be an input?

What inputs can we calculate from the input sequence?

Some suggestions for inputs from properties of amino acids

Length of peptide
Mass of peptide
First amino acid
Last amino acid
Fractions of amino acids of each type
Fractions of hydrophobic, polar, and charged residues
Net formal charge
Average isoelectric point
Average dissociation constant
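Several of these inputs can be computed directly from the sequence. The sketch below is one possible encoding, not necessarily the one used for the slides. The per-residue values are copied from the property table earlier in this transcript; the function name `peptide_features` and the dictionary keys are hypothetical. Mass is left out for brevity (a peptide's mass is the sum of its amino acid masses minus one water per peptide bond), and the first and last residues would still need a numeric encoding, such as one-hot, before going into a network.

```python
# Isoelectric points, formal charges, and hydrophobic/polar flags
# transcribed from the amino acid property table.
PI = {
    "A": 6.01, "R": 10.76, "N": 5.41, "D": 2.85, "C": 5.05,
    "E": 3.15, "Q": 5.65, "G": 6.06, "H": 7.6, "I": 6.05,
    "L": 6.01, "K": 9.6, "M": 5.74, "F": 5.49, "P": 6.3,
    "S": 5.68, "T": 5.6, "W": 5.89, "Y": 5.64, "V": 6.0,
}
CHARGE = {"R": 1, "H": 1, "K": 1, "D": -1, "E": -1}
HYDROPHOBIC = set("AGILMFPWV")
POLAR = set("NCQHSTWY")


def peptide_features(seq: str) -> dict:
    """Compute simple sequence-derived inputs for one peptide."""
    n = len(seq)
    return {
        "length": n,
        "first": seq[0],
        "last": seq[-1],
        "frac_hydrophobic": sum(r in HYDROPHOBIC for r in seq) / n,
        "frac_polar": sum(r in POLAR for r in seq) / n,
        "frac_charged": sum(r in CHARGE for r in seq) / n,
        "net_formal_charge": sum(CHARGE.get(r, 0) for r in seq),
        "mean_pI": sum(PI[r] for r in seq) / n,
    }


features = peptide_features("AAAAADLANR")
for name, value in features.items():
    print(name, value)
```

For AAAAADLANR the one aspartic acid and one arginine cancel, so the net formal charge is 0 even though the dataset lists a measured charge of 2. That gap suggests net formal charge alone is unlikely to be a sufficient input.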

MLP with default options. 600 examples reserved for test set. Poor results.

Other regression options
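One such option, given the title, is a radial basis function network. The sketch below is a minimal NumPy version under stated assumptions: Gaussian basis functions centered on randomly chosen training points, a single shared width, and output weights fitted by linear least squares. The function names, the number of centers, the width, and the data are all hypothetical. The data are a synthetic sine curve standing in for the peptide features, which are not reproduced here, so the printed error says nothing about the charge-prediction task.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Gaussian activations for every (sample, center) pair, plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])


def fit_rbf(X, y, n_centers=20, width=1.0, seed=0):
    """Pick centers at random from the training set, then solve for output weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    weights, *_ = np.linalg.lstsq(rbf_design(X, centers, width), y, rcond=None)
    return centers, weights


def predict_rbf(X, centers, weights, width=1.0):
    return rbf_design(X, centers, width) @ weights


# Synthetic stand-in data: one input, smooth target.
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = np.sin(X[:, 0])

# Reserve the last 60 examples as a test set, in the spirit of the
# 600-example hold-out mentioned for the MLP.
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

centers, weights = fit_rbf(X_train, y_train)
rmse = np.sqrt(np.mean((predict_rbf(X_test, centers, weights) - y_test) ** 2))
print("test RMSE:", rmse)
```

Because only the output layer is trained and it is linear in the weights, fitting reduces to one least-squares solve. The number of centers and the width are the main settings to tune.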