Hidden unit weights in network model correlations.
Posted: 21-Dec-2015
![Page 1: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/1.jpg)
Hidden unit weights in network model correlations
![Page 2: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/2.jpg)
Compartments in the eukaryotic cell
![Page 3: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/3.jpg)
Protein targeting/localization signals
Cleaved:
• Signal peptide
• Mitochondrial targeting peptide
• Chloroplast targeting peptide
• LPxTG sorting signal
• Peroxisomal targeting signal (PTS2)
Uncleaved:
• Signal anchor
• Nuclear localization signal
• ER/Golgi retention signal
• Peroxisomal targeting signal (PTS1)
• Transmembrane helices
![Page 4: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/4.jpg)
Classical secretory pathway
![Page 5: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/5.jpg)
The secretory signal peptide
![Page 6: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/6.jpg)
Targeting to the ER
![Page 7: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/7.jpg)
Eukaryotic signal peptide logo
![Page 8: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/8.jpg)
Characteristics of signal peptides
| | Length (aa) | n-region | h-region | c-region | Positions -3, -1 |
|---|---|---|---|---|---|
| Euk | 22 | Only slightly Arg-rich | Short, very hydrophobic | Short, no pattern | Small and neutral residues |
| Gram- | 25 | Lys+Arg-rich | Slightly longer, less hydrophobic | Short, Ser+Ala-rich | Almost exclusively Ala |
| Gram+ | 32 | Lys+Arg-rich | Very long, less hydrophobic | Longer, Thr+Pro-rich | Almost exclusively Ala |
![Page 9: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/9.jpg)
Prokaryotic signal peptide logos
Gram-positive bacteria
Gram-negative bacteria
![Page 10: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/10.jpg)
Positive and negative training data: secreted versus cytoplasmic and nuclear sequences

130 YGIW_ECOLI
MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVESAKSLRDDTWVTLRGNIVERISDDLYVFKD 80
ASGTINVDIDHKRWNGVTVTPKDTVEIQGEVDKDWNSVEIDVKQIRKVNP 160
SSSSSSSSSSSSSSSSSSSSCMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------- 160

184 PMFA_PROMI
MKLSKIALAAALVFGINSVATAENETPAPKVSSTKGEIQLKGEIVNSACGLAASSSPVIVDFSEIPTSALANLQKAGNIK 80
KDIELQDCDTTVAKTATVSYTPSVVNAVNKDLASFVSGNASGAGIGLMDAGSKAVKWNTATTPVQLINGVSKIPFVAYVQ 160
AESADAKVTPGEFQAVINFQVDYQ 240
SSSSSSSSSSSSSSSSSSSSSSCMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------------------------------------- 160
------------------------

324 CYSB_KLEAE
MKLQQLRYIVEVVNHNLNVSSTAEGLYTSQPGISKQVRMLEDELGIQIFARSGKHLTQVTPAGQEIIRIAREVLSKVDAI 80
KSVAGEHTWPDKGSLYVATTHTQARYALPGVIKGFIERYPRVSLHMHQGSPTQIAEAVSKGNADFAIATEALHLYDDLVM 160
LPCYHWNRSIVVTPEHPLATKASVSIEELAQYPLVTYTFGFTGRSELDTAFNRAGLTPRIVFTATDADVIKTYVRLGLGV 240
GVIASMAVDPVSDPDLVKLDANGIFSHSTTKIGFRRSTFLRSYMYDFIQRFAPHLTRDVVDTAVALRSNEDIEAMFKDIK 320
LPEK 400
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------------------------------------- 160
-------------------------------------------------------------------------------- 240
-------------------------------------------------------------------------------- 320
---- 400

157 SBMC_ECOLI
MNYEIKQEEKRTVAGFHLVGPWEQTVKKGFEQLMMWVDSKNIVPKEWVAVYYDNPDETPAEKLRCDTVVTVPGYFTLPEN 80
SEGVILTEITGGQYAVAVARVVGDDFAKPWYQFFNSLLQDSAYEMLPKPCFEVYLNNGAEDGYWDIEMYVAVQPKHH 160
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM---------------------------------------------------------- 160
![Page 11: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/11.jpg)
Data partitioning for training and test
Remove highly similar sequences from the data set, i.e. sequences for which cleavage site information can reliably be transferred by alignment.
A redundancy-reduced data set can then be used for, say, five-fold cross-validation.
Ideally, the training set should contain equal numbers of positive and negative examples.
[Figure: the data set divided into training and test partitions]
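A minimal Python sketch of the five-fold split described above, assuming redundancy reduction has already been done and that each example is a hypothetical `(name, label)` pair. Positives and negatives are dealt round-robin into the folds so each fold keeps roughly the same class balance; the function names are illustrative, not taken from SignalP.

```python
import random

def make_folds(examples, k=5, seed=0):
    """Split (name, label) pairs into k folds, stratified by label.

    Assumes redundancy reduction is already done, so no two examples
    in different folds are highly similar.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Group by label so each class is spread evenly over the folds.
    by_label = {}
    for ex in examples:
        by_label.setdefault(ex[1], []).append(ex)
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        for i, ex in enumerate(group):
            folds[i % k].append(ex)
    return folds

def train_test_splits(folds):
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

data = [("pos%d" % i, 1) for i in range(50)] + [("neg%d" % i, 0) for i in range(50)]
folds = make_folds(data)
for train, test in train_test_splits(folds):
    print(len(train), len(test))  # 80 20 for each of the five splits
```

With 50 positives and 50 negatives, every fold ends up with 10 of each, which matches the "equal amounts" recommendation.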
![Page 12: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/12.jpg)
Sliding window
Sequence: MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVES ...
Window size here is 9 (example).
Window 1: MAKFAAVIA
Window 2: AKFAAVIAV
Window 3: KFAAVIAVM
Window 4: FAAVIAVMA
...
Window 10: VMALCSAPV
...
For signal peptide prediction, typically the first 70 amino acids of the positive and negative sequences are used.
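The windowing above can be sketched in a few lines of Python. This is a simplified illustration: it only takes windows that fit entirely inside the first 70 residues and does not pad the sequence ends, which a real predictor would typically do so that every residue can be scored.

```python
def sliding_windows(seq, size=9, max_len=70):
    """Return every window of `size` residues within the first `max_len` residues."""
    region = seq[:max_len]
    return [region[i:i + size] for i in range(len(region) - size + 1)]

seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVES"
windows = sliding_windows(seq)
print(windows[0])  # MAKFAAVIA  (window 1)
print(windows[9])  # VMALCSAPV  (window 10)
```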
![Page 13: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/13.jpg)
Graphical output from SignalP
![Page 14: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/14.jpg)
Alternative start codon “prediction”
![Page 15: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/15.jpg)
Symmetric and asymmetric neural network window sizes
SignalP uses two different networks for signal peptide prediction:
• Cleavage site prediction network (C-score)
• Signal peptide vs. non-signal peptide discrimination network (S-score)
An asymmetric window is used for cleavage site prediction, because more information is found upstream of the cleavage site (see the logo).
A symmetric window is used for discrimination between signal peptide windows and mature protein windows
![Page 16: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/16.jpg)
Neural network windows in SignalP
[Figure: an asymmetric window and a symmetric window placed on the sequence MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN, with the cleavage site marked]
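A minimal Python sketch of the two window types, using the eukaryotic sizes listed for SignalP 3.0 (19+4 for the cleavage site network, 27 for the discrimination network). Reading "19+4" as 19 residues upstream and 4 downstream of the cleavage site is an assumption, as are the padding symbol `-` and the cleavage site chosen in the example, which is purely illustrative.

```python
def window(seq, center, upstream, downstream, pad="-"):
    """Residues seq[center-upstream : center+downstream], padded outside the sequence.

    `center` is a 0-based index; positions before 0 or past the end become `pad`.
    """
    return "".join(
        seq[i] if 0 <= i < len(seq) else pad
        for i in range(center - upstream, center + downstream)
    )

def asymmetric_window(seq, site, upstream=19, downstream=4):
    # Cleavage between seq[site-1] and seq[site]: more context upstream.
    return window(seq, site, upstream, downstream)

def symmetric_window(seq, pos, size=27):
    # Equal context on both sides of the scored residue seq[pos].
    half = size // 2
    return window(seq, pos, half, half + 1)

seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN"
site = 21  # illustrative cleavage site, not taken from an annotation
print(asymmetric_window(seq, site))  # 19 residues before the site + 4 after
print(symmetric_window(seq, 5))      # left end padded with '-'
```

The asymmetric window spends most of its inputs on the region upstream of the site, where the logo shows most of the signal; the symmetric window treats both sides alike, which suits the signal peptide vs. mature protein decision.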
![Page 17: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/17.jpg)
Performance calculation
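$$\text{Sensitivity} = \frac{tp}{tp + fn}$$
$$\text{Specificity} = \frac{tp}{tp + fp}$$
$$cc = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fn)(tp + fp)(tn + fp)(tn + fn)}}$$
tp: true positive, tn: true negative, fp: false positive, fn: false negative
Note that tp/(tp + fp), called specificity here, is elsewhere usually called precision or positive predictive value; specificity more commonly denotes tn/(tn + fp).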
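The three measures translate directly into Python. The sketch keeps the slide's definition of specificity as tp/(tp + fp); returning 0 for the correlation coefficient when a marginal sum is zero is a common convention, assumed here rather than taken from the slides.

```python
import math

def performance(tp, tn, fp, fn):
    """Sensitivity, 'specificity' (slide definition: tp/(tp+fp)) and Matthews cc."""
    sensitivity = tp / (tp + fn)
    specificity = tp / (tp + fp)  # elsewhere often called precision / PPV
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, cc

sens, spec, cc = performance(tp=80, tn=90, fp=10, fn=20)
print(round(sens, 3), round(spec, 3), round(cc, 3))  # 0.8 0.889 0.704
```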
![Page 18: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/18.jpg)
Optimization of window sizes
Optimization of window sizes for SignalP version 3.0
![Page 19: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/19.jpg)
NN window sizes for SignalP 3.0
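| | Cleavage site network: window | Cleavage site network: hidden units | Discrimination network: window | Discrimination network: hidden units |
|---|---|---|---|---|
| Euk | 19+4 | 2 | 27 | 4 |
| Gram- | 11+3 | 2 | 19 | 3 |
| Gram+ | 21+2 | 0 | 19 | 3 |

Window sizes used in the final method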
An asymmetric window works best for cleavage site prediction, whereas a symmetric window works best for discrimination.
![Page 20: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/20.jpg)
SignalP 3.0 architecture
[Figure: SignalP 3.0 network with an input layer (I1, I2, I3, ...), a hidden layer (H1, H2, H3, ...) and an output layer (O1, O2), connected by weights. Inputs: sequence data in the window, sequence composition and window position.]
In addition to the sequence window, the amino acid composition of the entire sequence and the position of the sliding window were used as inputs to the neural networks of SignalP 3.0.
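A minimal Python sketch of how such an input vector could be assembled and passed through one hidden layer. Several details are assumptions, not SignalP's actual implementation: 21 one-hot inputs per window position (20 amino acids plus an "outside the sequence" symbol), 20 composition fractions, a single position input with a made-up value, sigmoid units, 4 hidden units and random, untrained weights.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(window):
    """21 inputs per position: 20 amino acids + 1 for 'outside the sequence' ('-')."""
    vec = []
    for aa in window:
        bits = [0.0] * 21
        bits[AMINO_ACIDS.index(aa) if aa in AMINO_ACIDS else 20] = 1.0
        vec.extend(bits)
    return vec

def composition(seq):
    """Fraction of each amino acid in the entire sequence (20 inputs)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, w_hidden, w_out):
    """One hidden layer; each weight row ends with a bias term."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + row[-1]) for row in w_out]

seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN"
window = seq[0:27]
position_value = 0.5  # hypothetical value of the position input for this window
x = one_hot_window(window) + composition(seq) + [position_value]

rng = random.Random(1)
n_hidden, n_out = 4, 2  # two outputs (O1, O2), as in the figure
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(len(x) + 1)] for _ in range(n_hidden)]
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
print(len(x), forward(x, w_hidden, w_out))  # 588 inputs; outputs are untrained
```

The composition and position inputs are the same for every residue in a window, so they act as global context that the sequence-window inputs alone cannot provide.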
![Page 21: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/21.jpg)
Implementation of position neuron
Fortran code:
      RLAV = 24
      IF (LET .LT. RLAV) THEN
         X = REAL(LET)/REAL(RLAV)
      ELSEIF (REAL(LET) .GT. 2.0*RLAV) THEN
         X = 0.0
      ELSE
         X = 1.0 - ((REAL(LET)-RLAV)/REAL(RLAV))
      ENDIF

Example sequence: MKLLQRGVALALLTTFTLASETALAYEQDKTYKITVLHTNDHHGHFWRNEYGEYGLAAQK

[Figure: input to the NN (0 to 1) plotted against position in sequence (0 to 48); the input rises to 1 at position 24 and falls back to 0 at position 48]
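For readers who do not use Fortran, here is a direct Python port of the position neuron logic. That `LET` holds the position of the window in the sequence is an assumption based on the plot; the function name `position_input` is illustrative.

```python
def position_input(let, rlav=24):
    """Python port of the Fortran position neuron: a triangle peaking at rlav.

    Rises linearly from 0 (position 0) to 1 (position rlav), falls back to 0
    at 2*rlav, and stays 0 beyond that.
    """
    if let < rlav:
        return let / rlav
    if let > 2 * rlav:
        return 0.0
    return 1.0 - (let - rlav) / rlav

print([position_input(p) for p in (0, 12, 24, 36, 48, 60)])
# [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```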
![Page 22: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/22.jpg)
Composition of secretory vs. non-secretory proteins
![Page 23: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/23.jpg)
Composition weights
![Page 24: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/24.jpg)
What is new in SignalP version 3.0!
• Data set
  – From SWISS-PROT rel. 40.0
  – Highly curated
  – Cleaned for spurious residues at pos. -1
• Length and composition
  – Improve the performance significantly
  – Length improves both discrimination and cleavage site performance
  – Composition improves discrimination
• D-score
  – Average of the mean S-score and the maximal Y-score
  – Better discrimination
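A minimal Python sketch of the D-score as the average of mean S and max Y. It assumes that mean S is taken over the positions before the maximal Y-score (the putative signal peptide) and that per-position S- and Y-scores are already available; the example scores are made up, and no decision cutoff is implied.

```python
def d_score(s_scores, y_scores):
    """Return (D, mean S, max Y, index of max Y) for one sequence.

    Assumes mean S is averaged from the first position up to, but not
    including, the position with the maximal Y-score.
    """
    y_pos = max(range(len(y_scores)), key=lambda i: y_scores[i])
    y_max = y_scores[y_pos]
    region = s_scores[:y_pos] or s_scores[:1]  # guard when max Y is at position 0
    s_mean = sum(region) / len(region)
    return (s_mean + y_max) / 2, s_mean, y_max, y_pos

# Hypothetical per-position network outputs for a short sequence.
s = [0.9, 0.9, 0.8, 0.8, 0.2, 0.1]
y = [0.0, 0.1, 0.2, 0.3, 0.7, 0.1]
d, s_mean, y_max, pos = d_score(s, y)
print(round(d, 3), pos)
```

Combining the two scores means a sequence must both look like a signal peptide along its N-terminus (mean S) and have a convincing cleavage site (max Y) to get a high D.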
![Page 25: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/25.jpg)
Database annotation errors
• Some of the manually curated databases contain obvious errors that can be eliminated
• General "SIGNAL" annotation errors
  – Signal peptide includes a propeptide
  – Wrong signal peptide cleavage site
  – The secreted protein is further processed by proteases
  – Wrong start codon used
  – Signal peptide of a different class, i.e. TAT or bacteriocin (prokaryotes)
![Page 26: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/26.jpg)
Signal peptide or propeptide
[Figure: N-terminus | Signal peptide | Propeptide | Mature protein]
![Page 27: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/27.jpg)
Signal peptide or propeptide
[Figure: positions of the signal peptide cleavage site and the propeptide cleavage site]
![Page 28: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/28.jpg)
Isoelectric point calculations
![Page 29: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/29.jpg)
Improvement by length and composition
![Page 30: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/30.jpg)
Performance of three different SignalP versions
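| Version | Cleavage site (Y-score), Euk | Cleavage site, Gram- | Cleavage site, Gram+ | Discrimination (SP/non-SP), Euk | Discrimination, Gram- | Discrimination, Gram+ |
|---|---|---|---|---|---|---|
| SignalP1 NN | 70.2 | 79.3 | 67.9 | 0.97 | 0.88 | 0.96 |
| SignalP2 NN | 72.4 | 83.4 | 67.4 | 0.97 | 0.90 | 0.96 |
| SignalP2 HMM | 69.5 | 81.4 | 64.5 | 0.94 | 0.93 | 0.96 |
| SignalP3 NN | 79.0 | 92.5 | 85.0 | 0.98 | 0.95 | 0.98 |
| SignalP3 HMM | 75.7 | 90.2 | 81.6 | 0.94 | 0.94 | 0.98 |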
The SignalP paper now has more than 2500 citations.
![Page 31: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/31.jpg)
Exons and introns: discontinuous protein coding regions in eukaryotes
![Page 32: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/32.jpg)
![Page 33: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/33.jpg)
Two ways to solve the problem
Predict splice sites (GT-donor and AG-acceptor)
or
Predict coding versus non-coding
(at least in non-UTRs)
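A small Python sketch of why the splice site route is hard: under the GT-donor/AG-acceptor rule, every GT and every AG dinucleotide in the sequence is a candidate, and the network has to reject almost all of them. The example sequence is the first donor window from the human training data listed later in this transcript; the function name is illustrative.

```python
def splice_candidates(dna):
    """0-based positions of every GT (candidate donor) and AG (candidate acceptor)."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return donors, acceptors

# Donor window from HUMA1ATP; the true donor GT sits at index 16.
donors, acceptors = splice_candidates("TACATCTTCTTTAAAGGTAAGGTTGCTCAACCA")
print(donors)     # [16, 21]
print(acceptors)  # [14, 19]
```

Even in this 33-nucleotide window there is a second GT and two AGs, so a predictor that relied on the dinucleotides alone would overpredict heavily.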
![Page 34: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/34.jpg)
[Figure: sliding window CCTGGACCGGGTGA on the DNA sequence; outputs so far: 0.12, 0.11, 0.10]
![Page 35: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/35.jpg)
[Figure: sliding window CTGGACCGGGTGAC on the DNA sequence; outputs so far: 0.12, 0.11, 0.10, 0.14]
![Page 36: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/36.jpg)
[Figure: sliding window TGGACCGGGTGACG on the DNA sequence; outputs so far: 0.12, 0.11, 0.10, 0.14, 0.23]
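The sliding-window scan over DNA can be sketched as follows. Assumptions: 4 one-hot inputs per nucleotide, a window of 14 nucleotides (matching the illustration, not necessarily the real network) and a single sigmoid unit with random weights standing in for a trained network, so the printed scores will not reproduce the values on the slides.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def encode(window):
    """4 inputs per nucleotide (one-hot A, C, G, T)."""
    vec = []
    for nt in window:
        vec.extend(1.0 if nt == n else 0.0 for n in NUCLEOTIDES)
    return vec

def scan(dna, weights, bias, size=14):
    """Score each window of `size` nucleotides with a single sigmoid unit."""
    scores = []
    for i in range(len(dna) - size + 1):
        x = encode(dna[i:i + size])
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores

rng = random.Random(0)
weights = [rng.uniform(-0.5, 0.5) for _ in range(14 * 4)]  # untrained placeholder weights
dna = "CCTGGACCGGGTGACG"  # the three windows shown above, overlapped
print([round(v, 2) for v in scan(dna, weights, bias=-2.0)])
```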
![Page 37: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/37.jpg)
Splice site networks overpredict a lot
![Page 38: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/38.jpg)
Combination of splice site and coding/non-coding networks
![Page 39: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/39.jpg)
Combination of splice site and coding/non-coding networks
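One simple way to combine the two networks, sketched in Python: keep a donor candidate only if the splice site network scores it highly and the coding/non-coding network shows a drop from coding to non-coding across that position. This rule, its thresholds and the toy scores are hypothetical illustrations; the slides do not specify how the outputs are actually combined.

```python
def combine_donors(candidates, splice_score, coding_score,
                   threshold=0.5, flank=20, margin=0.3):
    """Keep donor candidates with a high splice site score AND a
    coding -> non-coding drop in the coding network output.

    splice_score[i] and coding_score[i] are per-position outputs in [0, 1].
    All thresholds are hypothetical.
    """
    kept = []
    for i in candidates:
        up = coding_score[max(0, i - flank):i]
        down = coding_score[i:i + flank]
        if not up or not down:
            continue  # not enough context on one side
        drop = sum(up) / len(up) - sum(down) / len(down)
        if splice_score[i] >= threshold and drop >= margin:
            kept.append(i)
    return kept

coding = [0.9] * 30 + [0.1] * 30  # toy coding network output: exon, then intron
splice = [0.1] * 60
for i in (10, 30, 50):
    splice[i] = 0.8               # toy splice site network hits
print(combine_donors([10, 30, 50], splice, coding))  # [30]
```

Only the hit at the exon/intron boundary survives; the two other splice site hits lie inside regions with no coding transition and are discarded, which is the kind of overprediction the combination is meant to remove.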
![Page 40: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/40.jpg)
1 HUMA1ATP TACATCTTCTTTAAAGGTAAGGTTGCTCAACCA
1 HUMA1ATP CCTGAAGCTCTCCAAGGTGAGATCACCCTGACG
1 HUMACCYBA CCACACCCGCCGCCAGGTAAGCCCGGCCAGCCG
1 HUMACCYBA CGAGAAGATGACCCAGGTGAGTGGCCCGCTACC
1 HUMACTGA GCGCCCCAGACACCAGGTGAGTGGATGGCGCCG
1 HUMACTGA AGAGAAGATGACTCAGGTGAGGCTCGGCCGACG
1 HUMACTGA CACCATGAAGATCAAGGTGAGTCGAGGGGTTGG
1 HUMADAG TCTTATACTATGGCAGGTAAGTCCATACAGAAG
1 HUMALPHA CGTGGCTCTGTCCAAGGTAAGTGCTGGGCTACC
1 HUMALPI CCTGGCTCTGTCCAAGGTAAGGGCTGGGCCACC
1 HUMALPPD TGTGGCTCTGTCCAAGGTAAGTGCTGGGCTACC
1 HUMAPRTA CCTGGAGTACGGGAAGGTAAGAGGGCTGGGGTG
1 HUMCAPG GAAGGCTGCCTTCAAGGTAAGGCATGGGCATTG
1 HUMCFVII GGAGTGTCCATGGCAGGTAAGGCTTCCCCTGGC
1 HUMCP21OH CACCTTGGGCTGCAAGGTGAGAGGCTGATCTCG
1 HUMCP21OHC CACCTTGGGCTGCAAGGTGAGAGGCTGATCTCG
1 HUMCS1 GTGGCAATGGCTCCAGGTAAGCGCCCCTAAAAT
1 HUMCSFGMA AATGTTTGACCTCCAGGTAAGATGCTTCTCTCT
1 HUMCSPB AAAGACTTCCTTTAAGGTAAGACTATGCACCTG
1 HUMCSFGMA AATGTTTGACCTCCAGGTAAGATGCTTCTCTCT
1 HUMCSPB AAAGACTTCCTTTAAGGTAAGACTATGCACCTG
1 HUMCYC1A GCTACGGACACCTCAGGTGAGCGCTGGGCCGGG
...
2 HUMA1ATP CCTGGGACAGTGAATCGTAAGTATGCCTTTCAC
2 HUMA1ATP AAAATGAAGACAGAAGGTGATTCCCCAACCTGA
2 HUMA1GLY2 CGCCACCCTGGACCGGGTGAGTGCCTGGGCTAG
2 HUMA1GLY2 GAGAGTACCAGACCCGGTGAGAGCCCCCATTCC
2 HUMA1GLY2 ACCGTCTCCAGATACGGTGAGGGCCAGCCCTCA
2 HUMA1GLY2 GGGCTGTCTTTCTATGGTAGGCATGCTTAGCAG
2 HUMA1GLY2 CACCGACTGGAAAAAGGTAAACGCAAGGGATTG
2 HUMACCYBA GCGCCCCAGGCACCAGGTAGGGGAGCTGGCTGG
2 HUMACCYBA CAGCCTTCCTTCCTGGGTGAGTGGAGACTGTCT
2 HUMACCYBA CACAATGAAGATCAAGGTGGGTGTCTTTCCTGC
2 HUMACTGA TCGCGTTTCTCTGCCGGTGAGCGCCCCGCCCCG
2 HUMADAG CTTCGACAAGCCCAAAGTGAGCGCGCGCGGGGG
2 HUMADAG TGTCCAGGCCTACCAGGTGGGTCCTGTGAGAAG
2 HUMADAG CGAAGTAGTAAAAGAGGTGAGGGCCTGGGCTGG
...
11 HUMCS1 AACGCAACAGAAATCCGTGAGTGGATGCCGTCT
11 HUMGHN AACACAACAGAAATCCGTGAGTGGATGCCTTCT
52 HUMHSP90B CTCTAATGCTTCTGATGTAGGTGCTCTGGTTTC
80 HUMMETIF1 ACCTCCTGCAAGAAGAGTGAGTGTGAGGCCATC
112 HUMHSP90B ATACCAGAGTATCTCAGTGAGTATCTCCTTGGC
113 HUMHST GCGGACACCCGCGACAGTGAGTGGCGCGGCCAG
113 HUMLACTA GACATCTCCTGTGACAGTGAGTAGCCCCTATAA
151 HUMKAL2 ATCGAACCAGAGGAGTGTACGCCTGGGCCAGAT
157 HUMCS1 CACCTACCAGGAGTTTGTAAGTTCTTGGGGAAT
157 HUMGHN CACCTACCAGGAGTTTGTAAGCTCTTGGGGAAT
164 HUMALPHA CAACATGGACATTGATGTGCGACCCCCGGGCCA
622 HUMCFVII CTGATCGCGGTGCTGGGTGGGTACCACTCTCCC
636 HUMADAG CCTGGAACCAGGCTGAGTGAGTGATGGGCCTGG
895 HUMAPOCIB TCCAGCAAGGATTCAGGTTGTTGAGTGCTTGGG
970 HUMALPHA CGGGCCAAGAAAGCAGGTGGAGCTGGGGCCCGG
2114 HUMAPRTA ATCGACTACATCGCAGGCGAGTGCCAGTGGCCG
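The donor windows listed above appear to be aligned so that the donor dinucleotide sits at 0-based indices 16 and 17 (this is read off the data, not stated on the slides). A short Python sketch counts nucleotides per aligned column, the same counts a sequence logo is built from, using the first five "1" entries.

```python
from collections import Counter

def column_counts(seqs):
    """Nucleotide counts at each aligned position (all sequences equal length)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [Counter(s[i] for s in seqs) for i in range(length)]

donor_sites = [
    "TACATCTTCTTTAAAGGTAAGGTTGCTCAACCA",
    "CCTGAAGCTCTCCAAGGTGAGATCACCCTGACG",
    "CCACACCCGCCGCCAGGTAAGCCCGGCCAGCCG",
    "CGAGAAGATGACCCAGGTGAGTGGCCCGCTACC",
    "GCGCCCCAGACACCAGGTGAGTGGATGGCGCCG",
]
counts = column_counts(donor_sites)
consensus = "".join(c.most_common(1)[0][0] for c in counts)
print(consensus[13:21])  # most frequent nucleotide around the exon/intron boundary
```

With only five sequences the consensus is noisy away from the boundary, but the invariant GT at the centre stands out clearly.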
![Page 41: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/41.jpg)
Neural network weight analysis: reading frame detection
![Page 42: Hidden unit weights in network model correlations.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d6b5503460f94a4ad58/html5/thumbnails/42.jpg)
Exon-intron transition detection units