
Biological sequence analysis and information processing by artificial neural networks

Søren Brunak

Center for Biological Sequence Analysis

Technical University of Denmark

[email protected]

Pairwise alignment

>carp Cyprinus carpio growth hormone 210 aa vs.

>chicken Gallus gallus growth hormone 216 aa

scoring matrix: BLOSUM50, gap penalties: -12/-2

40.6% identity; Global alignment score: 487

10 20 30 40 50 60 70

carp MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD

:: . : ...:.: . : :. . :: :::.:.:::: :::. ..:: . .::..: .: .:: :.

chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

10 20 30 40 50 60 70 80

80 90 100 110 120 130 140 150

carp YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN

: ::.:::..:..: ..:::.:. ::.:: : : ::. .:.:. :. ... ::: ::. ::..:.. : .: .

chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

90 100 110 120 130 140 150 160

170 180 190 200 210

carp DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL

.: : .. : . . .:. : ... ::.:::::.:::::::.: .::: .::::.

chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI

170 180 190 200 210
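The identity figure quoted for the alignment can be illustrated with a minimal sketch. It assumes identity is counted as matching residue columns divided by all alignment columns, gaps included; the slide does not state which denominator the alignment program used. The function name `percent_identity` is made up for this example, and only the last block of the alignment is used.

```python
def percent_identity(row_a, row_b):
    """Percentage of alignment columns in which both rows hold the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # A column counts as identical only if it holds the same residue in both rows;
    # the gap character '-' never counts as a match.
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * identical / len(row_a)

# Last block of the carp/chicken alignment (57 columns each).
carp    = "DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL"
chicken = "PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI"

print(f"{percent_identity(carp, chicken):.1f}% identity in this block")
```

Applying the same function to the three blocks joined end to end would give a whole-alignment value, under the same assumption about the denominator.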

Biological neuron

Diversity of interactions in a network enables complex calculations

• Similar in biological and artificial systems

• Excitatory (+) and inhibitory (-) relations between compute units

Transfer of biological principles to neural network algorithms

• Non-linear relation between input and output

• Massively parallel information processing

• Data-driven construction of algorithms

• Ability to generalize to new data items
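A single artificial compute unit can be sketched in a few lines. The sketch assumes a sigmoid transfer function, which is one common choice and is not named on the slide; the function name `neuron` and the weight values are purely illustrative. Positive weights play the role of excitatory connections and negative weights the role of inhibitory ones.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs passed through a sigmoid (non-linear) transfer function."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

# Illustrative weights: first connection excitatory (+), second inhibitory (-).
weights = [2.0, -2.0]

print(neuron([1.0, 0.0], weights, 0.0))  # only the excitatory input is active
print(neuron([0.0, 1.0], weights, 0.0))  # only the inhibitory input is active
print(neuron([1.0, 1.0], weights, 0.0))  # both active: the contributions cancel
```

Because of the sigmoid, doubling an input does not double the output, which is the non-linear input/output relation the bullet list refers to.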

Simplest non-trivial classification problem

CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...

• Two categories: positives and negatives

• Data described by two features, e.g. charge, sidechain volume, molecular weight, number of atoms, ...
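As a rough sketch of how each peptide could be turned into a point in a two-feature space, the code below computes a net charge and a side-chain atom count. Both definitions are simplifying assumptions made for this example: charge counts K and R as +1 and D and E as -1 (histidine is ignored), and the atom count covers non-hydrogen side-chain atoms only. The slide does not say which peptides are positives, so no labels are assigned.

```python
# Non-hydrogen side-chain atoms per residue (backbone atoms excluded).
SIDE_CHAIN_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "I": 4, "L": 4, "N": 4, "D": 4, "M": 4, "Q": 5, "E": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}

# Crude charge model (assumption): lysine/arginine +1, aspartate/glutamate -1.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def features(peptide):
    """Return (net charge, side-chain atom count) for a peptide string."""
    charge = sum(CHARGE.get(aa, 0) for aa in peptide)
    atoms = sum(SIDE_CHAIN_ATOMS[aa] for aa in peptide)
    return charge, atoms

for peptide in ["CNHSYYP", "HIETRRA", "NWQSADY", "NQYSEPR", "WHITRCA", "DYHSANY"]:
    print(peptide, features(peptide))
```

Each peptide becomes one point in the plane, and a classifier then has to find a boundary that separates the positives from the negatives.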

Features of phosphorylation sites

PKG: cGMP-dep. kinase

PKC

CaM-II: Ca++/calmodulin-dep. kinase

cdc2: Cyclin-dep. kinase 2

CK-II: Casein kinase 2

Homotypical cerebral cortex (from primate), 6 layers

DEMO

negative / positive

Training and error reduction
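One way to picture training as error reduction is a single sigmoid unit whose weights are nudged after every example so that the squared error shrinks. The slide does not name a training method, so gradient descent, the learning rate, the number of epochs and the six toy data points below are all assumptions chosen for illustration; none of them come from the presentation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, targets, rate=0.5, epochs=200):
    """Online gradient descent on squared error for one sigmoid unit."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    errors = []  # mean squared error recorded per epoch
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(samples, targets):
            out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            total += 0.5 * (t - out) ** 2
            # Gradient of the squared error with respect to the activation.
            delta = (t - out) * out * (1.0 - out)
            weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
            bias += rate * delta
        errors.append(total / len(samples))
    return weights, bias, errors

# Made-up two-feature toy data: three positives (1) and three negatives (0).
samples = [(2.0, 1.0), (3.0, 2.0), (2.5, 1.5), (-1.0, -1.0), (-2.0, 0.0), (0.0, -2.0)]
targets = [1, 1, 1, 0, 0, 0]

weights, bias, errors = train(samples, targets)
print("error after first epoch:", errors[0])
print("error after last epoch: ", errors[-1])
```

Plotting `errors` against the epoch number gives the kind of falling error curve usually shown when a network is trained.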


Sparse encoding of amino acid sequence windows
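A minimal sketch of sparse (one-hot) encoding for amino acid windows follows. It assumes a 20-letter alphabet in alphabetical one-letter-code order and a window length of 7; neither the ordering nor the window size is given on the slide, and the helper names `encode_window` and `windows` are made up for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 input units

def encode_window(window):
    """Encode each residue as 20 bits with a single 1 at its alphabet position."""
    bits = []
    for aa in window:
        unit = [0] * len(AMINO_ACIDS)
        unit[AMINO_ACIDS.index(aa)] = 1
        bits.extend(unit)
    return bits

def windows(sequence, size):
    """All overlapping windows of the given size, moving one residue at a time."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

sequence = "MAPGSWFSPLLIAVVTLGLPQ"  # start of the chicken growth hormone sequence
for w in windows(sequence, 7)[:3]:
    vec = encode_window(w)
    print(w, "->", len(vec), "inputs,", sum(vec), "of them set to 1")
```

A window of 7 residues therefore yields 7 x 20 = 140 network inputs, of which exactly 7 are 1, which is why the encoding is called sparse.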

Sparse encoding of nucleotide sequence windows

Nucleotides

4 letter alphabet

Normally no need for a fifth letter

ACGTAGGCAATCTCAGACGTTTATC

1000010000100001100000100010010010001000000101000001010010000010100001000010000100010001100000010100
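The bit string above can be reproduced with a short sketch, assuming each nucleotide is written as four bits in the order A, C, G, T (A = 1000, C = 0100, G = 0010, T = 0001). The function name `encode` is illustrative.

```python
NUCLEOTIDES = "ACGT"  # assumed bit order within each group of four

def encode(sequence):
    """Sparse-encode a nucleotide sequence as a string of 0s and 1s, four bits per base."""
    return "".join(
        "".join("1" if n == base else "0" for n in NUCLEOTIDES)
        for base in sequence
    )

sequence = "ACGTAGGCAATCTCAGACGTTTATC"
encoded = encode(sequence)

print(encoded)
print(len(encoded), "bits for", len(sequence), "nucleotides")
```

In this sketch any symbol outside the four-letter alphabet would come out as 0000, which is one simple way of handling an unknown base without adding a fifth letter.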