Post on 10-Jun-2020
Molecular Machine Learning: Molecular Machine Learning: A Personal IntroductionA Personal Introduction
Byoung-Tak Zhang
Adapted from “Biological Machine Learning: From Biology to Machine Learning and Back” presented November 9, 2002
as Invited Talk at Korea Information Science Society SIG CVPR
Center for Bioinformation Technology (CBIT) & Biointelligence Laboratory
School of Computer Science and EngineeringSeoul National University
http://bi.snu.ac.kr/http://cbit.snu.ac.kr/
2(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
From BT to IT and Back:From BT to IT and Back:Biological Machine LearningBiological Machine Learning
Bioscience
Biotechnology
Bioinformatics
MolecularComputation
Neural Computation
Evolutionary Computation
Biocomputing
Bio-InspiredMachine Learning
BiotechnicalMachine Learning
S/W
BT
H/W
S/W
IT
H/W
3(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
OutlineOutline
Biological Machine Learning: BITML Inspired by Biology: NN & GA
ML for Biology: BioinformaticsML Using Biology: Molecular ML
Conclusions and Outlook
ML Inspired by Biology:ML Inspired by Biology:
Neural NetworksNeural NetworksGenetic AlgorithmsGenetic Algorithms
5(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
The Brain vs. ComputerThe Brain vs. Computer
1. 1011 neurons with1014 synapses
2. Speed: 10-3 sec3. Distributed processing4. Nonlinear processing5. Parallel processing
1. 1011 neurons with1014 synapses
2. Speed: 10-3 sec3. Distributed processing4. Nonlinear processing5. Parallel processing
1. A single processor with complex circuits
2. Speed: 10 –9 sec 3. Central processing4. Arithmetic operation
(linearity) 5. Sequential processing
1. A single processor with complex circuits
2. Speed: 10 –9 sec 3. Central processing4. Arithmetic operation
(linearity) 5. Sequential processing
6(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
From Biological Neuron to From Biological Neuron to Artificial NeuronArtificial Neuron
7(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Activation FunctionScaling Function
Output Comparison
Information Propagation
Error Backpropagation
Input x1
Input x2
Input x3
Output
Input Layer Hidden Layer Output Layer
Weights
Activation Function
Multilayer Multilayer PerceptronPerceptron (MLP)(MLP)
∑∈
−≡outputsk
kkd otwE 2)(21)(
iiiii w
Ewwww∂∂
−=∆∆+← η ,
x )(xfo =
8(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Learning as Error MinimizationLearning as Error Minimization
9(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Gradient DescentGradient DescentGradient Descent
iiiii w
Ewwww∂∂
−=∆∆+← η ,
∑∈
−=∆Dd
idddi xotw )(η
10(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Application Example:Application Example:Autonomous Land Vehicle (ALV)Autonomous Land Vehicle (ALV)
NN learns to steer an autonomous vehicle.960 input units, 4 hidden units, 30 output units Driving at speeds up to 70 miles per hour
Weight valuesfor one of the hidden units
Image of aforward -mountedcamera
ALVINN System
11(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Charles Darwin (1859)“Owing to this struggle for life, any variation, however slight and from
whatever cause proceeding, if it be in any degree profitable to an individual of any species, in its infinitely complex relations to other organic beings and to external nature, will tend to the preservation of that individual, and will generally be inherited by its offspring.”
12(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Evolutionary ComputationEvolutionary Computation
What is the Evolutionary Computation? 4Stochastic search (or problem solving) techniques that
mimic the metaphor of natural biological evolution.
Metaphor
EVOLUTION
IndividualFitness
Environment
PROBLEM SOLVING
Candidate SolutionQualityProblem
13(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
General Structure of General Structure of GAsGAs
solutions
1100101010
1011101110
0011011001
1100110001
110010 1110
101110 1110
110010 1010 crossover
mutation00110
101110 1010
10011
00110 10010
evaluation
1100101110
1011101010
0011001001
solutions
fitnesscomputation
roulettewheel
selectionnew population
encodingchromosomes
14(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
MajorMajor Evolutionary AlgorithmsEvolutionary Algorithms
Genetic Programming
Evolution Strategies
Genetic Algorithms Evolutionary
ProgrammingClassifier Systems
• Genetic representation of candidate solutions• Genetic operators• Selection scheme• Problem domain
Hybrids: BGA
15(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Selection StrategiesSelection Strategies
Proportionate Selection4 Reproduce offspring in proportion to fitness fi.
Ranking Selection4 Select individuals according to rank(fi).
Tournament Selection4 Choose q individuals at random, the best of which survives.
Other Ways
( ) ( )( )∑ =
= λ
1jtj
tit
isaf
afap
( )⎪⎩
⎪⎨
⎧
≤≤
≤≤=
λµ
µµ
i
iap t
is
,0
1,1
( ) ( ) ( )( )qqq
tis iiap −−+−= λλ
λ11
16(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
, where rx, rσ , rα ∈{-, d, D, i, I, g, G}, e.g. rdIIασ rrr xr
⎪⎪⎪⎪⎪⎪
⎩
⎪⎪⎪⎪⎪⎪
⎨
⎧
′−⋅+
′−⋅+
′−+
′−+
′
′
′
=′
−
GiSiTiiS
giSiTiS
IiSiTiS
iiSiTiS
DiTiS
diTiS
iS
i
rxxx
rxxx
rxxx
rxxx
rxx
rxx
rx
x
i
i
i
teintermedia dgeneralize panmictic)(
teintermedia dgeneralize)(
teintermedia panmictic2/)(
teintermedia2/)(
discrete panmicticor
discreteor
ionrecombinat no
,,,
,,,
,,,
,,,
,,
,,
,
χ
χ
ES: Recombination OperatorES: Recombination Operator
17(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
m{τ,τ’,β} : Iλ → Iλ is an asexual operator.4nσ = n, nα = n(n-1)/2
41 < nσ < n, nα = 0
4nσ = 1, nα = 0
)),(,0(
)1,0())1,0()1,0(exp(
ασ
βααττσσ
′′+=′
⋅+=′⋅+⋅′⋅=′
CNxx
NNN
jjj
iii
( )0873.02
2
1
1
≈∝′
⎟⎠⎞⎜
⎝⎛∝
−
−
βτ
τ
n
n
)1,0())1,0()1,0(exp(
iiii
iii
NxxNN
⋅′+=′⋅+⋅′⋅=′
σττσσ
)1,0())1,0(exp( 0
iii NxxN
⋅′+=′⋅⋅=′
στσσ
n/10 ∝τ
ES: Mutation OperatorES: Mutation Operator
18(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Geometric Analogy Geometric Analogy -- Fitness LandscapeFitness Landscape
ML for BioinformaticsML for Bioinformatics
20(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Computers Meet BiosciencesComputers Meet Biosciences
Bioinformation Technology(BIT)
BT IT
21(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Structural Genomics
FunctionalGenomics Proteomics Pharmaco-
genomics
AGCTAGTTCAGTACA
TGGATCCATAAGGTA
CTCAGTCATTACTGC
AGGTCACTTACGATA
TCAGTCGATCACTAG
CTGACTTACGAGAGT
Microarray (Biochip)
Infrastructure of Bioinformatics
Areas and Workflow of BioinformaticsAreas and Workflow of Bioinformatics
22(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Topics in BioinformaticsTopics in Bioinformatics
Structure analysisStructure analysis4 Protein structure comparison4 Protein structure prediction 4 RNA structure modeling
Pathway analysisPathway analysis4Metabolic pathway4 Regulatory networks
Sequence analysisSequence analysis4 Sequence alignment4 Structure and function prediction4 Gene finding
Expression analysisExpression analysis4 Gene expression analysis4 Gene clustering
23(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Machine Learning Techniques for Machine Learning Techniques for BioinformaticsBioinformatics
Sequence Alignment4 Simulated Annealing4 Genetic Algorithms
Structure and Function Prediction4 Hidden Markov Models4Multilayer Perceptrons4 Decision Trees
Molecular Clustering and Classification4 Support Vector Machines4 Nearest Neighbor Algorithms
Expression (DNA Chip Data) Analysis4 Self-Organizing Maps4 Bayesian Networks
24(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Gene Finding Gene Finding
Upstream Open Reading FrameDownstream
mRNA
25(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Gene Finding ProgramsGene Finding ProgramsName Methods Organism
ER Discriminant Analysis Human, Arabidopsis
GENSCAN (seems the most accurate)
Semi Markov Modelvertebrate, caenorhabditis, arabidopsis, maize
GRAIL Neural Networkhuman, mouse, arabidopsis, drosophila, E.coli
GenLang Definite Clause Grammer Vertebrate, Drosophila, Dicot
GenView Linear combination Human, Mouse, Diptera
GeneFinder(FGENEH,etc.)
LDAHuman, E.coli, Drosophila, Plant, Nematode, Yeast
GeneID Perceptron,rules Vertebrate
GeneMark 5th-Markov Almost all model organism
GeneParser Neural networks Human
Genie GHMM Human (vertebrate)
GlimmerInterpolated Markov models (IMMs)
microbial
MORGAN Decision Tree vertebrate
MZEF Quadratic Discriminant Analysis Human, mouse, Arabidopsis, Pombe
NetPlantGene Combined Neural Networks A. thaliana
OC1 Decision tree Human
PROCRUSTES Spliced alignment vertebrate
Sorfind Rule base Human
VEIL HMM vertebrate
Hogehoge Wonderful method extraterrestrial
26(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
27(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Learning SchemeLearning Scheme
Training setAATGCGTACCTCATACGACCACAACGAATGAATATGATGT………
Training setAATGCGTACCTCATACGACCACAACGAATGAATATGATGT………
Test setTCGACTACGAGCCTCATCGACGAACGAATGAATATGATGT………
Test setTCGACTACGAGCCTCATCGACGAACGAATGAATATGATGT………
PredictionMethod
PredictionMethod
Learning (Model Construction)
Outputinput
inputoutput
28(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
29(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Neural Networks in GRAILNeural Networks in GRAIL
CATATTCAAGAATTGAAGCGTGTAGTCCTGACTTGAGAGCTGTAGATGACGTGCTTATATGTTC………………………..
Known Sequence
0.7 0.8 0.1 0.3 … 0.9 0.2
0.4 0.2 0.6 0.1 … 0.4 0.5
…
x1x2
xn
0.2 0.9 0.3 0.1 … 0.8 0.3
0.6 0.3 0.2 0.8 … 0.2 0.4
Coding potential valueGC Composition
LengthDonor
Intron vocabulary
1
0
0
1
t1t2
tn
x3 t3
Exon
Input Layer
Hidden Layer
Output Layer
Weights
Training
Preprocessing
Testing
ATGACGTACGATCCCGTGACGGTGACGTGAGCTGACGTGCCGTCGTAGTAATTTAGCGTGA………………………..
Unknown Sequence
0.6 0.3 0.2 0.8 … 0.2 0.4x f(x) ?∑
∈
−≡outputsk
kkd otwE 2)(21)(
iiiii w
Ewwww∂∂
−=∆∆+← η ,
o
30(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Protein Structure PredictionProtein Structure Prediction
Amino acid sequences of protein determine its 3D conformation
MNIHRSTPITIARYGRSRNKTQDFEELSSIRSAEPSQSFSPNLGSPSPPETPNLSHCVSCIGKYLLLEPLEGDHVFRAVHLHSGEELVCKVFDISCYQESLAPCF
Sequence Structure Function
31(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Motif, Domain, FamilyMotif, Domain, Family
Domain consists of combinations of motifs, the size of domains varies from about 25 to 30 amino acid residues to about 300, with an average of about 100.
Motif (protein sequence pattern): is recognizable combinations of α helices and β strands that appear in a number of proteins.
Protein family consist of members which has1) Same function; and2) Clear evolutionary relationship; and 3) Patterns of conservation, some positions are
more conserved than the others, and some regions seem to tolerate insertions and deletions more than other regions, the similarity usually > 25% .
32(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Neural Network for Structure Neural Network for Structure PredictionPrediction
33(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
SequenceSequence--toto--Structure Network Structure Network ArchitectureArchitecture
Frequency of amino acid‘E’ in multiple alignments
of some protein familyCurrent input window
Each input unit consists of 21 frequencies
34(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
First level: sequence-to-structure
Second level: structure-to-structure
Third level: jury decision
Overall Architecture
The input is based on a profile made from amino acid occurrences in columns of a multiple alignment of sequences with high similarity of the query sequence.
35(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
cDNAcDNA MicroarrayMicroarray
cDNA clones(probes)
PCR product amplificationpurification
Printing
Microarray
Hybridize target to microarray
mRNA target
Excitation
Laser 1Laser 2
Emission
Scanning
Analysis
Overlay images and normalize
0.1nl/spot
36(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Image AnalysisImage Analysis
Scanned images probe intensities numerical values for higher-level analysis
Array target segmentationBackground intensity extraction Target detection
Target intensity extractionRatio analysis
37(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Gene Expression Data MiningGene Expression Data Mining
Data preprocessing:
- Normalization
- Discretization
- Gene selectionLearning:
- Greedy search
- EM algorithm
…
…
- Classification
- Clustering
- Analysis of gene regulations
38(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Cancer Classification with DNA Cancer Classification with DNA MicroarraysMicroarrays
39(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Gene Expression ProfilingGene Expression Profiling
[DNA microarray dataset]
[Gene Cluster 1]
[Gene Cluster 2]
[Gene Cluster 3]
[Gene Cluster 4]
40(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Cell CycleCell Cycle--regulated Genes in regulated Genes in S. S. cerevisiaecerevisiae (Yeast)(Yeast)
Identify cell cycle-regulated genes by cluster analysis.4104 genes are already known to
be cell-cycle regulated.4Known genes are clustered into
6 clusters.Cluster 104 known genes and other genes together.The same clustersimilar functional categories.
[Fig.] 104 known gene expression levels according to the cell cycle(row: time step, column: gene).
41(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Probabilistic Clustering Using Probabilistic Clustering Using Latent VariablesLatent Variables
gi: ith genezk: kth clustertj: jth time stepp(gi|zk): generating probability
of ith gene given kth clustervk=p(t|zk): prototype of kth
cluster
)()()|()|()(
i
kkiikki p
zpzpzpzpg
ggg ==∈
∑∑ ∑=i j k
kjkikij ztpzpzpgztf ))|()|()(log(),,( gg
∑=j
kjijki vxsimilarity ),( vx
: (*) objective function(maximized by EM)
42(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Experimental Results:Experimental Results:Prototype Expression Levels of Found ClustersPrototype Expression Levels of Found Clusters
[Fig.] Prototype expression levels of genes found to be cell cycle-regulated (4 clusters).
• The genes in the same cluster show similar expression patterns during the cell cycle.• The genes with similar expression patterns are likely to have correlated functions.
43(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
NCI Drug Discovery ProgramNCI Drug Discovery Program
NCI 60 cell lines data set
44(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Bayesian NetworksBayesian Networks
Represent the joint probability distribution among random variables efficiently using the concept of conditional independence.
BA
C D
Enet) Bayes example (by the )|()|(),|()()(
rule)chain (by ),,,|(),,|(),|()|()(),,,,(
CEPBDPBACPBPAPDCBAEPCBADPBACPABPAP
EDCBAP
==
•A, C and D are independent given B.
•C asserts dependency between A and B.
•A, B and E are independent given C.
An edge denotes the possibility of the causal relationship between nodes.
45(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Bayesian Network LearningBayesian Network Learning
Dependence analysis [Margaritis ’00]4Mutual information and χ2 test
Score-based search
• D: data, S: Bayesian network structure
4NP-hard problem4Greedy search4Heuristics to find good massive network structures
quickly (local to global search algorithm)
∏ ∏ ∏= = = Γ
+Γ
+Γ
Γ⋅=
=
n
i
q
j
r
kijk
ijkijk
ijij
iji i NN
Sp
SDpSpSDp
1 1 1 )()(
)()(
)(
)|()(),(
αα
αα
46(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
A Small Bayesian Network for A Small Bayesian Network for Classification of CancerClassification of Cancer
Zyxin
Leukemiaclass
MB-1
C-mybLTC4S
1.3/340/38RBF networks1/340/38Neural trees2/340/38Bayes nets
Test errorTraining error
•The Bayesian network was learned by full searchusing BD (Bayesian Dirichlet) score with uninformative prior [Heckerman ’95] from the DNA microarray data for cancer classification(http://waldo.wi.mit.edu/MPR/).
[Table] Comparison of the classification performance with other methods [Hwang ’00].
47(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Experimental Results:Experimental Results:DrugDrug--Drug DependencyDrug DependencyDrug-drug activity correlations
<Part of the learned Bayesian network structure>
- Three drugs “Aphidicolin-glycinate”, “Floxuridine”, and “Cytarabine” directly depend on each other.
- “Cyclocytidine” directly depends on “Cytarabine” and vice versa.
Confirmation
48(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Experimental Results:Experimental Results:GeneGene--Drug DependencyDrug Dependency
Gene expression-drug activity correlations
- The negative correlation between “ASNS” and “L-asparagine” is mediated by two other genes.
- The relationships revealed by the Bayesian network is putative and should be verified by biological experiments. exploratory analysis
`
<Part of the learned Bayesian network structure>
Confirmation
Discovery
49(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
ML Using Biology: ML Using Biology: BiocomputingBiocomputing
50(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
BioinformationBioinformation Technology (BIT)Technology (BIT)
BTBTITIT
Bioinformatics (in silico Biology)
Biocomputing (e.g. DNA Computing)
Model
Tool
Tool
NNGA
MC
AL
51(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Biology as Inspiration: HistoryBiology as Inspiration: History
Neural Networks4McCulloch & Pitts (1943)4Rosenblatt (1958)
Genetic Algorithms4Fogel (1960s)4Rechenberg (1960s)4Holland (1975)
Artificial Life 4Langton (1988)
DNA Computing4Adleman (1994)
52(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Biology and Artificial Intelligence Biology and Artificial Intelligence (AI)(AI)Symbolic AI
Rule-Based Systems
Connectionist AI Neural Networks
Evolutionary AI Genetic Algorithms
Molecular AI: DNA Computing
53(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
DNA ComputingDNA Computing: Biology as : Biology as TechnologyTechnology
011001101010001 ATGCTCGAAGCT
54(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
LiquidLiquid--Phase Phase 3D3D Biochemical Biochemical Reaction as ComputingReaction as Computing
R ∨ ¬P ∨ ¬Q S
Q ∨ ¬T ∨ ¬S T P
¬R
55(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Computing: Hybridization & Computing: Hybridization & LigationLigation
CGTACCTTAGGCT
AGCTTAGGATGGCATGG AATCCGATGCATGGC
CGTACCTTAGGCTAGCTTAGGATGGCATGGAATCCGATGCATGGC
CGTACCTTAGGCTAGCTTAGGATGGCATGGAATCCGATGCATGGC
CGTACCTTAGGCT
AGCTTAGGATGGCATGGAATCCGATGCATGGC+
+
Ligation
Hybridization
Dehybridization
56(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Complementary
Magnetic Beads
Magnet
Solution Detection: Bead SeparationSolution Detection: Bead Separation
57(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
HPPHPP
...
......
...ATGATGACGACG
TGCTGC
CGACGA
TAATAAGCAGCA
CGTCGT...
...
...
...... ......
...
10
3
2 56
4
SolutionSolution
ATGTGCTAACGAACG
ACGCGAGCATAAATGTGCCGTACGCGAGCATAAATGTGCCGT
TAAACG
CGACGT
TAAACGGCAACG
...
...
...
...
CGACGTAGCCGT
...
...
...
ACGCGAGCATAAATGTGCCGTACGCGAGCATAAATGTGCCGTACGCGTAGCCGT
ACGCGT
......
...
...
...
ACGGCATAAATGTGCACGCGTACGCGAGCATAAATGCGATGCCGT
ACGCGAGCATAAATGTGCCGTACGCGAGCATAAATGTGCCGT
... ... .........
ACGCGAGCATAAATGTGCCGTACGCGAGCATAAATGTGCCGT
...
.........
...
Decoding
Ligation
Encoding
Gel Electrophoresis
Affinity Column
ACGCGAGCATAAATGTGCACGCGT
ACGCGAGCATAAATGCGATGCACGCGT
ACGCGAGCATAAATGTGCACGCGT
ACGCGAGCATAAATGCGATGCACGCGT
2
0 13 4
56
Node 0: ACG Node 3: TAANode 0: ACG Node 3: TAANode 1: CGA Node 4: ATGNode 1: CGA Node 4: ATGNode 2: GCA Node 5: TGCNode 2: GCA Node 5: TGC
Node 6: CGTNode 6: CGT
DNA Computing: DNA Computing: BioBio--Lab ProcedureLab Procedure
PCR(Polymerase
Chain Reaction)
58(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Why DNA Computing?Why DNA Computing?
6.022 × 1023 molecules / moleImmense, brute force search of all possibilities4Desktop: 109 operations / sec4Supercomputer: 1012 operations / sec41 µmol of DNA: 1026 reactions
Favorable energetics: Gibb’s free energy
1 J for 2 × 1019 operationsStorage capacity: 1 bit per cubic nanometer
-1mol 8kcalG −=∆
59(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
DNA Computers vs. Conventional DNA Computers vs. Conventional ComputersComputers
Electronic data are vulnerable but can be backed up easily
DNA is sensitive to chemical deterioration
Setting up only requires keyboard input
Setting up a problem may involve considerable preparations
Smaller memoryCan provide huge memory in small space
Can do substantially fewer operations simultaneously
Can do billions of operationssimultaneously
Fast at individual operationsSlow at individual operationsMicrochip-based computersDNA-based computers
60(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
DNA Computing for AI and DNA Computing for AI and CogSciCogSci
Memory4Associative memory with enormous density46 x 10^23 molecules / mole43 grams of water contains 10^22 molecules
Inference4Massive parallel reactive process410^19 operations for Joule
Learning4Generalization through biochemical reaction43-dimensional diffusion in solution4Global parameterization, e.g. temperature
61(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Molecular Machine LearningMolecular Machine Learning
Molecular Learning Machines
Molecular Computing
Machine Learning
62(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Example: Concept Learning for a Example: Concept Learning for a Janitor RobotJanitor Robot
four, fivefloorfaculty, staffstatuscs, eedepartment
ValuesAttribute
Examples: x1: <cs, faculty, four>+ x2: <ee, faculty, five>-
Conceptsh1: <faculty>h2: <cs, faculty>h3: <cs, faculty, four>
More general:h1 >g h2 >g h3
ee faculty staff four five
cs ∧ faculty ee ∧ facultycs ∧ staff ee ∧ staff faculty ∧ four faculty ∧ five
cs ∧ staff ∧ five ee ∧ faculty ∧ four cs ∧ faculty ∧ five cs ∧ faculty ∧ four
∧
cs
…
…
More general
More specific
63(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
DNADNA--Based Concept LearningBased Concept Learning< >, <cs>, <ee>, <faculty>, <staff>, <four>, <five>, <cs, faculty>, <cs, staff>, <ee, faculty>, <ee, staff>, <cs, four>, <cs, five>, <ee, four>, <ee, five>, <faculty, four>, <faculty, five>, <staff, four>, <staff, five>, <cs, faculty, four>, <cs, faculty, staff>, <cs, staff, four>, <cs, staff, five>, <ee, faculty, four>, <ee, faculty, staff>, <ee, staff, four>, <ee, staff, five>
<>, <cs>, <faculty>, <four>, <cs, faculty>, <cs, four>, <faculty, four>,<cs, faculty, four>
<cs, faculty, four> (+)
< >, <cs>, <ee>, <faculty>, <staff>, <four>, <five>, <cs, faculty>, <cs, staff>, <ee, faculty>, <ee, staff>, <cs, four>, <cs, five>, <ee, four>, <ee, five>, <faculty, four>, <faculty, five>, <staff, four>, <staff, five>, <cs, faculty, four>, <cs, faculty, staff>, <cs, staff, four>, <cs, staff, five>, <ee, faculty, four>, <ee, faculty, staff>, <ee, staff, four>, <ee, staff, five>
cs four
staffee
?status?dept
five
faculty
?floor
<>, <cs>, <faculty>, <four>, <cs, faculty>, <cs, four>, <faculty, four>,<cs, faculty, four>
<cs, staff, five> (-)
<faculty>, <four>, <cs, faculty>, <cs, four>,<faculty, four>,<cs, faculty, four>
<four>, <cs, four>,
Intersection
<faculty>, <cs, faculty>,<faculty, four>,<cs, faculty, four>
Difference
<cs, staff, four> is classified as negative!
< >, <cs>, <staff>, <four>,<cs, staff>, <cs, four>, <staff, four>,
<cs, staff, four>
<faculty>, <four>, <cs, faculty>, <cs, four>,<faculty, four>,<cs, faculty, four>
<cs, staff, four> ?
Majority Voting
64(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
BioBio--Lab ProcedureLab Procedure
1. Sequence Design and Synthesis1. Sequence Design and Synthesis
2. Hybridization2. Hybridization
3. Ligation3. Ligation
4. Learning (Affinity Separation)4. Learning (Affinity Separation)
5. Classification5. Classification
Initial Version Space
cs four
staffee
?status?dept
five
faculty
?floor
Primer
Sticky End
cs faculty four
cs faculty five
cs faculty ?
cs staff four
cs ? ?
ee faculty ?
ee ? ?
? ? ?
...
...
...
...
65(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
1. Generating the Concept Space1. Generating the Concept Space
One DNA molecule for one hypothesisSticky ends and “don’t care symbols”
cs four
staffee
?status?dept
five
faculty
?floor
cs faculty four
cs faculty five
cs faculty ?
cs staff four
cs ? ?
ee faculty ?
ee ? ?
? ? ?
...
...
...
...
Primer
Sticky End
66(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
2. Given a Positive Example: 2. Given a Positive Example: <<cscs, faculty, four>+, faculty, four>+
Given a positive example, select all hypotheses that are consistent with the example (and don’t cares)
cs
? dept
faculty
? status
four
? floor Affinity separation by magnetic beads
A
BA n B A
A A n B
67(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
staff
five
ee
3. Given a Negative Example: 3. Given a Negative Example: <<cscs, staff, five>, staff, five>--
Given a negative example, filter out all hypothesesthat are consistent with the example
A – B AA
A A - B
B
A n B
Affinity separation by magnetic beads
68(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
4. Given an Unknown Example:4. Given an Unknown Example:<<cscs, staff, four>?, staff, four>?
Compute Y = A n B and N = A – B
Answer yes if |Y| > |N|, answer no, otherwise.
This example isnegative!
A
A
A
Y
N
B Amplify(Query)
YN
Fluorescence or UV
A n B
Y A n BN A - B
A - B
69(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Experimental Results (1)Experimental Results (1)
Generation of the initial concept space4Sequence design: H-measure, similarity, and Tm4Attribute: 20 mer4Sticky end: 10 mer4Total: 80 mer (3 attributes per hypothesis)
100 bp
75 bp
50 bp
25 bp
Gel Electrophoretogramfor hybridization & ligation
70(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Experimental Results (2)Experimental Results (2)Learning from the first (positive) example4<cs, faculty, four> +
M 1 2 3 4 5 6 7 8 M 1 2 3 4 5 6 7 8 9 10 M9 10 M
100bp75bp
50bp25bp
M : 25bp marker (KDR)
lane 1: cs / 4
lane 2: cs / ?floor
lane 3: ?dept / 4
lane 4: ?dept / ?floor
lane 5: ee / 5
lane 6: No Primers
lane 7: ligation mixture
lane 8: sup. after cs/ ?dept
lane 9: sup. after faculty / ?stat
lane 10: sup. after 4/ ?floot
<cs, faculty, four>, <cs, faculty, ?>, <cs, ?, four>, <?, faculty, four>,<cs, ?, ?>, <?, faculty, ?>, <?, ?, four>, <?, ?, ?>
71(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Experimental Results (3)Experimental Results (3)
Learning from the second (negative) example4<cs, staff, five> -
• <cs, faculty, four>, <cs, faculty, ?>, <cs, ?, four><?, faculty, four>, <?, faculty, ?>, <?, ?, four>
Answering to the query (unknown example)4<cs, staff, four> ?
• (+): <cs, ?, four>, <?, ?, four>• (-): <cs, faculty, four>,<cs, faculty, ?>
<?, faculty, four>, <?, faculty, ?>• Should be classified as “negative”• Result
4 (+) : (-) = 0.762 : 1.134
UV spectrophotometry
0
0.5
1
1.5
2
2.5
3
3.5
4
180.00 90.00 45.00 22.50 11.25 5.63 2.81 1.41 0.70
Concentration (μg/ml)
72(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Experimental Results (4)Experimental Results (4)
M: 25 bp marker (KDR)
lane p1: cs / 4
lane p2: ?dept / 4
lane p3: ee / 5
lane p4: No Primers
lane n1: cs / 4
lane n2 cs / ?floor
lane n3: ?dept / 4
lane n4: ?dept / ?loor
lane n5: No PrimersFigure 3. PCR product of majority voting
confirmed by 3% agarose gel electrophoresis
100bp
M p1 p2 p3 p4 n1 n2 n3 n4 n5 M p1 p2 p3 p4 n1 n2 n3 n4 n5
75bp50bp
25bp
(+): <cs, ?, four>, <?, ?, four>(-): <cs, faculty, four>,<cs, faculty, ?>,
<?, faculty, four>,<?, faculty, ?>
73(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Conclusions and Outlook Conclusions and Outlook
Bio-inspired ML: Machine learning models have been inspired by biological systems.4 Neural networks from central nervous systems4 Genetic algorithms from natural selection
ML for Biology: Bio-inspired machine learning techniques are applied back to study biology.4 Neural networks for protein structure analysis4 Genetic algorithms for sequence alignment
Biotechnical ML: The resulting advancement in biotechnology will further develop the machine learning technology, not just in models but in substrates as well.4 DNA-based inductive machine learning4 Learning to diagnose diseases from natural DNA
74(c) 2002 SNU Biointelligence Lab, http://bi.snu.ac.kr/
Collaborating labs서울대바이오지능 & 인공지능연구실서울대세포및미생물공학연구실
한양대프로테오믹스연구실
㈜바이오니아 & ㈜바이오인포메틱스
Supported by
과기부국가지정연구실사업
산자부차세대신기술연구개발사업
More information at http://bi.snu.ac.kr/
http://cbit.snu.ac.kr/
AcknowledgementsAcknowledgements