Page 1:

Analysis of Microarray Data using Monte Carlo Neural Networks

Jeff Knisley, Lloyd Lee Glenn, Karl Joplin, and Patricia Carey

The Institute for Quantitative Biology, East Tennessee State University

Page 2:

Outline of Talk

Microarray Data

Neural Networks: A Simple Perceptron Example; Neural Networks for Data Mining

A Monte Carlo Approach: Incorporating Known Genes

Models of the Neuron and Neural Networks

Page 3:

Microarray Data

Goal: Identify genes which are up- or down-regulated when an organism is in a certain state

Examples: What genes cause certain insects to enter diapause (similar to hibernation)? In Cystic Fibrosis, what non-CFTR genes are up- or down-regulated?

Page 4:

cDNA Microarrays

Obtain mRNA from a population/tissue in the given state (sample) and from a population/tissue not in the given state (reference)

Synthesize cDNAs from the mRNAs in the cell; cDNA is long (500 to 2,000 bases), but not necessarily the entire gene

Reference labeled green, sample labeled red

Hybridize onto "spots"; each spot is often (but not necessarily) a gene

cDNAs bind to each spot in proportion to concentrations

Page 5:

cDNA Microarray Data

$R_i$, $G_i$ = intensities of the $i$th spot

Absolute intensities often cannot be compared; the same reference may be used for all samples

There are many sources of bias: significant spot-to-spot intensity variations may have nothing to do with the biology

Normalize so that $R_i = G_i$ on average

Most genes are unchanged, all else equal. But rarely is "all else equal".

Page 6:

Microarray Data

Several samples (and references): a time series of microarrays, or a comparison of several different samples

The data are in the form of a table: the $j$th microarray's intensities are $R_{j,i}$, $G_{j,i}$

We often have subtracted background intensity

Question: How can we use $R_{j,i}$, $G_{j,i}$ for $n$ samples to predict which genes are up- or down-regulated for a given condition?

Page 7:

Microarray Data

We do not use $M_{j,i} = \log_2\!\left(R_{j,i}/G_{j,i}\right)$ directly.

A large $|M_{j,i}|$, in comparison to the other $|M_{j,i}|$, suggests obvious up- or down-regulation.

But it must be large across all $n$ microarrays; otherwise it is hard to draw conclusions from $M_{j,i}$.

It is often difficult to manage $\frac{1}{n}\sum_{j} M_{j,i}$.
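In code, these quantities are one line each; a minimal sketch, assuming the intensities sit in NumPy arrays (the matrices and their values are made up for illustration):

```python
import numpy as np

# Hypothetical intensity matrices: rows = microarrays (j), columns = genes (i).
R = np.array([[1200.0, 300.0, 950.0],
              [1100.0, 280.0, 400.0]])
G = np.array([[ 600.0, 310.0, 900.0],
              [ 550.0, 290.0, 820.0]])

M = np.log2(R / G)       # M[j, i] = log2(R_ji / G_ji)
M_bar = M.mean(axis=0)   # (1/n) * sum over j of M[j, i], one value per gene

print(M_bar)  # genes with consistently large |M| across all arrays stand out
```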

Page 8:

Microarray Data

[Figure: scatter plot of $M_i = \log_2(R_i/G_i)$ (vertical axis, -6 to 6) versus $\log_2(R_i G_i)$ (horizontal axis, 0 to 16)]

Page 9:

Microarray Analysis

Microarray analysis is a classification problem.

Clustering: classify genes into a few identifiable groups

Principal Component Analysis: choose directions (i.e., axes, the principal components) that reveal the greatest variation in the data, and then find clusters

Neural Nets and Support Vector Machines: trained with positive and negative examples; classify an unknown as positive or negative
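For orientation, a minimal sketch of the clustering and PCA approaches, assuming scikit-learn is available (the toy expression matrix is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical expression matrix: rows = genes, columns = arrays.
X = np.random.default_rng(0).normal(size=(100, 8))

# Project onto the two directions of greatest variation...
scores = PCA(n_components=2).fit_transform(X)

# ...then look for clusters of genes in that reduced space.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(scores)
print(labels[:10])
```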

Page 10:

Artificial Neural Network (ANN)

Made of artificial neurons, each of which sums inputs from other neurons, compares the sum to a threshold, and sends a signal to other neurons if above threshold.

Synapses have weights, which model relative ion collections and the efficacy (strength) of the synapse.

Page 11:

Artificial Neuron

$w_{ij}$ = synaptic weight between the $i$th and $j$th neurons

$\sigma$ = "firing" function that maps state to output

Inputs $x_1, x_2, x_3, \ldots, x_n$ arrive over weights $w_{i1}, w_{i2}, w_{i3}, \ldots, w_{in}$

State and output: $s_i = \sum_j w_{ij}\, x_j$, $\quad x_i = \sigma(s_i - \tau_i)$

$\tau_j$ = threshold of the $j$th neuron

$\sigma$ is a nonlinear firing function
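A minimal sketch of a single artificial neuron, using the continuous sigmoidal firing function from the following slides (all variable names are illustrative):

```python
import numpy as np

def sigma(s):
    """Continuous sigmoidal firing function: sigma(s) = 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def neuron_output(x, w, tau):
    """x_i = sigma(s_i - tau_i), where s_i = sum over j of w_ij * x_j."""
    s = np.dot(w, x)        # weighted sum of the inputs
    return sigma(s - tau)   # fires strongly only above the threshold

x = np.array([0.2, 0.9, 0.4])    # inputs from other neurons
w = np.array([0.5, 1.5, -0.7])   # synaptic weights
print(neuron_output(x, w, tau=0.3))
```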

Page 12:

Firing Functions are Sigmoidal

[Figure: graphs of sigmoidal firing functions, each centered at its threshold $\tau_j$]

Page 13:

Possible Firing Functions

Discrete:
$$\sigma(s_j - \tau_j) = \begin{cases} 0 & \text{if } s_j < \tau_j \\ 1 & \text{if } s_j \ge \tau_j \end{cases}$$

Continuous:
$$\sigma(s_j - \tau_j) = \frac{1}{1 + e^{-(s_j - \tau_j)}}$$

Page 14:

3 Layer Neural Network

Output

Hidden (is usually much larger)

Input

The output layer may consist of a single neuron

Page 15:

ANN as Classifiers

Each neuron acts as a "linear classifier"; competition among neurons via the nonlinear firing function = "local linear classifying"

Method for genes:

Train the network until it can classify between references and samples

Eliminating weights sufficiently close to 0 does not change the local classification scheme

Page 16:

Multilayer Network

[Diagram: inputs $x_1, x_2, x_3, \ldots, x_n$ feed $N$ hidden neurons computing $\sigma(\mathbf{w}_1^t \mathbf{x} - \tau_1), \ldots, \sigma(\mathbf{w}_N^t \mathbf{x} - \tau_N)$, whose outputs are combined as]

$$\mathrm{out} = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^t \mathbf{x} - \tau_j\right)$$
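A minimal NumPy sketch of this forward pass (shapes and values are illustrative):

```python
import numpy as np

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W, tau, alpha):
    """out = sum over j of alpha_j * sigma(w_j . x - tau_j)."""
    hidden = sigma(W @ x - tau)   # one sigmoidal response per hidden neuron
    return alpha @ hidden         # weighted sum at the output neuron

rng = np.random.default_rng(1)
n, N = 5, 8                       # n inputs, N hidden neurons
x = rng.normal(size=n)
W = rng.normal(size=(N, n))       # row j is the weight vector w_j
tau = rng.normal(size=N)
alpha = rng.normal(size=N)
print(forward(x, W, tau, alpha))
```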

Page 17:

How do we select the w's?

Define an energy function
$$E = \frac{1}{2} \sum_{i=1}^{n} \left( t_i - \mathrm{out}_i \right)^2$$
where $\mathrm{out}_i$ is the network's output for the $i$th training example.

The $t$ vectors are the information to be "learned".

Neural networks minimize energy: the "information" in the network is equivalent to the minima of the total squared energy function.

Page 18:

Back Propagation: Minimize the Energy Function

Choose $w_{ij}$ and $\tau_j$ so that
$$\frac{\partial E}{\partial w_{ij}} = 0, \qquad \frac{\partial E}{\partial \tau_j} = 0$$

In practice, this is hard. Back propagation with a continuous sigmoidal:

Feed forward and calculate $E$

Modify the weights and thresholds using a rule of the form
$$w_{ij}^{\,\mathrm{new}} = w_{ij} - \epsilon\,\frac{\partial E}{\partial w_{ij}}, \qquad \tau_j^{\,\mathrm{new}} = \tau_j - \epsilon\,\frac{\partial E}{\partial \tau_j}$$

Repeat until $E$ is sufficiently close to 0
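A minimal gradient-descent sketch for a single sigmoidal neuron, as a stand-in for full back propagation (the learning rate, data, and stopping rule are illustrative):

```python
import numpy as np

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                # 20 examples, 3 inputs
t = (X[:, 0] + X[:, 1] > 0).astype(float)   # targets to be "learned"

w, tau, eps = np.zeros(3), 0.0, 0.5
for _ in range(500):
    out = sigma(X @ w - tau)                # feed forward
    err = out - t
    grad_s = err * out * (1 - out)          # dE/ds via the sigmoid derivative
    w -= eps * X.T @ grad_s                 # w_new = w - eps * dE/dw
    tau += eps * grad_s.sum()               # tau enters the state as -tau
    E = 0.5 * np.sum(err ** 2)

print(E)  # close to 0 for this linearly separable toy problem
```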

Page 19:

ANN as Classifier

1. Remove a percentage of the genes whose synaptic weights are close to 0
2. Create an ANN classifier on the reduced arrays
3. Repeat 1 and 2 until only the genes that most influence the classifier problem remain

The remaining genes are the most important in classifying references versus samples (a sketch of this pruning loop follows below).
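A minimal sketch of the recursive pruning, using the toy single-neuron trainer above as the classifier and $|w_i|$ as the importance measure (all names, sizes, and fractions are illustrative):

```python
import numpy as np

def train_perceptron(X, t, eps=0.5, steps=500):
    """Toy trainer: returns one weight per gene (column of X)."""
    w, tau = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        out = 1.0 / (1.0 + np.exp(-(X @ w - tau)))
        grad_s = (out - t) * out * (1 - out)
        w -= eps * X.T @ grad_s
        tau += eps * grad_s.sum()
    return w

def prune_genes(X, t, keep=5, drop_frac=0.2):
    """Repeatedly train, then drop the genes with weights nearest 0."""
    genes = np.arange(X.shape[1])
    while len(genes) > keep:
        w = train_perceptron(X[:, genes], t)
        n_drop = max(1, int(drop_frac * len(genes)))
        order = np.argsort(np.abs(w))          # smallest |w| first
        genes = np.delete(genes, order[:n_drop])
    return genes                                # most influential genes

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 50))                   # 30 arrays, 50 genes
t = (X[:, 4] - X[:, 7] > 0).astype(float)       # classes driven by genes 4, 7
print(prune_genes(X, t))
```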

Page 20:

Simple Perceptron Model

Input: Gene 1, Gene 2, ..., Gene m, with weights $w_1, w_2, \ldots, w_m$

Output: 1 if from a sample, 0 if from a reference

The $w_i$ can be interpreted as measures of how important the $i$th gene is in determining the output.

Page 21:

Simple Perceptron Model

Features:

The $w_i$ can be used in place of the $M_{j,i}$

Detects genes across $n$ samples and references

Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan, et al., 2004.

Drawbacks:

The perceptron is a linear classifier (i.e., it only classifies linearly separable data)

How to incorporate known genes?

Page 22:

Linearly Separable Data

Separation using Hyperplanes

Page 23:

Data that Cannot be Separated Linearly

Page 24:

Functional Viewpoint

An ANN is a mapping $f: \mathbb{R}^n \to \mathbb{R}$.

Can we train a perceptron so that $f(x_1,\ldots,x_n) = 1$ if the $x$ vector is from a sample and $f(x_1,\ldots,x_n) = 0$ if $x$ is from a reference?

Answer: Yes if the data can be linearly separated, but no otherwise.

So can we design such a mapping with a more general ANN?

Page 25:

Hilbert’s Thirteenth Problem

Original: “Are there continuous functions of 3 variables that are not representable by a superposition of composition of functions of 2 variables?”

Modern: Can any continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?

Page 26:

Kolmogorov’s Theorem

Modified Version: Any continuous function $f$ of $n$ variables can be written as
$$f(s_1, \ldots, s_n) = \sum_{i=1}^{2n+1} h\!\left( \sum_{j=1}^{n} g_{ij}(s_j) \right)$$
where only $h$ depends on $f$.

Page 27:

Cybenko (1989)

Let $\sigma$ be any continuous sigmoidal function, and let $\mathbf{x} = (x_1,\ldots,x_n)$. If $f$ is absolutely integrable over the $n$-dimensional unit cube, then for all $\varepsilon > 0$, there exists a (possibly very large) integer $N$ and vectors $\mathbf{w}_1, \ldots, \mathbf{w}_N$ such that
$$\left| f(\mathbf{x}) - \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^T \mathbf{x} - \tau_j\right) \right| < \varepsilon$$
where $\alpha_1,\ldots,\alpha_N$ and $\tau_1,\ldots,\tau_N$ are fixed parameters.

Page 28:

Recall: Multilayer Network

[Diagram: inputs $x_1, x_2, x_3, \ldots, x_n$ feed $N$ hidden neurons computing $\sigma(\mathbf{w}_1^t \mathbf{x} - \tau_1), \ldots, \sigma(\mathbf{w}_N^t \mathbf{x} - \tau_N)$, whose outputs are combined as]

$$\mathrm{out} = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^t \mathbf{x} - \tau_j\right)$$

Page 29:

ANN as Classifier

Answer (Cybenko): for any $\varepsilon > 0$, the function with $f(x_1,\ldots,x_n) = 1$ if the $x$ vector is from a sample and $f(x_1,\ldots,x_n) = 0$ if $x$ is from a reference can be approximated to within $\varepsilon$ by a multilayer neural network.

But the weights no longer have a one-to-one correspondence with genes.

Page 30:

ANN and Monte Carlo Methods

Monte Carlo methods have been a big success story with ANNs: error estimates come with network predictions, and ANNs are very fast in the forward direction.

Example: ANN + MC implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (de Freitas, J. F. G., et al., 2000).

Page 31:

Recall: Multilayer Network

[Diagram: the same multilayer network, now with the $N$ genes as inputs to an $N$-node hidden layer, computing]

$$\mathrm{out} = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^t \mathbf{x} - \tau_j\right)$$

$N$ genes, $N$-node hidden layer

The hidden nodes correspond to genes, but do not directly depend on a single gene.

Page 32:

Naïve Monte Carlo ANN Method

1. Randomly choose a subset S of genes
2. Train using back propagation
3. Prune based on the values of $\mathbf{w}_j$ (or $\tau_j$, or both)
4. Repeat 2-3 until a small subset of S remains
5. Increase the "count" of the genes in that small subset
6. Repeat 1-5 until each gene has a 95% probability of appearing at least some minimum number of times in a subset
7. The most frequently counted genes are the predicted genes (a sketch follows below)
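A minimal, self-contained sketch of the whole loop, with a toy gradient-descent trainer standing in for full back propagation (subset sizes, trial counts, and the fixed trial budget standing in for the 95% criterion are all illustrative):

```python
import numpy as np

def train_weights(X, t, eps=0.5, steps=300):
    """Toy stand-in for back propagation: one weight per gene."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        out = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= eps * X.T @ ((out - t) * out * (1 - out))
    return w

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 50))                     # 30 arrays, 50 genes
t = (X[:, 4] - X[:, 7] > 0).astype(float)         # hypothetical classes

counts = np.zeros(50)
for _ in range(200):                              # stand-in for the 95% rule
    subset = rng.choice(50, size=10, replace=False)    # step 1
    w = train_weights(X[:, subset], t)                 # step 2
    survivors = subset[np.argsort(-np.abs(w))[:3]]     # steps 3-4 (prune)
    counts[survivors] += 1                             # step 5

print(np.argsort(-counts)[:5])                    # step 7: most frequent genes
```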

Page 33:

Additional Considerations

If a gene is known to be up-regulated or down-regulated for a certain condition, then put it into a subset in step 1 with probability 1.

This is a simple-minded Bayesian method; Bayesian analysis can make it much better.

The algorithm distributes naturally across a multi-processor cluster or machine: choose the subsets first, distribute the subsets to different machines, and tabulate the results from all the machines (a sketch follows below).
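A minimal sketch of that distribution using Python's standard multiprocessing module; the worker here is a hypothetical placeholder for the train-and-prune step sketched earlier:

```python
import numpy as np
from multiprocessing import Pool

def train_and_prune(subset):
    """Hypothetical worker: train an ANN on one gene subset, return survivors."""
    # ... train and prune on these genes (as sketched earlier) ...
    return subset[:3]  # placeholder survivors

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    subsets = [rng.choice(50, size=10, replace=False) for _ in range(200)]

    with Pool() as pool:                 # distribute subsets to workers
        results = pool.map(train_and_prune, subsets)

    counts = np.zeros(50)
    for survivors in results:            # tabulate across all workers
        counts[survivors] += 1
    print(np.argsort(-counts)[:5])
```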

Page 34:

What Next…

Cybenko is not the "final answer": real neurons are much more complicated, and ANNs abstract only a few of their features.

We are only at the beginning of knowing how to separate noise and bias from the classification problem.

Many are now looking at neurons themselves for answers.

Page 35:

Components of a Neuron

Dendrites

Soma

nucleus

Axon

Myelin Sheaths

Synaptic Terminals

Page 36:

Signals Propagate to Soma

Signals decay at the soma if below a certain threshold

Page 37:

Signals May Arrive Close Together

If the threshold is exceeded, then the neuron "fires," sending a signal along its axon.

Page 38:

Signal Propagation along Axon

The signal is electrical: membrane depolarization from a resting potential of -70 mV; myelin acts as an insulator.

Propagation is electro-chemical: sodium channels open at breaks in the myelin, producing rapid depolarization at those breaks, so the signal travels faster than if it were only electrical.

Neurons send "spike trains" from one to another.

Page 39:

Hodgkin-Huxley Model

1963 Nobel Prize in Medicine

Cable equation plus ionic currents ($I_{syn}$)

Can only be solved numerically

Produces action potentials

Ionic channels: $n$ = potassium activation variable, $m$ = sodium activation variable, $h$ = sodium inactivation variable

Page 40:

Hodgkin-Huxley Equations

$$\frac{d}{4 R_i}\frac{\partial^2 V}{\partial x^2} = C_m \frac{\partial V}{\partial t} + g_l (V - V_l) + \bar{g}_K\, n^4 (V - V_K) + \bar{g}_{Na}\, m^3 h\, (V - V_{Na})$$

$$\frac{\partial n}{\partial t} = \alpha_n (1-n) - \beta_n n, \qquad \frac{\partial m}{\partial t} = \alpha_m (1-m) - \beta_m m, \qquad \frac{\partial h}{\partial t} = \alpha_h (1-h) - \beta_h h$$

where any $V$ with a subscript is constant, any $g$ with a bar is constant, and each of the $\alpha$'s and $\beta$'s is of similar form:

$$\alpha_n = \frac{10 - V}{100\left(e^{(10-V)/10} - 1\right)}, \qquad \beta_n = \frac{1}{8}\, e^{-V/80}$$
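Since the equations can only be solved numerically, here is a minimal space-clamped sketch (no cable term) using forward Euler and the classic parameter values; the step size and injected current are illustrative:

```python
import numpy as np

# Classic space-clamped Hodgkin-Huxley constants (mV, mS/cm^2, uF/cm^2),
# with voltages measured relative to rest = 0 mV.
C_m, g_l, g_K, g_Na = 1.0, 0.3, 36.0, 120.0
V_l, V_K, V_Na = 10.6, -12.0, 115.0

def a_n(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def b_n(V): return 0.125 * np.exp(-V / 80)
def a_m(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def b_m(V): return 4.0 * np.exp(-V / 18)
def a_h(V): return 0.07 * np.exp(-V / 20)
def b_h(V): return 1.0 / (np.exp((30 - V) / 10) + 1)

V, n, m, h, dt = 0.0, 0.32, 0.05, 0.6, 0.01
for step in range(5000):                       # 50 ms of simulated time
    I_app = 10.0 if step * dt < 1.0 else 0.0   # brief current injection
    I_ion = g_l*(V - V_l) + g_K*n**4*(V - V_K) + g_Na*m**3*h*(V - V_Na)
    V += dt * (I_app - I_ion) / C_m
    n += dt * (a_n(V)*(1 - n) - b_n(V)*n)
    m += dt * (a_m(V)*(1 - m) - b_m(V)*m)
    h += dt * (a_h(V)*(1 - h) - b_h(V)*h)

print(V)  # after the action potential, V relaxes back toward rest
```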

Page 41:

Hodgkin-Huxley nearly intractable

So researchers began developing artificial models to better understand what neurons are all about

Page 42:

A New Approach

Poznanski (2001): Synaptic effects are isolated into hot spots

[Diagram: neuron with soma and a synapse isolated as a "hot spot"]

Page 43:

Tapered Equivalent Cylinder

Rall’s theorem (modified for taper) allows us to collapse to an equivalent cylinder

[Diagram: dendritic tree collapsed into an equivalent cylinder attached to the soma]

Page 44:

Tapered Equivalent Cylinder

Assume “hot spots” at x0, x1, …, xm

[Diagram: equivalent cylinder from $0$ to $l$, with hot spots at $x_0, x_1, \ldots, x_m$ and the soma at one end]

Page 45:

Ion Channel Hot Spots

$I_j$ is the ionic current at the $j$th hot spot:
$$\frac{d}{4 R_i}\frac{\partial^2 V}{\partial x^2} = C_m \frac{\partial V}{\partial t} + \frac{V}{R_m} + \sum_{j} I_j(t)\,\delta(x - x_j)$$

The Green's function $G(x, x_j, t)$ is the solution to the hot-spot equation with $I_j$ as a point source and the others = 0 (plus boundary conditions).

Page 46:

Convolution Theorem

The solution to the original equation is of the form
$$V(x,t) = V_{initial} + \sum_{j=0}^{n} \int_0^t G(x, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

The voltage at the soma is
$$V(0,t) = V_{initial} + \sum_{j=0}^{n} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

Page 47:

Ion Channel Currents

At a hot spot, the "voltage" $V$ satisfies an ODE of the form
$$C\frac{\partial V}{\partial t} = -g_l (V - V_l) - \bar{g}_K\, n^4 (V - V_K) - \bar{g}_{Na}\, m^3 h\, (V - V_{Na})$$

Assume that the $\alpha$'s and $\beta$'s are polynomials of large degree

Introduce a new family of functions $U_{p,q,r} = n^p m^q h^r$

"Embed" the original equation into a system of ODEs for the $U_{p,q,r}$

Page 48:

Linear Embedding: Simple Example

To embed
$$\frac{dV}{dt} = A_0 + A_1 V + \cdots + A_n V^n$$
let $U_j = V^j$. Then
$$\frac{dU_j}{dt} = jV^{j-1}\frac{dV}{dt} = jA_0\, V^{j-1} + jA_1\, V^{j} + \cdots + jA_n\, V^{j+n-1}$$

Page 49:

Linear Embedding: Simple Example

The result is
$$\frac{dU_j}{dt} = jA_0\, U_{j-1} + jA_1\, U_j + \cdots + jA_n\, U_{j+n-1}$$
an infinite-dimensional linear system, which is often as unmanageable as the original nonlinear equation.

However, linear embeddings do often produce good numerical approximations.

Moreover, linear embedding implies that each $I_j$ is given by a linear transformation of the vector of $U$'s.
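A minimal sketch of a truncated linear embedding for the logistic-type case $dV/dt = A_0 + A_1 V + A_2 V^2$, compared against direct integration (the truncation order, coefficients, and step size are illustrative):

```python
import numpy as np

A = [0.0, 1.0, -1.0]          # dV/dt = A0 + A1*V + A2*V^2 (logistic-like)
J = 20                        # truncation order of the embedding

# Build the truncated linear system dU/dt = L @ U for U_j = V^j, j = 0..J:
# dU_j/dt = j*A0*U_{j-1} + j*A1*U_j + j*A2*U_{j+1}   (U_{J+1} dropped).
L = np.zeros((J + 1, J + 1))
for j in range(1, J + 1):
    L[j, j - 1] = j * A[0]
    L[j, j]     = j * A[1]
    if j + 1 <= J:
        L[j, j + 1] = j * A[2]

V0, dt, steps = 0.1, 0.001, 2000
U = V0 ** np.arange(J + 1)    # U_j(0) = V0^j
V = V0
for _ in range(steps):
    U = U + dt * (L @ U)                        # embedded linear Euler step
    V = V + dt * (A[0] + A[1]*V + A[2]*V**2)    # direct nonlinear Euler step

print(U[1], V)   # U_1 approximates V; the two should agree closely
```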

Page 50:

The Hot-Spot Model “Qualitatively”

$$V(0,t) = \sum_{j=0}^{n} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

The sum of convolutions of weighted sums of functions of one variable: this is the shape of Kolmogorov's Theorem (given that convolutions are related to composition).

Page 51:

Any Questions?

Page 52:

References

Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals, and Systems, 2(4), 1989, pp. 303-314.

de Freitas, J. F. G., et al. Sequential Monte Carlo Methods to Train Neural Network Models. Neural Computation, 12(4), 1 April 2000, pp. 955-993.

Glenn, L. and J. Knisley. Solutions for Transients in Arbitrarily Branching and Tapering Cables. In Modeling in the Neurosciences: From Biological Systems to Neuromimetic Robotics, ed. Lindsay, R., R. Poznanski, G. N. Reeke, J. R. Rosenberg, and O. Sporns. CRC Press, London, 2004.

Narayanan, A., et al. Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data. Neurocomputing, 2004.