Intro to comp genomics, Lecture 3-4: Examples, Approximate Inference

Page 1:

Intro to comp genomics

Lecture 3-4: Examples, Approximate Inference

Page 2:

Example 1: Mixtures of Gaussians

Essentially one behavior:    P(x | θ) = N(x; μ, σ)

A mixture of behaviors:    P(x | θ) = Σ_i p_i · N(x; μ_i, σ_i)

We have experimental results of some value. We want to describe the behavior of the experimental values: essentially one behavior? Two behaviors? More? In one dimension it may look very easy: just looking at the distribution will give us a good idea.

We can formulate the model probabilistically as a mixture of normal distributions.

As a generative model: to generate data from the model, we first select the sub-model by sampling from the mixture variable. We then generate a value using the selected normal distribution.
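A minimal sketch of this two-step generative process (NumPy; the parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.2, 0.8])        # mixture coefficients (illustrative)
    mu = np.array([0.0, 1.0])       # component means
    sigma = np.array([1.0, 0.2])    # component standard deviations

    def sample_mixture(n):
        # Step 1: select the sub-model by sampling the mixture variable s
        s = rng.choice(len(p), size=n, p=p)
        # Step 2: generate a value from the selected normal distribution
        return rng.normal(mu[s], sigma[s]), s

    x, s = sample_mixture(1000)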

If the data is multi-dimensional, the problem becomes non-trivial.

Page 3:

Inference is trivial

Let's represent the model as:

P(x | θ) = Σ_i p_i · N(x; μ_i, σ_i)

What is the inference problem in our model?

P(s | x) = Pr(s) · Pr(x | s) / Σ_{s'} Pr(s') · Pr(x | s') = p_s · N(x; μ_s, σ_s) / Σ_{s'} p_{s'} · N(x; μ_{s'}, σ_{s'})

Pr(x) = Σ_s Pr(s) · Pr(x | s)

Inference: computing the posterior probability of a hidden variable given the data and the model parameters.

For p_0 = 0.2, p_1 = 0.8, μ_0 = 0, μ_1 = 1, σ_0 = 1, σ_1 = 0.2, what is Pr(s = 0 | x = 0.8)?
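A quick check of this posterior with SciPy's normal density, plugging in the parameters above:

    from scipy.stats import norm

    p = [0.2, 0.8]; mu = [0.0, 1.0]; sigma = [1.0, 0.2]
    x = 0.8
    lik = [p[s] * norm.pdf(x, mu[s], sigma[s]) for s in range(2)]
    posterior0 = lik[0] / sum(lik)   # Pr(s=0 | x=0.8) by Bayes' rule
    print(posterior0)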

Page 4:

Estimation/parameter learning

L(θ | x_1, .., x_n) = Π_i Pr(x_i | θ) = Π_i Σ_j p_j · N(x_i; μ_j, σ_j)

Given data, how can we estimate the model parameters?

Transform it into an optimization problem!

Likelihood: a function of the parameters, defined given the data.

Find parameters that maximize the likelihood: the ML problem

This can be approached heuristically, using any optimization technique, but it is a non-linear problem which may be very difficult.

Generic optimization techniques:

  Gradient ascent
  Simulated annealing
  Genetic algorithms
  And more...

Find:    θ^ = argmax_θ L(θ | x_1, .., x_n)
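As a sketch of the generic (non-EM) approach, one can hand the negative log-likelihood to an off-the-shelf optimizer; using scipy.optimize.minimize with Nelder-Mead, and the unconstrained reparameterization below, are this sketch's own choices:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def neg_log_lik(theta, x):
        # theta = (logit of p0, mu0, mu1, log sigma0, log sigma1), reparameterized
        # so the optimizer can search an unconstrained space
        p0 = 1.0 / (1.0 + np.exp(-theta[0]))
        mu = theta[1:3]
        sigma = np.exp(theta[3:5])
        lik = p0 * norm.pdf(x, mu[0], sigma[0]) + (1 - p0) * norm.pdf(x, mu[1], sigma[1])
        return -np.sum(np.log(lik))

    x = np.random.default_rng(0).normal([0.0] * 200 + [1.0] * 800, [1.0] * 200 + [0.2] * 800)
    res = minimize(neg_log_lik, x0=[0.0, -1.0, 2.0, 0.0, 0.0], args=(x,), method="Nelder-Mead")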

Page 5:

The EM algorithm for mixtures – inference allows for learning

P(x | θ) = Σ_i p_i · N(x; μ_i, σ_i)

We start by guessing parameters θ^0.

We now go over the samples and compute their posteriors (i.e., inference):

P(s | x_i, θ^0) = p_s^0 · N(x_i; μ_s^0, σ_s^0) / Σ_{s'} p_{s'}^0 · N(x_i; μ_{s'}^0, σ_{s'}^0)

We use the posteriors to compute new estimates for the expected sufficient statistics of each distribution, and for the mixture coefficients:

μ_s^1 = E_s[x] = Σ_i P(s | x_i, θ^0) · x_i / Σ_i P(s | x_i, θ^0)

(σ_s^1)^2 = V_s[x] = Σ_i P(s | x_i, θ^0) · (x_i - μ_s^1)^2 / Σ_i P(s | x_i, θ^0)

p_s^1 = (1/N) Σ_i P(s | x_i, θ^0)

Continue iterating until convergence.
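A compact sketch of these EM updates for a two-component 1-D mixture (NumPy; the initialization and stopping rule are illustrative):

    import numpy as np
    from scipy.stats import norm

    def em_mixture(x, p, mu, sigma, tol=1e-8, max_iter=500):
        for _ in range(max_iter):
            # E-step: posteriors P(s | x_i, theta) for every sample (inference)
            lik = p[None, :] * norm.pdf(x[:, None], mu[None, :], sigma[None, :])
            post = lik / lik.sum(axis=1, keepdims=True)
            # M-step: expected sufficient statistics -> new parameter estimates
            w = post.sum(axis=0)
            new_mu = (post * x[:, None]).sum(axis=0) / w
            new_sigma = np.sqrt((post * (x[:, None] - new_mu) ** 2).sum(axis=0) / w)
            new_p = w / len(x)
            if np.allclose(new_mu, mu, atol=tol):
                break
            p, mu, sigma = new_p, new_mu, new_sigma
        return p, mu, sigma

    x = np.random.default_rng(0).normal([0.0] * 200 + [1.0] * 800, [1.0] * 200 + [0.2] * 800)
    p, mu, sigma = em_mixture(x, np.array([0.5, 0.5]), np.array([-1.0, 2.0]), np.array([1.0, 1.0]))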

The EM theorem: the algorithm will converge, and will improve the likelihood monotonically.

But:

No guarantee of finding the optimum, or of finding anything meaningful.

The initial conditions are critical:

Think of starting from μ_0 = 0, μ_1 = 10, σ_{0,1} = 1.

Solutions: start from "reasonable" solutions; try many starting points.


Page 6:

Example 2: Mixture of sequence models

• A probabilistic model for binding sites:

  P(m) = Π_{i=1..k} P_i(m[i])

• This is the site-independent model, defining a probability space over k-mers.

• Assume a set of sequences contains unknown binding sites (one for each). The position of the binding site is a hidden variable l.

• We introduce a background model P_back that describes the sequence outside of the binding site (usually a d-order Markov model):

  P_back(s) = Π_{i=1..|s|} P_back(s[i] | s[i-d .. i-1])

• Given complete data we can write down the likelihood of a sequence s as:

  P(s, l | θ) = Pr(l) · P_back(s) · Π_{i=1..k} P_i(s[l+i]) / P_back(s[l+i] | s[l+i-d .. l+i-1])
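A sketch of scoring candidate site positions under this model as PWM-vs-background log-odds (the PWM values and the 0-order background used here are placeholders):

    import numpy as np

    ALPHABET = {"A": 0, "C": 1, "G": 2, "T": 3}
    pwm = np.array([[0.8, 0.1, 0.05, 0.05],   # P_i(m[i]) for each site position i
                    [0.1, 0.7, 0.1, 0.1],
                    [0.05, 0.05, 0.8, 0.1]])
    background = np.array([0.25, 0.25, 0.25, 0.25])  # 0-order background (placeholder)

    def site_log_odds(seq):
        # log[ prod_i P_i(s[l+i]) / P_back(s[l+i]) ] for every start position l
        k = len(pwm)
        idx = np.array([ALPHABET[c] for c in seq])
        return [sum(np.log(pwm[i, idx[l + i]] / background[idx[l + i]]) for i in range(k))
                for l in range(len(seq) - k + 1)]

    scores = site_log_odds("ACGTACGGA")
    best_l = int(np.argmax(scores))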

Page 7:

• Inference of the binding site location posterior:

  P(l | s, θ) = P(s, l | θ) / Σ_i P(s, i | θ)

• Note that only k factors need to be computed for each location (P_back(s) is constant).

One hidden variable = trivial inference.

• If we assume some of the sequences may lack a binding site, this should be incorporated into the model:

  P(s, l | θ) = Pr(hit) · Pr(l) · P_back(s) · Π_{i=1..k} P_i(s[l+i]) / P_back(s[l+i] | s[l+i-d .. l+i-1])

for sequences with a site (a "hit"), with a complementary (1 - Pr(hit)) · P_back(s) term for sequences without one.

• This is sometimes called the ZOOPS model (zero-or-one occurrence per sequence).

Page 8:

Hidden Markov Models

We observe only emissions of states into some probability space E. Each state is equipped with an emission distribution Pr(e | x) (x a state, e an emission):

Pr(s, e) = Π_i Pr(s_i | s_{i-1}) · Pr(e_i | s_i)

[Figure: the state space graph and its emission space. Caution! This is NOT the HMM Bayes net: 1. it has cycles; 2. the states are NOT random variables!]

Page 9:

Example 3: Mixture with “memory”

P(x | θ) = Σ_h P(x, h | θ) = Σ_h Pr(h_1) · P(x_1 | h_1) · Π_{i>1} Pr(h_i | h_{i-1}) · P(x_i | h_i)

We sample a sequence of dependent values. At each step, we decide whether to continue sampling from the same distribution or to switch, with probability p.

We can compute the probability directly only given the hidden variables.

P(x) is derived by summing over all possible combinations of hidden variables. This is another form of the inference problem (why?).

There is an exponential number of h assignments; can we still solve the problem efficiently?

[Figure: two states A and B with transition probabilities P(B | A), P(A | B) and emission distributions P(x | A), P(x | B)]

Page 10:

Inference in HMM

Forward formula:

f_s^0 = 1 if s = start, 0 otherwise
f_s^i = Pr(e_i | s) · Σ_{s'} f_{s'}^{i-1} · Pr(s | s')

Backward formula:

b_s^N = Pr(finish | s)
b_s^i = Σ_{s'} Pr(s' | s) · Pr(e_{i+1} | s') · b_{s'}^{i+1}

The total likelihood can be computed from either end:

L = Σ_s f_s^N · Pr(finish | s)
L = Σ_s Pr(s | start) · Pr(e_1 | s) · b_s^1

[Figure: two copies of the Start -> states -> Finish trellis with emissions, one annotated with f_s^i and one with b_s^i]

Page 11:

Computing posteriors:

The posterior probability for a transition from s' to s after character i:

Pr(s_i = s', s_{i+1} = s | e) = f_{s'}^i · Pr(s | s') · Pr(e_{i+1} | s) · b_s^{i+1} / L

The posterior probability for emitting the i'th character from state s:

Pr(s_i = s | e) = f_s^i · b_s^i / L
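A sketch of the forward, backward, and posterior computations for a discrete HMM in matrix form (NumPy). Indexing conventions differ between texts; folding Start/Finish into an initial and a final vector is this sketch's own choice:

    import numpy as np

    def forward_backward(T, E, init, final, obs):
        # T[s', s] = Pr(s | s'), E[s, e] = Pr(e | s),
        # init[s] = Pr(s | start), final[s] = Pr(finish | s)
        n, S = len(obs), len(init)
        f = np.zeros((n, S)); b = np.zeros((n, S))
        f[0] = init * E[:, obs[0]]
        for i in range(1, n):
            f[i] = E[:, obs[i]] * (f[i - 1] @ T)
        b[-1] = final
        for i in range(n - 2, -1, -1):
            b[i] = T @ (E[:, obs[i + 1]] * b[i + 1])
        L = f[-1] @ final                     # total likelihood
        state_post = f * b / L                # Pr(s_i = s | e)
        return f, b, L, state_post

    # Tiny two-state example with made-up parameters
    T = np.array([[0.9, 0.1], [0.2, 0.8]])
    E = np.array([[0.7, 0.3], [0.1, 0.9]])
    f, b, L, post = forward_backward(T, E, np.array([0.5, 0.5]), np.array([1.0, 1.0]), [0, 1, 1, 0])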

Page 12:

Example 4: Hidden states

Example: two Markov models describe our data, and switching between the models occurs at random. How do we model this?

[Figure: a model with a silent (no emission) hidden state mediating the switches between the two chains]

Page 13:

Example 5: Profile HMM for Protein or DNA motifs

[Figure: a profile HMM: a chain of Match (M), Insert (I), and Delete (D) state triplets between Start (S) and Finish (F)]

• M (Match) states emit a certain amino-acid/nucleotide profile.
• I (Insert) states emit some background profile.
• D (Delete) states are silent (hidden).

• Use the model for classification or annotation. (Both the emission and the transition probabilities are informative!)

• We can use EM to train the parameters from a set of examples. (How do we determine the right size of the model?) (Google PFAM, Prosite, "HMM profile protein domain".)

Page 14:

Example 6: N-order Markov model

• In most biological sequences, the (first-order) Markov property is a big problem.

• N-order relations can be modeled naturally: condition each character on the N preceding ones.

Common error:

Forward/backward in an N-order HMM: can dynamic programming still work?
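A sketch of fitting an N-order model by counting context -> next-character frequencies (the add-one smoothing is this sketch's own assumption):

    from collections import Counter, defaultdict

    def fit_markov(seq, order=2, alphabet="ACGT"):
        counts = defaultdict(Counter)
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
        # Add-one smoothing so every character has nonzero probability
        # in each observed context
        model = {}
        for ctx, c in counts.items():
            total = sum(c.values()) + len(alphabet)
            model[ctx] = {a: (c[a] + 1) / total for a in alphabet}
        return model

    model = fit_markov("ACGTACGTGGGACGTT", order=2)
    print(model["AC"])   # Pr(next character | previous two characters = "AC")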

Page 15:

[Figure: Bayes-net templates. 1-HMM Bayes net: a Start -> states -> Finish chain, each state variable emitting one observation. 2-HMM Bayes net: the same chain with each state depending on the two preceding states.]

Page 16:

Example 7: Pair-HMM

Given two sequences s1,s2, an alignment is defined by a set of ‘gaps’ (or indels) in each of the sequences.

ACGCGAACCGAATGCCCAA---GGAAAACGTTTGAATTTATA
ACCCGT-----ATGCCCAACGGGGAAAACGTTTGAACTTATA

(the runs of '-' are the indels)

A standard dynamic programming algorithm computes the best alignment given such a distance metric.

Standard distance metric:

d(l_1, l_2, s_1, s_2) = Σ_i δ(s_1[i], s_2[i]) + α · #(gaps)

d(s_1, s_2) = min_{l_1, l_2} d(l_1, l_2, s_1, s_2)

Affine gap cost: α + β·l    Substitution matrix: δ(a, b) ~ log(Pr(a, b))
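A minimal Needleman-Wunsch sketch with a linear (non-affine) gap cost; the match/mismatch scores and gap penalty are illustrative, and affine gaps would need two extra matrices:

    import numpy as np

    def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-2):
        n, m = len(s1), len(s2)
        D = np.zeros((n + 1, m + 1))
        D[:, 0] = gap * np.arange(n + 1)   # leading gaps in s2
        D[0, :] = gap * np.arange(m + 1)   # leading gaps in s1
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = match if s1[i - 1] == s2[j - 1] else mismatch
                D[i, j] = max(D[i - 1, j - 1] + sub,   # substitution
                              D[i - 1, j] + gap,       # gap in s2
                              D[i, j - 1] + gap)       # gap in s1
        return D[n, m]   # best global alignment score

    print(needleman_wunsch("ACGCGAACC", "ACCCGTACC"))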

Page 17:

Pair-HMM

Generalize the HMM concept to probabilistically model alignments.

Problem: we are observing two sequences, not a priori related. What will be emitted from our HMM?

[Figure: pair-HMM with states S (start), M (match), G1 and G2 (gap in each sequence), and F (finish)]

Match states emit an aligned nucleotide pair. Gap states emit a nucleotide from one of the sequences only. Pr(M -> G_i) is the "gap open cost"; Pr(G1 -> G1) is the "gap extension cost".

Is it a BN template? What is the forward-backward formula?

Page 18:

Example 8: The simple tree model

[Figure: a tree with ancestral nodes H1, H2 and extant leaves S1, S2, S3; S1 and S2 hang below H1, and H1 and S3 below the root H2]

Sequences of extant and ancestral species are random variables, with Val(X) = {A,C,G,T}

Extant species S_j, j = 1..n; ancestral species H_j, j = 1..(n-1).

Tree T: parent relations pa(S_i), pa(H_i). (In the triplet: pa(S_1) = H_1, pa(S_3) = H_2; the root is H_2.)

For multiple loci we can assume independence and use the same parameters (today):

),Pr(),Pr( jjj hshs

ii paxxiii Qtxx ,)exp()pa|Pr(

)pa|Pr()Pr(),Pr( !ji

jirootiroot

jj xxhhs )|Pr()|Pr()|Pr(

)|Pr()Pr()Pr(

111223

212

hshshs

hhhs

In the triplet:

The model is defined using conditional probability distributions and the root “prior” probability distribution

The model parameters can be the conditional probability distribution tables (CPDs)

Or we can have a single rate matrix Q and branch lengths:

96.001.002.001.0

01.096.001.002.0

02.001.096.001.0

01.002.001.096.0

)|Pr( yx

Page 19:

Ancestral inference

We assume the model (structure, parameters) is given, and denote it by θ:

Pr(s, h | θ) = Pr(h_root) · Π_{i ≠ root} Pr(x_i | pa(x_i))      (Easy!)

The total probability of the data s:

Pr(s | θ) = Σ_h P(h, s | θ)      (Exponential?)

This is also called the likelihood L(θ). Computing Pr(s | θ) is the inference problem.

Given the total probability it is easy to compute the posterior of h given the data:

Pr(h | s, θ) = P(h, s | θ) / Pr(s | θ)

and the posterior of h_i given the data (marginalization over all other hidden variables):

Pr(h_i = x | s, θ) = Σ_{h: h_i = x} P(h, s | θ) / Pr(s | θ)

Page 20:

Example:

[Figure: the triplet tree with unobserved ('?') internal nodes and observed leaves A, C, A]

Given partial observations s, marginalize over the hidden nodes:

Pr(h_i = x | s) = Σ_{h: h_i = x} P(h, s) / Pr(s)

The total probability of the data:    Pr((A, C, A))

The posterior of an ancestor:    Pr(h_1 = A | (A, C, A))

with a uniform prior at the root and the conditional probability table:

Pr(x | y) =
  0.96 0.01 0.02 0.01
  0.01 0.96 0.01 0.02
  0.02 0.01 0.96 0.01
  0.01 0.02 0.01 0.96
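With only two hidden nodes, these sums can be done by brute force; a sketch using the uniform prior and the table above:

    import numpy as np
    from itertools import product

    P = np.array([[0.96, 0.01, 0.02, 0.01],
                  [0.01, 0.96, 0.01, 0.02],
                  [0.02, 0.01, 0.96, 0.01],
                  [0.01, 0.02, 0.01, 0.96]])   # Pr(child = x | parent = y), rows = y
    root_prior = np.full(4, 0.25)
    A, C, G, T = 0, 1, 2, 3
    s1, s2, s3 = A, C, A                        # observed leaves

    # Pr(s, h) = Pr(h2) Pr(h1|h2) Pr(s3|h2) Pr(s2|h1) Pr(s1|h1)
    def joint(h1, h2):
        return root_prior[h2] * P[h2, h1] * P[h2, s3] * P[h1, s2] * P[h1, s1]

    total = sum(joint(h1, h2) for h1, h2 in product(range(4), repeat=2))  # Pr(s)
    post_h1_A = sum(joint(A, h2) for h2 in range(4)) / total              # Pr(h1 = A | s)
    print(total, post_h1_A)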

Page 21:

Algorithm (following Felsenstein 1981): dynamic programming to compute the total probability.

Up(i):
    if (extant) { up[i][a] = (a == S_i ? 1 : 0); return }
    up(r(i)); up(l(i))
    iterate on a:
        up[i][a] = Σ_{b,c} Pr(X_{l(i)} = b | X_i = a) · up[l(i)][b] · Pr(X_{r(i)} = c | X_i = a) · up[r(i)][c]

Down(i):
    iterate on a:
        down[i][a] = Σ_{b,c} Pr(X_{sib(i)} = b | X_{par(i)} = c) · up[sib(i)][b] · Pr(X_i = a | X_{par(i)} = c) · down[par(i)][c]
    down(r(i)); down(l(i))

Algorithm:
    up(root)
    L = 0
    foreach a {
        L += Pr(root = a) · up[root][a]
        down[root][a] = Pr(root = a)
    }
    LL = log(L)
    down(r(root)); down(l(root))

[Figure: the triplet tree with '?' internal nodes and leaves S1, S2, S3, annotated with up[4] and up[5]]

Page 22:


Computing marginals and posteriors with the same up/down tables:

P(h_i = c | s) = up[i][c] · down[i][c] / Σ_j up[i][j] · down[i][j]

[Figure: the triplet tree annotated with up[3], down[4], down[5]]
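A sketch of the up-down recursion on the triplet tree of the example (NumPy; the node names and data structures are this sketch's own):

    import numpy as np

    P = np.array([[0.96, 0.01, 0.02, 0.01],
                  [0.01, 0.96, 0.01, 0.02],
                  [0.02, 0.01, 0.96, 0.01],
                  [0.01, 0.02, 0.01, 0.96]])   # Pr(child | parent), rows = parent
    prior = np.full(4, 0.25)

    # Tree: h2 is the root with children h1 and s3; h1 has children s1 and s2.
    A, C = 0, 1
    leaf = {"s1": A, "s2": C, "s3": A}

    def up_leaf(obs):
        v = np.zeros(4); v[obs] = 1.0
        return v

    # Up pass: up[i][a] = product over children ch of sum_b Pr(b | a) up[ch][b]
    up = {k: up_leaf(v) for k, v in leaf.items()}
    up["h1"] = (P @ up["s1"]) * (P @ up["s2"])
    up["h2"] = (P @ up["h1"]) * (P @ up["s3"])

    L = prior @ up["h2"]                       # total probability Pr(s)

    # Down pass: down[i][a] = sum_c Pr(a | c) down[par][c] * (sibling's up message)
    down = {"h2": prior}
    down["h1"] = P.T @ (down["h2"] * (P @ up["s3"]))

    post_h1 = up["h1"] * down["h1"] / L        # Pr(h1 = a | s)
    print(L, post_h1)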

Page 23:

Simple Tree: Inference as message passing

[Figure: the tree as a message-passing network over the observed DATA. Each subtree passes up the message "you are P(H | our data)"; after combining the messages from all directions, a node can declare "I am P(H | all data)".]

Page 24:

Transition posteriors: not independent!

[Figure: a four-leaf tree with observed leaves A, C under one internal node and A, C under the other, using the same CPD Pr(x | y) as above (0.96 on the diagonal)]

Down message at the root: (0.25, 0.25, 0.25, 0.25) (the uniform prior).

Up message from each internal node (its leaves are A and C): (0.96·0.01, 0.01·0.96, 0.02·0.01, 0.01·0.02).

Each edge's transition posterior is dominated by A and C, but the posteriors of the two ancestors are not independent: their joint posterior is not the product of the marginals.

Page 25:

Understanding the tree model (and BNs): reversing edges

The joint probability of the simple tree model:

Pr(h, x) = Pr(h_2) · Pr(h_1 | h_2) · Pr(x_3 | h_2) · Pr(x_2 | h_1) · Pr(x_1 | h_1)

Can we change the position of the root and keep the joint probability as is?

Pr(h, x) = Pr'(h_1) · Pr'(h_2 | h_1) · Pr'(x_3 | h_2) · Pr'(x_2 | h_1) · Pr'(x_1 | h_1)

We need:    Pr(h_2) · Pr(h_1 | h_2) = Pr'(h_1) · Pr'(h_2 | h_1)

Pr'(h_1) = Σ_{h_2} Pr(h_2) · Pr(h_1 | h_2) = Pr(h_1)

Pr'(h_2 | h_1) = Pr(h_2) · Pr(h_1 | h_2) / Pr'(h_1)    (Bayes)

Page 26:

Inference can become difficult

Pr(s, h | T, θ) = Pr(h_root) · Π_{i ≠ root} Pr(x_i | pa(x_i))

We want to perform inference in an extended tree model expressing context effects:

[Figure: a 3x3 grid of variables (1-9) with both tree edges and context edges between neighboring loci]

With undirected cycles, the model is well defined but inference becomes hard.

We want to perform inference on the tree structure itself!

Each structure imposes a probability on the observed data, so we can perform inference on the space of all possible tree structures, or tree structures + branch lengths:

Pr(T | D) = Pr(D | T) · Pr(T) / Σ_{T'} Pr(D | T') · Pr(T')

[Figure: three alternative tree topologies over the same leaves]

What makes these examples difficult?

Page 27:

Factor graphs

Defining the joint probability for a set of random variables given:

1) any set of node subsets (a hypergraph):    V, A = {a | a ⊆ V}

2) functions on the node subsets (potentials):    φ_a(x_a)

Joint distribution:    Pr(x) = (1/Z) · Π_a φ_a(x_a)

Partition function:    Z = Σ_x Π_a φ_a(x_a)

[Figure: a bipartite graph of factor nodes and random-variable (R.V.) nodes]

If the potentials are conditional probabilities, what will Z be? Not necessarily 1! (Can you think of an example?)

Things are difficult when there are several modes.

Page 28:

More definitions

The model:

log(Pr(x)) = Σ_a log φ_a(x_a) - log Z

Potentials can be defined on discrete, real-valued, etc. variables. It is also common to define general log-linear models directly:

Pr(x) = (1/Z) · exp(Σ_a w_a · log φ_a(x_a))

Inference:

Pr(D | θ) = (1/Z) · Σ_{x consistent with D} exp(Σ_a w_a · log φ_a(x_a))

Pr(x_i | D, θ) = (1/Z) · Σ_{x consistent with D and x_i} exp(Σ_a w_a · log φ_a(x_a)) / Pr(D | θ)

Learning: find the factors' parameterization:    argmax_θ Pr(D | θ)

Page 29:

Belief propagation in a factor graph

P(x | θ) = (1/Z) · Π_a φ_a(x_a)

• Remember, a factor graph is defined given a set of random variables (indices i, j, k, ...) and a set of factors on groups of variables (indices a, b, ...).

• x_a refers to an assignment of values to the inputs of the factor a.

• Z is the partition function (which is hard to compute).

• The BP algorithm is constructed by computing and updating messages:

• Messages from factors to variables: m_{a→i}(x_i)

• Messages from variables to factors: m_{i→a}(x_i)

(each message maps any value attainable by x_i to a real value)

• Think of messages as transmitting beliefs:

a→i: "given my other input variables, and ignoring your message, you are x"

i→a: "given my other input factors and my potential, and ignoring your message, you are x"

Page 30:

Messages update rules:

Messages from variables to factors:

m_{i→a}(x_i) = Π_{c ∈ N(i)\a} m_{c→i}(x_i)

Messages from factors to variables:

m_{a→i}(x_i) = Σ_{x_a \ x_i} φ_a(x_a) · Π_{j ∈ N(a)\i} m_{j→a}(x_j)

(N(i) is the set of factors touching variable i; N(a) is the set of variables entering factor a.)

Page 31:

The algorithm proceeds by updating messages.

• Define the beliefs as approximating the single-variable posteriors (p(h_i | s)):

  b_i(x_i) ∝ Π_{a ∈ N(i)} m_{a→i}(x_i)

Algorithm:

  Initialize all messages to uniform.
  Iterate until no message changes:
      Update factor-to-variable messages.
      Update variable-to-factor messages.

• Why is this different from the mean-field algorithm?    q(h) = Π_i q(h_i)

Page 32:

Beliefs on factor inputs

This is far from mean field, since, for example, the factor beliefs couple their input variables:

b_a(x_a) ∝ φ_a(x_a) · Π_{j ∈ N(a)} m_{j→a}(x_j) = φ_a(x_a) · Π_{j ∈ N(a)} Π_{c ∈ N(j)\a} m_{c→j}(x_j)

The update rules can be viewed as derived from constraints on the beliefs:

1. The requirement on the variable beliefs (b_i):    b_i(x_i) ∝ Π_{a ∈ N(i)} m_{a→i}(x_i)

2. The requirement on the factor beliefs (b_a):    b_a(x_a) ∝ φ_a(x_a) · Π_{j ∈ N(a)} m_{j→a}(x_j)

3. The marginalization requirement:    Σ_{x_a \ x_i} b_a(x_a) = b_i(x_i)

Substituting 1 and 2 into 3 recovers the factor-to-variable update:

m_{a→i}(x_i) = Σ_{x_a \ x_i} φ_a(x_a) · Π_{j ∈ N(a)\i} Π_{c ∈ N(j)\a} m_{c→j}(x_j)
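A sketch of loopy BP with these update rules on a tiny binary factor graph containing one loop (the potentials, the normalization of messages, and the fixed iteration count are this sketch's own choices):

    import numpy as np

    # Factor graph: variables 0..2, pairwise potentials phi(x, y) favoring agreement
    phi = np.array([[2.0, 1.0], [1.0, 2.0]])
    factors = [((0, 1), phi), ((1, 2), phi), ((0, 2), phi)]   # a loop

    m_fi = {(a, i): np.ones(2) for a, (vs, _) in enumerate(factors) for i in vs}
    m_if = {(i, a): np.ones(2) for a, (vs, _) in enumerate(factors) for i in vs}

    for _ in range(50):
        # Variable -> factor: product of messages from the other factors
        for (i, a) in m_if:
            prod = np.ones(2)
            for (b, j), m in m_fi.items():
                if j == i and b != a:
                    prod *= m
            m_if[(i, a)] = prod / prod.sum()          # normalize for stability
        # Factor -> variable: sum out the other variable of the pairwise factor
        for a, (vs, table) in enumerate(factors):
            for k, i in enumerate(vs):
                j = vs[1 - k]
                t = table if k == 0 else table.T      # orient so axis 0 is x_i
                msg = t @ m_if[(j, a)]
                m_fi[(a, i)] = msg / msg.sum()

    beliefs = {}
    for i in range(3):
        b = np.ones(2)
        for (a, j), m in m_fi.items():
            if j == i:
                b *= m
        beliefs[i] = b / b.sum()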

Page 33:

BP on Tree = Up-Down

[Figure: the tree as a factor graph: hidden variables h1, h2, h3 (h3 adjacent to the root factor e), leaves s1, s2 under h1 and s3, s4 under h2, with pairwise factors a (h1-s1), b (h1-s2), c (h1-h3), d (h2-h3)]

The BP messages reproduce the up and down quantities. At h1, the leaf factors send (the equations on this slide reconstruct, roughly, to):

m_{a→h1}(x) = Σ_{s1} φ_a(x, s1) · m_{s1→a}(s1) = Pr(s_1 | h_1 = x)
m_{b→h1}(x) = Σ_{s2} φ_b(x, s2) · m_{s2→b}(s2) = Pr(s_2 | h_1 = x)

so the variable-to-factor message toward the rest of the tree is the up quantity:

m_{h1→c}(x) = m_{a→h1}(x) · m_{b→h1}(x) = Pr(s_1 | h_1 = x) · Pr(s_2 | h_1 = x) = up_{h1}(x)

while the message arriving from the rest of the tree is the down quantity:

down_{h1}(x) = m_{c→h1}(x) = Σ_{h_2, h_3} Pr(h_1 = x | h_3) · Pr(h_2 | h_3) · up(h_2) · down(h_3)

BP on the tree, scheduled from the leaves in and then from the root out, is exactly the up-down algorithm.

Page 34:

Loopy BP is not guaranteed to converge

[Figure: two variables X and Y joined in a loop by two factors, each with the "disagreement" potential φ(x, y) = (0 if x = y, 1 if x ≠ y). The messages around the loop keep alternating between the assignments (1, 1) and (0, 0) and never settle.]

This is not a hypothetical scenario; it frequently happens when there is too much symmetry. For example, most mutational effects are double-stranded and therefore symmetric, which can result in exactly such loops.

Page 35:

Sampling is a natural way to do approximate inference

Marginal probability (integration over the whole space) ≈ marginal probability (integration over a sample).

Page 36:

Sampling from a BN

Naively: if we could draw h, s' according to the distribution Pr(h, s'), then Pr(s) ≈ (# of samples with s) / (# of samples).

Forward sampling: use a topological order on the network. Select a node whose parents are already determined and sample from its conditional distribution (all its parents are already determined!).

Claim: forward sampling is correct:

E_P[ 1{sample = (h, s)} ] = Pr(h, s)

How do we sample from the CPD?

[Figure: the Bayesian network over nodes 1-9]
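A sketch of forward sampling for a tiny chain-shaped BN (the CPDs are illustrative); sampling each CPD reduces to one categorical draw:

    import numpy as np

    rng = np.random.default_rng(1)
    p1 = np.array([0.6, 0.4])                 # Pr(X1)
    p2 = np.array([[0.9, 0.1], [0.3, 0.7]])   # Pr(X2 | X1), rows = parent value
    p3 = np.array([[0.8, 0.2], [0.2, 0.8]])   # Pr(X3 | X2)

    def forward_sample():
        # Topological order: each node's parents are sampled before it
        x1 = rng.choice(2, p=p1)
        x2 = rng.choice(2, p=p2[x1])
        x3 = rng.choice(2, p=p3[x2])
        return x1, x2, x3

    samples = [forward_sample() for _ in range(10000)]
    # Pr(X3 = 1) estimated by the fraction of samples with x3 == 1
    print(np.mean([s[2] for s in samples]))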

Page 37:

Focus on the observations

Naive sampling is terribly inefficient. Why? What is the sampling error?

Why don't we constrain the sampling to fit the evidence s?

This can be done, but then we no longer sample from P(h, s), nor from P(h | s) (why?).

Two tasks: P(s) and P(f(h) | s). How do we approach each/both?

[Figure: the Bayesian network over nodes 1-9, with the evidence nodes highlighted]

Page 38:

Likelihood weighting

Likelihood weighting: weight = 1; use a topological order on the network; select a node whose parents are already determined:
    if the variable was not observed: sample from its conditional distribution;
    else: weight *= P(x_i | pa_{x_i}), and fix the observation.

Store the sample x and its weight w[x].

Pr(h | s) ≈ (total weight of the samples with h) / (total weight)

Weight = the product of Pr(s_i | pa_i) over the observed nodes (in the figure's tree-with-context model, terms of the form Pr(s_j^i | h_j^i, s_j^{i-1})).
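A sketch of likelihood weighting on the same chain-shaped BN, estimating Pr(X1 | X3 = 1):

    import numpy as np

    rng = np.random.default_rng(2)
    p1 = np.array([0.6, 0.4])
    p2 = np.array([[0.9, 0.1], [0.3, 0.7]])
    p3 = np.array([[0.8, 0.2], [0.2, 0.8]])

    def weighted_sample(x3_obs):
        w = 1.0
        x1 = rng.choice(2, p=p1)              # not observed: sample
        x2 = rng.choice(2, p=p2[x1])          # not observed: sample
        w *= p3[x2, x3_obs]                   # observed: fix value, multiply weight
        return x1, w

    draws = [weighted_sample(1) for _ in range(20000)]
    num = sum(w for x1, w in draws if x1 == 1)
    den = sum(w for _, w in draws)
    print(num / den)                          # estimate of Pr(X1 = 1 | X3 = 1)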

Page 39:

Importance sampling

Our estimator from M samples is:

E^_P[f] = (1/M) Σ_{m=1..M} f(h[m])

But it can be difficult or inefficient to sample from P. Assume we sample instead from Q; then:

E_{P(H)}[f(H)] = E_{Q(H)}[ f(H) · P(H) / Q(H) ]

Unnormalized importance sampling: given a sample D = {h[1], .., h[M]} drawn from Q,

E^_D[f] = (1/M) Σ_{m=1..M} f(h[m]) · w(h[m]),    where w(h) = P(h) / Q(h)

Claim:

Var_Q( f(H) · w(H) ) = E_Q[ (f(H) · w(H))^2 ] - ( E_P[f(H)] )^2

Prove it!

To minimize the variance, use a Q distribution proportional to the target function:

Q(H) ∝ |f(H)| · P(H)
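A sketch of unnormalized importance sampling with explicit densities (the target P, proposal Q, and function f are illustrative):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    f = lambda h: h ** 2
    P = norm(0.0, 1.0)      # target distribution
    Q = norm(0.0, 2.0)      # proposal: easy to sample, covers P's support

    h = Q.rvs(size=100000, random_state=rng)
    w = P.pdf(h) / Q.pdf(h)                 # importance weights
    print(np.mean(f(h) * w))                # estimates E_P[h^2] = 1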

Page 40:

Correctness of likelihood weighting: Importance sampling

Unnormalized importance sampling with the likelihood-weighting proposal distribution Q works for any function of the hidden variables:

E^_D[1{h_i}] = (1/M) Σ_{m=1..M} 1{h_i[m]} · w(h[m])

Proposition: the likelihood weighting algorithm is correct (in the sense that it defines an estimator with the correct expected value).

For the likelihood weighting algorithm, the proposal distribution Q is defined by fixing the evidence at the nodes in a set E and ignoring the CPDs of the variables with evidence:

Q(x | D) = Π_{i ∉ E} Pr(x_i | pa_i)

w(h, s) = P(h, s) / Q(h, s) = Π_{i ∈ E} Pr(x_i | pa_i)

We sample from Q just like forward sampling from a Bayesian network that has eliminated all edges going into evidence nodes!

Page 41:

Normalized Importance sampling

When sampling from P(h) = P(h | s) we don't know P, so we cannot compute w = P/Q.

We do know P(h, s) = P(h | s) · P(s), i.e., P(h, s) ∝ P(h | s); define P'(h) = P(h, s).

Normalized importance sampling:

w'(h) = P'(h) / Q(h)

E_{Q(H)}[w'(H)] = Σ_h Q(h) · P'(h) / Q(h) = Σ_h P'(h)    (the normalization constant, here P(s))

E_{P(H)}[f(X)] = E_{Q(H)}[ f(H) · w'(H) ] / E_{Q(H)}[ w'(H) ]

So we use sampling to estimate both terms. Sample D = {h[1], .., h[M]} from Q:

E^_D[f] = [ (1/M) Σ_{m=1..M} f(h[m]) · w'(h[m]) ] / [ (1/M) Σ_{m=1..M} w'(h[m]) ]

Using the likelihood-weighting Q, we can compute posterior probabilities in one pass (no need to sample for P(s) and P(h, s) separately):

P^(h | s) = Σ_m 1{h[m] = h} · w'(h[m]) / Σ_m w'(h[m])

Page 42:

Limitations of forward sampling

[Figure: two networks contrasting where the observed nodes sit relative to the unobserved ones. Likelihood weighting is effective in one configuration (the evidence is reached early and steers the sampling) but not in the other (the evidence sits downstream and the weights must do all the work).]

Page 43:

Symmetric and reversible Markov processes

Definition: we call a Markov process symmetric if its rate matrix is symmetric:

∀ i, j:    Q_ij = Q_ji

What would a symmetric process converge to?

Definition: a reversible Markov process is one for which (times t < s):

Pr(X_s = j | X_t = i) = Pr(X_s = i | X_t = j)

Claim: a Markov process is reversible iff there exist π_i such that:

π_i · q_ij = π_j · q_ji

If this holds, we say the process is in detailed balance, and the π are its stationary distribution.

[Figure: states i and j with rates q_ij and q_ji]

Proof: Bayes' law and the definition of reversibility.

Page 44:

Reversibility

Claim: a Markov process is reversible iff we can write:

q_ij = s_ij · π_j    where S is a symmetric matrix.

[Diagram: running (Q, t) and then (Q, t') equals running (Q, t') and then (Q, t); both equal (Q, t + t')]

Recall the claim: a Markov process is reversible iff there exist π_i with π_i · q_ij = π_j · q_ji (detailed balance). Proof: Bayes' law and the definition of reversibility.

Page 45:

Markov Chain Monte Carlo (MCMC)

We don’t know how to sample from P(h)=P(h|s) (or any complex distribution for that matter)

The idea: think of P(h|s) as the stationary distribution of a Reversible Markov chain

)()|()()|( yPyxxPxy

Find a process with transition probabilities for which:

Then sample a trajectory ,,...,, 21 myyy

)()(1

lim xPxyCn i

n

Theorem: (C a counter)

Process must be irreducible (you can reach from anywhere to anywhere with p>0)

(Start from anywhere!)

Page 46:

The Metropolis(-Hastings) Algorithm

Why reversible? Because detailed balance makes it easy to define the stationary distribution in terms of the transitions.

So how can we find appropriate transition probabilities? We want:

Pr(x → y) · P(x) = Pr(y → x) · P(y)

Define a symmetric proposal distribution:

F(y | x) = F(x | y)

and an acceptance probability:

A(x → y) = min(1, P(y) / P(x))

Detailed balance then holds:

P(x) · Pr(x → y) = P(x) · F(y | x) · min(1, P(y)/P(x))
                 = F(y | x) · min(P(x), P(y))
                 = P(y) · F(x | y) · min(1, P(x)/P(y)) = P(y) · Pr(y → x)

What is the big deal? We have reduced the problem to computing ratios between P(x) and P(y).
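A sketch of Metropolis sampling that uses only the ratio P(y)/P(x) of an unnormalized target (the two-mode target and the Gaussian random-walk proposal, which is symmetric as the derivation requires, are illustrative):

    import numpy as np

    rng = np.random.default_rng(4)

    def p_unnorm(x):
        # Unnormalized target: a two-mode mixture; only ratios are ever needed
        return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

    def metropolis(n, step=1.0):
        x = 0.0
        chain = []
        for _ in range(n):
            y = x + rng.normal(0, step)            # symmetric proposal F(y | x)
            if rng.uniform() < min(1.0, p_unnorm(y) / p_unnorm(x)):
                x = y                              # accept
            chain.append(x)                        # on rejection, x is counted again
        return np.array(chain)

    chain = metropolis(50000)
    print(chain.mean(), chain.std())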

Page 47:

Acceptance ratio for a BN

To sample from Pr(h | s), the acceptance ratio is:

min(1, P(h' | s) / P(h | s)) = min(1, P(h', s) / P(h, s))

For example, if the proposal distribution changes only one variable h_i, what would the ratio be?

min(1, Pr(h_1, .., h_i', .., h_n | s) / Pr(h_1, .., h_i, .., h_n | s))

We affected only the CPDs of h_i and its children.

Definition: the minimal Markov blanket of a node in a BN includes its children, its parents, and its children's parents.

To compute the ratio, we care only about the values of h_i and its Markov blanket.

Page 48:

Gibbs sampling

A very similar algorithm (in fact, a special case of the Metropolis algorithm):

Start from any state h.
do {
    Choose a variable H_i.
    Form h^{t+1} by sampling a new h_i from Pr(h_i | h^t_{-i}, s).
}

This is a reversible process with our target stationary distribution:

P(h | s) · Pr(h → h') = Pr(h_1, .., h_i, .., h_n | s) · Pr(h_i' | h_1, .., h_{i-1}, h_{i+1}, .., h_n, s)
                      = Pr(h_1, .., h_i, .., h_n | s) · Pr(h_1, .., h_i', .., h_n | s) / Pr(h_1, .., h_{i-1}, h_{i+1}, .., h_n | s)

which is symmetric in h_i and h_i', so detailed balance holds.

Gibbs sampling is easy to implement for BNs: the full conditional depends only on the Markov blanket,

Pr(h_i' | h_1, .., h_{i-1}, h_{i+1}, .., h_n, s) ∝ Pr(h_i' | pa_i) · Π_{j: h_i ∈ pa_j} Pr(x_j | pa_j)
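A sketch of Gibbs sampling on the chain-shaped BN used above, with X3 = 1 fixed as evidence; each full conditional is computed by brute-force normalization over the variable's two values:

    import numpy as np

    rng = np.random.default_rng(5)
    p1 = np.array([0.6, 0.4])
    p2 = np.array([[0.9, 0.1], [0.3, 0.7]])   # Pr(X2 | X1)
    p3 = np.array([[0.8, 0.2], [0.2, 0.8]])   # Pr(X3 | X2)
    x3 = 1                                     # evidence

    def joint(x1, x2):
        return p1[x1] * p2[x1, x2] * p3[x2, x3]

    x1, x2 = 0, 0
    counts = np.zeros(2)
    for t in range(50000):
        # Resample x1 from Pr(x1 | x2, x3): only Markov-blanket terms matter
        w = np.array([joint(v, x2) for v in range(2)])
        x1 = rng.choice(2, p=w / w.sum())
        # Resample x2 from Pr(x2 | x1, x3)
        w = np.array([joint(x1, v) for v in range(2)])
        x2 = rng.choice(2, p=w / w.sum())
        if t > 1000:                           # discard burn-in
            counts[x1] += 1

    print(counts / counts.sum())               # estimate of Pr(X1 | X3 = 1)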

Page 49:

Sampling in practice

lim_{n→∞} (1/n) · C_n(y_i = x) = P(x)

How much time until convergence to P? (The burn-in time.)

Consecutive samples are still correlated! Should we sample only every n steps? (Mixing.)

We sample while fixing the evidence, starting from anywhere but waiting some time (the burn-in) before starting to collect data.

Examples of bad spaces: a problematic space would be loosely connected:

[Figure: a trajectory with a burn-in segment followed by the collected sample, and a loosely connected state space]

Page 50:

More terminology: make sure you know how to define these:

Inference

Parameter learning

Likelihood

Total probability/Marginal probability

Exact inference/Approximate inference

Page 51:

Z-scores, T-test – the basics

You want to test whether the mean (RNA expression) of a gene set A is significantly different from that of a gene set B.

If you assume the variance of A and B is the same, use the pooled t statistic:

t = (X̄_A - X̄_B) / sqrt( [((n_A - 1)·S_A² + (n_B - 1)·S_B²) / (n_A + n_B - 2)] · (1/n_A + 1/n_B) )

t is distributed like T with n_A + n_B - 2 degrees of freedom.

If you don't assume the variance is the same:

t = (X̄_A - X̄_B) / sqrt( s_A²/n_A + s_B²/n_B )

d.o.f. = ( s_A²/n_A + s_B²/n_B )² / [ (s_A²/n_A)²/(n_A - 1) + (s_B²/n_B)²/(n_B - 1) ]

But in this case the whole test becomes rather flaky!

In a common scenario, you have a small set of genes, and you screen a large set of conditions for interesting biases.

You need a quick way to quantify deviation of the mean

For a set of k genes sampled from a standard normal distribution, how would the mean be distributed?

The mean is distributed N(0, 1/K) (variance 1/K, i.e., standard deviation 1/√K).

So if your conditions are normally distributed and pre-standardized to mean 0, std 1, you can quickly compute the mean of the values over your set and generate a z-score:

Z = √|A| · X̄_A
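Both t-test variants, and the z-score, are one-liners in SciPy/NumPy; a sketch (ttest_ind's equal_var switch selects the pooled vs. the Welch form):

    import numpy as np
    from scipy.stats import ttest_ind, norm

    rng = np.random.default_rng(6)
    a = rng.normal(0.3, 1.0, size=25)   # gene set A (illustrative data)
    b = rng.normal(0.0, 1.0, size=40)   # gene set B

    t_pooled = ttest_ind(a, b, equal_var=True)    # same-variance assumption
    t_welch = ttest_ind(a, b, equal_var=False)    # Welch's t-test

    # Quick z-score for a pre-standardized condition: mean of k values ~ N(0, 1/k)
    z = np.sqrt(len(a)) * a.mean()
    p_z = 2 * norm.sf(abs(z))
    print(t_pooled.pvalue, t_welch.pvalue, p_z)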

Page 52:

Kolmogorov-Smirnov statistics

D = max_x | S_N(x) - P(x) |        (one sample against a reference distribution)

D = max_x | S_{N1}(x) - S_{N2}(x) |    (two samples)

The D statistic is non-parametric: you can transform x arbitrarily (e.g., log x) without changing it.

The distribution of the D statistic is given by the form:

Q_KS(λ) = 2 · Σ_{j=1..∞} (-1)^{j-1} · e^{-2·j²·λ²}

P(D > observed) = Q_KS( (√N_e + 0.12 + 0.11/√N_e) · D ),    N_e = N_1·N_2 / (N_1 + N_2)

A non-parametric variant on the t-test theme is the Mann-Whitney test. You take your two sets and rank them together, then count the rank-sum of one of the sets (R_1):

U = R_1 - n_1·(n_1 + 1)/2

U ~ N(μ_U, σ_U),    μ_U = n_1·n_2 / 2,    σ_U = sqrt( n_1·n_2·(n_1 + n_2 + 1) / 12 )
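Both tests are available in SciPy; a sketch:

    import numpy as np
    from scipy.stats import ks_2samp, mannwhitneyu

    rng = np.random.default_rng(7)
    x = rng.normal(0.0, 1.0, size=100)
    y = rng.normal(0.4, 1.0, size=120)

    d, p_ks = ks_2samp(x, y)            # two-sample Kolmogorov-Smirnov D and p-value
    u, p_mw = mannwhitneyu(x, y)        # Mann-Whitney U and p-value
    print(d, p_ks, u, p_mw)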

Page 53:

Hyper-geometric and chi-square test

Two gene sets A (of size n_A) and B (of size n_B) drawn from a universe of N genes overlap by chance according to the hypergeometric distribution:

P(|A ∩ B| = k) = C(n_A, k) · C(N - n_A, n_B - k) / C(N, n_B)

For a general m-by-n contingency table with cells n_ij, row sums n_i., column sums n_.j, and total N:

  n_11 n_12 n_13 | n_1.
  n_21 n_22 n_23 | n_2.
  n_31 n_32 n_33 | n_3.
  ---------------------
  n_.1 n_.2 n_.3 | N

the statistic

χ² = Σ_{i,j} (n_ij - n_i.·n_.j/N)² / (n_i.·n_.j/N)

is chi-square distributed with m·n - m - n + 1 degrees of freedom.
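A sketch of both tests with SciPy (hypergeom.sf gives the enrichment tail P(|A ∩ B| ≥ k); chi2_contingency computes the expected counts and the statistic):

    import numpy as np
    from scipy.stats import hypergeom, chi2_contingency

    # Enrichment of an overlap: N genes total, n_A in set A, n_B in set B, k shared
    N, n_A, n_B, k = 20000, 150, 300, 12
    p_enrich = hypergeom.sf(k - 1, N, n_A, n_B)    # P(overlap >= k)

    # Chi-square test of independence on a contingency table
    table = np.array([[30, 10, 5],
                      [20, 40, 15],
                      [10, 20, 50]])
    chi2, p, dof, expected = chi2_contingency(table)
    print(p_enrich, chi2, p, dof)                  # dof = (3-1)*(3-1) = 4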