Recitation on EM
Slides taken from:
http://www.cs.ucsb.edu/~ambuj/Courses/bioinformatics/EM.pdf
Computational Genomics
Recitation #6
All EM questions are in the format:
1. Write the likelihood function.
2. Write the Q function.
3. Derive the update rule.
Estimation problems
What is the unobserved data in this case?
EM question
• Let G = (G1, … , Gn) be n contiguous DNA regions representing genes. For each Gi we define the mRNA concentration of the gene as Pi, such that the Pi sum to 1. P = (P1, … , Pn) can be interpreted as the normalized expression levels for the regions in G.
• Our model assumes that each read is generated by randomly picking a region from G according to the distribution P, and then copying this region. The copying process is error-prone. This process is repeated until we have a set of m reads R = (r1, … , rm) generated according to the model described above.
• For each region Gi and read rj, we have a probability pij = P(rj | Gi), the probability of observing rj given that the locus of the read was gene Gi. In practice, for each read rj, this probability will be close to zero for all but a few regions.
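The generative process described in the bullets above can be sketched in Python. This is only an illustration: the function name `sample_reads`, the `error_rate` parameter, and the substitution-only error model are assumptions, since the slides do not specify how copying errors arise.

```python
import random

def sample_reads(regions, P, m, error_rate=0.01, seed=0):
    """Draw m reads: pick a region according to P, then copy it with errors.

    regions    : list of DNA strings standing in for G_1..G_n (hypothetical)
    P          : expression levels, one per region, summing to 1
    error_rate : assumed per-base probability of redrawing the base
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(m):
        # Pick the source region index according to the distribution P.
        i = rng.choices(range(len(regions)), weights=P, k=1)[0]
        # Copy the region; with probability error_rate a base is redrawn
        # uniformly from ACGT (the redraw may equal the original base).
        read = "".join(
            rng.choice("ACGT") if rng.random() < error_rate else base
            for base in regions[i]
        )
        # The index i is kept here only for inspection; in the EM setting
        # it is the part of the data that is not observed.
        reads.append((i, read))
    return reads

regions = ["ACGTACGTAC", "TTTTGGGGCC"]  # toy sequences, not real genes
reads = sample_reads(regions, P=[0.7, 0.3], m=5)
```

Only the read strings would be available in practice; the source index of each read is hidden, which is what makes this an EM problem.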
Likelihood function
• Write the likelihood of observing the m reads.
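One way to read the model, assuming the m reads are drawn independently, is that each read rj has probability ∑i Pi pij (a sum over every possible source region), so the likelihood is the product of these terms over all reads. The sketch below evaluates the log of that product; the function name `log_likelihood` and the toy numbers are hypothetical.

```python
import math

def log_likelihood(P, p):
    """Log-likelihood of m reads, assuming independent reads.

    P[i]    : expression level of region G_i (the P_i sum to 1)
    p[i][j] : P(r_j | G_i), taken as given
    """
    n, m = len(P), len(p[0])
    total = 0.0
    for j in range(m):
        # Probability of read r_j, summed over every possible source region.
        read_prob = sum(P[i] * p[i][j] for i in range(n))
        total += math.log(read_prob)
    return total

# Toy input: 2 regions, 3 reads.
P = [0.5, 0.5]
p = [[0.9, 0.8, 0.1],
     [0.1, 0.2, 0.9]]
ll = log_likelihood(P, p)
```

Working in log space avoids numerical underflow when m is large, since the raw likelihood is a product of m small numbers.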
Q function
• Write the Q(P | P^(t)) term.
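A possible form of the Q function, assuming the hidden data is the source region of each read and that Q is the expected complete-data log-likelihood under the current estimate P^(t). The symbol w_ij^(t) is introduced here for convenience and does not appear in the slides.

```latex
% w_{ij}^{(t)}: posterior probability that read r_j came from region G_i
w_{ij}^{(t)} = P(G_i \mid r_j, P^{(t)})
             = \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}}

% Expected complete-data log-likelihood
Q(P \mid P^{(t)}) = \sum_{j=1}^{m} \sum_{i=1}^{n} w_{ij}^{(t)} \log\left( P_i \, p_{ij} \right)
```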
M-step
• Write the M-step term using the argmax function.
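A sketch of the M-step under the same reading of the model, where w_ij^(t) = P(Gi | rj, P^(t)) is shorthand (not from the slides) for the posterior probability that read rj came from region Gi. The log pij terms do not depend on P, so they drop out of the maximization.

```latex
% w_{ij}^{(t)} = P(G_i \mid r_j, P^{(t)}): posterior that read r_j came from G_i
P^{(t+1)} = \arg\max_{P \,:\, \sum_i P_i = 1} Q(P \mid P^{(t)})
          = \arg\max_{P \,:\, \sum_i P_i = 1}
            \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} w_{ij}^{(t)} \Big) \log P_i
```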
Update rule
• From the M-step term (part c), derive the update step for P.
Hint: maximizing ∑_i a_i log(P_i) over P, subject to ∑_i P_i = 1, gives the maximum at P_i = a_i / ∑_k a_k.
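Applying this fact with a_i taken as the expected number of reads from region Gi gives an update that can be sketched as a short EM loop. This is a minimal illustration, not the slides' own solution: the function name `em_expression`, the uniform starting point, and the fixed iteration count are assumptions, and every read is assumed to have nonzero probability under at least one region.

```python
def em_expression(p, iterations=100):
    """Estimate expression levels P from p[i][j] = P(r_j | G_i)."""
    n, m = len(p), len(p[0])
    P = [1.0 / n] * n  # uniform starting guess (an assumption)
    for _ in range(iterations):
        a = [0.0] * n  # a[i]: expected number of reads from region G_i
        for j in range(m):
            # E-step: posterior over source regions for read r_j.
            joint = [P[i] * p[i][j] for i in range(n)]
            norm = sum(joint)
            for i in range(n):
                a[i] += joint[i] / norm
        # M-step: P_i = a_i / sum_k a_k; the posteriors of each read
        # sum to 1, so sum_k a_k equals m.
        P = [a_i / m for a_i in a]
    return P

# Toy input: 2 regions, 4 reads; three of the reads favour G_1.
p = [[0.9, 0.9, 0.9, 0.1],
     [0.1, 0.1, 0.1, 0.9]]
P_hat = em_expression(p)
```

Because the posteriors for each read sum to 1, the updated P is automatically normalized, so no separate renormalization step is needed.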