CpG Island Identification with Hidden Markov Models
![Page 1: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/1.jpg)
Kshitij Tayal
![Page 2: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/2.jpg)
CpG Island
• A region of the genome with a higher frequency of CpG sites than the rest of the genome.
• Formal definition: a CpG island is a region of at least 200 bp with a GC percentage greater than 50% and an observed-to-expected CpG ratio greater than 60%.
• CpG is shorthand for 5'-C-phosphate-G-3', that is, cytosine and guanine separated by only one phosphate.
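The formal definition can be checked directly on a candidate window. The sketch below is illustrative rather than the author's method: `cpg_stats` and `is_cpg_island` are hypothetical names. It uses the widely cited thresholds (length of at least 200 bp, GC fraction above 0.5, and an observed-to-expected CpG ratio above 0.6), computing the ratio as (#CG × length) / (#C × #G).

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently is (#C * #G) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, min_len: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the length, GC-content and observed/expected CpG thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(is_cpg_island("CG" * 100))       # True
# GC-rich but depleted of CpG dinucleotides: fails the ratio test.
print(is_cpg_island("GGGGCCCC" * 25))  # False
```

The second example shows why GC content alone is not enough: the window is 100% G/C, yet it contains only 24 CpG sites, giving an observed/expected ratio of 0.48.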
![Page 3: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/3.jpg)
The human genome is ~3 billion characters long. How do we find the genes?
![Page 4: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/4.jpg)
Importance of CpG Islands
• CpG islands act as a proxy for identifying genes.
• They often occur at the start of a gene.
• Cytosines in CpG dinucleotides can be methylated (have a methyl group attached) to form 5-methylcytosine.
![Page 5: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/5.jpg)
![Page 6: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/6.jpg)
Importance of Methylation
• Our bodies consist of trillions of cells. Every cell contains the same copy of DNA, with the same blueprint of genetic code, so how do cells decide which function each one has to perform?
• How does a heart cell know it’s a heart cell?
• How does a skin cell know it’s a skin cell?
• They need outside instructions from small carbon-hydrogen groups called methyl groups.
• Methylation also helps explain how characteristics can change across generations without changes to the DNA sequence itself.
![Page 7: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/7.jpg)
Epigenetics & CpG Islands
• The literal meaning of epigenetic is ‘above genetics’. Epigenetic control decides the methylation of CpG islands.
• CpG islands regulate the expression of nearby genes.
• Proteins involved in gene expression can be repelled or attracted by the methyl group.
![Page 8: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/8.jpg)
Background: Epigenetics
• Environmental factors, such as what we do, what we eat, whether we smoke and how stressed we are, influence where methyl groups bind.
• A bad diet can lead methyl groups to bind in the wrong places; with these bad instructions, cells can become abnormal and lead to disease.
• Epigenetics is also controlled by histones. Histones are proteins that act as spools around which DNA winds, and they can change how tightly or loosely the DNA is wrapped around them.
• If the DNA is loosely wrapped, the gene is expressed more.
• If the DNA is tightly wrapped, the gene is expressed less.
![Page 9: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/9.jpg)
![Page 10: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/10.jpg)
Background: Epigenetics
• So the methyl group is more like a ‘switch’ and histones are more like a ‘knob’.
• Every cell in your body has a distinct methylation and histone pattern that gives it its marching orders.
• DNA can be thought of as the body’s ‘hardware’ and the epigenome as ‘software’ that tells the hardware what work to do, and hence gives it meaning.
![Page 11: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/11.jpg)
Now Some Computer Science
• Task: design a method that, given a candidate string (k-mer), scores it according to how confident we are that it came from a CpG island.
• Approach: apply a sequence model, that is, a probabilistic model that associates probabilities with sequences.
![Page 12: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/12.jpg)
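Sequence Models
• Sequence models learn from examples.
• Say we have sampled 100K 5-mers from inside CpG islands and 100K 5-mers from outside.
• Can we guess whether CGCGC came from a CpG island?
• # CGCGC inside: 315
• # CGCGC outside: 12
• P(inside | CGCGC) = 315/(315 + 12) ≈ 0.963. Because both samples have the same size, this ratio directly estimates how likely an occurrence of CGCGC is to come from inside an island.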
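The counting estimate can be sketched in a few lines. This is a minimal illustration, assuming two equally sized lists of sampled k-mers as in the example; `p_inside` is a hypothetical helper, and the "ATATA" filler entries are made up only to bring each toy sample to 1,000 k-mers while reproducing the slide's CGCGC counts.

```python
from collections import Counter


def p_inside(kmer, inside_samples, outside_samples):
    """Estimate P(inside | kmer) from two equally sized k-mer samples.

    Returns None when the k-mer never occurs in either sample: the
    sparsity problem that motivates a sequence model.
    """
    n_in = Counter(inside_samples)[kmer]
    n_out = Counter(outside_samples)[kmer]
    if n_in + n_out == 0:
        return None
    return n_in / (n_in + n_out)


# Toy samples: only the CGCGC counts come from the slide.
inside = ["CGCGC"] * 315 + ["ATATA"] * 685
outside = ["CGCGC"] * 12 + ["ATATA"] * 988
print(round(p_inside("CGCGC", inside, outside), 3))  # 0.963
print(p_inside("GGGGG", inside, outside))            # None
```

The `None` case is the point of the second call: with longer k, most k-mers are unseen, so a pure counting table cannot score them.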
![Page 13: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/13.jpg)
Sequence Models
• To estimate P(inside | x), we count the number of times x appears in the training set labelled INSIDE and divide by the total number of times x appears in the training set.
• But for sufficiently long k, we might see very few or no occurrences of x. To overcome this limitation, we model the joint probability distribution instead.
• $P(X) = P(X_k, X_{k-1}, \ldots, X_1)$, where P(X) is the probability of sequence X and $X_i$ is its i-th character.
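A standard way to make this joint distribution tractable is to expand it with the chain rule and then assume each character depends only on the one before it (a first-order Markov assumption). The derivation below is the textbook form, given as an assumption about the steps shown only as images in the following slides:

```latex
\begin{aligned}
P(X) &= P(X_k, X_{k-1}, \ldots, X_1) \\
     &= P(X_k \mid X_{k-1}, \ldots, X_1)\, P(X_{k-1} \mid X_{k-2}, \ldots, X_1) \cdots P(X_2 \mid X_1)\, P(X_1) \\
     &\approx P(X_1) \prod_{i=2}^{k} P(X_i \mid X_{i-1})
\end{aligned}
```

For DNA this needs only the 4 × 4 = 16 transition probabilities $P(X_i \mid X_{i-1})$ plus start probabilities, all estimable from dinucleotide counts, instead of one count per possible k-mer.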
![Page 14: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/14.jpg)
![Page 15: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/15.jpg)
![Page 16: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/16.jpg)
![Page 17: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/17.jpg)
• P(x) now equals the product of all the Markov chain edge weights along the walk through the chain driven by our string.
• Node labels are symbols, and transition (edge) labels are conditional probabilities.
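The walk-and-multiply rule can be sketched in Python. This is a minimal illustration, not the author's code: `train_chain`, `log_prob` and `log_odds` are hypothetical names, transitions are estimated from dinucleotide counts with a pseudocount of 1, the start probability is assumed uniform, and log probabilities are summed instead of multiplying edge weights so long strings do not underflow. Comparing an inside chain against an outside chain with a log-odds score is one common way to turn the two models into the confidence score the task asks for.

```python
import math

ALPHABET = "ACGT"


def train_chain(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from training strings."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_prob(seq, chain):
    """log P(x): sum of log edge weights along the string-driven walk."""
    seq = seq.upper()
    lp = math.log(1.0 / len(ALPHABET))  # assumed uniform start probability
    for a, b in zip(seq, seq[1:]):
        lp += math.log(chain[a][b])
    return lp


def log_odds(seq, inside_chain, outside_chain):
    """Positive scores favour the inside (CpG island) chain."""
    return log_prob(seq, inside_chain) - log_prob(seq, outside_chain)


# Toy training data, made up for illustration only.
inside_chain = train_chain(["CGCGCGCG"])
outside_chain = train_chain(["ATATATAT"])
print(log_odds("CGCGC", inside_chain, outside_chain) > 0)  # True
print(log_odds("ATATA", inside_chain, outside_chain) < 0)  # True
```

Because both chains share the same uniform start term, it cancels in the log-odds; only the transition (edge) weights decide the score.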
![Page 18: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/18.jpg)
![Page 19: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/19.jpg)
![Page 20: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/20.jpg)
![Page 21: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/21.jpg)
Hidden Markov Model
• In simpler Markov models (like a Markov chain), the state is directly visible to the observer, so the state transition probabilities are the only parameters.
• In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. The adjective 'hidden' refers to the state sequence through which the model passes.
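To make "hidden" concrete, here is a sketch of how an HMM generates data: it walks a chain of hidden states, and each state emits a visible symbol from its own distribution. The two-state model ("island" vs "background") and every probability below are made-up illustrations, not values from the slides, and `sample_hmm` and `draw` are hypothetical helpers. An observer sees only `emitted`, never `states`.

```python
import random

# Illustrative parameters only (not estimated from real data).
TRANSITIONS = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.05, "background": 0.95},
}
EMISSIONS = {
    "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "background": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}


def draw(rng, dist):
    """Sample one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def sample_hmm(length, start="background", seed=0):
    """Generate (hidden state path, visible emitted string) of the given length."""
    rng = random.Random(seed)
    states, emitted = [], []
    state = start
    for _ in range(length):
        states.append(state)
        emitted.append(draw(rng, EMISSIONS[state]))  # visible output
        state = draw(rng, TRANSITIONS[state])        # hidden transition
    return states, "".join(emitted)


states, emitted = sample_hmm(30, seed=42)
print(emitted)  # visible: a DNA string
print(states)   # hidden: what the observer never sees
```

Decoding reverses this process: given only `emitted`, infer the most plausible `states`.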
![Page 22: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/22.jpg)
![Page 23: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/23.jpg)
![Page 24: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/24.jpg)
![Page 25: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/25.jpg)
![Page 26: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/26.jpg)
![Page 27: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/27.jpg)
![Page 28: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/28.jpg)
Hidden Markov Model: Viterbi Algorithm
• Given the flips, can we say when the dealer was using the loaded coin?
• We want to find π*, the most likely path of hidden states given the emissions: $\pi^* = \arg\max_{\pi} P(x, \pi)$.
• The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events.
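A log-space Viterbi sketch for the dealer example follows. The parameters are assumptions for illustration, not the slides' values: the loaded coin shows heads 75% of the time, and the dealer switches coins with probability 0.1 per flip. `viterbi` is a hypothetical helper; `v[i][s]` holds the best log probability of any state path ending in state `s` after the first i+1 flips, and the back-pointers recover π*.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, its log joint probability)."""
    # v[i][s]: best log P(obs[0..i], path ending in s); back[i][s]: its predecessor.
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, v[i - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            v[i][s] = score + math.log(emit_p[s][obs[i]])
            back[i][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, v[-1][last]


STATES = ("F", "L")  # fair coin, loaded coin
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
EMIT = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.75, "T": 0.25}}

flips = "TTTTT" + "H" * 20 + "TTTTT"
path, score = viterbi(flips, STATES, START, TRANS, EMIT)
print(path == ["F"] * 5 + ["L"] * 20 + ["F"] * 5)  # True
```

The long run of heads is labelled loaded because its emission gain (20 × log 1.5 ≈ 8.1) outweighs the cost of two coin switches (≈ 4.4). For CpG islands the same function applies unchanged with island/background states and a nucleotide emission table; contiguous runs of the island state in π* are the predicted islands.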
![Page 29: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/29.jpg)
![Page 30: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/30.jpg)
![Page 31: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/31.jpg)
![Page 32: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/32.jpg)
![Page 33: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/33.jpg)
![Page 34: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/34.jpg)
![Page 35: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/35.jpg)
![Page 36: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/36.jpg)
![Page 37: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/37.jpg)
![Page 38: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/38.jpg)
![Page 39: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/39.jpg)
![Page 40: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/40.jpg)
Hidden Markov Model
![Page 41: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/41.jpg)
Hidden Markov Model
![Page 42: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/42.jpg)
EMISSIONS
![Page 43: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/43.jpg)
![Page 44: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/44.jpg)
Hidden Markov Model
![Page 45: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/45.jpg)
![Page 46: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/46.jpg)
![Page 47: CpG Island Identification with Hidden Markov Models](https://reader031.fdocuments.us/reader031/viewer/2022030304/58779d9d1a28ab826e8b4717/html5/thumbnails/47.jpg)
THANK YOU