Position Weight Matrices for Representing Signals in Sequences Triinu Tasa, Koke 04.02.05.
Position Weight Matrices for Representing Signals in
Sequences
Triinu Tasa, Koke 04.02.05
![Page 2: Position Weight Matrices for Representing Signals in Sequences Triinu Tasa, Koke 04.02.05.](https://reader036.fdocuments.us/reader036/viewer/2022083008/56649ea15503460f94ba5400/html5/thumbnails/2.jpg)
Definitions
• Sequence, string – ordered arrangement of letters from the alphabet {'A', 'C', 'G', 'T'}
• Pattern – simplified regular expression over the alphabet {'A', 'C', 'G', 'T', '.'}, where '.' is a wildcard of length 1 (it matches 'A', 'C', 'G' or 'T')
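A minimal sketch of these two definitions in Python. The function names `matches` and `occurrences` are illustrative and do not come from the slides.

```python
def matches(pattern: str, sequence: str) -> bool:
    """True if `sequence` has the pattern's length and agrees with it
    at every position that is not the wildcard '.'."""
    if len(pattern) != len(sequence):
        return False
    return all(p == "." or p == s for p, s in zip(pattern, sequence))


def occurrences(pattern: str, sequence: str) -> list:
    """0-based start positions where `pattern` matches a substring of `sequence`."""
    k = len(pattern)
    return [i for i in range(len(sequence) - k + 1)
            if matches(pattern, sequence[i:i + k])]


print(matches("G.GATGAG.T", "GAGATGAGAT"))   # True
print(occurrences("GAT.A", "TGATGAGATTA"))   # [1, 6]
```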
What is a weight matrix?
GATGAG
GATGAT
TGATAT
GATGAT
or
[GT][AG][TA][GT]A[GT]
Better:
GATGAG
GATGAT
TGATAT

Alignment matrix C:
A  0  2  1  0  3  0
C  0  0  0  0  0  0
G  2  1  0  2  0  1
T  1  0  2  1  0  2

Frequency matrix F:
A  0    0.7  0.3  0    1    0
C  0    0    0    0    0    0
G  0.7  0.3  0    0.7  0    0.3
T  0.3  0    0.7  0.3  0    0.7
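The alignment (count) matrix and the frequency matrix for the three example sequences can be computed as in the sketch below. The names `count_matrix` and `frequency_matrix` are illustrative, and each matrix is stored as a dict mapping a letter to its row.

```python
ALPHABET = "ACGT"


def count_matrix(seqs):
    """c[letter][j] = number of sequences with `letter` at position j."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    counts = {a: [0] * length for a in ALPHABET}
    for s in seqs:
        for j, letter in enumerate(s):
            counts[letter][j] += 1
    return counts


def frequency_matrix(counts):
    """f[letter][j] = c[letter][j] / N, where N is the number of sequences."""
    n = sum(counts[a][0] for a in ALPHABET)
    return {a: [c / n for c in counts[a]] for a in ALPHABET}


seqs = ["GATGAG", "GATGAT", "TGATAT"]
C = count_matrix(seqs)
F = frequency_matrix(C)
for a in ALPHABET:
    print(a, C[a], [round(f, 1) for f in F[a]])
```

Rounded to one decimal, the frequencies 2/3 and 1/3 print as 0.7 and 0.3.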
Or weight matrix W:

$$w_{i,j} = \ln\frac{c_{i,j} + p_i}{(N+1)\,p_i} \approx \ln\frac{f_{i,j}}{p_i}$$

where
N – number of sequences used
p_i – a priori probability of letter i

A  -1.09   1.10   0.51  -1.09   1.47  -1.09
C  -1.09  -1.09  -1.09  -1.09  -1.09  -1.09
G   1.10   0.51  -1.09   1.10  -1.09   0.51
T   0.51  -1.09   1.10   0.51  -1.09   1.10
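A sketch of the weight formula w(i,j) = ln((c(i,j) + p(i)) / ((N + 1) p(i))), applied to the count matrix of the three example sequences. The uniform prior p = 0.25 is an assumption, and so is the function name `weight_matrix`. The slides do not state which prior or pseudocount convention produced their table, so the numbers printed here need not coincide with it.

```python
import math

ALPHABET = "ACGT"


def weight_matrix(counts, n_seqs, prior=None):
    """w[letter][j] = ln((c + p) / ((N + 1) * p)), with p the prior of `letter`."""
    if prior is None:
        prior = {a: 0.25 for a in ALPHABET}  # assumption: uniform background
    return {
        a: [math.log((c + prior[a]) / ((n_seqs + 1) * prior[a])) for c in counts[a]]
        for a in ALPHABET
    }


# Count matrix of GATGAG, GATGAT, TGATAT
C = {
    "A": [0, 2, 1, 0, 3, 0],
    "C": [0, 0, 0, 0, 0, 0],
    "G": [2, 1, 0, 2, 0, 1],
    "T": [1, 0, 2, 1, 0, 2],
}
W = weight_matrix(C, n_seqs=3)
for a in ALPHABET:
    print(a, [round(w, 2) for w in W[a]])
```

Letters never observed at a position get a negative weight, and the weight grows with the count.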
Importance matrix I:

$$I(i,j) = c_{i,j} \cdot f_{i,j}$$

A  0    1.4  0.3  0    3    0
C  0    0    0    0    0    0
G  1.4  0.3  0    1.4  0    0.3
T  0.3  0    1.4  0.3  0    1.4
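The importance matrix is the element-wise product of counts and frequencies. The sketch below computes it directly from the counts as c * c / N; the name `importance_matrix` is illustrative. The exact values are 4/3 and 1/3 where the slide shows 1.4 and 0.3, because the slide multiplies by the rounded frequencies 0.7 and 0.3.

```python
def importance_matrix(counts):
    """I[letter][j] = c * f = c * c / (column total)."""
    letters = list(counts)
    length = len(counts[letters[0]])
    totals = [sum(counts[a][j] for a in letters) for j in range(length)]
    return {
        a: [counts[a][j] ** 2 / totals[j] if totals[j] else 0.0 for j in range(length)]
        for a in letters
    }


# Count matrix of GATGAG, GATGAT, TGATAT
C = {
    "A": [0, 2, 1, 0, 3, 0],
    "C": [0, 0, 0, 0, 0, 0],
    "G": [2, 1, 0, 2, 0, 1],
    "T": [1, 0, 2, 1, 0, 2],
}
I = importance_matrix(C)
for a in C:
    print(a, [round(x, 2) for x in I[a]])
```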
Applications
• Pattern clustering
1. G.GATGAG.T 62/75 1:39/49 2:23/26 R:17.3026 BP:1.12008e-37
2. G.GATGAG 89/110 1:45/60 2:44/50 R:10.436 BP:1.61764e-34
3. GATGAG.T 124/148 1:52/70 2:72/78 R:7.36961 BP:2.79148e-33
4. TG.AAA.TTT 132/145 1:53/61 2:79/84 R:6.84578 BP:1.83509e-32
5. AAAATTTT 200/231 1:63/77 2:137/154 R:4.69239 BP:1.19109e-30
6. TGAAAA.TTT 104/114 1:45/53 2:59/61 R:7.78277 BP:3.86086e-29
7. AAA.TTTT 343/537 1:79/145 2:264/392 R:3.05349 BP:5.66833e-29
8. G.AAA.TTTT 135/156 1:51/62 2:84/94 R:6.19534 BP:5.69933e-29
9. TG.GATGAG 49/57 1:30/35 2:19/22 R:16.1117 BP:9.35765e-28
10. TG.AAA.TTTT 86/91 1:40/43 2:46/48 R:8.87311 BP:1.1124e-27
...
G.GATGAG.T:
GAGATGAGAT
GTGATGAGAT
GAGATGAGGT
...
A  -6.9   0.98  -6.9   1.38  -6.9  -6.9   1.38  -6.9   0.98  -6.9
C  -6.9  -6.9   -6.9  -6.9   -6.9  -6.9  -6.9   -6.9  -6.9   -6.9
G   1.38 -6.9    1.38 -6.9   -6.9   1.38 -6.9    1.38  0.29  -6.9
T  -6.9   0.29  -6.9  -6.9    1.38 -6.9  -6.9   -6.9  -6.9    1.38
Compare matrices with each other using the dynamic programming approach:

$$D(i,j) = \min \begin{cases} D(i-1,j) + \mathrm{deletion\_cost} \\ D(i,j-1) + \mathrm{insertion\_cost} \\ D(i-1,j-1) + d(i,j) \end{cases}$$

$$d(A_i, B_j) = \sum_{k=1}^{m} (A_{k,i} - B_{k,j})^2$$

where
A, B – matrices
i, j – columns
d(i, j) – the distance d(A_i, B_j) between column i of A and column j of B

If D(m,n) > threshold => matrices are different
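A sketch of the matrix comparison in Python. The slides give only the recurrence, so the boundary values (an empty prefix is reached by insertions or deletions alone) and the unit insertion and deletion costs are assumptions. Matrices are lists of rows, one row per letter.

```python
def column_distance(A, B, i, j):
    """d(A_i, B_j): sum over rows k of (A[k][i] - B[k][j]) ** 2."""
    return sum((A[k][i] - B[k][j]) ** 2 for k in range(len(A)))


def matrix_distance(A, B, insertion_cost=1.0, deletion_cost=1.0):
    """D(m, n) for matrices with m and n columns (costs are assumed values)."""
    m, n = len(A[0]), len(B[0])
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # assumed boundary: delete all columns of A
        D[i][0] = D[i - 1][0] + deletion_cost
    for j in range(1, n + 1):          # assumed boundary: insert all columns of B
        D[0][j] = D[0][j - 1] + insertion_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j] + deletion_cost,
                D[i][j - 1] + insertion_cost,
                D[i - 1][j - 1] + column_distance(A, B, i - 1, j - 1),
            )
    return D[m][n]


# Rows are A, C, G, T; columns are positions (frequency matrices of GAT and GATG).
GAT = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
GATG = [[0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
print(matrix_distance(GAT, GAT))    # 0.0
print(matrix_distance(GAT, GATG))   # 1.0 (one inserted column)
```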
G.GATGAG.T    TG.AAA.TTT     AAAATTTT
G.GATGAG      TGAAAA.TTT     AAA.TTTT
GATGAG.T      TG.AAA.TTTT

We want to represent the clusters by logos.

We need to align the patterns first, that is, position the similar parts of the patterns above each other:
G.GATGAG.T
G.GATGAG--
--GATGAG.T
or the logo will look like this:
[Figure: logo built from the unaligned patterns]
• Multiple Alignment
Importance matrix I – represents the aligned patterns.
Example:
G.GATGAG.T
GATGAG.T
G.GATGAG
1. Insert the first pattern into I: ('.' gives 0.25 to each)
A 0 0.25 0 1 0 0 1 0 0.25 0
C 0 0.25 0 0 0 0 0 0 0.25 0
G 1 0.25 1 0 0 1 0 1 0.25 0
T 0 0.25 0 0 1 0 0 0 0.25 1
2. Align the second pattern with I using a dynamic programming approach:

$$\forall i, j:\quad v(i,0) = 0,\quad v(0,j) = 0$$

$$i > 0,\ j > 0:\quad v(i,j) = \max \begin{cases} 0, & I_{S_i,j} = 0 \\ v(i-1,j-1) + I_{S_i,j}, & I_{S_i,j} > 0 \end{cases}$$

where S_i is the i-th symbol of the pattern being aligned.
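A sketch of steps 1 and 2 in Python. The slides do not say how a '.' in the pattern being aligned is scored against a column of I. Here it is assumed to score a quarter of the column total, so the scores will not reproduce the numbers in the slides' dynamic programming matrix. The function names are illustrative.

```python
ALPHABET = "ACGT"


def pattern_to_matrix(pattern):
    """Step 1: a letter adds 1 to its own row, '.' adds 0.25 to every row."""
    I = {a: [0.0] * len(pattern) for a in ALPHABET}
    for j, symbol in enumerate(pattern):
        for a in ALPHABET:
            if symbol == ".":
                I[a][j] = 0.25
            elif symbol == a:
                I[a][j] = 1.0
    return I


def symbol_score(I, symbol, j):
    """Score of a pattern symbol against column j of I."""
    if symbol == ".":
        # Assumption (not in the slides): '.' scores a quarter of the column total.
        return 0.25 * sum(I[a][j] for a in ALPHABET)
    return I[symbol][j]


def align_pattern(I, pattern):
    """Step 2: fill v(i, j); return (best score, offset of the pattern within I)."""
    n_cols = len(I["A"])
    v = [[0.0] * (n_cols + 1) for _ in range(len(pattern) + 1)]
    best_score, best_offset = 0.0, 0
    for i in range(1, len(pattern) + 1):
        for j in range(1, n_cols + 1):
            s = symbol_score(I, pattern[i - 1], j - 1)
            v[i][j] = v[i - 1][j - 1] + s if s > 0 else 0.0
            if v[i][j] > best_score:
                best_score, best_offset = v[i][j], j - i
    return best_score, best_offset


I = pattern_to_matrix("G.GATGAG.T")
score, offset = align_pattern(I, "GATGAG.T")
print(score, offset)               # 7.25 2
print("-" * offset + "GATGAG.T")   # --GATGAG.T
```

The offset is the column index of the best cell minus its row index; a negative value would mean that columns have to be added to the left of I.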
Dynamic programming matrix:
         G     .     G     A     T     G     A     G     .     T
G  0.00  0.10  0.01  0.10  0.00  0.00  0.10  0.00  0.10  0.01  0.00
A  0.00  0.00  0.11  0.00  0.20  0.00  0.00  0.20  0.00  0.11  0.00
T  0.00  0.00  0.01  0.00  0.00  0.30  0.00  0.00  0.00  0.01  0.21
G  0.00  0.10  0.01  0.11  0.00  0.00  0.40  0.00  0.10  0.01  0.00
A  0.00  0.00  0.11  0.00  0.21  0.00  0.00  0.50  0.00  0.11  0.00
G  0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01  0.00
.  0.00  0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01
T  0.00  0.00  0.01  0.00  0.00  0.31  0.00  0.00  0.00  0.01  0.70
G.GATGAG.T
--GATGAG.T
3. Add the pattern '--GATGAG.T' to I, if necessary add columns to the matrix.
4. Repeat the procedure for every pattern.
Output:
G.GATGAG.T
G.GATGAG--
--GATGAG.T
Why importance matrix?
Example:
Pattern: GATG
So far aligned:
GATGATGTA-
---GATGTGG
We want: w(G, 4) > w(G, 1) > w(G, 9)
Solution – importance matrix
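A sketch that checks the wanted ordering on this example. It assumes the two rows aligned so far are GATGATGTA- and ---GATGTGG, and that gaps are left out of both the counts and the frequency denominator. The function name is illustrative.

```python
ALPHABET = "ACGT"


def importance_from_alignment(aligned):
    """I[letter][j] = c * f, with c and f taken over the non-gap symbols of column j."""
    length = len(aligned[0])
    I = {a: [0.0] * length for a in ALPHABET}
    for j in range(length):
        column = [row[j] for row in aligned if row[j] != "-"]  # assumption: ignore gaps
        for a in ALPHABET:
            c = column.count(a)
            if c:
                I[a][j] = c * c / len(column)
    return I


aligned = ["GATGATGTA-", "---GATGTGG"]
I = importance_from_alignment(aligned)
# Positions 4, 1 and 9 of the slide are indices 3, 0 and 8 here.
print(I["G"][3], I["G"][0], I["G"][8])   # 2.0 1.0 0.5
```

Counts alone cannot separate positions 1 and 9 (one G each), and frequencies alone cannot separate positions 4 and 1 (both 1.0); the product c * f orders all three.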
● Weight Matrix Matching
Purpose: find the sequences that the weight matrix describes best in a given text file
...CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC...
1. Calculate the score for each position
2. if score > threshold => signal
Problem: finding a good threshold
● Threshold – 99.5% quantile
A  -1.09   1.10   0.51  -1.09   1.47  -1.09
C  -1.09  -1.09  -1.09  -1.09  -1.09  -1.09
G   1.10   0.51  -1.09   1.10  -1.09   0.51
T   0.51  -1.09   1.10   0.51  -1.09   1.10
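A sketch of weight matrix matching in Python. The slides say only that the threshold is a 99.5% quantile. Taking that quantile over the window scores of a uniformly random background sequence is an assumption, as are the uniform prior, the background length and all function names. The weights are computed from the three example sequences with the weight formula, not copied from the table.

```python
import math
import random

ALPHABET = "ACGT"


def weight_matrix(seqs, prior=0.25):
    """w[letter][j] = ln((c + p) / ((N + 1) * p)), assuming a uniform prior p."""
    n, k = len(seqs), len(seqs[0])
    return {
        a: [math.log((sum(s[j] == a for s in seqs) + prior) / ((n + 1) * prior))
            for j in range(k)]
        for a in ALPHABET
    }


def scan(text, W):
    """Step 1: score of every window, as the sum of the weights of its letters."""
    k = len(W["A"])
    return [sum(W[text[i + j]][j] for j in range(k))
            for i in range(len(text) - k + 1)]


def quantile_threshold(scores, q=0.995):
    """Smallest score such that at least a fraction q of the scores are <= it."""
    ordered = sorted(scores)
    index = min(len(ordered) - 1, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


W = weight_matrix(["GATGAG", "GATGAT", "TGATAT"])

# Assumption: the background is a uniformly random sequence of length 20000.
rng = random.Random(0)
background = "".join(rng.choices(ALPHABET, k=20000))
threshold = quantile_threshold(scan(background, W))

text = "CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC"
# Step 2: a window is reported as a signal if its score exceeds the threshold.
signals = [(i, text[i:i + 6]) for i, s in enumerate(scan(text, W)) if s > threshold]
print(round(threshold, 2), signals)
```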
Questions?