Classifying Proteins into Families
• Proteins are organized into protein families represented by multiple alignments.
• A distant cousin may have only weak pairwise similarities with family members, each too weak to pass a significance test.
• However, it may have weak similarities with many family members, indicating a relationship.
Bioinformatics Algorithms: An Active Learning Approach. Copyright 2018 Compeau and Pevzner.
From Alignment to Profile
Remove each column in which the fraction of space symbols (“-”) exceeds θ, the maximum fraction of insertions allowed per column.
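The column-removal rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `seed_columns` and the toy alignment are hypothetical, and the alignment is assumed to be a list of equal-length strings with "-" as the space symbol.

```python
def seed_columns(alignment, theta):
    """Return the indices of the columns that are kept.

    A column is removed when its fraction of '-' symbols exceeds theta.
    """
    n_rows = len(alignment)
    kept = []
    for col in range(len(alignment[0])):
        gaps = sum(1 for row in alignment if row[col] == "-")
        if gaps / n_rows <= theta:
            kept.append(col)
    return kept


# Hypothetical toy alignment (not the one on the slides).
toy_alignment = ["ACD-F", "A-D-F", "ACDEF", "AC--F"]
kept = seed_columns(toy_alignment, theta=0.4)
print(kept)
```

Here the fourth column has 3 of 4 space symbols, which exceeds θ = 0.4, so it is the only column removed.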
From Profile to HMM
HMM diagram emitting the string A D D A F F D F, one factor per emitted symbol:
1 * .25 * .75 * .20 * 1 * .20 * .75 * .60
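As a quick check of the arithmetic, the eight factors listed on the slide can be multiplied directly. The variable names below are illustrative only.

```python
import math

# The eight factors shown on the slide, one per symbol of A D D A F F D F.
factors = [1, .25, .75, .20, 1, .20, .75, .60]

probability = math.prod(factors)
print(probability)
```

The product is small (about 0.0034), which is typical: probabilities of long strings shrink quickly, so practical implementations often work with logarithms instead.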
Toward a Profile HMM
Diagram: match states M1 through M8; string to emit: A D D A F F D F F
Checkpoint: How do we model insertions?
Toward a Profile HMM: Insertions
Diagram: match states M1 through M8, first with a single insertion state I1 and then with insertion states I0 through I8; string to emit: A D D A F F D F F
Checkpoint: How do we model deletions?
Toward a Profile HMM: Deletions
Diagram: match states M1 through M8 and insertion states I0 through I8; string to emit: A A F F D F
Checkpoint: How many edges are in this HMM diagram?
Adding “Deletion States”
Diagram: deletion states (first D2 and D3, then D1 through D8) added to match states M1 through M8 and insertion states I0 through I8; string to emit: A A F F D F
Checkpoint: Are any edges still missing in this HMM diagram?
Adding Edges Between Deletion/Insertion States
Diagram: deletion states D1 through D8, match states M1 through M8, and insertion states I0 through I8
The Profile HMM is Ready to Use!
Diagram: the complete profile HMM with Start (S) and End (E) states, deletion states D1 through D8, match states M1 through M8, and insertion states I0 through I8
But what are Transition and Emission matrices?
Profile HMM Problem: Construct a profile HMM from a multiple alignment.
• Input: A multiple alignment Alignment and a threshold θ (maximum fraction of insertions per column).
• Output: Transition and emission matrices of the profile HMM HMM(Alignment, θ).
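Before filling in the matrices, it helps to enumerate the states and edges of the diagram. The sketch below assumes the usual profile HMM layout: from S and I0 one can go to I0, M1, or D1; from Mj, Dj, and Ij one can go to Ij, M(j+1), or D(j+1); and from the last column one can go only to the last insertion state or to E. The function names are hypothetical.

```python
def profile_hmm_states(n):
    """States in the order S, I0, M1, D1, I1, ..., Mn, Dn, In, E."""
    states = ["S", "I0"]
    for j in range(1, n + 1):
        states += [f"M{j}", f"D{j}", f"I{j}"]
    states.append("E")
    return states


def allowed_transitions(n):
    """List of (source, target) edges under the assumed layout."""
    edges = []
    for j in range(n + 1):
        sources = ["S", "I0"] if j == 0 else [f"M{j}", f"D{j}", f"I{j}"]
        if j < n:
            targets = [f"I{j}", f"M{j + 1}", f"D{j + 1}"]
        else:
            targets = [f"I{n}", "E"]
        edges += [(s, t) for s in sources for t in targets]
    return edges


states = profile_hmm_states(8)
edges = allowed_transitions(8)
print(len(states), len(edges))
```

Under these assumptions, eight match states give 27 states and 75 edges; every other entry of the 27 x 27 transition matrix is a forbidden transition.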
Hidden Paths Through Profile HMM
Diagram: three rows of the alignment drawn as hidden paths through the profile HMM (states S, I0 through I8, M1 through M8, D1 through D8, E); deleted symbols are marked (-).
Note: this is a hidden path in an HMM diagram (not in a Viterbi graph).
Transition Probabilities of Profile HMM
Diagram: the five rows of the alignment drawn as hidden paths through the profile HMM.
4 transitions from M5:
• 1 + 1 + 1 = 3 into I5
• 1 into M6
• 0 into D6
transition_{Match(5),Insertion(5)} = 3/4
transition_{Match(5),Match(6)} = 1/4
transition_{Match(5),Deletion(6)} = 0
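The counting above amounts to normalizing the number of times each next state follows M5. A minimal sketch, with a hypothetical function name and the four observed transitions taken from the slide:

```python
from collections import Counter
from fractions import Fraction


def transition_probabilities(next_states):
    """Turn a list of observed next states into probabilities."""
    counts = Counter(next_states)
    total = sum(counts.values())
    return {state: Fraction(count, total) for state, count in counts.items()}


# States that follow M5 in the hidden paths: three go to I5, one to M6.
after_m5 = ["I5", "I5", "I5", "M6"]
probs = transition_probabilities(after_m5)
print(probs)
```

D6 never follows M5, so it does not appear in the result and its probability is 0.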
Emission Probabilities of Profile HMM
Diagram: the five rows of the alignment drawn as hidden paths through the profile HMM.
Symbols emitted from M2: C, F, C, D
emission_{Match(2)}(A) = 0
emission_{Match(2)}(C) = 2/4
emission_{Match(2)}(D) = 1/4
emission_{Match(2)}(E) = 0
emission_{Match(2)}(F) = 1/4
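The emission probabilities are the same kind of normalized count, taken over the symbols emitted from one state. A minimal sketch, with a hypothetical function name and the five-letter alphabet A, C, D, E, F used on the slide:

```python
from collections import Counter
from fractions import Fraction


def emission_probabilities(symbols, alphabet):
    """Fraction of the emitted symbols equal to each letter of the alphabet."""
    counts = Counter(symbols)
    return {letter: Fraction(counts[letter], len(symbols)) for letter in alphabet}


# Symbols emitted from M2 on the slide: C, F, C, D.
emission_m2 = emission_probabilities("CFCD", "ACDEF")
print(emission_m2)
```

Note that `Fraction` reduces 2/4 to 1/2, so the value for C prints as 1/2.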
Forbidden Transitions
S I0 M1 D1 I1 M2 D2 I2 M3 D3 I3 M4 D4 I4 M5 D5 I5 M6 D6 I6 M7 D7 I7 M8 D8 I8 E
S 1
I0
M1 .8 .2
D1
I1
M2 1
D2
I2
M3 1
D3
I3 1
M4 .8 .2
D4
I4
M5 .25 .75
D5 .33 .67
I5 1
M6 .8 .2
D6
I6
M7 1
D7
I7 1
M8 1
D8
I8
E
Don’t forget pseudocounts: HMM(Alignment, θ, σ)
Gray cells: edges in the HMM diagram.
Clear cells: forbidden transitions.
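One common way to apply a pseudocount, sketched below, is to add σ to every allowed (gray) entry of a row and renormalize, leaving forbidden (clear) entries at zero. The slide names σ but does not spell out the scheme, so this is an assumption, and the function name is hypothetical. The example row uses the M5 values 3/4, 1/4, and 0 computed earlier.

```python
def add_pseudocounts(row, sigma):
    """Add sigma to each allowed transition in a row, then renormalize.

    `row` maps each allowed target state to its probability; forbidden
    targets are simply absent, so they stay at zero.
    """
    bumped = {state: p + sigma for state, p in row.items()}
    total = sum(bumped.values())
    return {state: p / total for state, p in bumped.items()}


row_m5 = {"I5": 0.75, "M6": 0.25, "D6": 0.0}
smoothed = add_pseudocounts(row_m5, sigma=0.01)
print(smoothed)
```

After smoothing, the transition from M5 to D6 has a small nonzero probability, so a protein that needs this edge is no longer assigned probability zero.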
Aligning a Protein Against a Profile HMM
Diagram: the profile HMM built from Alignment, with the string Protein = ACAFDEAF to be aligned against it.
Apply the Viterbi algorithm to find an optimal hidden path!
Diagram: an optimal hidden path for ACAFDEAF, with the symbols AF emitted from an insertion state and two deletions marked (-).
Profile HMM diagram
Diagram: a hidden path through the profile HMM (states S, I0 through I8, M1 through M8, D1 through D8, E), with deleted symbols marked (-).
Checkpoint: How many rows and columns does the Viterbi graph of this profile HMM have?
Profile HMM diagram
Viterbi graph of profile HMM: #columns = #visited states
Diagram: a grid whose rows are the states I0, M1, D1, I1 through M8, D8, I8 and whose columns correspond to the states visited by the hidden path.
Checkpoint: What is wrong with this Viterbi graph?
By definition, #columns = #emitted symbols
Nearly correct Viterbi graph of profile HMM:
Diagram: a grid with one column per emitted symbol and rows for the states I0, M1, D1, I1 through M8, D8, I8.
Vertical edges enter “silent” deletion states
Correct Viterbi graph of profile HMM:
Diagram: the Viterbi graph with an extra column made of the deletion states D1 through D8.
Adding 0-th column that contains only silent states
Alignment with a Profile HMM
Sequence Alignment with Profile HMM Problem: Align a new sequence to a family of aligned sequences using a profile HMM.
• Input: A multiple alignment Alignment, a string Text, a threshold θ (maximum fraction of insertions per column), and a pseudocount σ.
• Output: An optimal hidden path emitting Text in the profile HMM HMM(Alignment, θ, σ).
Have I Wasted Your Time?
Hidden path: M1 M2 I2 I2 M3 M4 D5 M6 D7 M8
Emitted symbols: A C A F D E (-) A (-) F
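The correspondence between the hidden path M1 M2 I2 I2 M3 M4 D5 M6 D7 M8 and the protein ACAFDEAF can be traced mechanically: match and insertion states each consume one symbol, while deletion states are silent. A minimal sketch with a hypothetical function name:

```python
def emitted_along_path(path, text):
    """Pair each state of a hidden path with the symbol it emits.

    Match (M) and insertion (I) states emit the next symbol of `text`;
    deletion (D) states are silent and are shown as "(-)".
    """
    emitted = []
    position = 0
    for state in path:
        if state.startswith("D"):
            emitted.append("(-)")
        else:
            emitted.append(text[position])
            position += 1
    if position != len(text):
        raise ValueError("path does not emit the whole string")
    return emitted


path = "M1 M2 I2 I2 M3 M4 D5 M6 D7 M8".split()
symbols = emitted_along_path(path, "ACAFDEAF")
print(" ".join(symbols))
```

The ten states emit only eight symbols, which is why the Viterbi graph has one column per emitted symbol rather than one per visited state.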
The choice of alignment path is now based on varying transition and emission probabilities!
Sequence alignment recurrence:
$$s_{i,j} = \max \begin{cases} s_{i-1,j} + \mathrm{score}(v_i, -) \\ s_{i,j-1} + \mathrm{score}(-, w_j) \\ s_{i-1,j-1} + \mathrm{score}(v_i, w_j) \end{cases}$$
Profile HMM recurrence:
$$s_{M(j),i} = \max \begin{cases} s_{I(j-1),i-1} \cdot \mathrm{weight}(I(j-1), M(j), i-1) \\ s_{D(j-1),i-1} \cdot \mathrm{weight}(D(j-1), M(j), i-1) \\ s_{M(j-1),i-1} \cdot \mathrm{weight}(M(j-1), M(j), i-1) \end{cases}$$
I Have Not Wasted Your Time!
Individual scoring parameters for each edge in the alignment graph capture subtle similarities that evade traditional alignments.
$$s_{M(j),i} = \max \begin{cases} s_{I(j-1),i-1} \cdot \mathrm{weight}_{i-1}(I(j-1), M(j)) \\ s_{D(j-1),i-1} \cdot \mathrm{weight}_{i-1}(D(j-1), M(j)) \\ s_{M(j-1),i-1} \cdot \mathrm{weight}_{i-1}(M(j-1), M(j)) \end{cases}$$
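The recurrence above covers match states only. As a hedged sketch in the same notation, the analogous recurrences for insertion and deletion states might be written as follows, assuming the usual profile HMM edges (Ij is entered from Mj, Dj, or Ij; Dj is entered from M(j-1), D(j-1), or I(j-1)). These two formulas are not on the slides. Because deletion states are silent, their incoming edges stay in column i, and the subscript on weight is taken here to be the column of the source node, which is an assumption about the notation.

$$s_{I(j),i} = \max \begin{cases} s_{I(j),i-1} \cdot \mathrm{weight}_{i-1}(I(j), I(j)) \\ s_{D(j),i-1} \cdot \mathrm{weight}_{i-1}(D(j), I(j)) \\ s_{M(j),i-1} \cdot \mathrm{weight}_{i-1}(M(j), I(j)) \end{cases}$$

$$s_{D(j),i} = \max \begin{cases} s_{I(j-1),i} \cdot \mathrm{weight}_{i}(I(j-1), D(j)) \\ s_{D(j-1),i} \cdot \mathrm{weight}_{i}(D(j-1), D(j)) \\ s_{M(j-1),i} \cdot \mathrm{weight}_{i}(M(j-1), D(j)) \end{cases}$$

For the vertical edges into a deletion state no symbol is emitted, so the weight reduces to the transition probability alone.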