Motif -...
![Page 1: Motif - pbil.univ-lyon1.frpbil.univ-lyon1.fr/members/duret/cours/chile161001/cours/Sagot-cours.pdf · Motif searc h (v ery brie y) What is the b est w a y of represen ting a motif?](https://reader033.fdocuments.us/reader033/viewer/2022041615/5e3a6931de827b0a9f3b571a/html5/thumbnails/1.jpg)
Motif inference
Dispersed repeat motifs, or motifs common to a set of strings
Motif search ≠ motif inference
Search: a known motif + a text ⇒ the positions in the text where the motif is "found"
Inference: a set of properties + a text ⇒ the motifs satisfying the properties
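As a minimal illustration of the search side, assume the "known motif" is a plain word and that "found" means an exact match. The positions are then obtained by comparing the motif with every window of the text, overlapping occurrences included. The function name `find_occurrences` is hypothetical.

```python
def find_occurrences(motif: str, text: str) -> list[int]:
    """Return the 0-based start of every exact (possibly overlapping) occurrence."""
    k = len(motif)
    return [i for i in range(len(text) - k + 1) if text[i:i + k] == motif]


if __name__ == "__main__":
    print(find_occurrences("TATAAT", "GCTATAATGTATAATTATAAT"))  # prints [2, 9, 15]
```

Inference runs the other way: the motif is unknown and only its properties are given, which is what the rest of the talk deals with.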
Motif search (very briefly)
What is the best way of representing a motif?
- pattern
- position weight matrix or profile HMM
  30-40% false negatives, 45-60% false positives
- neural networks: the more examples, the better
Example of a position weight matrix: positions 3 to 9 of the CRP binding site (23 aligned sites)
TTGTGGC TTTTGAT AAGTGTC ATTTGCA CTGTGAG ATGCAAA
GTGTTAA ATTTGAA TTGTGAT ATTTATT ACGTGAT ATGTGAG
TTGTGAG CTGTAAC CTGTGAA TTGTGAC GCCTGAC TTGTGAT
TTGTGAT GTGTGAA CTGTGAC ATGAGAC TTGTGAG
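The frequency matrix of the next slide is obtained by counting, column by column, how often each nucleotide appears among the 23 sites and dividing by 23. A minimal sketch follows; `SITES` is our transcription of the sites on this slide, and the helper names are hypothetical.

```python
from collections import Counter

# The 23 aligned CRP sites (positions 3 to 9), as transcribed from the slide.
SITES = """TTGTGGC TTTTGAT AAGTGTC ATTTGCA CTGTGAG ATGCAAA
GTGTTAA ATTTGAA TTGTGAT ATTTATT ACGTGAT ATGTGAG
TTGTGAG CTGTAAC CTGTGAA TTGTGAC GCCTGAC TTGTGAT
TTGTGAT GTGTGAA CTGTGAC ATGAGAC TTGTGAG""".split()

ALPHABET = "ACGT"


def count_matrix(sites):
    """counts[letter][j] = number of sites having `letter` in column j."""
    width = len(sites[0])
    assert all(len(s) == width for s in sites), "sites must be aligned (same length)"
    columns = [Counter(s[j] for s in sites) for j in range(width)]
    return {a: [col[a] for col in columns] for a in ALPHABET}


def frequency_matrix(sites):
    """Counts divided by the number of sites."""
    n = len(sites)
    return {a: [c / n for c in row] for a, row in count_matrix(sites).items()}


if __name__ == "__main__":
    for letter, row in frequency_matrix(SITES).items():
        print(letter, " ".join(f"{f:.2g}" for f in row))
```

Printed with two significant digits, the rows match the frequency matrix shown on the following slide (for example, the A row is `0.35 0.043 0 0.043 0.13 0.83 0.26`).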
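Corresponding frequency and log-likelihood position weight matrices

Frequency matrix

| | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|
| A | 0.35 | 0.043 | 0 | 0.043 | 0.13 | 0.83 | 0.26 |
| C | 0.17 | 0.087 | 0.043 | 0.043 | 0 | 0.043 | 0.3 |
| G | 0.13 | 0 | 0.78 | 0 | 0.83 | 0.043 | 0.17 |
| T | 0.35 | 0.87 | 0.17 | 0.91 | 0.043 | 0.087 | 0.26 |

Log-likelihood position weight matrix

| | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|
| A | 0.48 | -2.5 | -∞ | -2.5 | -0.94 | 1.7 | 0.061 |
| C | -0.52 | -1.5 | -2.5 | -2.5 | -∞ | -2.5 | 0.28 |
| G | -0.94 | -∞ | 1.6 | -∞ | 1.7 | -2.5 | -0.52 |
| T | 0.48 | 1.8 | -0.52 | 1.9 | -2.5 | -1.5 | 0.061 |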
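The log-likelihood values on this slide agree with log2(f / 0.25), i.e. a uniform background frequency of 1/4 per nucleotide (for instance log2(0.78 / 0.25) ≈ 1.6). That background is our reading of the numbers, not something the slide states. A minimal sketch of building such a matrix and using it for motif search, where a window's score is the sum of one matrix entry per column. The helper names and the threshold are hypothetical.

```python
import math

# Frequency matrix of the slide (rows A, C, G, T; columns = positions 3 to 9).
FREQ = {
    "A": [0.35, 0.043, 0.0, 0.043, 0.13, 0.83, 0.26],
    "C": [0.17, 0.087, 0.043, 0.043, 0.0, 0.043, 0.3],
    "G": [0.13, 0.0, 0.78, 0.0, 0.83, 0.043, 0.17],
    "T": [0.35, 0.87, 0.17, 0.91, 0.043, 0.087, 0.26],
}


def log_likelihood_matrix(freq, background=0.25):
    """log2(f / background) for each entry; a zero frequency gives -inf."""
    return {
        a: [math.log2(f / background) if f > 0 else -math.inf for f in row]
        for a, row in freq.items()
    }


def score(word, pwm):
    """Sum of the matrix entries selected by the letters of `word`, one per column."""
    return sum(pwm[letter][j] for j, letter in enumerate(word))


def scan(text, pwm, threshold):
    """(position, score) for every window of the text scoring at least `threshold`."""
    width = len(pwm["A"])
    hits = []
    for i in range(len(text) - width + 1):
        s = score(text[i:i + width], pwm)
        if s >= threshold:
            hits.append((i, s))
    return hits


if __name__ == "__main__":
    pwm = log_likelihood_matrix(FREQ)
    print(scan("CCTTGTGATCC", pwm, threshold=5.0))
```

Recomputing from the rounded frequencies gives values that can differ from the slide in the last digit (0.49 instead of 0.48 for A at position 3). A single -∞ entry rules out every window that contains the corresponding letter, which is why pseudocounts are often added in practice.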
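Example of a profile HMM
[Figure: a profile HMM with a begin state (b), an end state (e), match states m1 to m3, insert states i0 to i3 and delete states d1 to d3, shown next to the alignment columns it models (e.g. CCCCC, AGDVK, FWYFY).]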
Different HMM or HMM-related architectures
[Figure: state diagrams of four architectures: BLOCKS, META-MEME, profile HMM and HMMER2 "Plan 7".]
Motif inference: set of properties
Main property: motif of interest = "conserved" element
Various possible measures of "conservation":
- conservation at the sequence level?
- conservation at the level of the physico-chemical properties of the nucleotide sequences?
In this talk:
- "letter" conservation: TATAAT, TTGNCA, RFXCP
- physico-chemical conservation: run of pyrimidines, run of hydrophilic amino acids
- or a mixture of both: TA[AT]N[AT]T, [ILMV][ASG]XXC[ILMV]H[FYW]P
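Patterns such as TA[AT]N[AT]T or [ILMV][ASG]XXC[ILMV]H[FYW]P mix exact letters with sets of allowed letters. Assume N (for DNA) or X (for proteins) is a wild card and brackets list the letters allowed at a position. Under those assumptions, one simple way to search for such a pattern is to translate it into a regular expression. A lookahead is used so that overlapping matches are all reported. The helpers are hypothetical.

```python
import re


def pattern_to_regex(pattern: str, wildcard: str) -> re.Pattern:
    """Translate a bracket pattern into a compiled regex.

    Letters match literally, [..] is a set of allowed letters and the
    `wildcard` character ("N" for DNA, "X" for proteins) matches any letter.
    """
    regex = pattern.replace(wildcard, ".")
    # Zero-width lookahead with a capturing group: overlapping hits are all found.
    return re.compile(f"(?=({regex}))")


def find_pattern(pattern: str, text: str, wildcard: str = "N") -> list[tuple[int, str]]:
    """(start, matched word) for every occurrence of the pattern in the text."""
    return [(m.start(), m.group(1)) for m in pattern_to_regex(pattern, wildcard).finditer(text)]


if __name__ == "__main__":
    print(find_pattern("TA[AT]N[AT]T", "GGTATAATGCTAAGTTC"))
    print(find_pattern("[ILMV][ASG]XXC[ILMV]H[FYW]P", "MKLAQRCVHWPE", wildcard="X"))
```

Physico-chemical classes ("run of pyrimidines") fit the same mechanism once each class is written as a bracket set, e.g. [CT] for a pyrimidine.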
"Statistical" conservation measure
Ten aligned words of length 8:
GTTTTTCT CTGCATCT GTGTAACC GGGTATGT TTGTCTCT
GCTTATCT ATGTCTCT GAGTATCA GTGTAGGT GTGAATCA

Count matrix:

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 1 | 7 | 1 | 0 | 2 |
| C | 1 | 1 | 0 | 1 | 2 | 0 | 8 | 1 |
| G | 7 | 1 | 8 | 0 | 0 | 1 | 2 | 0 |
| T | 1 | 7 | 2 | 8 | 1 | 8 | 0 | 7 |
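"Most surprising" sets of words
Measure (relative entropy) of a set of aligned words of length L:

$$I = \sum_{i=1}^{L} \sum_{\alpha \in \Sigma} f_{i\alpha} \log_2 \frac{f_{i\alpha}}{f_\alpha}$$

where $f_{i\alpha}$ is the frequency of letter $\alpha$ at position $i$ of the aligned words and $f_\alpha$ its background frequency. For each position this is a weighted average of the log-likelihood, the weights being the frequencies.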
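A direct transcription of the relative entropy formula, as a sketch with hypothetical names. `words` are the aligned words and `background` maps each letter to its $f_\alpha$. Letters that do not occur at a position contribute nothing, following the usual convention 0 · log 0 = 0.

```python
import math
from collections import Counter


def relative_entropy(words, background):
    """Sum over positions i and letters a of f_ia * log2(f_ia / f_a)."""
    n = len(words)
    total = 0.0
    for column in zip(*words):  # one tuple of letters per position
        for letter, count in Counter(column).items():
            f = count / n
            total += f * math.log2(f / background[letter])
    return total


if __name__ == "__main__":
    uniform = {a: 0.25 for a in "ACGT"}
    print(relative_entropy(["AGTCG", "GAGGC", "TCCAA", "CTATT"], uniform))  # 0.0
    print(relative_entropy(["ATCGC"] * 4, uniform))                       # 10.0
```

The four examples that follow can be checked with it: 0 when every position holds four different letters, 10 for four identical words under uniform frequencies, 20 for all-A words when f_A = 1/16, and about 2.07 when f_A = 3/4.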
Example 1 (each row is one position, read across four aligned words):
A G T C
G A C T
T G C A
C G A T
G C A T
f_α = 1/4 for every α: 0 = 0 + 0 + 0 + 0 + 0
Example 2:
A A A A
T T T T
C C C C
G G G G
C C C C
f_α = 1/4 for every α: 10 = 2 + 2 + 2 + 2 + 2
Example 3:
A A A A
A A A A
A A A A
A A A A
A A A A
f_A = 1/16: 20 = 4 + 4 + 4 + 4 + 4
Example 4:
A A A A
A A A A
A A A A
A A A A
A A A A
f_A = 3/4: about 2, i.e. 5 × log2(4/3) ≈ 0.4 + 0.4 + 0.4 + 0.4 + 0.4
"Deterministic" conservation measure
"Model": GTGTATCT
Each of the ten words differs from the model in exactly 2 positions:
GTTTTTCT (2), CTGCATCT (2), GTGTAACC (2), GGGTATGT (2), TTGTCTCT (2),
GCTTATCT (2), ATGTCTCT (2), GAGTATCA (2), GTGTAGGT (2), GTGAATCA (2)
Model
- A motif: a word written over the same alphabet as the text, or over a degenerate (physico-chemical) alphabet
- A number of specific properties:
  - minimum number of occurrences the motif must have (quorum)
  - for each occurrence, maximum number of differences allowed in relation to the motif (substitutions only, or substitutions and indels)
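A minimal sketch of the two properties for the substitutions-only case (indels are not handled). Here the quorum is taken to be the number of sequences containing at least one occurrence; counting occurrences instead would be an equally valid reading. The helper names are hypothetical. The data are the ten words of the "deterministic" example, whose model is GTGTATCT.

```python
def hamming(u: str, v: str) -> int:
    """Number of substitutions between two words of the same length."""
    return sum(a != b for a, b in zip(u, v))


def occurrences(motif: str, sequence: str, max_subs: int) -> list[int]:
    """Start positions where `motif` occurs with at most `max_subs` substitutions."""
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1)
            if hamming(motif, sequence[i:i + k]) <= max_subs]


def satisfies_quorum(motif: str, sequences: list[str], max_subs: int, quorum: int) -> bool:
    """True if the motif occurs (within max_subs) in at least `quorum` sequences."""
    supporting = sum(1 for s in sequences if occurrences(motif, s, max_subs))
    return supporting >= quorum


WORDS = ["GTTTTTCT", "CTGCATCT", "GTGTAACC", "GGGTATGT", "TTGTCTCT",
         "GCTTATCT", "ATGTCTCT", "GAGTATCA", "GTGTAGGT", "GTGAATCA"]

if __name__ == "__main__":
    print(satisfies_quorum("GTGTATCT", WORDS, max_subs=2, quorum=10))  # True
    print(satisfies_quorum("GTGTATCT", WORDS, max_subs=1, quorum=1))   # False
```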
In fact, the two are not so different.
Except perhaps for:
CTGTATCG CTGATTCG CTGAGACG GTGCATCG CTCGCTCG
CTGCGTCG CTGTCTCG CTGCTTCG CTGTCTCG CTGGATCG

Count matrix:

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| A | 0 | 0 | 0 | 2 | 3 | 1 | 0 | 0 |
| C | 9 | 0 | 1 | 3 | 3 | 0 | 10 | 0 |
| G | 1 | 0 | 9 | 2 | 2 | 0 | 0 | 10 |
| T | 0 | 10 | 0 | 3 | 2 | 9 | 0 | 0 |
Which may, at the current time, perhaps be better captured by:
"Model": CTGNNTCG (N = any nucleotide), with the number of differences of each word:
CTGTATCG (0), CTGATTCG (0), CTGAGACG (1), GTGCATCG (1), CTCGCTCG (1),
CTGCGTCG (0), CTGTCTCG (0), CTGCTTCG (0), CTGTCTCG (0), CTGGATCG (0)
Approaches using a statistical conservation measure
Objective
- Find the set of words that is the "most surprising" possible
- It is an optimisation problem, which in general leads to a unique solution
Algorithm
- The only exact approach: test every set of words and, for each of them, calculate the value of the formula
- This is too time-consuming (O(nN^k)), so heuristics must be used
"Heuristic" approaches
Three main approaches:
- Expectation-Maximization (Lawrence et al., Proteins, 7:41-51, 1990); MEME (Bailey et al., Machine Learn., 21:51-80)
- Gibbs sampling (Lawrence et al., Sci., 262:208-214, 1993)
- Greedy algorithm: (w)consensus (Hertz et al., Bioinfo., 15:563-577, 1999)
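Gibbs sampling
For every position p in the current string, compute the value of the formula, F_p. Choose the occurrence either as the p maximising F_p, or (stochastically) with probability F_p / Σ_p F_p. Then start again (with another string) until convergence.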
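A bare-bones site sampler in the spirit of this slide, written under several assumptions of our own: DNA alphabet, exactly one occurrence per sequence, uniform background, a pseudocount of 0.5, and the set-aside string cycling through the sequences. F_p is taken as the likelihood ratio (motif profile vs background) of the window starting at p. It is a sketch, not Lawrence et al.'s implementation, and the function names are hypothetical.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def profile(sites, pseudo=0.5):
    """Column-wise letter frequencies of the aligned sites, with pseudocounts."""
    total = len(sites) + pseudo * len(ALPHABET)
    return [{a: (Counter(s[j] for s in sites)[a] + pseudo) / total for a in ALPHABET}
            for j in range(len(sites[0]))]


def gibbs_sampler(seqs, width, iterations=500, seed=0):
    """Return one start position per sequence after `iterations` sampling steps."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    background = 0.25  # assumption: uniform background
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for it in range(iterations):
        z = it % len(seqs)  # the string set aside at this step
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, pos)) if k != z]
        prof = profile(others)
        weights = []  # F_p for every start position p in string z
        for p in range(len(seqs[z]) - width + 1):
            f = 1.0
            for j, letter in enumerate(seqs[z][p:p + width]):
                f *= prof[j][letter] / background
            weights.append(f)
        # Stochastic choice: p drawn with probability F_p / sum_p F_p.
        pos[z] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos


if __name__ == "__main__":
    seqs = ["GGCTTGACAGGAT", "ATTTGACATCCAG", "CCCGTTGACATGA", "TTGACAGCGCGCA"]
    print(gibbs_sampler(seqs, width=6, seed=1))
```

Replacing the `rng.choices` line by the index of the largest weight gives the deterministic "max F_p" variant mentioned on the slide. The stochastic choice is what lets the sampler escape poor starting positions.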
Approaches using a "deterministic" conservation measure
Objective
- Given a model (alphabet for the motifs, and properties such as the quorum and the maximum rate of differences allowed), find all motifs that satisfy the properties
- It is an enumeration problem, which in general produces several (often a great number of) solutions
Algorithm
- An exhaustive approach is possible
- The time complexity depends on the properties
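To make "an exhaustive approach is possible" concrete, here is a naive brute-force enumeration. It tries all |Σ|^k words of length k and keeps those meeting the quorum with at most `max_subs` substitutions. This is our own illustration with hypothetical names, not the algorithm used by the tools cited in this talk, which explore the space far more cleverly. The exponential number of candidate words is why the time complexity depends on the properties.

```python
from itertools import product


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def has_occurrence(motif, seq, max_subs):
    """True if `motif` occurs somewhere in `seq` with at most `max_subs` substitutions."""
    k = len(motif)
    return any(hamming(motif, seq[i:i + k]) <= max_subs for i in range(len(seq) - k + 1))


def enumerate_motifs(seqs, k, max_subs, quorum, alphabet="ACGT"):
    """All words of length k over `alphabet` present (within max_subs) in >= quorum sequences."""
    found = []
    for letters in product(alphabet, repeat=k):  # |alphabet|**k candidates
        motif = "".join(letters)
        support = sum(has_occurrence(motif, s, max_subs) for s in seqs)
        if support >= quorum:
            found.append(motif)
    return found


if __name__ == "__main__":
    print(enumerate_motifs(["ACGTT", "TCGTA", "ACGAA"], k=3, max_subs=0, quorum=2))  # ['ACG', 'CGT']
```

Note that, as the slide says, the answer is the complete list of solutions and not a single optimum. With substitutions allowed, the list grows quickly.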
How does the algorithm work?
It does not matter, since the algorithm is exact!
However, this must in general be followed by a STATISTICAL EVALUATION of the models found, to rank them according to how SURPRISING they are given the rest of the sequences.
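One can make the models more complex: motif inference with differences and a non-transitive relation
The alphabet of the models corresponds to groups of amino acids (plus "-", the wild card).
[Figure: overlapping groups of amino acids drawn over M, F, W, Y, H, K, R, D, Q, E, N, T, L, I, V, C, S, A, G and P.]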
Example: a model written over a physico-chemical alphabet, [AST][ILMV]XX[FYW][HKR]X[PG]C
Occurrences:
- 0 differences: AIAGWHAPC
- 1 substitution: ATTAYHSPC
- 1 deletion: SVMLFLPC
One can make the models more complex: structured models
Smile (Marsan et al., JCB, 7:345-362, 2000)
- an ordered collection of p boxes, p maximum rates of differences, and p − 1 intervals of distances (between successive boxes in the collection)
Example, with quorum = 3/4: model TTGACA (e1 = 2), distance d = 17 ± 1, TATAAT (e2 = 1)
Occurrences:
- TTGACT, distance 18, TAAAAT
- TTGACA, distance 17, TATAAA
- TTGCCA ... TATTAT: too far
- TTGTCT, distance 17, TATAAT
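A sketch of checking a two-box structured model against a sequence, substitutions only. We assume the distance is the number of letters between the end of the first box and the start of the second; the slide does not say how it is measured. The function names are hypothetical, and this is a checker for one given model, not Smile's inference algorithm.

```python
def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def box_hits(box, seq, max_subs):
    """Start positions where `box` occurs with at most `max_subs` substitutions."""
    k = len(box)
    return [i for i in range(len(seq) - k + 1) if hamming(box, seq[i:i + k]) <= max_subs]


def two_box_occurrences(seq, box1, e1, box2, e2, dmin, dmax):
    """(i, j) pairs: box1 at i, box2 at j, with dmin <= gap <= dmax.

    Assumption: gap = number of letters between the end of box1 and the start of box2.
    """
    hits = []
    for i in box_hits(box1, seq, e1):
        for j in box_hits(box2, seq, e2):
            gap = j - (i + len(box1))
            if dmin <= gap <= dmax:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Synthetic sequences mimicking the slide's example (filler letters are arbitrary).
    seqs = ["TTGACT" + "G" * 18 + "TAAAAT", "TTGACA" + "G" * 17 + "TATAAA",
            "TTGCCA" + "G" * 25 + "TATTAT", "TTGTCT" + "G" * 17 + "TATAAT"]
    supported = sum(bool(two_box_occurrences(s, "TTGACA", 2, "TATAAT", 1, 16, 18)) for s in seqs)
    print(supported, "of", len(seqs))  # 3 of 4: quorum 3/4 is met
```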
A few applications
"Experimental" set
- Escherichia coli: 441 sequences, 35115 nucleotides
- Bacillus subtilis: 131 sequences, 13099 nucleotides
"Genomic" set
- Escherichia coli: 1062 sequences, 196736 nucleotides
- Bacillus subtilis: 1148 sequences, 226928 nucleotides
"Experimental" set: MEME
Escherichia coli: MOTIF 1, width = 46, sites = 185.2, information content 10.0 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): AAATAAAAGTTGACATTTTTTGGAGTAAATGGTATAATGCGCCCCC
"Experimental" set: MEME
Bacillus subtilis: MOTIF 1, width = 30, sites = 121.0, information content 11.6 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): TTGACATTATTTTAAAAATATGATATAATA
"Experimental" set: combinatorial algorithm (1 box)

| Family | Escherichia coli | Bacillus subtilis |
|---|---|---|
| 1 | ATAATGCGG 34 3.90 24 | TATAATA 94 48.06 32 |
| 1 | TATAATGCGC 23 1.60 19 | GTATAAT 74 34.34 24 |
| 1 | ATAATGCGC 30 5.75 17 | TGTTATA 66 34.96 15 |
| 1 | TGTGTATA 47 15.85 16 | TTTTACA 76 45.96 13 |
| 1 | ACAATGCGC 24 3.85 15 | ATAATAT 82 52.52 13 |
| 2 | GTTGACAC 36 10.80 14 | GTGACA 68 39.76 12 |
| 2 | TCACACTT 36 11.10 13 | TTTACAA 75 48.56 10 |
| 2 | TGACACTT 38 12.35 13 | GTTGAC 66 40.10 10 |
| 2 | GCTGACA 64 31.55 12 | TTGACA 92 66.34 10 |
| 2 | ACACTTAT 41 14.95 12 | ATGATA 10 80.26 10 |
| 2 | TTGACACT 37 13.75 11 | |
| 3 | TTACGCTG 39 12.80 14 | |
| 3 | TGTTACGC 39 14.45 12 | |
| 3 | TTTACGCT 44 17.85 11 | |
| 4 | TTTTTTTTTC 23 5.40 11 | |
| 5 | GCGCCCC 44 18.85 10 | |
"Experimental" set: combinatorial algorithm (2 boxes)
[Figure: χ² as a function of the distance between the two parts of a model (distance intervals from [4,6] to [24,26]), panels a and b for Escherichia coli and Bacillus subtilis. Labelled models: TTATTC_TATAAT, TTGACT_ATAATG, TTGACA_TATAAT, TTGACT_TAAAAT.]
"Genomic" set: MEME
Escherichia coli: MOTIF 1, width = 30, sites = 111.4, information content 12.3 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): AATTTTAAATTGTGATCTAAATCACATATT
"Genomic" set: MEME
Escherichia coli: MOTIF 2, width = 39, sites = 128.7, information content 12.0 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): TAATTAATATACACAATTTTTTTTTTATTTTCATGATTT
"Genomic" set: MEME
Escherichia coli: MOTIF 3, width = 12, sites = 181.4, information content 6.2 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): CGCCCTGTTTGC
"Genomic" set: MEME
Bacillus subtilis: MOTIF 1, width = 12, sites = 308.7, information content 5.6 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): AAAAAAAGGAGG
"Genomic" set: MEME
Bacillus subtilis: MOTIF 2, width = 22, sites = 54.4, information content 8.3 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): GGCAGCAGCCCGTGCAGAGCGA
"Genomic" set: MEME
Bacillus subtilis: MOTIF 3, width = 43, sites = 173.2, information content 11.3 bits
[Figure: MEME information-content profile, 0.0 to 2.2 bits per position.]
Multilevel consensus sequence (top level): TTTTTTCATAATTTTTTTTTTTTTCTTTTTTTATTTAATATTT
"Genomic" set: combinatorial algorithm (1 box)

| Family | Escherichia coli | Bacillus subtilis |
|---|---|---|
| 1 | CCTGAC 573 424.60 39 | TATGATA 627 407.05 91 |
| 1 | CTGACG 587 439.70 38 | TATCATA 615 403.00 84 |
| 1 | CTGACA 701 557.00 36 | TATAATAA 445 277.95 58 |
| 1 | TCCTGA 671 538.70 30 | TTATTATA 439 273.85 57 |
| 1 | CCCTGA 575 446.80 28 | TACTATA 491 325.70 54 |
| 2 | GTCAGG 576 412.10 47 | ATGATAA 617 477.10 36 |
| 2 | TGTCAG 702 555.00 37 | ATGAGAA 500 377.15 29 |
| 2 | CATCAG 711 574.60 32 | TGAGAAA 520 417.85 19 |
| 2 | CGTCAG 580 443.60 32 | |
| 2 | ATCAGG 689 553.30 32 | |
| 3 | TTTTCTG 553 419.20 31 | |
| 3 | TGACAAA 510 405.20 21 | |
| 3 | CTCTTTT 464 348.50 25 | |
| 3 | TTTCTGT 469 357.05 23 | |
| 3 | TTTTCAG 531 416.40 23 | |
| 3 | CTGATTT 498 384.60 23 | |
| 4 | CAGAAAA 539 410.55 29 | CCTTTTT 638 413.05 95 |
| 4 | CTGAAAA 525 407.75 24 | CCTTTTC 496 291.10 84 |
| 4 | GAGAAAA 460 359.50 19 | CTCTTTT 600 391.80 81 |
| 4 | AGATAAA 512 415.60 16 | CTTTTCT 613 410.90 77 |
| 4 | GTGAAAA 509 414.75 16 | CTTTTTC 652 451.20 76 |

etc.
"Genomic" set: combinatorial algorithm (2 boxes)
[Figure: χ² as a function of the distance between the two parts of a model (distance intervals from [4,6] to [24,26]), panels a and b for Escherichia coli and Bacillus subtilis. Labelled models: TTGACA_TATAAT, GAAAAA_TTTTTC, ATTGAC_TATAAT, TTGTGA_TCACAT, TGTGAT_ACATTT, TGTGAT_TCACAT.]
"Noise" in the data
- Approaches using a statistical conservation measure: do not select an occurrence in a sequence if the score obtained is below a given threshold for all p
- Approaches using a deterministic conservation measure: quorum
Variable length of the motifs
- Approaches using a statistical conservation measure
  Problem: the relative entropy is always non-negative (each position adds a term ≥ 0), so it can only increase as the motif gets longer
  Two possible solutions: normalise the entropy by the matrix length, or estimate a "p-value"
- Approaches using a deterministic conservation measure: no problem
Several different families of motifs in the same sequence dataset
- Approaches using a statistical conservation measure: several matrices are kept
- Approaches using a deterministic conservation measure: no problem (on the contrary)
"Too many" motifs found by the approaches using a deterministic conservation measure
A posteriori statistical evaluation of the motifs found (careful! in general different from the statistics employed by Gibbs):
- a priori probability of a given motif (word or set of words)
- the same probability, but estimated by simulation
- application of methods such as Gibbs to the motifs initially found by an exhaustive search
- comparison with what is observed on "counter-example data"
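The first two evaluation strategies can be illustrated for a single word. We assume an i.i.d. (Bernoulli) nucleotide model; a Markov model would be the usual refinement. Under that assumption the probability of the word at a given position is the product of its letter frequencies, and the expected count is that product times the number of positions. The same quantity is then estimated by simulating random sequences. This is an illustration with hypothetical names, not the statistic of any tool cited here.

```python
import random


def word_probability(word, freqs):
    """Probability of `word` starting at a given position under an i.i.d. model."""
    p = 1.0
    for letter in word:
        p *= freqs[letter]
    return p


def expected_count(word, seq_length, freqs):
    """Expected number of (possibly overlapping) occurrences in a random sequence."""
    return max(seq_length - len(word) + 1, 0) * word_probability(word, freqs)


def simulated_mean_count(word, seq_length, freqs, trials=1000, seed=0):
    """Same expectation, estimated by generating random sequences."""
    rng = random.Random(seed)
    letters, weights = zip(*freqs.items())
    k = len(word)
    total = 0
    for _ in range(trials):
        s = "".join(rng.choices(letters, weights=weights, k=seq_length))
        total += sum(s[i:i + k] == word for i in range(seq_length - k + 1))
    return total / trials


if __name__ == "__main__":
    uniform = {a: 0.25 for a in "ACGT"}
    print(expected_count("TATAAT", 1005, uniform))      # 1000 / 4096
    print(simulated_mean_count("AA", 101, uniform))     # close to 100 / 16
```

Comparing the observed count of a motif found by exhaustive search with such an expectation is one way of ranking the "too many" motifs by how surprising they are.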
Other constraints
- Palindromic or repeated motifs: quite a few approaches can take such motifs into account
- Position relative to a biological landmark in the sequence: some approaches (in particular van Helden et al., NAR, 28:1808-1818, 2000) take this into account (during the identification step or when the results are printed)
New approaches
- Inference from a set of phylogenetically related sequences ("phylogenetic footprinting")
  - a simple way of building a set of molecular sequences that is smaller and potentially contains less "noise"
  - motif conservation measures that take the phylogeny of the organisms into account (Blanchette et al., ISMB 2000, http://ismb00.sdsc.edu/technical-program.html)
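Phylogenetic footprinting: main idea
A set of phylogenetically related sequences
[Figure: a phylogenetic tree whose leaves carry the occurrences ...AACG..., ...AATG..., ...TACG... and ...TTCG...; the nodes are labelled with the words TTCG, ATCG, AACG, ATGG and TTCG, and each edge with its number of substitutions (1, 1, 1, 0, 0, 1, 1, 0); total: 5.]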
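The "total" on the tree is a parsimony cost: the minimum number of substitutions needed to explain the occurrences at the leaves. For occurrences that are already chosen, Fitch's small-parsimony algorithm computes this cost position by position. Below is a minimal sketch on a hypothetical tree ((AACG, AATG), (TACG, TTCG)). The slide's own tree topology is not recoverable, so this total need not equal its 5. Finding the occurrences themselves, as in Blanchette et al.'s approach, is a harder problem not shown here.

```python
def fitch_cost(tree):
    """Minimum number of substitutions (Fitch) for aligned words at the leaves.

    `tree` is either a word (a leaf) or a pair (left_subtree, right_subtree).
    """

    def visit(node):
        # Returns (candidate letter set for each position, cost so far).
        if isinstance(node, str):
            return [{c} for c in node], 0
        (left_sets, left_cost), (right_sets, right_cost) = visit(node[0]), visit(node[1])
        sets, cost = [], left_cost + right_cost
        for a, b in zip(left_sets, right_sets):
            common = a & b
            if common:
                sets.append(common)
            else:
                sets.append(a | b)
                cost += 1  # one substitution on one of the two child edges
        return sets, cost

    return visit(tree)[1]


if __name__ == "__main__":
    print(fitch_cost((("AACG", "AATG"), ("TACG", "TTCG"))))  # 3
```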
Phylogenetic footprinting: a hint of the difficulties
Possibly evolutionarily unrelated sequences
[Figure: "our" motifs (TATA, AAAT, AAAT, AATA, AAAT, TAAA) and the "ancestor" of a star tree (but not the most parsimonious one).]
Evolutionarily related sequences (orthologs): the motifs we should seek
The motif is the "ancestor" of the "true" evolutionary tree (under parsimony) for the species concerned.
Orthologs, plus other sequences that are evolutionarily related in a different way (paralogs): which motifs should we seek?
Orthologs, plus paralogs, plus evolutionarily unrelated sequences: how should such "multi-dimensional conservation" be modelled?
A special use of phylogenetic footprinting: gene finding by "pure homology"
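A very elementary view of a eukaryotic gene
[Figure: a gene drawn from 5' to 3', with its 5'UTR, exons separated by introns, start codon, stop codon and 3'UTR. Each intron begins with GT (donor site) and ends with AG (acceptor site). Splicing removes the introns, and the gene yields a protein (gene → protein).]
3 nucleotides (codon) → 1 amino acid
5'UTR and 3'UTR: transcribed into RNA but not translated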
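The gene model above can be sketched in a few lines. The sketch checks the GT...AG rule on each intron, splices the exons together, and reads the result codon by codon from the start codon to the first stop codon. The coordinates, the gene string and the function names are hypothetical. For simplicity, the exons are assumed to contain only coding sequence, so the UTRs are omitted.

```python
STOPS = {"TAA", "TAG", "TGA"}

def splice(gene, exons):
    """Join exon segments (0-based, half-open, in order) after checking that each
    intron between consecutive exons starts with GT (donor) and ends with AG (acceptor)."""
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        intron = gene[end1:start2]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {end1}-{start2} breaks the GT...AG rule")
    return "".join(gene[s:e] for s, e in exons)

def coding_codons(cds):
    """Split a spliced coding sequence into codons (3 nucleotides -> 1 amino acid),
    requiring an ATG start, a whole number of codons and a single final stop codon."""
    if len(cds) % 3 or not cds.startswith("ATG"):
        raise ValueError("not a complete reading frame starting with ATG")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[-1] not in STOPS or any(c in STOPS for c in codons[:-1]):
        raise ValueError("reading frame must end at its first stop codon")
    return codons

# Hypothetical gene: flank + exon 1 + intron (GT...AG) + exon 2 + flank
gene = "CC" + "ATGAA" + "GTCCCAG" + "ATTTTAG" + "CC"
cds = splice(gene, [(2, 7), (14, 21)])
print(cds, coding_codons(cds))  # ATGAAATTTTAG ['ATG', 'AAA', 'TTT', 'TAG']
```

Note that exon 1 ends in the middle of a codon (`AA`). The codon `AAA` is only restored after splicing.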
Gene finding: a few generalities
- Detection by signal:
  - Promoter sequence (very difficult)
  - Splicing (donor and acceptor) sites
  - PolyA signal
- Detection by difference of composition:
  - The most common: different k-mer counts (often k = 6)
- Detection by homology with "known" sequences (stored in a database)
  (Note that homology is "stronger" than similarity)
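Detection by difference of composition can be illustrated with hexamer counts. The sketch below scores a sequence by the log-likelihood ratio of its k-mers under two frequency tables, one trained on coding sequence and one on non-coding sequence. This is a naive model: it treats overlapping k-mers as independent and uses add-one smoothing over all 4^k words. The toy training strings are purely illustrative and are not real coding statistics. The function names are hypothetical.

```python
from collections import Counter
from math import log

def kmer_counts(seq, k=6):
    """Count overlapping k-mers (hexamers by default) in a DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def log_odds(seq, coding, noncoding, k=6, pseudo=1.0):
    """Log-likelihood ratio of seq's k-mers under coding vs non-coding Counters,
    with add-`pseudo` smoothing over all 4**k words. Positive = more coding-like."""
    total_c = sum(coding.values()) + pseudo * 4 ** k
    total_n = sum(noncoding.values()) + pseudo * 4 ** k
    score = 0.0
    for word, count in kmer_counts(seq, k).items():
        p_c = (coding[word] + pseudo) / total_c
        p_n = (noncoding[word] + pseudo) / total_n
        score += count * log(p_c / p_n)
    return score

# Toy training data (illustrative only)
coding = kmer_counts("ATGGCC" * 10)
noncoding = kmer_counts("T" * 60)
print(log_odds("ATGGCCATGGCC", coding, noncoding) > 0)  # True
print(log_odds("TTTTTTTT", coding, noncoding) < 0)      # True
```

Real gene finders use richer models, such as frame-specific Markov chains, trained on large annotated sets.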
A complicated case of gene finding: "orphan genes"
An orphan gene is a gene for which no homology (in the sense here of "strong enough" similarity) has been detected with the sequences stored in the databases.
Main idea around the problem
An orphan may have "parents", that is, other orphans like itself that are its HOMOLOGS (sharing a common ancestor):
- ORTHOLOGS, possibly more similar
- PARALOGS, possibly having diverged more
In both cases, the genes may have different structures (i.e. a different number of exons).
Additional hypothesis (important, but is it always justified?)
Exons are "better conserved" or, more accurately, "differently conserved" than introns, 5'UTRs, 3'UTRs or intergenic sequence.
Flavour of the method
Find the structure by comparing orphans that are homologs, using only the bare essentials as information (method by "pure homology")
Use a dynamic programming approach with a few twists:
- Sequences are composed of coding and non-coding regions
- There are two potential types of "errors":
  - Nature's (gaps, substitutions)
  - Man's (sequencing reading errors = "frameshifts")
Utopia (Blayo et al., accepted in TCS)
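The slide does not give Utopia's recurrence. As a hedged illustration of the two kinds of "errors", here is a toy global-alignment DP on DNA with a hypothetical cost scheme that is not Utopia's:
- A substitution costs `mismatch`.
- Inserting or deleting a whole codon (three nucleotides) costs `codon_gap`.
- Inserting or deleting a single nucleotide, which shifts the reading frame, costs the larger `frameshift` penalty.

Codon gaps are allowed at any nucleotide offset. The sketch does not model the other twist listed on the slide, the split between coding and non-coding regions.

```python
def align_cost(a, b, mismatch=1, codon_gap=2, frameshift=5):
    """Minimum global alignment cost of DNA strings a and b under a toy scheme
    that penalises single-nucleotide indels (frameshifts) more than codon indels."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]  # D[i][j]: cost of aligning a[:i] with b[:j]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i and j:          # match or substitution
                best = min(best, D[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch))
            if i >= 3:           # deletion of a whole codon from a
                best = min(best, D[i - 3][j] + codon_gap)
            if j >= 3:           # insertion of a whole codon from b
                best = min(best, D[i][j - 3] + codon_gap)
            if i:                # single-nucleotide deletion: frameshift
                best = min(best, D[i - 1][j] + frameshift)
            if j:                # single-nucleotide insertion: frameshift
                best = min(best, D[i][j - 1] + frameshift)
            D[i][j] = best
    return D[n][m]

print(align_cost("ATGAAATTT", "ATGTTT"))    # 2: one codon deleted (Nature's kind of gap)
print(align_cost("ATGAAATTT", "ATGAATTT"))  # 5: one nucleotide lost (a frameshift)
```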
Objective
Find the best assembly of exons that satisfies the "bare essentials" gene model, where "best" means the highest-scoring assembly of exons
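The slide does not give the scoring or the full gene model. This sketch assumes that each candidate exon already carries a score, and that the "bare essentials" model only requires chosen exons to be ordered, non-overlapping and separated by at least `min_intron` nucleotides. Under these assumptions, the highest-scoring assembly is a weighted interval-scheduling problem, which the DP below solves. It ignores reading-frame compatibility and splice-site constraints, and `best_assembly` is a hypothetical name.

```python
from bisect import bisect_right

def best_assembly(exons, min_intron=0):
    """Highest-scoring chain of non-overlapping candidate exons.

    exons: list of (start, end, score), 0-based half-open coordinates.
    Returns (total score, chosen exons in genomic order).
    """
    exons = sorted(exons, key=lambda ex: ex[1])
    ends = [ex[1] for ex in exons]
    best = [0.0] * (len(exons) + 1)   # best[k]: best score using only the first k exons
    prev = [None] * (len(exons) + 1)  # prev[k]: predecessor prefix if exon k-1 is used
    for k, (start, _, score) in enumerate(exons, start=1):
        # number of earlier exons ending at least min_intron before this one starts
        p = bisect_right(ends, start - min_intron, 0, k - 1)
        if best[p] + score > best[k - 1]:
            best[k], prev[k] = best[p] + score, p
        else:
            best[k] = best[k - 1]
    chain, k = [], len(exons)
    while k > 0:                       # trace back the chosen exons
        if prev[k] is None:
            k -= 1
        else:
            chain.append(exons[k - 1])
            k = prev[k]
    return best[-1], chain[::-1]

candidates = [(0, 10, 5), (5, 20, 8), (12, 30, 4), (25, 40, 6)]  # hypothetical
print(best_assembly(candidates))                 # (14.0, [(5, 20, 8), (25, 40, 6)])
print(best_assembly(candidates, min_intron=10))  # (11.0, [(0, 10, 5), (25, 40, 6)])
```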
Why a combinatorial, "bare essentials" type of approach?
- It does not substitute for other approaches, statistical ones in particular
- It (perhaps) cannot even compete with them (it was not meant to)
- BUT it is GENERIC
- It allows, indeed obliges, us to rethink our notions of "conservation" and, in particular, of the non-conservation of non-coding regions
- It is independent of what can be learned from specific characteristics of known examples of genes
Preliminary applications (1)
13 ADH protein genes of plants (among them Arabidopsis thaliana), dicotyledons and monocotyledons:
- one paralog and 12 orthologs
- of different gene structures and lengths: 5 to 10 exons, from 942 bp to 1046 bp
[Figure: X12733 (9 exons) compared with 11 related sequences (D84240, M36469, M59082, U36586, U53701, U63931, U65972, X02915, X04050, X54106, Z24755) with pam120, intronIndel 20; one track per sequence plus the 'annot' track. Specificity: 97%, sensitivity: 98%.]
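Sensitivity and specificity
Sensitivity:
$$\text{sensitivity} = \frac{\text{number of correctly predicted items}}{\text{number of actual items}} = \frac{TP}{TP + FN}$$
Specificity:
$$\text{specificity} = \frac{\text{number of correctly predicted items}}{\text{number of predicted items}} = \frac{TP}{TP + FP}$$
where TP, FN and FP are the numbers of true positives, false negatives and false positives. (This is the usual definition of specificity in gene-prediction evaluation; in other fields the same ratio is often called precision.)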
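The slides do not say at which level the "items" are counted. This sketch assumes the nucleotide level, one common choice: a position is a true positive if it is both annotated and predicted as coding. Exons are 0-based, half-open `(start, end)` intervals. The example coordinates and function names are hypothetical.

```python
def positions(intervals):
    """Set of nucleotide positions covered by (start, end) intervals."""
    return {p for start, end in intervals for p in range(start, end)}

def sensitivity_specificity(predicted, actual):
    """Nucleotide-level Sn = TP/(TP+FN) and Sp = TP/(TP+FP)."""
    pred, real = positions(predicted), positions(actual)
    tp = len(pred & real)  # coding positions predicted as coding
    fn = len(real - pred)  # coding positions missed by the prediction
    fp = len(pred - real)  # positions wrongly predicted as coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp

actual = [(0, 100), (200, 300)]    # annotated exons (hypothetical)
predicted = [(0, 90), (200, 320)]  # predicted exons (hypothetical)
sn, sp = sensitivity_specificity(predicted, actual)
print(f"Sn = {sn:.3f}, Sp = {sp:.3f}")  # Sn = 0.950, Sp = 0.905
```

Here 10 annotated positions are missed (FN) and 20 extra positions are predicted (FP). Sensitivity therefore stays higher than specificity.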
Preliminary applications (2)
7 genes from a multigene family in Arabidopsis thaliana, of unknown function, going by the name of MYST:
- of different gene structures and lengths: 13 to 15 exons, from 1848 bp to 2040 bp
[Figure: MYST1 (15 exons) compared with 6 related sequences (MYST2, MYST3, MYST4, MYST5, MYST6, MYST7) with pam120, intronIndel 20; one track per sequence plus the 'annot' track. Specificity: 93%, sensitivity: 97%.]
[Figure: MYST2 (13 exons) compared with 6 related sequences (MYST1, MYST3, MYST4, MYST5, MYST6, MYST7) with pam120, intronIndel 20; one track per sequence plus the 'annot' track. Specificity: 93%, sensitivity: 98%.]
Main idea (currently being explored)
Putting together motif inference and gene detection by multiple comparison