Predicting the 3D Structure of RNA motifs Ali Mokdad – UCSF May 28, 2007
Benasque RNA 2012: RNA Motifs
-
Upload
paul-gardner -
Category
Technology
-
view
158 -
download
2
Transcript of Benasque RNA 2012: RNA Motifs
Hunting RNA motifs
Paul Gardner
July 23, 2012
Paul Gardner Hunting RNA motifs
Classification problems
I What is this?I The second most abundant M. tuberculosis transcriptI No hits to Rfam, no homologs have been identified, no known
function
0e+00
1e+05
2e+05
3e+05
4e+05
5e+05
6e+05
7e+05
Mycobacteria tuberculosis genome: RNA−seq and annotations
Rea
d co
unts
+ve strand−ve strand
CDSsPfam domain
ncRNAsterminators
4099500 4100000 4100500 4101000 4101500 4102000
genome coordinate
Rv3661
HAD
miscrna00343
Rv3662c
Rank Reads Gene
1 5,741K SSU rRNA
2 689K sRNA / RUF?
3 236K RNaseP RNA
4 197K LSU rRNA
5 119K ATP synthase
6 119K HSPX
7 115K tmRNA
8 103K Secreted protein
Arnvig et al.
(2011) PLoS
Pathog.
Paul Gardner Hunting RNA motifs
Classification problems
I What about this?I The seventh most abundant N. gonorrhoeae transcriptI No hits to Rfam, no homologs have been identified, no known
function
0
20000
40000
60000
80000
Neisseria gonorrhoeae genome: RNA−seq and annotations
Rea
d co
unts
+ve strand−ve strand
CDSsPfam domain
ncRNAsterminators
2137500 2138000 2138500 2139000
genome coordinate
NGO2161
Inositol_P
terminator12061
terminator12062 NGO2162
SEC−C
terminator12063
terminator12064
Rank Reads Gene
1 206K RNaseP RNA
2 178K tRNA -Ser
3 130K ZapA 3‘ UTR?
4 120K Phage protein
5‘ UTR?
5 97K tmRNA
6 93K BRO protein
7 81K sRNA?
8 81K Hsp70 protein
9 76K tRNA -Asp
Isabella & Clark(2011) BMCGenomics.
Paul Gardner Hunting RNA motifs
Classification problems
I And this?I The ninth most abundant C. difficile transcriptI No hits to Rfam, no homologs have been identified, no known
function
0
1000
2000
3000
4000
Clostridium difficile genome: RNA−seq and annotations
Rea
d co
unts
+ve strand−ve strand
CDSsPfam domain
ncRNAsterminators
3014000 3014500 3015000 3015500 3016000 3016500 3017000
genome coordinate
toxB toxB toxB
terminator0881
terminator1760 trpS
Rank Reads Gene
1 12K RNaseP RNA
2 12K 6S RNA
3 12K SRP RNA
4 11K ldhA
5 10K tmRNA
6 6K spoVG
7 6K Gly reductase
8 5K slpA
9 4K sRNA / RUF?
10 4K hypothetical prot.
11 4K sRNA / RUF?
Deakin, Lawley
et al. (2012)
Unpublished.
Paul Gardner Hunting RNA motifs
Classification problems
I And this?I The eleventh most abundant C. difficile transcriptI No hits to Rfam, no homologs have been identified, no known
function
0
1000
2000
3000
4000
Clostridium difficile genome: RNA−seq and annotations
Rea
d co
unts
+ve strand−ve strand
CDSsPfam domain
ncRNAsterminators
1760600 1760800 1761000 1761200 1761400 1761600
genome coordinate
feoA2 terminator1637terminator1105 acetyltransferase
Rank Reads Gene
1 12K RNaseP RNA
2 12K 6S RNA
3 12K SRP RNA
4 11K ldhA
5 10K tmRNA
6 6K spoVG
7 6K Gly reductase
8 5K slpA
9 4K sRNA / RUF?
10 4K hypothetical prot.
11 4K sRNA / RUF?
Deakin, Lawley
et al. (2012)
Unpublished.
Paul Gardner Hunting RNA motifs
What is our next big challenge?
I Past:I How many non-coding RNA genes are there?
I Future:I Can we determine the functions, if any, for large sets of given
RNAs?
Paul Gardner Hunting RNA motifs
Will a “periodic table of RNA” be useful?
I My aim is to build an analog to the Periodic Table forclassifying RNA families and motifs, enabling researchers topredict function.
base basepair
R
AU
AGA
U YA
C
AU
U5´
Y
GA
A
R
5´
CU
U CG
G
5´
R
UR R R
Y
5´
R
RGC
GU
R ARA
GCY
5´
RYGGAGY
R RR
RC R
RGA
R
R
5´
CGAAGYY
R
Y
Y RR
GG
GRUGGAG
5´
CCRA
YCC
CRU C
CG
AA
CUYGG
5´
A N Y A G N R A U N C G T loop U turn k t rn1 k t rn2 tw ist
R
CYRG
GAA
C UGA
RC R
UYA
GUA
C
GG
G A R R A5´
Y
YY
AG
U A
G Y RA
G
GAA
RR
R
5´ RY
GR
YAA
Y
C
RY
A YYA
G
RG
AA
YC
5´
R
CAG G A
G
Y5´
AC
AC U
GRY
R
Y G Y R R R R
RYCAR
U
Y
5´RAGC
R
C
GR
A
GY A
YG
YYRGUUY
5´AAAAAGCYRYY R
RYGGYUUUUU
U Y U Y5´RRARR Y
YUUU
U U U Y5´
sar r ic1 sar r ic2 U A A G A N C sr C loop domV term1 term2
R Y Y Y Y
GCGAGCAGACGCARAA
CRCCCRR
Y
R
R
YGGGYG
UUYUGCGUCUGCUCGC
R R R R5´
YUYUC
UCAA
CA
G UGY
UUGR
RRAAY
5´ YY
YY
Y
AUG
A Y GRYY
YYA
AA Y
YY
YY
RR
GR
RYC U G
AU
YY
YR
RR
5´GGGUC UCUCU
GYUAGA CCAGAU
CU G
AGC
CU
G GGA
GCUCUCUGGCUARCUAGGGAACCC
A C5´ UGUAAACAUCCU
Y GACUGGAAGCUG
URR
R Y R YRR
RRGCUUUCAGUCGGAUGUUUGC
5´ U CUUUGGUUAUCUAGCU
GU
AUGAG
UG
YY R
CRU
CAU
AA
AGCUAGAU
ACCGAA
R U5´ CYYR
UCCC
UGAGA
CCCU
AACYUGUGAG
YU
YYYAG
YUU
CACARGU
RGGY
UCUY
GGG
RCY
RGG
5´GCUAAAAGGAACGAUCGUUGUGAUAUGC
GUURR
U UYCGU
UAC
AUAUCACAGUGAUUUUCCUUU AUARC
G C5´ C Y GYG
YYCAUCUUAC
YGR
GCAGUGUUGGA
U
GYY
YRRGY
CUC
UAAYACUGY
CU
GGUAA YGAUGR
CR
Y C G G5´ Y Y Y Y R R G
YACAURCUUCUUUAUAU
CCC AUAY
RA
YRR
RCU
AUGGA
AUGUAAAGAAGUAUGUA
Y Y Y G G Y5´ Y R R YYCRUCAAARUGGYUGUGA
R UGU
Y
R
UCA
UAUCACAGCCACUUUGAUGA
G Y U Y R R5´ Y A A RAAGGGAAYRGUUGCUGUGAURUA
YYY
A Y
YY
Y UYU
AUAUCACAGUGGCUGUUCUUUU
U G G U Y5´ YCRGG
UGAGGUAGUAGGUUGUAUAGUU
RRRR
Y
YYY Y
GG
AGYAACURUACAAYCURCUA
CUUYCCUGR
5´GGCUGGUCCGA
RR
GUAGUGGGU
UA
YRU
YAAYY
Y
Y
UU
RY Y Y Y
UCYC
CCYC
YCACU R
CUR
YAC
UUGACURGCC
U U U5´ Y
YYCUGYRRUGUCGUAR
Y
YYY
YUGARCCRAY
YYYYY
GGG
RGYY
YY
YRG
GYA
G CC
CYY G
GG
AA
RC
AAR
YRRRRYR
CCC A CC
UR
RRY
RYRGGUUCA
RRR
R
YACGGCAYYRYGGR
Y
YY Y5´
YY
RCGRCC
AUA
CRR
R
GRA
RC
AC
CYGRUC
CC
A U CC
GA A
CYCRG
AA
GU U
AA
GC
YYY YG
G C YR R
GUA C U
RG R Y
G RGR
AYC
CUGGGAA
RYRGGUGYYGY
RR
Y5´
G
RUA
GYYYARY
GGY A
R R R CR
YY
RG
YUY A
A
YYRR R
R
YR
RG
G UUC
RARUC
CY
YY
YR5´ R
R
AARYU
CR
YR
RR
RGYYAC
R
RY
GA
GU
R
YY R
YR
CU
C Y
CY
YY
Y G G G A A GG
UC U G A G
A
RG
CCAY
YR
CC
CU
GG
GG
YR Y
YYYYYG
R R
RRGRRR
R Y G R G Y YACCA
GA A A Y
RR Y Y
Y
Y
R
RGYU
UGGAA
RRCUYRYG
GC
YR G Y R R Y U
AGU
CAA
UR
YGRRYRR
Y
YYR
AACYC
RA
UUCAGAC
UA
UCU
Y
Y
5´
T R I T I R E SE C I S m ir -T A R m ir -30 m ir -9 l in -4 m ir -5 m ir -8 m ir -1 m ir -2 m ir -6 let -7 Y R N A 6S 5S tR N A R N aseP
AU
RR
GR
YAGGY
AUUGAA
CU
GU
A U U G U GC
R C C UU
GC
AU
AR
AG
CU
AA
AG
CA
CU
AA
AA
AG
GA
GU
AA
5´
AGUCAUGA
UY
GCUAUUC
YY Y
AAAUAGU
GA
UUGUGAU
AGCG
AUGCGG
YGUGU
UG
CGCACR
YCGYAY
CGC
G C U5´
AG
AG
GA
AR
CR
GGGGCC
AY
GCAG
AA GCGUUC
AC G UC G C G G C C C C
UG
UCAGAUUCR
GU R A A U C U GC
GA
AU
UC
UG
CU
5´ G A U ACAUAGGA
ACCU
CCUC A
AAGGA
UUCUAUG
G A C AGUCGAUGCAGGGAG
G
G A CRR
CUCCCUGCAUCGGC
G A U U U U5´ AC
GRR
GU
R RA
RU
G
C
GA U A A Y A Y
AA
UAA
UGAAAU
UCCUC
UU U G A CGGCCAAUA
GC G
AUAUUGGCC
AUU
UUU
UU
5´ RYCUUUAG
CGGG
YU
R
RR
UY A R U CURGYY
GGYGU U U
CGCC
GRCY Y
U
RCY
YUGA
YRY
5´ RY
YR
YY
CCGUGGU
GA
UUUGRY
CG
GC
CG G C U U G C
AG C C A C G
UU
AA
AY
AA
UC
GC
UA
AA
RA
GG
CC
GR
GG
RR
R5´
GUCGRR
U
Y Y C ACU
G A U G AG U C Y
U R
ARGACG
AAA
C
5´ Y Y R
AUYU
AAARA
AACA G C
U U UC A A
G U G CCU U U Y U G
C A GUU
YYYCARGAGCGC
AAGAUR
G R U A5´RYGGY
Y GYUUGCCAUACGCC
CY
Y Y YYC
GGC A
GGUAUGGAARCA
CCCY
C G Y A CGACUGGY
YC G
GAC
AC
Y GYC
GUCC
CG
CCAG AUC
5´ CACAUCAG
AU U U
CCUGGUGU
A A CGAAUUUUCAAGUG
CU U C
UUGCAUAAGCAA GUUU
RAUCCCG
CYC
CY Y
CGRG
YCGGGAU
U U5´ A U GGAGAC
AUGGCR
UA
A AG C C AG A RA
G U R AGA
AC R U A A C
YU
AGAC
UR
UACUUGAAC
UG
AUUYRC
AUCUC
A U U U U5´
GCR
CYG
CA
A AAU
CRGRYGC
C G G G AU UGG
YAYCCCGRA
Y
RRRRYR
A R C G CY
GCGYUUUUUU
5´
Y U R C G U G A C G A A G CGCG
CG
CAA
AGU
GG A C AAUA
AAG
CCU
R A G CRUYR
AGUAG
UCG YC
AGACGCCGG
UU A A
GCCGGCGUU
U U U U5´ YR
YA
C
G
UR
YCY
GU
UR
UR G
YCCGGUU
GCU
UU
G GUC
GGUGA
CCGGR
R R RRAGCCCRC
UU
GGUGGGYUU
U U U5´
G
G
YCRGCYC
RCC C C C
CR
GRGCYGR
C
CG A C G G C C C C C G C
U CCCC
CCYGGCGGGGGYCGUC
CCYY
5´
U U G G C G A U R UUUUUGGU
U GGAAUG
UAGUGYYY
UU A
R C A C U AA A C
G C U GC
C A C AAAUA
ACCUGU
CAGUUAUUUCA
YCAAAA
A U A A A5´
RY
YR
YU
GCCCUC
Y G G G CG
UU
UC
CU
CC
CU
AG
AC
UU
GGCYYYY
R R G G C CU
UU
UU
UU
UY
YY
5´
SA M V symR C P E B 3 F inP sroB msr SA M a H H 3 V mntn3 l ivK D srA C A E SA R isrK sroD isrB 6C r spL suhB
UYGC
AUCCGCYAA
YCGGUY
A G C C GU G UCG C GG A A G
GUUY Y
YA
AC
CA G C U
R YY U Y Y G R
AACRRAG
RRAGGUG
AGCG
5´
UGAAAGACG
CG
CAUUUG
UU A U C A U C
AU
CC C
UGU Y
CA
GAG
AU
GY
AAU
UU
GG
CC
AC A
GY
RY
GU
GG
CC
UUUUC
5´ * UUCUACUGACU
CU
UUUAAA
AUAA
U UAUUCAUU
GGA
G G U UUAA
UAUGAAUA U
AA A G G A U G A G C
A U AUAG
AAG
CGUUUG
CUCYUU
GUUAGA
UC
RGUUAGUAGGA
A5´G A U U U
GGURRCUGCGCU
CU
U C UA
AGCCAGUUACC
CGGUUCAAARA
UUG C C
AGCUU
YGAACC
U UCGAAAAACCACCU
Y CR
RGGUGGUUU UUUCG
U5´R R R R R R R R
CUCRUAU
AAYYYCRRRAA
UAU G G
Y Y Y G R R AG
U U U C UACC R R G Y R C C
GU
AAAYRYYYG
ACU
AYGAG
RR R5´
CGGCAUC
CCCAU
UA C C
UAUGG A
CA
CGGUGCCG
C A R G C U C U G G R AG UUC
GUYCCRGAGYYUGYYGGAARGGUUUUCCGUGUCCAG
5´
R
R
YGG
ARGCRR
UG
A RYRY
YYYU
YAUY
U G G G CACYU
GRR
RYRY
GGA
GCYA
G U R GUGC
AACCG
RCCR
YR
RR
5´
GUUGUAAC
UA
UGUUGCARY
A R A C G AGAACCGAG
UAUA
GU
UC
AU
GGGRU Y A
CA
UG
AA
UU G U U
UAACURUC
C UC
UGGAU U
CCC
GUCCAU
GRCAGUCGGUUC
5´
CU
UACUGAGAGCAC
AA
AGU
UUCCCGUGC
CA A C
AG G G A G U G U U
AU A
AC G G U
UU
AU
UAGUCUGGAG
AC G GC A G A C U A
UC
CU
CU
UCCCGGU
CCCC
U A UG C C G G G
UU
UU
UU
UU
AU
GU
C5´ UU
RG
RY
UY
RC
CU
GAAU
GUGACU
A U C A C U U CA
AA
CR
RY
GR
GY
AA
CC
UC
AG
UA
UC
AU
CR
YR
GA
GY
UAAACCCUCGCCGCC
UG A C G G Y G A G G G U U U
UC
UU
UU
GG
R5´ U G U A A A A A A C A U Y A U U U
AGCGUGAYU
UU
CUAUCAAC
AGC U A A C
AAUUGUUA
UUACU
G CCUAA
YGYUCAU
AA G G G U A A
UUUUAA
AA
AAGGG C
GAUA
AAA
A ACGAUUG G GG
GAUGAGA
YA
UGAAC
GCU
C A A G C A5´
C C C A G A G G U A U U G A UUGGUGA
U RRCAY
YU C U
RUGYUY
A
UUY A
UUR
CACCA
A C C U G C G C RGAUGCGCAGGU
UUUUUUU
5´
ARR
R Y YY
YY
AA
UR
YC
AA
CY
UU
UA
GC
GC
AC
GGCUCUYY
A A G A G C CA
UU
YC
CC
UAGRCCAAAC
AGGA
A U YG U U U G G Y C U
UU
UU
UU
5´
GGGCARGAUA
UG
UG
AA
GUR
GCY
AC
C
GCA
A GC
YG
RU
A
CY
CU
UC
AC Y
Y Y C CUUA U U
CG C
U
YGC
UCAAC
GGR
AUCYUGCUC
U G C G A G G C Y5´
GU
GC
RR
YC
YR
AU
UY
YR
GYYGYGCCY
RYRARAAC
AU
CA
YA A R A
UA
CG G C R C R R C
CA
CR
AU
UU
CC
CU
GGUG
UUG
GCGCAGU
AU U CG C G C A C C
CC
GG
UC
UA
CC
5´
Y
UUYRYURRUUU
YAUCA
RAY
C U GUU
UGAURRAAGYUARYGAR
R Y Y C A Y UAAC
RGCU
YUYGC
Y G
GCY Y G
AC
CCGAG
RYY
GUU
U U U U U5´
RA
CG
UU
CA
YCCYYY
R G G RC
GC
AY
RA
YCARRYCAYGG
AA
C G GG G
RY Y U G R R
5´
sucA SraD sxy R N A I P ur ine SA M -C hl cd iG M P 2 A nt i -Q G adY rnk ldr P r fA O mrA -B R yeB t raJ 2 SraH 23Smeth D S-pep
U U C G G C C Y CGCRRCG
YU U Y
UY
CGYYGC
C C U C U G C A YGCCGU CGCCG
ACGCAY
UCC
YAUUC
GA A Y Y G U
GCGAUC C U
GUCGC C
YUC
CU
GCGGCGCGGC
5´ CGYRGC
GC
UUGU
UA U U
URYY
G C UG
UG
UAG U G U
C
G
U
CY
YR A R Y Y R G R R Y Y Y
AAACCCCGCCY
UU Y
GGCGGGGUUUU
G C U U U U U5´ ** CUUACCGGAGGY
RUAUGGACC
CU
G A UCC C A
CY C C UCUCCC
C GA
UGG
AG
AAU
YYYU
UUCCGGUAAG
C C Y G Y C U Y YRCUGYYUUAC
CG
G UGY
GUAAGGCAGU
G A C G U Y U5´GGRAG
RYR
YCU
GGU G R
Y
CG
GC
UU
C A AA
CC
GR
Y GR
RG
YR
Y
YY
Y
GGYRGG UU
CGAY
UCCYRY
Y
CUYCC
5´ UGACCCU
U
UA R
CCR
AGGGUCA
C C U A G C C A A C U G A C GUUGU
UA
GUGAAY
YY A
UGUUCAC A
RAUAR
GCCAAUCGC
UUU
GCGRUUGG
C U U U U U U U U U5´ C U U A A URAACAA
GA
AAACYA
AR C GUACYUUC
CY C
CUGA
G UU
CAGGC
UGGAAUGCGC A C
AG C U R
A U U G U U G A U AA G G G C
UACUC
AUACCGACAAGC
CAGUGA
AG
CG
AU
GAAU
GU
CGG
UUC
C A C5´
RUYY
RCU
GAYG
A GUCC
CA
A AUA
GGACG
AAA C G C
GCGUCY
GRAU
5´ CUCCAUGU
AUCUU
U GGGA
CCUGUC
AG
C UGU
GGCAG U
CUCCC UU
CCU
AGCC
AUGGA
A G A G C A U A U U C UUGUU
UA
UUGGCAA
AG
CUG
UC
ACCA
UU
U RAU
UGGU
AU
CAGA U U
CU
GACUUGC ACAA
GU
AACA
U U C5´C Y G G U U G
GUGG
CGCACU
UCCYY
ACGGGC
GGUGU RUYAC
G Y R Y U R Y R R Y A G A R R R A Y A C CAGCCCGCY
RR
R AGCGGGCU
U U U U U5´
GUCAUAC
U A CG
GU
GCA
AYG
YR R
AA
AGU A
AAC
GAUGAC
C C YARG
AACUCYR
G G U AA A A
URCR
UAUCAAAAUGYAAAAUUG
UY U G A C C U G G G
R U
YY
UCCGGGUYRGYUYUUUU
5´
U R U G C U A A C U R R R A A YGUUGY
A U
RYAAC
CCUUGRYGC
UUA
U YCCUU
URYCAAG
C A U A U U A Y A
RCG
RU
CGYY
A A A G G A G A A A U G5´U C R A A A G A A C A
UGAAAU
GGAGGAGAAAUU
ACAG
C A A U U UAU
C ARC U
GA
AA
UUA
UAG
GU
GUA
G AC
AC A
UG
UCAG
C R G UGGAA
ACAGU
UU
C U A UCA A A A UU A A A
GUA
UUUAG
AGAUUUUC
CUC A
AAUUUCA
A A U5´
AC
AG
GGUARGGRYYYYYUUR U R R R R R Y C C U U A C C G G
RU
UU
CUC
AARUYGGRGYA
AA Y C C G R U U G RA
RU
AU
AR
AG
GA
RG
5´ CG
YG
UU
AUAUGCCU U U A
UU G U
CA
CA
RU
UY
UU
UU
UY
YGYUGR
YC
AUUG
GY
AY
YA U U R A U
UY
C C A G CR
AU
AA
AY
GAC
AAGCCCGAACRY U G U U C G G G C U U U U
UU
UU
RR
UY
A5´
Y Y Y AUGGYGG
Y
GR
GGGR
RCCUU
YG GG Y
YGCCGGUU
CCY
Y R
CCGGU Y U RCCA
ACCC
YY
R
CYRCCA
C C Y5´ AU
GG
AY
RU
GCGCAGG
A A G C G C RA
AG
AC
AR
AC
AG
GG
AC
AC
RY
AG
GR
ACC
C G G AU
GG
YG
GR
RY
AG
GA
UG
UC
AG
GR
AA
CA
GU
CU
GCAAAGCCCCGCYY
Y G G C G G G G U U U U5´
P s-R ho rnk ps M gsens tR N A S Q r r isrC H H 1 SN R 24 T rp ldr greA preQ 12 H A R 1F T ermL eu M icC C 4 R smY R ibosome
Paul Gardner Hunting RNA motifs
What is a motif?
Escher, MC (1938) Sky and Water 1.
I Wiktionary definitions for motif:I A recurring or dominant element; a
theme.
I For the purposes of this work:I an RNA motif is a recurring RNA
sequence and/or secondary structurefound within larger structures thatcan be modelled by either acovariance model or a profile HMM.
Paul Gardner Hunting RNA motifs
What is a motif?
I The kink-turn RNA motifI Found in rRNA, riboswitches, RNase P, snoRNAs, ...I An asymmetric internal loopI Causes a sharp kink between two flanking helical regions
Cruz & Westhof (2011) Sequence-based identification of 3D structural modules in RNA with RMDetect. NatureMethods.
Paul Gardner Hunting RNA motifs
Can we build a seed alignment and CM resource of thesemotifs?
I For the hairpin motifs?I YES
I For the internal loopmotifs?
I MAYBE
I For the sequence motifs?I Not certain yet
GNRA
Y
GA
A
R
5´
k-turn-1
RYGGAGY
R RR
RC R
RGA
R
R
5´
k-turn-2
CGAAGYY
R
Y
Y RR
GG
GRUGGAG
5´
Doshi et al. (2004) Evaluation of the suitability of free-energy minimization using nearest-neighbor energy pa-rameters for RNA secondary structure prediction. BMCBioinformatics.
Paul Gardner Hunting RNA motifs
The RNA Motifs: RMfamANYA
R
AU
A
GA
U YA
C
AU
U5´
GNRA
Y
GA
A
R
5´
T-loop
R
UR R R
Y
5´
UNCG
CU
U CG
G
5´
U-turn
R
RGC
GU
R ARA
GCY
5´
k-turn-1
RYGGAGY
R RR
RC R
RGA
R
R
5´
k-turn-2
CGAAGYY
R
Y
Y RR
GG
GRUGGAG
5´
pK-turn
A GGRR
RY
RR
Y
R C YC
G
YY
RY
YYC
5´
sarcin-ricin-1
RACCYRRYGAAC
UGA
AR C
AUCU
CA GUAR
YRG AGG
A R A A5´
sarcin-ricin-2
Y
Y
AGUA
G Y RA
G
GAA
R
5´
tandem-GA
G CRRUGA
RRRR
AYA
AG
A
RARYRY
YYYRGAG
5´
twist_up
CCGA
YCC
CAU C
CG
AA
CUCGG
5´
UAA_GAN
RY
GR
YAA
Y
C
RY
A Y
YA
G
RG
AA
YC
5´
C-loop
GA
CAC U
G
Y
R
Y R Y R RR
R
CAR
U
Y
5´
Csr-Rsm
R
CAG G A
G
Y5´
domain-V
RAGC
R
C
GR
A
GY A
YG
YYRGUUY
5´
SRP_S_domain
R
GRAC Y
YRY
CAGG
Y
GR A
A
R
AGC
AGYR
AA
Y
5´
vapC_target
A U A UYG
R
RC
RC
C
R
GYCGC RY
Y
G Y C5´
terminator1
AAAAAGCYRYY R
RYGGYUUUUU
U Y U Y5´
terminator2
RRARR Y
YUUU
U U U Y5´
TRIT
R Y Y Y Y
GCGAGCAGACGCARAA
CRCCCRR
Y
R
R
YGGGYG
UUYUGCGUCUGCUCGC
R R R R5´
R R A R G G R G R R Y A U G R R5´
R R G G R R R Y A U G A R R5´
A A G G R R A U G R5´
Paul Gardner Hunting RNA motifs
Breaking Rules
I Prefer motifs found inmultiple families &/ormultiple locations in thesame family
I Aligning analogousstructures
I Motifs are allowed tooverlap
I Models are non-specificI High false-positive rates
I Sequences are not edited,spliced or otherwiseinterfered with
I Sequence sources arediverse (PDB, EMBL &publications)
Paul Gardner Hunting RNA motifs
An example entry# STOCKHOLM 1.0
#=GF ID twist_up#=GF AU Gardner PP#=GF SE Gardner PP#=GF SS Published; PMID:21976732#=GF GA 19.00#=GF DR URL;http://rnafrabase.cs.put.poznan.pl/?act=pdbdetails&id=1S72#=GF RN [1]#=GF RM 21976732#=GF RT Clustering RNA structural motifs in ribosomal RNAs using#=GF RT secondary structural alignment.#=GF RA Zhong C, Zhang S#=GF RL Nucleic Acids Res. 2012;40:1307-17.
2zkr_Y/27-46 CCGAUCUCGUC-UGAUCUCGG#=GR 2zkr_Y/27-46 SS <<<<---<_____>--->>>>2gv3_A/3-20 CCAGACUC---CCGAAUCUGG#=GR 2gv3_A/3-20 SS (((((.(.---....))))))3iz9_C/29-48 CGGAUCCCGUC-AGAACUCCG#=GR 3iz9_C/29-48 SS ((((...(...-.)...))))3bbo_B/31-50 CCAA-UCCAUCCCGAACUUGG#=GR 3bbo_B/31-50 SS .(.....<.....>.....).1nkw_9/33-53 CCCACCCCAUGCCGAACUGGG#=GR 1nkw_9/33-53 SS (((...............)))1c2x_C/31-51 CUGACCCCAUGCCGAACUCAG#=GR 1c2x_C/31-51 SS ((((...<.....>...))))1giy_B/33-53 CCGUUCCCAUUCCGAACACGG1S72_9/29-50 CCGUACCCAUCCCGAACACGG#=GR 1S72_9/29-50 SS ((((...(.....)...))))#=GR 1S72_9/29-50 MT ...xxxxx.....xxxxx...#=GC SS_cons ((((...(.....)...))))//
Paul Gardner Hunting RNA motifs
An example entry# STOCKHOLM 1.0
#=GF ID twist_up#=GF AU Gardner PP#=GF SE Gardner PP#=GF SS Published; PMID:21976732#=GF GA 19.00#=GF DR URL;http://rnafrabase.cs.put.poznan.pl/?act=pdbdetails&id=1S72#=GF RN [1]#=GF RM 21976732#=GF RT Clustering RNA structural motifs in ribosomal RNAs using#=GF RT secondary structural alignment.#=GF RA Zhong C, Zhang S#=GF RL Nucleic Acids Res. 2012;40:1307-17.
2zkr_Y/27-46 CCGAUCUCGUC-UGAUCUCGG#=GR 2zkr_Y/27-46 SS <<<<---<_____>--->>>>2gv3_A/3-20 CCAGACUC---CCGAAUCUGG#=GR 2gv3_A/3-20 SS (((((.(.---....))))))3iz9_C/29-48 CGGAUCCCGUC-AGAACUCCG#=GR 3iz9_C/29-48 SS ((((...(...-.)...))))3bbo_B/31-50 CCAA-UCCAUCCCGAACUUGG#=GR 3bbo_B/31-50 SS .(.....<.....>.....).1nkw_9/33-53 CCCACCCCAUGCCGAACUGGG#=GR 1nkw_9/33-53 SS (((...............)))1c2x_C/31-51 CUGACCCCAUGCCGAACUCAG#=GR 1c2x_C/31-51 SS ((((...<.....>...))))1giy_B/33-53 CCGUUCCCAUUCCGAACACGG1S72_9/29-50 CCGUACCCAUCCCGAACACGG#=GR 1S72_9/29-50 SS ((((...(.....)...))))#=GR 1S72_9/29-50 MT ...xxxxx.....xxxxx...#=GC SS_cons ((((...(.....)...))))//
Paul Gardner Hunting RNA motifs
An example annotated Rfam alignment
# STOCKHOLM 1.0#=GF ID 5S_rRNA#=GF AC RF00001#...#=GF WK 5S_ribosomal_RNA#=GF SQ 712#=GF MT.G GNRA#=GF MT.s sarcin-ricin-2#=GF MT.t twist_up
X62858.1/62-115 GCUGCG-U-UGUGGGUGUGUACUGCGGUUUU-UUG-CUGUGGGAA--GCCCACUUCA-CUGX01588.1/64-115 UCACGU--UAGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X01588.1/64-115 MT.0 .......................GGGGGGGGGGGGGGGGG.....................X05870.1/363-414 UCACGU--UGGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X05870.1/363-414 MT.0 .......................GGGGGGGGGGGGGGGGG.....................L27170.1/63-116 CCAGCG--UCCGGCA--AGUACUGGAGUGCGCGAGCCUCUGGGAA-AUCCGGU-UCG-CCG#=GR L27170.1/63-116 MT.0 ..ssss..sssssss..ssssssssssssssssssssssssssss.sssssss.sss.ss.L27163.1/61-115 CCAGCGUUCCGGGGA---GUACUGGAGUGCGCGACCCUCUGGGAA--ACCGGGUUCG-CCG#=GR L27163.1/61-115 SS ...(((((((((.........)))))))))(((((((...((......))))).))).)..#=GR L27163.1/61-115 MT.0 .................................GGGGGGGGGGGG..GGGGGGG.......#=GR L27163.1/61-115 MT.1 ..sssssssssssss...sssssssssssssssssssssssssss..ssssssssss.ss.L27343.1/60-114 CCAGCGAACCAGCUA---GUACUAGAGUGGGAGACCCUCUGGGAG-CGCUGGU-UCG-CCG#=GR L27343.1/60-114 MT.0 .......................GGGGGGGGGGGGGGGGG.....................#=GR L27343.1/60-114 MT.1 ....sssssssssss...sssssssssssssssssssssssssss.sssssss.sss.s..M33891.1/66-117 CCAGCG--CCGGAGA---GUACUGCGC-GGGCAACCGCGUGGGAG--GCGAGG-CCGCUCG#=GR M33891.1/66-117 MT.0 .......................GGGG.GGGGGGGGGGGG.....................X01484.1/62-114 GUCAGG--CCCAGUU--AGUACUGAGGUGGGCGACCACUUGGGAA-CACUGGG-U-G-CUG#=GR X01484.1/62-114 MT.0 ........................GGGGGGGGGGGGGGGG.....................#=GR X01484.1/62-114 MT.1 ...sss..sssssss..ssssssssssssssssssssssssssss.sssssss.s.s.ss.#=GC SS_cons ::::::::::<-<<--------<<-----<<____>>----->>----->>->::::::::#=GC RF cCaGcG..cccggga...GUAcUggGGUggGcgAcCcucUGGgAA.CaCccgg.uCG.CUG//
Y
RCGRC
CAUA
CR
GR
RCACC
G
UCCCR U Y
C
GAC
C
GRA
GU Y
AA
GC
YG
G C Y R
GUA C U
R R YGGG
GACY
YUGGG
AAY
RGGY
GYYGY
R
5'
Paul Gardner Hunting RNA motifs
An example annotated Rfam alignment
# STOCKHOLM 1.0#=GF ID 5S_rRNA#=GF AC RF00001#...#=GF WK 5S_ribosomal_RNA#=GF SQ 712#=GF MT.G GNRA#=GF MT.s sarcin-ricin-2#=GF MT.t twist_up
X62858.1/62-115 GCUGCG-U-UGUGGGUGUGUACUGCGGUUUU-UUG-CUGUGGGAA--GCCCACUUCA-CUGX01588.1/64-115 UCACGU--UAGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X01588.1/64-115 MT.0 .......................GGGGGGGGGGGGGGGGG.....................X05870.1/363-414 UCACGU--UGGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X05870.1/363-414 MT.0 .......................GGGGGGGGGGGGGGGGG.....................L27170.1/63-116 CCAGCG--UCCGGCA--AGUACUGGAGUGCGCGAGCCUCUGGGAA-AUCCGGU-UCG-CCG#=GR L27170.1/63-116 MT.0 ..ssss..sssssss..ssssssssssssssssssssssssssss.sssssss.sss.ss.L27163.1/61-115 CCAGCGUUCCGGGGA---GUACUGGAGUGCGCGACCCUCUGGGAA--ACCGGGUUCG-CCG#=GR L27163.1/61-115 SS ...(((((((((.........)))))))))(((((((...((......))))).))).)..#=GR L27163.1/61-115 MT.0 .................................GGGGGGGGGGGG..GGGGGGG.......#=GR L27163.1/61-115 MT.1 ..sssssssssssss...sssssssssssssssssssssssssss..ssssssssss.ss.L27343.1/60-114 CCAGCGAACCAGCUA---GUACUAGAGUGGGAGACCCUCUGGGAG-CGCUGGU-UCG-CCG#=GR L27343.1/60-114 MT.0 .......................GGGGGGGGGGGGGGGGG.....................#=GR L27343.1/60-114 MT.1 ....sssssssssss...sssssssssssssssssssssssssss.sssssss.sss.s..M33891.1/66-117 CCAGCG--CCGGAGA---GUACUGCGC-GGGCAACCGCGUGGGAG--GCGAGG-CCGCUCG#=GR M33891.1/66-117 MT.0 .......................GGGG.GGGGGGGGGGGG.....................X01484.1/62-114 GUCAGG--CCCAGUU--AGUACUGAGGUGGGCGACCACUUGGGAA-CACUGGG-U-G-CUG#=GR X01484.1/62-114 MT.0 ........................GGGGGGGGGGGGGGGG.....................#=GR X01484.1/62-114 MT.1 ...sss..sssssss..ssssssssssssssssssssssssssss.sssssss.s.s.ss.#=GC SS_cons ::::::::::<-<<--------<<-----<<____>>----->>----->>->::::::::#=GC RF cCaGcG..cccggga...GUAcUggGGUggGcgAcCcucUGGgAA.CaCccgg.uCG.CUG//
Y
RCGRC
CAUA
CR
GR
RCACC
G
UCCCR U Y
C
GAC
C
GRA
GU Y
AA
GC
YG
G C Y R
GUA C U
R R YGGG
GACY
YUGGG
AAY
RGGY
GYYGY
R
5'
Paul Gardner Hunting RNA motifs
Uses for a motif DB (I): Improving a RNA alignments andstructure prediction constraints
Y
R C G R C
C A U A
C R
G R
RC
A C C
G
U C
C C R U Y
C
G A C C
G R A G
U Y A A G C
Y G
G C Y R
G U A C U
R R Y G G G G A C Y
Y U G G G A A Y R G G Y
G Y Y G Y
R
5'
Original Rfammodel
Sarcin-ricinloop
GNRAtetraloop
Refined Rfammodel
Y
R C G G C
C A U A
C R
G R
R C A C C
G
U C C C R U Y
C G A C C
G A A G
U Y A A G C
Y G
G C Y R G U A C U R R G G G
G A C C Y U G G G A A Y R G G Y G
Y Y G Y
R
5'
Paul Gardner Hunting RNA motifs
Uses for a motif DB (II): RNA structural motifs
I The Lysine Riboswitch
R
RGUAGAG
GY
GCRRYYA
AGUA
YUYG
RRR
R
Y C R R UG
AR R R Y
GA A R GGR R R R U Y G C
CGA A
RY
R
RR
Y Y
YY
Y
GYU G
G
YR
YR
Y
G A A
RR
RY
RR
RA
CU
GU C A Y R R
Y
YRUGGA
GGC
UAYY
5´
GAGY
R
RC RGA
5´
RG
RYR A
CY
5´
GR
A
5´
UR
5´
32% K-turn11% U-turn
13% T-loop 21% GNRATetraloop
P1
P2
P3
P4
P5
Paul Gardner Hunting RNA motifs
Uses for a motif DB (II): RNA structural motifs
I The Lysine Riboswitch
Pasteurella multocida
Bacillus sp.
Escherichia coli
Staphylococcus epidermidis
Lactococcus lactis subsp. lactis
Staphylococcus haemolyticusStaphylococcus saprophyticus
Actinobacillus pleuropneumoniaeHaemophilus ducreyi
Bacillus halodurans
Vibrio splendidus
Bacillus subtilis
Serrat ia proteamaculans
Listeria monocytogenes
Vibrio sp.
Vibrio vulnificus
Lactobacil lus plantarum
Mannheimia succiniciproducens
Bacillus cereus
Klebsiella pneumoniae
Lactococcus lactis subsp. cremoris
Listeria innocua
Haemophilus influenzae
Vibrio parahaemolyticus
Clostridium botulinumThermoanaerobacter tengcongensis
Lactobacillus brevis
Thermoanaerobacter sp.
Bacillus licheniformis
Pectobacterium atrosepticum
Mannheimia haemolyt ica
Yersinia frederiksenii
Staphylococcus aureus
Haemophilus somnus
Listeria welshimeri
Proteobacteria
Firmicutes
Pasteurell.
Vibrionales
Enterobac.
Clostridia
Lactobacil.
Bacillales
P2 stem
RYGGAGY
R RR
RC R
RGA
R
R
5´
Kink-turn
R
RGC
GU
R ARA
GCY
5´
U-turn
P4 stem
Y
GA
A
R
5´
GNRA tetrloop
R
UR R R
Y
5´
T-loop
P2 P4
Paul Gardner Hunting RNA motifs
Uses for a motif DB (III): Functionally characterising novelRNAs
Y
Y U
R R
U
Y Y
Y A G
R A
R Y C R C A Y
G G A Y G Y G R Y A G
R
Y G
G C C A Y G G A
Y G G C R Y R U
R R
C G G Y Y
C G U Y
R
R
Y
R
A R
Y Y Y
R
R5'
YRG G A
R
5´
Csr-Rsm binding motif
I TwoAYGGAY sRNA: 216 homologuesfound in Human gut metagenomesequences, γ-Proteobacteria andClostridiales species
I Weinberg Z et al. (2010)”Comparative genomics reveals 104candidate structured RNAs frombacteria, archaea and theirmetagenomes”. Genome Biol.
Paul Gardner Hunting RNA motifs
Csr-Rsm background
Toledo-Arana et al. (2007) Small noncoding RNAs controlling pathogenesis. Current Opinion in Microbiology.
Paul Gardner Hunting RNA motifs
Uses for a motif DB (IV): Identifying and ranking ncRNAsof interest in transcriptome results
0 50 100 150 200
0.00
0.03
0.06
Distribution of motif scores on S. typhi ncRNAs
Sum of motif scores (bits)
Den
sity
Known RNAsRNA−seq ncRNAsShuffled
0 50 100 150 200
0.0
0.4
0.8
CDF of motif scores on S. typhi ncRNAs
Sum of motif scores (bits)
CD
F K−S.test(Known RNA > Shuffled) => P=2.11e−37K−S.test(RNA−seq RNA > Shuffled) => P=1.47e−09
Paul Gardner Hunting RNA motifs
Uses for a motif DB (IV): Identifying and ranking ncRNAsof interest in transcriptome results
0 500 1000 1500 2000
0.00
000.
0010
Distribution of motif scores on S. typhi ncRNAs alignments
Sum of motif scores (bits)
Den
sity
Known RNAsRNA−seq ncRNAsShuffled
0 500 1000 1500 2000
0.0
0.4
0.8
CDF of motif scores on S. typhi ncRNAs alignments
Sum of motif scores (bits)
CD
F
K−S.test(Known RNA > Shuffled) => P=8.15e−06K−S.test(RNA−seq RNA > Shuffled) => P=0.53
Paul Gardner Hunting RNA motifs
80 highest scoring matches between RMfam & Rfam
T−loopTerminator
Csr
−Rsm
Dom
ain−
VGN
RAtandem−GA
k−turn
SRP_S
UN
CG
U−t
urn
twist_up
sarcin−ricin
●
●
●●
●
●
●
●
●●
●
●
tmRNAcyano_tmRNAtRNA−SectRNA
mascRNA−menRNA
RNaseP_bact_a
suhBHis_
leade
r
L10_
lead
er
L20_
lead
er
Phe
_lea
der
Thr_
lead
er
rimP
QrrIRE
S_A
ptho
CsrB
PrrB
_Rsm
Z
TwoAY
GG
AYU6
Intron_gpII
RNaseP_arch
RNaseP_bact_b
GEMM_RNA_motif
alpha_tmRNAIntron_gpI
group−II−D1D4−7group−II−D1D4−6group−II−D1D4−5
group−II−D1D4−3
group−II−D1D4−1
MOCO_RNA_motif
manA
Clostridiales−1
RtT
serC
T−box
SNORA73SA
M
Arch
aea_
SRP
Bac
teria
_sm
all_
SR
PB
acte
ria_l
arge
_SR
PF
ungi
_SR
PM
etaz
oa_S
RP
Plant_S
RP
wcaG
U3
C4
HIV
_FETerm
ite−leu5S_rRNA
SSU_rRNA_bacteria
SSU_rRNA_archaea
SSU_rRNA_eukarya
Cobalamin
FMN
Entero_5_CRE
group−II−D1D4−4
●
●
●
●
●
●
●
●
●●
●●
●●●●●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
● ● ● ● ● ●●
●●
●
●
●
●
●
●
●
●
Paul Gardner Hunting RNA motifs
What about the highly expressed ncRNAs?
U U R R C C U U G U G U G Y R G G U C G A G U A C A A A G G A G C A CGG
AAGCYCGG
RAGGC
CAA GGYCGA
CC
GGAA G A
GAA
GGYUCGAYCUCCCGAY
CCGRGC
YCC
CA G C A C G G R
C CG G
Y A CC C A CG C G
GAGCR
GCCGCR A Y
AA
RGGCARAAGYGU U
GYGGGCCUGCGURAU
U C G R AR
YRRAC GGY
CGACGRCC C UU
UGGGUG
GGG
YU
GCRRCCRRA
RG
UCGYR
R
CGCCGAG
GCCA
CCCA CGC
AACCCRY
YGCACGCUUGG
UAACC
GRG
UCCGUGCURGCGGGCGGYGA
YC
RUY
GGR
CGCCGCCCGCU
UYRYUR
5'
RYGGAGY
R RR
RC R
RGA
R
R
5´
Possible K-turn
RRARR Y
YUUU
U U U Y5´
Possible Rho-independent terminator
Possible Rho-independent terminator
Possible Rho-independent terminator
Possible Rho-independent terminator
Possible Rho-independent terminator
Possible Rho-independent terminator
N_gon_RUF7
Y YGCUG
RCCRCYGCY
UUUA
U C CCUUU
UGGCRGUGCAGC
A A G U A AGACGU
UUUC
C CGCAAUG
UAUU G
RCCAUUCAUACU
U
C CCUU
RGUAUGRAUGGUUU
UU
UUURYR
R
R A A A U G C C G U C U G A A YG
UUCAGACGGCAUUURU
5´
C_dif_RUF9
UCAGAAGGC
UU
GYUUUCUGA
Y U U U U U U A U U C A A A U C A A A A A Y A R G U A G U U A A Y U U U U A Y C U UAUUCUACUU
UC
C C CA
AAGU AAGAAU
A U Y A C Y Y U U RGAURRGC
AGCYYAUC
U U U U U U U A A A5´
C_dif_RUF11
UCAGAAGGC
UU
GYUUUCUGA
Y U U U U U U A U U C A A A U C A A A A A Y A R G U A G U U A A Y U U U U A Y C U UAUUCUACUU
UC
C C CA
AAGU AAGAAU
A U Y A C Y Y U U RGAURRGC
AGCYYAUC
U U U U U U U U A A R5´
M_tub_RUF3
Possible Rho-independent terminator
Low Seq. G+C
RNA RNAcode RNAz M otifs complexity sRNA (gen.)
M .tub. RUF3 No RNA (prob=1.00) Terminator,k-turn No 59% (66%)
N .gon. RUF7 No RNA (prob=0.94) Terminator No 43% (52%)
C.dif. RUF9 ? (P=0.044) RNA (prob=1.00) Terminator No 30% (29%)
C.dif. RUF11 ? (P=0.079) RNA (prob=1.00) Terminator Yes (23/ 170nts) 26% (29%)
Paul Gardner Hunting RNA motifs
What about the highly expressed ncRNAs?
I Are they antisense to any 5‘ UTRs
0 5 10 15 20 25 30
0.0
0.4
0.8
RNAup RNA−RNA energies between sRNAs and 5' UTRs
negenergy (−kcal/mol)
Fre
q.M.tub3 (native)M.tub3 (shuffled)N.gon7 (native)N.gon7 (shuffled)
C.dif9 (native)C.dif9 (shuffled)C.dif11 (native)C.dif11 (shuffled)
0 5 10 15 20 25 30
0.00
0.10
Difference between native and shuffled CDFs
negenergy (−kcal/mol)
∆CD
F
K−S.test(M.tub3 RNA ~ Shuffled) => P=3.70e−07
K−S.test(N.gon7 RNA ~ Shuffled) => P=0.01
K−S.test(C.dif9 RNA ~ Shuffled) => P<2.2e−16
K−S.test(C.dif11 RNA ~ Shuffled) => P<2.2e−16
Paul Gardner Hunting RNA motifs
What about the highly expressed ncRNAs?
I Are they antisense to any 3‘ UTRs
0 5 10 15 20
0.0
0.4
0.8
RNAup RNA−RNA energies between sRNAs and 3' UTRs
negenergy (−kcal/mol)
Fre
q.M.tub3 (native)M.tub3 (shuffled)N.gon7 (native)N.gon7 (shuffled)
C.dif9 (native)C.dif9 (shuffled)C.dif11 (native)C.dif11 (shuffled)
0 5 10 15 20
0.00
0.10
Difference between native and shuffled CDFs
negenergy (−kcal/mol)
∆CD
F
K−S.test(M.tub3 RNA ~ Shuffled) => P=5.13e−09
K−S.test(N.gon7 RNA ~ Shuffled) => P=8.21e−06
K−S.test(C.dif9 RNA ~ Shuffled) => P<2.2e−16
K−S.test(C.dif11 RNA ~ Shuffled) => P<2.2e−16
Paul Gardner Hunting RNA motifs
Discussion points
I A useful curation tool for building alignments and structures
I Can provide testable function hypotheses, sometimes
I Limited use outside of alignments based on the SalmonellaTyphi results
I Even in alignments, false-positives & negatives are common
I More motifs (Terminators, Shine-Dalgarno, ...)
I Limit trivial motifs and prevent rediscovery of old motifs
I Snapshots of the data are available fromhttps://github.com/ppgardne/RMfam
I Can anyone here do better on MTB3, NGON7, CDIF9 &CDIF11?
Paul Gardner Hunting RNA motifs
Thanks!
I Lars Barquist, AlexBateman & Rob Knight
PPG is supported by a Rutherford Discovery Fellowship from Government funding, administered by the RoyalSociety of New Zealand.
Paul Gardner Hunting RNA motifs
RNA motifs
ANYA
R
AU
A
GA
U YA
C
AU
U5´
GNRA
Y
GA
A
R
5´
UNCG
CU
U CG
G
5´
T-loop
R
UR R R
Y
5´
U-turn
R
RGC
GU
R ARA
GCY
5´
k-turn-1
RYGGAGY
R RR
RC R
RGA
R
R
5´
k-turn-2
CGAAGYY
R
Y
Y RR
GG
GRUGGAG
5´
sarcin-ricin-1
RACCYRRYGAAC
UGA
AR C
AUCU
CA GUAR
YRG AGG
A R A A5´
sarcin-ricin-2
Y
Y
AGUA
G Y RA
G
GAA
R
5´
twist_up
CCGA
YCC
CAU C
CG
AA
CUCGG
5´
UAA_GAN
RY
GR
YAA
Y
C
RY
A Y
YA
G
RG
AA
YC
5´
Csr-Rsm
R
CAG G A
G
Y5´
C-loop
GA
CAC U
G
Y
R
Y R Y R RR
R
CAR
U
Y
5´
domain-V
RAGC
R
C
GR
A
GY A
YG
YYRGUUY
5´
terminator1
AAAAAGCYRYY R
RYGGYUUUUU
U Y U Y5´
terminator2
RRARR Y
YUUU
U U U Y5´
TRIT
R Y Y Y Y
GCGAGCAGACGCARAA
CRCCCRR
Y
R
R
YGGGYG
UUYUGCGUCUGCUCGC
R R R R5´
R R A R G G R G R R Y A U G R R5´
R R G G R R R Y A U G A R R5´
A A G G R R A U G R5´
T−loopTerminator
Csr
−Rsm
Dom
ain−
VGN
RAtandem−GA
k−turn
SRP_S
UN
CG
U−t
urn
twist_up
sarcin−ricin
●
●
●●
●
●
●
●
●●
●
●
tmRNAcyano_tmRNAtRNA−SectRNA
mascRNA−menRNA
RNaseP_bact_a
suhBHis_
leade
r
L10_
lead
er
L20_
lead
er
Phe
_lea
der
Thr_
lead
er
rimP
QrrIRE
S_A
ptho
CsrB
PrrB
_Rsm
Z
TwoAY
GG
AYU6
Intron_gpII
RNaseP_arch
RNaseP_bact_b
GEMM_RNA_motif
alpha_tmRNAIntron_gpI
group−II−D1D4−7group−II−D1D4−6group−II−D1D4−5
group−II−D1D4−3
group−II−D1D4−1
MOCO_RNA_motif
manA
Clostridiales−1
RtT
serC
T−box
SNORA73SA
M
Arch
aea_
SRP
Bac
teria
_sm
all_
SR
PB
acte
ria_l
arge
_SR
PF
ungi
_SR
PM
etaz
oa_S
RP
Plant_S
RP
wcaG
U3
C4
HIV
_FETerm
ite−leu5S_rRNA
SSU_rRNA_bacteria
SSU_rRNA_archaea
SSU_rRNA_eukarya
Cobalamin
FMN
Entero_5_CRE
group−II−D1D4−4
●
●
●
●
●
●
●
●
●●
●●
●●●●●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
● ● ● ● ● ●●
●●
●
●
●
●
●
●
●
●
Paul Gardner Hunting RNA motifs
RNA classification
Backofen et al. (2007) RNAs everywhere: genome-wide annotation of structured RNAs. Journal of ExperimentalZoology.
Paul Gardner Hunting RNA motifs