De Novo Discovery MicroRNA From Small RNA Sequencing Data
8/16/2019 De Novo Discovery MicroRNA From Small RNA Sequencing Data
De novo discovery of microRNA
from small RNA sequencing data
Francisco D. Morón-Duran
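15 September 2015
Final Degree Project for the Technical Engineering Degree in Computer Systems (Enginyeria Tècnica en Informàtica de Sistemes)
Supervisor: Xavier Messeger
Department of Computer Science (Departament en Ciències de la Computació)
External co-supervisor: Victor Moreno
Institut Català d'Oncologia (Catalan Institute of Oncology)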
Table of Contents
Introduction
    About the structure of this document
    A primer on Molecular Biology
    [...]
    Discovery of miRNA from computational approaches
        Discovery by forward genetics, de novo and by homology
        Computational prediction by machine learning
        Identification from small RNA sequencing based on a reference genome
Project Overview
    Objective
    Planning
        Chronological plan
        Economic budget
    Proposed pipeline outline
Project Implementation
    Preprocessing of input FASTQ files
    Read collapsing and representativity filtering
    Alignment strategy
        De Bruijn graph construction and contig assembly
        Seed step: Indexing contigs and sequence k-mers
        Voting step: Sequences voting contig candidates
        Breaking ties: distance between sequences
            A formal definition of distance
    Errors detection
    Reaching a consensus sequence
    Identifying already annotated miRNA
Results
Conclusions and final remarks
References and Bibliography
Image credits
Annex A: Bash script for FASTQ preprocessing
Annex B: Python code for the project
    Main source: main.py
    Module seqs.py
    Module debg.py
    Module poll.py
Annex C: Script for B[...]
Introduction
Computing science is a great tool for understanding the biological questions that have arisen since the molecular biology revolution that took place in the life sciences in the 1950s. The fact that life processes are encoded in genomes containing programs that can be read, modified and suppressed by cells is fascinating.

For a biologist with some computational background, it is not possible to think of DNA and DNA-binding proteins such as polymerases or mismatch repair proteins without drawing an analogy with Turing machines. The convergence between information theory and biology is now so natural that stereotypes about computational scientists and biologists are rapidly changing, as the need for these people to understand each other becomes more evident.

This work aims to be another example of how the synergies between computing and the life sciences can enhance biological discoveries and boost our understanding of life: not only through predictive algorithms that parse a genomic code we do not yet fully understand, but also through computational methods that serve as a magnifying glass or a compass guiding scientists through their biological questions.
About the structure of this document
This document is divided into five main sections. The first is the present introduction, which gives the context in which the work was developed (in A Primer on Molecular Biology), explains the biological problem (in Introducing microRNA), reviews the current state of the art (in Brief history of nucleic acid sequencing) and gives the rationale for this project (in Discovery of miRNA from computational approaches).

The next section (Project Overview) covers the project's main objective and the planning needed to achieve its goals from the chronological and economic points of view. It also outlines a basic schematic of the structure of the designed software, to give a panoramic view of the steps to follow in order to reach these goals.

The Project Implementation section details the main steps outlined in the previous section and explains the strategies used to reach the miRNA candidates.

Finally, Results explains the main results of the software output on a set of biological samples, obtained thanks to the Biomarkers and Susceptibility Unit of the Catalan Institute of Oncology.

An additional Conclusions and final remarks section presents the main difficulties that arose during the development of this project, possible improvements, and thoughts about the decisions taken at the beginning of the project, seen from the perspective given by the results obtained.
A Primer on Molecular Biology
As in every computing project, before going into the abstraction of problem solving we need to understand the nature of the questions that can arise in the process. It is therefore necessary to collect some information about the field in which the problem lies. The aim of this section is to provide some relevant information and a quick panoramic view of the biological background that supports the project.
Life is coded in deo[...]
[...] G), forming a double-helix structure with 34 Å turns (1 ångström equals 10^-8 centimeters). This capacity to form a different number of hydrogen bonds in each pair of bases gives DNA molecules a special property: redundancy. Redundancy allows DNA replication when both chains are separated from each other, as the same information is present in both strands, although encoded in a complementary way. Redundancy also allows errors introduced in DNA to be repaired, when possible, by mismatch repair mechanisms triggered by the cell.

Another key aspect of DNA molecules is their ability to change. DNA can be altered by random mutations introduced by replication errors, by faulty mismatch repair or by external agents such as radiation, which alter the chemical compounds that make up the double helix and make possible an important aspect of living organisms: evolution.

Figure 1. Scheme of the double [...]
Ribonucleic acid as the working copy of the genome
The genome is a precious possession for a cell, so it must be protected and stored carefully. It is known that not all genomic locations are compacted in the same way. Genes that are being actively expressed are located in less compact regions, which allows the transcription machinery to access their code, while inactive genes are kept in denser regions compacted by proteins called histones, in a structure known as chromatin. In this condensed chromatin, the gene code is protected from unnecessary hazards.

Furthermore, the genes encoded in the genome are generic. They must be useful in every cell in all tissues of an organism, yet genes are known to have different roles depending on the cell type in which they are expressed. Thus, the function of a gene cannot be obtained directly from its code. Genes are expressed into ribonucleic acid (RNA) molecules called messenger RNA (mRNA).

RNA is a more labile version of DNA, and thus more error-prone. This is due to the hydroxyl group at the 2' position of the ribose, which can act as a nucleophile against the rest of the molecule (Fig. 2). Some viruses use RNA instead of DNA as the vehicle for their genome, which gives them an advantage: their code is less stable and more difficult for the host's immune system to detect. Typically, RNA molecules are single-stranded, and their nucleobases include uracil (U) instead of thymine (T).
For a gene to be expressed, an RNA polymerase must access its code in the genome, open the DNA double helix and start transcribing it into RNA. The gene's content is then copied into multiple mRNA molecules; the more mRNA molecules of a gene are produced, the more expressed that gene is said to be.

mRNA molecules can then be processed by alternative splicing, which results in different versions of a gene. Some of these versions are known to be tissue-specific and to have different defined functions in the cell. Non-spliced mRNA molecules are known as pre-mRNA, while spliced ones are called mature mRNA.
However, for most genes to be functional they must first be translated into proteins. Proteins are chains of amino acids (also called polypeptides). They are the final functional product of a gene and the actor that plays the most important part: the catalytic process that lets the intended biological function take place. Furthermore, protein synthesis from a mature mRNA can have distinct outcomes.

Ribosomes (the molecular complexes required for the translation of mRNA into protein) read mRNA transcripts in triplets. Every three consecutive bases correspond to a single amino acid in the protein. So, for a single mRNA, three possible reading frames exist depending on where translation starts. Each stretch that a ribosome can translate from a given starting point is called an open reading frame (ORF), and an mRNA can contain several of them, depending on where they are placed along the molecule.
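The idea of reading frames can be illustrated with a short sketch. It is not part of the project code: the function names and the example sequence are made up for illustration, and the start codon (AUG) and stop codons (UAA, UAG, UGA) of the standard genetic code are assumed, which the text above does not list.

```python
def reading_frames(mrna: str) -> list[list[str]]:
    """Split an mRNA into triplets for each of the three forward reading frames."""
    frames = []
    for offset in range(3):
        # Only complete triplets are kept; trailing bases are ignored.
        codons = [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]
        frames.append(codons)
    return frames


def open_reading_frames(mrna: str) -> list[str]:
    """Return each stretch from an AUG to the first in-frame stop codon (assumed standard code)."""
    stops = {"UAA", "UAG", "UGA"}
    orfs = []
    for start in range(len(mrna) - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in stops:
                orfs.append(mrna[start:i + 3])
                break
    return orfs


if __name__ == "__main__":
    example = "GAUGGCCUAAG"  # hypothetical toy sequence
    for offset, codons in enumerate(reading_frames(example)):
        print(offset, codons)
    print(open_reading_frames(example))
```

In this toy sequence only the second frame (offset 1) contains a start codon followed by an in-frame stop codon, so a single ORF is reported.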
It is important to remember, however, that proteins are not the only catalytic players in the cell. Some RNA molecules also have a defined function by themselves. This is the case for ribosomal RNA (rRNA), which forms the ribosomes together with some ribosomal proteins and carries out the translation of mRNA into amino acid chains. Other RNA motifs can also have a function without being translated into protein and, nowadays, the so-called non-coding RNA (ncRNA) has become a large research area. MicroRNA (miRNA) molecules, a subgroup of ncRNA, are the main focus of this text.

Figure 2. Structure of an RNA strand
Messenger RNA and protein levels
The typical structure of a mature mRNA (one that has already been processed by alternative splicing) is shown in Fig. 3. This is the RNA molecule that the ribosome reads and uses as a template to assemble amino acids, in the order dictated by the protein-coding gene, to synthesize the corresponding polypeptide.

Figure 3. Structure of a typical human protein-coding mRNA including the untranslated regions (UTRs)

As we can see, not all of the mRNA sequence is translated into protein. Only the coding sequence (CDS) fragment is converted into amino acids. Flanking the CDS are the 5' and 3' untranslated regions (UTR) which, despite not being translated, can take part in the regulation of gene translation, as we will see later on.
At first, it was thought that protein levels should be roughly proportional to the mRNA expression level of a gene. However, proteomic studies soon revealed that this is not really the case. The amount of a protein in a cell depends on a variety of factors, such as the stability of the protein or its synthesis and degradation rates. Recently, translational regulation has emerged as a key factor governing protein levels. Not all the mRNA molecules available in a cell are translated with the same efficiency. Most importantly, the translation machinery can be directed to those mRNA that the cell requires at a given moment, in a process known as translational control. Not surprisingly, mRNA expression is currently seen more as a buffer of potential proteins for the cell than as gene expression in the traditional sense.
A brief comment on cell structure
In general terms, all eukaryotic cells can be divided into two main compartments: the nucleus and the cytosol. The cytosol is isolated from the environment by a lipid bilayer that maintains the conditions required for the cell to function correctly, which is known as cell homeostasis. The nucleus, in turn, lies inside the cytosol and is isolated from it by an additional lipid bilayer.

While the genetic code is stored and transcribed in the nucleus, it is in the cytosol that messenger RNA is translated into protein. This implies that both rRNA and mRNA must be exported from the nucleus. Furthermore, microRNA (which will be described in the next section) must also be exported to the cytosol, by proteins known as exportins, to perform its function.
Introducing microRNA
MicroRNA are small non-coding RNA molecules, 19 to 24 nucleotides long, capable of modulating gene activity by blocking the translation of a gene's mRNA into its protein product (Esteller M, 2011). A miRNA binds to its target mRNA through sequence complementarity, hampering the progression of polypeptide elongation by the ribosome and/or promoting the cleavage and degradation of the targeted transcripts. These regulatory capabilities make miRNA interesting both as targets for molecular therapies that require directed silencing or activation of genes and as potentially good clinical disease biomarkers.
Molecular structure
A mature miRNA molecule is a single-stranded RNA 19 to 24 nucleotides (nt) long with a defined seed region spanning nucleotides 2 to 7 or 2 to 8. This seed region plays a key role in the specificity of the molecule for its target mRNA, as it binds to the 3' UTR of the messenger by sequence complementarity. The rest of the miRNA sequence may also interact with the messenger RNA by complementarity in an additive way. The stronger the complementarity between a miRNA and its target, the higher the probability that the target will be cleaved by the nucleases recruited through these miRNA-mRNA interactions.
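Seed pairing can be sketched in a few lines. This is only an illustration under simplifying assumptions: perfect Watson-Crick pairing (no G:U wobble), antiparallel binding, and made-up sequences; the function names are hypothetical and do not come from the project code.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that pairs with `rna` when read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, seed_end: int = 8) -> list[int]:
    """0-based positions in `utr` that pair perfectly with miRNA nucleotides 2..seed_end."""
    seed = mirna[1:seed_end]          # nucleotides 2..seed_end, 1-based inclusive
    site = reverse_complement(seed)   # what the 3' UTR must contain
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22 nt sequence
    utr = "AAACUACCUCAAA"             # hypothetical 3' UTR fragment
    print(seed_sites(mirna, utr))               # seed = nucleotides 2 to 8
    print(seed_sites(mirna, utr, seed_end=7))   # seed = nucleotides 2 to 7
```

The two calls correspond to the two seed definitions mentioned above; the shorter seed matches one position further along the UTR because its site is one nucleotide shorter at the 5' side.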
Figure 4. Different forms of microRNA along its biogenesis process

Before becoming a single-stranded molecule, a miRNA undergoes a series of modifications starting from an original RNA hairpin or loop structure (pri-miRNA) synthesized by RNA polymerase II, one of the polymerases in charge of transcribing DNA genes into their RNA form (Fig. 4).
These loops can either be produced from specifically encoded miRNA genes, in the so-called canonical pathway, or originate from intronic regions of a gene (regions inside a gene that do not code for protein and are spliced out by the spliceosome during mRNA maturation), following an alternative or non-canonical pathway (Fig. 5).

The existence of non-canonical microRNA makes miRNA prediction difficult for computational algorithms that parse the genome, as these miRNA are not found as distinct DNA features of their own but hidden inside other known ones. This is one of the fundamental motivations that make miRNA discovery from small RNA sequencing an attractive approach.
Biogenesis
First, in the nucleus of the cell, RNA polymerase II synthesizes an RNA molecule based on the genome sequence. This molecule can come either from a miRNA gene (pri-miRNA) or from a protein-coding gene whose introns contain short hairpins (loops) that can be processed by Drosha. Drosha is a nuclease in charge of excising the loop from the RNA structure formed by the polymerase, which results in a smaller loop called pre-miRNA.

It is important to note that each pre-miRNA structure can produce a total of two mature miRNA, one from each arm of the loop, giving rise to the so-called 5p-miRNA and 3p-miRNA.
The pre-miRNA is exported from the nucleus to the cytoplasm by a transporter protein called Exportin 5 (XPO5). In the cytoplasm, a protein complex is recruited that includes Dicer and one of the AGO1-4 proteins. AGO recognizes the double-stranded part of the pre-miRNA, while Dicer recognizes the loop. With its nuclease activity, Dicer cuts off the loop, leaving the double-stranded part of the molecule with AGO, which then decides which mature miRNA sequence is kept and which one is released to be degraded. Once the single-stranded RNA corresponding to a mature miRNA is bound to AGO, the RISC complex (which will interact with the target mRNA) is recruited.

Remarkably, the two mature miRNA that can originate from a single pre-miRNA loop do not always both have a functional role. In some cases, one of the two molecules is rapidly degraded and only the other becomes a functional miRNA. In other cases, both molecules can be functional and the AGO protein decides which one to keep, with different probabilities based on the sequence of the structure.
Figure 5. Canonical and non[...]

[...]NA translation.

Therefore, the presence of the miRNA impairs the expression of its target genes. The classical mechanism is to hinder the progression of the ribosome along the mRNA that is bound to the miRNA by sequence complementarity, stalling protein synthesis until the ribosome releases the messenger molecule. When sequence complementarity between the miRNA and its target is high, the protein complex recruited by the miRNA (RISC) can force the cleavage of the messenger RNA, effectively eliminating the expression of that gene.
Impairment of the formation of the pre-initiation complex, which is required for the translation of mRNA before the ribosome itself is recruited, has also been described. Other, more exotic mechanisms under investigation include the recruitment of proteases that may degrade the protein synthesized by the ribosome as soon as it leaves the ribosomal complex, or the direct blocking of the binding of the 80S ribosome to the nucleotide chain.
It is important to remark that the miRNA by itself has no function without the proteins and protein complexes that it recruits to affect its targets. The protein complexes carrying the microRNA expose the seed region of the sequence to allow hybridization by complementarity with the target, and are directed to the set of mRNA to be regulated, which is dictated by the different miRNA signatures programmed by the cell in a context-specific manner (depending on tissue, developmental stage, etc.).

Figure 6. Schematic of the different described mechanisms of action for microRNA
Brief history of nucleic acid sequencing
Sanger sequencing
In 1977, Frederick Sanger developed a method for DNA sequencing that is still considered the gold standard in the industry (Sanger F, et al. 1977). It is not scalable, but it is convenient for small-scale projects. The reads it generates are relatively long and, owing to its simplicity and reliability, it is used as a trusted technique to validate findings obtained by other approaches.
The original Sanger sequencing method consists of four separate reactions in which a DNA template, a DNA primer and a polymerase are mixed with a pool of dNTP (deoxynucleoside triphosphates, the DNA bases: dATP, dCTP, dGTP and dTTP) and a suitable amount of one ddNTP (dideoxynucleotide), different for each reaction. ddNTPs lack the 3'-OH group, so when one is incorporated during the polymerization reaction the polymerase can no longer extend the DNA sequence. The amount of ddNTP used is low enough to still allow the eventual synthesis of the full-length product, together with sequences of every possible shorter length, in a probabilistic way.
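The logic of chain termination can be sketched as follows. This is an idealized model, not a simulation of the chemistry: it assumes that every possible termination length is actually produced in each reaction, and the function names are made up for illustration.

```python
def sanger_fragments(synthesized: str) -> dict[str, list[int]]:
    """For each of the four ddNTP reactions, list the lengths of the terminated fragments.

    A fragment ends at position p in the reaction for base X exactly when the
    synthesized strand has X at position p (idealized: all lengths are observed).
    """
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        lengths[base].append(position)
    return lengths


def read_sequence(fragments: dict[str, list[int]]) -> str:
    """Rebuild the sequence by ordering all fragments by length, as on a gel."""
    by_length = sorted((length, base)
                       for base, lengths in fragments.items()
                       for length in lengths)
    return "".join(base for _, base in by_length)


if __name__ == "__main__":
    fragments = sanger_fragments("GATTACA")
    print(fragments)
    print(read_sequence(fragments))
```

Sorting the fragments of the four reactions by length and noting which reaction each one came from is enough to recover the synthesized strand.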
[...] popularized in the nineties with the Human Genome Project and similar projects.

A refined version of this technique is hierarchical shotgun sequencing, which consists of selecting the minimum number of fragments that cover the entire genome, achieving more throughput with less infrastructure at the cost of requiring more complex algorithms.

[...] reaction (PCR) (Saiki RK, et al. 1988).
Although PCR has been a great tool in molecular biology, a known problem with the technique is the formation of chimeric sequences through hybridization of nonspecific fragments during the iterative cycles of hybridization and denaturation of DNA strands. Obviously, this is a nightmare in the sequencing field.
Solid-phase amplification fixes DNA fragments on a surface, so they cannot come into physical contact with other fragments during the amplification process, and the elongation of each DNA strand can occur independently of all the others, at its own pace. This situation is ideal, since it lets us parallelize the process and turn it into a high-throughput technique.
The main drawback of NGS (next generation sequencing) technology is the length of the sequences obtained. With Sanger, the gold standard, long fragments can be read at the expense of time and money. In the NGS world, where we get reads of 250-500 bp with 454 (Roche), based on emulsion PCR and pyrosequencing, or 30-150 bp with Illumina (Solexa), based on bridge PCR and sequencing by synthesis, computational algorithms are necessary to extract information from the samples analyzed. This is where short read aligners play an important role.
Given a reference genome, short read aligners try to map NGS reads to its different regions while allowing deviations from the reference. The short length of the sequences, the probability of errors in the reads and the resulting errors in the mappings are overcome by working with large amounts of data, so that error is minimized in a probabilistic way. We must therefore be sure that our region of interest is sequenced enough times to reach a consensus among different reads in order to call a base. The number of times a region is sequenced is called sample coverage.
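As a rough illustration of how coverage supports base calling, the sketch below calls each position by majority vote among the reads that cover it. The function name, the `min_coverage` threshold and the use of "N" for uncalled positions are assumptions made for this example; real variant callers use base qualities and statistical models.

```python
from collections import Counter


def consensus(reads: list[tuple[int, str]], length: int, min_coverage: int = 3) -> str:
    """Call each base by majority vote; positions with too little coverage become 'N'.

    `reads` holds (start, sequence) pairs, with `start` the 0-based mapped position.
    """
    columns = [Counter() for _ in range(length)]
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            if 0 <= start + offset < length:
                columns[start + offset][base] += 1
    called = []
    for column in columns:
        depth = sum(column.values())  # coverage at this position
        if depth >= min_coverage:
            called.append(column.most_common(1)[0][0])
        else:
            called.append("N")
    return "".join(called)


if __name__ == "__main__":
    mapped = [(0, "ACGT"), (0, "ACGT"), (0, "ACTT"), (2, "GTA")]
    print(consensus(mapped, length=5))
```

In the example, the sequencing error in the third read is outvoted at position 2, while the last position is covered by a single read and is left uncalled.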
There are many algorithms for aligning sequences to references, from slow but reliable ones like B[...]

Discovery of miRNA from computational approaches
Highly conserved primary sequences, together with the characteristic secondary structure of miRNA and their precursors, are used to find novel microRNA genes. Several strategies are available to discover novel miRNA: homology search, by aligning known miRNA from other species against the target genome; computational parsing of the genome, looking for recognizable patterns of miRNA-like regions; or digging into isolated sequences found in RNA sequencing libraries generated in the wet laboratory.
Discovery by forward genetics, de novo and by homology
One approach is to align known miRNA precursor sequences from other species to the genome of the organism of interest, in order to find homologies via local alignment techniques like B[...]

Computational prediction by machine learning
With the existing set of known miRNA, machine learning algorithms can be trained to detect the specific features that make a genomic sequence a good candidate to eventually become a miRNA. Fold-back structure, sequence conservation and secondary structure can be evaluated to classify RNA structures as miRNA-like or not. Genome sequences or reads from high-throughput sequencing experiments can be used for this kind of detection.

A subsequent comparison with already annotated sequences must be made to keep false positives under control and to avoid duplicate entries in annotation databases. Small RNA libraries for the organism, if available, can be used to assess whether the candidates are found at the RNA level in available samples, as evidence that they may be unidentified miRNA.
Identification from small RNA sequencing based on a reference genome
Another approach to the identification of novel miRNA is based on the generation of small RNA sequencing libraries. These libraries consist of small RNA fragments isolated from a sample prior to sequencing by Next Generation Sequencing (NGS) methods. Once the sequencing reads are obtained, they are aligned against the reference genome of the organism they come from, and the regions of that genome covered by those reads are identified.
MicroRNA-like regions are easily detected when both the 5p and the 3p miRNA are found: they lie relatively close to each other in the genome, separated just enough to allow the formation of a more or less complex RNA loop, which results in two closely located coverage peaks of approximately 22 nucleotides each.
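The two-peak signature can be searched for with a simple scan over per-base coverage. The sketch below is only illustrative: the function names are made up, the 19 to 24 nt size window is taken from the miRNA lengths given earlier, and the maximum gap of 40 nt between peaks is an arbitrary assumption rather than a value from this project.

```python
def covered_blocks(coverage: list[int], min_depth: int = 1) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals where coverage is at least `min_depth`."""
    blocks, start = [], None
    for i, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = i
        elif depth < min_depth and start is not None:
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, len(coverage)))
    return blocks


def peak_pairs(blocks: list[tuple[int, int]], min_len: int = 19, max_len: int = 24,
               max_gap: int = 40) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Pair consecutive miRNA-sized blocks separated by a short gap (a candidate loop)."""
    pairs = []
    for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
        sized = all(min_len <= end - start <= max_len
                    for start, end in ((s1, e1), (s2, e2)))
        if sized and 0 < s2 - e1 <= max_gap:
            pairs.append(((s1, e1), (s2, e2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical coverage track: a 22 nt peak, a 15 nt gap, then a 21 nt peak.
    track = [0] * 5 + [10] * 22 + [0] * 15 + [4] * 21 + [0] * 5
    print(peak_pairs(covered_blocks(track)))
```

A single isolated peak produces no pair, which reflects the point made above: the signal is most recognizable when both the 5p and 3p products are sequenced.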
Exploring small RNA libraries with Next Generation Sequencing techniques is useful for identifying miRNA with low expression and candidate regions that are difficult to validate. NGS techniques let us see at a glance whether these sequences exist in our library. This is the point on which this project focuses.

Project Overview
As we have seen, the sequence of miRNA molecules is mostly conserved in its central region (including the seed), with the exception of some polymorphisms, while the 5' and 3' ends of the nucleic acid chain are more variable, which results in a set of so-called isomiRs for each miRNA (Neilsen CT, et al. 2012). miRBase, the authoritative miRNA database, contains all the known sequences found in the literature or predicted by computational methods (Griffiths-Jones S, et al. 2006).

Small RNA-seq, in turn, is a high-throughput technique for sequencing the short RNA molecules present in a prepared sample. It involves amplifying them, so that the same molecule is sequenced multiple times simultaneously, which minimizes the risk of obtaining erroneous reads.
To date, existing alignment tools take a subsequence of each sequence to align as the core region, usually selected from the most reliable part of a read: its beginning, where base qualities are best. Mapping this core region to genomic coordinates (the so-called seed step) is straightforward, using either hashing techniques or algorithms based on the Burrows-Wheeler transform (Burrows M, et al. 1994). Then, an extension step matches the remainder of the read against the selected genomic location in order to validate or discard the possible match. This extension step is computationally expensive, especially in large genomes such as the human genome, in which repetitive sequences are frequent and lead to multiple wrong genomic locations.
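A minimal sketch of the seed-and-extend idea with a hash index follows. It is a toy version under stated assumptions: the core region is simply the first k bases of the read, the extension only counts mismatches (no gaps), and all names and thresholds are hypothetical rather than taken from any particular aligner.

```python
from collections import defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the genome to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def seed_and_extend(read: str, genome: str, index: dict[str, list[int]], k: int,
                    max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Seed with the first k bases of the read, then verify the whole read at each hit."""
    hits = []
    for pos in index.get(read[:k], []):          # seed step: exact lookup
        window = genome[pos:pos + len(read)]
        if len(window) < len(read):
            continue                             # read would run off the genome end
        mismatches = sum(a != b for a, b in zip(read, window))  # extension step
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    genome = "ACGTACGTTTGACCA"  # hypothetical toy reference
    index = build_index(genome, k=4)
    print(seed_and_extend("ACGTTTGA", genome, index, k=4))
```

The seed "ACGT" occurs twice in the toy reference, so the extension step has to check both locations and discard the wrong one; with repetitive genomes the number of such checks grows quickly, which is the cost described above.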
With small RNA-seq, reads are often not aligned directly to the entire genome, but to a curated database of known miRNA, such as miRBase. In the case of miRNA, the ends of the sequence might not be conserved, which can interfere with the initial steps of traditional alignment software, and the fact that miRNA are represented by very short sequences can make the alignment difficult.

Recently, a multi-seed strategy has been published for mapping reads to a reference genome using the seed [...] for genomic locations, and the genomic location with the most votes is the accepted one. The final alignment is computed directly by counting subread mappings, which largely eliminates the overhead of the extension step.
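The voting idea can be sketched as follows. This is a simplified illustration and not the published algorithm or this project's implementation: every overlapping k-mer of the read is used as a subread, each exact hit votes for the reference start position it implies, and the names are made up for the example.

```python
from collections import Counter, defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def vote_location(read: str, index: dict[str, list[int]], k: int):
    """Let every k-mer of the read vote for the read start position its hits imply.

    Returns (position, votes) for the most voted position, or None without any hit.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    return votes.most_common(1)[0] if votes else None


if __name__ == "__main__":
    reference = "TTGACCATGCAAGT"  # hypothetical toy reference
    index = build_index(reference, k=3)
    print(vote_location("CCATGCAA", index, k=3))  # exact copy of the reference
    print(vote_location("CCATGGAA", index, k=3))  # same read with one mismatch
```

A read with a mismatch loses the votes of the subreads that overlap the error, but the remaining subreads still agree on the same location, so no separate extension step is needed to place it.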
In this project, we will use the seed [...]

[...]
- Software testing: 50 h
- Documentation: 100 h
450 h

Economic budget
With 450 h in mind, an economic budget can be drawn up to estimate what this project would cost if it were outsourced to a company.

The biology overview comes for free, as the cost of education is normally part of an individual's training and is not borne by the customer. Software design, development and testing require an advanced bioinformatician or computational biologist, with an estimated cost of 60 €/h. The project documentation can be prepared by a documentalist or an administrative worker with computing knowledge, with an estimated cost of 30 €/h.
! 8)%6%9: %C0$C)0? _M lh/ = F;/`iiiiiiiiiii* M l
! H%2#?7$0 &01)9" _IM lh/ = A;/`iiiiiiiiiiR[;MM l
! H%2#?7$0 &0C06%430"# _IM lh/ = FMM/`iiiiii* EF[MMM l
! H%2#?7$0 #01#)"9 _IM lh/ = ;M/`iiiiiiiiii+[MMM l
! J%('30"#7#)%" _+M lh/ = EMM/`iiiiiiiiii+[MMM l
:(%.2+F"+) 2$' G5)*"=7(B(3(7, H$(7 2$%3 #/0 I27232$ @$)7(757" %- J$*%3%6,9
F+
-
8/16/2019 De Novo Discovery MicroRNA From Small RNA Sequencing Data
26/63
8+2$*()*% !9 :%+;$
Proposed pipeline outline
For the sake of simplicity, preprocessing steps are not shown in the diagram below, as they require the use of existing software within a Bash script provided in Annex A.
Figure 7. Pipeline to identify miRNA from small RNA
Project Implementation
The first approach considered for this project was to use pairwise alignments among input sequences to find miRNA-like sequence candidates. Due to the cost in computational time of this approach, it was rapidly discarded in favor of the seed [...] miRNA-like candidate groups are described, and the relevant formalities of these procedures are outlined.
Preprocessing of input FASTQ files
The first step is to perform a quality control check on input reads, discarding those with low qualities or flagged as bad reads by the sequencer. All sequencing platforms provide such tools to ensure the quality of their output. This is the case of CASAVA (Illumina) [4], although independent software also exists, like FASTX-Toolkit [5] or FastQC [6].
With the remaining quality reads, adaptor trimming must be performed to remove from the sequences the linking adaptors introduced during the library preparation required for the sequencing amplification process. Sometimes barcodes are also included in these adaptors to identify sequences coming from multiple different samples sequenced at once, which we call multiplexed sequencing (this gives cheaper sequencer runs at the expense of lower per-sample coverage).
[4] https://support.illumina.com/sequencing/sequencing_software/casava.html
[5] http://hannonlab.cshl.edu/fastx_toolkit/
[6] http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
A third, optional preprocessing step is to filter reads based on read length. In this case, as we are seeking reads between 16 and 30 nucleotide bases long, we discard all reads longer than 30 or shorter than 16 bases. This should help to avoid noise from partially degraded RNA that may be captured during library preparation.
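The length filter described above can be sketched in a few lines of Python 3. This is only an illustration: the function name `filter_by_length` and its parameters are hypothetical, and in the project itself this step is delegated to external tools called from the Bash script in Annex A.

```python
def filter_by_length(reads, min_len=16, max_len=30):
    """Keep only reads whose length lies in [min_len, max_len] (bounds from the text)."""
    return [r for r in reads if min_len <= len(r) <= max_len]


# Toy reads of length 12, 20 and 36: only the 20-base read is kept.
reads = ["ACGT" * 3, "ACGT" * 5, "ACGT" * 9]
kept = filter_by_length(reads)
```

The bounds are inclusive, matching the statement that reads above 30 and below 16 bases are discarded.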
Read collapsing and representativity filtering
In order to save computing time, the first step of our alignment strategy consists of collapsing all identical reads. For clarity, in this project a read is defined as a piece of input and a sequence as a unique character string that represents a read. Therefore, multiple reads can be represented by a unique sequence, and one sequence can be present more than once in the input while being considered one unique sequence.
Expression profiles of microRNA are usually of interest. For this reason, keeping a record of the number of times a sequence appears in the input is necessary. Some reads, and sometimes a lot of them, may appear a really low number of times (e.g. once). Taking into account that this data comes from an amplification process, where the sequencer amplifies the sample library before actually performing any sequencing, this is a strange case that points to a possible error either from the sequencer or from the PCR step (Kebschull JM, et al. 2015). In consequence, sequences appearing fewer times than a threshold can be filtered out before going any further. By default, in the software developed in this project this representativity value is set to 5. Any sequence with 5 reads or fewer is not processed.
Figure 8. Graphical representation of read collapsing motivation
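As a minimal Python 3 sketch of read collapsing and representativity filtering, the snippet below counts identical reads and keeps only sequences seen more than a threshold number of times. The name `collapse_reads` and the plain-string read representation are assumptions made for illustration; the project code in Annex B works on colorspace reads with qualities.

```python
from collections import Counter


def collapse_reads(reads, min_count=5):
    """Collapse identical reads into unique sequences with their read counts,
    dropping sequences with min_count reads or fewer."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n > min_count}


reads = ["AAAA"] * 6 + ["CCCC"] * 5 + ["GGGG"]
collapsed = collapse_reads(reads)
```

Keeping the count next to each sequence preserves the information needed later for expression profiles.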
Alignment strategy
De Bruijn graph construction and contig assembly
With reads collapsed into sequences and filtered by the representativity threshold, the sequences are then used to build a de Bruijn graph from their k-mers. A default k value of 11 is taken. This value is small enough to capture variability in sequences and big enough to limit the extent of spuriously constructed contigs, given that our sequences are between 16 and 30 nucleotides long.
Once the de Bruijn graph is built, those nodes that come from fewer sequences than a threshold are removed from the graph. The default threshold value is 2. This limits further erroneous paths that would result in chimeric contigs.
The graph is then visited from each of its nodes until all the unambiguous contigs are found. Unambiguous contigs are defined by the longest paths without branches in the graph. These contigs will be the ones around which all the other reads will be grouped when looking for miRNA-like sequences.
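The construction above can be illustrated with a compact Python 3 sketch: k-mers are counted, low-coverage k-mers are dropped, and each remaining k-mer is extended forwards and backwards while exactly one neighbour exists in both directions. All function names are hypothetical and the sketch is a simplified reading of the procedure, not the project code in Annex B.

```python
from collections import Counter


def build_graph(seqs, k, threshold):
    """Count k-mers over all sequences and keep those seen more than threshold times."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    return {km: n for km, n in counts.items() if n > threshold}


def successors(graph, km):
    return [km[1:] + b for b in "ACGT" if km[1:] + b in graph]


def predecessors(graph, km):
    return [b + km[:-1] for b in "ACGT" if b + km[:-1] in graph]


def extend(graph, start, step, back):
    """Walk from start while the next k-mer is unique in both directions."""
    path = [start]
    while True:
        nxt = step(graph, path[-1])
        if len(nxt) != 1 or nxt[0] in path or len(back(graph, nxt[0])) != 1:
            return path
        path.append(nxt[0])


def contig_from(graph, km):
    """Return the unambiguous contig containing km and its k-mer path."""
    fw = extend(graph, km, successors, predecessors)
    bw = extend(graph, km, predecessors, successors)
    path = [x for x in bw[:0:-1] if x not in fw] + fw
    return path[0] + "".join(x[-1] for x in path[1:]), path


def all_contigs(graph):
    """Visit every k-mer once and collect the unambiguous contigs."""
    done, contigs = set(), []
    for km in sorted(graph):
        if km not in done:
            seq, path = contig_from(graph, km)
            done.update(path)
            contigs.append(seq)
    return contigs


linear = all_contigs(build_graph(["ACGTTGCA"] * 3, 4, 2))
branched = all_contigs(build_graph(["ACGTTGCA"] * 3 + ["ACGTTGGG"] * 3, 4, 2))
```

In the second example the shared prefix ends where the graph branches, so the two variants produce separate contigs, which is the behaviour the text relies on to keep similar sequence groups apart.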
2&&0 .,&3E -*/%
[...] different from other processes described (k = 4 by default in our software).
4),%+5 .,&3E C%B0%*1%, A'+2*9 1'*+29 1"*/2/"+%,
Once the contig index has been set up, it is time to allow the sequences to vote for their preferred contig. This is achieved by taking each k-mer (or subsequence of length k) from a sequence, looking into the index for contigs containing that k-mer, and generating a vote for each of the positions in the contig where the beginning of the k-mer is found.
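The voting step can be sketched in Python 3 as follows. The index maps every k-mer to the contigs and positions where it starts, and each k-mer of a sequence casts one vote per hit. The names `build_index` and `cast_votes`, the contig identifiers, and the representation of a vote as a pair (position in contig, position in sequence) are assumptions of this sketch.

```python
from collections import defaultdict


def build_index(contigs, k):
    """Map each k-mer to the list of (contig name, start position) where it occurs."""
    index = defaultdict(list)
    for name, seq in contigs.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def cast_votes(sequence, index, k):
    """Return, per contig, the votes (contig position, sequence position) of a sequence."""
    votes = defaultdict(list)
    for ps in range(len(sequence) - k + 1):
        for name, pc in index.get(sequence[ps:ps + k], ()):
            votes[name].append((pc, ps))
    return votes


index = build_index({"c1": "ACGTACGG"}, 4)
votes = cast_votes("GTACGG", index, 4)
offset = votes["c1"][0][0] - votes["c1"][0][1]
```

The difference between the two positions of a vote gives the offset of the sequence relative to the contig, here 2 for every vote.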
A vote is a tuple of integers (p_c, p_s), where p_c is the position of the k [...]
[...] subsequence of length k from s, occurs in the sequence s. So, for all the k-sequences that are not k-mers of s, their coordinate value will be zero.
Given the previous definition we can define a distance like
$$ d^2(s_1, s_2) = \sum_{i=1}^{4^k} \left( p(s_1)_i - p(s_2)_i \right)^2 \qquad (1) $$
The formula in (1) is called the squared Euclidean distance between p(s_1) and p(s_2). It should be noted that, for comparison purposes, this distance is sufficient without taking the square root, but it is not a true statistical distance, which means it does not satisfy the triangle inequality (Wu TJ, et al. 1997). The triangle inequality states that the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side (Mohammed AK, et al. 2001). For a true statistical distance, when needed, the square root of the value should be taken.
Given the nature of our problem, where sequences may be short and variable in length, this way of counting distances may be misleading. Therefore, we slightly modify the definition of coordinate p(s)_i to be the frequency of the k-mer defined by a k-sequence in s, as a way to remove from the model any bias related to sequence length.
$$ p(s)_i = \frac{\text{occurrences of } k\text{-mer } i \text{ in } s}{\text{total } k\text{-mers in } s} \qquad (2) $$
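The frequency coordinates can be computed with a short Python 3 helper. The function name `kmer_frequencies` is hypothetical; only k-mers that actually occur are stored, since every other coordinate is zero.

```python
from collections import Counter


def kmer_frequencies(seq, k=4):
    """Frequency of each k-mer in seq: occurrences divided by total k-mers in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {km: n / total for km, n in counts.items()}


# "ACGTACGT" has five 4-mers: ACGT (twice), CGTA, GTAC and TACG.
freqs = kmer_frequencies("ACGTACGT", 4)
```

Because the counts are divided by the number of k-mers in the sequence, the coordinates of any sequence sum to one regardless of its length.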
It is known that k-mer occurrence varies between genomes from different species, and that k-mers generally do not have the same chance of appearing in a biological sequence. For this reason, another definition of distance that takes into account the standard deviation of k-mer occurrence is more suitable in our case
$$ D^2(s_1, s_2) = \sum_{i=1}^{4^k} \left( \frac{p(s_1)_i - p(s_2)_i}{\sigma_i} \right)^2 \qquad (3) $$
with a diagonal covariance matrix
$$ \sigma_i^2 = \frac{1}{N-1} \sum_{n=1}^{N} \left( p(s_n)_i - \mu_i \right)^2 \qquad (4) $$
taking N as the number of total sequences in our dataset, and
$$ \mu_i = \frac{1}{N} \sum_{n=1}^{N} p(s_n)_i \qquad (5) $$
The formula in (3) is known as the Mahalanobis distance (Mahalanobis PC, 1936), which with a diagonal covariance matrix is known as the normalized Euclidean distance (Wu TJ, 1997). This is the distance definition adopted by the software developed during this project.
In our setting, sequence distance is only computed for the overlap between sequences when the offset between them is known.
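A hedged Python 3 sketch of the normalized distance follows. It estimates the per-k-mer standard deviation over a set of sequences and then divides each coordinate difference by it. The function names are hypothetical, and two choices are assumptions of the sketch: k-mers with zero deviation are skipped, and the square root is taken at the end (drop it to obtain the squared form).

```python
import math
from collections import Counter


def kmer_frequencies(seq, k):
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {km: n / total for km, n in counts.items()}


def kmer_sigmas(seqs, k):
    """Sample standard deviation of every k-mer frequency over all sequences."""
    freqs = [kmer_frequencies(s, k) for s in seqs]
    n = len(freqs)
    sigmas = {}
    for km in set().union(*freqs):
        values = [f.get(km, 0.0) for f in freqs]
        mean = sum(values) / n
        sigmas[km] = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sigmas


def normalized_distance(f1, f2, sigmas):
    """Euclidean distance with each coordinate scaled by its standard deviation."""
    total = 0.0
    for km in set(f1) | set(f2):
        sigma = sigmas.get(km, 0.0)
        if sigma > 0:  # k-mers without spread carry no information
            total += ((f1.get(km, 0.0) - f2.get(km, 0.0)) / sigma) ** 2
    return math.sqrt(total)


seqs = ["ACGTACGT", "ACGTTTTT", "TTTTTTTT"]
sigmas = kmer_sigmas(seqs, 2)
f = [kmer_frequencies(s, 2) for s in seqs]
d01 = normalized_distance(f[0], f[1], sigmas)
d10 = normalized_distance(f[1], f[0], sigmas)
```

At least two sequences are needed to estimate the deviations, since the variance divides by N - 1.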
As a final remark on contig assignment, in the extreme case where the distances from two or more contigs to a sequence are identical, the final decision to assign the sequence to a contig is taken by lexicographical order, ensuring deterministic behaviour of the algorithm and reproducible outputs.
Errors detection
Once each sequence is assigned to a contig, the remaining contigs with zero associated sequences are removed from memory. Also, contigs with fugitive votes can be inspected to learn which sequences received those votes and to see whether those contigs could be merged. The integrity of contigs is then assessed to remove bad sequences: for each sequence in a contig, the generated votes should be in increasing order.
As microRNA sequences are relatively variable (recall isomiRs), some variability inside a contig must be allowed. To limit the extent of variability in a contig, an outlier detection method is implemented using the definition of sequence distance given above.
For each sequence in a contig, a point with 4^k coordinates is defined, each coordinate representing a k-mer and filled with the frequency of occurrence of that k-mer in the sequence (k = 4 by default). Then, the distance from the point representing the contig sequence to each of the sequence points is computed. After calculating the interquartile range (IQR) of the distances, outlier candidates are those found above Q3 + 1.5(IQR). These candidates are then isolated from the main set, the centroid of the remaining points is found and the standard deviation of the distances from the remaining points to the centroid is computed. If the distance from a candidate point to the centroid is greater than this standard deviation, the candidate is definitively removed from the contig.
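The outlier test can be sketched in Python 3 as below. For brevity the sketch uses plain Euclidean distance on small numeric vectors, and the quartile method (`statistics.quantiles` with `method="inclusive"`, available from Python 3.8) is an assumption; the project applies the same logic to k-mer frequency points with its own distance. The name `remove_outliers` is hypothetical.

```python
import math
import statistics


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_outliers(contig_point, points):
    """Drop points beyond Q3 + 1.5 * IQR that also lie far from the centroid."""
    dists = [euclid(contig_point, p) for p in points]
    q1, _, q3 = statistics.quantiles(dists, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    kept = [p for p, d in zip(points, dists) if d <= fence]
    candidates = [p for p, d in zip(points, dists) if d > fence]
    # Centroid and spread are computed on the points that passed the fence.
    centroid = [sum(c) / len(kept) for c in zip(*kept)]
    spread = statistics.pstdev([euclid(p, centroid) for p in kept])
    # A candidate is rescued only if it is within one deviation of the centroid.
    return kept + [p for p in candidates if euclid(p, centroid) <= spread]


points = [(0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0)]
cleaned = remove_outliers((0.0, 0.0), points)
```

The function needs at least two points to compute quartiles; smaller groups would have to be passed through unchanged.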
Reaching a consensus sequence
Within curated contigs, the sequences voting for them can start before the contig sequence or end after it. So we need a representative consensus sequence for that group, probably longer than the contig sequence itself.
To this end, an offset for each sequence, based on its votes for the contig, is computed to align paired nucleotides. Then, for each overlapping position across the multiple sequences inside the group, the frequency of occurrence of each base found is calculated, and the most represented base is kept as the consensus for that position.
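A minimal Python 3 sketch of the consensus step: each sequence is placed at its offset relative to the contig, bases are counted per column, and the most frequent base is kept. The function name `consensus`, the input as (offset, sequence) pairs, the alphabetical tie-break and the choice of counting every sequence once (rather than weighting by read count) are all assumptions of this sketch.

```python
from collections import Counter


def consensus(aligned):
    """aligned: iterable of (offset, sequence); returns the per-column majority base."""
    columns = {}
    for offset, seq in aligned:
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1
    # sorted() makes ties deterministic: the alphabetically first base wins.
    return "".join(max(sorted(columns[pos]), key=columns[pos].get)
                   for pos in sorted(columns))


# One sequence starts before the contig (offset -1) and one ends after it.
group = [(0, "ACGT"), (1, "CGTA"), (-1, "TACGA")]
cons = consensus(group)
```

The result is longer than the 4-base contig because it also covers the columns contributed by sequences that overhang it.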
Identifying already annotated miRNA
As a final and optional step, group consensus sequences are aligned to a database of annotated miRNA downloaded from miRBase. This alignment can be used to match contigs to already described miRNA, to reveal different contigs that are the same miRNA (because of isomiRs or the unavailability of unambiguous contigs in the de Bruijn graph) and to detect sequences present in the dataset that could be miRNA not yet present in the curated database.
For this step, B[...]
Results
Using the designed software, five small RNA-seq libraries of normal colonic mucosa from patients with colorectal cancer, from the Biomarkers and Susceptibility Unit at the Catalan Institute of Oncology, were processed.
As summarized in Table 1, library depths varied in the range of approximately 1 to 3.5 million reads. After sequence collapse, sequence data was reduced to approximately 10% of the original input. Counting only sequences with more than 5 occurrences, the numbers were further decreased to around 5% of unique sequences, showing that the vast majority of sequences had very few reads.
These low-count sequences could be present as contaminants or degradation products coming from the total RNA in the samples that were captured while filtering by sequence length, though some errors introduced by the PCR amplification can also be responsible for these reads.
Table 1. Descriptive summary of sample processing output
Sample | Total reads | Unique sequences | Sequences with > 5 occurrences | Nodes in de Bruijn graph | Initial contigs | Final groups
!"##$%& !"!##"$%& !!$"&'' $(")%% $!"(%* !"(*# $"+&+
!"#"'%& %"&'#"* !''"*#) $("!'+ $!"$)! !"%$$ $")*'
!"#(#%& $"%(#"%(% $!'"'*$ )"*%# )"$*+ $"%+' +#)
!"#')%& $"!+*")%' $!!"+)) +"+*+ +"'!' $")+& &(%
!"#*+%& $"#!$"$%& $#!"*!) &"$') +"#') $"((+ &()
The number of reads per final contig shows an exponential distribution, as seen in Figure 9. A few contigs get most of the reads, while a long tail of contigs only get a few reads. This measure gives a value for contig abundance and is consistent with existing gene expression analysis, where log-transformation of expression values is generally accepted as a normalization step before working with the data generated in gene expression experiments.
An exponential distribution is also seen in the number of sequences in each contig (Fig. 10). We should recall that a sequence can represent one or more reads, so the number of sequences is not necessarily related to contig abundance, but it gives a perspective on the diversity of molecules.
Figure 9. Histograms of reads per contig for each analyzed sample.
Figure 10. Box plot of sequences per contig
Table 2 describes the number of matches before aligning consensus sequences from final contigs against miRBase with B[...]
Conclusions and final remarks
We can conclude that the software identifies microRNA. For contigs not matching miRBase, we cannot be sure that they are miRNA without biological validation in the lab. These sequences can be the result of partial degradation of longer RNA sequences present in the cell, or true small non-coding RNA. Contigs with high vote and read numbers could help to identify relevant abundant sequences that can later be tested in the wet lab.
In order to find isomiRs, this software design tries to separate groups of similar sequences that differ from the initial contig sequence in their overlapping regions. This can, in fact, result in multiple contigs being the same microRNA. A further step that tries to merge those contigs into bigger ones could be implemented as an improvement to the current program.
During the design phase of this project, the Python language was selected for its efficient use of dictionary indexes and for its object model, which reduces RAM usage by using pointers to objects instead of copying data. However, this behavior makes Python a thread[...]
References and Bibliography
Burrows, M, Wheeler, DJ (1994). "A block sorting lossless data compression algorithm". Technical Report 124, Digital Equipment Corporation.
Chambers, DL (1995). DNA: the double helix: perspective and prospective at forty years. New York, N.Y.: New York Academy of Sciences, p. 49.
Dahm, R (2008). "Discovering DNA: Friedrich Miescher and the early years of nucleic acid research". Human Genetics 122 (6): 565–81.
Esteller, M (2011). "Non-coding RNAs in human disease". Nature Reviews. Genetics, 12(12), 861–74.
Farinelli, C (2008). Complexity measures and similarity metrics: properties and applications to biological signals. PhD thesis (Alma Mater Studiorum - Università di Bologna, Bologna, Italy) pp. 119–114.
Gentleman, JF, Mullin, RC (1989). "The distribution of the frequency of occurrence of nucleotide subsequences, based on their overlap capability". Biometrics, 45(1), 35–52.
Griffiths-Jones, S, Grocock, RJ, van Dongen, S, Bateman, A, Enright, AJ (2006). "miRBase: microRNA sequences, targets and gene nomenclature". Nucleic Acids Research, 34(suppl 1), D140–D144.
Hershey, AD, Chase, M (1952). "Independent functions of viral protein and nucleic acid in growth of bacteriophage". J Gen Physiol. 36: 39–56.
Kebschull, JM, Zador, AM (2015). "Sources of PCR-induced distortions in high-throughput sequencing data sets". Nucleic Acids Research, gkv717.
[...]s – the overlooked repertoire in the dynamic microRNAome". Trends in Genetics: TIG, 28(11), 544–9.
Saiki, RK, Gelfand, DH, Stoffel, S, Scharf, SJ, Higuchi, R, Horn, GT, Mullis, KB et al. (1988). "Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase". Science 239: 487–491.
Sanger, F, Nicklen, S, Coulson, AR (December 1977). "DNA sequencing with chain-terminating inhibitors". Proc. Natl. Acad. Sci. U.S.A. 74 (12): 5463–7.
Mäkinen, V, Belazzougui, D, Cunial, F, Tomescu, AI (2015). Genome[...]iological Sequence Analysis in the Era of High[...]
Image credits
Figure 1. DNA. Encyclopedia Britannica.
http://www.britannica.com/EBchecked/topic/167063/DNA
Figure 2. RNA. English Wikipedia.
http://en.wikipedia.org/wiki/RNA
Figure 3. Messenger RNA. English Wikipedia.
http://en.wikipedia.org/wiki/Messenger_RNA
Figure 4. Jevsinek Skok, D., Godnic, I., Zorc, M., Horvat, S., Dovc, P., Kovac, M. and Kunej, T. (2013), Genome-wide in silico screening for microRNA genetic variability in livestock species. Animal Genetics, 44: 669–677.
http://onlinelibrary.wiley.com/doi/10.1111/age.12072/abstract
Figure 5. O'Carroll D., Schaefer, A. (2012), General Principals of miRNA Biogenesis and Regulation in the Brain. Neuropsychopharmacology Reviews, 38: 39–54.
http://www.nature.com/npp/journal/v38/n1/full/npp201287a.html
Figure 6. MicroRNA. English Wikipedia.
https://en.wikipedia.org/wiki/MicroRNA
Figures 7, 8, 9 and 10 were created for this project.
Annex A: Bash script for FASTQ preprocessing
#!/bin/bash

## Usage: ./$0 file.csfasta file.qual

## SOLiD Preprocess Filter
## Discards bad quality reads
solid_prepfilt2.pl -f $1 -g $2 \
    -o ${1%.csfasta}_nomis \
    -v -x no -p no -y no -n yes

## Cutadapt v1.0 downloaded from
## https://cutadapt.readthedocs.org/en/stable/
## Removes adaptor from reads
## '330201030313112312' is the adaptor for our libraries, this value is variable
cutadapt -c -e 0.1 -z -m 16 -M 30 -a 330201030313112312 \
    -o ${1%a}q ${1%.csfasta}_nomis_T_F3.csfasta ${1%.csfasta}_nomis_T_F3.qual
Annex B: Python code for the project
Main source
###############################################################################
### miRNAligner - microRNA alignment utility for smallRNAseq data |coding: utf8
###############################################################################
### Main source
###############################################################################
### Author: Francisco D. Morón-Duran <fdmoron@iconcologia.net>
### Description: [...]

import sys
import os.path
from seqs import Collapse

def help():
    """ [...] """
    f = sys.stderr  # [...] original assignment lost in extraction; sys.stderr is assumed
    print >> f, ""
    print >> f, "\tkG:\t\tk-mer size for de Bruijn graph construction."
    print >> f, "\tkT:\tMinimum count for k-mer inclusion in de Bruijn graph."
    print >> f, "\tkD:\t\tk-mer size for contig index used in votation."
    print >> f, "\tnT:\tThreshold for sequence occurrence to be considered."
    return

def main():
    """ Main body of the application. """
    global collection  # For debugging purposes

    if not (len(sys.argv) == 6 or len(sys.argv) == 7):
        help()
        return

    kG = int(sys.argv[1])   # k for de Bruijn graph
    kT = int(sys.argv[2])   # Threshold for k-mer coverage in DBG
    kD = int(sys.argv[3])   # k for contig votation
    nT = int(sys.argv[4])   # Threshold for sequence occurrence
    fname = sys.argv[5]     # File name to read input from
    outfile = sys.stdout

    if len(sys.argv) == 7:
        if os.path.isfile(sys.argv[6]):
            print >> sys.stderr, "ERROR: filename " + sys.argv[6] + " already exists."
            help()
            return
        outfile = open(sys.argv[6], "w+")   # File to write the output
    if not os.path.isfile(fname):
        print >> sys.stderr, ""
        print >> sys.stderr, "ERROR: filename " + fname + " does not exist."
        help()
        return

    collection = Collapse(fname, kG, kT, kD, nT)
    print >> outfile, collection.contigs
    if outfile != sys.stdout:
        outfile.close()
    return

if __name__ == "__main__":
    main()
Module seqs.py
###############################################################################
### miRNAligner - microRNA alignment utility for smallRNAseq data |coding: utf8
###############################################################################
### Module seqs
###############################################################################
### Author: Francisco D. Morón-Duran <fdmoron@iconcologia.net>
### Description: Collection of Sequences with different Reads
###############################################################################

from math import sqrt   # used by Sequence.distance; import assumed, original import block is truncated
from poll import [...]

# [...]

    def _computeQuals(self):
        """ Update consensus qualities for Sequences. """
        for i in range(1, self.maxLen + 1):
            self.baseQual[i] = { 'q': 0, 'n': 0 }
        for i in self.sequence.values():
            c = 1
            for j in i.getQual():
                self.baseQual[c]['q'] += j
                self.baseQual[c]['n'] += 1
                c += 1
        for i in range(1, self.maxLen + 1):
            self.avgQual.append(self.baseQual[i]['q'] / self.baseQual[i]['n'])
        return

    def _printSummary(self):
        """ Writes the summary of the reads. """
        print >> sys.stderr, "There are " + str(self.numReads),
        print >> sys.stderr, "reads collapsing into",
        print >> sys.stderr, str(self.uniReads) + " unique reads."
        return

class Sequence:
    """ Group of unique colorspace sequences. """

    def __init__(self, i, c, q, l):
        """ Creates a new sequence from a read. """
        self._q = []                # Cumulative per-base quality
        for j in q:
            self._q.append(ord(j))
        self._qm = self._q          # Average per-base quality
        self._qm_valid = True       # Validity flag for the qm average
        self._aq = 0                # Average sequence quality
        self._aq_valid = False      # Validity flag for the aq average
        self.l = l                  # Length of the basespace sequence
        self.c = c                  # Colorspace sequence
        self.b = self._cs2bs()      # Basespace sequence
        self.i = [i]                # List of read identifiers with this sequence
        self.n = 1                  # Number of reads with this sequence
        self.bruijn = {}            # De Bruijn graph
        self.contigs = set()        # Contigs to which seq has voted
        self.poll = None
        return

    def __lt__(self, seq):
        return self.l < seq.l

    def __le__(self, seq):
        return self.l <= seq.l

    def __gt__(self, seq):
        return self.l > seq.l

    def __ge__(self, seq):
        return self.l >= seq.l

    def _cs2bs(self):
        """ Gives basespace sequence from a colorspace input. """
        d = { 'T': "TGCA", 'A': "ACGT", 'C': "CATG", 'G': "GTAC" }  # Transition table
        ret = []
        c = self.c[0]               # Current colorspace character to treat
        for i in range(1, self.l + 1):      # Skip the first base (primer)
            c = d[c][int(self.c[i])]        # Base obtention from transition table
            ret.append(c)
        return "".join(ret)

    def _addRead(self, i, c, q):
        """ Adds read with identifier id to the Sequence. """
        if self.c != c:
            return False            # Read does not belong to this sequence
        self._qm_valid = False      # qm is calculated only on demand
        self._aq_valid = False      # aq is calculated only on demand
        self.i.append(i)            # Count read as a new sequence item
        self.n += 1                 # Update counts for the sequence
        n = 0
        for j in q:                 # Save the quality for each transition
            self._q[n] += ord(j)
            n += 1
        return True

    def getQual(self):
        """ Gets average quality for each sequenced transition. """
        if not self._qm_valid:      # Update mean quality value
            self._qm = []
            for i in self._q:
                self._qm.append(i/self.n)   # Average qual for this base
            self._qm_valid = True   # Sets validity flag
            self._aq_valid = False
        if not self._aq_valid:
            self.getAvgQual()
        return self._qm

    def getAvgQual(self):
        """ Gets average quality for the sequence. """
        if not self._qm_valid:
            self.getQual()
        if not self._aq_valid:
            self._aq = 0
            for i in self._qm:
                self._aq += i
            self._aq = self._aq / len(self._qm)
            self._aq_valid = True
        return self._aq

    def kmers(self, k):
        """ Builds list of k-mers for the reads. """
        r = []
        for i in range(0, self.l - k + 1):
            r.append([ self.b[i:i + k], i ])
        return r

    def getKmFreqs(self, k, i = 0, f = None):
        """ Gets frequencies of k-mers in sequence. """
        if f is None:
            f = self.l - 1
        d = {}
        for km, ps in self.kmers(k):
            if ps < i or ps > f - k + 1:
                continue
            if not km in d:
                d[km] = 0
            d[km] += 1
        tk = f - i - k + 2
        for km in d:
            d[km] /= float(tk)
        return d

    def decideContig(self):
        """ Decide to which contig do we assign the sequence.
        Keeps nearest most voted contig. """
        sl = sorted(self.contigs,
                    key = lambda c: (len(c.v[self]), -self.distance(c), c.s))
        for c in sl[:-1]:
            c.d[self] = sl[-1]
            c.n -= self.n*len(c.v[self])
            del c.v[self]
            self.contigs.remove(c)
        return

    def distance(self, other):
        """ Computes Mahalanobis distance between two basespace strings. """
        if self.poll == None:
            return False
        if not isinstance(other, dict):
            si, oi, sf, of = 0, 0, self.l - 1, other.l - 1
            if other in self.contigs:
                o = other.v[self][0][0] - other.v[self][0][1]
                if o > 0:
                    oi = o
                if o < 0:
                    si = -o
                of = min(of, o + self.l - 1)
                sf = min(sf, other.l - 1 - o)
            f1 = self.getKmFreqs(self.poll.kv, si, sf)
            f2 = other.getKmFreqs(self.poll.kv, oi, of)
        else:
            f1 = self.getKmFreqs(self.poll.kv)
            f2 = other
        kms = set(f1.keys()) | set(f2.keys())
        for km in [x for x in f1.keys() if x not in f2]:
            f2[km] = 0.0
        for km in [x for x in f2.keys() if x not in f1]:
            f1[km] = 0.0
        return sqrt(sum([((f1[km] - f2[km])/self.poll._kc[km])**2 for km in kms]))
7'/0$% 0ᛏ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! K%PXY3%,&-= E K%4=?PXY '3%,&K-&5 75%3%5B 2?= (K'33PXY(-6 A'5' Z4?A%&,. 752[!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! V?A73- A$,!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Y75)?=. @='&4%(4? ;/ V?=\&E;7='& ]2AK?=?&^%4?&4?3?,%'/&-5_!!! ;-(4=%>5%?&. ;- i=7%j& h='>) %K>3-K-&5'5%?&/ ;-2%&%5%?& ?2 Y((-K$3B '&A!!! N?&5%, 43'((-(!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2=?K K'5) %K>?=5 (6=5%K>?=5 (B(
A-2 `K-=(a(-6S `b. ccc v%-3A( ̀ EK-=( $-3?&,%&, 5? (-6/ ccc
2?= % %& M='&,-a3-&a(-6b E ` e Fb. B%-3A (-6f%.% e `g
A-2 C2Oa`Kb. ccc v%-3A( &-M5 >?((%$3- 2?=O'=A `EK-=( 2?= `K/ ccc
2?= M %& QYNhWQ. B%-3A `KfF.g e M
A-2 C$Oa`Kb. ccc v%-3A( &-M5 >?((%$3- $'4`O'=A `EK-=( 2?= `K/ ccc
2?= M %& QYNhWQ. B%-3A M e `Kf.EFg
43'(( ;$,. ccc N3'(( 2?= ;- i=7%j& h='>)/ ccc
A-2 CC%&%5CCa(-32S (-6(S `S 5)=-()?3Ab. ccc o&%5 ,='>) 2=?K (-6( O%5) `EK-=( >=-(-&5 K?=- 5)'& 5)=-()?3A 5%K-(/ ccc
>=%&5 __ (B(/(5A-==S ci7%3A%&, ;- i=7%j& h='>) 2=?K 4?33-45-A =-'A(/c (-32/h d HJ 2?= (-6 %& (-6(. 2?= ( %& (-6/$/(>3%5aQXQb. ! 8>3%5 =-'A( O%5) 7&`&?O& $'(-( 2?= `K %& `K-=(a(S `b. ! h-5 `EK-=( 2=?K -'4) (-67-&4- %2 &?5 `K %& (-32/h. (-32/hf`Kg d F ! o&%5%'3%T- &-O `EK-= -3(-. (-32/hf`Kg ed F ! YAA `EK-= 4?L-=',- 3?O4?L d fM 2?= M %& (-32/h %2 (-32/hfMg ]d 5)=-()?3Ag ! :%(5 3?O 4?L-=-A
2?= M %& 3?O4?L. A-3 (-32/hfMg ! P-K?L- 3?O 4?L-=-A `EK-=( 2=?K ,='>) >=%&5 __ (B(/(5A-==S (5=a3-&a(-32/hbb e c 5?5'3 `EK-= &?A-( %& 5)- ,='>)/c =-57=&
A-2 ,-5C4?&5%,C2Oa(-32S `Kb. ccc X'L%,'5- ,='>) 2?=O'=A( 2=?K `K O)%3- =-'4)%&, &?&E'K$%,7?7( >'5)(/ ccc
4 d f`Kg ! @%=(5 `EK-= O)%3- W=7-. %2 (7KaM %& (-32/h 2?= M %& C2Oa4fEFgbb "d F. $=-'` ! 9&- >?((%$3- >'5) ?&3B" 4'&A d fM 2?= M %& C2Oa4fEFgb %2 M %& (-32/hgf1g ! X-M5 4'&A%A'5- `EK-= %2 4'&A dd `K. $=-'` ! i=-'` 4B43-(" ! ?= Vw$%7( 4?&5%,(
%2 (7KaM %& (-32/h 2?= M %& C$Oa4'&Abb "d F.
;E
-
8/16/2019 De Novo Discovery MicroRNA From Small RNA Sequencing Data
54/63
8+2$*()*% !9 :%+;$
$=-'` ! N'&A%A'5- ()?73A $- =-'4)-A $B 3'(5 `EK-= ?&3B" 4/'>>-&Aa4'&Ab ! Y>>-&A 4'&A%A'5- 7&'K$%,7?7( `EK-= >'5) =-57=& 4
A-2 ,-5C4?&5%,C$Oa(-32S `Kb. ccc X'L%,'5- ,='>) $'4`O'=A( 2=?K `K O)%3- =-'4)%&, &?&E'K$%,7?7( >'5)(/ ccc
4 d f`Kg O)%3- W=7-. %2 (7KaM %& (-32/h 2?= M %& C$Oa4f1gbb "d F. $=-'` 4'&A d fM 2?= M %& C$Oa4f1gb %2 M %& (-32/hgf1g %2 4'&A dd `K. $=-'` %2 (7KaM %& (-32/h 2?= M %& C2Oa4'&Abb "d F. $=-'` 4/%&(-=5a1S 4'&Ab ! o&(-=5 4'&A%A'5- '5 5)- $-,%&%&, ?2 >'5) =-57=& 4
A-2 ,-5C4?&5%,a(-32S `Kb.
ccc h-5 7&'K$%,7?7( >'5) 4?&5'%&%&, `EK-= a%2 %5 -M%(5(b/ ccc 2O d (-32/,-5C4?&5%,C2Oa`Kb ! @?=O'=A >'5) $O d (-32/,-5C4?&5%,C$Oa`Kb ! i'4`O'=A >'5) $O d $Of.EFg ! P-K?L- `K 2=?K $'4`O'=A >'5) a>=-(-&5 %& $?5)b %2 `K %& C2Oa2OfEFgb. 4 d 2O ! $O >'5) %( 2O '( O-33x -3(-. 4 d $O e 2O ! V-=,- 2O '&A $O >'5)( ! P-57=& 4?&5%,S `EK-= >'5) '&A `EK-= 4?L-=',- =-57=& (-32/4?&5%,D(5=%&,a4bS 4S f(-32/hfMg 2?= M %& 4g
A-2 4?&5%,D(5=%&,a(-32S 4b. ccc r=%5- (-67-&4- (5=%&, 2=?K `EK-= >'5)/ ccc
=-57=& 4f1g e QQ/j?%&aMfEFg 2?= M %& 4fF.gb
A-2 '33C4?&5%,(a(-32b. ccc h-5 '33 7&'K$%,7?7( >'5)( 4?&5'%&-A %& 5)- ,='>)/ ccc
A?&- d (-5ab ! 8-5 ?2 L%(%5-A `EK-=( = d fg ! :%(5 ?2 4?&5%,( 5? =-57=& 2?= M %& (-32/h. %2 M &?5 %& A?&-. (S 4S 4?L d (-32/,-5C4?&5%,aMb ! h-5 (-6S `EK-=( '&A `EK-= 4?L-=',- 2?= B %& 4. A?&-/'AAaBb ! @3', '( L%(%5-A '33 `EK-=( %& 5)- 4?&5%, =/'>>-&Aa(b =-57=& =
class Assembly:
    """ Collection of contigs. """

    def __init__(self, seqs, k, threshold):
        """ Initialize from a list of contig sequences. """
        self.d = Dbg(seqs, k, threshold)
        self.n = 0
        self.c = {}
        self.k = k
        print >> sys.stderr, "Extracting unique unambiguous contigs."
        for x in self.d.all_contigs():
            self.c[x] = Contig(x, k)
            self.n += 1
        print >> sys.stderr, str(self.n) + " total contigs were generated."
        return

    def __getitem__(self, x):
        """ Get contig x from assembly. """
        if x in self.c:
            return self.c[x]
        return False

    def __repr__(self):
        """ . """
        i = 1
        o = ""
        for x in sorted([c for c in self.c], key = lambda p: (-self.c[p].n, -len(self.c[p].v))):
            o += "> contig_" + str(i) + " votes: " + str(self.c[x].n)
            o += " sequences: " + str(len(self.c[x].v)) + " counts: "
            o += str(sum([s.n for s in self.c[x].v])) + "\n"
            o += self.c[x].__repr__(False) + "\n"
            i += 1
        return o

    def updateCovs(self):
        """ Force update of contig coverage values. """
        for c in self.c.values():
            c.updateCov()
        return

    def checkIntegrity(self):
        """ Checks coverage of contigs by seqs is increasing. """
        self.sanitize()
        for c in self.c.values():
            c.checkIntegrity()
        self.sanitize()
        for c in self.c.values():
            c.removeOutliers()
        self.sanitize()
        return

    def sanitize(self):
        """ . """
        d = set([c for c in self.c if len(self.c[c].v) == 0])
        for c in d:
            del self.c[c]
        return
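
As a reading aid, the forward and backward walks that extract an unambiguous path can be sketched in a self-contained form. The names `successors`, `predecessors`, `walk` and `unambiguous_contig` are hypothetical stand-ins (the listing relies on `_fw`, `_bw` and the `Dbg` class, which are defined elsewhere), and the graph is assumed to be a plain set of k-mers over the DNA alphabet, without coverage values. It is a minimal sketch written for Python 3, not the implementation used in this work.

```python
ALPHABET = "ACGT"  # assumption: plain DNA alphabet


def successors(km):
    """All k-mers that could follow km (drop first base, append one)."""
    return [km[1:] + b for b in ALPHABET]


def predecessors(km):
    """All k-mers that could precede km (prepend one base, drop last)."""
    return [b + km[:-1] for b in ALPHABET]


def walk(graph, km, step, back):
    """Extend from km in one direction while the path stays unambiguous.

    The returned path always starts at km and moves away from it.
    """
    path = [km]
    while True:
        nxt = [x for x in step(path[-1]) if x in graph]
        if len(nxt) != 1:  # dead end or branching point
            break
        cand = nxt[0]
        if cand == km:  # back at the start: the path is a cycle
            break
        if sum(x in graph for x in back(cand)) != 1:
            break  # cand is also reachable from another k-mer
        path.append(cand)
    return path


def unambiguous_contig(graph, km):
    """Sequence of the unambiguous path that contains km."""
    fw = walk(graph, km, successors, predecessors)
    if km in successors(fw[-1]):
        path = fw  # cyclic path: the forward walk already covers it
    else:
        bw = walk(graph, km, predecessors, successors)
        path = bw[1:][::-1] + fw  # drop km, restore left-to-right order
    return path[0] + "".join(x[-1] for x in path[1:])


graph = {"ACG", "CGG", "GGT", "GTC", "TCA"}  # k-mers of "ACGGTCA", k = 3
print(unambiguous_contig(graph, "GGT"))  # ACGGTCA
```

Adding a second successor of `GGT` (for instance `GTT`) makes the forward walk stop at `GGT`, so the contig containing it shrinks to `ACGGT`.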

class Contig:
    """ Contig from a unique unambiguous path in the graph. """

    def __init__(self, x, k):
        """ Initialize from a contig sequence. """
        self.s = x          # Contig sequence
        self.l = len(x)     # Contig length
        self.c = []         # [...]rint contig)
        self.d = {}         # Discarded reads
        self.k = k
        self.cons = ""      # Consensus seq
        return

    def __repr__(self, verbose = True):
        """ Representation of a contig. """
        o0 = abs(self.o)
        if verbose:
            for i in xrange(o0):
                print '',
            print self.s
            ss = self.v.keys()
            ss.sort(key = lambda seq: (-seq.distance(self), len(self.v[seq]), seq.n), reverse = True)
            for s in ss:
                o = self.v[s][0][0] - self.v[s][0][1] + o0
                for i in xrange(o):
                    print '',
                print s.b,
                print "(x" + str(s.n) + ",",
                print "v" + str(len(self.v[s])) + ",",
                print "d" + str(s.distance(self)) + ")"
        return self.cons

    def __len__(self):
        """ Return number of sequences in the Contig. """
        return len(self.v)

    def _iqrOutlier(self, l, side = "upper"):
        """ Finds interquartile range and returns a threshold for outliers. """
        n = len(l)
        if n < 3:
            return False
        if (n+1)%4 == 0:
            q1 = l[(n+1)/4-1]
            q3 = l[3*(n+1)/4-1]
        else:
            q1 = (l[(n+1)/4-1]+l[(n+1)/4])*.5
            q3 = (l[3*(n+1)/4-1]+l[3*(n+1)/4])*.5
        iqr = q3 - q1
        if side == "upper":
            return q3 + 1.5*iqr
        else:
            return q1 - 1.5*iqr
        return

    def updateCov(self):
        """ Update coverage values per contig base and get consensus sequence. """
        os = [self.v[s][0][0] - self.v[s][0][1] for s in self.v]
        ls = [s.l + self.v[s][0][0] - self.v[s][0][1] for s in self.v]
        self.o = abs(min(0,min(os)))
        length = max(ls) + self.o
        b = [{} for i in range(length)]
        self.c = [0 for i in range(length)]
        for c in self.v.keys():
            o = os.pop(0) + self.o
            o2 = 0
            for i in c.b:
                if not i in b[o + o2]:
                    b[o + o2][i] = 0
                b[o + o2][i] += c.n
                o2 += 1
        for o in range(len(b)):
            total = float(sum(b[o].values()))
            self.c[o] += total
            if len(b[o]) > 0:
                for i in b[o]:
                    b[o][i] /= total
        self.cons = ''.join([sorted(b[o].keys(), key = lambda n: -b[o][n])[0] for o in range(len(b)) if len(b[o]) > 0])
        return

    def checkIntegrity(self):
        """ Removes trivial non-matching sequences. """
        d = set()
        for s in self.v:
            v0 = self.v[s][0]
            for v in self.v[s][1:]:
                if not v0 <= v:  # Both coords. in v0 must be <= their respective ones in v
                    d.add(s)
                    break
                v0 = v
        for s in d:
            self.d[s] = self.v[s]
            self.n -= s.n
            del self.v[s]
        return

    def removeOutliers(self):
        """ Remove outliers flagged by interquartile range and confirmed
        with Mahalanobis distance to centroid. """
        s = sorted(self.v, key = lambda seq: seq.distance(self))
        l = [seq.distance(self) for seq in s]
        d = set()  # Set of sequences to remove from contig
        threshold = self._iqrOutlier(l)
        if not threshold:  # Too few sequences
            return
        o = sum([i > threshold for i in l])
        if o > 0:
            cands = s[-o:]  # Candidates to outlier
            group = s[:-o]  # Reliable group
            cgroup = self.centroid(4, set(group))  # Get centroid
            lgroup = [seq.distance(cgroup) for seq in group]  # Dists. to centroid
            m = sum(lgroup)/len(lgroup)  # Mean distance
            d2 = [(i-m)**2 for i in lgroup]  # Covs.
            v = sqrt(sum(d2)/len(d2))  # Std. error
            for x in cands:
                if x.distance(self) > v:  # Remove seqs with distance > std. error
                    d.add(x)
            for s in d:  # Delete flagged sequences from contig
                self.d[s] = self.v[s]
                self.n -= s.n
                del self.v[s]
        return

    def updateVote(self, s, v, k):
        """ Updates the votes for the contig. """
        if not s in self.v:
            self.v[s] = []
        self.v[s].append(v)
        self.n += s.n
        return

    def kmers(self, k):
        """ Builds list of k-mers for the contig """
        r = []
        for i in range(0, self.l - k + 1):
            r.append([ self.s[i:i + k], i ])
        return r

    def centroid(self, k, seqs = set()):
        """ . """
        c = {}
        l = len(self.v)
        for seq in self.v:
            if len(seqs) > 0 and seq not in seqs:
                continue
            p = seq.getKmFreqs(k)
            for km in p:
                if not km in c:
                    c[km] = 0
                c[km] += p[km]
        for km in c:
            c[km] /= l
        return c

    def getKmFreqs(self, k, i = 0, f = None):
        """ Gets frequencies of k-mers in sequence. """
        if f is None:
            f = self.l - 1
        d = {}
        for km, ps in self.kmers(k):
            if ps < i or ps > f - k + 1:
                continue
            if not km in d:
                d[km] = 0
            d[km] += 1
        tk = f - i - k + 2
        for km in d:
            d[km] /= float(tk)
        return d
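
The outlier threshold computed by `_iqrOutlier` can be tried in isolation with the short sketch below. The function names `iqr_upper_threshold` and `flag_outliers` are hypothetical, the input is assumed to be a list of distances, and the code is written for Python 3, where `//` is needed to reproduce the integer division of the listing. One deliberate difference: it returns `None` rather than `False` when there are too few values, so that a legitimate threshold of 0 is not mistaken for "no threshold".

```python
def iqr_upper_threshold(values):
    """Return Q3 + 1.5 * IQR for a sorted list, or None if fewer than 3 values."""
    n = len(values)
    if n < 3:
        return None
    if (n + 1) % 4 == 0:
        # Quartile positions fall exactly on list elements
        q1 = values[(n + 1) // 4 - 1]
        q3 = values[3 * (n + 1) // 4 - 1]
    else:
        # Otherwise average the two neighbouring elements
        q1 = (values[(n + 1) // 4 - 1] + values[(n + 1) // 4]) * 0.5
        q3 = (values[3 * (n + 1) // 4 - 1] + values[3 * (n + 1) // 4]) * 0.5
    return q3 + 1.5 * (q3 - q1)


def flag_outliers(values):
    """Values lying above the upper interquartile threshold."""
    ordered = sorted(values)
    threshold = iqr_upper_threshold(ordered)
    if threshold is None:
        return []
    return [v for v in ordered if v > threshold]


print(iqr_upper_threshold([1, 2, 3, 4, 5, 6, 7]))  # 12.0
print(flag_outliers([1, 2, 3, 4, 5, 6, 50]))       # [50]
```

In the listing, the values flagged this way are only candidates: `removeOutliers` discards them after a second check against the spread of distances to the centroid of the remaining sequences.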
Source poll.py

################################################################################
### miRNAligner - microRNA alignment utility for smallRNAseq data  coding: utf8
################################################################################
### Module poll
################################################################################
### Author: Francisco D. Morón-Duran <fdmoron@iconcologia.net>
### Description: Dictionaries of sequences indexed by their contained k-mers.
### Definition of Vote. Executor of the vote-and-seed process.
################################################################################

import sys

class
        if not km in self._dc:
            self._dc[km] = set()  # Initialize empty k-mer entry
        self._dc[km].add(tuple((contig, ps)))  # Store sequence and k-mer position
        return

    def _vote(self, seq):
        """ Store votes generated from seq into contigs dictionary. """
        for km, ps in seq.kmers(self.k):  # Get k-mers and positions in seq
            if km in self._dc:
                for cseq, cps in self._dc[km]:  # Get positions in indexed reads
                    cseq.updateVote(seq, Vote(cps, ps), self.k)  # Vote contig
                    seq.contigs.add(cseq)
        self._c += 1  # Sum number of voted sequences
        print >> sys.stderr, "\r" + str(self._c) + "\tprocessed pieces of data.",
        sys.stdout.flush()  # Force to empty buffer
        return

    def _initCovariances(self):
        """ . """
        if not self.voted:
            return False
        k = self.kv
        c = self._c
        for km in self._ds:
            # Compute expected frequency for km
            l = self._ds[km]
            self._kf[km] = sum([s.getKmFreqs(k)[km] for s in l])/float(c)
            # Compute covariance for km
            f = self._kf[km]
            r = ((c - len(l) + 1)*f**2)/(c - 1)
            self._kc[km] = sum([(s.getKmFreqs(k)[km] - f)**2 for s in l])/(c - 1) + r
        return

    def getKmFreqs(self, kms):
        """ Gets frequency of k-mer km in input reads. """
        if not self.voted or self._t == 0:
            return False
        d = {}
        for km in kms:
            if not km in d and km in self._kf:
                d[km] = self._kf[km]/float(self._t)
        return d
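
The indexing and voting steps above can be illustrated with plain strings and tuples. This is a simplified sketch written for Python 3: `kmers`, `build_index` and `cast_votes` are hypothetical helpers, contigs and reads are assumed to be bare strings rather than the sequence objects of the listing, and a vote is represented as a `(contig_position, read_position)` tuple instead of a `Vote` instance.

```python
from collections import defaultdict


def kmers(seq, k):
    """List of (k-mer, position) pairs of a sequence."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]


def build_index(contigs, k):
    """Dictionary mapping each k-mer to the (contig, position) pairs holding it."""
    index = defaultdict(set)
    for contig in contigs:
        for km, pos in kmers(contig, k):
            index[km].add((contig, pos))
    return index


def cast_votes(index, read, k):
    """Collect one vote per k-mer shared between the read and an indexed contig."""
    votes = defaultdict(list)
    for km, pos in kmers(read, k):
        for contig, cpos in index.get(km, ()):
            votes[contig].append((cpos, pos))
    return dict(votes)


index = build_index(["ACGGTCA"], 3)
print(cast_votes(index, "GGTC", 3))  # {'ACGGTCA': [(2, 0), (3, 1)]}
```

Both votes in the example differ by the same amount between their two coordinates (2), which is the offset at which the read aligns to the contig.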

class Vote:
    """ Vote supporting a k-mer match. """

    def __init__(self, p1, p2):
        """ Vote initialization from sequence indexes. """
        self.p1 = p1  #

    def __repr__(self):
        """ Written representation of a vote '(p1, p2)'. """
        return str(tuple((self.p1, self.p2)))

    def __add__(self, other):
        """ Sum of two votes. """
        return Vote(self.p1 + other.p1, self.p2 + other.p2)

    def __sub__(self, other):
        """ Subtraction of a vote from another. """
        return Vote(self.p1 - other.p1, self.p2 - other.p2)

    def __eq__(self, other):
        """ Equality of votes. """
        return (self.p1 - other.p1 == 0) and (self.p2 - other.p2 == 0)

    def __ne__(self, other):
        """ Difference of votes. """
        return (self.p1 - other.p1 != 0) or (self.p2 - other.p2 != 0)

    def __le__(self, other):
        """ Lower equal. """
        return self.p1 <= other.p1 and self.p2 <= other.p2
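
The `__le__` comparison is what `Contig.checkIntegrity` uses to discard reads whose successive votes go backwards in either coordinate. A minimal sketch of that test, assuming votes are given as `(p1, p2)` tuples in the order they were cast (the function name `votes_are_colinear` is hypothetical, and the code is written for Python 3):

```python
def votes_are_colinear(votes):
    """True if every vote is <= the next one in both coordinates."""
    return all(a[0] <= b[0] and a[1] <= b[1]
               for a, b in zip(votes, votes[1:]))


print(votes_are_colinear([(3, 0), (4, 1), (5, 2)]))   # True
print(votes_are_colinear([(3, 0), (10, 1), (5, 2)]))  # False
```

A read with a single vote, or with none, passes trivially, which matches the listing: the inner loop over `self.v[s][1:]` never runs in that case.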