
oslo.genomics.no  

RNA sequencing module

Wednesday 15.10.14 – Day 1
09.00 Welcome
09.10 RNA sequencing introduction
10.00 RNA-seq data analysis – introduction
10.45 RNA-seq – practical part 1
12.00 Lunch
12.45 RNA-seq – practical part 2

Thursday 16.10.14 – Day 2
09.00 Introduction to genome-guided transcriptome assembly
09.30 Transcriptome assembly – practical part 1
12.00 Lunch
12.45 Transcriptome assembly – practical part 2
14.30 Functional annotation
15.00 Alternative RNA-seq applications
15.30 Questions and discussion


Dr. Susanne Lorenz

Genomics Core Facility Helse Sør-Øst

Dept. of Tumor Biology

The Norwegian Radium Hospital, OUS

RNA sequencing – Introduction


From a Gene to RNA

[Figure: Gene A (exons 1-4) in the genome is transcribed into pre-mRNA, which is spliced and given a poly(A) tail (AAAAAAAAAA) to form mRNA; two alternative transcripts, A1 and A2, are shown for blood and brain.]

→ Messenger RNA (mRNA) is translated into protein (coding RNA).
→ Humans have about 20,000-25,000 protein-coding genes.


From a Gene to RNA

[Figure: Gene B (exons 1-2) in the genome is transcribed into a pre-non-coding RNA, which is spliced into the mature non-coding RNA transcript; expression is shown for blood and brain.]

→ Non-coding RNA (ncRNA) is not translated into protein.
→ Some types of ncRNA have a poly(A) tail, others do not.
→ Three main categories: housekeeping RNAs, short ncRNAs (< 200 bp) and long ncRNAs (> 200 bp).


Non-coding RNAs

Category | Name | Length (bp) | Function
Housekeeping RNAs | Ribosomal RNA (rRNA) | 120-5000 | ribosome structure
Housekeeping RNAs | Transfer RNA (tRNA) | 73-94 | protein translation
Housekeeping RNAs | Small nuclear RNA (snRNA) | ~150 | splicing
Housekeeping RNAs | Small nucleolar RNA (snoRNA) | 70-200 | post-transcriptional modification
Short ncRNAs (small RNAs) | MicroRNAs | 16-30 (typically 21-24) | translational repression
Short ncRNAs (small RNAs) | PIWI-interacting RNAs | 26-31 | regulate transposon activity and chromatin state
Short ncRNAs (small RNAs) | Promoter-associated short RNAs | ~18 | may regulate gene expression at the chromatin level
Long ncRNAs | Long intergenic ncRNAs | > 200 | epigenetic, transcriptional and post-transcriptional regulation
Long ncRNAs | Pseudogenes | > 200 | competitive endogenous RNA
Long ncRNAs | Enhancer RNAs | 50-2000 | not known
Long ncRNAs | Antisense RNAs | > 200 | regulation of gene expression
Long ncRNAs | Long intronic ncRNAs | > 200 | not known
Long ncRNAs | Repeat-associated long RNAs | > 200 | not known

→ Ribosomal RNA is a challenge for RNA sequencing because it constitutes up to 80% of total RNA.
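The length ranges in the table can be turned into a simple lookup, for example to see which classes a read length from small RNA sequencing is compatible with. A minimal Python sketch, assuming the table's bp ranges; the windows used for the approximate "~150" and "~18" entries are assumptions, and length alone cannot assign a class.

```python
from typing import Optional

# (min_length, max_length) in bp, taken from the table above.
# max_length None means open-ended ("> 200" is stored as 201 and up).
# ASSUMPTION: "~150" and "~18" are widened to 135-165 and 16-20 for illustration.
NCRNA_LENGTHS: dict[str, tuple[int, Optional[int]]] = {
    "ribosomal RNA (rRNA)": (120, 5000),
    "transfer RNA (tRNA)": (73, 94),
    "small nuclear RNA (snRNA)": (135, 165),
    "small nucleolar RNA (snoRNA)": (70, 200),
    "microRNA": (16, 30),
    "PIWI-interacting RNA": (26, 31),
    "promoter-associated short RNA": (16, 20),
    "long intergenic ncRNA": (201, None),
    "pseudogene": (201, None),
    "enhancer RNA": (50, 2000),
    "antisense RNA": (201, None),
    "long intronic ncRNA": (201, None),
    "repeat-associated long RNA": (201, None),
}


def candidate_classes(length: int) -> list[str]:
    """Return every ncRNA class whose tabulated length range contains `length`."""
    hits = []
    for name, (lo, hi) in NCRNA_LENGTHS.items():
        if length >= lo and (hi is None or length <= hi):
            hits.append(name)
    return hits


print(candidate_classes(22))  # ['microRNA']
print(candidate_classes(28))  # ['microRNA', 'PIWI-interacting RNA']
```

Overlapping ranges such as 28 bp show why small RNA reads are normally classified by alignment to an annotation rather than by length alone.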


RNA sequencing

What is RNA sequencing?
Massively parallel sequencing to characterize and quantify transcriptomes (all actively transcribed genes).

What does RNA sequencing offer?
• Identification of all actively transcribed genes in a cell type/tissue
• Differential gene expression analysis
• Identification of new transcripts
• Detection of alternative splicing events
• Detection of fusion transcripts
• Strand-specific measurements
• Mutation analysis: expression level of genomic mutations, RNA editing
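Differential expression analysis starts from read counts per gene. A minimal Python sketch of the idea, assuming simple counts-per-million (CPM) scaling and a log2 fold change with a pseudocount; gene names and counts are hypothetical, and dedicated tools such as edgeR or DESeq2 use more robust normalization and statistical testing.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale raw read counts by the library size of one sample."""
    total = sum(counts.values())
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def log2_fold_change(case: dict[str, int], control: dict[str, int],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """log2(case / control) on CPM values; the pseudocount avoids division by zero."""
    case_cpm, control_cpm = cpm(case), cpm(control)
    return {
        gene: math.log2((case_cpm[gene] + pseudocount) / (control_cpm[gene] + pseudocount))
        for gene in case_cpm
    }


# Hypothetical counts for two genes in one case and one control sample.
case = {"GENE_A": 300, "GENE_B": 700}
control = {"GENE_A": 100, "GENE_B": 900}
for gene, lfc in log2_fold_change(case, control).items():
    print(gene, round(lfc, 2))
```

A positive value means higher expression in the case sample, a negative value lower expression.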


RNA sequencing in comparison

"RNA-Seq: a revolutionary tool for transcriptomics", Wang Z. et al., 2009, Nature Reviews Genetics


RNA sequencing protocols

1. mRNA (protein-coding) stranded sequencing
   → only RNA with a poly(A) tail
   → no rRNA contamination, but mRNAs of genes encoding ribosomal proteins are still captured

2. Total RNA stranded transcriptome (ribosomal RNA depletion)
   → total RNA isolation followed by rRNA depletion
   → generates information about all RNA molecules except rRNAs and RNA molecules shorter than 120 bp

3. Capture systems for stranded RNA sequencing
   → hybridization-based
   → dependent on annotation
   → increased sequencing depth at coding regions
   → suitable for very low starting material (10 ng)
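The effect of the up-to-80 % rRNA fraction quoted earlier on usable sequencing depth is simple arithmetic, and it is what motivates poly(A) selection or rRNA depletion. A small Python sketch; the 30 million read depth and the 5 % residual rRNA after depletion are assumed example values, not measurements.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Number of reads expected to come from non-rRNA molecules."""
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# Hypothetical 30 million reads from a library without rRNA removal (up to 80 % rRNA).
print(informative_reads(30_000_000, 0.80))  # 6000000
# Same depth with an ASSUMED 5 % residual rRNA after depletion (not a measured value).
print(informative_reads(30_000_000, 0.05))  # 28500000
```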

   


Illumina TruSeq strand-specific RNA protocols

[Figure: workflows for mRNA sequencing (1. poly(A) selection) and total RNA sequencing]


Illumina TruSeq strand-specific RNA protocols

[Figure: library preparation workflow (continued); flow cell]


Strand-specific total RNA sequencing: advantages

• More even coverage along the transcript
  → significantly less 3' bias than with poly(A) selection
  → more accurate quantification of gene expression
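Coverage evenness can be summarized per transcript. A minimal Python sketch of one possible metric, assumed here for illustration rather than taken from the slides: the ratio of mean read depth in the 3' third of a transcript to that in the 5' third (about 1 for even coverage, well above 1 for 3' bias).

```python
from statistics import mean


def three_prime_bias(coverage: list[float]) -> float:
    """Ratio of mean depth in the 3' third to the 5' third of one transcript.

    `coverage` is per-base read depth ordered from 5' to 3'.
    Hypothetical metric for illustration only.
    """
    if len(coverage) < 3:
        raise ValueError("need at least 3 positions")
    third = len(coverage) // 3
    five_prime = mean(coverage[:third])
    three_prime = mean(coverage[-third:])
    return three_prime / five_prime if five_prime > 0 else float("inf")


even = [10.0] * 300                           # flat coverage
skewed = [float(i) for i in range(1, 301)]    # depth rising toward the 3' end
print(three_prime_bias(even))    # 1.0
print(round(three_prime_bias(skewed), 2))
```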


Strand-specific total RNA sequencing: advantages

[Figure panels: fresh-frozen high-quality sample (RNA integrity number, RIN 9.0); formalin-fixed paraffin-embedded (FFPE) sample (RIN 6.0)]

• Robust and efficient method, even for low-quality samples


Strand-specific total RNA sequencing: advantages

• Improved discrimination of overlapping transcripts
  → more accurate quantification of gene expression
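The gain from strand information can be illustrated by assigning a read to one of two genes that overlap on opposite strands. A minimal Python sketch, assuming a dUTP-type stranded library (the layout used by Illumina TruSeq Stranded kits), in which read 1 is reverse-complementary to the transcript and read 2 has the transcript's orientation; gene names are hypothetical.

```python
def transcript_strand(read_strand: str, read_number: int) -> str:
    """Infer the strand of the source transcript from where a read aligned.

    Assumes a dUTP-type stranded library: read 1 is antisense to the
    transcript, read 2 is sense.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    if read_number == 1:
        return "-" if read_strand == "+" else "+"
    if read_number == 2:
        return read_strand
    raise ValueError("read_number must be 1 or 2")


def assign_gene(read_strand: str, read_number: int,
                overlapping_genes: list[tuple[str, str]]) -> list[str]:
    """Keep only the overlapping genes that lie on the inferred transcript strand."""
    strand = transcript_strand(read_strand, read_number)
    return [name for name, gene_strand in overlapping_genes if gene_strand == strand]


# Two hypothetical genes overlapping on opposite strands (a sense/antisense pair).
genes = [("GENE_S", "+"), ("GENE_AS", "-")]
print(assign_gene("+", 1, genes))  # ['GENE_AS']
print(assign_gene("+", 2, genes))  # ['GENE_S']
```

With an unstranded library, both genes would remain candidates for the same read.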


RNA sequencing – Illumina

1. Hybridization and amplification on the flow cell
2. Sequencing
3. Image analysis and base calling
4. Millions of short sequences in FASTQ format

@HWUSI-EAS100R:6:73:941:1973#0/1
AGCGTAACCGGTAACGATAGCAGAT
+HWUSI-EAS100R:6:73:941:1973#0/1
bbbbbbbb%%%++)(%%%%)1**((((***+
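Each FASTQ record spans four lines: a name line starting with @, the base calls, a separator line starting with +, and one quality character per base. A minimal Python reader, assuming the Phred+33 encoding used by current Illumina software (older Illumina pipelines, versions 1.3 to 1.7, used an offset of 64); the record in the usage example is hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a 4-line FASTQ stream, decoding Phred quality scores."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        # One quality character per base: Phred score = ASCII code - offset.
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


# Hypothetical single-record FASTQ text.
example = "@read_1/1\nACGT\n+\nII5!\n"
for record in read_fastq(io.StringIO(example)):
    print(record.name, record.sequence, record.qualities)  # read_1/1 ACGT [40, 40, 20, 0]
```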


RNA sequencing – Illumina

[Figure: a cDNA fragment sequenced from one end (Read1) or both ends (Read1 and Read2)]

• Single-end sequencing (Read1 only)
• Paired-end sequencing (Read1 and Read2)
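In paired-end data, Read1 and Read2 of a fragment are usually stored in two files and linked by their read names. A minimal Python sketch of that matching, assuming names that differ only by a "/1" or "/2" suffix (the older Illumina style seen in the slide's FASTQ example) or by a comment after a space; the second Read1 name in the example is made up.

```python
def mate_key(read_name: str) -> str:
    """Name shared by both mates of a fragment.

    Drops any comment after whitespace and a trailing '/1' or '/2' mate suffix.
    """
    name = read_name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_reads(read1_names: list[str],
               read2_names: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Match Read1 and Read2 names from the same cDNA fragment."""
    read2_by_key = {mate_key(name): name for name in read2_names}
    pairs, unpaired = [], []
    for name in read1_names:
        mate = read2_by_key.get(mate_key(name))
        if mate is None:
            unpaired.append(name)
        else:
            pairs.append((name, mate))
    return pairs, unpaired


r1 = ["HWUSI-EAS100R:6:73:941:1973#0/1", "HWUSI-EAS100R:6:73:941:2000#0/1"]  # 2nd is hypothetical
r2 = ["HWUSI-EAS100R:6:73:941:1973#0/2"]
pairs, unpaired = pair_reads(r1, r2)
print(pairs)     # [('HWUSI-EAS100R:6:73:941:1973#0/1', 'HWUSI-EAS100R:6:73:941:1973#0/2')]
print(unpaired)  # ['HWUSI-EAS100R:6:73:941:2000#0/1']
```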


Scientific RNA sequencing case 1

"Autism spectrum disorder (ASD) is a common, highly heritable neurodevelopmental condition characterized by marked genetic heterogeneity." RNA-seq is used to investigate gene expression in autistic brain compared with normal brain.


Transcriptomic analysis of autistic brain

Heatmap of the top 200 differentially expressed genes between autism and control cortex samples

→ Distinct clustering of the majority of autism cortex samples, in contrast to the genetic heterogeneity shown in GWAS studies
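Heatmaps of a top-N gene list are usually drawn from per-gene standardized values. A minimal Python sketch, assuming genes are ranked by differential-expression p-value and each row is converted to z-scores; it is not the pipeline of the study shown, and the data are hypothetical.

```python
from statistics import mean, pstdev


def heatmap_rows(expr: dict[str, list[float]], pvalues: dict[str, float],
                 top_n: int = 200) -> dict[str, list[float]]:
    """Select the top_n genes with the smallest p-values and z-score each row.

    `expr` maps gene -> log-scale expression across samples (assumed input).
    Z-scoring per gene puts all rows on one color scale, so the heatmap shows
    relative up/down patterns between samples rather than absolute levels.
    """
    top = sorted(pvalues, key=pvalues.get)[:top_n]
    rows = {}
    for gene in top:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        rows[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return rows


# Hypothetical log expression for three genes in four samples (two autism, two control).
expr = {"G1": [5.0, 5.2, 2.0, 2.1], "G2": [3.0, 3.1, 3.0, 2.9], "G3": [1.0, 1.2, 4.0, 4.3]}
pvalues = {"G1": 0.001, "G2": 0.6, "G3": 0.004}
for gene, z in heatmap_rows(expr, pvalues, top_n=2).items():
    print(gene, [round(x, 2) for x in z])
```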


Transcriptomic analysis of autistic brain

A) Significant expression differences between frontal and temporal cortex in control samples (top) and autism samples (bottom).

B) Top 20 genes differentially expressed between frontal and temporal cortex in controls. None of these genes show significant expression differences between frontal and temporal cortex in autism.


Transcriptomic analysis of autistic brain

Results:
• Distinct transcriptomic differences between autism and control cortex samples, even though the samples are heterogeneous at the genomic level (GWAS)
• Gene ontology analysis showed that down-regulated genes were related to synaptic function, whereas up-regulated genes were related to immune and inflammatory responses
• In autism, expression is consistent between frontal and temporal cortex, whereas control samples show regional differences

→ The knowledge gained about the biology behind the disease can improve the development of diagnostic and treatment strategies
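Gene ontology enrichment, as used to link the down-regulated genes to synaptic function, is often tested with a hypergeometric (one-sided Fisher's exact) test. A minimal Python sketch under that assumption; the slide does not say which test the study used, and the gene counts below are hypothetical. Real analyses test many terms at once and therefore correct for multiple testing.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric test, P(X >= k).

    N: genes tested in total; K: of those, genes annotated to the GO term;
    n: genes in the list of interest (e.g. down-regulated genes);
    k: genes in that list that carry the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20,000 genes tested, 300 annotated to a synaptic term,
# 500 down-regulated genes of which 30 carry the term (about 7.5 expected by chance).
print(enrichment_pvalue(30, 500, 300, 20_000))
```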

 


Scientific RNA sequencing case 2

"To identify the precise genetic elements and study the exclusive nature of three immunohistochemically different breast cancer types, we employed massively parallel mRNA sequencing."


Transcriptomic landscape of breast cancer through mRNA sequencing

PCA plots showing the clustering of the triple-negative (TNBC, magenta), non-TNBC (red) and HER2-positive (green) breast cancer samples based on their transcriptomic expression profiles. The table shows the number of statistically significant differentially expressed transcripts.
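PCA plots like these project each sample's expression profile onto the directions of largest variance. A minimal NumPy sketch via singular value decomposition, assuming an already normalized, log-transformed genes x samples matrix; the data are synthetic and this is not the study's pipeline.

```python
import numpy as np


def pca_scores(expr: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Sample coordinates on the first principal components.

    expr: genes x samples matrix of normalized, log-scale expression (assumed input).
    Returns a samples x n_components array (PC1, PC2, ...).
    """
    x = expr.T                       # one row per sample
    x = x - x.mean(axis=0)           # center every gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic data: 10 genes x 6 samples, two groups of three samples with
# opposite high/low gene blocks plus a small sample-specific shift.
expr = np.ones((10, 6))
expr[:5, :3] = 10.0
expr[5:, 3:] = 10.0
expr += np.linspace(0.0, 0.5, 6)
scores = pca_scores(expr)
print(np.round(scores[:, 0], 2))  # PC1 separates samples 1-3 from samples 4-6
```

The sign of each principal component is arbitrary; only the relative positions of the samples matter.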


Transcriptomic landscape of breast cancer through mRNA sequencing

The table presents the six most common highly abundant primary transcripts and all associated information. The bottom four lines of the table show the primary transcript expression profiles specific for the TNBC and non-TNBC groups (APOE) and the HER2-positive group (FN1, PP1B and OAZ1).


Transcriptomic landscape of breast cancer through mRNA sequencing

• Comparative transcriptomic analyses revealed differentially expressed transcripts between the three breast cancer groups, identifying several new modulators of breast cancer.
• Identification of common transcriptional regulatory elements shared by the three groups, such as highly abundant primary transcripts (including osteonectin, RACK1, calnexin, calreticulin, FTL and B2M) and "genomic hotspots" enriched in primary transcripts.
• The study opens previously unexplored niches that could enable a better understanding of the disease and the development of potential intervention strategies.


Scientific RNA sequencing case 3

Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses
Moran N. Cabili, Cole Trapnell, […], and John L. Rinn (2011)

In this study, a reference catalog of > 8000 human lincRNAs is defined and characterized by sequence, structural and transcriptional features across 24 tissues and cell types.


Integrative annotation of human large intergenic noncoding RNAs

Computational approach for comprehensive annotation of lincRNAs

[Figure: panels A and B]
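The core idea of annotating lincRNAs from assembled transcripts can be reduced to two criteria stated earlier in this module: long (> 200 bp) and intergenic, i.e. not overlapping protein-coding genes. A simplified Python sketch under those two assumptions only; it is not the published pipeline, which applies additional filters such as coding-potential assessment, and the transcripts are hypothetical.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int    # genomic start, 0-based inclusive
    end: int      # genomic end, exclusive
    length: int   # spliced (mature) length in bp


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open interval overlap test."""
    return a_start < b_end and b_start < a_end


def candidate_lincrnas(transcripts: list[Transcript],
                       coding_genes: list[tuple[str, int, int]],
                       min_length: int = 200) -> list[str]:
    """Keep transcripts longer than min_length that overlap no coding gene.

    Further filters used in real pipelines (e.g. coding potential) are omitted.
    """
    kept = []
    for t in transcripts:
        if t.length <= min_length:
            continue
        if any(chrom == t.chrom and overlaps(t.start, t.end, s, e)
               for chrom, s, e in coding_genes):
            continue
        kept.append(t.name)
    return kept


# Hypothetical assembled transcripts and one protein-coding gene locus.
transcripts = [
    Transcript("TX1", "chr1", 100, 5_000, 850),       # long, intergenic -> kept
    Transcript("TX2", "chr1", 9_000, 12_000, 1_200),  # overlaps the coding gene
    Transcript("TX3", "chr2", 500, 900, 150),         # too short
]
coding_genes = [("chr1", 10_000, 20_000)]
print(candidate_lincrnas(transcripts, coding_genes))  # ['TX1']
```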


Integrative annotation of human large intergenic noncoding RNAs

Expression levels of lincRNAs and protein-coding genes across the tissues (color intensity represents fractional density across the row)


Integrative annotation of human large intergenic noncoding RNAs

(B) Expression abundance of the 1508 most highly expressed lincRNAs compared with the 8906 most highly expressed protein-coding genes
→ lincRNAs are expressed at lower levels

(C) Distribution of maximal tissue-specificity scores calculated from the data in A
→ lincRNAs show higher tissue specificity
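A tissue-specificity score of the kind summarized in panel C can be computed from Jensen-Shannon divergence: a gene expressed in only one tissue scores 1, and evenly expressed genes score lower. A minimal Python sketch; the exact formulation is an assumption, modelled on the Jensen-Shannon approach commonly used for such scores rather than on the study's published code, and the expression values are hypothetical.

```python
import math


def _entropy(p: list[float]) -> float:
    """Shannon entropy in bits, ignoring zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence (base 2, so it lies between 0 and 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2


def max_tissue_specificity(expression: list[float]) -> float:
    """Highest specificity score over all tissues for one gene.

    The expression profile is turned into a distribution and compared with a
    profile expressed in only one tissue; score = 1 - sqrt(JS divergence).
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression must contain a positive value")
    p = [x / total for x in expression]
    scores = []
    for tissue in range(len(p)):
        q = [1.0 if i == tissue else 0.0 for i in range(len(p))]
        jsd = max(js_divergence(p, q), 0.0)  # clamp tiny negative rounding errors
        scores.append(1.0 - math.sqrt(jsd))
    return max(scores)


# Hypothetical expression of two genes across four tissues.
print(round(max_tissue_specificity([0.0, 12.0, 0.0, 0.0]), 3))  # 1.0
print(round(max_tissue_specificity([5.0, 5.0, 5.0, 5.0]), 3))   # 0.259
```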


Non-coding RNAs in human diseases

lincRNA HOTAIR

HOTAIR binds to Polycomb proteins that remodel chromatin marks, which leads to epigenetic silencing of, e.g., HOXD and increases the invasiveness of cancer cells.


Non-coding RNAs in human diseases

lncRNA in Alzheimer's disease

BACE1-AS, an antisense lncRNA, regulates the expression of the sense BACE1 gene (labelled BACE1-S in the figure) by stabilizing its mRNA. BACE1-AS is elevated in Alzheimer's disease, increasing the amount of BACE1 protein and, subsequently, the production of β-amyloid peptide.


Non-coding RNAs in human diseases

snoRNA in Prader-Willi syndrome

The loss of the snoRNA in Prader-Willi syndrome (PWS) changes the alternative splicing of the serotonin receptor HTR2C precursor mRNA (pre-mRNA), resulting in a protein with reduced function.


RNA-seq data set for the practical part

Aim: Identification of dysregulated genes in osteosarcoma

• Most common primary malignant tumour of bone
• Occurs mainly in the long bones (arms and legs) of children and adolescents
• High-grade tumours that are very aggressive
• Complex genomic aberrations

→ The high number of genomic aberrations is likely to affect gene expression
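Finding dysregulated genes typically combines an effect-size cut-off with a multiple-testing correction, because thousands of genes are tested at once. A minimal Python sketch using the Benjamini-Hochberg false discovery rate; the thresholds (FDR < 0.05, |log2 fold change| ≥ 1) and gene names are assumptions for illustration, not settings from the practical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def dysregulated(genes: list[str], log2fc: list[float], pvalues: list[float],
                 fdr: float = 0.05, min_abs_lfc: float = 1.0) -> list[str]:
    """Genes passing both the FDR threshold and the fold-change cut-off."""
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, lfc, q in zip(genes, log2fc, adjusted)
            if q < fdr and abs(lfc) >= min_abs_lfc]


# Hypothetical results for four genes (tumour vs. reference).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [2.0, -0.5, -1.5, 0.2]
pvalues = [0.01, 0.04, 0.03, 0.005]
print(dysregulated(genes, log2fc, pvalues))  # ['GENE_A', 'GENE_C']
```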