
User Showcase: Astrocyte BICF Workflow RNA-Seq Analysis

Xin Luo, Ph.D., BICF

02/22/2017

What Can Gene Expression Tell Us?

• How do gene expression patterns determine cell identity?
• Which genes are differentially expressed in a specific disease compared with healthy controls?
• How does gene expression respond to different stimuli or drug treatments?
• How does gene expression change across developmental stages?
• What are the downstream targets or pathways of your factor of interest?

… and much, much more …

RNA-seq vs Microarray

Technology                                         RNA-Seq       Microarray
Development year                                   2008          1995
Method                                             Sampling      Hybridization
Prior sequence knowledge needed                    No            Yes
Reproducibility                                    Yes           Yes
Dynamic range                                      >8000-fold    Hundred-fold
Novel transcript detection (fusion, non-coding)    Yes           No
Novel splicing isoform detection                   Yes           No
De novo analysis without a reference genome        Yes           No
Required RNA amount                                >100 ng       ~1 μg

General RNA-Seq Workflow

(Figure: "Everything's connected" slide by Dündar et al. (2015))

Experimental Design Affecting Your Analysis

• Whole transcriptome vs mRNA
• Single-end vs paired-end
  – Paired-end produces more accurate alignments
  – Paired-end allows for transcript-level analysis
  – Single-end is cheaper
• Number of reads
  – 10-50M is a good range
  – Aim for at least 20M
• Read length
  – Longer reads produce better alignments: use at least 50 bp paired-end or 100 bp single-end for gene quantification
  – ChIP-seq, smallRNA-seq, RIP-seq, CLIP-seq: 50 nt single-end

Experimental Design Affecting Your Analysis (continued)

• Number of samples
  – Your power to detect an effect depends on:
    – Effect size (difference between group means)
    – Within-group variance
    – Sample size
  – The more samples the better, with at least 3 per group
  – Five samples sequenced to 20M reads each offer more power than 2 samples sequenced to 50M reads each
• Stranded libraries
  – Can distinguish the expression of overlapping genes on opposite strands
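A back-of-the-envelope power calculation makes the replicates-versus-depth trade-off above concrete. This is an illustrative model and not part of the BICF pipeline. It assumes that a gene's log count has variance of roughly 1/μ + φ, which combines Poisson counting noise for an expected count μ with a biological dispersion φ (the usual negative-binomial approximation). It also assumes 10 reads per million for the gene, φ = 0.1, a two-fold change, and a two-sided z-test at α = 0.05. All of these numbers, and the name `approx_power`, are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def approx_power(n_per_group, depth_millions, reads_per_million=10.0,
                 dispersion=0.1, fold_change=2.0, alpha=0.05):
    """Approximate power to detect a fold change for one gene (normal approximation).

    Illustrative model only: var(log count) ~ 1/mu + dispersion, where mu is the
    expected count per sample (technical Poisson noise) and dispersion is the
    biological variation between replicates.
    """
    mu = reads_per_million * depth_millions        # expected count per sample
    var_log = 1.0 / mu + dispersion                # per-sample variance of the log count
    se = sqrt(2.0 * var_log / n_per_group)         # SE of the difference in group means
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(abs(log(fold_change)) / se - z_crit)

five_shallow = approx_power(n_per_group=5, depth_millions=20)
two_deep = approx_power(n_per_group=2, depth_millions=50)
print(f"5 samples x 20M reads: power ~ {five_shallow:.2f}")
print(f"2 samples x 50M reads: power ~ {two_deep:.2f}")
```

With these assumed inputs the five-sample design comes out clearly ahead (roughly 0.92 vs 0.58). Extra depth barely helps because, once a gene has a few hundred reads, biological variance dominates. The z-approximation also flatters n = 2, so a real test would widen the gap.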

Strand-specific RNA-Seq

(Figure; image from GATC Biotech)

How to decide strand

(Figure panels: "Reverse stranded" and "Stranded")
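One common way to decide between these cases is the one taken by tools such as RSeQC's infer_experiment.py. You align a subset of reads and count how many land on the same strand as the annotated gene versus the opposite strand. The sketch below covers only the final classification step and takes the read counts as given. The name `infer_strandedness`, the 0.8 cut-off and the "ambiguous" band are illustrative assumptions. For paired-end data, count by the orientation of read 1.

```python
def infer_strandedness(sense_reads: int, antisense_reads: int, cutoff: float = 0.8) -> str:
    """Classify a library from read-1 counts aligned sense/antisense to annotated genes.

    The cutoff and the ambiguous band are illustrative assumptions, not pipeline settings.
    """
    total = sense_reads + antisense_reads
    if total == 0:
        raise ValueError("no reads overlapping annotated genes")
    sense_fraction = sense_reads / total
    if sense_fraction >= cutoff:
        return "stranded"            # e.g. htseq-count --stranded=yes
    if sense_fraction <= 1 - cutoff:
        return "reverse stranded"    # e.g. htseq-count --stranded=reverse
    if abs(sense_fraction - 0.5) <= 0.1:
        return "unstranded"          # e.g. htseq-count --stranded=no
    return "ambiguous"               # check the library prep kit before choosing
```

For example, if 950 of 1,000 gene-overlapping reads are sense, the library is stranded. If only 40 are sense, it is reverse stranded, which is typical of dUTP-based kits.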

RNA-Seq Analysis Pipeline

http://www.utsouthwestern.edu/labs/bioinformatics/services/data-analysis/rnaseq-pipeline.html

RNA-Seq Analysis Essence

• Preprocessing and normalization
• Differential gene expression analysis
• QC
• Visualization
• Pathway and gene set enrichment analysis
• Alternative splicing isoforms
• Fusions and variants
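Log2 counts per million (CPM) is a common way to put samples with different sequencing depths on one scale for normalization. The sketch below is an assumption-labelled illustration. It uses the offsets from limma's voom, log2((count + 0.5) / (library size + 1) × 10^6). edgeR's cpm(log = TRUE) uses a different prior count, so the values will not match the pipeline's countTable.logCPM.txt exactly. The sketch also skips between-sample normalization such as TMM.

```python
from math import log2

def log_cpm(counts_by_sample):
    """{sample: {gene: raw count}} -> {sample: {gene: log2 CPM}} using voom-style offsets."""
    result = {}
    for sample, counts in counts_by_sample.items():
        lib_size = sum(counts.values())  # raw library size, no TMM scaling
        result[sample] = {
            # +0.5 / +1 offsets avoid log(0) for genes with zero counts
            gene: log2((c + 0.5) / (lib_size + 1) * 1e6)
            for gene, c in counts.items()
        }
    return result
```

A library sequenced twice as deep gives nearly the same log-CPM values. The raw counts double, but dividing by library size cancels that difference.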

• Introduction to BioHPC
  – First Wednesday of every month
  – 10:30 AM-noon @ NL6.215
• Please attend so that you can get an account to try this out

Launch Workflows Using Astrocyte

Astrocyte allows groups to give easy access to their analysis pipelines via the web.

Astrocyte: BioHPC Workflow Platform

• Standardized workflows
• Simple web forms
• Online documentation & results visualization*
• Workflows run on the HPC cluster without the developer or user needing cluster knowledge

Slide contribution: David Trudgian@BioHPC

astrocyte.biohpc.swmed.edu

Browse Workflows

Create a New Project

Add Data to Your Project


For NGS experiments, this is the recommended method.

Use the cat command to combine reads from different sequencing lanes, e.g.

    cat S1.*.R1.fastq.gz > S1.R1.fastq.gz

Repeat for the R2 files of paired-end data. Then use the cp command to copy the files to the astrocyte_incoming folder for uploading, e.g. (replace xluo4 with your own folder name)

    cp *.fastq.gz /project/apps/astrocyte/astrocyte_incoming/xluo4
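When there are many samples, the per-sample cat command can be scripted. `merge_lanes` below is a hypothetical helper. It assumes lane files are named `<sample>.<lane>.<R1|R2>.fastq.gz`, following the `S1.*.R1.fastq.gz` example, so adjust the pattern to your sequencing core's naming. Like cat, it joins the gzip files byte for byte. This is valid because a gzip file may consist of several concatenated members.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <sample>.<lane>.<R1|R2>.fastq.gz, e.g. S1.L001.R1.fastq.gz
LANE_FILE = re.compile(r"^(?P<sample>[^.]+)\.(?P<lane>[^.]+)\.(?P<read>R[12])\.fastq\.gz$")

def merge_lanes(in_dir, out_dir):
    """Concatenate per-lane gzipped FASTQ files into one <sample>.<read>.fastq.gz per sample and read."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    groups = defaultdict(list)
    for path in sorted(in_dir.iterdir()):  # sorted, so R1 and R2 lanes are merged in the same order
        m = LANE_FILE.match(path.name)
        if m:
            groups[(m["sample"], m["read"])].append(path)
    outputs = []
    for (sample, read), parts in sorted(groups.items()):
        target = out_dir / f"{sample}.{read}.fastq.gz"
        with open(target, "wb") as dst:
            for part in parts:
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, dst)  # raw byte copy, same as `cat`
        outputs.append(target)
    return outputs
```

Usage: `merge_lanes("raw_fastq", "merged_fastq")`, then copy the merged files to your astrocyte_incoming folder as above. Writing to a separate output directory keeps merged files from mixing with the per-lane inputs.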

Make Your Design File

SampleID  SampleGroup  SubjectID  SampleName  FullPathToFqR1   FullPathToFqR2
N4        Normal       4          N4          N4.R1.fastq.gz   N4.R2.fastq.gz
N5        Normal       5          N5          N5.R1.fastq.gz   N5.R2.fastq.gz
N7        Normal       7          N7          N7.R1.fastq.gz   N7.R2.fastq.gz
T2        Tumor        2          T2          T2.R1.fastq.gz   T2.R2.fastq.gz
T5        Tumor        5          T5          T5.R1.fastq.gz   T5.R2.fastq.gz
T7        Tumor        7          T7          T7.R1.fastq.gz   T7.R2.fastq.gz

Make Your Design File

• Use tab as the delimiter: in Excel, save as "Text (Tab delimited)"
• Column names MUST be exactly the same as documented
• If there is no SubjectID, use the same number/character for all rows
• For single-end sequencing, leave FullPathToFqR2 empty
• No "-" in any field
• No spaces in any field
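The rules above are easy to check before uploading. `validate_design` is a minimal sketch and a hypothetical name. It assumes the required columns are exactly the six shown in the example design file, so check the workflow documentation for the authoritative list. It reads only the exported tab-delimited text, not Excel files.

```python
# Column names as shown in the example design file (assumed to be the full required set)
REQUIRED_COLUMNS = ["SampleID", "SampleGroup", "SubjectID", "SampleName",
                    "FullPathToFqR1", "FullPathToFqR2"]

def validate_design(text):
    """Return a list of problems in a tab-delimited design file (an empty list means OK)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    if header != REQUIRED_COLUMNS:
        # Also catches space- or comma-delimited files, which split into a single field
        return ["header must be exactly: " + " ".join(REQUIRED_COLUMNS) + " (tab-separated)"]
    problems = []
    for row_no, line in enumerate(lines[1:], start=2):  # header counts as row 1
        fields = line.split("\t")
        if len(fields) == len(REQUIRED_COLUMNS) - 1:
            fields.append("")  # single-end row whose empty trailing R2 cell was dropped
        if len(fields) != len(REQUIRED_COLUMNS):
            problems.append(f"row {row_no}: expected {len(REQUIRED_COLUMNS)} fields, got {len(fields)}")
            continue
        for col, value in zip(REQUIRED_COLUMNS, fields):
            if col != "FullPathToFqR2" and not value:
                problems.append(f"row {row_no}: {col} is empty")
            if "-" in value or " " in value:
                problems.append(f"row {row_no}: {col} contains '-' or a space: {value!r}")
    return problems
```

Usage: `validate_design(open("design.txt").read())`. An empty list means the file passed these checks. Each remaining message names the row and column to fix.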

Select Your Data Files and Set Up Workflow

SELECT YOUR FILES

http://software.broadinstitute.org/gsea/msigdb/index.jsp

Project is running

Timeline of the Whole Run

Download/Visualize Your Results

Vizapp needs about 30 s to start if there is no queue; refresh the page to see it.

You can also choose individual files to download to your local computer.

Data source: E-GEOD-60424 - Next generation sequencing of human immune cell subsets across diseases

Comparisons

• Comparisons are based on SampleGroup
  – All pairwise comparisons are run
  – Each comparison can be identified by its file name, e.g. A_B.edgeR.txt
• The log fold change is A/B (log2 scale, as reported by edgeR)
• If you want B/A, multiply logFC by -1
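Flipping a comparison only requires negating the log fold change. Every other column, including p-values, FDR and average expression, stays the same. The sketch below assumes the results file is tab-delimited, has a header row with a column named logFC (edgeR's column name), and uses group names without underscores. The pipeline's actual A_B.edgeR.txt layout may differ, and `reverse_comparison` and `reversed_name` are hypothetical helpers.

```python
def reverse_comparison(text, logfc_col="logFC"):
    """Turn an A_B results table (logFC = log2 A/B) into a B_A table by negating logFC."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    idx = header.index(logfc_col)  # raises ValueError if the column name differs
    out = [lines[0]]
    for line in lines[1:]:
        fields = line.split("\t")
        # Adding 0.0 turns -0.0 into 0.0 so zero fold changes print without a sign
        fields[idx] = str(-float(fields[idx]) + 0.0)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"

def reversed_name(filename):
    """'A_B.edgeR.txt' -> 'B_A.edgeR.txt' (assumes group names contain no '_')."""
    stem, dot, rest = filename.partition(".")
    a, b = stem.split("_")
    return f"{b}_{a}{dot}{rest}"
```

For example, a gene with logFC 2.5 in Normal_Tumor.edgeR.txt gets logFC -2.5 in the Tumor_Normal version.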

Vizapp: Align Stats

Vizapp: Gene Type Stats

Vizapp: MDS and PCA Analysis

Vizapp: Sample Distances

Vizapp: Gene Compare

Vizapp: DEA

• Uses edgeR results
• Filter the gene list by different parameters
• Sort by different columns
• Download the data table
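The same filtering and sorting can be reproduced offline on the downloaded data table. In the sketch below, the thresholds (FDR < 0.05 and |logFC| ≥ 1, i.e. at least two-fold) are common defaults chosen for illustration, not values taken from Vizapp. The column names logFC and FDR and the name `filter_de` are assumptions.

```python
def filter_de(rows, max_fdr=0.05, min_abs_logfc=1.0):
    """rows: dicts with 'gene', 'logFC' (log2) and 'FDR'. Returns hits, most significant first."""
    hits = [r for r in rows
            if r["FDR"] < max_fdr and abs(r["logFC"]) >= min_abs_logfc]
    # Sort by FDR; break ties by larger absolute fold change.
    return sorted(hits, key=lambda r: (r["FDR"], -abs(r["logFC"])))
```

Both up- and down-regulated genes pass the filter because the fold-change test uses the absolute value.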

Vizapp: DEA Heatmap

• Filter the gene list by different parameters
• Choose different comparisons
• Supports user-defined gene lists (official gene symbols)
• Supports pathways
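Heatmaps of a user-defined gene list are often drawn from log-scale expression that is centred and scaled per gene (a row z-score). The colours then show relative up- and down-regulation across samples rather than absolute level. The slide does not say whether Vizapp scales rows this way, so the sketch below is a generic version. It also reports symbols missing from the table, a common reason a user-supplied list comes back shorter than expected (for example, an alias instead of the official symbol). `heatmap_rows` is a hypothetical name.

```python
from statistics import mean, pstdev

def heatmap_rows(log_cpm, genes):
    """log_cpm: {gene symbol: [value per sample]}; genes: user list of official symbols.

    Returns (row z-scores for the genes found, list of symbols not found).
    """
    scaled, missing = {}, []
    for gene in genes:
        values = log_cpm.get(gene)
        if values is None:
            missing.append(gene)
            continue
        mu, sd = mean(values), pstdev(values)
        # A gene with identical values in every sample has no variation to show.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled, missing
```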

Vizapp: Quantitative Set Analysis for Gene Expression (QuSAGE)

http://bioconductor.org/packages/release/bioc/html/qusage.html


Vizapp: Gene Alt Splicing

Expression of different transcripts across sample groups

Vizapp: Alternative Splicing

Common Errors and Solutions

• Make sure the delimiter is tab
• Make sure the column names are exactly as given in the documentation
• Make sure the file names in the design file match the uploaded files
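Comparing the names listed in the design file with the uploaded files catches mismatches before launching. `missing_fastqs` is a hypothetical helper. It assumes the design file uses the FullPathToFqR1/FullPathToFqR2 column names from the example and compares base file names only.

```python
from pathlib import PurePath

FASTQ_COLUMNS = ("FullPathToFqR1", "FullPathToFqR2")

def missing_fastqs(design_text, uploaded_names):
    """Return FASTQ names referenced in a tab-delimited design file but absent from uploaded_names."""
    lines = [ln for ln in design_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    idx = [header.index(col) for col in FASTQ_COLUMNS]  # ValueError if a column is misnamed
    uploaded = {PurePath(n).name for n in uploaded_names}
    missing = []
    for line in lines[1:]:
        fields = line.split("\t")
        for i in idx:
            name = fields[i].strip() if i < len(fields) else ""
            # An empty R2 cell is allowed for single-end data.
            if name and PurePath(name).name not in uploaded:
                missing.append(name)
    return missing
```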

Common Errors and Solutions (continued)

• Not all files are uploaded
  – Cause: the proxy setting
  – Solution: use auto-detect proxy

Other Useful Tools

• Gene Set Enrichment Analysis (GSEA): http://software.broadinstitute.org/gsea/index.jsp
  – MSigDB: http://software.broadinstitute.org/gsea/msigdb/index.jsp
  – GenePattern: http://software.broadinstitute.org/cancer/software/genepattern/
  – Use countTable.logCPM.txt to generate a .gct file, or edgeR.results.txt to generate a .rnk file (e.g. in Excel), as inputs
• User-designed heatmaps with Morpheus: https://software.broadinstitute.org/morpheus/
  – Use countTable.logCPM.txt to create a .txt input file for Morpheus
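The .gct and .rnk inputs can also be written with a short script instead of Excel. This minimal sketch assumes the logCPM table has gene IDs in its first column and one column per sample, and it uses logFC as the .rnk ranking score. Ranking by a signed p-value is another common choice. A GCT 1.2 file starts with a "#1.2" line, then the row and column counts, then a NAME/Description header. A .rnk file is simply gene and score separated by a tab. `to_gct` and `to_rnk` are hypothetical names.

```python
def to_gct(header, rows):
    """header: ['Gene', sample1, ...]; rows: [gene, value1, ...] -> GCT 1.2 text."""
    samples = header[1:]
    out = ["#1.2", f"{len(rows)}\t{len(samples)}", "\t".join(["NAME", "Description", *samples])]
    for gene, *values in rows:
        out.append("\t".join([str(gene), "na", *map(str, values)]))  # "na" = no description
    return "\n".join(out) + "\n"

def to_rnk(pairs):
    """pairs: (gene, score) -> .rnk text, highest score first."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    return "".join(f"{gene}\t{score}\n" for gene, score in ranked)
```

Usage: write `to_gct(...)` to a file ending in .gct and `to_rnk(...)` to one ending in .rnk. GSEA uses the extension to recognise the format.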

Other Useful Tools (continued)

• Complex designs
  – Use countTable.txt as input for complex designs in the edgeR or DESeq2 R packages
  – https://bioconductor.org/packages/release/bioc/html/edgeR.html
  – https://bioconductor.org/packages/release/bioc/html/DESeq2.html
• Motif search/promoter analysis with HOMER
  – Use edgeR.results.txt (differentially expressed gene list) as input
  – http://homer.salk.edu/homer/motif/
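A common complex design uses the design file's SubjectID column to block on subject, for example in a paired tumor/normal comparison. In R this is the job of a formula such as `model.matrix(~ factor(SubjectID) + SampleGroup)`. The Python sketch below only illustrates the same 0/1 layout (column names differ slightly from R's). It assumes factor levels are sorted as strings, with the first level as the reference, and uses the six samples of the example design file. Fitting the model still happens in edgeR or DESeq2. `design_matrix` is a hypothetical name.

```python
def design_matrix(samples, factors):
    """Treatment-coded design: intercept plus one 0/1 column per non-reference factor level."""
    levels = {f: sorted({s[f] for s in samples}) for f in factors}  # first level = reference
    columns = ["(Intercept)"] + [f + lvl for f in factors for lvl in levels[f][1:]]
    matrix = [[1] + [int(s[f] == lvl) for f in factors for lvl in levels[f][1:]]
              for s in samples]
    return columns, matrix

# The six samples from the example design file
samples = [{"SampleID": sid, "SampleGroup": grp, "SubjectID": subj}
           for sid, grp, subj in [("N4", "Normal", "4"), ("N5", "Normal", "5"), ("N7", "Normal", "7"),
                                  ("T2", "Tumor", "2"), ("T5", "Tumor", "5"), ("T7", "Tumor", "7")]]
columns, matrix = design_matrix(samples, ["SubjectID", "SampleGroup"])
for sample, row in zip(samples, matrix):
    print(sample["SampleID"], row)
```

The SampleGroupTumor coefficient is then the tumor-vs-normal effect after accounting for subject. Subjects 5 and 7, which have both a normal and a tumor sample, are the ones that inform that within-subject comparison.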

Acknowledgements

• Brandi Cantarel
• David Trudgian
• Beibei Chen
• BioHPC team: https://portal.biohpc.swmed.edu/content/
• BICF team: http://www.utsouthwestern.edu/labs/bioinformatics/