ADRC2015 GBrowse2 min - FlyBase


Page 1:

New views in GBrowse2: Release 6 D. melanogaster assembly, RNA-Seq data, and more

Page 2:

Hoskins, et al. (2015). Genome Res. [Epub ahead of print] (PMID: 25589440)

• Improvements to centric heterochromatin regions.

• Improved Y chromosome assembly (now 10 times larger).

Page 3:

Release 5 - Release 6 comparison

[Figure: each Release 6 (r6) chromosome arm drawn against the corresponding Release 5 (r5) arm and r5 'Het' scaffold on a 0-30 Mb scale. Depiction approximates the NCBI Release 5-to-Release 6 alignment.]

r6 arm   r5 sequences shown   % of r5 'Het' mapped to r6 arm assembly*
X        r5 X, r5 XHet        100%
Y        r5 YHet              67%
4        r5 4                 (none shown)
2L       r5 2L, r5 2LHet      93%
2R       r5 2R, r5 2RHet      71%
3L       r5 3L, r5 3LHet      90%
3R       r5 3R, r5 3RHet      90%

*% of r5 'Het' mapped to r6 arm assembly indicated.

Page 4:

Most feature coordinates have changed.

Release 5 region*          Release 6 coordinates      Size (Mb)   Shift (bp)
X:4,684,795..20,073,489    X:4,790,762..20,179,456    15.4        105,967
2L:1..21,485,538           2L:1..21,485,538           21.4        0
2R:3,037..16,668,212       2R:4,115,532..20,780,707   16.7        4,112,495
3L:5,114,766..24,523,740   3L:5,121,666..24,530,640   24.5        6,900
3R:1..27,905,053           3R:4,174,279..32,079,331   27.9        4,174,278
4:24,054..1,221,288        4:3,428..1,200,662         1.2         20,626

*For each Release 6 chromosome arm assembly, the largest region of uninterrupted identity from Release 5 is shown. A complete liftover table is available at http://flybase.org/reports/FBrf0225389.html
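As a rough illustration of what these shifts mean in practice, the sketch below converts a Release 5 position to Release 6 using only the six identity blocks listed on this slide. The dictionary and the function name `r5_to_r6` are illustrative, not part of any FlyBase tool. The offset is derived from the start coordinates rather than the unsigned shift column, because the coordinates given for arm 4 move toward lower values. Positions outside these blocks return `None` and would need the complete liftover table or one of the conversion tools.

```python
# Largest uninterrupted-identity regions, copied from the slide's table:
# arm -> (Release 5 start, Release 5 end, Release 6 start)
R5_TO_R6_BLOCKS = {
    "X":  (4_684_795, 20_073_489, 4_790_762),
    "2L": (1, 21_485_538, 1),
    "2R": (3_037, 16_668_212, 4_115_532),
    "3L": (5_114_766, 24_523_740, 5_121_666),
    "3R": (1, 27_905_053, 4_174_279),
    "4":  (24_054, 1_221_288, 3_428),
}


def r5_to_r6(arm, pos):
    """Return the Release 6 coordinate for a Release 5 position, or None
    if the position lies outside the tabulated identity block for that arm."""
    if arm not in R5_TO_R6_BLOCKS:
        return None
    r5_start, r5_end, r6_start = R5_TO_R6_BLOCKS[arm]
    if not r5_start <= pos <= r5_end:
        return None  # outside the block: use the full liftover table instead
    # Signed offset, so arm 4 (which shifts to lower coordinates) works too.
    return pos + (r6_start - r5_start)


print(r5_to_r6("X", 4_684_795))   # 4790762
print(r5_to_r6("4", 1_221_288))   # 1200662
print(r5_to_r6("2R", 1_000))      # None
```

This covers only the single largest block per arm; it is not a substitute for a real liftover.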

Page 5:

Mapping of Release 5 sequences to Release 6: pseudoscaffold U is no more

• 24% of r5 U sequence has moved onto the r6 chromosome arm assemblies:
  • X: 223 kb
  • 2L: 40 kb
  • 2R: 994 kb
  • 3L: 150 kb
  • 3R: 1,027 kb

• 1,862 minor scaffolds remain as distinct entities (not a pseudoscaffold).

• About half of the minor scaffolds have been mapped cytologically:
  • e.g., 2CEN, 3CEN, rDNA, Y.

Page 6:

Migration of FlyBase features to Release 6

• The vast majority of FlyBase features migrated automatically, using a mapping table derived from the NCBI genome alignment.

• 77 gene models required manual review:
  • 31 gene models were deleted (non-coding repeats).
  • 13 gene models had protein-coding changes.
  • 33 mapped to the new assembly without major changes.

• 11 new gene models were assembled from 40 annotation fragments due to improvements in centric heterochromatin.

See the current release notes for full details of affected genes: http://flybase.org/static_pages/docs/release_notes.html

Page 7:

Migration of FlyBase evidence to Release 6

• NCBI has provided new alignments of nucleotide and protein sequences.

• Sue Celniker's group has provided modENCODE RNA-Seq data newly mapped to the Release 6 assembly.

• Other RNA-Seq data has been migrated by FlyBase.

Page 8:

Access to Release 5 data: FlyBase Archives

http://flybase.org/static_pages/downloads/archivedata3.html

Page 9:

Access to Release 5 data: FlyBase Archives

http://flybase.org/static_pages/downloads/archivedata3.html

[Screenshot labels: GBrowse; BLAST]

Page 10:

Access to Release 5 data: FlyBase FTP site

Page 11:

Coordinates Conversion: FlyBase converter

http://flybase.org/static_pages/downloads/COORD.html

Page 12:

Coordinates Conversion: NCBI remap tool

http://www.ncbi.nlm.nih.gov/genome/tools/remap#

*The Release 6 "plus MT" and "plus ISO1 MT" differ only in the mitochondrial genome.

Page 13:

FlyBase policy: migration of new data to Release 6

• For a limited time, FlyBase will accept Release 5-based data and lift it over to Release 6.

• FlyBase will replace any FlyBase-migrated dataset with a new analysis directly mapped to the Release 6 genome assembly.

Page 14:

Be mindful of the reference assembly in use

[Screenshot labels: Dmel Release 6 genome assembly; 4th gene model annotation version on this assembly; FlyBase version]

Page 15:

Be mindful of the reference assembly in use

Release notes: http://flybase.org/static_pages/docs/release_notes.html

• General FlyBase statistics.

• Detailed information about the reference genome assembly in use for each species.

• Summary statistics on gene model annotations and other annotated features.

Page 16:

Be mindful of the reference assembly in use

• Specify the reference assembly and FlyBase annotation version used in your publications.

• Check the reference assembly and annotation version in use at other databases, resources and datasets.
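One way to act on this advice in an analysis script is to carry the assembly and annotation labels alongside each dataset and refuse to combine datasets whose assemblies differ. The sketch below is only an illustration: the `Provenance` class, the `check_same_assembly` function, and the label strings are hypothetical and do not correspond to any FlyBase file format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record of what a dataset's coordinates refer to."""
    name: str
    assembly: str     # e.g. "Dmel Release 6"
    annotation: str   # free-text annotation version label


def check_same_assembly(datasets):
    """Return the shared assembly label, or raise ValueError if they differ."""
    assemblies = {d.assembly for d in datasets}
    if len(assemblies) > 1:
        details = ", ".join(f"{d.name} ({d.assembly})" for d in datasets)
        raise ValueError(f"mixed reference assemblies: {details}")
    return assemblies.pop() if assemblies else None


# Placeholder labels, for illustration only.
mine = Provenance("my_peaks", "Dmel Release 6", "example-annotation-label")
theirs = Provenance("downloaded_track", "Dmel Release 5", "example-annotation-label")

try:
    check_same_assembly([mine, theirs])
except ValueError as err:
    print(err)
```

The same labels can then be quoted directly in a methods section.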

Page 17:

• GBrowse 2 was developed by the Generic Model Organism Database (GMOD) and integrates nicely with GMOD tools.

• GBrowse 2 (2010) replaces GBrowse (2002).

• FlyBase has offered GBrowse 2 (beta) since March 2013.

Page 18:

GBrowse 2 handles more data

• Display of quantitative (wiggle) tracks and next-generation sequencing (NGS) data.

• Ability to display all tracks in a single view.
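For readers who have not seen a wiggle file, the sketch below reads the `fixedStep` variant of the UCSC wiggle format into per-interval values. The slides do not say which wiggle variant FlyBase serves, so the choice of `fixedStep` is an assumption. The function name `parse_fixed_step` and the example data are made up for illustration.

```python
def parse_fixed_step(lines):
    """Yield (chrom, start, end, value) tuples from fixedStep wiggle lines.

    Coordinates are 1-based and inclusive, as in the wiggle format.
    """
    chrom = None
    pos = step = span = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields["step"])
            span = int(fields.get("span", 1))  # span defaults to 1 base
            continue
        if chrom is None:
            raise ValueError("data line before any fixedStep declaration")
        yield chrom, pos, pos + span - 1, float(line)
        pos += step


# Made-up example data, not taken from FlyBase.
example = """\
track type=wiggle_0 name="hypothetical coverage"
fixedStep chrom=2L start=1001 step=10 span=10
0
12
40.5
""".splitlines()

records = list(parse_fixed_step(example))
for record in records:
    print(record)
```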


Page 19:

GBrowse 2: easier track customization

[Screenshot label: track title tool bar]

Page 20:

GBrowse 2: easier track customization

Page 21:

GBrowse 2: easier track customization

Page 22:

GBrowse 2: easier track customization

Page 23:

GBrowse 2: easier track customization

Page 24:

GBrowse 2: click-and-drag zoom

[Screenshot label: click-and-drag selection]

Page 25:

GBrowse 2: click-and-drag zoom

Page 26:

GBrowse 2: ruler

[Screenshot label: click on ruler]

Page 27:

GBrowse 2: ruler

[Screenshot label: drag ruler]

Page 28:

GBrowse 2: ruler

Page 29:

New RNA-Seq display options: sample selection

[Screenshot label: track configuration icon (wrench)]

Page 30:

New RNA-Seq display options: sample selection

[Screenshot labels: vertical spacing between samples; log2 or linear data display; sample selection; tilted or vertical data register]

Page 31:

New RNA-Seq display options: sample selection

Page 32:

New RNA-Seq display options: linear vs. log2

Page 33:

New RNA-Seq display options: linear vs. log2

Page 34:

New RNA-Seq display options: linear vs. log2

[Panel labels: log2 scale; linear scale]
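To see why the two scales look so different, the sketch below applies a log2 transform to a few made-up coverage values. The slides do not state how GBrowse 2 computes its log2 display, so the pseudocount of 1 (used here to keep zero coverage defined) is an assumption, and `log2_scale` is a hypothetical helper name.

```python
import math

# Hypothetical per-position read coverage for one sample.
coverage = [0, 1, 3, 15, 255, 4095]


def log2_scale(values, pseudocount=1):
    """Return log2(value + pseudocount) for each value."""
    return [math.log2(v + pseudocount) for v in values]


scaled = log2_scale(coverage)
for raw, log_value in zip(coverage, scaled):
    print(f"{raw:>5}  {log_value:5.1f}")
```

On the linear scale the highest value here is 4,095 times the lowest non-zero one. After the transform the values span only 0 to 12, which is why weakly expressed regions become visible next to strongly expressed ones.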

Page 35:

Sequence download: decorated FASTA

Step 1: choose option and hit configure

Step 2: choose options

Step 3: hit go

Page 36:

Sequence download: decorated FASTA

Page 37:

Other new stuff at FlyBase: NCBI GNOMON annotations for Drosophila

GNOMON annotations have replaced CAF1 (2006) annotations for D. simulans, D. pseudoobscura, D. erecta, D. ananassae and D. yakuba (more on the way).

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/

[Screenshot label: RNA-Seq alignment]

Page 38:

Other new stuff at FlyBase: a new D. simulans genome assembly

For more information: http://flybase.org/static_pages/feature/previous/articles/2015_02/Dsim_r2.01.html

Page 39:

Find this presentation on FlyBase

http://flybase.org/wiki/FlyBase:FlyBase_Guides:_Pamphlets_and_Powerpoints

Page 40:

References

Brown, et al. (2014). Diversity and dynamics of the Drosophila transcriptome. Nature 512(7515): 393-399. (PMID: 24670639, FBrf0225793)

dos Santos, et al. (2015). FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations. Nucleic Acids Res. 43(Database issue): D690-D697. (PMID: 25398896, FBrf0227324)

Hoskins, et al. (2015). The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 2015 Jan 14. pii: gr.185579.114. [Epub ahead of print] (PMID: 25589440)

Stein (2013). Using GBrowse 2.0 to visualize and share next-generation sequence data. Brief Bioinform. 14(2): 162-171. (PMID: 23376193)