NETTAB 2012

Alejandra González-Beltrán, Senior Software Engineer, ISA Team, Oxford e-Research Centre, University of Oxford, Oxford, UK. NETTAB 2012 – Integrated Bio-Search, Como, Italy, November 14-16. The open source ISA software suite and its international user community: Knowledge management of experimental data


Transcript of NETTAB 2012

Page 1: NETTAB 2012

Alejandra González-Beltrán

Senior Software Engineer, ISA Team, Oxford e-Research Centre, University of Oxford

Oxford, UK

NETTAB 2012 – Integrated Bio-Search, Como, Italy, November 14-16

The open source ISA software suite and its international user community:

Knowledge management of experimental data

Page 2: NETTAB 2012

Outline

•  Knowledge management of experimental data
–  Setting the scene
–  The ISA ecosystem: ISA-Tab, tools, community
–  Use case
•  Latest additions
•  Related projects & main points

 

Page 3: NETTAB 2012

Setting the scene

Bioscience is multi-domain…

[Figure: bioscience domains (tox/pharma, env, health, agro). Source of the figure: EBI website]

Page 4: NETTAB 2012

Setting the scene

Bioscience is multi-domain…

Petabytes of data

[Figure: bioscience domains (tox/pharma, env, health, agro). Source of the figure: EBI website]

Page 5: NETTAB 2012

Setting the scene

Bioscience is multi-domain…

Petabytes of data

Experimental metadata in lab books

[Figure: bioscience domains (tox/pharma, env, health, agro). Source of the figure: EBI website]

Page 6: NETTAB 2012

•  Assist in the annotation and management of experimental data at source

•  Deal with data from high-throughput studies using one or a combination of omics and other technologies

•  Empower users to take up community-defined checklists and ontologies

•  Facilitate data sharing, reuse, comparison and reproducibility of experiments, and submission to international public repositories

investigation, study, assay

Page 7: NETTAB 2012

The ISA ecosystem

Page 8: NETTAB 2012

The ISA ecosystem

ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Rocca-Serra et al., 2010, Bioinformatics

Towards interoperable bioscience data. Sansone et al., 2012, Nature Genetics

Page 9: NETTAB 2012

•  General purpose & flexible format
•  Domain agnostic
•  Captures metadata in omics experiments and traditional experiments (e.g. clinical chemistry and histology)

Page 10: NETTAB 2012

faahKO dataset

•  Available in BioConductor
•  Subset of the original data on global metabolite profiling
•  LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice

Saghatelian et al., Biochemistry, 2004

Page 11: NETTAB 2012

faahKO investigation

-  Define key entities (e.g. factors, protocols, parameters)
-  Grouping of studies
-  Relate studies and assays

Page 12: NETTAB 2012

faahKO study

NEWT UniProt Taxonomy Database

Mouse Genome Informatics

-  Subjects studied: source(s), sampling methodology, characteristics
-  Treatments/manipulations performed to prepare the specimens

Page 13: NETTAB 2012

faahKO study

Mouse Adult Gross Anatomy

-  Subjects studied: source(s), sampling methodology, characteristics
-  Treatments/manipulations performed to prepare the specimens

Page 14: NETTAB 2012

faahKO assay

-  Measurement type, e.g. metabolite profiling
-  Technology, e.g. mass spectrometry
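The investigation, study and assay levels described for faahKO can be pictured as nested records. The Python sketch below is only an illustration under stated assumptions: the class names, field names, identifiers and file name are hypothetical, and none of them are the ISA tools' API or the actual faahKO file names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    measurement_type: str  # e.g. "metabolite profiling"
    technology_type: str   # e.g. "mass spectrometry"
    filename: str          # hypothetical name of the assay table


@dataclass
class Study:
    identifier: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def assay_count(self) -> int:
        # Total number of assays across all studies of the investigation
        return sum(len(study.assays) for study in self.studies)


# Hypothetical identifiers, chosen for illustration only
assay = Assay("metabolite profiling", "mass spectrometry", "a_example.txt")
investigation = Investigation("faahKO", [Study("faahKO-study", [assay])])
```

Keeping assays inside studies, and studies inside the investigation, mirrors the grouping role the slides give to the investigation level.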

Page 15: NETTAB 2012
Page 16: NETTAB 2012
Page 17: NETTAB 2012
Page 18: NETTAB 2012

Report and edit the description of the investigation using Google Spreadsheets.

Use Google Spreadsheets in combination with ISA-Tab templates (created by importing the Excel file from the ISAconfigurator) and OntoMaton (for ontology search and tagging support) to report an investigation.

Page 19: NETTAB 2012

Ontology Search and Tagging in Google Spreadsheets

-  Collaborative annotation
-  Distributed groups of users
-  Version control & history

Page 20: NETTAB 2012

Create templates detailing the steps to be reported for different investigations, complying with community standards (listed at […]), e.g. configuring fields to be (i) ontology terms, (ii) text (with/without regular expression testing), (iii) numbers etc.
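The three field kinds named on this slide (ontology terms, text with an optional regular expression, numbers) suggest a simple validation rule per template field. The following Python sketch shows one way this could look; `FieldRule`, `is_valid`, the field names and the example values are all assumptions made for illustration and do not reflect how the ISAconfigurator is implemented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    name: str
    kind: str                      # "ontology", "text" or "number"
    pattern: Optional[str] = None  # optional regular expression for text fields


def is_valid(rule: FieldRule, value: str, known_terms=frozenset()) -> bool:
    """Check one cell value against the rule configured for its field."""
    if rule.kind == "ontology":
        # The value must be one of the terms offered by the ontology lookup
        return value in known_terms
    if rule.kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if rule.kind == "text":
        # Free text passes unless a pattern is configured
        return rule.pattern is None or re.fullmatch(rule.pattern, value) is not None
    raise ValueError(f"unknown field kind: {rule.kind}")


# Illustrative rules and values (hypothetical, not taken from a real template)
terms = frozenset({"Mus musculus"})
organism = FieldRule("Characteristics[organism]", "ontology")
sample = FieldRule("Sample Name", "text", pattern=r"(KO|WT)\d+")
replicates = FieldRule("Parameter Value[replicates]", "number")
```

For example, `is_valid(sample, "KO1")` accepts the value, while `is_valid(sample, "mouse-1")` rejects it because it does not match the configured pattern.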

Page 21: NETTAB 2012

From the ISA-Tab we can perform analysis, convert to RDF/OWL and other formats for submission/sharing to local/remote repositories

Page 22: NETTAB 2012

From the ISA-Tab we can perform analysis, convert to RDF/OWL and other formats for submission/sharing to local/remote repositories

+  Visualisation Methods

Page 23: NETTAB 2012

Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)

faahKO Groups

faahKO Workflow

Page 24: NETTAB 2012

•  R package available in BioConductor 2.11: http://bioconductor.org/packages/release/bioc/html/Risa.html

•  ISAtab class
•  Read ISAtab files into ISAtab objects and save ISAtab files
•  Build xcmsSet (xcms package) objects from mass spectrometry assays
•  Augment the ISAtab dataset after analysis
•  GitHub source & issues tracking: https://github.com/ISA-tools/Risa

Page 25: NETTAB 2012

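•  faahKO package v. 2.12 contains ISAtab files describing the experiment

    library(Risa)
    # Read the ISAtab files shipped with the faahKO package
    faahkoISA <- readISAtab(find.package("faahKO"))
    # Take the first assay file listed in the dataset
    assay.filename <- faahkoISA["assay.filenames"][[1]]
    # Build an xcmsSet object from the mass spectrometry assay
    xset <- processAssayXcmsSet(faahkoISA, assay.filename)
    …
    # Record the derived data file in the assay metadata
    updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")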

•  MTBLS2 processing and analysis using Risa, xcms and CAMERA BioConductor packages

MetaboLights – an open access general-purpose repository for metabolomics studies and associated meta-data. Haug et al., 2012, Nucleic Acids Research

Page 26: NETTAB 2012

ISA syntax & underlying material/data workflows

Protocol REF

Input Material or Data Node

Output Material or Data Node

Parameter Value[…]

Characteristics[…] Factor Value[…]

Characteristics[…] Factor Value[…]
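The column kinds listed on this slide can be read as a small grammar: nodes are connected by Protocol REF columns, and bracketed columns qualify what precedes them. The Python sketch below classifies the headers of a tab-delimited row and pairs each Protocol REF with its input and output node. It is a simplification made for illustration: treating any header ending in "Name" or "File" as a node is an assumption, the function names are hypothetical, and the example header row is invented rather than taken from a real ISAtab file.

```python
import re

# Bracketed qualifier columns named on the slide
BRACKETED = re.compile(r"^(Characteristics|Factor Value|Parameter Value)\s*\[(.+)\]$")


def classify(header: str):
    """Return a (kind, detail) pair for one column header."""
    header = header.strip()
    match = BRACKETED.match(header)
    if match:
        return (match.group(1), match.group(2))
    if header == "Protocol REF":
        return ("Protocol REF", None)
    # Assumption: material and data nodes end in "Name" or "File"
    if header.endswith(" Name") or header.endswith(" File"):
        return ("Node", header)
    return ("Other", header)


def workflow_steps(header_line: str):
    """Pair each Protocol REF with the nearest node before and after it."""
    columns = [classify(h) for h in header_line.split("\t")]
    steps = []
    for i, (kind, _) in enumerate(columns):
        if kind != "Protocol REF":
            continue
        before = [name for k, name in columns[:i] if k == "Node"]
        after = [name for k, name in columns[i + 1:] if k == "Node"]
        if before and after:
            steps.append((before[-1], after[0]))
    return steps


# Invented header row, for illustration only
header = "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\tFactor Value[genotype]"
steps = workflow_steps(header)
```

Here `steps` holds one pair, linking the "Source Name" column to the "Sample Name" column through the protocol between them.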


Page 27: NETTAB 2012

•  Make the semantics of ISAtab explicit, including materials & data entities & processes

•  Exploit the semantic annotations available in ISAtab datasets

•  Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design

•  Facilitate data integration & knowledge discovery/reasoning

Page 28: NETTAB 2012

ISAtab datasets as linked data

•  Connect to the growing Linked Data universe
   RDF = Resource Description Framework, OWL = Web Ontology Language

•  Collaborations with ToxBank & W3C Health Care & Life Sciences Interest Group (HCLSIG)

<subject, predicate, object>
<lipoprotein> <participates_in> <inflammatory response>
<PRO:212342352> <BFO_0000056> <GO:0006954>
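The subject, predicate, object pattern shown here maps naturally onto a three-field record. As a rough sketch, the Python below stores the slide's identifier triple and writes it as one N-Triples-style line. The `BASE` namespace is a placeholder assumed for illustration (it is not the real IRI scheme of any of these ontologies), and `Triple` and `to_ntriples` are hypothetical names.

```python
from typing import NamedTuple

BASE = "http://example.org/"  # placeholder namespace, for illustration only


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a line of three IRIs followed by a full stop."""
    def iri(identifier: str) -> str:
        # Turn an identifier such as "GO:0006954" into an IRI under BASE
        return "<" + BASE + identifier.replace(":", "_") + ">"

    return f"{iri(triple.subject)} {iri(triple.predicate)} {iri(triple.obj)} ."


# The identifier triple from the slide
statement = Triple("PRO:212342352", "BFO_0000056", "GO:0006954")
line = to_ntriples(statement)
```

Using identifiers rather than labels in the statement is what lets independent datasets refer to the same entity.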

Page 29: NETTAB 2012

ISAtab  dataset  Parser  

ISA  Mapping  Parser  

ISAtab  Graph  Analysis  

Page 30: NETTAB 2012

ISA-OBO-mapping

Page 31: NETTAB 2012

[Figure: graph with the following labels. Nodes: Saghantelian_1, KO1, KO1_extract, ./cdf/KO/ko15.CDF. Processes: sample collection, extraction, mass spectrometry. Classes: material entity, processed material, information content entity, material processing. Relations: has specified input, has specified output, derives from, type.]
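The labels on this slide describe a chain of processes, each with a specified input and output, plus "derives from" links between the materials and data. The Python sketch below rebuilds such a chain as plain triples and walks the "derives from" links back to the source. The ordering of the steps is an assumption read off the order of the labels (the figure itself is not available), and the `provenance` function is a hypothetical helper.

```python
# Assumed ordering: each process takes the previous output as its input
steps = [
    ("sample collection", "Saghantelian_1", "KO1"),
    ("extraction", "KO1", "KO1_extract"),
    ("mass spectrometry", "KO1_extract", "./cdf/KO/ko15.CDF"),
]

triples = []
for process, source, product in steps:
    triples.append((process, "has specified input", source))
    triples.append((process, "has specified output", product))
    triples.append((product, "derives from", source))


def provenance(node, triples):
    """Follow 'derives from' links from a node back to its original source."""
    parents = {s: o for s, p, o in triples if p == "derives from"}
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


lineage = provenance("./cdf/KO/ko15.CDF", triples)
```

Under these assumptions `lineage` lists the extract, the sample and the source that the data file derives from, which is the kind of query the explicit relations are meant to make easy.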

Page 32: NETTAB 2012

Increasing level of structure…

Notes in lab books (information for humans)

Spreadsheets & tables (ISAtab metadata)

Facts as RDF statements (information for machines)

…different target audiences

Page 33: NETTAB 2012

core organization in the

UK  Node  

Page 34: NETTAB 2012

Implementation at Harvard

ISA

http://discovery.hsci.harvard.edu/

Page 35: NETTAB 2012


Implementation at the EBI

http://www.ebi.ac.uk/metabolights

MetaboLights – an open access general-purpose repository for metabolomics studies and associated meta-data. Haug et al., 2012, Nucleic Acids Research

Page 36: NETTAB 2012

The ISA ecosystem

Page 37: NETTAB 2012

@isatools  @biosharing  isa-tools.org  isacommons.org  biosharing.org