Towards Incidental Collaboratories; Research Data Services

Research Data Services: Towards a Framework for Incidental Collaboratories. Anita de Waard, VP Research Data Collaborations, Elsevier RDS, Jericho, VT, USA


Presentation given at the CNI Fall 2012 meeting on why we need research data services.

Transcript of Towards Incidental Collaboratories; Research Data Services

Page 1: Towards Incidental Collaboratories; Research Data Services

Research Data Services: Towards a Framework for Incidental Collaboratories

Anita de Waard, VP Research Data Collaborations, Elsevier RDS

Jericho, VT, USA

Page 2: Towards Incidental Collaboratories; Research Data Services

Brief bio:
• Background:
  – Low-temperature physics (Leiden & Moscow)
  – Joined Elsevier in 1988 as publisher in solid state physics
  – 1991: arXiv => publishers will go out of business very soon!
• 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
  – Identifying key knowledge elements in articles (linguistics thesis)
  – Building claim-evidence networks (through collaborations)
  – Help build communities to accelerate rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations - why?
  – Douglas Engelbart's thinking: connect minds!
  – My (non-biologist's) understanding of biology:

Page 3: Towards Incidental Collaboratories; Research Data Services

The big problem in biology:

http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg

• Interspecies variability: A specimen is not a species
• Gene expression variability: Knowing genes is not knowing how they are expressed
• Microbiome: An animal is an ecosystem
• Systems biology: A whole is more than the sum of its parts

Reductionist science doesn't work for living systems!

Page 4: Towards Incidental Collaboratories; Research Data Services

Statistics to the rescue!

With enough observations, trends and anomalies can be detected:

• "Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far."
  The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012), doi:10.1038/nature11234

• "The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome."
  Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.

• "A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples"
  Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/[email protected]

 

Page 5: Towards Incidental Collaboratories; Research Data Services

Enable 'incidental collaboratories':

• Collect: store data at the level of the experiment:
  – Accessible through a single interface
  – Add enough metadata to know what was done/seen
• Connect: allow analyses over:
  – Similar experiment types
  – Experiments done with/on similar biological 'things':
    • Species, strains, systems, cells
    • Anatomical components (e.g. spleen, hypothalamus)
    • Antibodies, biomarkers, bioactive chemicals, etc.
• Keep:
  – Long-term preservation of data and software (Olive)
  – Fulfill Data Management Plan requirements
  – Allow gated access, if needed
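To make the Collect/Connect/Keep idea more concrete, here is a minimal Python sketch of what an experiment-level record and a cross-lab "connect" step might look like. Everything in it is an illustrative assumption rather than part of any actual RDS system: the `ExperimentRecord` class, its field names, the `connect` function and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """Collect: one experiment plus metadata describing what was done/seen."""
    experiment_id: str
    lab: str
    experiment_type: str                        # hypothetical free-text label
    entities: set = field(default_factory=set)  # species, anatomy, antibodies, ...
    public: bool = False                        # Keep: gated access unless released


def connect(records, viewer_lab):
    """Connect: group the records visible to viewer_lab by shared entity."""
    index = defaultdict(list)
    for rec in records:
        # A lab always sees its own records; others only if released.
        if rec.public or rec.lab == viewer_lab:
            for entity in sorted(rec.entities):
                index[entity].append(rec.experiment_id)
    # Keep only entities that link two or more experiments.
    return {e: ids for e, ids in index.items() if len(ids) > 1}


# Made-up records for illustration only.
records = [
    ExperimentRecord("exp-1", "lab-a", "immunostaining", {"mouse", "hypothalamus"}, public=True),
    ExperimentRecord("exp-2", "lab-b", "electrophysiology", {"mouse", "spleen"}, public=True),
    ExperimentRecord("exp-3", "lab-b", "immunostaining", {"rat", "hypothalamus"}, public=False),
]

print(connect(records, viewer_lab="lab-a"))
print(connect(records, viewer_lab="lab-b"))
```

In this sketch lab-a is linked to lab-b only through the released "mouse" experiments, while lab-b also sees the link through its own unreleased hypothalamus experiment, which is one simple way to picture gated access.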

Page 6: Towards Incidental Collaboratories; Research Data Services

Problem: biological research is quite insular

• Biology is small: because objects/equipment are 10^-5 – 10^2 m, you can work alone ('King' and 'subjects').
• Biology is messy: it doesn't happen behind a terminal.
• Biology is competitive: different people with similar skill sets, vying for the same grants.
• In summary: it does not promote inherent collaboration (vs., for instance, big physics or astronomy).

[Figure: lab workflow stages: Prepare, Observe, Analyze, Ponder, Communicate]

Page 7: Towards Incidental Collaboratories; Research Data Services

Try to pop the 'lab bubble'!

[Figure: three lab workflows (Prepare, Analyze, Communicate; one also labelled Think), each linked to Observations]

Labs go from being information islands to being 'sensors in a network'.

Page 8: Towards Incidental Collaboratories; Research Data Services

Some objections, and rebuttals:

• Objection: "But our lab notebooks are all on paper"
  Rebuttal: Develop smart phone/tablet apps for data input

• Objection: "I need to see a direct benefit from something I spend my time on"
  Rebuttal: Develop 'data manipulation dashboard' for PI to allow better access to full experimental output for his/her lab

• Objection: "I am afraid other people might scoop my discoveries"
  Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export

• Objection: "I want things to be peer reviewed before I expose them"
  Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)

• Objection: "I don't really trust anyone else's data – well, except for the guys I went to Grad School with…"
  Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.

Page 9: Towards Incidental Collaboratories; Research Data Services

Elsevier Research Data Services: Goals

1. Help increase the amount of data shared from the lab, enabling incidental collaboratories

2. Help increase the value of the data shared by increasing annotation, normalization, provenance, enabling enhanced interoperability

3. Help measure and deliver credit for shared data, the researchers, the institute, and the funding body, enabling more sustainable platforms

Page 10: Towards Incidental Collaboratories; Research Data Services

RDS Guiding Principles:

• In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories' unique needs/interests and of a 'service-model' type:
  – Aspects where collaboration is needed are discussed
  – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
  – All communication, finance, IPR etc. is completely transparent at all times.
• Very small (2/3 people) department; immediate communication; instant deployment of ideas

 

Page 11: Towards Incidental Collaboratories; Research Data Services

RDS Approach:

• Collaborate and build on relationships with data repositories (life science, earth science, others)
• Integrate with other content sources, if possible
• Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities

Page 12: Towards Incidental Collaboratories; Research Data Services

NIF Antibody Registry:

Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #

Solution #1:
• Journals ask authors to provide antibody catalog nr
• Link to NIF Registry from manufacturers/vendors' sites
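The kind of check a journal could run on submitted antibody citations might be sketched as follows. This is only an illustration: the field names (`species`, `vendor`, `catalog_number`) mirror the gaps listed on the slide, but they are assumptions and not the NIF Antibody Registry's actual schema, and the sample entries are made up.

```python
# Hypothetical required fields, based on the gaps named on the slide.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(citation):
    """List the required fields that are absent or blank in one citation."""
    return [f for f in REQUIRED_FIELDS if not str(citation.get(f) or "").strip()]


def unidentifiable(citations):
    """Return only the citations that lack at least one required field."""
    return [c for c in citations if missing_fields(c)]


# Made-up example entries, not taken from the papers surveyed.
citations = [
    {"name": "anti-GFAP", "species": "rabbit", "vendor": "ExampleCo", "catalog_number": "AB-001"},
    {"name": "anti-NeuN", "species": "mouse", "vendor": "", "catalog_number": None},
    {"name": "anti-GFP"},
]

for c in unidentifiable(citations):
    print(c["name"], "is missing:", ", ".join(missing_fields(c)))
```

Applied to the survey above, a check like this would have flagged 52 of the 95 antibodies (roughly 55%).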

Solution #2:
• Pilot with a lab:

 

Page 13: Towards Incidental Collaboratories; Research Data Services

Let's start with the Urban Lab

• Getting antibodies
• And messy bits
• From the notebook
• Into Nathan Urban's command center
• By providing
  – 7" Tablets
  – Links to IgorPro
  – A dashboard UI

Page 14: Towards Incidental Collaboratories; Research Data Services

My questions to you:

• Thoughts on this approach:
  – In principle?
  – In practice?
• Do you see serious hurdles:
  – Are we overlapping with other initiatives; if so, are we complementary?
  – How does this connect to libraries/local repositories?
  – Are there sensitivities/pain points we are overlooking?
• Where to start:
  – How to collaborate?
  – Who to talk to – funding agencies, societies: who else?
  – Thoughts on data repositories/platforms to connect to?

Page 15: Towards Incidental Collaboratories; Research Data Services

Your questions to me?

[email protected]
http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard

Thanks go to:
• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS