Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the...

Maryann E. Martone, Ph. D. University of California, San Diego


Maryann Martone Earth Cube Summer Institute, San Diego Supercomputer Center August 12, 2013

Transcript of Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the...

Page 1: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Maryann E. Martone, Ph.D., University of California, San Diego

Page 2: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits -- Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior....

However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”

Akil et al., Science, Feb 11, 2011

Page 3: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• In that same issue of Science: asked peer reviewers from last year about the availability and use of data

• About half of those polled store their data only in their laboratories, which is not an ideal long-term solution.

• Many bemoaned the lack of common metadata and archives as a main impediment to using and storing data, and most of the respondents have no funding to support archiving

• And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used.

“...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.” Lead Science editorial, 2011

Page 4: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Neuroscience is unlikely to be served by a few large databases like the genomics and proteomics community

• Whole brain data (20 um microscopic MRI)
• Mosaic LM images (1 GB+)
• Conventional LM images
• Individual cell morphologies
• EM volumes & reconstructions
• Solved molecular structures

No single technology serves these all equally well.

Multiple data types; multiple scales; multiple databases

Page 5: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

http://neuinfo.org

Page 6: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• Current web is designed to share documents
  – Documents are unstructured data

• Much of the content of digital resources is part of the “hidden web”

• Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.

Page 7: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• NIF has developed a production technology platform for researchers to:
  – Discover
  – Share
  – Analyze
  – Integrate
  neuroscience-relevant information

• Since 2008, NIF has assembled the largest searchable catalog of neuroscience data and resources on the web

• Cost-effective and innovative strategy for managing data assets

“This unique data depository serves as a model for other Web sites to provide research data.” - Choice Reviews Online

NIF is poised to capitalize on the new tools and emphasis on big data and open science

Page 8: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

http://neuinfo.org
June 10, 2013, dkCOIN Investigator's Retreat

• A portal for finding and using neuroscience resources
  – A consistent framework for describing resources
  – Provides simultaneous search of multiple types of information, organized by category
  – Supported by an expansive ontology for neuroscience
  – Utilizes advanced technologies to search the “hidden web”

UCSD, Yale, Cal Tech, George Mason, Washington Univ

Literature

Database Federation

Registry

Page 9: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• NIF Registry: A catalog of neuroscience-relevant resources

• > 6000 currently listed

• > 2200 databases

• And we are finding more every day

“Of relevance to neuroscience” is very broad

Page 10: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences


• NIF curators
• Nomination by the community
• Semi-automated text mining pipelines

NIF Registry
• Requires no special skills
• Site map available for local hosting

NIF Data Federation
• DISCO interop
• Requires some programming skill

Low  barrier  to  entry  

Page 11: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

First catalog: SFN Neuroscience Database Gateway (~2003); NIF 0.5 (2006); NIF 1.0+ (2008)

Simple metadata model

Name, description, type, URL, other names, keywords, unique identifier

• Extended over time
  – Parent resource
  – Supporting agency
  – Grant numbers
  – Accessibility
  – Related to
  – Organism
  – Disease or condition
  – Last updated
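The metadata model on this slide can be sketched as a simple record type. This is a minimal illustration only: the class name `RegistryEntry`, the field names, the helper `missing_core_fields`, and the example values are assumptions made for this sketch, not the actual NIF Registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryEntry:
    """One registry record; names are illustrative, not NIF's real schema."""
    # Core fields of the simple metadata model
    name: str
    description: str
    resource_type: str
    url: str
    identifier: str
    other_names: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Fields added as the model was extended over time
    parent_resource: Optional[str] = None
    supporting_agency: Optional[str] = None
    grant_numbers: List[str] = field(default_factory=list)
    accessibility: Optional[str] = None
    related_to: List[str] = field(default_factory=list)
    organism: Optional[str] = None
    disease_or_condition: Optional[str] = None
    last_updated: Optional[str] = None


def missing_core_fields(entry: RegistryEntry) -> List[str]:
    """Return the names of required core fields that are empty."""
    core = ["name", "description", "resource_type", "url", "identifier"]
    return [name for name in core if not getattr(entry, name)]


# Hypothetical record used only to show the shape of an entry.
example = RegistryEntry(
    name="Example Brain Database",
    description="A made-up resource for illustration.",
    resource_type="database",
    url="http://example.org",
    identifier="example_0001",
)
```

Keeping the extended fields optional reflects the slide's point that the model started small and grew: an early record with only the core fields remains valid.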

Page 12: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences


• NIF Registry is hosted on the Semantic MediaWiki platform Neurolex
  – Community can add, review, edit without special privileges
  – Searchable by Google
  – Integrated with NIF ontologies
  – Graph structure

Semantic wiki: A wiki with semantics; pages are linked through relationships

Page 13: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

NIF is creating the linked data graph of resources

Page 14: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

– NIF employs an automated link checker

– Last analysis: 478/6100 invalid URLs (~8%)
– 199 could not be located at another university or location, i.e., out of service (~3%)

– Bigger issue: Many resources are no longer updated or maintained
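A link check of the kind described here can be sketched in a few lines. This is a hedged illustration, not NIF's actual checker: the function names `http_status`, `check_links`, and `invalid_fraction` are assumptions, and treating any 2xx or 3xx response as valid is a simplification chosen for the sketch.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return 0


def check_links(
    urls: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each URL to True if it answered with a 2xx or 3xx status."""
    return {url: 200 <= status_of(url) < 400 for url in urls}


def invalid_fraction(results: Dict[str, bool]) -> float:
    """Fraction of checked URLs that were invalid (0.0 for an empty run)."""
    if not results:
        return 0.0
    return sum(1 for ok in results.values() if not ok) / len(results)


# The slide's figures as fractions of the 6100 checked entries.
invalid_share = 478 / 6100          # about 0.078, i.e. ~8%
out_of_service_share = 199 / 6100   # about 0.033, i.e. ~3%
```

Passing the status function as a parameter keeps the checker testable without network access. A status check like this says nothing about the slide's bigger issue: a stale resource still returns a valid response.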

[Figure: resources added and resources last updated, by year, 1996 to 2014]

Page 15: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Keeping  content  up  to  date  

Connectome  

Tractography  

Epigenetics

• New tags come into existence
• New resource types come into existence, e.g., mobile apps
• Resources add new types of content
• Change name
• Change scope

• > 7000 updates to the registry last year

It’s a challenge to keep the registry up to date; sitemaps, curation, ontologies, community review

Page 16: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• The NIF Registry has created a linked data graph of web-accessible resources
• Maintained on a community wiki platform
• Provides data on the fluidity of the resource landscape
  – New resources continue to be created and found
  – Relatively few disappear altogether
  – Many more grow stale, although their value may still be significant
  – Maintaining up to date curation requires frequent updating

NIF Registry provides insight into the state of digital resources on the web

Page 17: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• The NIF data federation performs deep search over the content of over 200 databases
• New databases are added at a rate of 25-40 per year
• Latest update: Open Source Brain; ingest completed in 2 hours

• Databases chosen on a variety of criteria:
  • Early: testing different types of resources
  • Thematic areas
  • Volunteers

NIF provides access to the largest aggregation of neuroscience-relevant information on the web

Page 18: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• NIF was one of the first projects to attempt data integration in the neurosciences on a large scale

• NIF is supported by a contract that specified the number of resources to be added per year
  – Designed to be populated rapidly; set up process for progressive refinement
  – No budget was allocated to retrofit existing resources; had to work with them in their current state
  – We designed a system that required little to no cooperation or work from providers
  – Supports many formats: relational, XML, RDF

Page 19: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Current  Planned  

DISCO Dashboard Functions
• Ingest Script Manager
• Public Script Repository
• Data & Event Tracker
• Versioning System
• Curator Tool
• Data Transformer Manager

Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd, Yale University

Page 20: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

[Figure: number of federated databases and number of federated records (millions, log scale); time axis runs from 6-12 to 5-17]

NIF searches the largest collation of neuroscience-relevant data on the web

DISCO  


Page 21: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Results  categorized  by  data  type  and  level  of  nervous  system    

Page 22: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”

Query expansion: Synonyms and related concepts

Boolean queries

Data sources categorized by “data type” and level of nervous system

Common views across multiple sources

Tutorials for using full resource when getting there from NIF

Link back to record in original source
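The query expansion shown on this slide can be sketched as a synonym lookup that rewrites one term into a Boolean OR query. The dictionary `SYNONYMS` and the functions `quote` and `expand_query` are hypothetical names for this sketch; the real system draws synonyms and related concepts from the NIF ontologies rather than a hard-coded table.

```python
from typing import Dict, List

# Toy synonym table; in NIF these come from the ontology, not a literal dict.
SYNONYMS: Dict[str, List[str]] = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """Expand a single search term into an OR query over its synonyms."""
    alternatives = [term] + synonyms.get(term.lower(), [])
    return " OR ".join(quote(alt) for alt in alternatives)


expanded = expand_query("Hippocampus")
```

A term with no known synonyms passes through unchanged, so the expansion never narrows a search.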

Page 23: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Connects  to  

Synapsed  with  

Synapsed  by  

Input  region  

innervates  

Axon innervates

Projects to

Cellular contact

Subcellular contact

Source site

Target site

Each resource implements a different, though related model; systems are complex and difficult to learn, in many cases

Page 24: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
  • Brain Architecture Management System (rodent)
  • Temporal lobe.com (rodent)
  • Connectome Wiki (human)
  • Brain Maps (various)
  • CoCoMac (primate cortex)
  • UCLA Multimodal database (Human fMRI)
  • Avian Brain Connectivity Database (Bird)

• Total: 1800 unique brain terms (excluding Avian)
  • Number of exact terms used in > 1 database: 42
  • Number of synonym matches: 99
  • Number of 1st order partonomy matches: 385
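The counts on this slide come from comparing term lists across databases. A minimal sketch of the first two comparisons follows, assuming each database is reduced to a set of brain-region labels. The names `normalize` and `shared_terms`, the toy data, and the choice to ignore case and extra whitespace when matching are assumptions of this sketch; the slide does not say how "exact" was defined.

```python
from collections import Counter
from typing import Dict, Optional, Set


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def shared_terms(
    databases: Dict[str, Set[str]],
    canonical: Optional[Dict[str, str]] = None,
) -> Set[str]:
    """Return labels used by more than one database.

    canonical maps a normalized synonym to its preferred label; without it
    only exact (normalized) matches are found.
    """
    canonical = canonical or {}
    counts: Counter = Counter()
    for terms in databases.values():
        labels = {canonical.get(normalize(t), normalize(t)) for t in terms}
        counts.update(labels)  # each database counts a label at most once
    return {label for label, n in counts.items() if n > 1}


# Toy data, not taken from the seven connectivity databases.
EXAMPLE_DATABASES = {
    "db_a": {"Hippocampus", "Striatum"},
    "db_b": {"Ammon's horn", "striatum"},
}
EXAMPLE_SYNONYMS = {"ammon's horn": "hippocampus"}

exact = shared_terms(EXAMPLE_DATABASES)
with_synonyms = shared_terms(EXAMPLE_DATABASES, EXAMPLE_SYNONYMS)
```

In the toy data, exact matching finds only the striatum, and the synonym table is what links the two hippocampus entries. First-order partonomy matches would need a part-of relation from an ontology and are not covered here.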

Page 25: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

– You (and the machine) have to be able to find it
  • Accessible through the web
  • Annotations

– You have to be able to access and use it
  • Data type specified and in a usable form

– You have to know what the data mean
  • Some semantics: “1”
  • Context: Experimental metadata
  • Provenance: Where did the data come from?

Reporting neuroscience data within a consistent framework helps enormously

Page 26: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Knowledge in space and spatial relationships (the “where”)

Knowledge in words, terminologies and logical relationships (the “what”)

Page 27: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology

[Diagram: NIFSTD, with branches Organism, NS Function, Molecule, Investigation, Subcellular structure, Macromolecule, Gene, Molecule Descriptors, Techniques, Reagent, Protocols, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure]

NIF capitalizes on the growing set of community ontologies available in biomedical science

Page 28: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Purkinje  Cell  

Axon  Terminal  

Axon

Dendritic Tree

Dendritic Spine

Dendrite

Cell body

Cerebellar cortex

There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent

Page 29: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Brain  

Cerebellum  

Purkinje  Cell  Layer  

Purkinje  cell  

neuron  

has  a  

has  a  

has  a  

is  a  

• Ontology: an explicit, formal representation of concepts and relationships among them within a particular domain that expresses human knowledge in a machine readable form
  – Branch of philosophy: a theory of what is
  – e.g., Gene ontologies

• Provide universals for navigating across different data sources
  – Semantic “index”

• Provide the basis for concept-based queries to probe and mine data
  – Perform reasoning
  – Link data through relationships not just one-to-one mappings
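The "has a" and "is a" chain on this slide is enough to show what a concept-based query with simple reasoning looks like. The sketch below stores the slide's relationships as triples and follows them transitively; the relation names `has_a` and `is_a` and the functions `reachable` and `parts_of_type` are illustrative choices, not NIFSTD's actual property names or query interface.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The relationships drawn on the slide, written as (subject, relation, object).
TRIPLES: List[Triple] = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje cell layer"),
    ("Purkinje cell layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "neuron"),
]


def reachable(start: str, relation: str, triples: List[Triple]) -> Set[str]:
    """All nodes reachable from start by following relation transitively."""
    found: Set[str] = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for subject, rel, obj in triples:
            if subject == node and rel == relation and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def parts_of_type(whole: str, kind: str, triples: List[Triple]) -> Set[str]:
    """Parts of whole (at any depth) that are, directly or indirectly, a kind."""
    return {
        part
        for part in reachable(whole, "has_a", triples)
        if kind in reachable(part, "is_a", triples)
    }


neurons_in_brain = parts_of_type("Brain", "neuron", TRIPLES)
```

No triple states directly that the brain contains a neuron. The answer is derived by chaining relationships, which is the difference between linking data through relationships and a one-to-one mapping.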

Page 30: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

“Search computing”

What genes are upregulated by drugs of abuse in the adult mouse?

Morphine

Increased expression

Adult Mouse

Some concepts, e.g., age category, are quantitative but still must be interpreted in a global query system

Page 31: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences
Page 32: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences


Page 33: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

http://neurolex.org   Stephen Larson

• Provide a simple interface for defining the concepts required

• Light weight semantics
• Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework

• Community based:
  • Anyone can contribute their terms, concepts, things
  • Anyone can edit
  • Anyone can link

• Accessible: searched by Google

• Growing into a significant knowledge base for neuroscience

Demo D03

200,000 edits; 150 contributors

Page 34: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• NIF can be used to survey the data landscape

• Analysis of NIF shows multiple databases with similar scope and content

• Many contain partially overlapping data

• Data “flows” from one resource to the next
  – Data is reinterpreted, reanalyzed or added to

• Is duplication good or bad?

Page 35: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Databases  come  in  many  shapes  and  sizes  

• Primary data:
  – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)

• Secondary data
  – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)

• Tertiary data
  – Claims and assertions about the meaning of data
    • E.g., gene upregulation/downregulation, brain activation as a function of task

• Registries:
  – Metadata
  – Pointers to data sets or materials stored elsewhere

• Data aggregators
  – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede

• Single source
  – Data acquired within a single context, e.g., Allen Brain Atlas

Researchers are producing a variety of information artifacts using a multitude of technologies

Page 36: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

NIF Analytics: The Neuroscience Landscape

NIF is in a unique position to answer questions about the neuroscience landscape

Where are the data?

[Figure: data sources by brain region; labeled regions include Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain]

Vadim Astakhov, Kepler Workflow Engine

Page 37: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Diseases of nervous system

Adding more semantics

The combination of ontologies, diverse data and analytics lets us look at the current landscape in interesting ways

[Figure: NIH Reporter and NIF data federated sources compared by disease category: Neurodegenerative, Seizure disorders, Neoplastic disease of nervous system]

Page 38: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID

• Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases

• Analysis:
  • 1370 statements from Gemma regarding gene expression as a function of chronic morphine
  • 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
  • Results for 1 gene were opposite in DRG and Gemma
  • 45 did not have enough information provided in the paper to make a judgment

Relatively simple standards would make life easier
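The baseline mismatch described on this slide can be made concrete with a small sketch. It assumes only two conditions are being compared (chronic morphine and saline), so that switching the baseline simply flips the direction of change. The `Claim` type, the field names, and the gene label `GeneX` are hypothetical and do not reflect how Gemma or DRG actually store results.

```python
from dataclasses import dataclass

OPPOSITE = {"up": "down", "down": "up"}


@dataclass(frozen=True)
class Claim:
    gene: str
    direction: str  # "up" or "down"
    baseline: str   # condition the change is measured against


def normalized_direction(claim: Claim, reference_baseline: str) -> str:
    """Express the change relative to reference_baseline.

    Assumes a two-condition comparison, so a different baseline means the
    reported direction must be flipped.
    """
    if claim.baseline == reference_baseline:
        return claim.direction
    return OPPOSITE[claim.direction]


def consistent(a: Claim, b: Claim, reference_baseline: str = "saline") -> bool:
    """True if both claims agree once put on the same baseline."""
    return a.gene == b.gene and (
        normalized_direction(a, reference_baseline)
        == normalized_direction(b, reference_baseline)
    )


# Hypothetical pair: opposite as reported, but the same finding once aligned.
claim_a = Claim("GeneX", "down", baseline="chronic morphine")
claim_b = Claim("GeneX", "up", baseline="saline")
agree = consistent(claim_a, claim_b)

# Share of the 1370 statements that were consistent, from the slide's counts.
confirmed_share = 617 / 1370  # about 0.45, so over half were not confirmed
```

Recording the baseline next to every direction of change is the kind of relatively simple standard the slide asks for. Without that field, the two claims above look contradictory.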

Page 39: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

NIF favors a hybrid, tiered, federated system

• Domain knowledge
  – Ontologies

• Claims, models and observations
  – Virtuoso RDF triples
  – Model repositories

• Data
  – Data federation
  – Spatial data
  – Workflows

• Narrative
  – Full text access

Neuron, Brain part, Disease, Organism, Gene, Technique, People

Caudate projects to Snpc

Grm1 is upregulated in chronic cocaine

Betz cells degenerate in ALS

NIF provides the tentacles that connect the pieces: a new type of entity for 21st century science

Page 40: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• 2006-2008: A survey of what was out there
• 2008-2009: Strategy for resource discovery
  – NIF Registry vs NIF data federation
  – Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
  – Effective search across semantically diverse sources
    • NIFSTD ontologies

• 2009-2011: Strategy for data integration
  – Unified views across common sources
  – Mapping of content to NIF vocabularies

• 2011-present: Data analytics
  – Uniform external data references

• 2012-present: SciCrunch: unified biomedical resource services

NIF provides a strategy and set of tools applicable to all domains grappling with multiple sources of diverse data (i.e., pretty much everything)

Page 41: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• Search semantics

• Ranking
• Resources supported by NIH Blueprint Institutes are more thoroughly covered

• Data types, e.g., Brain activation foci


Page 42: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences


SciCrunch  

NIF   MONARCH  

Community  Services   dkCOIN  

Shared  Resources  

Undiagnosed  Disease  Program  

Phenotype  RCN  

3D  Virtual  Cell  

National Institute on Aging

One Mind for Research

BIRN

International Neuroinformatics Coordinating Facility

Model  Organism  Databases  

Community  Outreach  

DELSA  

(not  just  a  data  catalog)  

Page 43: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences


• 3dVC: Focus on models and simulation

• Gene Ontology: Focus on bioinformatics tools

• National Institute on Aging: Aging-related data sets

• Monarch: Phenotype-Genotype; deep semantic data integration

• One Mind for Research: Biospecimen repositories

• NeuroGateway: Computational resources

• FORCE11: Tools for next-gen publishing and e-scholarship

SciCrunch

SciCrunch is actively supporting multiple communities; multiple communities are enriching and improving SciCrunch

Page 44: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

“How do I share my data/tool?”

“There is no database for my data”

[Diagram labels: Community database: beginning; Community database: end; Institutional repositories; Tool repositories; Cloud; INCF: Global infrastructure; Government; Education; Industry]

NIF is designed to leverage existing investments in resources and infrastructure

Page 45: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• No one can be stopped from doing what they need to do

• Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like
  – If the market can support 11 MRI databases, fine
  – Some consolidation, coordination is warranted though

• Big, broad and messy beats small, narrow and neat
  – Without trying to integrate a lot of data, we will not know what needs to be done
  – A lot can be done with messy data; neatness helps though
  – Progressive refinement; addition of complexity through layers

• Be flexible and opportunistic
  – A single optimal technology/container for all types of scientific data and information does not exist; technology is changing

• Think globally; act locally:
  – No source, not even NIF, is THE source; we are all a source

Page 46: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

• Several powerful trends should change the way we think about our data: One → Many
  – Many data
    • Generation of data is getting easier → shared data
    • Data space is getting richer: more –omes every day
    • But...compared to the biological space, still sparse
  – Many eyes
    • Wisdom of crowds
    • More than one way to interpret data
  – Many algorithms
    • Not a single way to analyze data
  – Many analytics
    • “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting

Are you exposing or burying your work?

Page 47: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences

Jeff Grethe, UCSD, Co-Investigator, Interim PI

Amarnath Gupta, UCSD, Co-Investigator

Anita  Bandrowski,  NIF  Project  Leader  

Gordon  Shepherd,  Yale  University  

Perry  Miller  

Luis  Marenco  

Rixin  Wang  

David  Van  Essen,  Washington  University  

Erin  Reid  

Paul  Sternberg,  Cal  Tech  

Arun  Rangarajan  

Hans  Michael  Muller  

Yuling  Li  

Giorgio  Ascoli,  George  Mason  University  

Sridevi  Polavarum  

Fahim  Imam  

Larry  Lui  

Andrea  Arnaud  Stagg  

Jonathan  Cachat  

Jennifer  Lawrence  

Svetlana  Sulima  

Davis  Banks  

Vadim  Astakhov  

Xufei  Qian  

Chris  Condit  

Mark  Ellisman  

Stephen  Larson  

Willie  Wong  

Tim  Clark,  Harvard  University  

Paolo  Ciccarese  

Karen Skinner, NIH, Program Officer (retired)

Jonathan  Pollock,  NIH,  Program  Officer  

And  my  colleagues  in  Monarch,  dkNet,  3DVC,  Force  11