Infrastructure, Standards, and Policies for Research Data Management

Infrastructure, Standards, and Policies for Research Data Management. Jian Qin, School of Information Studies, Syracuse University, USA. COINFO 2013, Wuhan, China, 2013-10-26

description

This presentation discusses the need for and importance of research data management and introduces the concept of research data management as an infrastructure service. Although many resources have been made available for research data management, most of them are developed as “islands” and lack linking mechanisms. The lack of integrated and interconnected resources has contributed to high costs and duplicated efforts in data management operations. The vision of research data management as an infrastructure service is to improve not only the efficiency of research data management but also the productivity of the research enterprise. Each of the three dimensions (infrastructure, standards, and policies) addresses a critical aspect of research data management to make the data infrastructure services work.

Transcript of Infrastructure, Standards, and Policies for Research Data Management

Page 1: Infrastructure, Standards, and Policies for Research Data Management

Infrastructure, Standards, and Policies for Research Data Management

Jian Qin, School of Information Studies, Syracuse University, USA. COINFO 2013, Wuhan, China, 2013-10-26

Page 2: Infrastructure, Standards, and Policies for Research Data Management

About this presentation

10/26/2013   COINFO2013,  Wuhan,  China   2  

1. Concepts about data infrastructure services

2. Problems & gaps in data management services

3. Problems and gaps

4. Data management infrastructure service dimensions

Page 3: Infrastructure, Standards, and Policies for Research Data Management

Some background about the topic

Infrastructure, standards, and policy

Page 4: Infrastructure, Standards, and Policies for Research Data Management

Infrastructure

The underlying foundation or basic framework (as of a system or organization).

The system of public works of a country, state, or region.

The resources (as personnel, buildings, or equipment) required for an activity.

http://www.merriam-webster.com/dictionary/infrastructure

Page 5: Infrastructure, Standards, and Policies for Research Data Management

Data infrastructure

“a sustainable data infrastructure that will be discoverable, searchable, accessible, and usable to the entire research and education community.”

“usable by multiple scientific disciplines…”

“…that can support and provide data solutions to a broader range of scientific disciplines while reducing duplicative efforts.”

http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504776

Page 6: Infrastructure, Standards, and Policies for Research Data Management

Standards

Scientific data formats

Metadata standards for scientific data
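To make the role of a metadata standard concrete, the following is a minimal sketch of a descriptive record for a data set and a completeness check against a list of required fields. The field names and the `REQUIRED_FIELDS` list are hypothetical and are not taken from any particular metadata standard; a real standard would define its own elements and rules.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical list of required descriptive fields (an assumption for
# illustration only; not drawn from a specific metadata standard).
REQUIRED_FIELDS = ("title", "creator", "date_created", "data_format", "rights")


@dataclass
class DatasetMetadata:
    """A simplified descriptive record for one research data set."""
    title: str = ""
    creator: str = ""
    date_created: str = ""   # date as text, e.g. "2013-10-26"
    data_format: str = ""    # file format of the data, e.g. "CSV"
    rights: str = ""         # statement of access and reuse conditions
    keywords: list = field(default_factory=list)


def missing_fields(record: DatasetMetadata) -> list:
    """Return the required fields that are empty in the record."""
    values = asdict(record)
    return [name for name in REQUIRED_FIELDS if not values[name].strip()]


# Example record with two required fields left blank.
record = DatasetMetadata(
    title="Stream temperature readings",
    creator="Example Lab",
    data_format="CSV",
)
print(missing_fields(record))
```

A check of this kind is one way a shared standard can be turned into a tool: the same list of required elements can be applied to every data set before it is shared or deposited.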

Page 7: Infrastructure, Standards, and Policies for Research Data Management

Data policies

§ Access and use
§ Management
§ Storage and backup
§ Metadata
§ Sharing
§ Preservation
§ Intellectual property rights
§ Security

Page 8: Infrastructure, Standards, and Policies for Research Data Management

Examples of data infrastructure services

§ The Institute for Quantitative Social Science repository: http://www.iq.harvard.edu/

§ Inter-University Consortium for Political and Social Research (ICPSR): http://www.icpsr.umich.edu/icpsrweb/landing.jsp

§ The Dryad Digital Repository: http://datadryad.org/

§ Data Observation Network for Earth: http://www.dataone.org/

§ Databib: http://databib.org/ (a registry/directory/catalog of research data repositories)

§ Registry of Research Data Repositories: http://www.re3data.org/

Page 9: Infrastructure, Standards, and Policies for Research Data Management

Major problems

§ “Challenges and opportunities,” Introduction to special section Dealing with Data. Science, 11 February 2011: Vol. 331, pp. 692-693.

§ 20% of respondents regularly use or analyze data sets exceeding 100 GB

§ 7% use data sets exceeding 1 TB

§ About 50% store their data only in their laboratories

§ Lack of common metadata and archives for using and storing data

§ No funding to support archiving

Page 10: Infrastructure, Standards, and Policies for Research Data Management

Gaps in data management services

Community data repositories

Institutional data repositories

Laptops, personal hard drives, etc.

Data lifecycle: raw data, active data; verified, derived, calculated, … data; verified, archived data

Gaps: lack of standards and tools to support managing active data (data staging services)

Gaps: lack of time, lack of staff support, and lack of tools for creating meaningful metadata (data products development services)
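The lifecycle and the two gaps named on this slide can be sketched as a small model. This is only an illustration: the stage names follow the slide, but placing each gap between two specific stages is an assumption made here, and `Stage`, `GAPS`, and `services_needed` are hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    """Data lifecycle stages, in the order listed on the slide."""
    RAW = 1
    ACTIVE = 2
    DERIVED = 3    # verified, derived, calculated data
    ARCHIVED = 4   # verified, archived data


# Assumption: each gap sits between two adjacent stages. The slide names
# the gaps and the missing services but the exact placement is inferred.
GAPS = {
    (Stage.ACTIVE, Stage.DERIVED): "data staging services",
    (Stage.DERIVED, Stage.ARCHIVED): "data products development services",
}


def services_needed(current: Stage, target: Stage) -> list:
    """List the missing services crossed when data move from current to target."""
    if target.value < current.value:
        raise ValueError("target stage must not precede the current stage")
    return [
        service
        for (start, end), service in GAPS.items()
        if current.value <= start.value and end.value <= target.value
    ]


print(services_needed(Stage.RAW, Stage.ARCHIVED))
```

Under these assumptions, data that stay at the raw or active stage cross no gap, while data moved all the way to an archive cross both, which is where staging and data product development services would be needed.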

Page 11: Infrastructure, Standards, and Policies for Research Data Management

Why the gaps?

Raw data, active data

Calculated, derived … data

Verified, archived data

Technical factors
§ Lack of tools to help DM at different stages of a research lifecycle
§ Data repositories do not always provide tools for pre-submission staging

Organizational factors
§ Lack of repeatable, reliable practices to ensure effective DM
§ Lack of institutional policies to support and assess DM practices
§ Lack of DM training programs

Behavioral factors
§ Researchers have no time for performing DM tasks
§ No motivation to invest time in DM
§ Concerns for losing competitive advantages

Page 12: Infrastructure, Standards, and Policies for Research Data Management


Page 13: Infrastructure, Standards, and Policies for Research Data Management


Page 14: Infrastructure, Standards, and Policies for Research Data Management

Research data management

A series of services that an organization develops and implements through institutionalized data policies, technological infrastructures, and information standards.

Image credit: DataONE best practices http://www.dataone.org/best-practices

Page 15: Infrastructure, Standards, and Policies for Research Data Management

Principle of Infrastructure as a Service (IaaS)

“a standardized, highly automated offering, where compute resources, complemented by storage and networking capabilities are owned and hosted by a service provider and offered to customers on-demand.”

Gartner, “IT glossary”, http://www.gartner.com/it-glossary/infrastructure-as-a-service-iaas/.

Page 16: Infrastructure, Standards, and Policies for Research Data Management

Nature of an infrastructure

§ Embeddedness. Infrastructure is sunk into, inside of, other structures, social arrangements, and technologies.

§ Transparency. Infrastructure does not have to be reinvented each time or assembled for each task, but invisibly supports those tasks.

§ Reach or scope beyond a single event or a local practice.

§ Learned as part of membership.

§ Links with conventions of practice.

§ Embodiment of standards.

§ Built on an installed base.

§ Becomes visible upon breakdown.

§ Is fixed in modular increments, not all at once or globally.

Star, S.L. & Ruhleder, K. (1996). Steps toward an ecology of infrastructure: Design and access for large information spaces. Information Systems Research, 7(1): 111-134.

Page 17: Infrastructure, Standards, and Policies for Research Data Management

Three dimensions of data infrastructure services

Infrastructure

Networks, systems, databases, software tools, data services

Page 18: Infrastructure, Standards, and Policies for Research Data Management

Infrastructure

Networks, systems, databases, software tools, data services

What is institutionalization?
Why do you need to institutionalize research data management?
How can you institutionalize RDM?

Page 19: Infrastructure, Standards, and Policies for Research Data Management

Infrastructure

Networks, systems, databases, software tools, data services

How much do you know about data and metadata?
How does the nature of data affect metadata?
How does metadata affect data access, sharing, reuse, and long-term preservation?

Page 20: Infrastructure, Standards, and Policies for Research Data Management

Infrastructure

Networks, systems, databases, software tools, data services

What are data infrastructure and data infrastructure services?
Why do you need to build a data infrastructure?
What is the key in building a data infrastructure?

Page 21: Infrastructure, Standards, and Policies for Research Data Management

Data infrastructure services and research libraries

[Diagram labels: research librarianship; data science; IT management; data infrastructure services; data librarianship; data infrastructure; library IT]

Need more R&D

Page 22: Infrastructure, Standards, and Policies for Research Data Management

Building data infrastructure services

http://www.arl.org/storage/documents/publications/2012-hrsym-pres-neal-j.pdf

•  To change in composition or structure (what we are/what we do)

•  To change the outward form or appearance (how we are viewed/understood)

•  To change in character or condition (how we do it)

Page 23: Infrastructure, Standards, and Policies for Research Data Management

In summary…

The keyword for data infrastructure services is:

That includes:

•  Institutionalizing DM

•  Developing and implementing standards for DM

•  Developing and implementing data infrastructure