Linking data to publications: Towards the execution of papers

Linking data to publications: Towards the execution of papers. Anita de Waard, Elsevier Labs/UUtrecht, http://elsatglabs.com/labs/anita

description

Talk for day 2 of the workshop on Developing Data Attribution and Citation Practices and Standards, Berkeley, CA August 220

Transcript of Linking data to publications: Towards the execution of papers

Page 1: Linking data to publications: Towards the execution of papers

Linking data to publications: Towards the execution of papers

Anita de Waard, Elsevier Labs/UUtrecht

http://elsatglabs.com/labs/anita

Page 5: Linking data to publications: Towards the execution of papers

Cycle of Scientific Investigation

CoSI model by Gully Burns, ISI/USC

Domain-specific Reasoning Model

Observations

Interpretations

Experimental Design Model

formulate hypotheses

make predictions

design experiments

gather data

make observational assertions

make interpretational assertions

aggregate assertions

perform experiments

Publication

Results

Methods

Figures

Conclusions

Background

Hypotheses

Experimental Design

Experimental Objects

Observed Results

Processed Data/Statistics

Page 6: Linking data to publications: Towards the execution of papers

1. Current practice: store data in repository, link from document, and vice versa

Observed Results

Data Repository

Processed Data/Statistics

Statistics storage system

Experimental Design

Workflow Repository

Publication

Figures

Conclusions

Background

Hypotheses

Results

Methods

Page 7: Linking data to publications: Towards the execution of papers

Current Practice: linking to documents

Least favorite: raw research data delivered as supplementary data

Much better: linking into/from data centres, e.g. Pangea

Page 10: Linking data to publications: Towards the execution of papers

Linking data and papers: ‘the publisher’s’ position

STM’s “Brussels Declaration”, June 2006:

“... believe that, as a general principle, data sets, raw data outputs of research, and sets or subsets of that data should wherever possible be made freely accessible ...”

• Publishers are (in general) not interested in owning or charging for research data repositories

• Publishers are very interested in linking to and from data, and want to work with data repositories to do this effectively

• Publishers believe in (and know) the concept of Digital Object Identifiers:

– Where possible: one repository for identifiers

– Persistent and unique (don’t keep same ID if content changes)

– Where possible, link back to the publication

Complete agreement with MacKenzie Smith’s “Requirements for Data Citation”!
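The identifier principles above (one registry, persistent and unique IDs, no reuse of an ID when content changes, a link back to the publication) can be illustrated with a minimal sketch. Everything here is hypothetical: the `IdentifierRegistry` class, the `example-id` prefix, and the use of a SHA-256 content hash to detect changed content are assumptions made for illustration, not how DOIs are actually minted.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    identifier: str             # persistent, never reassigned
    content_hash: str           # fingerprint of the deposited content
    publication: Optional[str]  # link back to the publication, if known


class IdentifierRegistry:
    """Hypothetical single registry of data identifiers (illustration only)."""

    def __init__(self, prefix: str = "example-id") -> None:
        self._prefix = prefix
        self._by_hash: Dict[str, Record] = {}

    def register(self, content: bytes, publication: Optional[str] = None) -> Record:
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            # Unchanged content keeps the identifier it already has.
            return existing
        # New or changed content always receives a fresh identifier.
        record = Record(f"{self._prefix}:{len(self._by_hash) + 1}", digest, publication)
        self._by_hash[digest] = record
        return record


registry = IdentifierRegistry()
first = registry.register(b"temperature,1.0\n", "doi:10.1016/S0030-3992(02)00069-5")
same = registry.register(b"temperature,1.0\n")
changed = registry.register(b"temperature,1.1\n")
```

In this sketch `first` and `same` share one identifier, while `changed` gets a new one because its content differs.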

Page 12: Linking data to publications: Towards the execution of papers

2. Store data in repository, link within document.

Workflow Repository

Experimental Design

Data Repository

Observed Results

Software Repository

Code/Statistics

Publication

Figures

Conclusions

Background

Hypotheses

Results

Methods

Page 17: Linking data to publications: Towards the execution of papers

Enabler at Elsevier - Linked Data: access any level of granularity of content

SWAN’s PAV (Provenance, Authoring and Versioning) ontology

Dublin Core and SKOS

1. Where the document region is completely described by an existing ID, use that ID to define the region.
Example: http://api.elsevier.com/content/article/DOI:10.1016/S0030-3992(02)00069-5#p0100 specifies a document region as the element with ID "p0100".

2. Where the document region can be completely described by an element within an ID'd element, navigate outwards to an ID that encloses the region, and use a relative XPath.
Example: #xpath-e(id('s0050')/ce:para[4]) specifies a document region as the fourth ce:para element within an element with ID "s0050".

3. Where the document region cannot be completely described by an element within the content, use the above locators combined with substrings.
Example: #xpath-e(substring(id('p0100'),10,20)) specifies a document region as being characters 10–20 in the element with ID "p0100".

4. Where the source content does not contain IDs, use absolute XPaths to navigate to the appropriate element, and use substrings as required.
Example: #xpath-e(article/body/ce:sections/ce:section[4]/ce:para[4]) points to a particular ce:para as defined by the given XPath. An example of an absolute XPath with substrings is left as an exercise for the reader.
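The four locator forms above can be told apart mechanically. The following is a minimal sketch, assuming the fragment syntax is exactly as in the slide's examples. The `Locator` class, the `parse_locator` function, and the kind names are hypothetical and not part of any Elsevier API. The two `substring()` arguments are stored uninterpreted: the slide reads them as a character range, whereas XPath 1.0 defines them as a start position and a length.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Locator:
    kind: str                    # "id", "id-xpath" or "absolute-xpath"
    anchor_id: Optional[str]     # enclosing element ID, if the locator uses one
    path: Optional[str]          # XPath steps after id(...), or the absolute path
    substring_args: Optional[Tuple[int, int]] = None  # raw substring() numbers


_XPATH_E = re.compile(r"^xpath-e\((.*)\)$")
_SUBSTRING = re.compile(r"^substring\((.*),\s*(\d+)\s*,\s*(\d+)\)$")
_ID_CALL = re.compile(r"^id\('([^']+)'\)(?:/(.*))?$")


def parse_locator(reference: str) -> Locator:
    """Classify a '#...' fragment (or a full URL ending in one)."""
    _, sep, fragment = reference.partition("#")
    if not sep or not fragment:
        raise ValueError("reference has no fragment")

    wrapped = _XPATH_E.match(fragment)
    if wrapped is None:
        # Rule 1: the fragment is a bare element ID.
        return Locator("id", fragment, None)

    expr = wrapped.group(1)
    args = None
    sub = _SUBSTRING.match(expr)
    if sub is not None:
        # Rule 3 (or rule 4 with substrings): unwrap substring(...).
        expr = sub.group(1)
        args = (int(sub.group(2)), int(sub.group(3)))

    anchored = _ID_CALL.match(expr)
    if anchored is not None:
        # Rules 2 and 3: navigate from an enclosing ID.
        return Locator("id-xpath", anchored.group(1), anchored.group(2), args)

    # Rule 4: no IDs available, so the path is absolute.
    return Locator("absolute-xpath", None, expr, args)


rule1 = parse_locator(
    "http://api.elsevier.com/content/article/DOI:10.1016/S0030-3992(02)00069-5#p0100"
)
rule2 = parse_locator("#xpath-e(id('s0050')/ce:para[4])")
rule3 = parse_locator("#xpath-e(substring(id('p0100'),10,20))")
rule4 = parse_locator("#xpath-e(article/body/ce:sections/ce:section[4]/ce:para[4])")
```

The sketch only classifies locators. Resolving them against a document would need an XPath engine that supports `id()` and namespace prefixes such as `ce:`.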

Page 18: Linking data to publications: Towards the execution of papers

Few (modest) examples of linking within document

Authors manually identify (and tag) entities for which associated data is in databases, like GenBank, Uniprot, PDB, etc.

Or: automatic entity identification and linking to relevant databases.

Page 31: Linking data to publications: Towards the execution of papers

3. The future being made today: let’s execute the paper!

Research Report

Conclusions

Background

Hypotheses

Research Process

Experimental Design

Observed Results

Code/Statistics

Workflow Repository

Data Repository

Software Repository

Experimental Design

Observed Results

Code/Statistics

Maintain context:

- Experimental

- Narrative

- Domain

Page 33: Linking data to publications: Towards the execution of papers

3. Even better: why move anything anywhere??

Research Process

Experimental Design

Observed Results

Code/Statistics

Research Report

Conclusions

Background

Hypotheses

Experimental Design

Observed Results

Code/Statistics

Workflow Repository

Data Repository

Software Repository

Experimental Design

Observed Results

Code/Statistics

Page 35: Linking data to publications: Towards the execution of papers

3. Science in the cloud

Proposal: Store research plan, results, thoughts, observations, etc. locally/in the cloud in a system that adds metadata.
Advantages to the scientist: Always keep track of your own data! Maintain copyright and access privileges.

Proposal: Allow access to the data, workflow etc. to the data repository, who
1. validates quality (content and form)
2. assigns a UID
3. advertises its existence
Advantages to the scientist: Data is vetted, identified, and advertised. If scientist/funding body wants: data repository controls access rights; data repository maintains archive.

Proposal: Allow access to the collected thoughts (with links to data) to the publisher, who
1. validates quality (content and form)
2. assigns a UID
3. advertises its existence
Advantages to the scientist: Content vetted, identified, and advertised. If scientist/funding body wants: publisher/library controls access rights; publisher/library maintains archive.

Proposal: Others - perhaps publishers, perhaps data repositories, perhaps (egad!) software developers - build tools, to place thoughts and data into context.
Advantages to the scientist: Better software! Better links to everything else we do.
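The three steps a data repository or publisher performs in this proposal (validate, assign a UID, advertise) can be sketched as follows. This is only an illustration under stated assumptions: the `WorkItem` and `Curator` classes, the metadata-presence check standing in for quality validation, and the `cloud://` location string are all hypothetical. The curator stores a reference to the item, which stays where its owner keeps it.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkItem:
    owner: str                 # the scientist, who keeps copyright and access control
    location: str              # where the owner stores the item (local or cloud)
    metadata: Dict[str, str]   # metadata added by the storage system


@dataclass
class Curator:
    """Hypothetical data repository or publisher that has been granted access."""
    name: str
    required_fields: List[str]
    catalogue: Dict[str, WorkItem] = field(default_factory=dict)
    announcements: List[str] = field(default_factory=list)

    def accept(self, item: WorkItem) -> str:
        # 1. Validate quality (here reduced to: required metadata is present).
        missing = [name for name in self.required_fields if not item.metadata.get(name)]
        if missing:
            raise ValueError(f"missing metadata: {missing}")
        # 2. Assign a UID.
        uid = str(uuid.uuid4())
        self.catalogue[uid] = item  # a reference only; the item is not moved
        # 3. Advertise its existence.
        self.announcements.append(f"{self.name} registered {uid}")
        return uid


repository = Curator("demo-repository", required_fields=["title"])
item = WorkItem("a.scientist", "cloud://example/run-1", {"title": "Run 1"})
uid = repository.accept(item)
```

The same `Curator` shape would serve for the publisher row, with the item being the collected thoughts and their links to data.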

Page 40: Linking data to publications: Towards the execution of papers

Technology 1: Workflow tools

http://wings.isi.edu/

http://VisTrails.org

http://MyExperiment.org

Page 41: Linking data to publications: Towards the execution of papers

Technology 2: Executable Papers

Page 45: Linking data to publications: Towards the execution of papers

Technology 3: Application Platforms

Page 52: Linking data to publications: Towards the execution of papers

In summary:

• Publishers are in general not interested in owning or charging for research data repositories (Brussels Declaration)

• Publishers are very interested in linking to and from data, and want to work with data repositories to do this effectively

• Publishers believe in Digital Object Identifiers

• Publishers embrace open standards and interoperability, and are adapting their infrastructure to be future-compliant:

– In particular, we think scientists should keep (track of) their work

– We also think novel information architectures work for science, including Linked Data, the concept of app servers, and the cloud

• Publishers believe in a future that stores and shares science in a better and more productive way, and in inventing it together: FoRCE11: The Future of Research Communications and eScience