Linking data to publications: Towards the execution of papers
Anita de Waard, Elsevier Labs/UUtrecht
http://elsatglabs.com/labs/anita
Cycle of Scientific Investigation
CoSI model by Gully Burns, ISI/USC
Domain-specific Reasoning Model
Observations; Interpretations
Experimental Design Model
Steps (as labelled on the diagram): formulate hypotheses; make predictions; design experiments; gather data; make observational assertions; make interpretational assertions; aggregate assertions; perform experiments
Publication: Results; Methods; Figures; Conclusions; Background; Hypotheses
Experimental Design; Experimental Objects; Observed Results; Processed Data/Statistics
1. Current practice: store data in repository, link from document, and vice versa
Observed Results -> Data Repository
Processed Data/Statistics -> Statistics storage system
Experimental Design -> Workflow Repository
Publication: Figures; Conclusions; Background; Hypotheses; Results; Methods
Current Practice: linking to documents
Least favorite: raw research data delivered as supplementary data
Much better: linking into/from data centres, e.g. PANGAEA
Linking data and papers: ‘the publisher’s’ position
STM’s “Brussels Declaration”, June 2006:
“... believe that, as a general principle, data sets, raw data outputs of research, and sets or subsets of that data should wherever possible be made freely accessible ...”
• Publishers are (in general) not interested in owning or charging for research data repositories
• Publishers are very interested in linking to and from data, and want to work with data repositories to do this effectively
• Publishers believe in (and know) the concept of Digital Object Identifiers:
– Where possible: one repository for identifiers
– Persistent and unique (don’t keep same ID if content changes)
– Where possible, link back to the publication
Complete agreement with MacKenzie Smith’s “Requirements for Data Citation”!
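The identifier rules on this slide (one registry, persistent and unique IDs, a new ID whenever the content changes, a link back to the publication) can be sketched in a few lines. This is only an illustration, not how DOIs are actually minted: the `IdentifierRegistry` class, the `10.9999/demo` prefix, and the use of a SHA-256 content hash to detect changed content are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IdentifierRegistry:
    """Toy single registry for identifiers (hypothetical, not a real DOI service)."""

    prefix: str = "10.9999/demo"  # made-up prefix, for illustration only
    _by_hash: dict = field(default_factory=dict)  # content hash -> identifier
    _links: dict = field(default_factory=dict)    # identifier -> publication

    def register(self, content: bytes, publication: str) -> str:
        """Return the identifier for this exact content, minting one if needed."""
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._by_hash:
            # Changed content has a different hash, so it never reuses an old ID.
            self._by_hash[digest] = f"{self.prefix}.{len(self._by_hash) + 1}"
        identifier = self._by_hash[digest]
        self._links[identifier] = publication  # link back to the publication
        return identifier

    def publication_for(self, identifier: str) -> str:
        """Follow the link from a data identifier back to its publication."""
        return self._links[identifier]
```

Registering the same bytes twice returns the same identifier, while a revised data set gets a fresh one that still points back to the paper.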
2. Store data in repository, link within document.
Experimental Design -> Workflow Repository
Observed Results -> Data Repository
Code/Statistics -> Software Repository
Publication: Figures; Conclusions; Background; Hypotheses; Results; Methods
Enabler at Elsevier - Linked Data: access any level of granularity of content
SWAN’s PAV (Provenance, Authoring and Versioning) ontology
Dublin Core and SKOS
1. Where the document region is completely described by an existing ID, use that ID to define the region.
Example: http://api.elsevier.com/content/article/DOI:10.1016/S0030-3992(02)00069-5#p0100 specifies a document region as the element with ID "p0100".
2. Where the document region can be completely described by an element within an ID'd element, navigate outwards to an ID that encloses the region, and use a relative XPath.
Example: #xpath-e(id('s0050')/ce:para[4]) specifies a document region as the fourth ce:para element within an element with ID "s0050".
3. Where the document region cannot be completely described by an element within the content, use the above locators combined with substrings.
Example: #xpath-e(substring(id('p0100'),10,20)) specifies a document region as being characters 10–20 in the element with ID "p0100".
4. Where the source content does not contain IDs, use absolute XPaths to navigate to the appropriate element, and use substrings as required.
Example: #xpath-e(article/body/ce:sections/ce:section[4]/ce:para[4]) points to a particular ce:para as defined by the given XPath. An example of an absolute XPath with substrings is left as an exercise for the reader.
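The four locator forms can be assembled and taken apart with plain string handling. The sketch below only reproduces the fragment syntax shown in the examples; the function names are hypothetical and nothing here resolves a locator against real content. Note that in standard XPath 1.0 the third argument of substring() is a length, so whether substring(...,10,20) means "characters 10–20" or "20 characters from position 10" depends on how the xpath-e scheme defines it; the sketch simply passes both numbers through.

```python
def id_locator(element_id: str) -> str:
    """Rule 1: the region is exactly the element with this ID."""
    return f"#{element_id}"


def relative_xpath_locator(enclosing_id: str, relative_path: str) -> str:
    """Rule 2: start at an enclosing ID'd element, then follow a relative path."""
    return f"#xpath-e(id('{enclosing_id}')/{relative_path})"


def substring_locator(element_id: str, second_arg: int, third_arg: int) -> str:
    """Rule 3: part of the text of an ID'd element (arguments passed through as-is)."""
    return f"#xpath-e(substring(id('{element_id}'),{second_arg},{third_arg}))"


def absolute_xpath_locator(path: str) -> str:
    """Rule 4: no usable IDs, so address the element from the document root."""
    return f"#xpath-e({path})"


def split_locator(uri: str) -> tuple:
    """Split a full URI into (document, kind, value), kind being 'id' or 'xpath'."""
    document, _, fragment = uri.partition("#")
    if fragment.startswith("xpath-e(") and fragment.endswith(")"):
        return document, "xpath", fragment[len("xpath-e("):-1]
    return document, "id", fragment


# Article address taken from the slide's first example.
ARTICLE = "http://api.elsevier.com/content/article/DOI:10.1016/S0030-3992(02)00069-5"

if __name__ == "__main__":
    print(ARTICLE + id_locator("p0100"))
    print(ARTICLE + relative_xpath_locator("s0050", "ce:para[4]"))
    print(split_locator(ARTICLE + substring_locator("p0100", 10, 20)))
```

Keeping the document address and the fragment separate means the same region locator can be reused against any copy of the article.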
Few (modest) examples of linking within document
Authors manually identify (and tag) entities for which associated data is in databases, like GenBank, UniProt, PDB, etc.
Or: automatic entity identification and linking to relevant databases.
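A very small version of automatic entity identification is a pattern match for explicit database mentions, followed by attaching a link. The sketch below is a deliberately naive assumption, not the approach used in any production system: the `MENTION` pattern only catches a database name followed directly by an accession-like token, and the `example.org` addresses in `RESOLVERS` are placeholders rather than real database URLs.

```python
import re

# Simplified assumption: a database name followed by an accession-like token.
MENTION = re.compile(r"\b(GenBank|UniProt|PDB)\s+([A-Z0-9]{4,10})\b")

# Placeholder link templates; example.org is deliberately not a real resolver.
RESOLVERS = {
    "GenBank": "https://example.org/genbank/{acc}",
    "UniProt": "https://example.org/uniprot/{acc}",
    "PDB": "https://example.org/pdb/{acc}",
}


def find_entities(text: str) -> list:
    """Return (database, accession) pairs for every explicit mention in the text."""
    return [(m.group(1), m.group(2)) for m in MENTION.finditer(text)]


def link_entities(text: str) -> str:
    """Append a link in angle brackets after each recognised mention."""
    def add_link(match: re.Match) -> str:
        database, accession = match.group(1), match.group(2)
        url = RESOLVERS[database].format(acc=accession)
        return f"{match.group(0)} <{url}>"

    return MENTION.sub(add_link, text)


if __name__ == "__main__":
    sample = "The sequence (GenBank U49845) encodes UniProt P12345; see PDB 1ABC."
    print(find_entities(sample))
    print(link_entities(sample))
```

Real entity recognition also has to handle mentions without a database name and ambiguous tokens, which is why manual tagging by authors remains the fallback.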
3. The future being made today: let’s execute the paper!
Experimental Design -> Workflow Repository
Observed Results -> Data Repository
Code/Statistics -> Software Repository
Research Process: Experimental Design; Observed Results; Code/Statistics
Research Report: Conclusions; Background; Hypotheses; Experimental Design; Observed Results; Code/Statistics
Maintain context: experimental, narrative, domain
3. Even better: why move anything anywhere?
Workflow Repository; Data Repository; Software Repository
Research Process: Experimental Design; Observed Results; Code/Statistics
Research Report: Conclusions; Background; Hypotheses; Experimental Design; Observed Results; Code/Statistics
3. Science in the cloud

Proposal: Store research plan, results, thoughts, observations, etc. locally/in the cloud in a system that adds metadata.
Advantages to the scientist: Always keep track of your own data! Maintain copyright and access privileges.

Proposal: Allow access to the data, workflow etc. to the data repository, who 1. validates quality (content and form), 2. assigns a UID, 3. advertises its existence.
Advantages to the scientist: Data is vetted, identified, and advertised. If scientist/funding body wants: data repository controls access rights; data repository maintains archive.

Proposal: Allow access to the collected thoughts (with links to data) to the publisher, who 1. validates quality (content and form), 2. assigns a UID, 3. advertises its existence.
Advantages to the scientist: Content is vetted, identified, and advertised. If scientist/funding body wants: publisher/library controls access rights; publisher/library maintains archive.

Proposal: Others - perhaps publishers, perhaps data repositories, perhaps (egad!) software developers - build tools to place thoughts and data into context.
Advantages to the scientist: Better software! Better links to everything else we do.
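The proposal can be read as a small protocol: the scientist keeps the item, and a data repository or publisher is granted access to validate it, assign a UID, and advertise it. The sketch below illustrates that reading under stated assumptions; the `ResearchItem` and `Curator` classes, the trivial validation rule, and the use of random UUIDs as UIDs are all hypothetical and not part of any described system.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResearchItem:
    """Something the scientist stores and keeps: data, a workflow, or thoughts."""

    owner: str
    kind: str                                   # e.g. "data", "workflow", "thoughts"
    content: str
    links: list = field(default_factory=list)   # UIDs of related items
    uid: Optional[str] = None                   # set once a curator accepts the item


class Curator:
    """Stands in for a data repository or a publisher (hypothetical interface)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.catalogue = {}  # uid -> item: the "advertised" record

    def validate(self, item: ResearchItem) -> bool:
        # Placeholder for checking content and form.
        return bool(item.owner) and bool(item.content.strip())

    def accept(self, item: ResearchItem) -> str:
        """Validate quality, assign a UID, advertise existence."""
        if not self.validate(item):
            raise ValueError("item failed validation")
        item.uid = f"{self.name}:{uuid.uuid4()}"
        self.catalogue[item.uid] = item  # the item itself stays with its owner
        return item.uid


if __name__ == "__main__":
    repository = Curator("data-repo")
    publisher = Curator("publisher")
    data = ResearchItem("a.scientist", "data", "raw measurements")
    data_uid = repository.accept(data)
    notes = ResearchItem("a.scientist", "thoughts", "interpretation", links=[data_uid])
    print(publisher.accept(notes), "links to", notes.links)
```

The catalogue holds a reference to the item, not a copy, which mirrors the point that nothing needs to move anywhere.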
Technology 1: Workflow tools
http://wings.isi.edu/
http://VisTrails.org
http://MyExperiment.org
Technology 2: Executable Papers
Technology 3: Application Platforms
In summary:
• Publishers are in general not interested in owning or charging for research data repositories (Brussels Declaration)
• Publishers are very interested in linking to and from data, and want to work with data repositories to do this effectively
• Publishers believe in Digital Object Identifiers
• Publishers embrace open standards and interoperability, and are adapting their infrastructure to be future-compliant:
– In particular, we think scientists should keep (track of) their work
– We also think novel information architectures work for science, including Linked Data, the concept of app servers, and the cloud
• Publishers believe in a future that stores and shares science in a better and more productive way, and in inventing it together: FoRCE11: The Future of Research Communications and eScience