Reengineering the scientific research paper

Stories that persuade with data. What’s inside scientific papers, and should it be reengineered? Anita de Waard, [email protected], Disruptive Technologies Director, Elsevier Labs

Talk for Harvard Digital Scholarship Summit

Page 2: Reengineering the scientific research paper

Scientific papers are stories that persuade with data.

| The Story of Goldilocks and the Three Bears | Story element | Story part | Paper element | Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins |
|---|---|---|---|---|
| Once upon a time | Time | Setting | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged. |
| a little girl named Goldilocks | Characters | Setting | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract, |
| She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Setting | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein |
| She knocked and, when no one answered, | Goal | Theme | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood. |
| she walked right in. | Attempt | Theme | Hypothesis | Atx-1 may play a role in the regulation of gene expression |
| At the table in the kitchen, there were three bowls of porridge. | Name | Episode 1 | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies |
| Goldilocks was hungry. | Subgoal | Episode 1 | Subgoal | test the function of the AXH domain |
| She tasted the porridge from the first bowl. | Attempt | Episode 1 | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1. |
| This porridge is too hot! she exclaimed. | Outcome | Episode 1 | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells |
| So, she tasted the porridge from the second bowl. | Activity | Episode 1 | Data | (data not shown), |
| This porridge is too cold, she said | Outcome | Episode 1 | Results | both genotypes show many large holes and loss of cell integrity at 28 days |
| So, she tasted the last bowl of porridge. | Activity | Episode 1 | Data | (Figures 1B-1D). |
| Ahhh, this porridge is just right, she said happily and | Outcome | Episode 1 | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles |
| she ate it all up. | Outcome | Episode 1 | Data | (Figure 1F), |

Page 5: Reengineering the scientific research paper

See work at http://www.w3.org/wiki/HCLSIG/SWANSIOC

Story analysis of scientific text: ORB vs. Medium-grained structure

Page 9: Reengineering the scientific research paper

Episode-level access through Linked Data standards:

<ce:section id=#123>

this says: mice like cheese

said @anita on April 5, 2011

but we all know she was deluded then

allows for layers of annotation

the xml is fixed, but the structure is open!
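The layering on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the actual Linked Data model used at Elsevier or in the W3C group: the `Statement` class, its field names, the `layers` helper and the `ce:section#123` notation are all hypothetical. Only the fragment id, the claim, the author handle and the date come from the slide.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Statement:
    """One annotation: `source` says `text` about `target` (hypothetical model)."""
    target: Union[str, "Statement"]  # a fragment id, or another Statement
    text: str
    source: str = "unknown"
    date: str = ""


def layers(stmt: Statement) -> List[str]:
    """Walk from the outermost annotation down to the fixed XML fragment."""
    chain: List[str] = []
    node: Union[str, Statement] = stmt
    while isinstance(node, Statement):
        chain.append(node.text)
        node = node.target
    chain.append(node)  # the fragment id at the bottom of the stack
    return chain


# Layer 1: a claim attached to a fragment of the (unchanged) XML.
claim = Statement(target="ce:section#123", text="mice like cheese",
                  source="@anita", date="2011-04-05")
# Layer 2: a later comment about the claim itself, not about the XML.
comment = Statement(target=claim, text="but we all know she was deluded then")

print(layers(comment))
```

The XML fragment is never edited here. Each new layer only points at the layer below it, which is one way to read "the xml is fixed, but the structure is open".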

Page 12: Reengineering the scientific research paper

Satellite Format: Linked Data repository for all Elsevier content

SWAN’s PAV (Provenance, Authoring and Versioning) ontology

Dublin Core and SKOS

Page 17: Reengineering the scientific research paper

Scientific papers are stories that persuade with data.

Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.

The same passage, split into segments:

- Both seminomas and the EC component of nonseminomas share features with ES cells.

- To exclude that

- the detection of miR-371-3 merely reflects its expression pattern in ES cells,

- we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).

- In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),

- suggesting that

- miR-371-3 expression is a selective event during tumorigenesis.

Segment labels: Fact, Hypothesis, Method, Result, Implication, Goal, Reg-Implication

Conceptual knowledge

Experimental Evidence

Page 21: Reengineering the scientific research paper

Realms of persuasive experimental discourse:

(1) Both seminomas and the EC component of nonseminomas share features with ES cells.

(2) a. To exclude that

(2) b. the detection of miR-371-3 merely reflects its expression pattern in ES cells,

(2) c. we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).

(3) a. In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),

(3) b. suggesting that

(3) c. miR-371-3 expression is a selective event during tumorigenesis.

Concepts, models, ‘facts’

Experiment

Transitions: (2) a. To exclude that; (3) b. suggesting that

‘State’: present tense

‘Narrative’: past tense
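The split into realms can be approximated with a small tense heuristic. The sketch below is a toy under stated assumptions, not the method used in the talk: the `realm` function, the cue-word list and the pairing of past tense with the experiment realm (and present tense with concepts) are my reading of the slide. Only the segments and the two transition phrases come from it.

```python
import re

# Transition phrases taken from the slide; the cue-word list is an assumption.
TRANSITIONS = {"to exclude that", "suggesting that"}
PAST_TENSE_CUES = re.compile(r"\b(was|were|tested)\b", re.IGNORECASE)


def realm(segment: str) -> str:
    """Assign a segment to 'transition', 'experiment' or 'concept'."""
    normalized = segment.strip().rstrip(",.").lower()
    if normalized in TRANSITIONS:
        return "transition"
    if PAST_TENSE_CUES.search(segment):
        return "experiment"  # 'narrative', past tense
    return "concept"         # 'state', present tense


SEGMENTS = [
    "Both seminomas and the EC component of nonseminomas share features with ES cells.",
    "To exclude that",
    "the detection of miR-371-3 merely reflects its expression pattern in ES cells,",
    "we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).",
    "In many of the miR-371-3 expressing seminomas and nonseminomas, "
    "miR-302a-d was undetectable (Figs S7 and S8),",
    "suggesting that",
    "miR-371-3 expression is a selective event during tumorigenesis.",
]

for seg in SEGMENTS:
    print(f"{realm(seg):<10} {seg}")
```

A three-word cue list only works for this one passage. A realistic version would need a part-of-speech tagger or a trained classifier.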

Page 28: Reengineering the scientific research paper

Fact creation through citations:

Voorhoeve et al., Cell, 2006:

Hypothesis: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...

Implication: Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Raver-Shapira et al., JMolCell 2007:

Cited Implication: ... two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).

Yabuta, JBioChem 2007:

Fact: miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)
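The drift from hypothesis to fact can be made visible by counting hedging cues in each statement of the citation chain. The sketch below assumes a hand-picked cue list (`HEDGES`) and a hypothetical helper `hedges_in`. It illustrates the idea only and is not the analysis behind the fact creation demo.

```python
# Hedging cues chosen by hand for this example; a real system would need a
# much richer lexicon or a trained model.
HEDGES = ("possibility", "point to", "potential", "suggests")

# (source, statement) pairs quoted from the slide, in citation order.
CHAIN = [
    ("Voorhoeve et al., 2006 (hypothesis)",
     "To investigate the possibility that miR-372 and miR-373 suppress "
     "the expression of LATS2, we..."),
    ("Voorhoeve et al., 2006 (implication)",
     "Therefore, these results point to LATS2 as a mediator of the miR-372 "
     "and miR-373 effects on cell proliferation and tumorigenicity,"),
    ("Raver-Shapira et al., 2007",
     "... two miRNAs, miRNA-372 and -373, function as potential novel "
     "oncogenes in testicular germ cell tumors by inhibition of LATS2 "
     "expression, which suggests that Lats2 is an important tumor suppressor"),
    ("Yabuta, 2007",
     "miR-372 and miR-373 target the Lats2 tumor suppressor"),
]


def hedges_in(text: str) -> list:
    """Return the hedging cues that occur in the statement."""
    lowered = text.lower()
    return [cue for cue in HEDGES if cue in lowered]


for source, statement in CHAIN:
    found = hedges_in(statement)
    status = "stated as fact" if not found else "hedged: " + ", ".join(found)
    print(f"{source}: {status}")
```

In this chain the last citing statement carries none of the listed cues, even though the original paper framed the same claim as a possibility.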

Page 29: Reengineering the scientific research paper

“[Y]ou can transform a fact into fiction or a fiction into fact just by adding or subtracting references [and data]”

– Bruno Latour, ‘Science in Action’, 1987

Page 31: Reengineering the scientific research paper

How is this rhetoric instantiated?

| Rhetorical goal | Utterance {Proposition} | S = H, B | V = C, E |
|---|---|---|---|
| Indicate lack of knowledge | {The role of untranslated exons in the CCR3 gene} has not been studied. | NN | 0 |
| Evaluate other work | Recently, CCR3 has been shown to {be upregulated on neutrophils by interferons in vitro [..]} | N, D | 3 |
| Offer hypotheses | it is thought that {these transcription factors affect transcription of the gene through interactions with the RNA transcription complex.} | NN, R | 2 |
| Interpret results | these data suggested that {5' untranslated exon 1 may have a regulatory function.} | A, D | 2 |
| Assess validity of interpretations | Since {this was not the case with other lines,} {we suspect {it is integration-site specific}} | A, D | 1 |
| State correspondence to expectations | While we expected {the transcript to be about 1 kb in size (Figure 4A),} {two bands ~4 and 5 kb were apparent.} | A, D | 2, S+ |
| Comparison to other work | It is important that {this data be viewed with {what is known about other myeloid-specific promoters,}} | A, R/NN/D | 2, F+ |

Page 38: Reengineering the scientific research paper

Eventually: trace roots of a claim: how many independent data points is it based on?

PHC undergo Growth arrest

Paper A: implication, results, method, goal, fact, fact; data 1, data 2, data 3

Paper B: implication, results, method, goal, fact, fact; data 4, data 5, data 6

Link between Paper A and Paper B: underpinning

Page 40: Reengineering the scientific research paper

Eventually: trace roots of a claim: how many independent data points is it based on?

PHC undergo Growth arrest

Paper A: implication, results, method, goal, fact, fact; data 1, data 2, data 3

Paper B: implication, results, method, goal, fact, fact; data 4, data 5, data 6

Link between Paper A and Paper B: method link
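Counting the independent data points behind a claim is a graph traversal. The sketch below uses a hypothetical support graph loosely modelled on the Paper A / Paper B diagram. The node names, the `SUPPORT` mapping and the assumed link (a fact in Paper B resting on Paper A's implication) are illustrative assumptions, not data from the talk.

```python
from typing import Dict, List, Set

# Hypothetical support graph: each node lists the nodes it rests on.
SUPPORT: Dict[str, List[str]] = {
    "A:implication": ["A:results"],
    "A:results": ["data 1", "data 2", "data 3"],
    "B:implication": ["B:results", "B:fact"],
    "B:results": ["data 4", "data 5", "data 6"],
    "B:fact": ["A:implication"],  # assumed link: a fact in B cites A's claim
}


def data_roots(claim: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return the leaf nodes a claim ultimately rests on (data items here)."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [claim]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against citation cycles
        seen.add(node)
        supports = graph.get(node, [])
        if not supports:
            roots.add(node)  # nothing underneath: treat as a root data point
        stack.extend(supports)
    return roots


print(sorted(data_roots("B:implication", SUPPORT)))
```

With this assumed graph, the claim in Paper B rests on six data items, three of which it inherits from Paper A through the cited fact.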

Page 41: Reengineering the scientific research paper

Scientific papers are stories that persuade with data.

Page 44: Reengineering the scientific research paper

Sometimes the link to data is good:

Page 45: Reengineering the scientific research paper

And sometimes it’s not so good:

Page 55: Reengineering the scientific research paper

Data-driven papers? Work done with Ed Hovy, Phil Bourne, Gully Burns and Cartic Ramakrishnan

1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.

2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.

3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.

Example text: Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-

4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, and the paper gets updated until it is validated.

5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.

6. User applications: distributed applications run on this ‘exposed data’ universe.

Figure labels: metadata; Review, Edit, Revise; Some other publisher
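The idea that a published item stays connected to its data and that its heritage can be traced can be sketched with a small provenance chain. Everything here is hypothetical: the `DataItem` class, its fields, the item ids and the `heritage` helper are illustrative names, not part of any workflow or authoring system mentioned in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataItem:
    """A lab data item with metadata and links to the items it derives from."""
    item_id: str
    metadata: Dict[str, str]
    derived_from: List["DataItem"] = field(default_factory=list)


def heritage(item: DataItem) -> List[str]:
    """List item ids from this item back to its origins, depth-first."""
    lineage = [item.item_id]
    for parent in item.derived_from:
        lineage.extend(heritage(parent))
    return lineage


# Steps 1-2: raw data enters the lab-owned workflow system with metadata.
raw = DataItem("raw-measurements", {"created_by": "lab workflow system"})
# Step 3: the authoring tool pulls a figure derived from that data.
fig2 = DataItem("fig-2", {"representation": "figure"}, derived_from=[raw])
# Steps 4-5: the validated paper keeps its link to the figure.
paper = DataItem("published-paper", {"status": "validated"}, derived_from=[fig2])

print(heritage(paper))
```

Because each item only stores references to its parents, a reader or an application (step 6) can follow the chain from the published paper back to the original measurements.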

Page 56: Reengineering the scientific research paper

One step: encouraging submission of structured workflows

Page 58: Reengineering the scientific research paper

Another step: ScienceDirect app store

- Eclipse SDK platform accessing all ScienceDirect/Scopus content

- Build applications on top of content

- Offer to users in marketplace

Page 59: Reengineering the scientific research paper

A third step: Executable Paper Challenge

Goal: invite computer science community to help develop formats that:

- add executable files and reproducible data to computer science papers;

- handle storage and validation of very large files;

- help validation of data and code, and decrease the reviewer’s workload.

Page 66: Reengineering the scientific research paper

In Summary:

1. Stories:

- ORB, Satellite: link to any part of content - bring it on!

2. Persuasion:

- Logical structure for biological propositions; trace a claim through successive citations

3. Data:

- Better data linking, better structuring of methods.

In conclusion: is the research paper going away?

I don’t think so! But it will be:

- Structured better: authors will need to justify claims directly

- Connected better: more traceable, better links to data and workflow components, and to other work

Page 67: Reengineering the scientific research paper

Thank you!

- W3C group on Discourse Structure: http://www.w3.org/wiki/HCLSIG/SWANSIOC

- SciVerse: http://developer.sciverse.com

- Pangea project: http://bit.ly/98haOw

- Parsing rhetoric: http://elsatglabs.com/labs/anita/

- Fact creation demo: http://elsatglabs.com/labs/anita/demos/LATSDemo102007/

- Methods Navigator: http://www.methodsnavigator.com

- SciVerse APIs: http://developer.sciverse.com

- Executable Paper Challenge: http://www.executablepapers.com

Or mail me at: Anita de Waard, [email protected]