Data enters Scholarly Communication; how publishers can help make things better
Integration of Research Data and Publications
Project ODE – workpackage 4
Eefke Smit
International Association of STM Publishers
Director, Standards and Technology
LONDON, ANNUAL APA CONFERENCE, 9 November 2011
A famous paper in Nature: DNA structure - 1953
• 1 page
• 2 authors
• 1 figure
• no data
Source: V. Kiermer, Nature Publishing Group, 2011
Nature in 2001: The human genome issue
• 62 pages, 49 figures, 27 tables
Source: V. Kiermer, Nature Publishing Group, 2011
The human genome at 10 – 2010
Nature now in an iPad edition:
Source: V. Kiermer, Nature Publishing Group, 2011
A thousand genomes – 2010
http://www.nature.com/nature/journal/v467/n7319/full/nature09534.html
Raw data: 12,145 SRA run ids submitted to Short Read Archive
Source: V. Kiermer, Nature Publishing Group, 2011
author information
live updates
Collapsible sections
Tool box to print, download reference, share: email, social media, bookmark
Figure previewer
Related content
new publishing models
doi
article-level metrics
Source: V. Kiermer, Nature Publishing Group, 2011
From the Biochemical Journal, Portland Press:
Ever wanted to inspect data referenced in articles? Utopia Documents allows you to interact directly with curated database entries. Play with molecular structures; edit sequence and alignment data; even plot curated tabular data yourself.
http://www.biochemj.org/bj/semantic_faq.htm
How big is the Data Problem?
Depositions of datasets in archives continue to grow, surpassing journal articles in biomedical research
Growth of biomedical research publications (red; current total >19 million), alongside the accumulation of research data, including nucleic acid sequences (black; current total ~163 million), computer-annotated protein sequences (magenta; current total 9 million), manually annotated protein sequences (green; current total 500,000) and protein structures (blue; current total 60,000)
Source: Biochemical Journal 2009 424, 317-333 - Teresa K. Attwood, Douglas B. Kell and others.
The graph depicts the average size of a Journal of Neuroscience article and its supplemental material in megabytes.
As a consequence, the Journal no longer accepts supplementary files with manuscripts: the supplementary material would soon have outgrown the article volume, and the burden on the peer review process simply became too large.
Editors suspect researchers of treating supplements as data dumping grounds (Emilie Marcus, Cell).
Publishers cannot guarantee proper preservation and future accessibility of supplementary files.
Maunsell J J. Neurosci. 2010;30:10599-10600
©2010 by Society for Neuroscience
How big is the Data Problem?
Too big for the Journal of Neuroscience and Cell:
Estimated amount of data stored per research project

Amount        Current   In 2 years   In 5 years
0MB           1%        1%           2%
1-100MB       17%       8%           5%
100MB-1GB     25%       19%          13%
1GB-1TB       40%       41%          36%
1TB-1PB       6%        13%          20%
1PB-10PB      1%        3%           5%
>10PB         0%        0%           2%
Don't Know    11%       14%          17%
Researchers foresee higher volumes of data per research project:
Source: PARSE.Insight survey 2008
The Data Publication Pyramid
(1) Data contained and explained within the article
(2) Further data explanations in any kind of supplementary files to articles
(3) Data referenced from the article and held in data centers and repositories
(4) Data publications, describing available datasets
(5) Data in drawers and on disks at the institute
The Pyramid’s likely short-term reality:
(1) Top of the pyramid is stable but small
(2) Risk that supplements to articles turn into data dumping places
(3) Too many disciplines lack a community-endorsed data archive
(4) Estimates are that at least 75% of research data is never made openly available
The Ideal Pyramid
(1) More integration of text and data, viewers and seamless links to interactive datasets
(2) Only if data cannot be integrated in article, and only relevant extra explanations
(3) Seamless links (bi-directional) between publications and data, interactive viewers within the articles
(4) More Data Journals that describe datasets, data management plans and data methods
How can publishers help to make things better
• Stricter editorial policies on the availability of underlying data
• Recommend reliable and trustworthy Data Archives to authors
• Enhance articles for better integration of underlying data
• Endorse guidelines for proper citation of data
• Launch and sponsor Data Journals
• Ensure persistent identifiers and bi-directional linking
• Partner with reliable Data Archives for further integration of Data and Publications, including interactivity for re-use.
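The "persistent identifiers and bi-directional linking" point above can be illustrated with a minimal sketch. It is not a description of any publisher's or archive's actual system: the `LinkRegistry` class, its method names and the identifier strings are all hypothetical, and a real service would store the links persistently and use registered identifiers such as DOIs.

```python
from collections import defaultdict


class LinkRegistry:
    """Toy in-memory registry of article <-> dataset links (hypothetical)."""

    def __init__(self):
        # One index per direction, so lookups work from either side.
        self._datasets_by_article = defaultdict(set)
        self._articles_by_dataset = defaultdict(set)

    def link(self, article_id, dataset_id):
        # Record the link in both directions in a single step, so the
        # article-side and dataset-side views cannot drift apart.
        self._datasets_by_article[article_id].add(dataset_id)
        self._articles_by_dataset[dataset_id].add(article_id)

    def datasets_for(self, article_id):
        # Datasets underlying a given article (sorted for stable output).
        return sorted(self._datasets_by_article.get(article_id, ()))

    def articles_for(self, dataset_id):
        # Literature that cites or uses a given dataset.
        return sorted(self._articles_by_dataset.get(dataset_id, ()))


# Made-up identifiers, for illustration only.
registry = LinkRegistry()
registry.link("article-1", "dataset-A")
registry.link("article-1", "dataset-B")
registry.link("article-2", "dataset-A")

print(registry.datasets_for("article-1"))
print(registry.articles_for("dataset-A"))
```

Writing both indexes in one call is the design point: a reader of the article can reach the data, and someone reusing the data can reach all the literature that refers to it.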
What the Future Article might look like
• Articles will be less linear and more modular, offering layered presentation
of different levels of detail, providing multiple entries to deeper depths for
specialists, including to underlying data.
• Data, multimedia and other original material will become separately citable
items and even publishable items in their own right.
• Underlying data will become part of articles, via interactive PDFs, via gene
and protein viewers, via semantic links.
• Articles will be interactive; graphs and illustrations will offer click-throughs to
deeper information, as will semantically tagged terms.
• Data Archives will ensure links from data to publications, to ensure that all
available literature is at hand for those interested in reusing the data.
Questions?
Eefke Smit
International Association of STM Publishers
Director, Standards and Technology
[email protected]