Scott Edmunds, HKU Open Access Week: Experiences from the front-line of Open Access & Open Data...

Transcript of Scott Edmunds, HKU Open Access Week: Experiences from the front-line of Open Access & Open Data...

Page 1: Scott Edmunds, HKU Open Access Week: Experiences from the front-line of Open Access & Open Data publishing.

0000-0001-6444-1436

@SCEdmunds

[email protected]

Experiences from the front-line of Open Access & Open Data publishing.

Page 2:

www.gigasciencejournal.com

Journal, data-platform and database for large-scale data

Editor-in-Chief: Laurie Goodman

Executive Editor: Scott Edmunds

Commissioning Editor: Nicole Nogoy

Lead Curator: Chris Hunter

Data Platform: Peter Li

in conjunction with

Page 3:

What do publishers do?

Page 4:

What do publishers do?

Apologies: http://scholarlykitchen.sspnet.org/2014/10/21/updated-80-things-publishers-do-2014-edition/

the scholarly chicken

(tl;dr version)

Page 5:

1. http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.1001747

Are publishers really adding value?

Page 6:

Need to move beyond 350 year old incentive systems

Buckheit & Donoho: Scholarly articles are merely advertisement of scholarship. The actual scholarly artifacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.

Page 7:

Consequences: increasing number of retractions

>15X increase in last decade

1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html

2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?

Page 8:

Consequences: increasing number of retractions

>15X increase in last decade

At the current rate of increase, by 2045 as many papers will be retracted as are published

1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html

2. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950

Page 9:

STAP paper demonstrates problems:

Nature Editorial, 2nd July 2014:

“We have concluded that we and the referees could not have detected the problems that fatally undermined the papers. The referees’ rigorous reports quite rightly took on trust what was presented in the papers.”

http://www.nature.com/news/stap-retracted-1.15488

Page 10:

STAP paper demonstrates problems:

Need:

…to publish protocols BEFORE analysis

…better access to supporting data

…more transparent & accountable review

…to publish replication studies

Page 11:

JIFBAIT Network

more

GWASGWAS

JIFBAIT NEWS

Arsenic Life forms, will they take over the planet?

By Melba Ketchum, PhD

Which Overhyped, Unreproducible Experiment Are You?

Want rapid citations for 2 years only? Carry out this quiz.

You got: STAP Cells

Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it’ll still get 100’s of citations in two years.

Page 12:

Reward the commons instead?

Open-Data

Open-Source

Open-Review

Open-Access

Page 13:

HK: good with some parts of open…

http://hub.hku.hk/

Page 14:

Closed v Open Access [the HKU edition]

Ye Old Journal

Closed Access, Subject Specific

Open Access, public engaging

Page 15:

Closed v Open Access [the HKU edition]

Closed Access, Subject Specific

Open Access, public engaging

Page 16:

What is impact?

Papers very highly:

• Accessed (some >84,000)

• Cited (some >500)

• Altmetric scored (some >100)

• Influential, educational, reproducible & reused

• Covered in Int. media (Wired, LA Times, NYT, NBC…)

But no impact factor

Page 17:

What is the cost of the Journal Impact Factor?

Page 18:

1. http://dx.doi.org/10.1087/20110203

2. http://blog.thegrandlocus.com/2014/10/a-flurry-of-copycats-on-pubmed

3. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/

What is the cost of the Journal Impact Factor?

JIF 2 = $10,000 USD

JIF 5 = $20,000 USD

Buy Sell

C/N/S = $30,000 USD

JIF 10 = $1,500 USD

Page 19:

1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentives-curb-research

This could never happen in Hong Kong, right?

“While we are rightly proud of Hong Kong’s highly regarded and ranked universities system, we are not immune to the same pressures. While funders in Europe have moved away from using citation based metrics such as JIF in their research assessments, the Hong Kong University Grants Committee states in their Research Assessment Exercise guidelines that they may informally use it.”

Page 20:

1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentives-curb-research

This is happening in Hong Kong!

JIF 2 = $8,000 USD

JIF 5 = $15,000 USD

Buy

Page 21:

Specific things we should be rewarding:

Page 22:

• Review

• Data

• Software

• Models

• Pipelines

• Re-use…

= Credit

Credit where credit is overdue:

“One option would be to provide researchers who release data to public repositories with a means of accreditation.”

“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”

Nature Biotechnology 27, 579 (2009)

New incentives/credit

Page 23:

Not just carrots…

“The data discovery index (DDI) enabled through bioCADDIE is to do for data what PubMed (and PubMed Central) did for the literature.”

Page 24:

GigaSolution: deconstructing the paper

www.gigadb.org

www.gigasciencejournal.com

Utilizes big-data infrastructure and expertise from:

Combines and integrates (with DOIs):

Open-access journal

Data Publishing Platform

Data Analysis Platform

Open Review Platform

Page 25:

Open peer review

1. Transparency

Page 26:

The only drawback?

End reviewer 3 Downfall parody videos, now!

1. Transparency

Open peer review

Page 27:

Reward open & transparent review

Data from similar scope open/closed review journals in BMC Series shows ~5-10% harder to get referees for open review. (data from Tim Sands at BMC)

• Good data showing no difference in acceptance/rejection rates, but better quality reviews.

• Does take marginally longer to find reviewers (and for them to return reports).

BMC Series Medical Journals

Page 28:

Publons + AcademicKarma = credit for reviewers' efforts

http://publons.com/

1. Transparency/open peer review

http://academickarma.org/

NOW WITH DOIs

Page 29:

arXiv + blogged reviews = real-time open-review

1. Transparency

Page 30:

1. Transparency

Reward pre-prints

Page 31:

http://tmblr.co/ZzXdssfOMJfy

arXiv + blogged reviews = real-time open-review

1. Transparency

Page 32:

2. Reward Open Data

Page 33:

Data Publishing: nothing new…

Data & Metadata Collection/Experiments

Analysis/Hypothesis/Analysis

Conclusions

+ Area of Interest/Question

1839

1859

20 Yrs.

Page 34:

Data Publishing: Can be Life or Death

Climate change, global hunger, pollution, cancer, disease outbreaks…

http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966

Page 35:

To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001

Our first DOI:

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.

Page 39:

Downstream consequences:

“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”

1. Citations (~300)

2. Therapeutics (primers, antimicrobials)

3. Platform Comparisons

4. Example for faster & more open science

Page 40:

1.3 The power of intelligently open data

The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.

Page 41:

IRRI GALAXY

Beneficiaries/users of our work

Page 42:

IRRI GALAXY

Rice 3K project: 3,000 rice genomes, 13.4TB public data

Feed The World With (Big) Data

Page 43:

OMERO: providing access to imaging data

Already used by JCB.

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry picked examples.

Download and reprocess.

Need for better handling of imaging data

Page 44:

The alternative...

...look but don't touch

Need for better handling of imaging data

Page 45:

Executable

Page 46:

[Figure: research-cycle diagram headed "Rewarding the", with labels Idea, Study, Metadata, Data, Methods, software, Analysis (Pipelines), Workflows/Environments, Answer, and Publication (DOI, etc.)]

Page 47:

Software

https://github.com/gigascience

Transparent

Open & able to build upon

Taking citeable snapshots

@jeejkang

Page 48:

gigagalaxy.net

Workflows

Reward Sharing of Workflows

Page 49:

Visualisations & DOIs for workflows

http://www.gigasciencejournal.com/series/Galaxy

Page 50:

Facilitate reproducibility, reuse & sharing & publish outputs of: Knitr, Sweave, Jupyter/iPython Notebook, etc.

Open Documents

Reward Open/Dynamic Workbooks

Page 51:

E.g.

http://www.gigasciencejournal.com/content/3/1/3

Page 52:

E.g.

http://www.gigasciencejournal.com/content/3/1/3

Page 53:

E.g.

http://www.gigasciencejournal.com/content/3/1/3

Reviewer (Christophe Pouzat): “It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”

Page 54:

http://www.gigasciencejournal.com/content/3/1/23

http://www.gigasciencejournal.com/content/4/1/19

Virtual Machines

• Downloadable as virtual harddisk/available as Amazon Machine Image

• Now publishing container (docker) submissions

Page 55:

Taking a microscope to the publication process

Page 56:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612

Page 57:

Lessons Learned

• Is possible to push button(s) & recreate a result from a paper

• Most published research findings are false. Or at least have errors

• Reproducibility is COSTLY. How much are you willing to spend?

• Much easier to do this before rather than after publication

Page 58:

The cost of staying with the status quo?

• Ioannidis estimates that 85% of research resources are wasted.

• ~US$28B/year unnecessarily spent on preclinical research in US.

• Each retraction estimated to cost $400,000.

http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001747

http://elifesciences.org/content/3/e02956

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165

Page 59:

The cost to Hong Kong (and your career) of staying with the status quo?

HK UGC grant budget = $17.5 Billion HKD/yr (4% of Gov spending)

Taking lowest reported reproducibility rates (11%) = >$15 billion wasted [1]

• Estimated loss of citation impact from not being OA = 50% ($8.75B?) [2]

• Hong Kong ranked 54th in Global Open Data Index

• How much are YOU losing through missing out on potential collaborations, wider engagement & unrepeatable work?

1. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

2. http://www.ecs.soton.ac.uk/~harnad/Temp/research-australia.doc
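The two headline figures on this slide are simple back-of-envelope products, and can be checked with a short sketch. It assumes (this reading is not stated explicitly on the slide) that "wasted" means the grant budget multiplied by the share of work that does not reproduce, and that the open-access figure is the budget multiplied by the quoted 50%. All variable and function names are illustrative.

```python
# Back-of-envelope check of the slide's figures.
# Inputs are the slide's own numbers; the formulas are an assumed reading of them.
GRANT_BUDGET_HKD = 17.5e9      # annual grant budget quoted on the slide
REPRODUCIBILITY_RATE = 0.11    # lowest reported reproducibility rate quoted on the slide
OA_CITATION_LOSS = 0.50        # citation impact lost by not being OA, as quoted


def wasted_on_irreproducible(budget: float, reproducible_fraction: float) -> float:
    """Spend attributed to work that would not reproduce (assumed formula)."""
    return budget * (1.0 - reproducible_fraction)


def lost_to_closed_access(budget: float, loss_fraction: float) -> float:
    """Spend whose citation impact is forgone by not publishing OA (assumed formula)."""
    return budget * loss_fraction


wasted = wasted_on_irreproducible(GRANT_BUDGET_HKD, REPRODUCIBILITY_RATE)
lost = lost_to_closed_access(GRANT_BUDGET_HKD, OA_CITATION_LOSS)

print(f"Irreproducible work: HK${wasted / 1e9:.3f} billion per year")
print(f"Closed-access loss:  HK${lost / 1e9:.3f} billion per year")
```

Under these assumptions the first product comes out above $15 billion and the second at $8.75 billion, matching the slide. Both are rough upper-bound illustrations rather than measured losses.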

Page 60:

Death to the Publication. Long live the Research Object!

Manifesto for a reproducible publisher:

The era of the 1665-style publication is over

Open is the new black

Credit FAIR data, not JIF-bait narrative

Reward replication not advertising

We need a recognizable mark/badge/scores for replication

?

Page 61:

Thanks to:

Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST)

Peter Li, Chris Hunter, Jesse Si Zhe, Rob Davidson, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC)

Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford)

@gigascience

facebook.com/GigaScience

blogs.biomedcentral.com/gigablog/

www.gigadb.org

gigagalaxy.net

www.gigasciencejournal.com

CBIIT

Funding from:

Our collaborators:

team:

(Case study)


Page 62:

Come to our next Open Science meetup:

Where: MakerBay, Yau Tong, Kowloon

When: Monday, October 26th, 7:30pm

https://opendatahk.com/