Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Scott Edmunds' talk from the A*STAR open access workshop, on GigaScience and how licensing can change the way we do research. 18th April 2013

Transcript of Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Page 1: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

...how licensing can change the way we do research

Scott Edmunds

A*STAR, 18th April 2013

Open-Data

Open-Source

Open-Review

Open-Access

Page 2: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

www.gigasciencejournal.com

Journal, data-platform and database for large-scale data

Editor-in-Chief: Laurie Goodman

Executive Editor: Scott Edmunds

Commissioning Editor: Nicole Nogoy

Lead Curator: Chris Hunter

Data Platform: Peter Li

in conjunction with

Page 3: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Open-Review

Open-Access

Open-Data

Open-Source

Page 4: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Why? How?

What can be achieved?

Page 5: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Take-home message:

It's all about the re-use

To do this, everything needs to be free and accessible to be read by humans & machines*

* See: http://www.biomedcentral.com/about/datamining

Page 6: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Why?

Page 7: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Big-Data Bonanza

Data is the new oil?

“Information is the currency of the future world”

William Gibson

Page 8: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Era of Data-Driven Science

Enabled by: Removing silos, standards/formats, open-access/data

Enables:

Using networking power of the internet to tackle problems

Can ask new questions & find patterns & connections hidden in others' data

Build on each other's efforts quicker & more efficiently

More collaborations across more disciplines

Harness wisdom of the crowds: crowdsourcing, citizen science, crowdfunding

Page 9: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Good for a field: Genomics/Bioinformatics

Long term sharing infrastructure:

Strong use of standards/policies:

Plummeting cost/explosion in volumes:

Page 10: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Sharing aids specific communities…

[Figure: Rice v Wheat: consequences of publicly available genome data. Papers per year (0 to 700), 1996 to 2008, for rice and wheat.]

Page 11: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Sharing aids individuals…

Sharing Detailed Research Data Is Associated with Increased Citation Rate.

Piwowar HA, Day RS, Fridsma DB (2007) PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

Every 10 datasets collected contribute to at least 4 papers in the following 3 years.

Piwowar HA, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473(7347): 285. doi:10.1038/473285a

Page 12: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Growing Issue: unrepeatability of scientific results

Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 149-155.

Out of 18 microarray papers, results from 10 could not be reproduced

Page 13: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Growing Issue: increasing number of retractions

>15X increase in last decade

Strong correlation of “retraction index” with higher impact factor

1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html

2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?

Page 14: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

How?

Page 15: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

GigaSolution: deconstructing the paper

Provide infrastructure and mechanisms of reward for:

• Data availability

• Metadata/curation

• Interoperability

• Availability of workflows

• Transparent analyses

Data

Metadata

Methods

Analyses

Page 16: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

GigaSolution: deconstructing the paper

www.gigadb.org

www.gigasciencejournal.com

Utilizes big-data infrastructure and expertise from:

World's largest genomics organisation with: 17PB storage, 20.5K cores, 212TFlops, >1000 bioinformaticians

Combines and integrates:

Open-access journal

Data Publishing Platform

Data Analysis Platform

Page 17: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Open-Access

Why/what/how?

Where does licensing fit?

Page 18: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Importance of licensing: ability to mine & reuse content

Budapest Open Access Initiative:

“By ‘open access’ to [peer-reviewed research literature], we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

Needs to be:

SA, NC, ND impose unnecessary restrictions and are not counted as “true OA”

CC0 better than CC-BY for datasets to prevent “attribution stacking”

Page 19: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Importance of licensing: ability to mine & reuse content

Prevents translations; creates incompatibility issues when mixing with other licenses; some combinations are illegal (e.g. CC-NC-SA & CC-BY-SA); hinders non-profits and mixed collaborations; practically unenforceable; dealing with requests is more trouble than it's worth.

Further reading:

http://www.nature.com/nature/journal/v495/n7442/full/495440a.html

http://blogs.ch.cam.ac.uk/pmr/2011/11/29/scientists-should-never-use-cc-nc-this-explains-why/

Use of non CC-BY by publishers = “double dipping” (selling content, reprints, etc.)

• Gives authors control over the integrity of their work and the right to be properly acknowledged and cited.

• Does not grant publicity rights, and attribution can be used to clearly disclaim endorsement

• Restrictions rarely benefit author, but do inhibit reuse
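
The licensing rules of thumb on these two slides can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, licence codes are assumed to be hyphen-separated strings such as "CC-BY-NC", and the logic encodes the slides' claims (SA, NC and ND are not "true OA"; every CC-BY source in a combined dataset adds an attribution to carry forward), not legal advice.

```python
# Illustrative sketch only: names are hypothetical and the rules are the
# slides' rules of thumb, not a legal analysis of Creative Commons licences.
RESTRICTIVE_CLAUSES = {"SA", "NC", "ND"}


def clauses(licence):
    """Split a code such as 'CC-BY-NC-SA' into its set of clauses."""
    parts = licence.upper().split("-")
    return {p for p in parts if p not in ("", "CC")}


def is_true_oa(licence):
    """True when the licence carries none of the SA, NC or ND clauses."""
    return not (clauses(licence) & RESTRICTIVE_CLAUSES)


def required_attributions(licences):
    """Count the sources in a combined dataset that must be credited (BY)."""
    return sum(1 for lic in licences if "BY" in clauses(lic))


if __name__ == "__main__":
    for lic in ["CC0", "CC-BY", "CC-BY-NC", "CC-BY-SA", "CC-BY-ND"]:
        print(lic, "true OA" if is_true_oa(lic) else "restricted")
    # Mixing three CC-BY sources with two CC0 sources
    combined = ["CC-BY"] * 3 + ["CC0"] * 2
    print("attributions to carry forward:", required_attributions(combined))
```

The count returned by `required_attributions` grows with every CC-BY source that is merged in, which is the "attribution stacking" problem that CC0 avoids for datasets.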

Page 20: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Open-Data

Data Publishing

Why/what/how?

Page 21: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

?

New incentives/credit

Credit where credit is overdue:

“One option would be to provide researchers who release data to public repositories with a means of accreditation.”

“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”

Nature Biotechnology 27, 579 (2009)

Prepublication data sharing (Toronto International Data Release Workshop):

“Data producers benefit from creating a citable reference, as it can later be used to reflect impact of the data sets.”

Nature 461, 168-170 (2009)

Page 22: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

?

New incentives/credit

“increase acceptance of research data as legitimate, citable contributions to the scholarly record”.

“data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”.

= Data Citation?

Page 23: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Anatomy of a Publication

Data

Idea

Study

Analysis

Answer

Metadata

Page 24: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Anatomy of a Data Publication

Data

Idea

Study

Analysis

Answer

Metadata

Page 25: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

• Data availability

• Content re-use

• …

= Credit

Page 26: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data”… (see more)

Page 27: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

BGI Datasets Get DOI®s

Plants: Chinese cabbage; Cucumber; Foxtail millet; Pigeonpea; Potato; Sorghum; Wheat A+B

Microbe/metagenomics: E. coli O104:H4 TY-2482; T2D gut metagenome; Bulk pooled insects

Cell-Lines: Chinese Hamster Ovary; Mouse methylomes

H: Asian individual (YH) (DNA Methylome, Genome Assembly v1+2, Transcriptome); Cancer (14TB); Single cell bladder cancer; HBV infected exomes; Ancient DNA (Saqqaq Eskimo, Aboriginal Australian)

Vertebrates: Darwin’s Finch; Giant panda; Macaque (Chinese rhesus, Crab-eating); Mini-Pig; Naked mole rat; Parrot, Puerto Rican; Penguin (Emperor penguin, Adelie penguin); Pigeon, domestic; Polar bear; Sheep; Tibetan antelope

Invertebrate: Ant (Florida carpenter ant, Jerdon’s jumping ant, Leaf-cutter ant); Roundworm; Schistosoma; Silkworm; Parasitic nematode; Pacific oyster

Released pre-publication

Paper Published in GigaScience

Page 28: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Open-Source

The new way of doing science?

Why/what/how?

Page 29: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Open-Source: the source of it all

• Transparent, fast, collaborative

• Long history, large community

• Many licenses

• Many repositories

• Many users/platforms

Software community understands benefits

Page 30: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Open-Review

Why/what/how?

Page 31: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

New & more transparent peer-review:

Pre-publication: pre-prints

Page 32: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

New & more transparent peer-review:

During-publication: open-review

BMC Series Medical Journals

Page 33: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

New & more transparent peer-review:

Post-publication review

Open content lets you do interesting things post-publication:

New pub models:

Altmetrics:

Comments, blogs, online journal clubs

Page 34: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Examples

Page 35: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

The People's Parrot: Amazona vittata

Puerto Rican Parrot Genome Project

Rarest parrot, national bird of Puerto Rico

Community funded from artworks, fashion shows, crowdfunding…

Genome annotated by students in community college as part of bioinformatics education

Paper and Data published in GigaScience and GigaDB

Taras K Oleksyk et al. (2012): A Locally Funded Puerto Rican Parrot (Amazona vittata) Genome Sequencing Project Increases Avian Data and Advances Young Researcher Education. GigaScience 2012, 1:14

Steven J. O’Brien (2012): Genome empowerment for the Puerto Rican parrot – Amazona vittata. GigaScience 2012, 1:13

Oleksyk et al. (2012): Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience. http://dx.doi.org/10.5524/100039

Page 37: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

How are we supporting data reproducibility?

Data sets: linked to DOI

Analyses: linked to DOI

Open-Paper

Open-Review

DOI:10.1186/2047-217X-1-18

~8000 accesses

Open-Code

8 reviewers tested data in ftp server & named reports published

DOI:10.5524/100044

Open-Pipelines

Open-Workflows

DOI:10.5524/100038

Open-Data

78GB CC0 data

Code in sourceforge under GPLv3: http://soapdenovo2.sourceforge.net/

~4000 downloads

Enabled code to be picked apart by bloggers in wiki: http://homolog.us/wiki/index.php?title=SOAPdenovo2

Page 39: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

SOAPdenovo2 workflows implemented in

galaxy.cbiit.cuhk.edu.hk

Implemented entire workflow in our Galaxy server, inc.:

• 3 pre-processing steps

• 4 SOAPdenovo modules

• 1 post-processing step

• Evaluation and visualization tools

Also available to download by >25K Galaxy users in

Page 40: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

New & more transparent peer-review:

The GigaScience way:

8 referees downloaded & tested data, then signed reports

Page 41: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

New & more transparent peer-review:

The GigaScience way:

Post publication: bloggers pull apart code/reviews in blogs + wiki:

SOAPdenovo2 wiki: http://homolog.us/wiki1/index.php?title=SOAPdenovo2

Homologus blogs: http://www.homolog.us/blogs/category/soapdenovo/

Page 42: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

New & more transparent peer-review:

The GigaScience way:

Real-time open-review = paper in arXiv + blogged reviews

Page 43: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Our first DOI:

To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
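
A dataset citation like the one on this slide can be assembled from a handful of metadata fields. The Python sketch below is a hypothetical helper, not GigaDB's own tooling: the function and parameter names are assumptions, and the output layout simply mirrors the citation shown above, including the http://dx.doi.org/ resolver prefix used there.

```python
# Hypothetical helper, not part of GigaDB: it only mirrors the layout of
# the dataset citation shown on the slide.
DOI_RESOLVER = "http://dx.doi.org/"  # resolver prefix as used on the slide


def doi_url(doi):
    """Turn a bare DOI such as '10.5524/100001' into a resolvable URL."""
    return DOI_RESOLVER + doi


def format_dataset_citation(authors, year, title, publisher, doi):
    """Build 'Authors (year) Title. Publisher. doi:DOI URL'."""
    author_text = "; ".join(authors)
    return (
        f"{author_text} ({year}) {title}. {publisher}. "
        f"doi:{doi} {doi_url(doi)}"
    )


if __name__ == "__main__":
    # Author list shortened here for brevity; the full list is on the slide.
    print(
        format_dataset_citation(
            authors=["Li, D", "Xi, F", "Zhao, M"],
            year=2011,
            title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
            publisher="BGI Shenzhen",
            doi="10.5524/100001",
        )
    )
```

Keeping the DOI as a separate field means the same record can feed both the human-readable citation and a machine-readable link.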

Page 46: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Downstream consequences:

“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew it might take days for the lawyers at his company, Pacific Biosciences, to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”

1. Citations (~140)

2. Therapeutics (primers, antimicrobials)

3. Platform Comparisons

4. Example for faster & more open science

Page 49: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

1.3 The power of intelligently open data

The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.

Page 50: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Changing the way we publish:

Page 51: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

“Deconstructed” Journal

“Regular” Journal

“Conscientious” Online Journal

Page 54: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Image Source: http://commons.wikimedia.org/wiki/File:System-Mechanic-California.jpg

“Deconstructed” Journal

“Regular” Journal

“Conscientious” Online Journal

Page 55: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Ultimate Goal: Executable papers

Data Papers

Executable (Methods) Papers

Analysis Papers

Page 56: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

www.gigasciencejournal.com

Give us your data, papers & pipelines*

Help us make it happen!

* APCs currently generously covered by BGI

Page 57: Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research

Thanks to:

Our collaborators: Ruibang Luo (BGI/HKU); Shaoguang Liang (BGI-SZ); Tin-Lap Lee (CUHK); Huayen Gao (CUHK); Qiong Luo (HKUST); Senghong Wang (HKUST); Yan Zhou (HKUST)

Team: Peter Li; Chris Hunter; Jesse Si Zhe; Nicole Nogoy; Tam Sneddon; Alexandra Basford; Laurie Goodman

Follow us: @gigascience; facebook.com/GigaScience; blogs.openaccesscentral.com/blogs/gigablog/

www.gigadb.org

galaxy.cbiit.cuhk.edu.hk

www.gigasciencejournal.com

CBIIT

Funding from: