Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Revolutionizing data dissemination. www.gigasciencejournal.com GSC13, Shenzhen Scott Edmunds

description

Scott Edmunds's talk in the "Policies and Standards for Reproducible Research" session, "Revolutionizing Data Dissemination: GigaScience", at the Genomic Standards Consortium meeting in Shenzhen, 6th March 2012.

Transcript of Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Page 1: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Revolutionizing data dissemination.

www.gigasciencejournal.com

GSC13, Shenzhen

Scott Edmunds

Page 2: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

www.gigasciencejournal.com

Large-Scale Data Journal/Database

Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
Lead Curator: Tam Sneddon, D.Phil

In conjunction with:

Now taking submissions…

Page 3: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

www.gigaDB.org

Associated Database

Page 4: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

BGI

Data Reuse

Funders

Databases

Journals

Data Producers

Users

…Data Flow

Page 5: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Data Re-use

($)

Effort

Usability

Page 6: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Need to lower the hurdles…

($)

Effort

Usability

Page 7: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Need to lower the hurdles…

($)

Effort

Usability

Page 8: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Need to lower the hurdles…

Cloud solutions?

Better tools for assessing data quality…

Better handling of metadata…

Page 9: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Cloud?

Need to lower the hurdles…

More efficient handling of data…

Do we need to keep everything?

Compression?

Page 10: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Page 11: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Page 12: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Better incentives?

($)

Effort

Usability

Page 13: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

?

New incentives/credit

Credit where credit is overdue:
“One option would be to provide researchers who release data to public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”
Nature Biotechnology 27, 579 (2009)

Prepublication data sharing (Toronto International Data Release Workshop)
“Data producers benefit from creating a citable reference, as it can later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)

Page 14: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Datacitation: DataCite and DOIs

Aims to:

“increase acceptance of research data as legitimate, citable contributions to the scholarly record”.

“data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”.

Page 15: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

For data citation to work, it needs:

• Proven utility/potential user base.

• Acceptance/inclusion by journals.

• Data+Citation: inclusion in the references.

• Tracking by citation indexes.

• Usage of the metrics by the community…

Page 16: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Datacitation: utility/user base.

>1.3 million DOIs since Dec 2009

Page 17: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

BGI Datasets Get DOI®s

doi:10.5524/100004

Plants
• Chinese cabbage
• Cucumber
• Foxtail millet
• Pigeonpea
• Potato
• Sorghum

Microbe
• E. coli O104:H4 TY-2482

Cell-Line
• Chinese Hamster Ovary

Human
• Asian individual (YH)
- DNA Methylome
- Genome Assembly
- Transcriptome
• Ancient DNA (coming soon)
- Saqqaq Eskimo
- Aboriginal Australian

Vertebrates
• Giant panda
• Macaque
- Chinese rhesus
- Crab-eating
• Naked mole rat
• Penguin
- Emperor penguin
- Adelie penguin
• Pigeon, domestic
• Polar bear
• Sheep
• Tibetan antelope

Invertebrate
• Ant
- Florida carpenter ant
- Jerdon’s jumping ant
- Leaf-cutter ant
• Roundworm
• Silkworm

Many released pre-publication…

Page 18: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001

Our first DOI:

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
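
The citation on this slide pairs a bare DOI with its resolver URL. As a minimal sketch of how the two relate, the Python below normalizes a DOI string, builds the resolver URL using the `http://dx.doi.org/` prefix shown above, and assembles a citation in the same "Authors (year) Title. Publisher. doi:… URL" pattern. The function names are hypothetical, and the abbreviated author string in the usage note is an illustration only, not the recommended citation.

```python
from urllib.parse import quote

# Resolver prefix as it appears in the citation on the slide.
RESOLVER = "http://dx.doi.org/"


def normalize_doi(doi: str) -> str:
    """Strip surrounding whitespace and an optional 'doi:' prefix."""
    bare = doi.strip()
    if bare.lower().startswith("doi:"):
        bare = bare[4:].strip()
    return bare


def doi_to_url(doi: str) -> str:
    """Build a resolver URL; '/' stays literal, unsafe characters are escaped."""
    return RESOLVER + quote(normalize_doi(doi), safe="/")


def format_data_citation(authors: str, year: int, title: str,
                         publisher: str, doi: str) -> str:
    """Assemble a dataset citation following the pattern used on the slide."""
    bare = normalize_doi(doi)
    return f"{authors} ({year}) {title}. {publisher}. doi:{bare} {doi_to_url(bare)}"
```

Calling `format_data_citation("Li, D et al.", 2011, "Genomic data from Escherichia coli O104:H4 isolate TY-2482", "BGI Shenzhen", "doi:10.5524/100001")` yields a string ending in `doi:10.5524/100001 http://dx.doi.org/10.5524/100001`, matching the tail of the citation above.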

Page 19: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Page 20: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Page 21: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Page 22: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Data Citation: acceptance by journals

Page 23: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Data Citation: acceptance by journals

Page 24: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Data+Citation: inclusion in the references

Page 25: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

• Data submitted to NCBI databases:
- Raw data SRA:SRA046843
- Assemblies of 3 strains Genbank:AHAO00000000-AHAQ00000000
- SNPs dbSNP:1056306
- CNVs
- InDels dbGAP:nstd63
- SV

• Submission to public databases complemented by its citable form in GigaDB.

Published 21st November 2011

Page 26: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Page 27: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

In the references…

Page 28: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Is the DOI…

Page 29: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Page 30: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

And now in Nature Biotech…

Page 31: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Datacitation: tracking?

Page 32: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

Datacitation: tracking?

Plans in 2012 to link the central metadata repository with Web of Science (WoS)

- Will finally track and credit use!

To be continued…

DataCite metadata in harvestable form (OAI-PMH)
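
Harvestable here means the metadata can be pulled with standard OAI-PMH requests. The sketch below shows roughly what a harvester does: it builds a `ListRecords` request URL and extracts Dublin Core identifiers and titles from a response. The endpoint URL is a placeholder, the function names are hypothetical, and the sample response is a hand-written illustration (reusing the dataset title and DOI from the earlier slide), not actual repository output. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: placeholder endpoint; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://oai.example.org/oai"

# Standard OAI-PMH and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (oai_dc is the baseline format)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hand-written illustrative response; not real harvested output.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Genomic data from Escherichia coli O104:H4 isolate TY-2482</dc:title>
          <dc:identifier>doi:10.5524/100001</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        ident = record.findtext(".//dc:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((ident, title))
    return pairs
```

A citation index could run a loop like this on a schedule, passing `from_date` to pick up only records added since the last harvest, which is one way the tracking described on this slide might be wired up.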

Page 33: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

www.gigasciencejournal.com

Thanks to:

Laurie Goodman
Alexandra Basford
Tam Sneddon
Shaoguang Liang
Tin-Lap Lee (CUHK)
Qiong Luo (HKUST)

Contact us:

[email protected]

[email protected]

Follow us:

@gigascience

facebook.com/GigaScience

blogs.openaccesscentral.com/blogs/gigablog/

Page 34: Scott Edmunds: Revolutionizing Data Dissemination: GigaScience

www.gigasciencejournal.com

Contact: [email protected]

GSC13 special series

• Rapid review - rolling publication after launch issue
• High-visibility - published/promoted by BMC/GigaScience
• Article Processing Charge covered by BGI
• Hosting of any test datasets in GigaDB

Seeking submissions highlighting best practice in genomics research:

• Discussion/comment/white papers
• Cloud computing, software for data handling
• Research highlighting best practice