Data Citation Proposal Based on work by: Mark A. Parsons and the ESIP Preservation and Stewardship Cluster, esp. Ruth Duerr, Curt Tilmes, and Bruce Barkstrom.


Purpose of Data Citation

• Credit for data creators and stewards

• Allow data creators to see how researchers are using their data

• Track impact of data set

• Provide accountability for creators and stewards

• Aid reproducibility through unambiguous connection to the precise data used

From Parsons, modified by Lynnes


How “data citation” is currently done

1. Not mentioned, just used, e.g., in tables or figures

2. Reference to name or source of data in text

3. URL in text (with variable degrees of specificity)

4. Citation of related paper (e.g. CRU Temp. records recommend citing two old journal articles which do not contain the actual data or full description of methods)

5. Citation of actual data set typically using recommended citation given by data center

6. Citation of data set including a persistent identifier/locator, typically a DOI

From Parsons, et al.


Current GES DISC Policy

http://disc.sci.gsfc.nasa.gov/additional/citing-our-data

CITING OUR DATA

GES DISC Data Use Acknowledgment 

Distribution of GES DISC data sets is funded by NASA's Science Mission Directorate (SMD). The data are not copyrighted and are open to all for both commercial and non-commercial uses. If you used GES DISC data for a publication (research or otherwise), or for any other purpose, we request that you include the following acknowledgment: "The data used in this effort were acquired as part of the activities of NASA's Science Mission Directorate, and are archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC)." We would appreciate receiving a copy of your publication, which can be forwarded to...


Basic data citation form and content

Author(s). Year. Title, [version]. [editor(s)]. Publisher. Location. [date accessed]. [subset used].

From: Parsons, Mark A., Ruth Duerr, and Jean-Bernard Minster. 2010. Data citation and peer-review. Eos, Trans. AGU 91 (34): 297-298. doi:10.1029/2010EO340001.
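The basic form above can be sketched as a small formatter. This is only an illustration, assuming the bracketed elements are optional; the names `DataCitation`, `join_names`, and `format_citation` are hypothetical and do not come from Parsons et al. or any GES DISC tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataCitation:
    """Elements of the basic form; bracketed (optional) ones default to empty."""
    authors: List[str]
    year: str
    title: str
    publisher: str
    location: str                       # URL, DOI, or other locator
    version: Optional[str] = None
    editors: List[str] = field(default_factory=list)
    date_accessed: Optional[str] = None
    subset: Optional[str] = None


def join_names(names: List[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C'."""
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]


def format_citation(c: DataCitation) -> str:
    """Assemble the elements in the order given by the basic form."""
    title = c.title if c.version is None else f"{c.title}, {c.version}"
    parts = [join_names(c.authors), c.year, title]
    if c.editors:
        parts.append("Edited by " + join_names(c.editors))
    parts.extend([c.publisher, c.location])
    if c.date_accessed:
        parts.append("Accessed " + c.date_accessed)
    if c.subset:
        parts.append(c.subset)
    # End every element with exactly one period.
    return " ".join(p.rstrip(".") + "." for p in parts)
```

Keeping the optional elements as separate fields means a citation without an editor or subset still comes out well-formed, instead of leaving empty brackets in the text.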

An Example Citation

• Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Authors: intellectual effort going into the dataset, i.e., algorithm developers

Year: year data were produced

Title: Data Set Long Name

Editor(s): People that have added significant value to the dataset

City and Publisher: Greenbelt, MD: Goddard Earth Sciences Data and Information Services Center

Data access date and location

From Parsons, et al.

Implementation

• Store information in GCMD entry, under “Data Set Citation”

• Requested “Dataset Editor” field from GCMD

• Generate stable, top-level locations for each dataset, e.g., http://disc.sci.gsfc.nasa.gov/GSSTFM.2b

• Generate individualized citations for each dataset, e.g.:

Chung-Lin Shie, Long Chiu, Robert Adler, I-I Lin, Eric J. Nelkin, and Joe Ardizzone, 2010. Surface Turbulent Fluxes, 1x1 deg Monthly Grid, Set1 and Set2. Edited by A. Savtchenko. Greenbelt, MD: Goddard Earth Sciences Data and Information Services Center, Accessed <date> at http://disc.sci.gsfc.nasa.gov/GSSTFM.2b.

• Add to READMEs at the top OR add a special file to URL set for download

• Present within Mirador at Checkout stage
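The implementation steps above could be sketched roughly as follows. This assumes the stable location is built as `<short name>.<version>` under the GES DISC host, which is inferred from the single example `GSSTFM.2b`. The record keys and the function names `stable_url` and `dataset_citation` are hypothetical, not actual GCMD fields or GES DISC code.

```python
from datetime import date

# Host taken from the example location in the slide.
BASE_URL = "http://disc.sci.gsfc.nasa.gov/"


def stable_url(short_name, version):
    """Assumed pattern: <short name>.<version>, as in GSSTFM.2b."""
    return f"{BASE_URL}{short_name}.{version}"


def dataset_citation(record, accessed=None):
    """Build an individualized citation, filling in the access date."""
    accessed = accessed or date.today()
    url = stable_url(record["short_name"], record["version"])
    text = f'{record["authors"]}, {record["year"]}. {record["title"]}.'
    if record.get("editor"):
        text += f' Edited by {record["editor"]}.'
    text += f' {record["publisher"]}, Accessed {accessed.isoformat()} at {url}.'
    return text
```

The same string could then be written at the top of a README or shown at checkout, so that the `<date>` placeholder is replaced with the actual date of access.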

Backup Slides


• “We found that few policies recommend robust data citation practices: in our preliminary evaluation, only one-third of repositories (n=26), 6% of journals (n=307), and 1 of 53 funders suggested a best practice for data citation.  We manually reviewed 500 papers published between 2000 and 2010 across six journals; of the 198 papers that reused datasets, only 14% reported a unique dataset identifier in their dataset attribution, and a partially-overlapping 12% mentioned the author name and repository name.  Few citations to datasets themselves were made in the article references section.”

• http://openwetware.org/wiki/DataONE:Notebook/Summer_2010

• “Data Citation in the Wild.” Valerie Enriquez, Sarah Walker Judson, Nicholas M. Weber, Suzie Allard, Robert B. Cook, Heather A. Piwowar, Robert J. Sandusky, Todd J. Vision, Bruce Wilson

From Parsons, et al.



Tracking citation

• “Tracking Dataset Citations Using Common Citation Tracking Tools Doesn’t Work” (Heather Piwowar, DataONE)

• Traditional fields such as author and date too imprecise

• Web of Science, Scopus, and other tools don’t handle identifiers

From Parsons, et al.
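To illustrate why identifiers matter for tracking, here is a minimal sketch that counts dataset citations by matching DOIs in reference strings rather than author and date. It is not how Web of Science, Scopus, or DataONE work. The regular expression is a common approximation of DOI syntax, not an official grammar, and the function names are hypothetical.

```python
import re
from collections import Counter

# Approximate DOI shape: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_dois(text):
    """Return DOI-like strings found in text, without trailing punctuation."""
    return [m.rstrip(".,;)") for m in DOI_PATTERN.findall(text)]


def count_dataset_citations(references, dataset_dois):
    """Count how often each known dataset DOI appears in a list of references."""
    wanted = {d.lower() for d in dataset_dois}
    counts = Counter()
    for ref in references:
        for doi in find_dois(ref):
            if doi.lower() in wanted:   # DOIs are case-insensitive
                counts[doi.lower()] += 1
    return counts
```

Matching on the identifier sidesteps the imprecision of author and date fields, but it only works when the citing paper actually includes the identifier in its reference.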


Accountability

• A new standard of accountability in a post-climategate world

• Data “publication” needs to be tied to promotion, tenure, etc.

• Implies peer review; see the AGU Position Statement on Data

• What is peer-review?

• An assertion of accuracy or validity?

• An audit of complete documentation and sound practice?

• Related to but different from QA.

• How does it overlap with curation and stewardship?

• Earth System Science Data is one approach, but not universally applicable.

• Open or informal review or usage comments within the metadata

• Versioning and transparency are essential

From Parsons, et al.

Author, Year, Title, Editor, Publisher, Location

• Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

From Parsons, et al.

Location

• Gary King; Langche Zeng, 2006, "Replication Data Set for 'When Can History be Our Guide? The Pitfalls of Counterfactual Inference'" hdl:1902.1/DXRXCFAWPK UNF:3:DaYlT6QSX9r0D50ye+tXpA== Murray Research Archive [distributor]

From Parsons, et al.

Location

• König-Langlo, Gert and Hartwig Gernandt. 2006. Compilation of radiosonde data from the Antarctic Georg-Forster station of the German Democratic Republic from 1985 to 1992. Bremerhaven, Germany: Alfred Wegener Institute for Polar and Marine Research. Data set accessed 2008-05-22. doi:10.1594/PANGAEA.547983

From Parsons, et al.


Doing it as best we can...

• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007- Sep. 2008, 84°N, 75°W; 44°N, 10°W. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at doi:10.1234/xxx.

• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007- Sep. 2008, Tiles (15,2; 16,0;16,1;16,2;17,0;17,1). Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at doi:10.1234/xxx.

• Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, Version 2.0, shapefiles. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at doi:10.1234/xxx.

From Parsons, et al.

Thank you

Much of this talk comes from:

• Parsons, Mark A., Ruth Duerr, and Jean-Bernard Minster. 2010. Data citation and peer-review. Eos, Trans. AGU 91 (34): 297-298. doi:10.1029/2010EO340001.

• Duerr, Ruth E., Robert R. Downs, Curt Tilmes, Bruce Barkstrom, W. Christopher Lenhardt, Joe Glassy, Luis E. Bermudez, and Peter Slaughter. 2011 (submitted). On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics.

• A lot of discussion at: http://wiki.esipfed.org/index.php/Preservation_and_Stewardship

Photo courtesy NOAA

From Parsons, et al.