Page 1: An analysis and characterization of DMPs in NSF proposals from the University of Illinois (#RDAP14)

An Analysis and Characterization of DMPs in NSF Proposals from the University of Illinois

RDAP14 Research Data Access & Preservation Summit, March 26, 2014

William H. Mischo, Mary C. Schlembach, & Megan N. O’Donnell

University of Illinois at Urbana-Champaign
Iowa State University

NSF Data Management Plans

• Data Management Plans (DMPs): required element in NSF proposals as of January 2011

• July 2011: the Library, working with the campus Office of Sponsored Programs and Research Administration (OSPRA), began an analysis of DMPs in submitted NSF grant proposals

• To date, 1,600 grant proposals reviewed, with 1,260 included in the analysis

Reasons for Analysis

• What storage venues and mechanisms for sharing and reuse are being used?

• Are PIs using local templates and local campus resources such as IDEALS?

Follow-on

• Develop campus-wide infrastructure (Research Data Service - RDS)

• Assist in compliance with federal agencies

• Develop important partnerships with campus units (CITES, NCSA, Colleges) and national entities

• Develop best practices and standard approaches

Analysis

• Analysis attempts to characterize and classify DMPs into categories

• A DMP can be assigned multiple categories

• 1,260 DMPs from July 2011 to November 2013
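
Because one DMP can carry several category labels, the category counts need not sum to the number of DMPs, and the percentages can total more than 100%. A minimal sketch of that tallying follows, using made-up records: the DMP IDs and label assignments are hypothetical, not data from the study.

```python
from collections import Counter

# Hypothetical records: each DMP maps to the set of category labels assigned to it.
dmps = {
    "dmp-001": {"PI Server", "Campus", "Publication"},
    "dmp-002": {"PI Website", "Disciplinary"},
    "dmp-003": {"Campus"},
    "dmp-004": {"No Data"},
}

def tally(records):
    """Count how many DMPs carry each category label."""
    counts = Counter()
    for labels in records.values():
        counts.update(labels)
    return counts

def percent(count, n):
    """Share of all DMPs, as a percentage rounded to one decimal place."""
    return round(100 * count / n, 1)

counts = tally(dmps)
n = len(dmps)
for category, count in counts.most_common():
    print(f"{category:<13} {count:>3} {percent(count, n):>5}%")

# A combined venue counts a DMP once even if it carries both labels.
pi_any = sum(1 for labels in dmps.values() if labels & {"PI Server", "PI Website"})
```

Storing the labels as a set per DMP keeps a combined grouping such as "PI Server/Website" from double-counting a plan that mentions both.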

Categories

• PI Server – Servers and workstations that the PIs (and their students/staff) use to store project data. Examples: laboratory servers/workstations, external hard drives, group computer

• PI Website – Websites edited or administered by the PI or a group they belong to. Examples: lab website, project website, wiki, PI’s website

Categories

• Campus – Services located at, operated by, or endorsed by Illinois. Examples: IDEALS, Netfiles and Box.net, NCSA, and Beckman Institute

• Department – Used when a department was specifically mentioned as providing a storage or hosting resource. Examples: departmental website, departmental server, departmental backup service, or a web address traced back to an academic department (also given the “Campus” label)

Categories

• Remote – Services and sites not located on the Illinois campus. Examples: NASA, other campuses, collaborative projects, non-Illinois institutes

• Disciplinary – Disciplinary repositories. Examples: GenBank, arXiv, ICPSR, SEAD, Nanohub, and Dryad

• Cloud – Storage services using cloud technology. Examples: Google Drive, Google Code, Box.net, Amazon, Microsoft, Dropbox

Categories

• Publication – Scholarly outputs. Examples: journal articles, workshops, and conference presentations/posters

• Analog – Physical records/data. Examples: lab notebooks, photographs, files

• Specimens – Physical specimens, usually biological or artifacts

Categories

• Optical Disc – DVD, CD, and Blu-ray discs

• Not Specified – The DMP was not specific enough for us to categorize further

• No Data – The DMP indicated the proposal will produce no data products

• Local Template Used – The DMP used a library-authored template

ALL DMPs (n=1,260)

Category         Number   Percent
PI Server           503     39.9%
PI Website          529     41.9%
Campus              667     52.9%
Department          142     11.2%
Remote              353       28%
Disciplinary        275     21.8%
Publication         556     44.1%
Cloud                63        5%
Optical Disc         56        4%
Analog              131     10.4%
Specimens           111      8.8%
Not Specified        66      5.2%
Collaborative       164       13%
No Data             103      8.2%

Data Venue and Risk

Risk column: Risk of Loss/Corruption/Breach

Data Location                     Submitted (n=1,260)   Risk             Funded (n=298)   Risk
PI Server/Website                 64%                   High             61%              High
Departmental Server/Website       11.2%                 Medium to High   7%               Medium to High
Campus-Wide Resource              52.9%                 Low              45%              Low
IDEALS (Institutional Repos.)     21.9%                 -                19.8%            -
NCSA                              4.3%                  -                16.4%            -
Disciplinary Repository/Cloud     25.8%                 Medium to Low    21.4%            Medium to Low
Remote Repository                 28%                   Medium to High   22.8%            Medium to High
Optical Disk, Specimens, Analog   19.4%                 Out of Scope     11%              Out of Scope

Notables

• Funded: 298

• Used local template: 254

• Only 87 DMPs contained information about file types

• IDEALS: 275

• NCSA/XSEDE: 55

• Dryad: 22

• ICPSR: 17

• GenBank: 55

• ArX: 61

Analysis

• Any differences in storage venue or technologies between the unfunded proposals and the funded proposals?

• Any differences between the proposals from the first year and the more current proposals?

• Other differences in proposal categories between funded and unfunded

• 734 active NSF awards, $861.8 million

Analysis: Funded vs. Not-funded

• IDEALS institutional repository: 62 funded, 197 not funded; chi-square = 0.17

• Storing data on PI server or website: 183 funded, 569 not funded; chi-square = 0.7

• Disciplinary or Cloud: 67 funded, 241 not funded; chi-square = 0.85

• Remote storage: 68 funded, 267 not funded; chi-square = 3.01
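
The slides do not say how these chi-square values were computed. One common reading is a test of independence on a 2x2 table (venue mentioned or not, by funded or not funded). The sketch below computes the Pearson statistic without continuity correction for such a table. The group totals in the usage example (298 funded, and 1,260 - 298 = 962 not funded) are an assumption, so the result should not be expected to match the slide's figures.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

                 uses venue   does not
    funded           a           b
    not funded       c           d
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of funding status and venue.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Assumed totals: 298 funded and 1,260 - 298 = 962 not funded.
funded_total, unfunded_total = 298, 962
ideals_funded, ideals_unfunded = 62, 197
stat = chi_square_2x2(
    ideals_funded, funded_total - ideals_funded,
    ideals_unfunded, unfunded_total - ideals_unfunded,
)
print(f"chi-square = {stat:.3f}")
```

All four statistics reported on the slide fall below 3.84, the .05 critical value for one degree of freedom.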

Analysis

• Use of IDEALS: before August 2012 = 108; after (through November 2013) = 166; chi-square = 4.59, p < .05

• Use of Disciplinary or Cloud: before August 2012 = 121; after = 182; chi-square = 4.33, p < .05
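
Assuming these are tests with one degree of freedom (the slides do not state this), a chi-square statistic converts to a p-value with the complementary error function. The sketch below checks the reported values against the .05 threshold; the function name is illustrative.

```python
import math

def p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With one degree of freedom the statistic is the square of a standard
    normal variable, so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))

# Statistics reported on the slides; df = 1 is an assumption.
reported = {
    "IDEALS, before vs. after August 2012": 4.59,
    "Disciplinary or Cloud, before vs. after": 4.33,
    "Remote storage, funded vs. not funded": 3.01,
}
for label, stat in reported.items():
    p = p_value_df1(stat)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: chi-square = {stat}, p = {p:.3f} ({verdict} at .05)")
```

Under that assumption, the two before/after comparisons clear the .05 threshold, while the largest funded vs. not-funded statistic (3.01, remote storage) does not.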

Implications and Conclusions

1. No significant differences between funded and unfunded proposals in storage venues; no advantage in using IDEALS or disciplinary repositories.

2. In more recent proposals, IDEALS and disciplinary repositories are included at a significantly higher level.

• What is the role of the library? The campus? The subject discipline?

• Connecting data to the literature is important