Trailblazing in the Wilderness of Data Management

Trailblazing in the Wilderness of Data Management: Where are we going and how do we get there from here? Stephanie Wright, Data Services Coordinator, University of Washington Libraries


Page 1: Trailblazing in the Wilderness of Data Management

Trailblazing in the Wilderness of Data Management

Where are we going and how do we get there from here?

Stephanie Wright
Data Services Coordinator
University of Washington Libraries

Page 2

AGENDA

• Definitions
• Why venture out
• Paths already taken
  – Assessments of needs
  – Existing programs
  – Tools & resources
• Blazing your own trail

Montana State University – 21 June 2013

Page 3

Definitions

• Data
• Data Management
• Big Data
• Long Tail of Data
• Acronyms

www.lib.washington.edu

Page 4

Definitions

DATA

By data, we do not mean a synonym for information. We mean research data, that which is collected, observed, or created, for purposes of analyzing to produce original research results.

Research data may be created in tabular, textual, statistical, numeric, geospatial, image, multimedia or other formats.

(Adapted from DISC-UK DataShare Project, p. 16)

Page 5

Definitions

DATA

Data can be produced from a variety of processes (e.g., observation, experimentation, simulation, derivation, compilation), represented in numerous forms and stored in many digital formats (e.g., ASCII, PDF, SPSS, Excel, TIFF, Java, FITS, CIF, ZVI). The scope of this definition includes data from disciplines in the sciences, social sciences, and humanities.

(Adapted from MIT Libraries, “What is Data?”, 2009)

Page 6

Definitions

DATA MANAGEMENT

Pertains to the collection, cleaning, storage, sharing, access, disposal, preservation and/or archiving of research data.

(Adapted from University of North Carolina, Research Data Stewardship Report, 2012)

Page 7

Definitions

BIG DATA

• Volume
• Velocity
• Variety

25 Definitions of Big Data: http://www.opentracker.net/article/25-definitions-big-data

– Now over 30 definitions

Page 8

Definitions

LONG TAIL OF DATA

Image credit: disruptormonkey.typepad.com

Page 9

Acronyms

• RDM – Research Data Management

• IR – Institutional Repository

• DR – Data Repository

• DMP – Data Management Plan

Page 10

Why Venture Out

• Funding agencies
• Universities
• Researchers
• Libraries


Image credit: National Park Service, Yellowstone photo collection, (http://www.nps.gov/features/yell/slidefile/mammals/wolf/Images/15314.jpg)


Page 11

Funding Agencies

• 1998: NSF
• 2003: NIH
• 2011: NSF
• 2013: NSF, OSTP, OMB, NIH

Page 12

Universities

• Competitiveness
• Reduce duplication of effort
• Preserve the research record of the institution
• Encourage innovation & discovery

Page 13

Researchers

• Verifiability & reproducibility
• Increased citation rates for publications
  – (Piwowar et al., 2007)
• Preservation of individual scholarly record
• Save time by planning early

Page 14

Libraries

• Digital Preservation Network (DPN)

“The Digital Preservation Network is being created by research-intensive universities to ensure long-term preservation of the complete digital scholarly record.”

http://d-p-n.org/

Page 15

Libraries

NSF Proposal & Award Policies & Procedures Guide (Oct 2012)

“Instructions for preparation of the Biographical Sketch have been revised to rename the "Publications" section to "Products" .... (P)roducts may include, but are not limited to, publications, data sets, software, patents, and copyrights.”

Page 16

Paths Already Taken

• Assessments
• Existing programs
• Tools & Resources


Image credit: John W. Ridge (http://commons.wikimedia.org/wiki/File:Yellowstone_Trail_Map.jpg)

Page 17

Assessments

• UNC (2012) “Research Data Stewardship Report”

• University of Colorado Boulder (2012) “Research Data Management @ UCB”

• Purdue “Data Curation Profiles Directory” (http://docs.lib.purdue.edu/dcp/)

• More: Georgia Tech, Cornell, Houston, Oregon….

Page 18

Findings

• Researchers use a wide variety of data types – across disciplines

• Most researchers rely on themselves for data management

• Researchers want to maintain control of their data

• Many are unaware of existing services

• They want tools that work in existing workflows

Page 19

What’s Needed

• Creating & maintaining DMPs
• Best practices guidance all along lifecycle
• Storage
  – Short-term access
  – Long-term access
  – Backup
  – Versioning
  – Security
• Metadata creation

Page 20

Existing Programs

• Cornell
  – Research Data Management Service Group
    • Sr VP for Research and University Librarian
    • Faculty Advisory Board
      – 9 faculty across disciplines
      – OSP & Office of Research Integrity & Assurance
    • Management Council
      – 2 librarians, 2 faculty, 2 IT, 1 research institute

Page 21

Existing Programs

• Purdue
  – D2C2: Distributed Data Curation Center
    • Executive Committee
      – Dean of Libraries, VP of Research & VP of IT
    • Library: consulting & metadata support
    • IT: storage & research computing support

Page 22

Existing Programs

• University of Washington
  – Data Services Program (1.5 FTE)
    • Data Services Coordinator
    • Data Services Communications & Curriculum Librarian
  – Data Services Team (10 members)
  – Partnerships
    • Research Centers (eSci, CSDE, IHME)
    • Office of Research (OSP)
    • Campus IT
    • iSchool

Page 23

Tools & Resources

• Data Mgmt Planning: DMPTool
• Metadata & Sharing: DataUP
• Sharing & Storage: DataBib
• Citation: EZID
• Best Practices: DMVitals

Page 24

Blazing Your Own Trail

Image credit: Michigan State University Department of History, HST 321: History of the American West (http://history.msu.edu/hst321/files/2010/07/colter.jpg)

Page 25

Where do you want to go?

• Identify needs
• Consider potential partners
• Scope
  – Disciplines
  – Specific areas of the data lifecycle
• Determine priorities
  – New services? Enhance existing? Market existing?

Page 26

MSU Strategic Plan

• Objective L1
  – Assess and improve where needed, student learning of critical knowledge & skills
• Objective D1
  – Elevate the research excellence and recognition of MSU faculty
    • D1.2
• Objective D2
  – Enhance infrastructure in support of research, discovery and creative activities

Page 27

Campus IT

• Support for active data storage
• Data security guidance
• Backup services
• Development of tools that can be inserted into existing workflows

Page 28

Office of Research

• Guidance on legal / ethical considerations
• Incorporate DM planning into grant submission process
• New faculty data management orientations

Page 29

Libraries

• Market and provide access to existing RDM resources
• Provide learning opportunities on RDM best practices
• DMP consultation
• Storage (final)
• Metadata consultation

Page 30

University

• University policy on data management
• Integrate RDM activities into T&P process
• Consider campus policy on open data

Page 31

Questions

Page 32

Thank you!

Stephanie Wright
Data Services Coordinator
[email protected]
@shefw
http://guides.lib.washington.edu/swright

Data Management Guide: http://guides.lib.washington.edu/dmg

ResearchWorks Data Services: http://researchworks.lib.washington.edu/rw-data.html