Data Management Best Practices: Training for Librarians

http://tinyurl.com/TUdatamgmt

Research Data Management Planning: Best Practices

Adapted from DataONE, NECDMC, & MANTRA Research Data Management Training

Objectives

0 Recognize what research data is & what data management entails

0 Identify common data management issues

0 Learn best practices & resources for managing these issues

0 Learn how the library can help identify data management resources, tools, & best practices

What is research data?

“The recorded factual material commonly accepted in the research community as necessary to validate research findings” that is “collected, observed, or created, for purposes of analysis to produce original research results.”

Situational

What is research data?

0 Data files

0 Documents (text, Word), spreadsheets

0 Laboratory notebooks, field notebooks, diaries

0 Questionnaires, transcripts, codebooks

0 Audiotapes, videotapes

0 Photographs, films

0 Test responses

0 Slides, artefacts, specimens, samples

0 Collection of digital objects acquired and generated during the process of research

0 Database contents (video, audio, text, images)

0 Models, algorithms, scripts

0 Contents of an application (input, output, log files for analysis software, simulation software, schemas)

0 Methodologies and workflows

0 Standard operating procedures and protocols

What else needs management?

0 Correspondence (electronic mail & paper-based correspondence)

0 Project files

0 Grant applications

0 Ethics applications

0 Technical reports

0 Technical Appendix

0 Research reports

0 Research publications

0 Master lists

0 Signed consent forms

0 Internal social media communications such as blogs, wikis etc.

0 Content stored via external social media/Web 2.0 /Cloud applications

Data Management Issues

0 Lack of responsibility

0 Lack of planning for data management

0 Poor records management

0 Lack of metadata and data dictionary

0 Data files are not backed up

0 Lack of security measures

0 Undetermined ownership and retention

0 Lack of long-term plan for the data

Responsibility: Whose job is this?

0 Define roles and assign responsibilities for data management

0 For each task identified in your data management plan, identify the skills needed to perform the task

0 Match skills needed to available staff and identify gaps

0 Develop training plans for continuity

0 Assign responsible parties and monitor results

0 Talk to your librarian about best practices for managing lab notebooks. We can help!

NOBODY PANIC!

Lab Notebooks: Best Practices

0 Permanently bound book, pages numbered

0 In ink

0 Add things chronologically, date entries

0 Entries should be in first person with clear details of who did what

0 Abbreviations should be explained

0 Don’t remove pages or portions of pages

0 Put a line through blank space

0 Index completed notebooks & keep in a single location

0 Notebooks should be “checked out”

0 Originals stay with lab, copies go with researchers

0 Keep for at least 5 years after study is complete, longer under various conditions, e.g., patents

Data Management Planning: What do you need to know?

0 What types of data will be created?

0 Who will own, have access to, and be responsible for managing these data?

0 What equipment and methods will be used to capture and process data?

0 Where will data be stored during and after?

0 “the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;

0 the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);

0 policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;

0 policies and provisions for re-use, re-distribution, and the production of derivatives;

0 plans for archiving data, samples, and other research products, and for preservation of access to them” (NSF, 2011).

Data Management Planning: National Science Foundation

Describe:

0 What types of data will be collected or generated

0 Methods of collection or generation

0 How you will prevent disclosure of personally identifying or proprietary information

0 What other documentation will be generated

0 Plans for preserving and archiving the data and related documentation

0 Where you will deposit the data after the study (name the repository)

0 Where your DMP will be located and how often it will be reviewed

Data Management Planning: Institute of Museum and Library Services

Resources

Contact the library for help with writing a data management and/or data sharing plan. Librarians can help you with:

0 Writing a data management plan for a funder (e.g. NSF or NIH grant)

0 Finding and using online tools and resources to create your plan

0 Identifying resources for annotating, storing, and sharing your research data

NOBODY PANIC!

DMP Tool: A mostly helpful tool

Records Management: How do you organize data?

0 There are a number of tools and different software available to assure quality in data entry

0 Contact your librarian for help identifying data entry best practices

NOBODY PANIC!

Records Management: File naming pitfalls

0 Inconsistently labeled files

0 Multiple versions

0 Inside poorly structured folders

0 Stored on multiple media

0 In multiple locations

0 In various formats

Records Management: File naming best practices

0 Avoid special characters in a file name.

0 Use capitals or underscores instead of periods or spaces.

0 Use 25 or fewer characters.

0 Use documented & standardized descriptive information about the project/experiment.

0 Use the ISO 8601 date format: YYYYMMDD.

0 Include a version number.
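
The rules above could be applied in a script along these lines. This is only a minimal sketch: the function name `build_filename`, its parameters, the name layout, and the example values are all hypothetical and do not come from the original training. It assumes the 25-character limit applies to the name without its extension.

```python
import re
from datetime import date

MAX_LENGTH = 25  # limit from the list above, counted here without the extension


def build_filename(project, description, day, version, extension):
    """Build a name like Soil_pH_test_20140312_v02.csv (hypothetical layout)."""
    parts = []
    for text in (project, description):
        # Replace spaces and periods with underscores, then drop special characters.
        text = re.sub(r"[ .]+", "_", text.strip())
        text = re.sub(r"[^A-Za-z0-9_]", "", text)
        parts.append(text)
    stamp = day.strftime("%Y%m%d")  # ISO 8601 basic format: YYYYMMDD
    stem = "_".join(parts + [stamp, f"v{version:02d}"])
    if len(stem) > MAX_LENGTH:
        raise ValueError(f"{stem!r} is longer than {MAX_LENGTH} characters")
    return f"{stem}.{extension}"


print(build_filename("Soil", "pH test", date(2014, 3, 12), 2, "csv"))
```

Raising an error on names that are too long, instead of silently truncating them, keeps the descriptive part of the name a deliberate choice.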

Librarians can help you with best practices, resources, and tools for:

0 Creating file naming conventions

0 Creating directory structure naming conventions

0 Versioning your files

0 Choosing appropriate file formats for preserving and sharing your data files

NOBODY PANIC!

Documenting Data: How can others make sense of this data?

0 How will someone make sense of your data, e.g., the cells and values of your spreadsheet?

0 What universal or disciplinary standards could be used to label your data?

0 How can you describe a data set to make it discoverable?

Shhh… this includes metadata

Documenting Data: Code books and data dictionaries:

0 Describe the contents of data files

0 Define the parameters and the units of each parameter

0 Explain the formats for dates, time, geographic coordinates, and other parameters

0 Define any coded values

0 Describe quality flags or qualifying values

0 Define missing values

0 List and describe instruments used in data collection
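
One way a data dictionary like the one described above might be kept is as a small CSV file stored next to the data. The sketch below is an illustration only: the variables, column names, and the helper `write_data_dictionary` are invented for an imaginary water-quality file and are not a required layout.

```python
import csv
import io

# Hypothetical variables for an imaginary water-quality data file.
DATA_DICTIONARY = [
    {"variable": "site_id", "description": "Sampling site code", "units": "",
     "format": "text", "coded_values": "A=upstream; B=downstream", "missing_value": "NA"},
    {"variable": "sample_date", "description": "Date the sample was taken", "units": "",
     "format": "YYYYMMDD", "coded_values": "", "missing_value": "NA"},
    {"variable": "temp_c", "description": "Water temperature", "units": "degrees Celsius",
     "format": "decimal", "coded_values": "", "missing_value": "-999"},
]


def write_data_dictionary(rows, handle):
    """Write one row per variable so the dictionary can travel with the data."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for a real file on disk.
buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
```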

Documenting Data: MIT Libraries recommend noting:

0 Title

0 Creator

0 Identifier

0 Subject

0 Funders

0 Rights

0 Access information

0 Language

0 Dates

0 Location

0 Methodology

0 Data processing

0 Sources

0 List of file names

0 File Formats

0 File structure

0 Variable list

0 Code lists

0 Versions

0 Checksums
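
The last item, checksums, can be produced with Python's standard `hashlib` module. The sketch below assumes SHA-256 as the algorithm (the list above does not name one), and the helper name `sha256_of` and the example file are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    example = os.path.join(folder, "example.csv")
    with open(example, "wb") as handle:
        handle.write(b"site_id,temp_c\nA,12.5\n")
    print(sha256_of(example))
```

Recording the digest alongside the file name lets anyone confirm later that a copy is still identical to the original.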

Close your eyes for a moment

Documenting Data: Example from Dryad

This is a metadata standard called Dublin Core.

Other repositories will use different standards, but the basic idea is the same.

Some of it will be automatically created by the repository.

For help with metadata standards, talk to your librarian.
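
To give a rough idea of what a Dublin Core description can look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the Dryad record from the slide: the wrapping `record` element, the helper `dublin_core_record`, and every value are invented for illustration. Only the element names and the namespace come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a small XML record from (element, value) pairs."""
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# All values below are invented for illustration.
example = dublin_core_record([
    ("title", "Stream temperature measurements, 2013"),
    ("creator", "Researcher, Example"),
    ("date", "2013-11-12"),
    ("format", "text/csv"),
    ("rights", "CC0"),
])
print(ET.tostring(example, encoding="unicode"))
```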

NOBODY PANIC!

Storage & Backup: Where’re these data gonna go?

Lost and found at TU’s Science & Engineering Library (which is rather a small library)

Storage & Backup

0 How often should data be backed up?

0 How many copies of data should you have?

0 Where can you store your data?

Storage & Backup

0 Make 3 copies (original + external/local + external/remote)

0 Have them geographically distributed (local vs. remote)

0 Use a hard drive or tape backup system

0 Cloud Storage - some examples of private sector storage resources include: Amazon S3, Elephant Drive, Jungle Disk, Mozy, Carbonite

0 Uncompressed is preferred for storage, but if you must compress to conserve space, limit compression to your third backup copy
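
The three-copy rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a backup system: the helper names `checksum` and `back_up` are hypothetical, temporary folders stand in for a local external drive and a remote location, and files are read fully into memory, which only suits small files.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path):
    """SHA-256 of a file, used to confirm that a copy matches the original."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(original, backup_dirs):
    """Copy `original` into each backup directory and verify every copy."""
    original = Path(original)
    expected = checksum(original)
    copies = []
    for folder in backup_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / original.name
        shutil.copy2(original, target)  # copy2 also preserves timestamps
        if checksum(target) != expected:
            raise IOError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies


# Temporary folders stand in for a local external drive and a remote location.
with tempfile.TemporaryDirectory() as root:
    data = Path(root) / "Soil_pH_20140312_v02.csv"
    data.write_text("site_id,ph\nA,6.8\n")
    made = back_up(data, [Path(root) / "external_local", Path(root) / "external_remote"])
    print(len(made) + 1, "copies in total")
```

Verifying each copy against the original's checksum catches a failed or partial copy at backup time, not when the data is needed.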

Storage & Backup Best Practices

Retention

How Long?

Intellectual Property

Funder’s Policy

Publisher’s Policy

Federal & State Laws

IRB Policy

Module 1: Overview of Research Data Management

0 IRB OHRP Requirements: 45 CFR 46 requires research records to be retained for at least 3 years after the completion of the research.

0 HIPAA Requirements: Any research that involved collecting identifiable health information is subject to HIPAA requirements. As a result, records must be retained for a minimum of 6 years after each subject signed an authorization.

0 FDA Requirements (21 CFR 312.62(c)): Any research that involved drugs, devices, or biologics being tested in humans must have records retained for a period of 2 years following the date a marketing application is approved for the drug for the indication for which it is being investigated; or, if no application is to be filed or if the application is not approved for such indication, until 2 years after the investigation is discontinued and FDA is notified.

Retention

0 VA Requirements: At present, records for any research that involves the VA must be retained indefinitely per VA federal regulatory requirements.

0 Intellectual Property Requirements: Any research data used to support a patent must be retained for the life of the patent in accordance with Intellectual Property Policy.

0 Check your funder’s and publisher’s requirements

0 Questions of data validity: If there are questions or allegations about the validity of the data or appropriate conduct of the research, you must retain all of the original research data until such questions or allegations have been completely resolved.
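
When several fixed retention periods apply to the same study, the longest one governs. The sketch below works that out for two of the minimums listed above. The dates are invented, the function names are hypothetical, and treating the HIPAA clock as starting at the last signed authorization is an assumption. It does not cover open-ended cases such as VA research, patents, or unresolved questions about the data.

```python
from datetime import date


def add_years(start, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_disposal_date(requirements):
    """Latest end date among (label, start_date, years) retention requirements."""
    ends = {label: add_years(start, years) for label, start, years in requirements}
    governing = max(ends, key=ends.get)
    return governing, ends[governing]


# Dates are invented; the periods are the minimums listed above.
rules = [
    ("45 CFR 46 (3 years after completion)", date(2014, 6, 30), 3),
    ("HIPAA (6 years after last authorization)", date(2013, 2, 1), 6),
]
print(earliest_disposal_date(rules))
```

Funder, publisher, and institutional policies can be added as further entries in `rules` once their periods are known.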

Thinking Long-Term: What happens to data after the project?

0 What will happen to my data after my project ends?

0 How can I appraise the value of my data?

0 What are my options for archiving and preserving my data?

0 What are my options for publishing and sharing data?

We can help you:

0 Find and evaluate a suitable repository for your data

0 Upload your data sets to a repository

0 Interpret your funder’s or publisher’s repository requirements

0 Make your data in a repository searchable and discoverable

NOBODY PANIC!

Preservation: The Importance of File Formats

Slide Credit: Jen Ferguson 2013

0 Is the file format open or proprietary?

0 Do you need a certain software package to read & work with the data file?*

0 Do multiple files comprise the data file structure?*

0 Be consistent with your file formats & think long-term about them.

0 Non-proprietary, open, documented standard, unencrypted, uncompressed, ASCII formatted files will be readable into the future.

*Note in your metadata
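
As a small illustration of the ASCII point above, a script can check whether a file contains only 7-bit ASCII bytes before it is archived. This is a sketch only: the helper name `is_ascii_file` and the example file are hypothetical, and passing this check says nothing about whether the format is open or documented.

```python
import os
import tempfile


def is_ascii_file(path, chunk_size=65536):
    """Return True when every byte of the file is 7-bit ASCII."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            if not chunk.isascii():
                return False
    return True


# Demonstration on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    plain = os.path.join(folder, "readings.csv")
    with open(plain, "wb") as handle:
        handle.write(b"site_id,temp_c\nA,12.5\n")
    print(is_ascii_file(plain))
```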

Last Activity

Librarians can help you:

0 Write data management plans

0 Employ data entry best practices

0 Organize and name files

0 Determine appropriate file formats

0 Document data

0 Determine how long and where to store data

0 Find a repository to deposit data in

0 Teach you, your lab, or your classes about data management best practices

Questions?

For More Information:

Margaret Janz

[email protected]

The best time to call me is email

Science & Engineering Library (SEL)

Engineering Building, rm. 202

guides.temple.edu/SEL

Find your subject librarian: library.temple.edu/services/library-instruction/specialists

0 DataONE. 2013. “Best Practices for Data Management.” http://www.dataone.org/best-practices.

0 DataONE. “DataONE Education Module: Data Entry and Manipulation.” Retrieved Nov 12, 2012. http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx

0 EDINA and Data Library, University of Edinburgh. 2014. Research Data MANTRA [online course]. http://datalib.edina.ac.uk/mantra.

0 Lamar Soutter Library, University of Massachusetts Medical School. 2014. New England Collaborative Data Management Curriculum. http://library.umassmed.edu/necdmc.

0 MIT Libraries. 2013. “Data Management and Publishing.” MIT. http://libraries.mit.edu/guides/subjects/data-management/index.html.

0 Office of Research Integrity. 2013. “Data Management.” United States Department of Health and Human Services. United States Federal Government. http://ori.hhs.gov/education/products/rcradmin/topics/data/open.shtml.
