Research Data Management
MSU Libraries Research Data Management Guidance
Research Data Management
Aaron Collie
@aaroncollie
Introductions
• Please tell us your name and department
• A brief description of your primary research area
• What do you consider to be your research data?
• Experience and/or comfort level with managing research data?
cc http://www.flickr.com/photos/quinnanya/
Agenda
• Introduction
• Background
– The Impetus: NSF Data Management Plan Mandate
– The Effect: Policy to Practice
– The Response: Changing Data Landscape
• Fundamental Practices
– File Organization
– Data Documentation
– Reliable Backup
– Data Publishing, Sharing, & Reuse
– Protecting Data & Responsible Reuse
• Data Lifecycle Resources
Volunstrordinaries!
Aaron Collie
Hailey Mooney
Devin Higgins
Brandon Locke
Ranti Junus
Thomas Padilla
Judy Matthews
Tina Qin
We teach people about RDM
Librarianship
Training
Assessment
Consultation
Ad-hoc
6-12 new clients per semester
100% satisfied / 100% would use again
71% of new clients are referrals
60% requested additional services
15% through NFO, 14% through website
RDM@MSU 101
• Who: You, as the designated steward
• What: “the data”
• When: Minimum 3 years after publication/degree
• Where: Managed networked storage
• Why: Legal, Ethical, Scholarly
• How: With fidelity and documentation sufficient to reproduce the research
http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
Jen Doty and Rob O'Reilly, “Learning to Curate @ Emory”. RDAP 2014
Data Management. Isn’t that… trivial?
• Not so much. Data is a primary output of research; it is very expensive to produce high quality data. Data may be collected in nanoseconds, but it takes the expert application of research protocol and design to generate data.
CC-BY-SA-3.0 Rob Lavinsky
Even more consequential, data is the input of a process that generates higher orders of understanding.
Wisdom
Knowledge
Information
Data
Understanding is hierarchical!
Russell Ackoff
This is the engine of the academic industry…
[Figure: the scientific method drawn as a cycle: Define a question → Gather information → Form a hypothesis → Test the hypothesis → Analyze the data → Interpret the data → Publish results → Retest]
So, things can get a little messy.
The scientific method “is often misrepresented as a fixed sequence of steps,” rather than being seen for what it truly is, “a highly variable and creative process” (AAAS 2000:18).
Gauch, Hugh G. Scientific Method in Practice. New York: Cambridge University Press, 2010. Print. (Emphasis added)
The Research Depth Chart
Scientific Method
Research Design
Research Method
Research Tasks
[Figure axis labels: More Specific, More Generic]
[Figure: the scientific method cycle (Define a question, Gather information, Form a hypothesis, Test the hypothesis, Analyze the data, Interpret the data, Publish results, Retest), annotated with the research tasks listed below]
Problem Identification
Study Concept
Literature Review
Environmental Scan
Funding & Proposal
Research Design
Research Methodology
Research Workflow
Hypothesis Formation
Design Validation
Research Activity
Data Management
Data Organization
Data Storage
Data Description
Data Sharing
Scholarly Communication
Report Findings
Publish
Peer Review
Data Management
• The process of planning for and implementing a system of care for your research data before, during, and after a research project in order to ensure a (re)usable resource.
So why are we here?
Good science!
Government and Research Funder Mandates
But why are we really here?
• Impetus: NSF has mandated that all grant applications submitted after January 18th, 2011 must include a supplemental “Data Management Plan”
• Effect: The original NSF mandate has had a domino effect, and many funders now require or state guidelines for data management of grant funded research
• Response: Data management has not traditionally received a full treatment in (many) graduate and doctoral curricula; intervention is necessary
Positive reinforcement…
• National Science Foundation Data Management Plan mandate (January 18, 2011)
• Presidential Memorandum on Managing Government Records (August 24, 2012)
– Managing Government Records Directive: All permanent electronic records in Federal agencies will be managed electronically to the fullest extent possible for eventual transfer and accessioning by NARA in an electronic format.
Positive reinforcement… (cont.)
• White House policy memo (February 22, 2013)
– Increasing Access to the Results of Federally Funded Scientific Research: Federal agencies with more than $100M in R&D expenditures must develop plans to make the published results of federally funded research freely available to the public within one year of publication.
• OSTP policy memo (March 20, 2014)
– Improving the Management of and Access to Scientific Collections: directs each Federal agency that owns, maintains, or otherwise financially supports permanent scientific collections to develop a draft scientific-collections management and access policy within six months.
Positive reinforcement… (cont. w/ teeth!)
• AHRQ = “…all AHRQ-funded researchers will be required to include a data management plan for sharing final research data in digital format, or state why data sharing is not possible.”
• NASA = “This plan extends NASA’s culture of open data access to all NASA-funded research.”
• USDA = Phased approach beginning with DMP
• More: http://www.arl.org/focus-areas/public-access-policies/federally-funded-research/2696-white-house-directive-on-public-access-to-federally-funded-research-and-data#agency-policies
Funder Policies
NASA “promotes the full and open sharing of all data”
“requires that data…be submitted to and archived by designated national data centers.”
“expects the timely release and sharing of final research data"
"IMLS encourages sharing of research data."
“…should describe how the project team will manage and disseminate data generated by the project”
• Policies for re-use, re-distribution, and creation of derivatives
• Plans for archiving data, samples, and other research outcomes, maintaining access
• Types of data, samples, physical collections, software generated
• Standards for data and metadata format and content
• Access and sharing policies, with stipulations for privacy, confidentiality, security, intellectual property, or other rights or requirements
• NSF will not evaluate any proposal missing a DMP
• PI may state that project will not generate data
• DMP is reviewed as part of intellectual merit or broader impacts of application, or both
• Costs to implement DMP may be included in proposal’s budget
• May be up to two pages long
• Investigators seeking $500,000 or more in direct costs in any year should include a description of how final research data will be shared, or explain why data sharing is not possible.
• The precise content of the data-sharing plan will vary, depending on the data being collected and how the investigator is planning to share the data.
• More stringent data management and sharing requirements may be required in specific NIH Funding Opportunity Announcements. Principal Investigators must discuss how these requirements will be met in their Data Sharing Plans.
• Roles and responsibilities
• Expected data
• Period of data retention
• Data formats and dissemination
• Data storage and preservation of access
Local Policy
University Research Council Best Practices: https://rio.msu.edu/research-data
Research Data: Management, Control, and Access
– To assure that research data are appropriately recorded, archived for a reasonable period of time, and available for review under the appropriate circumstances.
• Ownership = MSU
• “Stewardship” = You
• Period of Retention = 3 years
• Transfer of Responsibility = Written Request
Broader Response: Changing Data Landscapes
• Data Management Competencies
– Standards & Best Practices
– Discipline Specific Discourse
• Data sharing and open data
– Data sets as publications
– Data journals
– Citations for data (e.g., used in secondary analysis)
– Data as supplementary materials to traditional articles
– Data repositories and archives
Curation responsibilities (Carlson, The Chronicle, 2006)
“Data from Big Science is … easier to handle, understand and archive.
Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”
big science data
small science data
institution?
domain?
MacColl, John (2010). The Role of libraries in data curation. RLG Partnership Annual Meeting, Chicago. June 2010
What’s in it for me?
• Better organization = fewer headaches
– Course management
– Bibliographic management
– File management
– Research
• Career advancement
– Publish datasets and list on your CV
– Data management is an “unnamed practice”: name it for yourself and your students!
Data Sharing Impacts
• Reinforces open scientific inquiry
• Encourages diversity of analysis and opinion
• Promotes new research, testing of new or alternative hypotheses and methods of analysis
• Supports studies on data collection methods and measurement
Cc http://www.flickr.com/photos/pinchof_10/
Data Sharing Impacts
• Facilitates education of new researchers
• Enables exploration of topics not envisioned by initial investigators
• Permits creation of new datasets by combining data from multiple sources
Research Data Management Fundamentals
• Documentation
• File Organization
• Storage & Backup
• Data Publishing, Sharing, & Reuse
• Protecting Data & Responsible Reuse
Documentation Practices: Overview
• Researchers benefit from proper documentation to decipher or reuse their datasets – even prior to thinking about sharing
• Think “downstream”
Documentation Practices: Overview
1. At minimum create a README file that you can use to document your project
2. Utilize standards for describing data including Metadata Standards
3. If applicable, use in-line code commentary to explain code
(cc) Will Scullin
Create a README file
• At minimum, store documentation in readme.txt file or equivalent, with data
– What data consists of
– How it was collected
– Restrictions to distribution or use
– Other descriptive information
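The four README items above can be turned into a reusable template. The following is a minimal sketch, not an MSU-prescribed format: the function names `build_readme` and `write_readme`, the section headings, and the sample values in the usage note are all assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def build_readme(title, description, collection_method, restrictions,
                 other_notes="None", created=None):
    """Return README text covering the four items listed on the slide.

    All parameter names and section headings are illustrative assumptions.
    """
    created = created or date.today()
    lines = [
        f"README for: {title}",
        f"Written: {created.isoformat()}",
        "",
        "WHAT THE DATA CONSISTS OF",
        description,
        "",
        "HOW IT WAS COLLECTED",
        collection_method,
        "",
        "RESTRICTIONS TO DISTRIBUTION OR USE",
        restrictions,
        "",
        "OTHER DESCRIPTIVE INFORMATION",
        other_notes,
    ]
    return "\n".join(lines) + "\n"


def write_readme(data_dir, text):
    """Store the README next to the data, as readme.txt."""
    path = Path(data_dir) / "readme.txt"
    path.write_text(text, encoding="utf-8")
    return path
```

For example, `write_readme("survey_2012", build_readme("Usability Survey", "CSV responses", "Online questionnaire", "No restrictions"))` would place a `readme.txt` alongside the data in a folder that already exists; the folder name and field values here are hypothetical.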
Use Metadata Standards
• “Data about data”
• Standardized way of describing data
• Explains who, what, where, when of data creation and methods of use
• Data more easily found
• Data more easily compared to other data sets
Use Metadata Standards
Basic project metadata:
• Title
• Creator
• Identifier
• Subject
• Funders
• Rights
• Access Information
• Language
• Dates
• Location
• Methodology
• Data Processing
• Sources
• List of File Names
• File Formats
• File Structure
• Variable List
• Code Lists
• Versions
• Checksums
Use Metadata Standards
• Dublin Core: Commonly used descriptive metadata format that facilitates dataset discovery across the Web.
• Data Documentation Initiative (DDI): Defines metadata content, presentation, transport, and preservation for the social and behavioral sciences.
• ISO 19115:2003: Describes geographic data such as maps and charts.
• More examples: http://www.lib.msu.edu/about/diginfo/collect.jsp
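To make "descriptive metadata" concrete, here is a minimal sketch of a simple Dublin Core record serialized as XML with Python's standard library. The 15 element names and the namespace URI come from the Dublin Core Metadata Element Set; the `metadata` wrapper element, the function name `dublin_core_record`, and the sample values in the usage note are assumptions for illustration, not a repository's required schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core fields to an XML string.

    A value may be a string or a list of strings (repeated element).
    The <metadata> wrapper is an illustrative choice, not a standard.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = item
    return ET.tostring(root, encoding="unicode")
```

Calling `dublin_core_record({"title": "Usability Survey", "date": "2012-04-03", "format": "text/csv"})` returns a single XML string with one `dc:` element per field; the values shown are hypothetical.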
Use In-Line Code Commentary
Example of R code commentary
# Cumulative normal distribution: P(Z <= z) for z = -1.96, 0, 1.96
pnorm(c(-1.96, 0, 1.96))
• If applicable, in-line code commentary helps explain code
File Organization Practices: Overview
1. Design a file plan for your research project
2. Use file naming conventions that work for your project
3. Choose file formats to maximize usefulness
“When I was a freshmen I named my assignments Paper Paperr Paperrr Paperrrr”-Undergrad
Design a File Plan
• File structure is the framework
• Classification system makes it easier to locate folders/files
• Benefits:
– Simple organization intuitive to team members and colleagues
– Reduces duplicate copies in personal drives and e-mail attachments
Design a File Plan
Choose a sortable directory hierarchy
• Example 1: Investigator, Process, Date
Collie/TEI_Encoding/20110117
• Example 2: Instrument, Date, Sample
Usability Survey/2012043/sample_1
Design a File Plan
Example documentation of Directory Hierarchy: /[Project]/[Grant Number]/[Event]/[Investigator/Date]
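A documented hierarchy is easiest to follow when folder paths are generated rather than typed by hand. The sketch below builds the Investigator, Process, Date layout from Example 1; the function name `project_dir` and the `root` parameter are assumptions for illustration, and the same approach would apply to any other documented level order.

```python
from datetime import date
from pathlib import PurePosixPath


def project_dir(investigator, process, when, root="/"):
    """Return root/investigator/process/YYYYMMDD as a POSIX-style path.

    The YYYYMMDD folder name keeps date folders in chronological
    order when they are sorted alphabetically.
    """
    for part in (investigator, process):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath(root) / investigator / process / when.strftime("%Y%m%d")


if __name__ == "__main__":
    # Prints: /Collie/TEI_Encoding/20110117
    print(project_dir("Collie", "TEI_Encoding", date(2011, 1, 17)))
```

The function only computes the path; creating the folder on disk could be done separately, for instance with `pathlib.Path(...).mkdir(parents=True, exist_ok=True)`.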
Use File Naming Conventions
– Enable better access/retrieval of files
– Create logical sequences for file sorting
– More easily identify what you’re searching for
Use File Naming Conventions
• Meaningful but short (255-character limit)
• Use alphanumeric characters
– Example: abc123
• Capital letters or underscores differentiate between words
• Surname first followed by initials of first name
Use File Naming Conventions
• Year-month-day format for dates, with or without hyphens
Example 1: 2006-03-13
Example 2: 20060313
• Decide on a simple versioning method
Example: file_v001
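A versioning method is simplest to keep consistent when the next name is computed rather than typed. The following is a minimal sketch that assumes the three-digit `_v001` suffix from the example above; the function name `bump_version` and the decision to start unversioned files at `_v001` are illustrative assumptions.

```python
import re
from pathlib import PurePath

# Matches a stem that ends in _v followed by exactly three digits.
_VERSIONED = re.compile(r"^(?P<base>.+)_v(?P<num>\d{3})$")


def bump_version(filename):
    """Return the next versioned file name (name only, no directories).

    "file_v001" becomes "file_v002"; a name without a version suffix
    gets "_v001" appended before its extension.
    """
    path = PurePath(filename)
    match = _VERSIONED.match(path.stem)
    if match is None:
        return f"{path.stem}_v001{path.suffix}"
    number = int(match.group("num")) + 1
    if number > 999:
        raise ValueError("three-digit version numbers are exhausted")
    return f"{match.group('base')}_v{number:03d}{path.suffix}"
```

Zero-padding matters here: `file_v010` sorts after `file_v009`, whereas unpadded `file_v10` would sort before `file_v9`.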
Use File Naming Conventions
• To create consistent file names, specify a template such as:
[investigator]_[descriptor]_[YYYYMMDD].[ext]
This: sharpeW_krillMicrograph_backscatter3_20110117.tif
Not this: KrillData2011.tif
This: borgesJ_collocation_20080414.xml
Not this: Borges_Textbase.xml
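The template above can be enforced mechanically. This sketch builds names from the [investigator]_[descriptor]_[YYYYMMDD].[ext] pattern and checks existing names against it. The regular expression is one possible reading of the template (lowercase surname, uppercase initials, alphanumeric descriptor words joined by underscores) and is an assumption, as are the function names.

```python
import re
from datetime import date, datetime

# One possible reading of [investigator]_[descriptor]_[YYYYMMDD].[ext]:
# lowercase surname + uppercase initials, alphanumeric descriptor words
# joined by underscores, an 8-digit date, and a lowercase extension.
NAME_PATTERN = re.compile(
    r"^(?P<investigator>[a-z]+[A-Z]+)"
    r"_(?P<descriptor>[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)"
    r"_(?P<date>\d{8})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def build_name(surname, initials, descriptor, when, ext):
    """Assemble a file name that follows the template."""
    investigator = f"{surname.lower()}{initials.upper()}"
    return f"{investigator}_{descriptor}_{when:%Y%m%d}.{ext.lower().lstrip('.')}"


def is_valid_name(filename):
    """True if the name matches the pattern and its date is a real date."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return False
    try:
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return False
    return True
```

Under these assumptions, both "This" examples from the slide pass `is_valid_name` and both "Not this" examples fail, which makes the check usable as a quick audit of a project folder.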
Choose Appropriate File Formats
• Non-proprietary
• Open, documented standard
• Common usage by research community
• Standard representation (ASCII, Unicode)
• Unencrypted
• Uncompressed
Choose Appropriate File Formats
Format Genre | Optimal Standards
TEXT | .txt; .odt; .xml; .html
AUDIO | .flac; .wav
VIDEO | .mp2/.mp4; .mkv
IMAGE | .tif; .png; .svg; .jpg
DATA | .sql; .csv
Storage & Backup Practices
1. Avoid single points of failure
2. Ensure data redundancy & replication
3. Understand common types of storage
(cc) George Ornbo
Data are at significant risk of loss without a storage and backup plan
Avoid Single Points of Failure
A single point of failure occurs when it would only take one event to destroy all data on a device
• Use managed networked storage when possible
• Move data off of portable media
• Never rely on one copy of data
• Do not rely on CD or DVD copies to be readable
• Be wary of software lifespans
Ensure Data Redundancy
• Effective data storage plan provides for 3 copies:
– Primary authoritative copy
– Secondary local backup
– Tertiary remote backup
• Geographically distribute and secure
– Local vs. remote, depending on needed recovery time
• Personal computer, external hard drives, departmental, or university servers may be used
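The three-copy plan above can be sketched as a copy-and-verify routine. This is a minimal illustration only: the function names `sha256_of` and `replicate` are assumptions, and the backup locations are plain directories standing in for a local backup drive and a mounted remote share. A real tertiary copy should live on separate hardware at a separate site.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(primary, backup_dirs):
    """Copy the primary file into each backup directory and verify it.

    Returns the primary file's checksum and the list of verified copies.
    Raises OSError if any copy does not match the primary's checksum.
    """
    primary = Path(primary)
    expected = sha256_of(primary)
    copies = []
    for directory in backup_dirs:
        target_dir = Path(directory)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / primary.name
        shutil.copy2(primary, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for backup copy: {target}")
        copies.append(target)
    return expected, copies
```

Recording the returned checksum in the project documentation gives a way to confirm later that none of the copies has silently changed.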
Ensure Data Redundancy
• Cloud storage
– Amazon S3
– Google
– MS Azure
– DuraCloud
– Rackspace
– Glacier
Note that many enterprise cloud storage services charge for data transfers in and out
$$$
Understand Common Types of Storage
• Optical Media
• Portable Flash Media
• Commercial Hard Drives
• Commercial NAS
• Cloud Storage
• Enterprise Network Storage
• Trusted Archival Storage
Understand Common Types of Storage
• Features of storage types:
– Portable data transfers
– Short-term storage
– Project term storage
– Networked data transfer
– Long-term storage
– Reliable backup option
Understand Common Types of Storage
Storage Type | Portable Data Transfer | Short Term Storage | Project Term Storage | Networked Data Transfer | Long Term Storage | Reliable Backup Option
Optical Media | ✔ | ✗ | ✗ | ✗ | ✗ | ✗
Portable Flash Media | ✔ | ✔ | ✗ | ✗ | ✗ | ✗
Commercial Hard Drives | ✔ | ✔ | ✔ | ✗ | ✗ | ✗
Commercial NAS | ✗ | ✔ | ✔ | ✔ | ✗ | ✗
Cloud Storage | ✗ | ✔ | ✔ | ✔ | ✗ | ✗
Enterprise Network Storage | ✗ | ✔ | ✔ | ✔ | ✔ | ✔
Trusted Archival Storage | ✗ | ✗ | ✗ | ✔ | ✔ | ✔
Understand Common Types of Storage
Media Storage @ MSU
• Optical Media
– MSU Computer Store: Sells optical media and hardware accessories
– UAHC Media Storage Service: Offers physical lock-box like storage for MSU
• Flash Media
– MSU Computer Store: Sells optical media and hardware accessories
– UAHC Media Storage Service: Offers physical lock-box like storage for MSU
• Commercial Hard Drives
– MSU Computer Store: Sells optical media and hardware accessories
– UAHC Media Storage Service: Offers physical lock-box like storage for MSU
• Enterprise Cloud Storage
– Angel: Free. Ideal for collaboration; not storage space. Phase out 2015
– Desire2Learn: Free. Ideal for collaboration; not storage space. Replaces Angel
– GoogleApps: Free. Ideal for collaboration; not intended as storage space
• Enterprise Network Storage
– AFS Space: Free to 1GB, add’l space can be purchased w/dept. account
– IT Services Individual, Mid-Tier and Enterprise Storage: Fee based
– HPCC Home or Research: Free up to 1TB. Fee based additions available
• Trusted Archival Storage
– Disciplinary Repositories: Disciplinary repositories offer archival services for pertinent research data.
Data Publishing, Sharing, Reuse
1. Time-intensive, with potentially high return on investment
2. Publish data in several data publication venues to more broadly share results of research
Research datasets on par with peer-reviewed journal articles as first-class scholarly contributions
Sharing & Publishing Data
• Data preparation for sharing and publication is a time-intensive process
• Potential positive outcomes:
– Increased research impact and citations
– Enable additional scientific inquiry
– Opportunities for co-authorship and collaboration
– Enhance your grant proposal’s competitiveness
Data Publication Venues
• Multiple ways to publish research data
– Faculty or project website
– Journal supplementary materials
– Disciplinary data repository (data archive)
• Varying levels of support for indexing, access controls, and long-term curation
Data Publication Venues
• Disciplinary Data Repository
– Securely share data, ensure long-term access
– High visibility
– Often offer persistent citations
– Availability varies across domains
– Databib.org directory
Protecting Data & Responsible Reuse
1. Consider how to protect data and intellectual property rights while encouraging reuse
2. Keep in mind ethical concerns when sharing data
(cc) Will Scullin
Intellectual Property
• IP refers to exclusive rights of creators of works
• Individual data cannot be protected by US copyright
• Organization of data such as database, creative work produced by data, and research instruments used may be protected ©
Intellectual Property
• Principal investigator’s institution holds IP rights
• Provide clearly stated license for producing derivatives, reusing, and redistributing datasets
• License under Creative Commons
• State if any restrictions or embargos on use
• Provide example of how work should be cited to encourage proper attribution on reuse
• Document any IP / copyright issues
Ethics & Data Sharing
• Keep in mind the following ethical concerns when sharing your data:
– Privacy
– Confidentiality
– Security and integrity of the data
• For data involving human subjects, obtain written permission or consent stating how the data may be reused
Best Practices = High Impact Data
• File organization ensures easier access and retrieval of data
• Documentation makes datasets accessible and intelligible to users
• Storage and backup safeguards data
• Data publishing and sharing encourages the most widespread reuse of data
• Data protection ensures responsible reuse
http://www.lib.msu.edu/rdmg
Contact
Aaron Collie
@aaroncollie
http://www.lib.msu.edu/rdmg