RESEARCH DATA MANAGEMENT AND THE DATA LIFECYCLE/67531/metadc1010750/m2/1/high_r… · UNT Scholarly...

Douglas Burns
Research Data and GIS Librarian
University of North Texas Libraries
Willis Library, Room 155
940-369-6456
[email protected]
[email protected]

Pamela Andrews
Repository Librarian
University of North Texas Libraries
Willis Library, Room 356
940-891-6703
[email protected]

RESEARCH DATA MANAGEMENT AND THE DATA LIFECYCLE



AGENDA

• What are data?

• Understanding data management / lifecycle

• Why is data management important?

• Data management plans

• Data management planning tools

• Organizing data & file management

• Formats & documentation

• Storage & security

• Sharing, archiving & metadata

• What UNT has to offer

Based on the Research Data Management and Sharing course (Coursera, 2017)


What are data?

Facts and statistics collected together for reference or analysis


What are data?

“Most of the useful data in the world, from economic data to news content to geographic information, lives somewhere on the internet…”

From DataCamp e-mail advertising their “Working with Web Data in R” course (received 10/4/2017)


Library Data Sources

http://www.library.unt.edu/


A Not Comprehensive List of Data Sources

(In collaboration with other ASEAN nations)

(Uses Socrata, a licensed platform)

(Commercial subscription)


Even Bigfoot Has Data

Source: bfro.net


Let’s Evaluate a Bigfoot Datapoint…

Source: http://www.bfro.net/gdb/show_report.asp?id=28568

OBSERVED: After a late concert at the square in Canton, Illinois, my friend was bicycling home and turned right into the parking lot, that leads past a shed, fence, and dumpster.

Hearing a shuffling bang noise near the dumpster, she slowed her bike and began looking. Behind the shed was more than a noise. A shadowy sasquatch shape. Its puffy balloon feet (as she described it), held up a stocky, not so tall frame (6 ft.).

Startled, it turned and walked into the bushes, moving branches to get into the woods. This was a thinly wooded area, and on the other side, a corn field.

So freaked out she said, "what is that?", and sped right home and told me. I could tell she was worried, and by her description, this did happen.…But that noise, was it inside the dumpster?

So I slowly snuck around the corner. Only one cover of the four covers was open on the dumpster, so when I was ready, I shown the light through it. A small [raccoon] was trapped. Unable to climb out of an almost empty dumpster. It was out of breath, and sad looking, so I put small cans and an old sign in to give it a ladder out.

I went back later that evening and the [raccoon] was out and to check the shadows around there. We think it heard the [raccoon] also, and may have ben trying to find a way inside. We don’t know if it was hungry, or just trying to help the [raccoon] out.


Crowdsourcing?

North Korea (DPRK) Mission, 1950s topos

Digital Humanitarians

Haiti Earthquake Crisis Response


Belief: To Trust or Not to Trust

Source: http://theoatmeal.com/comics/believe

Be a healthy skeptic.


A Not Comprehensive List of Data Programs


QUESTIONS?


Understanding Data Management

PROCESS WHEREBY DATA IS CONTROLLED TO ACHIEVE A DESIRED GOAL


Data Management: Data Lifecycle

Creating Data

• design research

• plan data management (formats, storage)

• plan consent for sharing

• locate existing data

• collect data (experiment, observe, measure, simulate)

• capture and create metadata

Processing Data

• enter data, digitize, transcribe, translate

• check, validate, clean data

• anonymize data where necessary

• describe data

• manage and store data

Analyzing Data

• interpret data

• derive data

• produce research outputs

• author publications

• prepare data for preservation


Data Management: Data Lifecycle

Preserving Data

• migrate data to best format

• migrate data to suitable medium

• back-up and store data

• create metadata and documentation

• archive data

Giving Access to Data

• distribute data

• share data

• control access

• establish copyright

• promote data

Re-using Data

• follow-up research

• new research

• undertake research reviews

• scrutinize findings

• teach and learn


QUESTIONS?


Why Data Management Is Important

• Think of (potential) stakeholders

• Evolving funding requirements or policies

• Increases transparency

• Reproducibility enhances research quality / authority

• Good practice

• Documentation reduces the “What did we do to get that?”

• Succession planning…

• Facilitates knowledge transfer

• Boosts long-term efficiency

• Encourages accountability

• Supports collaboration

  • Ex: https://www.researchgate.net/


In other words: you don’t want to get caught with your pants down!


QUESTIONS?


Data Management Plan(s)

Why we don’t have DMPs:

• Data can be intimidating…

• But we have lots of other plans… (not another one!)

• Takes time – who has that!?*

• “It’ll work out… somehow, right?”

• “We plan events… but react to data”…?

*time is a function of priorities


Data Management Plan(s)

What it is:

• A document describing planned steps to manage data

• Usually a part of the larger research agenda

• A “contract” for stakeholders

• Protection for when things go awry, because they will

• A way to simplify your research

What it is not:

• Rocket science

• Nuclear physics

• Unsolicited relationship advice

• A laughing matter

Stupid jokes aside, your plan helps set the tone for future success!


Example 1: The Good

Source: https://dmptool.org/plans/5119.pdf


Example 2: The Bad

Source: https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year

Estimated GDP cost due to bad data? $3 trillion

Additionally:

• ~50% of time spent looking for data

• ~60% of time spent cleaning/organizing data

• ~75% potential inefficiency because of poor data structures


Example 3: The Ugly

“I’m gonna do the stuff to the things because reasons.”


Questions To Ask About Your Project

• What types of data will be produced? Will they be reproducible? What would happen if they were lost or became unusable later?

• How much data will there be and at what growth rate? How often will the data change?

• Who will use your data now, and later?

• Are there tools or software needed to create/process/visualize the data?

• Are there regulations, copyright, or other licensing concerns related to sharing the data?

• Do the data need to be restricted or embargoed for intellectual property reasons?

• Are there any reasons to not allow re-use?


QUESTIONS?


Data Management Planning Tool

Source: https://dmptool.org/ and http://www.library.unt.edu/datamanagement/plans


QUESTIONS?


Organizing Data & File Management

STAYING ORGANIZED SAVES TIME

& MAKES LIFE EASIER

• README.txt

• Code book

• Standardized folder structure or file name scheme

  • 20170928 or 09282017?

• Versioning

  • Filename_v2.pdf

  • Ex: GitHub

• Files and formatting

  • *.csv, *.tsv or *.xlsx?

  • Numeric or text

  • *.docx or PDF?
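The "20170928 or 09282017?" question can be made concrete with a short sketch. It assumes a hypothetical naming scheme of project, description, year-first date, and zero-padded version; the function name, the project label, and the scheme itself are illustrative and not something the slides prescribe.

```python
from datetime import date
import re


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Return a name such as 'rrdata_survey-results_20170928_v02.csv' (hypothetical scheme)."""
    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(description)}_{day:%Y%m%d}_v{version:02d}.{ext.lstrip('.')}"


names = [
    build_filename("RRData", "Survey Results", date(2017, 9, 28), 2, "csv"),
    build_filename("RRData", "Survey Results", date(2016, 12, 1), 1, "csv"),
]

# Year-first dates sort chronologically when sorted as plain text.
print(sorted(names))

# Month-first dates do not: the 2017 file sorts ahead of the 2016 file.
print(sorted(["09282017", "12012016"]))
```

Writing the date as YYYYMMDD keeps a plain alphabetical folder listing in chronological order, which is one practical reason to prefer 20170928 over 09282017.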


Organizing Data & File Management

Railroad_RailroadCommisionofTexas_September212017

rrrd_RCT_2017


Organizing Data & File Management

COLLECTION METHODS:

• Each project will vary, but…

• Quantitative vs. Qualitative

• One time? Longitudinal?

• Eventual delivery method

  • Online different than print

• Ease of use/access

• How standardized?

COLLECTION PLATFORMS:

• Qualtrics (UNT has account)

• iForm

• OpenDataKit

• Zoho Forms

• CKAN

• Google Forms*

• SurveyGizmo

• Hand collect / data entry

Note: this is not an endorsement of a particular software or methodology.

AFFECT

Something to think about:

“…several issues tend to get conflated into one argument – open-source vs. closed-source, free vs. paid-for, restrictive vs flexible licensing, supported vs. unsupported, code quality…”

Note: while “open” may be preferred, the point here is that you should pick the best tool to get the best results. Remember: there is no such thing as a free lunch. This op-ed piece discusses that reality and is the source of the quote above.


QUESTIONS?


Formats & Documentation

• Accessible format characteristics:

  • Non-proprietary

  • Open, documented standard

  • Common usage by your community

  • Standard representation

    • Unicode / ASCII

  • Unencrypted, uncompressed

• Licensing / copyright?

  • CC-BY or other

  • Fair use

• Sustainable data formats:

  • PDF

  • ASCII

  • TIF or JPEG2000

  • XML or RDF

  • MPEG-4

Source: http://inside.mines.edu/RSS-sustainable-data-formats
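As one illustration of the characteristics above (non-proprietary, open standard, Unicode, unencrypted, uncompressed), the following sketch writes tabular data as plain CSV text and encodes it as UTF-8 using only Python's standard library. The column names and values are made up for the example.

```python
import csv
import io

# Hypothetical records; in practice these would come from your own collection step.
rows = [
    {"site_id": "A01", "observed_on": "2017-09-28", "count": 12},
    {"site_id": "B07", "observed_on": "2017-09-29", "count": 7},
]


def to_csv_text(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows, ["site_id", "observed_on", "count"])

# Encode explicitly so the saved bytes do not depend on the operating system's default.
csv_bytes = csv_text.encode("utf-8")
print(csv_text)
```

A file saved this way can be opened in any text editor or spreadsheet program, which is the main argument for *.csv over *.xlsx as a long-term format.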


Formats & Documentation


Formats & Documentation


QUESTIONS?


Storage & Security


Storage & Security

While convenient… a special note on non-UNT cloud storage options: “…UNT legal agreements enable [UNT] to hold the provider accountable instead of you.”

Sources: https://itservices.cas.unt.edu/services/file/non-unt-storage/understanding-non-unt-storage-cloud and https://itservices.cas.unt.edu/services/file/non-unt-storage/understanding-cloud-storage-information-roles

What does this mean?

The State of Texas defines an information owner as a “person with statutory or operational authority for specified information…”

If data is hacked or lost, YOU are responsible!


Storage & Security

• 3-2-1 principle:

  • 3 different copies of your data

  • on at least 2 different media

  • with 1 at a different location

• External repositories can help

• Encryption? Locked cabinet?

• Security:

  • Keep data safe from corruption

  • Anti-virus software

  • Be mindful of phishing attempts

• Control who has access

  • Active directory for log-in

  • Written policy / guidelines

• Remember: even an un-networked computer is still vulnerable
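Keeping three copies only helps if the copies are still identical to the original. One common way to check this, sketched below, is to compare SHA-256 checksums. The file names are throwaway examples created in a temporary folder, and the helper names are hypothetical. Note that a checksum only detects corruption; it does not by itself satisfy the "2 media, 1 offsite" parts of the principle.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: list) -> dict:
    """Map each copy's path to True when its checksum equals the original's."""
    expected = sha256_of(original)
    return {
        str(candidate): candidate.exists() and sha256_of(candidate) == expected
        for candidate in copies
    }


# Demonstration with temporary files standing in for real backup locations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "survey_20170928_v01.csv"
    original.write_text("site_id,count\nA01,12\n", encoding="utf-8")

    good_copy = root / "external_drive_copy.csv"
    good_copy.write_bytes(original.read_bytes())

    bad_copy = root / "offsite_copy.csv"
    bad_copy.write_text("site_id,count\nA01,21\n", encoding="utf-8")  # simulated corruption

    report = copies_match(original, [good_copy, bad_copy])

for name, ok in report.items():
    print(Path(name).name, "OK" if ok else "MISMATCH")
```

Running a check like this on a schedule, and recording the results in the project's README, is one way to notice a damaged backup before it is needed.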


Storage & Security

BACKUP! BACKUP! BACKUP!

BACKUP USING

MULTIPLE LOCATIONS

NOT HELPFUL!


Storage & Security

ON THE UPSIDE, THESE FORMATS AREN’T AS LIKELY TO BE HACKED…


QUESTIONS?


Sharing, Archiving & Metadata

Benefits:

• Reinforces scientific inquiry

• Verification and replication

• New research / methods

• Encourages diversity

• Provides teaching resources

• Reduces duplication

• Protects against fraud

• Enhances visibility

• Preserves for future use

• Helps others do better research

“A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes.” (Aarts, 2015)


Sharing, Archiving & Metadata

Challenges:

• Requires time & money

• Perceived risks from loss of control

• May be confidential

  • Health data often suppressed

• Unclear ownership / Intellectual Property

• Lack of incentives

“Sharing research data is… a conundrum.” –Christine Borgman


Metadata, Or Data About Data

List of metadata standards

http://iii.library.unt.edu/record=b5522057~S12

Additional reading: https://support.google.com/webmasters/answer/79812?hl=en and http://www.mequoda.com/articles/subscription_websites/understanding-the-role-of-metadata-in-google-visibility-5-best-practices/

NOTE: What is Google using? What about AI?
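To make "data about data" concrete, here is a minimal sketch of a metadata record for a dataset. It borrows a few element names from Dublin Core (title, creator, date, format, rights, description) as an assumption; the slides do not say which standard to use, and the dataset described, the required-field list, and the helper function are all hypothetical.

```python
import json

# Fields this sketch treats as required; an assumption, not a rule from any standard.
REQUIRED = ("title", "creator", "date", "format", "rights")

record = {
    "title": "Example survey responses",  # hypothetical dataset
    "creator": ["Researcher, Example"],
    "date": "2017-09-28",
    "format": "text/csv",
    "rights": "CC-BY",
    "description": "Illustrative record only; not a real dataset.",
}


def missing_fields(metadata, required=REQUIRED):
    """Return the required field names that are absent or empty."""
    return [name for name in required if not metadata.get(name)]


print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

A repository will normally supply its own form or schema; the point of the sketch is only that metadata is structured, checkable information stored alongside the data file.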


Metadata, Or Data About Data

Screenshots from Portal to Texas History editor


Metadata, Or Data About Data

Screenshots from ArcMap


QUESTIONS?


What does UNT Libraries have for YOU?

UNT Scholarly Works
UNT’s open-access, institutional repository for research, creative, and scholarly output from UNT community members

UNT Data Repository
A central archive for the research data of our UNT scholars. Can be linked to items within UNT Scholarly Works.

Boilerplate text for DMPs
Boilerplate text to insert within a DMP if using the UNT Data Repository


QUESTIONS?


Additional Resources: Americans’ Views on Open Government Data (benchmark: 2014-2015)


Questions: Longer List

• What types of data will be produced? Will they be reproducible? What would happen if they were lost or became unusable later?

• How much data will there be and at what growth rate? How often will the data change?

• Who will use your data now, and later?

• Who in your research group controls the data (PI, student, lab, Mines, funder)?

• How long will the data be active?

• What directory and file naming convention will be used?

• What project and data identifiers will be assigned?

• What file formats are to be used? Are they long-lived?

• What is your data storage and backup strategy?

• When will you publish the data (research) and where?

• Who might be interested in your data in the future? Who will you share it with?

• Who in your research group will be responsible for data management and archiving?

• Have you identified a repository or archive in which to deposit your data?

• Is there an ontology or other community standard for data sharing/integration?

• How will you prepare the data (if necessary) for archiving?

• Is there good project and data documentation?

• How long should the data be retained or archived (e.g. 3-5 years, 10-20 years, permanently)?

• Are there tools or software needed to create/process/visualize the data?

• Are there special privacy or security requirements (e.g. personal data, high-security data)?

• Are there sharing requirements (e.g. funder data sharing policy)?

• Are there other funder requirements (e.g. data management plan in proposal)?

• Are there regulations, copyright, or other licensing concerns related to sharing the data?

• Do the data need to be restricted or embargoed for intellectual property reasons?

• Are there any reasons to not allow re-use?