Data Management 101
![Page 1: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/1.jpg)
Data Management 101
Kristin Briney, PhD
Data Services Librarian
![Page 2: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/2.jpg)
Do You Still Have Your Data?
• What if your hard drive crashes?
• What if you are accused of fraud?
• What if your collaborator abruptly quits?
• What if the building burns down?
• What if you need to use your old data?
• What if your backup fails?
• What if your computer gets stolen?
• What if…
![Page 3: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/3.jpg)
What Are Data?
• Observational
– Sensor data, telemetry, survey data, sample data, images
• Experimental
– Gene sequences, chromatograms, toroid magnetic field data
• Simulation
– Climate models, economic models
• Derived or compiled
– Text and data mining, compiled database, 3D models, data gathered from public documents
![Page 4: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/4.jpg)
Why Data Management?
• Don’t lose data
• Find data more easily
– Especially if you need older data
• Easier to analyze organized, documented data
• Avoid accusations of fraud & misconduct
• Get credit for your data
• Don’t drown in irrelevant data
![Page 5: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/5.jpg)
For each minute of planning at the beginning of a project, you will save 10 minutes of headache later
![Page 6: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/6.jpg)
What This Session Covers
• Introduction to a few topics in data management
– File organization conventions
– Documentation
– Storage and backups
![Page 7: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/7.jpg)
What This Session Covers
• Hands-on exercises in each topic
• My goal is to offer practical, usable solutions
– Recognize that I can’t cover everything
![Page 8: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/8.jpg)
Introduce Yourself!
• Name
• Department
• Most common data format
– Text, Excel, SPSS, Google Docs, etc.
![Page 9: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/9.jpg)
FILE ORGANIZATION CONVENTIONS
![Page 10: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/10.jpg)
File Naming Conventions
• Make it easier to find files
• Avoid many duplicates
– Especially when you’re not sure which is the latest or most correct!
• Make it easier to wrap up a project because you know which files belong to it!
![Page 11: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/11.jpg)
File Naming Conventions
• Files should be named consistently
• File names should be descriptive but short (<25 characters)
• Use underscores instead of spaces
• Avoid these characters: " / \ : * ? ' < > [ ] & $
• Use the file dating convention: YYYY-MM-DD
– This works well with a lab notebook
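The rules above can be sketched as a small helper that builds and checks names. This is only an illustration: the function names are hypothetical, and two details are assumptions rather than anything the slides state, namely that the date goes at the start of the name and that the 25-character limit counts the extension.

```python
import re
from datetime import date

# Characters the slide says to avoid (straight-quote forms assumed).
FORBIDDEN = set('"/\\:*?\'<>[]&$')
MAX_LENGTH = 25  # assumption: the "<25 characters" limit includes the extension


def make_filename(day: date, description: str, extension: str) -> str:
    """Build a name such as 2013-04-15_ctrl.jpg (date first is an assumption)."""
    # Drop forbidden characters, then turn runs of whitespace into underscores.
    cleaned = "".join(ch for ch in description if ch not in FORBIDDEN)
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    return f"{day.isoformat()}_{cleaned}.{extension}"


def check_filename(name: str) -> list:
    """Return a list of convention violations (empty if the name passes)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if any(ch in FORBIDDEN for ch in name):
        problems.append("contains a forbidden character")
    if len(name) >= MAX_LENGTH:
        problems.append("is 25 characters or longer")
    if not re.match(r"\d{4}-\d{2}-\d{2}", name):
        problems.append("does not start with a YYYY-MM-DD date")
    return problems
```

Starting names with an ISO-style date has the side benefit that an alphabetical sort is also a chronological sort.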
![Page 12: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/12.jpg)
File Versioning
• Why?
– If you only have one copy and you make a mistake…
– If your data are stored in multiple locations
![Page 13: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/13.jpg)
File Versioning
• For analyzed data, use version numbers
• Save files often to a new version
• Label the final version FINAL
• For code, consider Git or SVN
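One way to make "save often to a new version" routine is to compute the next name automatically. The sketch below assumes a `_vNN` suffix (for example `analysis_v02.csv`); that suffix pattern and the function name are illustrative choices, not a convention the slides prescribe.

```python
import re
from pathlib import Path

# Assumed convention: the stem ends in _v followed by digits, e.g. analysis_v02
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version_name(filename: str) -> str:
    """Return the file name for the next version (name only, no directory)."""
    path = Path(filename)
    match = VERSION_PATTERN.match(path.stem)
    if match:
        stem = match.group("stem")
        number = int(match.group("num")) + 1
    else:
        # No version suffix yet: start numbering at 1.
        stem = path.stem
        number = 1
    # Zero-pad to two digits so names sort in version order up to v99.
    return f"{stem}_v{number:02d}{path.suffix}"
```

Copying the current file to the name this returns before editing leaves every earlier version untouched.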
![Page 14: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/14.jpg)
File Organization
• Any system is better than none
• Possibilities
– One project, one folder (for small projects)
– Separate folders for data or project stages
– Separate folders for different types of data
– Date-based folders (pairs well with a lab notebook!)
![Page 15: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/15.jpg)
What To Avoid
• One-person data hoards
• Data scattered across several machines
– Not backups! Backups are fine
• Storage that doesn’t mirror “ownership”
– If it’s communal, it belongs in a communal place
– If data collection happens on an individual’s machine, that doesn’t mean the data should stay there!
![Page 16: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/16.jpg)
Document Your Conventions
• No point in having a system without documentation
– README.txt
• Use .txt over .doc because it’s more durable
– Front cover of research notebook
– A printout by the computer
– Etc.
![Page 17: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/17.jpg)
Document Your Conventions
• In project-wide README.txt
– Basic project information
• Title
• Contributors
• Grant info
• etc.
– Contact information for at least one person
– All locations where data live, including backups
– Useful information about the files and how they’re organized
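A project-wide README.txt with the fields listed above can be generated from a small template. The sketch below is one possible layout; the function name, field labels, and ordering are assumptions made for illustration.

```python
def build_readme(title, contributors, grant, contact, locations, notes):
    """Return the text of a project-wide README.txt as a single string."""
    lines = [
        f"Title: {title}",
        "Contributors: " + ", ".join(contributors),
        f"Grant info: {grant}",
        f"Contact: {contact}",
        "",
        "Data locations (including backups):",
    ]
    # One line per place the data live, so nothing gets forgotten.
    lines += [f"  - {place}" for place in locations]
    lines += ["", "File organization:", notes]
    return "\n".join(lines) + "\n"
```

The returned string can be saved with `pathlib.Path("README.txt").write_text(...)`, which keeps the file in the plain-text format recommended on the previous slide.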
![Page 18: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/18.jpg)
Exercise: File Naming Conventions
• Develop a file naming convention for your most common data type
![Page 19: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/19.jpg)
DOCUMENTATION
![Page 20: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/20.jpg)
What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them?
![Page 21: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/21.jpg)
Documentation
• Consider the differences between
– someone inside your lab
– someone outside your lab but in your field
– someone outside your field
• Two parts: metadata and methods
![Page 22: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/22.jpg)
Documentation
Methods
• How the data were gathered
• How the data should be interpreted
• What you did
– Limitations on what you did
• …build trust in your data
Metadata
• What you’re looking at
• Who made it and when
• How it got there
• What it means and
• What you can do with it
• …before you even look at the file
![Page 23: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/23.jpg)
Methods
• Examples of methods to document
– Code
– Survey
– Codebook
– Data dictionary
– Anything that lets someone reproduce your results
• Don’t forget the units!
![Page 24: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/24.jpg)
Metadata
• Informal and formal description of data
• Informal standard can fit your unique research
• Benefits of a formal standard
– Completeness
– Aids in sharing
– Often required for deposit into a repository
• May be required by your funder
![Page 25: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/25.jpg)
Metadata
• Tons of formal standards available across many, many disciplines
• Consult
– Disciplinary repository
– Your peers
– Subject librarian
– Data Services Librarian
![Page 26: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/26.jpg)
Metadata
• Decide on a metadata standard before you collect the data!
– Easier to record metadata when collecting data than to convert later
• Standard or no, keep metadata CONSISTENTLY and COMPUTABLY whenever you can
![Page 27: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/27.jpg)
Metadata Standard: Dublin Core
• contributor
• coverage
• creator
• date
• description
• format
• identifier
• language
• publisher
• relation
• rights
• source
• subject
• title
• type
![Page 28: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/28.jpg)
Metadata Example
• Contributor
– Jane Collaborator
• Creator
– Kristin Briney
• Date
– 2013 Apr 15
• Description
– A microscopy image of cancerous breast tissues under 20x zoom. This image is my control, so it has only the standard staining described on 2013 Feb 2 in my notebook.
• Format
– JPEG
• Identifier
– IMG00057.jpg
• Relation
– Same sample as images IMG00056.jpg and IMG00055.jpg
• Subject
– Breast cancer
• Title
– Cancerous breast tissue control
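To keep a record like this consistent and computable, it can be stored as a small JSON file next to the image. The sketch below reuses the values from the example above and checks the keys against the fifteen Dublin Core elements listed earlier. The JSON "sidecar" layout and the function name are assumptions, and the date is rewritten as 2013-04-15 to match the YYYY-MM-DD convention.

```python
import json

# The fifteen Dublin Core elements from the earlier slide.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def to_sidecar_json(record: dict) -> str:
    """Serialize a metadata record, rejecting keys that are not Dublin Core."""
    unknown = set(record) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return json.dumps(record, indent=2, sort_keys=True)


# Values taken from the slide's example record.
record = {
    "contributor": "Jane Collaborator",
    "creator": "Kristin Briney",
    "date": "2013-04-15",
    "format": "JPEG",
    "identifier": "IMG00057.jpg",
    "relation": "Same sample as images IMG00056.jpg and IMG00055.jpg",
    "subject": "Breast cancer",
    "title": "Cancerous breast tissue control",
}
example_json = to_sidecar_json(record)
```

Rejecting unknown keys catches drift such as using "author" in one file and "creator" in another, which is what makes the records searchable by a script later.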
![Page 29: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/29.jpg)
Exercise: Documentation
• For your most common data type, make a list of the most important information to record for each dataset
![Page 30: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/30.jpg)
STORAGE AND BACKUPS
![Page 31: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/31.jpg)
A Note on Security
• Do your data fall under the following?
– HIPAA
• Health information
– FERPA
• Student information
– FISMA
• Government subcontractor
– Human subject research, etc.
Ask for help!
![Page 32: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/32.jpg)
A Note on Security
• Secure storage
• Controlled access
• De-identification of personal information
• Security training
![Page 33: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/33.jpg)
UWM Security Resources
• UWM Information Security Office
– Visit: https://www4.uwm.edu/itsecurity/
– Email: [email protected]
• Certificate in Information Security
• HIPAA
– https://www4.uwm.edu/legal/hipaa/index.cfm
• FERPA
– http://www4.uwm.edu/academics/ferpa.cfm
![Page 34: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/34.jpg)
Storage
• Library motto: Lots of Copies Keeps Stuff Safe!
• Rule of 3: 2 onsite, 1 offsite
• Storage run by experts is more reliable than storage you run yourself
– It costs more, but that’s for a reason
![Page 35: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/35.jpg)
Storage Options
• Computer
• USB/flash drive
• CDs/DVDs
• External hard drive
• Shared drives/servers
• Tape backup
• Cloud storage
![Page 36: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/36.jpg)
Your Computer
• You’re using it, but should you keep data on it?
– What happens if you lose it?
– What happens if it is stolen?
– What happens if it breaks?
– Will the data stay there as long as you are required to keep them?
• Don’t be disorganized
• Don’t keep sensitive data here
• Verdict: By itself it is not enough
![Page 37: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/37.jpg)
USB/Flash Drive
• Pros
– Small, convenient package
– Big enough for a wide variety of datasets
• Cons
– Will you remember to back your data up onto it?
– Easy to lose
– Easy to perpetuate out-of-date copies
• Verdict: good for data transport, but not for backup
![Page 38: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/38.jpg)
CD-ROMs/DVD-ROMs
• Pros
– More reliable (but if one does fail, you won’t know until it’s too late)
– Portable
• Cons
– Will you remember to back your data up onto it?
– Hassle to deal with
– Slow to write to
– Difficult to keep track of old copies
• Verdict: Not good for quick backup, and just okay for periodic offsite backup
![Page 39: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/39.jpg)
External Hard Drive
• Pros
– Relatively cheap
– Large storage capacity
• Cons
– You have to set up, maintain, and audit it yourself
– Some brands are less reliable
– Disorganization a problem
• Verdict: Coupled with automatic-backup software, an okay choice for onsite backup
– You’ll still want a second backup offsite
![Page 40: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/40.jpg)
Shared Drives/Servers
• Pros
– Keeps data off your easily-stolen laptop
– Not your problem to manage
– Shared costs typically mean lower costs
• Cons
– Who’s managing the thing? Are they competent?
– Can have storage quotas
– Can be hard to get to outside the lab or the office
• Verdict: If well-managed, a good choice for regular use, onsite, or offsite backup
– Beware the dusty Linux box under the desk!
![Page 41: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/41.jpg)
Tape Backup
• Pros
– Can happen near-invisibly
– Highly reliable
– Tolerably secure (not always on network)
• Cons
– Can be hard or slow to get data back
– Not always audited as often as they should be
• Verdict: Good for onsite or offsite backup, if somebody else is running them and you know they’re regularly audited
![Page 42: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/42.jpg)
Cloud Storage
• Pros
– Convenient syncing
– Cheap
– If client-side encryption is involved, decently secure
• Cons
– Requires network connection
• Possible security risks and inconvenience if off-network
– Ongoing (and out of your control) costs
– Your backup is hostage to their business risks
– Reliability, security, privacy not guaranteed
• Verdict: For savvy shoppers, fine for offsite backup. A little risky for your only backup.
![Page 43: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/43.jpg)
Exercise: Storage
• Conduct a quick inventory of your data
– What datasets do you have?
– How big are they?
• Inventory where your files are currently stored, including backups. How safe are your data?
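For data that already sit in folders, a short script can answer "what do you have and how big is it?". The sketch below assumes each top-level folder under a project root is one dataset; that assumption and the function name are illustrative only.

```python
from pathlib import Path


def inventory(root):
    """Return {top-level folder or file name: total size in bytes} under root."""
    root = Path(root)
    sizes = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Add up every file below this folder, at any depth.
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
        elif entry.is_file():
            sizes[entry.name] = entry.stat().st_size
    return sizes
```

Running it once per storage location (laptop, shared drive, external disk) and comparing the results also shows which datasets exist in only one place.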
![Page 44: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/44.jpg)
Backups
• Any backup is better than none
• Automatic backup is better than manual
• Your research is only as safe as your backup plan
– Lots of horror stories here
![Page 45: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/45.jpg)
Ideal Backup Characteristics
• Low effort
• High reliability
• As secure as necessary
– Tradeoffs between security and convenience
• As open as possible to collaborators
• Well organized
![Page 46: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/46.jpg)
Check Your Backups
• Backups are only as good as your ability to recover data
• Test your backups periodically
– Preferably on a fixed schedule
– 1 or 2 times a year may be enough
– Bigger/more complex data should be checked more often
• Test your backup whenever you change things
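One possible way to test a backup is to restore it to a scratch folder and compare checksums against the working copy. The sketch below uses SHA-256 from Python's standard library; the function names and the "missing"/"differs" report format are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup(original_dir, backup_dir):
    """Return relative paths that are missing from or differ in the backup."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(source) != sha256_of(copy):
            problems.append(f"differs: {relative}")
    return problems
```

An empty list means every file in the working copy was recovered intact. A "differs" entry can simply mean the file was edited after the backup ran, so it is a prompt to look rather than proof of corruption.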
![Page 47: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/47.jpg)
A Final Note
• Must retain data at least 3 years post-project per OMB Circular A-110
– Better to retain for >6 years
• Consider letting someone else worry about this
– A disciplinary repository
– The UWM Digital Commons
![Page 48: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/48.jpg)
Exercise: Backups
• Sketch out your ideal backup system, and identify the first step in getting there from your current system.
![Page 49: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/49.jpg)
WHERE TO GO FROM HERE
![Page 50: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/50.jpg)
Where to Go from Here
• Talk to your coworkers
– …but be aware you might not be able to change things
– Discuss
• Common schemes for metadata and file naming
• Centralized documentation
• Robust backup
• Use good practices and be a model for others
![Page 51: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/51.jpg)
UWM Resources
• Data management resources
– dataplan.uwm.edu
• Information Security Office
– www4.uwm.edu/itsecurity/
• Data Services Librarian
– Kristin Briney, [email protected]
![Page 52: Data Management 101](https://reader036.fdocuments.us/reader036/viewer/2022081516/54c6711e4a7959ec028b4590/html5/thumbnails/52.jpg)
Thank You!
• This presentation is available under a Creative Commons Attribution (CC-BY) license
• Some content courtesy of Dorothea Salo
– http://dsalo.info/
– http://www.graduateschool.uwm.edu/research/researcher-central/proposal-development/data-plan/boot-camp/