Post on 03-Jul-2020

SMART FROM THE START

OSU Libraries & Press

Clara Llebot Lorente May 7th, 2020

This is what a dissertation looks like in ScholarsArchive@OSU
https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/3484zn94s

OREGON STATE UNIVERSITY 1

Items
• Thesis in pdf
• Spreadsheets
• Movies
• Images
• Code
• Results
• …

Related dataset

https://ir.library.oregonstate.edu/concern/datasets/hm50tx752

Related dataset

https://ir.library.oregonstate.edu/concern/datasets/vd66w525h

3 take home messages

1. Share the data that you generate during your research

2. Share your data as an independent data record that can be cited

3. Manage your data well during your research

What about human subjects?

“As open as possible, as closed as necessary”

H2020 ORD Pilot, European Commission

Hanna Barczyk for NPR

What we’ll cover

• Why?
  o Why publish datasets?
  o Why do it in a separate record?
  o Why data management?

• How to engage in good data management practices?
  o Work with data ethically
  o Keep data safe
  o Keep data useful

What is…?

Data are … and metadata

• Samples
• Physical collections
• Maps
• Videos and photographs
• Interviews
• Model results
• …

• Experimental protocols
• Code
• Context information
• Lab notes
• …

What is data management?

Actions that contribute to effective storage, preservation and reuse of data and documentation throughout the research lifecycle.

• Data/computational science
• Database administration
• A research method

• What data to collect
• How to collect them
• How to design an experiment

Why data management?

1. Because it is good for YOU

• Increases research efficiency
• Saves time
• Increases visibility and impact

Piwowar & Vision, 2013 peerj.com/articles/175/

Why data management?

2. Because it is good for SCIENCE

• Accelerates scientific breakthroughs
• Preservation
• Accountability
• Reproducibility

Reproducibility and open data

Maki Naro thenib.com/repeat-after-me

How to do reproducible science?

Computational reproducibility: code, software, hardware

Statistical reproducibility: choice of statistical tests, model parameters, threshold values, etc.

Empirical reproducibility: details about non-computational empirical scientific experiments: open data.

https://doi.org/10.1101/143503

Why data management?

3. Because mandates from Federal agencies and other agencies require it.

• Data Management Plans

How to engage in good data management practices?

In each step of the research cycle…

1. How do we work with data ethically?

2. How do we keep data safe?

3. How do we keep data useful?

Think about Data Management and write a Data Management Plan.

Image credits: https://www.dataone.org/data-life-cycle

What does it mean to work with data ethically?

The DataONE data life cycle

1. Data ethics

Protect research subjects and other sensitive data.

Hanna Barczyk for NPR

Research misconduct:
• Data falsification
• Data fabrication
• Data plagiarism

1. Data ethics

What can you do with your data? Can you share it? When? How?

Legal framework + formal agreements

Ownership

Researchers are usually NOT the owners of research data, BUT they can use the data for career advancement and are responsible for it.

PI is the data custodian

Funder requirements

Human Subjects Research

By Mark Warner

Institutional policies

1. Take home message

Talk with your team members about expectations for your project’s data

1. Responsibilities
2. Internal data sharing
3. External data sharing
4. Expectations in the lab/field of research/good researcher

1. Data ethics: give attribution

Science is collaborative…

How to give credit to all?

Authorship in scientific publications is an (imperfect) way of doing so

2. Keep data safe

• Now: keep backups
• After the project: preserve the data

Would your data survive…?

You start your computer as usual when you get to work and instead of your files, you get the Windows blue screen of death. After going to the IT Service Desk the conclusion is that your computer cannot be repaired.

Would your data survive…?

You arrive at work this morning and realize that somebody got in and stole all the valuable technology in the office, including all the computers and external hard drives.

Would your data survive…?

University of Southampton, School of Electronics and Computer Science, Southampton, UK, 2005

There has been a fire at the University. Unfortunately, it has affected your office, and the room where your department keeps the shared drive.

Would your data survive…?

You have been working on a deadline for hours and are very tired. You accidentally delete a data file, and don’t realize that until the next day.

Would your data survive…?

There is a glitch in your cloud provider, and several files are automatically destroyed. The cloud company apologizes profusely, but is not able to restore the files.

Backups and storage

Rule of Threes
• Primary Local
• External Local
• External Remote

Original (working)

External local

External remote
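The Rule of Threes above can be sketched in a few lines of Python. This is only an illustration, not a replacement for a managed backup service. The function names `sha256_of` and `back_up` are hypothetical, and the sketch assumes that the external local and external remote locations are reachable as ordinary folders (for example a mounted drive and a synced or network folder).

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original: Path, backup_dirs: List[Path]) -> List[Path]:
    """Copy the working file into each backup folder and verify every copy.

    Assumption: backup_dirs holds two folders, one external local and one
    external remote, so that three copies exist in total.
    """
    expected = sha256_of(original)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps, which helps when comparing versions
        copy = Path(shutil.copy2(original, directory))
        if sha256_of(copy) != expected:
            raise OSError(f"Backup copy does not match the original: {copy}")
        copies.append(copy)
    return copies
```

Comparing checksums after each copy is a design choice in this sketch: a backup that was never verified may turn out to be unreadable on the day it is needed.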

2. Keep data safe

• Now: keep backups
• After the project: preserve the data

Preservation of digital content

Traditional content is easy to preserve

Digital content is delicate. Digital preservation is HARD!

CC-BY Quinn Dombrowski

What is a data repository?

• A place to store data
• A place to make data publicly available and findable
• A place to preserve your data

Why use a data repository?

Share your data: open science

Comply with your Data Management Plan

Give credit to data creators

Preserve your data

Sharing data: repositories

Search domain-specific repositories: www.re3data.org

ScholarsArchive@OSU

https://ir.library.oregonstate.edu/

Who is it for?
What can you store in it?
How to get help?

3. Keep data useful

A. Documented: metadata

B. Organized

By Alan Levine

A. Data documentation: metadata

Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.

NISO, Understanding Metadata
http://www.niso.org/publications/press/UnderstandingMetadata.pdf

• Lab notebooks
• Questionnaires, codebooks
• Software syntax and output files
• Info about equipment settings
• Database schema
• Methodology reports
• Provenance info about sources of derived or digitized data
• Data dictionaries…

What is metadata?

WHO created the data?
WHAT is the content of the data?
WHEN were the data created?
WHERE is it geographically?
HOW were the data developed?
WHY were the data developed?

Metadata is: Data ‘reporting’
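One way to keep track of the six questions above is a small record stored next to the data. The sketch below is only an illustration: the field names, the helper functions (`make_record`, `unanswered`, `save_record`) and the JSON layout are assumptions made for this example, not a metadata standard.

```python
import json
from pathlib import Path

# One entry per question on the slide
QUESTIONS = ("who", "what", "when", "where", "how", "why")


def make_record(**answers: str) -> dict:
    """Build a record with one entry per question; unanswered ones stay empty."""
    unknown = set(answers) - set(QUESTIONS)
    if unknown:
        raise ValueError(f"Unexpected fields: {sorted(unknown)}")
    return {question: answers.get(question, "").strip() for question in QUESTIONS}


def unanswered(record: dict) -> list:
    """List the questions that still have no answer."""
    return [question for question in QUESTIONS if not record.get(question)]


def save_record(record: dict, path: Path) -> None:
    """Write the record as JSON so that people and scripts can both read it."""
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
```

Calling `unanswered(record)` at the end of each working session is a cheap reminder of which questions are still open, which fits the advice to generate metadata all the time rather than at the end of the project.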

When to generate metadata?

All the time!!!
From beginning to end

Unstructured: readme.txt
• Context for the data
• Content of the data package
• Catalog of data fields

Structured: Metadata standards
• Structure to describe data with:
  o Common terms to allow consistency
  o Common definitions for easier interpretation
  o Common language for ease of communication
  o Common structure to quickly locate information
• In search and retrieval, standards provide:
  o A reliable and predictable format for computer interpretation
  o A uniform summary description of the dataset

B. Organize your data

“Someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why”

“Everything you do, you will probably have to do over again”

Noble, 2009
Noble WS (2009) A Quick Guide to Organizing Computational Biology Projects. PLoS Comput Biol 5(7): e1000424. doi:10.1371/journal.pcbi.1000424

B. Organize your data: meaningful filenames!

project_instrument_location_YYYY-MM-DD-hhmmss_extra.ext

Labels on the filename parts: Index/grant; Conditions; s/n, variable; Date: retain order; Other info

• Avoid spaces
• Avoid % ^ & $ # | : and similar
• Lowercase: less software dependent
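The naming pattern above can be generated rather than typed by hand, which keeps it consistent. The sketch below is one possible helper, not a prescribed tool: the function names `clean` and `build_filename` are hypothetical, and replacing every unsafe character with a hyphen is a choice made for this example.

```python
import re
from datetime import datetime

# Anything that is not a lowercase letter, a digit or a hyphen is unsafe here
_UNSAFE = re.compile(r"[^a-z0-9-]+")


def clean(part: str) -> str:
    """Lowercase one filename part and replace spaces and special characters."""
    return _UNSAFE.sub("-", part.strip().lower()).strip("-")


def build_filename(project: str, instrument: str, location: str,
                   when: datetime, extra: str = "", ext: str = "txt") -> str:
    """Assemble project_instrument_location_YYYY-MM-DD-hhmmss_extra.ext."""
    stamp = when.strftime("%Y-%m-%d-%H%M%S")  # year first, so names sort by date
    parts = [clean(project), clean(instrument), clean(location), stamp]
    if extra:
        parts.append(clean(extra))
    return "_".join(parts) + "." + ext.lower().lstrip(".")
```

Underscores are reserved as the separator between parts, so an underscore inside a part is also turned into a hyphen. That way a script can split the name back into its parts without ambiguity.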

B. Organize your data: meaningful filenames!

Order by type:
• Notes_Gorer_1963-12-15.docx
• Notes_MassObs_1955-04-12.docx
• Questionnaire_Gorer_1963-12-15.pdf
• Questionnaire_MassObs_1955-04-12.pdf

Forced order with numbering:
• 01_MassObs_questionnaire_1955-04-12.pdf
• 02_MassObs_notes_1955-04-12.docx
• 03_Gorer_questionnaire_1963-12-15.pdf
• 04_Gorer_notes_1963-12-15.docx

Order by date:
• 1955-04-12_notes_MassObs.docx
• 1955-04-12_questionnaire_MassObs.pdf
• 1963-12-15_notes_Gorer.docx
• 1963-12-15_questionnaire_Gorer.pdf

Order by subject:
• Gorer_notes_1963-12-15.docx
• Gorer_questionnaire_1963-12-15.pdf
• MassObs_notes_1955-04-12.docx
• MassObs_questionnaire_1955-04-12.pdf
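The orderings above follow from plain alphabetical sorting, so the first element of the filename decides how a folder listing is grouped. A minimal sketch, using the "order by date" filenames from the slide; the helper `listing_order` and the two day-first filenames are hypothetical and exist only to show the contrast.

```python
def listing_order(filenames):
    """Return filenames in the order a plain alphabetical listing shows them."""
    return sorted(filenames)


# Filenames from the slide, deliberately shuffled
by_date = [
    "1963-12-15_questionnaire_Gorer.pdf",
    "1955-04-12_notes_MassObs.docx",
    "1963-12-15_notes_Gorer.docx",
    "1955-04-12_questionnaire_MassObs.pdf",
]

# Hypothetical day-first names: 15 December 1963 and 1 June 1970
day_first = ["15-12-1963_notes.docx", "01-06-1970_notes.docx"]

for name in listing_order(by_date):
    print(name)  # YYYY-MM-DD at the front: alphabetical order is chronological

for name in listing_order(day_first):
    print(name)  # day first: the 1970 file is listed before the 1963 file
```

This is the reason for writing dates as YYYY-MM-DD: the alphabetical order and the chronological order are then the same, with no extra tooling.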

Organize your data: file structure

samples.mat

data

Organize your data: file structure

samples.mat

New version

data

Organize your data: file structure

samples.old.mat

samples.mat

data

Organize your data: file structure

samples.old.mat

samples.old2.mat

samples.mat

data

Organize your data: file structure

samples.old.mat

samples.old2.mat

samples.mat

data

In general, renaming or moving files is bad practice:
• Makes it harder to reproduce results
• Makes it harder to find data later
• Breaks scripts and symbolic links.

Organize your data: file structure

samples.mat

samples.V2.mat

samplesFinal.mat

samplesFinalV2.mat

samples_USE_THIS_ONE.mat

Adding new filenames without structure is not much better…

Which one is the most recent??

data

Organize your data: file structure

data
  2016-09-28
    samples.mat
  2016-10-15
    samples.mat
  2016-11-14
    samples.mat

We are not renaming files and it is clear which version is newer.
BUT we do not know differences between data sets.
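The dated-folder layout above can be maintained with a small script, so that a new version is always added and nothing is renamed. This is a sketch under assumptions: the functions `save_version` and `latest_version` are hypothetical names, and every version is assumed to live in a folder named with its YYYY-MM-DD date directly under the data folder.

```python
import shutil
from datetime import date
from pathlib import Path


def save_version(source: Path, data_root: Path, day: date) -> Path:
    """Copy source into data_root/YYYY-MM-DD/ without overwriting anything."""
    target_dir = data_root / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        # Refuse to replace an existing version; earlier results depend on it
        raise FileExistsError(f"A version already exists: {target}")
    shutil.copy2(source, target)
    return target


def latest_version(data_root: Path, filename: str) -> Path:
    """Return the copy of filename that sits in the most recent dated folder."""
    dated = sorted(
        folder for folder in data_root.iterdir()
        if folder.is_dir() and (folder / filename).exists()
    )
    if not dated:
        raise FileNotFoundError(f"No version of {filename} under {data_root}")
    return dated[-1] / filename  # ISO dates sort chronologically
```

The script answers "which one is the most recent?" but, as the slide says, it does not record what changed between versions. That still has to be written down by a person.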

B. Data documentation: readme.txt

readme.txt
• Context for the data
• Content of the data package
• Catalog of data fields
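A readme.txt with the three sections above can be created automatically whenever a new data folder is started, so the file is there to be filled in from day one. A minimal sketch: the function name `write_readme_template` and the placeholder text are assumptions for this example, and the section titles are taken from the slide.

```python
from pathlib import Path

# Section titles taken from the slide
README_SECTIONS = (
    "Context for the data",
    "Content of the data package",
    "Catalog of data fields",
)


def write_readme_template(folder: Path) -> Path:
    """Create folder/readme.txt with empty sections, unless one already exists."""
    readme = folder / "readme.txt"
    if readme.exists():
        return readme  # never overwrite documentation that someone has written
    lines = []
    for title in README_SECTIONS:
        lines += [title, "-" * len(title), "(to be filled in)", ""]
    folder.mkdir(parents=True, exist_ok=True)
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme
```

The template only provides the structure. Describing the context, the contents and each data field remains the researcher's job.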

Data documentation

[Diagram: the "data" folder holds a readme.txt and the dated subfolders 2016-09-28 and 2016-11-15, each containing data.]

Data documentation

[Diagram: each dated subfolder contains many data files and has its own readme.txt.]

Need help? Contact us

• One on one consultations about research data management.
  o Data Management Plans
  o Documentation and organization of data
  o Data curation for deposit in a repository.
  o Any aspect of the data life cycle.
• Deposit your data and publications to ScholarsArchive@OSU
• Workshops and class visits on data management
• Author’s Rights and Intellectual Property Issues

Clara Llebot Lorente | Data Management Specialist

clara.llebot@oregonstate.edu
http://bit.ly/OSUData