Research Data Management, BExIS Hands-On Workshop
Tayebeh Kiani, Javad Chamanara
February 2016
Tehran, Iran
2BExIS Hands-On Workshop, Feb. 2016, Tehran, Iran
Scientific Data
“The recorded information (regardless of the form or the media in which they may exist) necessary to support or validate a research project’s observations, findings or outputs.”
-University of Oxford
Considerations of DIR
• Data Acquisition
• Data Processing/Analysis
• Result reproduction
• Availability of data
• Teamwork and data sharing
• Digital rights
• Referencing and citation
Data Management is needed
Need for Data Management
“Data management refers to all aspects of creating, housing, delivering, maintaining, and archiving and preserving data. It is one of the essential areas of responsible conduct of research.”
-MANTRA 2013
DM is done through a Lifecycle
[Figures: example data lifecycle models from]
• Boston University
• The University of Alabama
• The University of Virginia
• DataONE
• The U.S. Geological Survey
• Data and File Formats
• Data Standards
• Data Access Policies
• Data Management Plan
• Data Preservation Plan
• Data Retirement
• Quality Level
• Hardware
• Software
• Cost/Funding
• Technical Staff
• Tools:
  – https://dmptool.org/
  – https://dmponline.dcc.ac.uk
• By means of:
  – Collecting new data
  – Updating existing data
  – Converting/Transforming existing data
  – Purchasing/Obtaining data
• Either manually or automatically
• In the laboratory, in the field, or by computation
• Following methodologies, standards, recommendations
• Satisfying constraints such as access policies
• To prepare data for subsequent use:
  – Verify
  – Organize
  – Transform
  – Integrate and extract
• Tools:
  – OpenRefine (formerly Google Refine)
  – Statistical software: R, SAS
  – Modeling Tools: ….
• Describe facts
• Detect patterns
• Develop explanations
• Test hypotheses
• This includes:
  – Data quality assurance
  – Statistical data analysis
  – Modeling
  – Interpretation of analysis results
• The need for:
  – Supporting research publications with associated, accessible datasets
  – Reusability by others
• Actions and procedures to:
  – Keep data for some period of time
  – Set data aside for future use
  – Archive data in a data repository
• Considering:
  – Discovery
  – Identification
  – Reproduction/Presentation
  – Policies
• Disseminate quality data to the public and to other agencies
• Medium- and agent-independent
• Via manual or automated mechanisms
• Shared, but with controls
• Useful metadata
• What to publish:
  – The research result citing the data
  – A data paper describing the data
  – The data itself
• Where to publish:
  – Catalogs
  – Portals
  – Repositories
  – National Archives
• Considerations:
  – Licensing and rights
  – Cost
  – Sensitive data
  – Anonymization
“Metadata is information about the context, content, quality, provenance, and/or accessibility of a set of data.”
-Digital Curation at the University of Wisconsin-Madison
But why is it needed?
[Figure: loss of data details over time (Michener et al. 1997). Vertical axis: data details; horizontal axis: time, starting at the time of data development.]
Annotations along the curve:
• Specific details about problems with individual items or specific dates are lost relatively rapidly
• General details about datasets are lost through time
• Accident or technology change may make data unusable
• Retirement or career change makes access to “mental storage” difficult or unlikely
• Loss of data developer leads to loss of remaining information
• Formally describes various key attributes of each data element or collection of elements
• Helps maintain data quality
• Makes use of the data possible, or at least easier
• QA focuses on building in quality to prevent defects
  – Setting the Quality Level
  – Setting standards
  – Proper protocols and methods for:
    • Data collection
    • Data processing and usage
    • Maintenance
• QC focuses on testing for quality (defect detection)
  – Acceptance Criteria
  – Automatic QC upon data manipulation
  – Configuring/testing instruments
  – Unit of measurement, accuracy, conversion errors, …
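The QC idea of checking records against acceptance criteria can be sketched as follows. This is only an illustration, not how BExIS implements validation: the field names (`temp_c`, `humidity_pct`), the thresholds, and the `Criterion` and `run_qc` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """A named acceptance criterion (hypothetical helper, not a BExIS type)."""
    name: str
    check: Callable[[dict], bool]


def run_qc(records, criteria):
    """Return (record_index, criterion_name) for every failed check."""
    failures = []
    for index, record in enumerate(records):
        for criterion in criteria:
            if not criterion.check(record):
                failures.append((index, criterion.name))
    return failures


# Assumed acceptance criteria: plausible ranges for the units in use.
criteria = [
    Criterion("temperature_in_range", lambda r: -50.0 <= r["temp_c"] <= 60.0),
    Criterion("humidity_is_percentage", lambda r: 0.0 <= r["humidity_pct"] <= 100.0),
]

records = [
    {"temp_c": 22.0, "humidity_pct": 46.0},
    {"temp_c": 295.0, "humidity_pct": 45.0},   # looks like Kelvin entered as Celsius
    {"temp_c": 21.0, "humidity_pct": 130.0},   # not a valid percentage
]

failures = run_qc(records, criteria)
for index, name in failures:
    print(f"record {index} failed {name}")
```

Running such checks automatically whenever data is entered or changed is one way to catch unit and conversion errors early.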
• Protect data from:
  – Loss
  – Corruption
  – Unauthorized access
• Regular backups
• Regular restores
• Proper structure and naming
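One way to check that a backup is actually restorable is to compare checksums of the original and the copy. The sketch below uses only the Python standard library; the file name, the directory layout, and the `backup_and_verify` helper are assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy a file into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    return sha256_of(source) == sha256_of(target)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical file name following a date_topic_version convention.
    original = root / "2016-02_soil-moisture_v1.csv"
    original.write_text("depth,moisture\n-10,12\n")

    backup_ok = backup_and_verify(original, root / "backup")

    # Simulate corruption of the backup copy and detect it by checksum.
    copy = root / "backup" / original.name
    copy.write_text("depth,moisture\n-10,99\n")
    corruption_detected = sha256_of(original) != sha256_of(copy)

print(backup_ok, corruption_detected)
```

Storing the digest next to each backup would let a later restore be verified even when the original file is no longer available.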
Note!
There are some suggestions for cooperation at the end of the workshop
The Workshop
• BExIS
  – Data Lifecycle Management
  – Generic
  – Extensible
  – Portable
  – Scalable
• Flexible Data Structures
• Data Submission
• Validation
• Preserving
• Metadata Management
• Versioning
BExIS Core Concepts
[Diagram: core modules connected by «use» relationships]
• Data
• Metadata
• Data Structure
• Metadata Structure
• Semantics
• Geo
• Administration
• Security
BExIS Core Concepts (cont.)
• Search
• Publishing
• CM
• Land Use
• Reservation
• Data Submission
Preparation
• URL: bx2train.inf-bb.uni-jena.de
• Demo: http://bexis2.vmguest.uni-jena.de/
• Source: http://fusion.cs.uni-jena.de/bexis
The Scenario
• Registration/Logging in
• Seeing the data and metadata structures
• Downloading a template
• Filling in the Excel data (sample datasets)
• Uploading the datasets
• Providing metadata
• Checking validations
• Seeing the dataset in the system
• Searching, etc.
Example Datasets
– Tectonic Stress Fields on BExIS website
– International Seismological Center
– Data type: focal mechanism
BExIS Dataset
[Figure: a BExIS dataset, its data structure, and views derived from it]

Dataset (each row is an observation, or tuple; the columns are defined by the data structure):

S.N.  Tmp  Time  S.M.  Depth  Pos.  Hu.
14    22   1     12    -10    A     46
13    23   2     10    -10    B     45
16    21   3     12    -11    C     30
16    18   5     15    -10    A     25
18    14   6     17    -9     D     25

Extended Properties (EP) shown alongside the data: 78, Green, 0.11, Yes, 100

Amendments (Variable 1, Variable 2):
– Error: ±0.10%
– Rounded: Yes
– Interval: 1 Sec.

Views:

Tmp  Time  Hu.
22   1     46
23   2     45
21   3     30
18   5     25
14   6     25

S.N.  S.M.  Depth
14    12    -10
13    10    -10
16    15    -10
18    17    -9

Globalization Info:
– English: Soil Moisture
– German: Bodenfeuchte
– Persian: رطوبت خاک
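The relation between a dataset and its views can be sketched as column selection plus an optional row filter. The rows below are one reading of the flattened example figure (seven columns, five tuples), and the `make_view` helper and the `Depth >= -10` filter are assumptions chosen so that the results match the two views in the figure; BExIS itself may define views differently.

```python
# Column names and rows as read from the example figure (an interpretation).
COLUMNS = ("S.N.", "Tmp", "Time", "S.M.", "Depth", "Pos.", "Hu.")
ROWS = [
    (14, 22, 1, 12, -10, "A", 46),
    (13, 23, 2, 10, -10, "B", 45),
    (16, 21, 3, 12, -11, "C", 30),
    (16, 18, 5, 15, -10, "A", 25),
    (18, 14, 6, 17, -9, "D", 25),
]


def make_view(columns, rows, selected, predicate=None):
    """Project each tuple onto the selected columns, optionally filtering rows."""
    positions = [columns.index(name) for name in selected]
    view = []
    for row in rows:
        record = dict(zip(columns, row))
        if predicate is None or predicate(record):
            view.append(tuple(row[i] for i in positions))
    return view


# View 1: three variables, all observations.
climate_view = make_view(COLUMNS, ROWS, ("Tmp", "Time", "Hu."))

# View 2: three other variables; the depth filter is an assumed rule.
soil_view = make_view(
    COLUMNS, ROWS, ("S.N.", "S.M.", "Depth"),
    predicate=lambda record: record["Depth"] >= -10,
)

for row in soil_view:
    print(row)
```

Both views are computed from the same tuples, so the underlying dataset is stored once and each view only selects from it.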
Thank you:
• Workshop Participants
• Martin Hohmuth
• Nafiseh Navabpour
• Roman Gerlach
Contact: [email protected], bexis2.uni-jena.de
BEXIS Tech Talk #2: The Conceptual Model
Acknowledgment